Anyone got a contact at OpenAI? They have a spider problem.

As I think I have mentioned before, I have the world's lamest content farm
at https://www.web.sp.am/. Click on a link or two and you'll get the idea.

Unfortunately, GPTBot has found it and has not gotten the idea. It has
fetched over 3 million pages today. Before someone tells me to fix my
robots.txt: this is a content farm, so rather than being one web site
with 6,859,000,000 pages, it is 6,859,000,000 web sites, each with one
page. Of those 3 million page fetches, 1.8 million were for robots.txt.
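
To illustrate why robots.txt does not help here, the following is a rough
sketch (not anything GPTBot is known to run). A crawler that follows the
usual convention looks up robots.txt once per hostname, so every new
hostname costs one more robots.txt fetch. The three hostnames are made up
for illustration; the two fetch counts are the ones quoted above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical hostnames, standing in for the one-page "sites" on the farm.
pages = [
    "https://aaa.web.sp.am/",
    "https://bbb.web.sp.am/",
    "https://ccc.web.sp.am/index.html",
]

# One distinct robots.txt URL per distinct hostname.
robots = {robots_url(p) for p in pages}

# Share of today's fetches that went to robots.txt (figures from the post).
robots_share = 1.8e6 / 3.0e6

print(len(robots), "robots.txt URLs for", len(pages), "pages")
print(f"robots.txt share of fetches: {robots_share:.0%}")
```

With one page per hostname, a polite crawler spends at least one
robots.txt fetch for every page it reads, which is consistent with
robots.txt making up more than half of the traffic.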

It's not like it's hard to figure out what's going on since the pages
all look nearly the same, and they're all on the same IP address with
the same wildcard SSL certificate.
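
A crawler could use exactly that signal. The sketch below is one
hypothetical way to do it, assuming the crawler can resolve a hostname to
an address before fetching: it caps how many distinct hostnames it will
visit on any single IP address. The class name, the limit, and the fake
resolver are all invented for illustration; nothing here describes how
any real spider works.

```python
from typing import Callable, Dict, Set


class PerAddressBudget:
    """Cap the number of distinct hostnames crawled per IP address."""

    def __init__(self, resolve: Callable[[str], str], max_hosts_per_ip: int):
        self.resolve = resolve              # hostname -> IP address string
        self.max_hosts = max_hosts_per_ip   # assumed tunable limit
        self.hosts_by_ip: Dict[str, Set[str]] = {}

    def allow(self, hostname: str) -> bool:
        ip = self.resolve(hostname)
        hosts = self.hosts_by_ip.setdefault(ip, set())
        if hostname in hosts:
            return True                     # already admitted earlier
        if len(hosts) >= self.max_hosts:
            return False                    # too many hostnames on this IP
        hosts.add(hostname)
        return True


# Fake resolver: every hostname lands on one documentation-range address,
# mimicking a wildcard DNS record that points at a single server.
budget = PerAddressBudget(lambda host: "192.0.2.1", max_hosts_per_ip=2)

decisions = [
    budget.allow(h)
    for h in ("a.example", "b.example", "c.example", "a.example")
]
print(decisions)
```

A real limit would have to be far more generous than 2, since large
hosting providers legitimately serve many unrelated sites from one
address; comparing certificates or page similarity could narrow it down.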

Amazon's spider got stuck there a month or two ago but fortunately I was
able to find someone to pass the word and it stopped. Got any contacts
at OpenAI?

R's,
John

PS: If you were wondering what they're using to train GPT-5, well, now you know.

"John Levine" <johnl@iecc.com> writes:

> PS: If you were wondering what they're using to train GPT-5, well, now you know.

And it's not very trainable, it seems? I believe most amoebas respond
better to 3 million identical events...

Bjørn

> Amazon's spider got stuck there a month or two ago but fortunately I was
> able to find someone to pass the word and it stopped. Got any contacts
> at OpenAI?

why? you are doing a societal good by ensnaring them. dig a deeper
hole.

randy

Yeah. Who should we donate compute resources to so that we can further this project?

A random image along with a different random ALT tag and description would be a nice touch.

“The world hates change, yet it is the only thing that has brought progress.” - Charles Kettering

You’ll recall that the proliferation of the Internet was met with similar resistance. There’s some fascinating irony here.

-Matt