Yahoo! clue

Simon_Waters1 · March 29, 2007, 11:31am

Is there a Yahoo! abuse contact around who will talk, and not just send me
canned responses?

Their abuse team seems very responsive, but I fear they don't actually read
the whole email and just hit the button for the most appropriate "canned
response" as soon as they think they know what is being said. (Let he who is
without sin here cast the first stone.)

Thanks,

Simon

Dennis_Dayman3 · March 29, 2007, 12:26pm

Simon Waters wrote:

Is there a Yahoo! abuse contact around who will talk, and not just send me canned responses?

Their abuse team seems very responsive, but I fear they don't actually read the whole email and just hit the button for the most appropriate "canned response" as soon as they think they know what is being said. (Let he who is without sin here cast the first stone.)

Thanks,

Simon

Sent this to the Yahoo! abuse contact. Someone should be in touch shortly.

Kradorex_Xeron · March 29, 2007, 1:05pm

Slightly OT: Does anyone know what is with the web spiders from Yahoo/Inktomi?
I've been seeing reports of, and have myself seen, a problem with them opening
10 to 100 connections to a single site.

Valdis_Kletnieks · March 29, 2007, 1:15pm

And 10 concurrent connections (or 100) causes a production-quality webserver
difficulties, how, exactly?

Kradorex_Xeron · March 29, 2007, 2:17pm

True - however:

It may cause certain sites to go over their transfer quota (even if you rate
limit the bots via robots.txt). It could also cause servers that limit
"x number of connections at once" (e.g. some public file hosting servers
that don't allow more than x users at once) to lock out legitimate requests.
If you don't control the robots.txt of such a site, you would be unable to
get access to it.
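
For reference, a rate limit in robots.txt is normally expressed with the
nonstandard Crawl-delay directive. Below is a minimal Python sketch of such a
rule and of how a well-behaved client would read it. The 10-second value and
the "Slurp" agent token are illustrative assumptions, and a crawler is free
to ignore the directive altogether.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the delay value is only an example.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, agent):
    """Return the Crawl-delay (in seconds) that applies to `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() is available in Python 3.6 and later.
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "Slurp"))
```

This only shows what the site asks for. It does nothing about the number of
simultaneous connections, which is the part that hurts connection-limited
servers.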

Another problem is that the Yahoo/Inktomi search robots do not stop if no site
is present at that address. Thus, someone could register a DNS name and have
a site set up on it temporarily, just long enough for Yahoo/Inktomi's bots to
notice it, then point the name at any other Internet host's address. The
bots would proceed to that host and access it over and over in succession,
wasting bandwidth at the user's end (which in most cases is monitored and
limited, sometimes heavily, by the ISP) and, at the bot's end, time that
could have been used spidering other sites.

People shouldn't need to protect themselves from search engine bots. The
Internet already has enough problems as it is with spam and botnets, among
other things. Search engine bots with large pipes don't need to be on that
list of nuisances as well.

But that aside, from what I've seen, no other search engine goes after sites
that aggressively. I was just curious as to why Yahoo/Inktomi's bots are so
aggressive (even more than Google, MSN and such). I reviewed the reason given
on their site; however, the others review millions/billions as well.

Apologies if my postings are unclear.

Zach_White1 · March 30, 2007, 5:17pm

It's not limited to that. I bought this domain which had previously been
in use. I've owned the domain for over 5 years, but I still get requests
for pages that I've never had up.

<zwhite@leet:/var/www/logs:8>$ grep ' 404 ' access_log | grep darkstar.frop.org | awk '/Yahoo/ { print $8 }' | wc -l
830
<zwhite@leet:/var/www/logs:9>$ grep ' 404 ' access_log | grep darkstar.frop.org | awk '/Yahoo/ { print $8 }' | sort -u | wc -l
82

That's 82 unique URLs that have been returning a 404 for over 5 years.
That log file was last rotated 2006 Sep 26. That's averaging 138
requests per month for pages that don't exist on that one domain alone.
How many bogus requests are they sending each month, and what can
we do to stop them? (The first person to say something involving
robots.txt gets a cookie made with pickle juice.)
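
For anyone who wants to repeat the tally without the shell pipeline, here is
a rough Python sketch of the same count. It assumes, as the awk command does,
that the requested URL is the 8th whitespace-separated field of each log line
(which holds when the vhost name is logged first). The function names and the
average-month constant are made up for illustration.

```python
from datetime import date


def tally_404s(lines, host="darkstar.frop.org", agent="Yahoo", url_field=8):
    """Return (total, unique) counts of 404 hits from `agent` for `host`.

    Roughly mirrors: grep ' 404 ' | grep host | awk '/Yahoo/ { print $8 }'
    """
    urls = []
    for line in lines:
        if " 404 " not in line or host not in line or agent not in line:
            continue
        fields = line.split()
        # awk fields are 1-based; a missing field prints as an empty string.
        urls.append(fields[url_field - 1] if len(fields) >= url_field else "")
    return len(urls), len(set(urls))


def per_month(total, start, end):
    """Average requests per month, using a mean month of 365.25 / 12 days."""
    months = (end - start).days / (365.25 / 12)
    return total / months


if __name__ == "__main__":
    with open("access_log") as log:  # same file name as in the commands above
        total, unique = tally_404s(log)
    print(total, unique)
    print(round(per_month(total, date(2006, 9, 26), date(2007, 3, 30))))
```

Taking the posting date, 2007 Mar 30, as the end of the window, 830 hits work
out to roughly 137 per month, in line with the 138 figure, which treats the
span as an even six months.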

Sure, on my domain alone that's not a big deal. It hasn't cost me any
money that I'm aware of, and it hasn't caused any trouble. However, it
is annoying, and at some point it becomes a little ridiculous.

Can anyone who runs a large web server farm weigh in on these sorts of
requests? Has this annoyance, multiplied over thousands of domains and
IPs, caused you problems? Increased bandwidth costs?

-Zach

Matthew_Petach2 · June 5, 2007, 8:57pm

Speaking purely for myself, and not for any other organization, I would
wonder what level of response you had gotten from the abuse address
listed in the requesting netblock:

mpetach@netops:/home/mrtg/archive> whois -h whois.ra.net 74.6.0.0/16
route: 74.6.0.0/16
descr: YST
origin: AS14778
remarks: Send abuse mail to slurp@inktomi.com
mnt-by: MAINT-AS7280
source: RADB
mpetach@netops:/home/mrtg/archive>

First line of inquiry in my mind would be to use the slurp@
email, and work my way along from there.
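
If you need to do this for more than a handful of netblocks, pulling the
address out of the whois output can be scripted. Here is a minimal Python
sketch, assuming the registry puts the abuse contact in a "remarks:" line as
in the record above (many records do not). The function name is hypothetical
and the e-mail pattern is deliberately loose.

```python
import re

# Loose e-mail pattern; good enough for picking addresses out of free text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def abuse_addresses(rpsl_text):
    """Collect e-mail addresses from 'remarks:' lines that mention abuse."""
    found = []
    for line in rpsl_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "remarks":
            continue
        if "abuse" in value.lower():
            found.extend(EMAIL_RE.findall(value))
    return found


if __name__ == "__main__":
    record = """route: 74.6.0.0/16
descr: YST
origin: AS14778
remarks: Send abuse mail to slurp@inktomi.com
mnt-by: MAINT-AS7280
source: RADB"""
    print(abuse_addresses(record))
```

The sketch only parses text you already have. Fetching the record is left to
the whois client, as in the command above.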

Matt