Google captcha issue

We run a smaller ISP of about 7.5k customers and the other day we got an email (excerpt below) from one of Google's automated tools.

We are seeing automated scraping of Google Web Search from a large
number of your IPs. Automated scraping violates our /robots.txt file
and also our Terms of Service. We request that you terminate this
traffic immediately. Failure to do so may cause your network to be
blocked by our abuse systems.

To allow you to identify the traffic, we are providing a list of
your IPs they used today (Source field), as well as the most common
destination (Google) IP and port and a timestamp of a recent request
(in UTC) to aid in your identification. Note that this list may not
be exhaustive, and we request that you terminate all such traffic, not
just traffic from IPs in this list.

All of the destination ports are either 80 or 443, so on the surface they at least appear to be legitimate web traffic. The source IP addresses are obviously spoofed: the list includes network addresses, as well as an IP that belongs to a router that doesn't appear to be compromised in any way. The initial letter included 700+ IP addresses from our network.
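For anyone wanting to repeat the check on their own list, here is a minimal sketch of how reported addresses could be sorted into network, broadcast, and ordinary host addresses. The function name, the subnet list, and the sample IPs (RFC 5737 documentation ranges) are all made up for illustration; they are not from Google's report. It assumes ordinary IPv4 subnets larger than a /31.

```python
import ipaddress

def classify_reported(reported, subnets):
    """Label each reported IP as 'network', 'broadcast', 'host', or 'unknown'.

    reported: iterable of IP address strings (hypothetical input format).
    subnets:  iterable of CIDR strings for subnets as actually configured.
    Assumes prefixes shorter than /31, where network and broadcast
    addresses are distinct from usable host addresses.
    """
    nets = [ipaddress.ip_network(s) for s in subnets]
    result = {}
    for text in reported:
        addr = ipaddress.ip_address(text)
        label = "unknown"  # not inside any subnet we were given
        for net in nets:
            if addr in net:
                if addr == net.network_address:
                    label = "network"
                elif addr == net.broadcast_address:
                    label = "broadcast"
                else:
                    label = "host"
                break
        result[text] = label
    return result

if __name__ == "__main__":
    # Illustrative data only.
    subnets = ["192.0.2.0/26", "192.0.2.64/26"]
    reported = ["192.0.2.0", "192.0.2.10", "192.0.2.127", "198.51.100.5"]
    for ip, label in classify_reported(reported, subnets).items():
        print(ip, label)
```

Anything labelled network or broadcast cannot belong to a real customer host, which is what made the list look odd in the first place.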

It's now affecting our customers, who are getting CAPTCHAs every couple of Google searches.

Does anyone know of a good way to track the perpetrator(s) down and/or know of a way to mitigate this?

Hi Christopher,

Presumably Google is smart enough to know the difference between
spoofed port scanning and completed TCP connections performing a web
search. If you take Google's report at face value, the addresses
aren't spoofed; something else is happening. The question is how.

There was a company revealed on NANOG earlier this year (or maybe
last year, I'm not great with dates) which contracts small ISPs and
virtual server providers to use their "spare bandwidth" to
pseudonymously originate web requests. They don't require you to
assign them IP addresses because they overload their activity on all
of your IP addresses. In theory they do this without disturbing your
customers and only access web sites whose owners have contracted them
to do so, generally to test connectivity. In practice, there's a
device inline with your traffic flow that injects TCP connections and
captures the associated return packets across your entire address
space. Including, for example, your routers' IP addresses.

Do you, or perhaps your upstream have such a contract?

Bill Herrin


Do you, or perhaps your upstream have such a contract?

I'd be pretty unhappy if someone that I'm paying for transit spoofs traffic
with my IP space as the source.



I don't think William's description is 'spoofing', it's perhaps:
  "Manufacturing hosts on the fly"

is still skeezy though ;(

Pretty sure that we traced it to a service, DiviNetworks, that uses "unused" IP space/bandwidth and tracks invalid DNS queries. Thanks for all of the input and assistance, and special thanks to William Herrin, who pointed this out.