For everybody who is "monitoring" other people's websites, please please
please, monitor something static like /robots.txt as that can be
statically served and is kinda appropriate as it is intended for robots.
Depends on what you are monitoring. If you're looking for layer 4 IPv6
connectivity then robots.txt is fine. If you're trying to determine
whether a site is serving active content on IPv6 and not serving HTTP
errors, then it's pretty pointless to monitor robots.txt - you need to
pull real content.
And as can be seen with the monitoring of ipv6.level3.com, it will tell
you 'it is broken', but as the person who is monitoring has no relation
or contact with them, it only leads to public complaints which do not
get the problem fixed. If the site itself cannot be arsed to monitor
their own, then why would you bother to do so?
Indeed, I agree that it can be useful, especially as an access ISP, to
monitor popular websites so that you know that you can reach them, but
that does not mean you need to pull large amounts of data.
(For determining MTU issues, yes, but likely you have a full 1500-byte
path anyway, thus those issues should, as much as possible, not happen
in the first place.)
But unless you have a contact at the site it will be tough to resolve
the issue anyway.
Oh, and of course do set the User-Agent to something logical and, to be
super nice, include a contact address, so that people who do check their
logs once in a while for fishy things at least know what is happening
there and that it is not a process run amok or something.
Good policy, yes. Some robots do this but others don't.
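A minimal sketch of that advice, assuming Python; the monitor name and
contact address in the User-Agent string are made-up placeholders - the
only point is that the header is descriptive and contactable:

```python
# Sketch: poll a static path like /robots.txt while identifying the
# monitor in the User-Agent header. Name and address are hypothetical.
import urllib.request

UA = "example-v6-monitor/1.0 (+mailto:noc@example.net)"  # placeholder

def build_request(url):
    # A descriptive UA lets site operators reading their logs see who
    # is probing them and whom to contact about it.
    return urllib.request.Request(url, headers={"User-Agent": UA})

def check_static(url, timeout=10):
    """Fetch a static path such as /robots.txt; return the HTTP status."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.status
```

Usage would be something like check_static("http://www.example.com/robots.txt"),
which returns 200 when the static path is reachable.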
Of course, asking before doing tends to be a good idea too.
Depends on the scale. I'm not going to ask permission to poll someone
else's site every 5 minutes, and I would be surprised if they asked me the
same. OTOH, if they were polling to the point that it was causing issues,
that might be different.
I was not talking about that low a rate; not a lot of people will notice
that. But the 1000 qps from 500 sources was very much noticed, and thus
at first they got blocked, then we tried to find out who was doing it,
and then they repointed to robots.txt, were unblocked, and all was fine.
The IPv6 Internet already consists way too much of monitoring by
pulling pages and doing pings...
"way too much" for what? IPv6 is not widely adopted.
In comparison to real traffic. There has been a saying since the 6bone
days already that IPv6 is just ICMPv6...
Fortunately that should heavily change in a few months.
We've been saying this for years. World IPv6 day 2012 will come and go,
and things are unlikely to change a whole lot. The only thing that World
IPv6 day 2012 will ensure is that people whose ipv6 configuration actively
interferes with their daily Internet usage will be self-flagged and their
configuration issues can be dealt with.
Fully agree, but at least at that point nobody will be able to claim
that they can't deploy IPv6 on the access side as there is no content.
(Who noticed a certain s....h company performing latency checks against
one of his sites, which was no problem, but the fact that they were
causing almost more hits/traffic/load than normal clients was a bit on
the much side.)
If that web page is configured to be as top-heavy as this, then I'd suggest
putting a cache in front of it. nginx is good for this sort of thing.
nginx does not help if your content is not cacheable by nginx, for
instance if you simply show the IP address of the client, and thus
whether they have IPv6 or IPv4.
In our case, indeed, everything that is static is served by nginx, which
is why hammering on /robots.txt is not an issue at all...
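A rough sketch of the kind of split being described, assuming nginx; the
paths, server name, and backend address are all placeholders:

```nginx
server {
    listen [::]:80;   # accept IPv6 as well as IPv4
    listen 80;
    server_name www.example.com;

    # Static files such as robots.txt are served directly by nginx,
    # so monitoring probes never touch the application backend.
    location = /robots.txt {
        root /var/www/static;
    }

    # Everything else goes to the dynamic backend; note that a page
    # which echoes the client's IP address is inherently uncacheable.
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
```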
>> For everybody who is "monitoring" other people's websites, please
>> please please, monitor something static like /robots.txt as that
>> can be statically served and is kinda appropriate as it is intended
>> for robots.
> This could provide a false positive if one is interested in ensuring
> that the full application stack is working.
As stated above, and given the example of the original subject of
ipv6.level3.com, what exactly are you going to do when it does not work?
And again, if the owner does not care, why should you?
Also, maybe they do a redesign of the site and remove the keywords or
other metrics you are looking for. It is not your problem to monitor it
for them, unless they hire you to do so of course.
>> Oh and of course do set the User-Agent to something logical and to
>> be super nice include a contact address so that people who do check
>> their logs once in a while for fishy things they at least know what
>> is happening there and that it is not a process run amok or
>> something.
> A server side process? Or client side?
Take a guess what something that polls an HTTP server is.
> If the client side monitoring is too aggressive, then your rate
> limiting firewall rules should kick in and block it. If you don't
> have a rate limiting firewall on your web server (on the server
> itself, not in front of it) then you have bigger problems.
You indeed will have a lot of problems when you are doing connection
tracking on your website, be that on the box itself or in front of it in
a separate TCP state engine.
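For reference, one way to do that kind of on-box rate limiting without a
per-connection state table is a hashlimit rule that matches only TCP
SYNs; the thresholds below are made up, and the rule is only a sketch:

```shell
# Per-source rate limit on new inbound HTTP connections over IPv6.
# Matching on --syn avoids full connection tracking; hashlimit keeps
# a per-source-address token bucket and drops sources that open more
# than 20 new connections/second (burst 50).
ip6tables -A INPUT -p tcp --dport 80 --syn \
    -m hashlimit --hashlimit-name http6 --hashlimit-mode srcip \
    --hashlimit-above 20/second --hashlimit-burst 50 -j DROP
```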
>> Of course, asking before doing tends to be a good idea too.
> If you are running a public service, expect it to get
> monitored/attacked/probed etc. If you don't want traffic from
> certain sources then block it.
That is exactly what happened, but if they had set a proper User-Agent
it would not have taken so long to figure out why they were hitting us.
There is a big difference between malicious and good traffic; people
tend to want to serve the latter.
>> The IPv6 Internet already consists way too much of monitoring
>> by pulling pages and doing pings...
> Who made you the arbiter of acceptable automated traffic levels?
And as you state yourself, if you do not like it, block it, which is
what we do. But that was not what this thread was about, if you recall,
it started with noting that you might want to ask for permission and
that you might want to provide proper contact details in the probing.
>> (who noticed a certain s....h company performing latency checks
>> against one of his sites, which was no problem, but the fact that
>> they were causing almost more hits/traffic/load than normal
>> clients was a bit on the much side.)
> Again. Use a firewall and limit them if the traffic isn't in line
> with your site policies.
I can only suggest running, for once, a site that gets more than a few
hits per second, is distributed around the world, and has actual users.
>> And for the few folks putting nagios's on other people's sites,
>> they obviously do not understand that even if the alarm goes off
>> that something is broken that they cannot fix it anyway, thus why
>> bother?
> You obviously do not understand why people are implementing these
> monitors.
Having written various monitoring systems I know exactly why they are
doing it. I also know that they are monitoring the wrong thing.
> It's to serve as a canary for v6 connectivity issues.
Just polling robots.txt is good enough for that.
Asking the site operator if it is good with them is also a good idea.
Providing contact details in the User-Agent is also a good idea.
> If I was implementing a monitor like this, I'd use the following logic:
> - HTTP 200 returned via v4 and v6 == all is well.
> - HTTP 200 returned via v4 or v6, no HTTP code returned via the other
>   (i.e. one path works) == v6/v4 potentially broken.
> - No HTTP code returned via either method == end site problem; nothing
>   we can do; don't alert.
And then you get an alert, who are you going to call?
> Presumably you'd also implement a TCP 80 check as well.
Ehmmm, you do realize that if you are able to get an HTTP response, you
have (unless doing HTTPS) actually already contacted port 80 over TCP,
right?
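For what it's worth, the v4/v6 decision logic described above could be
sketched as follows; the hostname is a placeholder, and a real monitor
would want retries and better HTTP parsing. Note that fetching the page
already performs the TCP port 80 connect as a side effect:

```python
# Sketch: check the same URL over IPv4 and IPv6 independently by
# forcing the address family, then apply the quoted alerting rules.
import socket

def http_status(host, family, path="/robots.txt", timeout=10):
    """Return the HTTP status code, or None if no response at all."""
    try:
        addr = socket.getaddrinfo(host, 80, family, socket.SOCK_STREAM)[0]
        with socket.socket(addr[0], addr[1]) as s:
            s.settimeout(timeout)
            s.connect(addr[4])  # the TCP port 80 check happens right here
            s.sendall(("GET %s HTTP/1.1\r\nHost: %s\r\n"
                       "Connection: close\r\n\r\n" % (path, host)).encode())
            status_line = s.recv(1024).split(b"\r\n", 1)[0]
            return int(status_line.split()[1])  # "HTTP/1.1 200 OK" -> 200
    except (OSError, IndexError, ValueError):
        return None

def classify(code4, code6):
    ok4, ok6 = code4 == 200, code6 == 200
    if ok4 and ok6:
        return "all is well"
    if ok4 != ok6:
        return "v4/v6 potentially broken"   # only one path works
    return "end site problem, don't alert"  # neither path answered

# e.g.:
# classify(http_status("www.example.com", socket.AF_INET),
#          http_status("www.example.com", socket.AF_INET6))
```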