Potentially on-Topic: is MSNBot for real?

On a website I host with nearly 9000 unique visits month-to-date (that's visits, not hits), a full 20% of the recorded 'hits' (hit count is ~40,000) are being generated by 'msnbot'. We see this as a large amount of HTTP traffic from IP addresses owned by Microsoft.

I've actually seen this across a number of websites (including my own), but the guest on my server has raised the issue of the load being completely disproportionate to the perceived value of the visits, and asked about potentially blocking them entirely.

Is this unusual, or what? Are search engines supposed to be amongst the biggest user agents recorded on a typical website? How much trawling and indexing is considered 'too much'?

At what point do the search engines themselves become a menace, where the load they cause outweighs the value of said load? (I'd like my CPU cycles to be for real people, please...)

Off-list thoughts on this welcome if the operational relevance of this issue is questioned...

Cheers
Mark.

> Is this unusual, or what? Are search engines supposed to be amongst the
> biggest user agents recorded on a typical website? How much trawling and
> indexing is considered 'too much'?

Whenever it becomes a problem.

If you don't have enough genuine traffic, and you don't have much, then the
search engines will look like they are dominating it, as they are pretty
thorough.

I've seen issues arise with some search bots, where they have discovered loops
in a website's structure and downloaded multiple copies, or found novel links
to dynamic content and indexed the entire database. So it's worth checking what
pages they have been to, to see if those could be an issue.

> Off-list thoughts on this welcome if the operational relevance of this
> issue is questioned...

Trust me, anything involving 40,000 hits is off-topic in Nanog, unless you
have reason to believe the same 40,000 are happening to everyone on the net,
or they took down 40,000 important websites.

Most of the regulars are just getting in, so expect to be flamed mercilessly.

>> Is this unusual, or what? Are search engines supposed to be amongst the
>> biggest user agents recorded on a typical website? How much trawling and
>> indexing is considered 'too much'?

> Whenever it becomes a problem.

> If you don't have enough genuine traffic, and you don't have much, then the
> search engines will look like they are dominating it, as they are pretty
> thorough.

I suppose it's all about scale. In a country of 4 million-odd people, for a website with a domestic focus in a niche area, 40,000 hits in 21 days is 'fair' IMHO.

> I've seen issues arise with some search bots, where they have discovered loops
> in a website's structure and downloaded multiple copies, or found novel links
> to dynamic content and indexed the entire database. So it's worth checking what
> pages they have been to, to see if those could be an issue.

Good point. Thanks for the pointer.

>> Off-list thoughts on this welcome if the operational relevance of this
>> issue is questioned...

> Trust me, anything involving 40,000 hits is off-topic in Nanog, unless you
> have reason to believe the same 40,000 are happening to everyone on the net,
> or they took down 40,000 important websites.

Seeing stats on sites much bigger than my own helps put it in perspective, so I'm already grateful to those who've responded.

> Most of the regulars are just getting in, so expect to be flamed mercilessly.

Anything's gotta be better than beating on Gadi, right?

=)

Mark.