Research - Valid Data Gathering vs Annoying Others

Hi NANOG folks,

We have a situation (which has come up in the past) that I'd like some opinions on.

Periodically, we have researchers who develop projects that do things like randomly port-probe off-campus addresses. The most recent instance of this is a group studying "bottlenecks" on the Internet. To do this, they hit hosts (again, semi-randomly) on both the commodity Internet and on I2 (Abilene) to look for places where there is "traffic congestion".

The problem is that many of their "random targets" consider the probes to be either malicious in nature or outright attacks. As a result, we of course get complaints.

One suggestion that I received from a co-worker to help mitigate this is to have the researchers run the experiments from a www host, and to have the default page explain the experiment and provide contact info.

We also discussed having the researchers contact ISPs and other large providers to see if they can get permission to use addresses in those providers' space as targets, and then provide the ISPs with info from the testing.

How do you view the issue of experiments that probe random sites? Should this be accepted as "reasonable", or should it be disallowed? Something in between?

What other suggestions might you have about how such experiments could be run without triggering alarms?

Please send any suggestions directly to me and once I have some answers, I'll post a compilation to the list.

Thanks!

John

John K. Lerchey
Computer and Network Security Coordinator
Computing Services
Carnegie Mellon University

> Hi NANOG folks,
>
> We have a situation (which has come up in the past) that I'd like some
> opinions on.
>
> Periodically, we have researchers who develop projects which will do
> things like randomly port probe off-campus addresses...

Here are some observations based on an internal corporate R&D project we
ran about 4 years ago that crawled all the websites on the Internet for
use with a search engine.

* Lower your impact. Limit the number of requests sent to a specific IP
within a time period. Limit how fast you make requests. Don't assume
adjacent IPs aren't the same server; don't make parallel requests to IPs
within the same /24. Limit the total number of requests you make to a
specific IP. Limit the amount of data transferred from each IP. (A rough
sketch of these limits, and of the block list below, follows after this
list.)

* Make sure to implement a block list to avoid scanning people that ask
you to stop.

* Make your hostname something that helps explain what you are doing.

* Make sure that other people in your group know that you are running the
experiment and who to forward phone calls to.

* Run a webserver on the IP or IPs that are doing the scanning explaining
what you are doing.

* Honor robots.txt and other "access denied" type responses or error
codes. (A small robots.txt check is also sketched after this list.)

* Don't assume the data returned is valid or non-hostile. Some people run
search engine traps (infinitely large, programmatically generated
websites) to try to salt the search engines with their bogus advertising
data. Some people want to crash any program that scans them. Some people
will do things you didn't think of. (A defensive-fetch sketch follows the
list as well.)

* Expect some people to send automated complaints without knowing that
they are sending them and without understanding the contents of the
complaints they are sending.

* Expect some people to complain about you attacking them on port 53 when
you look up the address for their domain name, even if you never scan
their website or otherwise interact with any of their IPs. (During the
experiment this was the largest source of complaints.)

* If you run the project 24 x 7, you need to respond 24 x 7.
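
To make the rate-limit and block-list points above concrete, here is a
minimal sketch, in Python, of the bookkeeping a prober might do before
touching an address. The thresholds (MIN_SECONDS_BETWEEN_REQUESTS,
MAX_REQUESTS_PER_IP, MAX_BYTES_PER_IP) and the choice of /24 grouping
are invented for illustration, not values from our project.

    import ipaddress
    import time

    # Illustrative limits -- tune these for your own experiment.
    MIN_SECONDS_BETWEEN_REQUESTS = 5.0  # minimum gap between hits to a /24
    MAX_REQUESTS_PER_IP = 20            # total over the whole run
    MAX_BYTES_PER_IP = 1_000_000        # stop pulling data after this

    blocklist = set()            # networks whose owners asked to be left alone
    last_request_to_prefix = {}  # /24 network -> time of last request
    requests_to_ip = {}          # ip string -> request count
    bytes_from_ip = {}           # ip string -> bytes transferred (updated by
                                 # whatever code does the actual transfer)

    def may_probe(ip_str):
        """Return True only if probing ip_str stays within our limits."""
        ip = ipaddress.ip_address(ip_str)
        if any(ip in net for net in blocklist):
            return False
        # Treat the whole /24 as one "server", to be safe.
        prefix = ipaddress.ip_network(ip_str + "/24", strict=False)
        now = time.monotonic()
        last = last_request_to_prefix.get(prefix, 0.0)
        if now - last < MIN_SECONDS_BETWEEN_REQUESTS:
            return False
        if requests_to_ip.get(ip_str, 0) >= MAX_REQUESTS_PER_IP:
            return False
        if bytes_from_ip.get(ip_str, 0) >= MAX_BYTES_PER_IP:
            return False
        last_request_to_prefix[prefix] = now
        requests_to_ip[ip_str] = requests_to_ip.get(ip_str, 0) + 1
        return True

    def honor_opt_out(cidr):
        """Record a network (e.g. '192.0.2.0/24') that asked us to stop."""
        blocklist.add(ipaddress.ip_network(cidr))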
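
Honoring robots.txt can be done with Python's standard library. A minimal
check, assuming a made-up user-agent name ("bottleneck-study-bot") that
ideally points back at your explanatory web page:

    from urllib import robotparser

    def allowed_by_robots(base_url, path, user_agent="bottleneck-study-bot"):
        """Check robots.txt before fetching a page under base_url."""
        rp = robotparser.RobotFileParser()
        rp.set_url(base_url.rstrip("/") + "/robots.txt")
        try:
            rp.read()
        except OSError:
            # Couldn't read robots.txt at all; err on the side of not probing.
            return False
        return rp.can_fetch(user_agent, base_url.rstrip("/") + path)

    # e.g. probe only if allowed_by_robots("http://www.example.com", "/")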
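
To cope with hostile or pathological responses (search engine traps,
endless data streams), a fetcher can cap both how long it waits and how
much it reads. The cap values below are invented for the sketch.

    import urllib.request

    MAX_RESPONSE_BYTES = 512 * 1024  # arbitrary cap for the sketch
    TIMEOUT_SECONDS = 10

    def fetch_capped(url):
        """Fetch at most MAX_RESPONSE_BYTES, give up after TIMEOUT_SECONDS."""
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
            data = resp.read(MAX_RESPONSE_BYTES + 1)
        if len(data) > MAX_RESPONSE_BYTES:
            return None  # bigger than we're willing to handle; treat as a trap
        return data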

Mike.

+----------------- H U R R I C A N E - E L E C T R I C -----------------+