### casually decided to expound upon nanog@merit.edu the following thoughts about "representativeness of flow data based on samples":

For example, if I am trying to rank the top traffic sinks for my
network beyond an attached peer (i.e. an ordinal rather than cardinal
measurement), will I get different answers if I use a sampling rate
of 1:1000 compared to 1:50, given a statistically "long enough"
measurement period?

I suspect the sampling rate will mostly determine how smooth your
statistics are over the long run, which I assume is what you're
interested in, so the ranking itself should come out the same once
enough samples have accumulated. How long "long enough" is depends on
the ballpark expected packet flow. One might ask "how close do things
seem, or need, to be?" The measurement period has to be long enough
that each sink you care about collects many samples, because the
sampling error on a count of n samples grows as the square root of n,
so the relative error shrinks as 1/sqrt(n). Sinks whose true volumes
differ by less than that error can swap places in the ranking. So what
does losing a sample mean to you? And how much error can you tolerate?
Figure that out and you can narrow in on an appropriate sampling period.
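
To put rough numbers on that, here is a minimal Python sketch. It assumes
packets are sampled independently, so a sink that collects n samples has a
relative error of about 1/sqrt(n). The function names and the traffic
figures are made up for illustration and are not taken from any flow
export tool.

```python
import math
import random


def relative_error(samples: int) -> float:
    """Approximate 1-sigma relative error of a count built from `samples` samples."""
    return 1.0 / math.sqrt(samples)


def packets_needed(sampling_rate: int, tolerated_error: float) -> int:
    """Packets a sink must receive, at 1:sampling_rate, to reach the tolerated error."""
    # n = 1 / error^2 samples are needed; round first to absorb float noise.
    samples = math.ceil(round(1.0 / tolerated_error ** 2, 6))
    return samples * sampling_rate


def sampled_ranking(true_counts: dict, sampling_rate: int, rng: random.Random) -> list:
    """Rank sinks by sampled packet count, keeping each packet with probability 1/rate."""
    keep = 1.0 / sampling_rate
    sampled = {
        sink: sum(1 for _ in range(packets) if rng.random() < keep)
        for sink, packets in true_counts.items()
    }
    return sorted(sampled, key=sampled.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical per-sink packet counts over one measurement period.
    traffic = {"sink-a": 200_000, "sink-b": 190_000, "sink-c": 20_000}
    rng = random.Random()
    for rate in (50, 1000):
        print(f"1:{rate} needs {packets_needed(rate, 0.10)} packets per sink for 10% error")
        print(f"1:{rate} ranking: {sampled_ranking(traffic, rate, rng)}")
```

Under these assumptions a 10% error takes about 100 samples per sink, which
is 5,000 packets to that sink at 1:50 but 100,000 at 1:1000. Sinks that are
far apart rank the same at either rate; the two that are close together
(sink-a and sink-b above) are the ones that may swap at 1:1000 until the
measurement period gets longer.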