What does 95th %tile mean?

I've gotten myself into an argument with a provider about the definition of
'industry-standard 95th percentile method.'

To me, this means the following:

a) take the number of bytes transferred over a 5-minute period, and determine
the rate for both inbound and outbound. Store this in your favorite
data store.

b) at billing time, presumably on the first of the month or some other
monthly increment, take all the samples, sort them from greatest to least,
hacking off the top 5% of samples. Actually, this is done twice, once for
inbound, once for outbound. Then, take the higher of those two, and multiply
it by your favorite $ multiple (e.g., $500 per megabit per second, or $1 per
kilobit per second, etc.).
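
A minimal Python sketch of steps (a) and (b) as described above. It assumes per-direction byte counts for each 5-minute interval, and it assumes that "hacking off the top 5%" means discarding 5% of the samples rounded down; providers may round differently. The function names and the price are illustrative only.

```python
def p95(samples):
    """95th percentile as described: sort greatest to least, hack off the top 5%.

    Assumption: the number of samples discarded is 5% of n, rounded down.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples, reverse=True)
    discard = len(ordered) * 5 // 100  # integer maths avoids float rounding
    return ordered[discard]


def monthly_bill(in_bytes, out_bytes, price_per_mbps, interval_s=300):
    """Bill on Max(95th in, 95th out) from per-interval byte counts."""
    in_bps = [b * 8 / interval_s for b in in_bytes]    # bytes -> bits per second
    out_bps = [b * 8 / interval_s for b in out_bytes]
    billable_mbps = max(p95(in_bps), p95(out_bps)) / 1_000_000
    return billable_mbps * price_per_mbps


if __name__ == "__main__":
    # 100 hypothetical intervals: steady 1 Mb/s in, 2 Mb/s out.
    inbound = [37_500_000] * 100   # 37.5 MB per 5 minutes = 1 Mb/s
    outbound = [75_000_000] * 100  # 75 MB per 5 minutes = 2 Mb/s
    print(monthly_bill(inbound, outbound, price_per_mbps=500))  # 1000.0
```

With 30 days of 5-minute samples (8,640 per direction), this rounding rule discards the 432 highest samples in each direction.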

I think that most people agree with the above; the issue we are running into
is one rogue provider who is billing this at in + out, not the greater of in
or out.

How is everyone else doing it? Specifically, larger folks (UU, Sprint, CW,
Exodus/FGC, GX, Qwest, L3)

Thanks!

Hi Alex,

  I work as an engineer in the product development group at Telseon. I'm
curious about what feedback you get, especially what method the other
providers you list use to calculate the 95th percentile.

  For what it's worth, I agree with you and the method you mention. I'd
be surprised if others in that league are doing it differently.

  Telseon doesn't bill using the 95th percentile method though. We let
the customer adjust their bandwidth on the fly and bill them for what they
provision. Since they can reprovision via a web interface they can jump
around (5 megs one day, 500 megs the next).

Thanks in advance for any info you garner.

-Sean Morrison

FWIW, Abovenet handles it exactly like you mentioned -
Max(95%(out),95%(in)), and I'm fairly certain UUnet does as well. At
least, the weekly transfer stats they mail us seem to indicate that, but
we don't have a contract that would make use of transfer stats, so it is
possible that it varies, but I don't think so.

Andy

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Andy Dills 301-682-9972
Xecunet, LLC www.xecu.net
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Dialup * Webhosting * E-Commerce * High-Speed Access

Sorry to followup my own post, but I just received my weekly utilization
report. Here is the statement at the bottom of that email:

"The statistics for Tiered and Burstable customers are derived by taking a
sample of the customer traffic every five minutes on the in and out
packets sent/received on the customer's UUNET connection. These
statistics are then aggregated over a (daily, weekly, monthly) period
where the top (20%, 5%, 1%) of the traffic is discarded to arrive at the
(P80, P95, P99) flow statistic. Metered customer's statistics are derived
by calculating the total in and out octets on the customer's UUNET
connection."

Everything makes sense until the last line. "Calculating the total in and
out octets" could imply either max or sum, so it probably depends on the
contract, which seems to be the bottom line to this thread anyhow.
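
For comparison, a sketch of the generic (P80, P95, P99) statistic from the UUNET text, plus the two possible readings of "total in and out octets" for metered customers. The rounding of the discarded fraction and both readings are assumptions for illustration, not UUNET's published algorithm.

```python
def pn(samples, n):
    """Pn statistic: discard the top (100 - n)% of samples, return the highest left.

    Assumption: the discarded count is rounded down; n is an integer percentile.
    """
    ordered = sorted(samples, reverse=True)
    discard = len(ordered) * (100 - n) // 100
    return ordered[discard]


def metered_total(in_octets, out_octets):
    """Two possible readings of 'the total in and out octets'."""
    total_in, total_out = sum(in_octets), sum(out_octets)
    return {"sum": total_in + total_out, "max": max(total_in, total_out)}


if __name__ == "__main__":
    samples = list(range(1, 101))  # 100 hypothetical samples
    print([pn(samples, n) for n in (80, 95, 99)])  # [80, 95, 99]
    print(metered_total([10, 20], [5, 5]))         # {'sum': 40, 'max': 30}
```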

Andy

> How is everyone else doing it? Specifically, larger folks (UU, Sprint, CW,
> Exodus/FGC, GX, Qwest, L3)

I believe (but am not positive) that Exodus does billing on 95in + 95out,
where 95th in and 95th out are calculated separately for the month and
then added together.

What also might be interesting is to see if the burstable rates charged
by the various providers differ based on the calculation type. For
example, if you did the calculation as 95in+95out, then you could charge
less per "byte" than someone who did Max(95in,95out) - but end up
charging the same customer more each month... Since most of the people
signing the contracts don't have a clue about how burstable is calculated,
I can see marketing at a provider saying "go with us because our rates are
cheaper" when in fact they are more expensive... Just my $0.02...
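
A worked example of the pricing point above, using made-up numbers: a customer whose 95th percentiles are 2 Mb/s in and 8 Mb/s out, a hypothetical provider A billing Max(95in,95out) at $600 per Mb/s, and a hypothetical provider B billing 95in+95out at $500 per Mb/s.

```python
def bill_max(p95_in, p95_out, price_per_mbps):
    """Max(95in, 95out) billing."""
    return max(p95_in, p95_out) * price_per_mbps


def bill_sum(p95_in, p95_out, price_per_mbps):
    """95in + 95out billing."""
    return (p95_in + p95_out) * price_per_mbps


def break_even_sum_price(p95_in, p95_out, max_price):
    """Per-Mb/s price at which a sum-based bill equals a max-based bill."""
    return max_price * max(p95_in, p95_out) / (p95_in + p95_out)


if __name__ == "__main__":
    p_in, p_out = 2, 8  # hypothetical 95th percentiles in Mb/s
    print(bill_max(p_in, p_out, 600))              # 4800
    print(bill_sum(p_in, p_out, 500))              # 5000
    print(break_even_sum_price(p_in, p_out, 600))  # 480.0
```

Provider B's rate looks cheaper per megabit, yet its bill comes to $5,000 against $4,800; for this traffic mix B would have to charge below $480 per Mb/s to actually be cheaper.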

Eric :-)

[ On Thursday, April 19, 2001 at 12:35:29 (-0400), Eric Gauthier wrote: ]

Subject: Re: What does 95th %tile mean?

> I can see marketing at a provider saying "go with us because our rates are
> cheaper" when in fact they are more expensive... Just my $0.02...

Any person in any department, marketing or otherwise, that actually does
that should be brought up on fraud charges ASAP (well, if the customer signs,
I guess). That's not a case of "buyer beware", that's flat out lying.

Of course if the "engineers" tell marketing that story and then marketing
just passes it on, well you've got to be sure you get the right culprit.

[ On Thursday, April 19, 2001 at 12:13:34 (-0400), Andy Dills wrote: ]

> Everything makes sense until the last line. "Calculating the total in and
> out octets" could imply either max or sum, so it probably depends on the
> contract, which seems to be the bottom line to this thread anyhow.

When I see the words "total" and "and" used like that it can only mean
that addition is the operation of choice.

Inserting the missing word "of" in there might help:

  Calculating the total of in and out octets ...

There's a nice description/example of the 95th percentile usage calculation
on uunet's page in the burstable access section at

http://www.uu.net/ca/products/uudirect/burstable/

I know one company in Europe that uses the in + out model.

Thomas

AT&T's policy for measured burstable service looks something like this:

The Provider Access Router is polled every 5 minutes for total octets in and
total octets out.
Data is divided by 300 (the number of seconds in a 5-minute period),
giving two averages (one in, one out) for the previous 5-minute period.

These averages become data points, which are tracked over the course of
the customer's monthly billing cycle. The top 5% of the data points are
disregarded (be they IN or OUT).

We bill at the 95% level of usage.
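
A rough sketch of the policy as worded, assuming the router's octet counters are cumulative and do not wrap or reset between polls. The text does not say how IN and OUT are combined at billing time, so this only computes the per-direction averages and a 95% level over one list of data points. Function names are hypothetical.

```python
POLL_INTERVAL_S = 300  # "the number of seconds in a 5 minute period"


def averages_from_counters(readings):
    """Average octets per second for each poll interval.

    `readings` are successive cumulative octet-counter values for one
    direction. Assumes no counter wrap or reset between polls.
    """
    return [(curr - prev) / POLL_INTERVAL_S
            for prev, curr in zip(readings, readings[1:])]


def level_95(data_points):
    """Disregard the top 5% of data points and return the highest remaining."""
    if not data_points:
        raise ValueError("no data points")
    ordered = sorted(data_points)
    keep = len(ordered) - len(ordered) * 5 // 100
    return ordered[keep - 1]


if __name__ == "__main__":
    counters_in = [0, 300, 900, 1500]           # hypothetical polled values
    print(averages_from_counters(counters_in))  # [1.0, 2.0, 2.0]
    print(level_95(list(range(1, 101))))        # 95
```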

Michelle Truman

Isn't in+out a more fair representation of usage? I've always assumed that
this was the standard to be honest. Thank god I'm not the billing person.
I think Exodus does in+out.

-M

[ On Thursday, April 19, 2001 at 16:07:37 (-0400), Martin Hannigan wrote: ]

> Isn't in+out a more fair representation of usage? I've always assumed that
> this was the standard to be honest. Thank god I'm not the billing person.
> I think Exodus does in+out.

Either (in+out) or MAX(in,out) should be an equally fair measure of
usage, at least from the customer's perspective. The difference is in
the pricing, and if both the customers and the vendors are not equally
aware of the particular computation used by each other then it's
impossible to know what's competitive and what's a rip-off (accidental
or otherwise).

That's true of any form of usage-based billing too -- i.e. for either
bulk throughput pricing (octets per period), or Nth percentile pricing.

Some ISPs have unbalanced in/out loads, though, and those that do can
usually afford to sell whichever they've got in surplus at a lower
price. A wise ISP might attract more wise customers by offering
separate pricing structures for in and out traffic, or they might offer
"free" services in whichever direction they can (eg. a primarily
access-only provider offering to host mailing lists, FTP archives, etc.;
or hosting providers offering to provide access POPs for charity groups,
etc.).

The 95% reading always struck me as a randomly generated number in any case.

Take an extreme example - a customer operates a wire such that both in and out are at line rate for five minutes, and then both in and out are idle for five minutes, continually.

Depending on the synchronization between the burst pattern and the sampling system, and the sampling technique itself, the 95% reading can be zero, half the line rate, or the line rate, and all answers are equally valid in some sense.

While real situations do not exhibit such a large range of potential variability (i.e. 100%), there is still a hefty level of variation in a 95% reading due to the interactions between the time base of the traffic, the time base of the meter engine and the sampling technique used by the meter engine.

It leads to the situation where the provider confidently asserts that the 95% value was x kbps, the customer confidently asserts y kbps, and both readings are equally valid, with both measurements using the _same_ measurement technique. How is the consequent billing dispute resolved _fairly_?

> Isn't in+out a more fair representation of usage? I've always assumed that
> this was the standard to be honest. Thank god I'm not the billing person.
> I think Exodus does in+out.
>
> -M

  It depends upon the cost model for your provider. For most providers,
outbound bandwidth is at more of a premium, so it doesn't make sense to
charge you for more of the expensive bandwidth just because you use more of
the cheap bandwidth.

  DS

They don't take a one-second sample every five minutes, they take the
five-minute average rate measured by their router.

Unless they're insane, or their routers don't support that. I dunno who
makes routers that don't support that, though.

> They don't take a one-second sample every five minutes, they take the
> five-minute average rate measured by their router.
>
> Unless they're insane, or their routers don't support that. I dunno who
> makes routers that don't support that, though.

Sorry, perhaps I didn't make the extreme example sufficiently clear:

In the extreme I cited (full rate for five minutes, idle for five minutes, repeated), the five-minute average rate oscillates between zero and full line rate. The period of oscillation is 10 minutes (i.e. five minutes for the five-minute rate to decay from line rate to zero and five minutes to build back to line rate).

Now if you sample every five minutes, and the sample point is synchronized to the peak and trough of the five minute rate you will get successive readings of 'line rate', zero, 'line rate', zero, etc. The 95% sample value will be 'line rate'.

If you change _nothing_ except shift the sample point two and one half minutes forward in time the sample points will consistently produce outcomes of 'half line rate', 'half line rate', ..., and the 95% point is 'one half of line rate'.

Same algorithm, same raw data, different 95% answers, both valid, yet one is twice as large as the other. Great outcome for a billing system isn't it?

(The comment in my earlier note about getting a zero reading requires using something other than a 5-minute average data rate. The point I'm trying to make in this posting is that even if you do the 'right' thing and collect interface data readings every five minutes and take the first-order differences yourself to get the five-minute data rates, the 95% 'answer' is still variable.)
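
Geoff's extreme case can be reproduced in a few lines of Python. This is a sketch under stated assumptions: traffic is normalised so that line rate is 1.0, the burst starts at t = 0, and one day of 288 five-minute averages is taken by differencing a cumulative counter. Only the polling phase changes between the two runs; the zero reading he mentions needs a different sampler and is not modelled here.

```python
LINE_RATE = 1.0  # normalised: 1.0 means full line rate
BURST_S = 300    # five minutes at line rate, then five minutes idle
POLL_S = 300     # meter polls the octet counter every five minutes


def octets_sent(t):
    """Cumulative traffic (line-rate-seconds) by time t; the burst starts at t = 0."""
    cycles, into_cycle = divmod(t, 2 * BURST_S)
    return LINE_RATE * (cycles * BURST_S + min(into_cycle, BURST_S))


def p95(samples):
    ordered = sorted(samples, reverse=True)
    return ordered[len(ordered) * 5 // 100]  # hack off the top 5%


def p95_for_phase(phase_s, n_samples=288):
    """95th percentile of five-minute average rates, polling at phase_s + k * POLL_S."""
    polls = [phase_s + k * POLL_S for k in range(n_samples + 1)]
    rates = [(octets_sent(b) - octets_sent(a)) / POLL_S
             for a, b in zip(polls, polls[1:])]
    return p95(rates)


if __name__ == "__main__":
    print(p95_for_phase(0))    # polls aligned with the bursts: 1.0 (line rate)
    print(p95_for_phase(150))  # polls shifted 2.5 minutes: 0.5 (half line rate)
```

Same traffic, same algorithm; the aligned meter sees alternating 1.0 and 0.0 averages, while the shifted meter sees a constant 0.5.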

When you purchase a DS1, you're purchasing 1.5Mb/s. That means, 1.5Mb/s
in BOTH directions. If the circuit was supposed to be billed as 3Mb/s,
they would claim 3Mb/s linerate.

Ethernet, ATM, blah blah blah works the same way. ADSL and cable modems
are the strange mediums that are not SYMMETRIC.

IMHO, 1Mb/s means 1Mb/s IN, OUT or BOTH.

[ On Friday, April 20, 2001 at 08:03:02 (+1000), Geoff Huston wrote: ]

> The 95% reading always struck me as a randomly generated number in any case.

Huh? It's a simple and mathematically sound and highly repeatable and
auditable way of drawing a line on the usage graph that says something
like: If you were to have had a fixed-rate connection this is the
bandwidth that you would have required over the previous billing period
in order to have obtained effectively the same level of performance as
you actually enjoyed over that period. The only trick (from the
customer P.O.V.) is in understanding that this is what you're buying and
in realising that if you use it then you will pay for it. It probably
works best for links that have aggregated traffic (eg. for 1st and 2nd
tier providers).

> Depending on the synchronization between the burst pattern and the sampling
> system, and the sampling technique itself, the 95% reading can be zero,
> half the line rate, or the line rate, and all answers are equally valid in
> some sense.

Perhaps you need to learn that the "bit rate" values used in deriving an
N'th percentile value are first calculated by counting the number of
octets that crossed an interface since the last sample was taken and
dividing by the amount of time since that last sample was taken (and
then adjusting with a multiplier for different units, eg. octets vs.
bits or whatever). In other words the bit rate values are taken as the
average rate over the specified sample time. No data is thrown away or
ignored -- every single byte is counted and every count is critical to
finding the correct N'th percentile value.

There's absolutely nothing in the way of synchronisation required and
indeed there's no such thing as a "burst pattern" when you consider that
at any given instant in time an octet will cross a (to pick a specific
example) 10-mbit interface at ten megabits per second! How else can you
imagine measuring the bit rate utilisation of a fixed-rate pipe?

The same N'th percentile measurement can always be calculated from
either end of a pipe so long as the sample interval is the same at both
ends, and so long as the pipe has no (measurable) loss. If there's
measurable loss then you'd better measure it and take it into account or
else you will end up with unfair billing.

In fact the very same octet-count measurements are needed for any kind
of usage-based billing. The only difference with N'th percentile
metering is that the sample time needs to be short enough to catch
user-noticeable bursts (i.e. to avoid averaging out bursts that, were they
to be flattened out to the average rate, would be noticeable to the user).
For most currently used IP services this might be somewhere between 5
seconds and 60 seconds. For straight bulk throughput billing you only
need to sample often enough to avoid missing counter roll-over or
counter reset events.
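
On the roll-over point: SNMP's ifInOctets and ifOutOctets are 32-bit counters (Counter32), so a busy link can wrap them between polls. A small sketch of the usual modular-difference fix and of how quickly a saturated link wraps a 32-bit counter; it assumes at most one wrap per poll interval and cannot tell a wrap from a counter reset. Function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 objects such as ifInOctets wrap here


def octet_delta(prev, curr, modulus=COUNTER32_MODULUS):
    """Octets counted between two polls.

    Assumes at most one wrap between polls; a counter reset (e.g. a reboot)
    looks the same as a wrap and would be miscounted.
    """
    return (curr - prev) % modulus


def seconds_to_wrap(line_rate_bps, modulus=COUNTER32_MODULUS):
    """Shortest time in which a saturated link can wrap the octet counter."""
    return modulus * 8 / line_rate_bps


if __name__ == "__main__":
    print(octet_delta(2 ** 32 - 100, 200))           # 300, across a wrap
    print(round(seconds_to_wrap(100_000_000), 1))    # 343.6 at 100 Mb/s
    print(round(seconds_to_wrap(1_000_000_000), 1))  # 34.4 at 1 Gb/s
```

At 100 Mb/s a saturated link wraps a 32-bit octet counter in about 344 seconds, so 5-minute polling only just keeps up; at gigabit rates the 64-bit ifHCInOctets/ifHCOutOctets counters are the practical option.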

> Same algorithm, same raw data, different 95% answers, both valid, yet one
> is twice as large as the other. Great outcome for a billing
> system isn't it?

  Any billing scheme based upon statistical sampling will, with some
probability, err in the favor of one party or the other randomly. But it is
important that the customer understands that he is being billed based upon
statistical sampling and thus there are no "exact" measurements.

  I've looked at other ways and can't find any better. Billing based upon
NetFlow, for example, is still statistical sampling since NetFlow loses a
percentage of flows. For example, one of my VIP2-50's says:

  368351628 flows exported in 12278484 udp datagrams
  33838 flows failed due to lack of export packet
  269989 export packets were dropped enqueuing for the RP
  108825 export packets were dropped due to IPC rate limiting

  Billing based upon total bytes transferred tends to create similar
problems. Do you bill based upon bytes transferred per day? Per month? If
so, it's still statistical sampling if you have some amount of 'paid
bandwidth'.

  And you can't collect this data from interfaces because interface rates
include local traffic, which (for example) grossly overbills customers with
newsfeeds.

  I think there would be a market for a device with two GE interfaces that
accounted for everything that passed through the two interfaces in a
reliable and configurable way. It would have to be capable of fault-tolerant
operation with multiple units. It would have to be free too. ;-)

  DS

I think it's the last part of this statement, about 'paid bandwidth', that may make your statistical sampling comment apply, but I'm unsure if your 'paid bandwidth' is the same as the one that's in my head.

In general (minus 'paid bandwidth' and taking the view that all bytes passed between the customer and the provider have the same billable value), bytes-transferred systems are more reliable if you take as your yardstick of 'reliability' that the same algorithm applied to the same raw data should yield the same result. As long as both parties can agree (precisely) when the measurement interval starts and stops, of course.

Of course if you then want to complicate the picture by attaching different billing rates to different packets, then once more the complexity rises and the accuracy tends to drop.
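
To illustrate the bytes-transferred yardstick, here is the same on/off pattern as in the 95th percentile discussion (five minutes at line rate, five minutes idle, in normalised units) totalled over an agreed interval. A sketch under those assumptions, with hypothetical function names: because per-poll deltas telescope, the total depends only on the agreed start and stop times, not on the polling phase.

```python
BURST_S = 300  # five minutes at line rate (1 unit/s), then five minutes idle


def octets_sent(t):
    """Cumulative traffic by time t in normalised units; the burst starts at t = 0."""
    cycles, into_cycle = divmod(t, 2 * BURST_S)
    return cycles * BURST_S + min(into_cycle, BURST_S)


def total_octets(start, stop, phase_s, poll_s=300):
    """Sum of per-poll deltas over the agreed interval [start, stop].

    Polls happen at start + phase_s + k * poll_s; the agreed start and stop
    points are always included as readings.
    """
    polls = sorted({start, stop} | set(range(start + phase_s, stop, poll_s)))
    return sum(octets_sent(b) - octets_sent(a) for a, b in zip(polls, polls[1:]))


if __name__ == "__main__":
    day = 86_400
    print(total_octets(0, day, phase_s=0))        # 43200
    print(total_octets(0, day, phase_s=150))      # 43200: same total, different phase
    print(total_octets(0, day + 150, phase_s=0))  # 43350: different agreed stop time
```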