95th Percentile again (was RE: C&W Peering Problem?)

Exodus is the worst on billing bit for bit. The way I read the Exodus 95th
percentile document (though I still haven't gotten it confirmed by a person
who actually knew what they were talking about), they bill for the
MAXIMUM 95th percentile, on both inbound AND outbound.

UUNet bills for the MAXIMUM 95th percentile on inbound OR outbound,
whichever is higher, as does AboveNet, and probably the majority of
networks trying to be like UUNet. But a significant portion of other
networks will bill for the AVERAGE under the 95th percentile.

I think the MAXIMUM is unclear to a lot of people, especially the sales
people if you try to get a straight answer out of them. When most people
refer to the UUNet "95th percentile", this is what they mean. You take
traffic samples, line them up in order, lop off the top 5%, and whatever
the number is for the sample right under that is what gets multiplied by
the cost per mbit. This means that if you push 1Mbps for 25 days and
10Mbps for 5 days you will pay 10 * $$$ per mbit. Average means you will
pay 2.5 * $$$ per mbit (((1 * 25) + (10 * 5)) / 30), obviously a major
difference.
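
The difference can be sketched in a few lines of Python (hypothetical
one-sample-per-day figures, matching the example above):

```python
# Hypothetical per-day Mbps samples: 1 Mbps for 25 days, 10 Mbps for 5 days.
samples = [1.0] * 25 + [10.0] * 5

# "MAXIMUM" 95th percentile: sort, lop off the top 5%, bill the next sample.
ordered = sorted(samples)
cutoff = int(len(ordered) * 0.95)      # index just below the top 5%
billable_95th = ordered[cutoff - 1]

# "AVERAGE" billing: plain mean of the samples.
billable_avg = sum(samples) / len(samples)

print(billable_95th)   # 10.0 -- you pay for the burst
print(billable_avg)    # 2.5  -- ((1 * 25) + (10 * 5)) / 30
```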

A LOT of sales people are misleading or utterly clueless about this, and a
lot of providers actually WILL bill you for the AVERAGE under the 95th
percentile (though if you think about it the 95th percentile makes little
sense if you average it; it was designed to extract the maximum amount
of money while not making people utterly afraid to burst higher).

Moral of the story: check for those words "AND" vs "OR", and "MAXIMUM" vs
"AVERAGE", ask for their boss and check it again, and then get it in
writing. :stuck_out_tongue:

As an interesting aside to this discussion, Digital Island bills for total
traffic transmitted per month (in GB increments). Does anyone using them
have any comments on this approach besides the obvious? Does anyone else
do a similar deal?

Thanks,
Tim

> Date: Sat, 2 Jun 2001 17:28:52 -0400
> From: Timothy Brown <tcb@ga.prestige.net>
>
> As an interesting aside to this discussion, Digital Island bills for
> total traffic transmitted per month (in GB increments). Does anyone
> using them have any comments on this approach besides the obvious? Does
> anyone else do a similar deal?

I only care to mention the obvious... this is essentially the same type of
billing as average-use total traffic billing. Total traffic in + out,
just not divided by number of days in a month. :slight_smile:

I can't recall names, but I believe that several colo shops (space +
bandwidth, not carrier-neutral, a la Exodus) do this.

IMHO, 95th percentile has its drawbacks. Sure, one can charge more for
"peaky" customers than with average-use billing, but that can backfire in
extreme cases: Recall when the Starr Report was released... 5% of a month
is 1.5 days, so the heavy traffic during that time was simply "above the
cutoff".

Eddy

I believe, as well, that 95th %tile billing is quite dumb, and there are
better measurements (gigs; average, which, remember, is not the 50th %tile),
or no measurement at all ($x for y mb/s, whether you use it or not).

Then again, VHS beat out BetaMax.

-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --

> Date: Sat, 2 Jun 2001 17:28:52 -0400
> From: Timothy Brown <tcb@ga.prestige.net>
>
> As an interesting aside to this discussion, Digital Island bills for
> total traffic transmitted per month (in GB increments). Does anyone
> using them have any comments on this approach besides the obvious? Does
> anyone else do a similar deal?

> I only care to mention the obvious... this is essentially the same
> type of billing as average-use total traffic billing. Total traffic
> in + out, just not divided by number of days in a month. :slight_smile:
>
> I can't recall names, but I believe that several colo shops (space +
> bandwidth, not carrier-neutral, a la Exodus) do this.

Of course any system which bills for actual usage is pretty much
statistically fair, regardless of whether it's measured as the average of a
rate or the total amount sent/received. In my experience, people who bill in
actual GB transferred tend to inflate it substantially to abuse those who
can't do math, but there's nothing wrong with it as a system. I think
people are more used to comparing price in $$$ per Mbit/sec though.

$1 per gigabyte is equivalent to $316/Mbit fairly averaged.
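
The arithmetic behind that figure (my reconstruction, not from the post;
it assumes binary units and a 30-day month):

```python
# 1 Mbit/s sustained for a 30-day month, with 1 Mbit = 2**20 bits and
# 1 GB = 2**30 bytes (the figure doesn't work out with decimal units).
seconds = 30 * 24 * 3600
total_bytes = (2**20 / 8) * seconds          # bytes moved at a steady 1 Mbit/s
print(round(total_bytes / 2**30))            # 316 -> $1/GB ~= $316/Mbit
```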

> IMHO, 95th percentile has its drawbacks. Sure, one can charge more
> for "peaky" customers than with average-use billing, but that can
> backfire in extreme cases: Recall when the Starr Report was
> released... 5% of a month is 1.5 days, so the heavy traffic during
> that time was simply "above the cutoff".

I'm pretty sure they make out like bandits every time there is a major
spike like that. Maybe the absolute peak was shorter than 1.5 days, but 2
days later I'm sure there were still people hitting it enough to lock in a
very good peak for that month. Unless the customer is specifically trying
to game the system by bursting only for 4.9% worth and using inbound
traffic to match, they pretty much always win. But I don't think that's
unfair, if 95th percentile is the rules they wanna play by to make money
off the unsuspecting then they should play by it for the plotting as well.
:stuck_out_tongue:

This may be obvious, but billing by volume (bytes transferred) places far
greater availability requirements on the measurement system than rate-based
charging schemes.

If I am charging by the byte, I have to count every packet. If my measurement
system breaks, I lose money until it is fixed.

If I am charging by the 95%tile of five-minute average throughput measurements
obtained during a calendar month, I can make do with much more coarse-grained
sampling. Measurement system breaks, I'm quite possibly going to bill the
same amount as if it hadn't broken.
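
A toy simulation of that claim (my own sketch: hypothetical uniform traffic
and a hand-rolled `p95` helper), showing that losing a handful of
five-minute samples barely moves the month's 95th percentile:

```python
import random

# A hypothetical month of five-minute rate samples (uniform noise for the sketch).
random.seed(1)
samples = [random.uniform(1.0, 5.0) for _ in range(30 * 24 * 12)]

def p95(vals):
    """Sort, lop off the top 5%, return the sample just under the cutoff."""
    ordered = sorted(vals)
    return ordered[int(len(ordered) * 0.95) - 1]

full = p95(samples)
# Simulate a polling outage: 50 samples never collected.
survivors = random.sample(samples, len(samples) - 50)
print(abs(p95(survivors) - full) / full)   # tiny relative difference
```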

Do Digital Island contracts specify any interpolation they are permitted to
do in the event that their traffic data acquires black spots? Or is their
measurement platform good enough to be able to count every packet reliably
without loss?

As to other examples, volume charging is still quite common in New Zealand;
people have been counting bytes and charging by the gigabyte there since
the first 9k6 circuit connecting the University of Waikato to NASA went live
in April 1989. See

  http://www2.auckland.ac.nz/net/Accounting/nze.html

"New Zealand Experiences with Network Traffic Charging" by Nevil Brownlee
if you're interested in the history.

Joe

No, you are confused. A rate based billing system polls a byte counter on
a switch or router at set intervals (ex: every 5 mins), subtracts the
previously recorded value, and divides by the number of seconds in that
interval. If the polling system cannot reach the device it is monitoring,
samples can be missed; this is a very old problem of rate-based
monitoring. Every rate-based system of which I am aware orders these
"samples" to calculate the 95th percentile, so a missed sample is equivalent
to a 0 sample. A rate can be interpolated for the missing time, but it is
pretty much guaranteed not to be accurate, and I'd suspect a case could be
made against a provider who "makes up numbers" because of a failure in
their billing system.

A volume based billing system on the other hand, could theoretically poll
only once a billing period. In reality it would probably poll more often,
both to keep the customer apprised of their currently used amount, and to
prevent the possibility of counter rollovers, but it would never "miss" a
billing sample.
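
A sketch of the two polling styles just described, with made-up counter
reads; `None` stands in for a failed poll, and the function names are mine:

```python
INTERVAL = 300  # five minutes between polls

def rate_samples(reads):
    """Turn raw byte-counter reads into bps rate samples; None = missed poll."""
    rates, prev = [], None
    for value in reads:
        if value is None or prev is None:
            rates.append(0.0)  # a missed poll becomes a 0 sample
        else:
            rates.append((value - prev) * 8 / INTERVAL)
        if value is not None:
            prev = value
    return rates

def volume(reads):
    """Bytes transferred: only the first and last good reads matter."""
    good = [v for v in reads if v is not None]
    return good[-1] - good[0]

reads = [0, 37_500_000, None, 112_500_000, 150_000_000]
print(rate_samples(reads))  # [0.0, 1000000.0, 0.0, 2000000.0, 1000000.0]
print(volume(reads))        # 150000000 -- unaffected by the missed poll
```

Note how the rate sample right after the gap reads 2 Mbps even though the
real average over those ten minutes was 1 Mbps -- the interpolation problem
in a nutshell -- while the volume total doesn't care about the outage.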

> > As an interesting aside to this discussion, Digital Island bills for
> > total traffic transmitted per month (in GB increments). Does anyone
> > using them have any comments on this approach besides the obvious? Does
> > anyone else do a similar deal?
>
> This may be obvious, but billing by volume (bytes transferred) places far
> greater availability requirements on the measurement system than rate-based
> charging schemes.
>
> If I am charging by the byte, I have to count every packet. If my measurement
> system breaks, I lose money until it is fixed.
>
> If I am charging by the 95%tile of five-minute average throughput
> measurements obtained during a calendar month, I can make do with much
> more coarse-grained sampling. Measurement system breaks, I'm quite
> possibly going to bill the same amount as if it hadn't broken.

> No, you are confused.

No, just viewing the world from a strange perspective :slight_smile:

> A rate based billing system polls a byte counter on
> a switch or router at set intervals (ex: every 5 mins), subtracts the
> previously recorded value, and divides by the number of seconds in that
> interval.

Yes. I referred to the result of that calculation as the "five-minute
average throughput measurement", but I was being more general about the
mechanics of measurement -- in some cases there are no counters to poll
(see below).

> If the polling system cannot reach the device it is monitoring,
> samples can be missed; this is a very old problem of rate-based
> monitoring. Every rate-based system of which I am aware orders these
> "samples" to calculate the 95th percentile, so a missed sample is equivalent
> to a 0 sample.

No. If you are missing a "five-minute average throughput measurement"
for some reason, you just have fewer samples to sort at the end of the
month. Chances are you still have a reasonable approximation of the
95%ile sample value, if you don't miss too many.

> A rate can be interpolated for the missing time,

I agree, that would be yucky.

[...]

> A volume based billing system on the other hand, could theoretically poll
> only once a billing period. In reality it would probably poll more often,
> both to keep the customer apprised of their currently used amount, and to
> prevent the possibility of counter rollovers, but it would never "miss" a
> billing sample.

If you have bytes-in/bytes-out counters to poll, then you're totally right.
[you also have to deal with counters being reset to zero due to mysteriously
exploding router issues].

There are cases where there are no such counters, however, such as customers
who obtain transit through a shared ethernet/FDDI/ATM interface, and where
equivalent counters are not available at layer-2 (e.g. someone else runs
the switches, switches suck, etc).

The last time I worried about this we were using an ATM network to aggregate
customer PVCs, and it was not possible to obtain per-PVC stats from the
routers or the switches, for various disgusting reasons. We were carrying
sufficiently little international transit traffic (this was NZ) that we were
able to make measurements using NeTraMeT meters to sniff-n-count all
international traffic through span ports on ethernet switches.

In such an environment, billing 95%ile reliably is easier than billing
volume accurately.

Joe

There are only two ways you can poll the rate: either you poll a "rate"
value maintained on the device, or you poll a byte counter and divide the
difference by the time between samples to calculate a rate. Either way, if
the device does not support polling of the "interface" in question you are
pretty screwed.

No matter how you stack it, if you miss a rate sample there is no way to
go back and get the data again. You either discard it and lose the ability
to bill the customer for it (which demands high availability polling
systems), or you make up a number and hope the customer doesn't notice.
Volume polling does not suffer from this problem.

Volume polling does have more difficulty detecting corruption of the
counters (due to a mysteriously exploding router, etc), for example adding
a GB that wasn't actually transferred, while a corrupted rate sample would
be discarded in the 95th percentile. There are plenty of ways to detect
this kind of thing though, and you could always just discard the top X% of
volume samples just in case.

On a side note, there is something neat to be said for the potential of
"push billing" on a Juniper, by running a local program which collects
billing information and never risks being unable to reach the device, then
pushes the data out when it is convenient to do so. This could also be
used to get more accurate samples and reduce the load of polling.

> There are only two ways you can poll the rate: either you poll a "rate"
> value maintained on the device, or you poll a byte counter and divide the
> difference by the time between samples to calculate a rate. Either way, if
> the device does not support polling of the "interface" in question you are
> pretty screwed.

Not necessarily; you just have to find other ways of measuring, as we did,
to good effect.

> No matter how you stack it, if you miss a rate sample there is no way to
> go back and get the data again. You either discard it and lose the ability
> to bill the customer for it (which demands high availability polling
> systems), or you make up a number and hope the customer doesn't notice.

No -- there is no need to do that. You don't need a sample for every single
five-minute interval during the month to produce a meaningful 95%ile
measurement for the month; you just need a representative sample population.
You increase the chances of your sample population being representative
if you consider lots of samples, but dropping one or two does not mean
you lose revenue.

> Volume polling does not suffer from this problem.

It does, if you don't have per-customer interface counters. You need to
count every packet using some other method, and if you can't count packets,
you can't bill for them.

Joe

i gave up on per-customer interface accounting, didn't scale for me.

for a while, i had a BSD box in the middle of my network, and i used
ipfw rules (which worked both as counters for accounting, and as
ingress/egress filters).

we've since moved to cisco, and, well, now i have cache flow stats which
are parsed into customer subnets.

unfortunately, i've practically had to install separate interfaces for
the cache flow data, as it is a steady huge flow of data, especially
for sub-30 minute periods.

How do you bill? Per byte, flat-rate, some measurement of rate, or other?

> No matter how you stack it, if you miss a rate sample there is no way to
> go back and get the data again. You either discard it and lose the ability
> to bill the customer for it (which demands high availability polling
> systems), or you make up a number and hope the customer doesn't notice.

> No -- there is no need to do that. You don't need a sample for every
> single five-minute interval during the month to produce a meaningful
> 95%ile measurement for the month; you just need a representative
> sample population. You increase the chances of your sample population
> being representative if you consider lots of samples, but dropping one
> or two does not mean you lose revenue.

Actually you gain revenue if you drop samples below the 95th percentile
mark, since you are forcing the cutoff point higher by reducing the number
of samples.
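
A quick numeric illustration of that effect (contrived samples and my own
`p95` helper, not anything from a real billing system):

```python
def p95(vals):
    """Sort, lop off the top 5%, return the sample just under the cutoff."""
    ordered = sorted(vals)
    return ordered[int(len(ordered) * 0.95) - 1]

samples = list(range(1, 101))    # 100 samples, values 1..100
print(p95(samples))              # 95
print(p95(samples[20:]))         # 96 -- drop 20 low samples, the cutoff rises
```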

I think your argument is in favor of 95th percentile vs an accurate
average, not rate vs amount samples. If for some reason you lose a sample
with an average system, your revenue goes down, whereas if you lose a
sample in 95th percentile you're more likely not to make it go down much.

But this is completely circumvented by polling the amount instead of
polling the rate. Measurements in amount are always better than
measurements by rate. If you have some horribly ghetto hack that makes you
count the packets yourself and you have the possibility of missing
samples, it may not be completely better than 95th percentile, but this is
a separate issue.

> Volume polling does not suffer from this problem.

> It does, if you don't have per-customer interface counters. You need
> to count every packet using some other method, and if you can't count
> packets, you can't bill for them.

I'd say the real problem is with the vendor. Fortunately most people have
counters.

> > No matter how you stack it, if you miss a rate sample there is no way to
> > go back and get the data again. You either discard it and lose the ability
> > to bill the customer for it (which demands high availability polling
> > systems), or you make up a number and hope the customer doesn't notice.
>
> No -- there is no need to do that. You don't need a sample for every
> single five-minute interval during the month to produce a meaningful
> 95%ile measurement for the month; you just need a representative
> sample population. You increase the chances of your sample population
> being representative if you consider lots of samples, but dropping one
> or two does not mean you lose revenue.

> Actually you gain revenue if you drop samples below the 95th percentile
> mark, since you are forcing the cutoff point higher by reducing the number
> of samples.

Right. So, dropping samples != dropping revenue.

> I think your argument is in favor of 95th percentile vs an accurate
> average, not rate vs amount samples. If for some reason you lose a sample
> with an average system, your revenue goes down, whereas if you lose a
> sample in 95th percentile you're more likely not to make it go down much.

Not really. For any averaging function you care to apply to the sample
population, there will be some samples that tend to increase the result,
and some that tend to decrease the result. Whether or not the billable
value goes up or down depends on the sample that was dropped, on the
remaining samples, and on the averaging function being used.

I don't see how you can say in general that losing a sample "with an
average system" makes revenue go down.

You can certainly speculate about particular "averaging" functions being
more likely to increase or decrease given random loss from a particular
sample distribution, but that wasn't what we were talking about (we were
talking about rate vs. volume).

> But this is completely circumvented by polling the amount instead of
> polling the rate. Measurements in amount are always better than
> measurements by rate.

Always?

> If you have some horribly ghetto hack that makes you
> count the packets yourself and you have the possibility of missing
> samples, it may not be completely better than 95th percentile, but this is
> a separate issue.

Except in this case, maybe :slight_smile:

> > Volume polling does not suffer from this problem.
>
> It does, if you don't have per-customer interface counters. You need
> to count every packet using some other method, and if you can't count
> packets, you can't bill for them.

> I'd say the real problem is with the vendor. Fortunately most people have
> counters.

Suppose you are selling transit to several customers across a switch
operated by someone else (an exchange operator, for example), such that
the traffic for several customers is carried by a single interface on
your router. Suppose direct interconnects are not practical, and suppose
you have no access to any counters that may be available on the switch.

The options are: (1) do not sell to these customers, or (2) find some
way to sell to these customers by counting packets yourself. Option (1)
presents a far more consistent opportunity to decrease potential revenue
than does option (2).

I do not believe this is a particularly far-fetched scenario: hence I
think this is not simply a vendor problem.

Joe

> $1 per gigabyte is equivalent to $316/Mbit fairly averaged.

Yes, but:

Let's assume that someone sells at $1/gig, then is billed $316/mb/s/mon by
their provider. Let's further assume that the customer who is buying at
$1/gig is averaging 1 mb/s, but has perfect sine-wave bandwidth usage, ie,
0 kb/s at midnight, 1 mb/s at 6a, 2 mb/s at noon, 1 mb/s at 6p, and 0
mb/s again at midnight. (Agreeing that a perfect sine wave of usage is
highly unlikely, but it's a reasonable assumption that said customer won't
be at the average all month). Problem: Provider is billed for 2 mb/s.

> This may be obvious, but billing by volume (bytes transferred) places far
> greater availability requirements on the measurement system than rate-based
> charging schemes.

Not particularly.

If you have the average (not median) of usage for the month, even losing a
sample here or there, you'd be just as accurate as a 95th %tile which may
have missed the same measurements.

> If I am charging by the byte, I have to count every packet. If my measurement
> system breaks, I lose money until it is fixed.

See above, 'average'.

"samples" to calculate 95th percentile, so a missed sample is equivilent
to a 0 sample. A rate can be interpolated for the missing time, but it is
pretty much guaranteed not to be accurate, and I'd suspect a case could be
made against a provider who "makes up numbers" because of a failure in
their billing system.

Or, just take the next sample and divide it by 10 minutes, rather than 5,
and count it as two samples in the 95th calculation.
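
That trick can be sketched as follows (hypothetical counter values of my
own; the split assumes exactly one poll was missed):

```python
# A poll arriving 10 minutes after the last good one covers two intervals,
# so spread it over two equal 5-minute samples instead of one inflated one.
INTERVAL = 300
prev_counter, next_counter = 1_000_000_000, 1_150_000_000  # bytes, made up
gap_intervals = 2                                          # one poll was missed
rate = (next_counter - prev_counter) * 8 / (INTERVAL * gap_intervals)
samples = [rate] * gap_intervals       # count it as two samples in the 95th calc
print(samples)   # two 2,000,000 bps samples instead of one 4 Mbps sample
```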

> I think your argument is in favor of 95th percentile vs an accurate
> average, not rate vs amount samples. If for some reason you lose a sample
> with an average system, your revenue goes down, whereas if you lose a
> sample in 95th percentile you're more likely not to make it go down much.

> Not really. For any averaging function you care to apply to the sample
> population, there will be some samples that tend to increase the
> result, and some that tend to decrease the result. Whether or not the
> billable value goes up or down depends on the sample that was dropped,
> on the remaining samples, and on the averaging function being used.

No, you're working under the assumption that the divisor goes up only with
increased samples, while the system I outlined continues to go up with the
progression of time. No reason that can't be changed though, and that
isn't important to the argument... :stuck_out_tongue:

> I'd say the real problem is with the vendor. Fortunantly most people have
> counters.

> Suppose you are selling transit to several customers across a switch
> operated by someone else (an exchange operator, for example), such
> that the traffic for several customers is carried by a single
> interface on your router. Suppose direct interconnects are not
> practical, and suppose you have no access to any counters that may be
> available on the switch.
>
> The options are: (1) do not sell to these customers, or (2) find some
> way to sell to these customers by counting packets yourself. Option
> (1) presents a far more consistent opportunity to decrease potential
> revenue than does option (2).

You can do it with VLANs, I believe Equinix does this on their exchange
switches.

> i gave up on per-customer interface accounting, didn't scale for me.

That's a very bold statement. At what point (what metric?) did you feel that
this method didn't scale?

NAC is no super-duper tier-1 (I had to throw that in), but we do monitor
1400 interfaces every 5 minutes, 100 or so at more than 105 mb/s (that
magic number for 32 bit counter rollover in 5 minutes, and yes, we use 64
bit counters), shove them all in a nice SQL table, and we've not seen any
reason for non-scalability, at least for a while (at least 5000 more
interfaces before we'll have to rewrite the collection engine).

> we've since moved to cisco, and, well, now i have cache flow stats which
> are parsed into customer subnets.

Eeek. Relying on flow-stats? Yikes.

If you can't get enough extra customers based on your better pricing,
don't lower your price (or sell it for $2/gig).