What does 95th %tile mean?

[ On Friday, April 20, 2001 at 00:52:39 (-0400), Charles Sprickman wrote: ]
> Subject: RE: What does 95th %tile mean?
> > Neither MRTG nor Cricket (nor anything with RRDtool or anything similar
> > underlying it), in their standard released form, are truly suitable for
> > accounting purposes since they both can introduce additional averaging
> > errors. You need to keep all of the original sample data.
> This actually works pretty well:
> http://www.seanadams.com/95/

If you read that page carefully you'll note that he's using a modified
version of MRTG that doesn't average its samples. As it says:

   This is a patch to add 95th percentile metering to MRTG. This is not as
   simple a feature as one might think. MRTG normally saves only one day
   worth of 5-minute samples. It is not possible to accurately calculate the
   95th percentile without having all of the samples for a one month period.
   In order to calculate the 95th percentile for a 30-day period, it is
   necessary to save an entire 30 days worth of the 5-minute samples.

MRTG does not do that by default, nor does Cricket, nor will any tool
using RRDtool as an underlying database.
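
To make the point concrete, here is a minimal sketch in Python. It assumes the common "sort the samples, throw away the top 5%, bill the highest one left" convention (providers differ on the exact rank), and it uses made-up traffic where one 5-minute sample in every six is a burst. The names percentile_95 and consolidate are mine for illustration and are not taken from MRTG, Cricket, or RRDtool.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile: sort ascending, discard the top 5%,
    and return the highest sample that remains."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]

def consolidate(samples, group):
    """Average consecutive groups of `group` samples, the way a
    consolidating data store would (a trailing partial group is dropped)."""
    return [sum(samples[i:i + group]) / group
            for i in range(0, len(samples) - group + 1, group)]

# 30 days of 5-minute samples = 8640 data points (synthetic data).
SAMPLES_PER_MONTH = 30 * 24 * 12
raw = [100 if i % 6 == 0 else 10 for i in range(SAMPLES_PER_MONTH)]

# Hypothetical consolidation into 30-minute averages (6 samples each).
averaged = consolidate(raw, 6)

print(percentile_95(raw))       # 100
print(percentile_95(averaged))  # 25.0
```

With the raw samples the bursts make up more than 5% of the month, so the 95th percentile is the burst rate. Once the same data has been averaged, the bursts are smeared out and the computed figure drops to a quarter of that.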

You need to use the "old" MRTG without RRDtool to avoid the averaging. It maintains an accurate timestamp for the previous sample, so the data stored in the table is accurate even if there was some jitter in the collection interval. You do still need to maintain backup logs so that you have the entire month's worth of 5-minute samples.

I have tried arguing against the "corrections" that RRDtool makes to data, but the only suggested "fix" is to lie to RRDtool about the timestamp. I understand that the old MRTG database is "wrong" since the timestamp it stores in the database is not the actual sample collection time. However, for most of the things I want to do, I prefer to know what the real data was at the collection point closest to the time of interest, rather than what the data "should" have been had it been collected at precisely the right time.
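
A rough illustration of the difference, in Python. This is a simplified model of resampling onto fixed 5-minute boundaries and is not RRDtool's actual code; the function names and the counter readings are invented, and counter wrap is ignored. The middle poll arrives 30 seconds late.

```python
STEP = 300  # nominal collection interval in seconds

def rates_from_counters(readings):
    """readings: list of (timestamp, counter_value) pairs in time order.
    Returns (start, end, rate) for each pair of consecutive readings."""
    return [(t0, t1, (c1 - c0) / (t1 - t0))
            for (t0, c0), (t1, c1) in zip(readings, readings[1:])]

def resample(intervals, step=STEP):
    """Time-weighted average of the measured rates onto fixed step
    boundaries. Only fully covered buckets are returned."""
    buckets = {}
    for start, end, rate in intervals:
        b = start // step * step
        while b < end:
            overlap = min(end, b + step) - max(start, b)
            if overlap > 0:
                total, covered = buckets.get(b, (0.0, 0))
                buckets[b] = (total + rate * overlap, covered + overlap)
            b += step
    return {b: total / covered
            for b, (total, covered) in sorted(buckets.items())
            if covered == step}

# Polls at t=0, 300, 630 (30 s late) and 900, with a burst in the middle.
readings = [(0, 0), (300, 3000), (630, 36000), (900, 38700)]

measured = rates_from_counters(readings)
print([rate for _, _, rate in measured])  # [10.0, 100.0, 10.0]
print(resample(measured))                 # {0: 10.0, 300: 100.0, 600: 19.0}
```

The rates actually measured were 10, 100 and 10. After resampling, the last bucket reads 19 because 30 seconds of the burst were attributed to it. That figure is a reasonable estimate of what "should" have been seen, but it is not a value that was ever measured.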

As for the original topic, we used Alex's (max(in,out)) definition of 95th-percentile billing. I always thought that the in+out method was a little "sleazy", since the explanation is usually buried in fine print and people who aren't careful can easily be tricked into making an invalid provider comparison.
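
To show why the two definitions are not comparable, here is a small Python sketch with invented numbers. It assumes max(in,out) means "take the 95th percentile of each direction separately and bill the larger", and that in+out means "add the two directions sample by sample, then take the 95th percentile". Both readings are my assumptions, and a provider's contract may define them differently.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile: highest sample left after
    discarding the top 5%."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]

def bill_max_in_out(inbound, outbound):
    """Bill the larger of the per-direction 95th percentiles."""
    return max(percentile_95(inbound), percentile_95(outbound))

def bill_in_plus_out(inbound, outbound):
    """Bill the 95th percentile of the per-sample sum of both directions."""
    return percentile_95([i + o for i, o in zip(inbound, outbound)])

# Synthetic traffic in Mbit/s: both directions are busy during the
# same 10% of the samples.
inbound = [10] * 90 + [20] * 10
outbound = [50] * 90 + [80] * 10

print(bill_max_in_out(inbound, outbound))   # 80
print(bill_in_plus_out(inbound, outbound))  # 100
```

The same customer with the same traffic is billed for 80 Mbit/s under one definition and 100 Mbit/s under the other, so a per-Mbit price quoted under in+out has to be lower just to come out even.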

For the journal of meaningless statistics, we found that over time, the average (mean) usage for our "typical" colocation customer was 69-72% of the 95th-percentile value.

The 95th-percentile measure definitely isn't the answer to all problems. It does, though, address some problems that "actual usage" billing doesn't. Mainly, if you bill based on actual usage, customers can get very nervous that things like smurf attacks, which are out of their control, will send their bill through the roof. Depending on your customers and your business model, there are other ways to deal with that problem. FWIW, I expect that the 95th-percentile model will slowly be phased out as the industry matures.