Polling Bandwidth as an Aggregate

Has anyone had to aggregate bandwidth data from multiple interfaces
for billing? For example, I'd like to poll with an open source tool
and aggregate data from multiple interfaces connected to the same
customer, or to multiple customers, for the purposes of billing and
capacity management. Is there an easy way to do this with cacti/rrd or
another open source kit?

Keegan Holley | Network Architect | SunGard Availability Services
401 North Broad St., Philadelphia, PA 19108 | (215) 446-1242
keegan.holley@sungard.com | http://www.availability.sungard.com/

Hi Keegan,

> Has anyone had to aggregate bandwidth data from multiple interfaces
> for billing? For example, I'd like to poll with an open source tool
> and aggregate data from multiple interfaces connected to the same
> customer, or to multiple customers, for the purposes of billing and
> capacity management. Is there an easy way to do this with cacti/rrd or
> another open source kit?

With the rrdtool backend, you can certainly define and add multiple
data sources from different files together. Using 'AREA' for the first
data source and 'STACK' for the subsequent ones is particularly
nice for visualization.
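
Something along these lines will stack two customer ports onto one graph
(a rough sketch driving the rrdtool CLI from Python; the .rrd file names
and DS names are placeholders, so adjust to whatever your poller writes):

import subprocess

# Sketch: stack inbound traffic from two customer interfaces stored in
# separate .rrd files. DEF pulls each data source, AREA draws the first,
# STACK piles the second on top, and the CDEF sums them for a total line.
subprocess.run([
    "rrdtool", "graph", "customer-aggregate.png",
    "--start", "-1w", "--title", "Customer aggregate (inbound)",
    "DEF:if1=gi0-1.rrd:traffic_in:AVERAGE",
    "DEF:if2=gi0-2.rrd:traffic_in:AVERAGE",
    "CDEF:total=if1,if2,+",
    "AREA:if1#00CC00:gi0/1",
    "STACK:if2#0066CC:gi0/2",
    "LINE1:total#000000:total",
], check=True)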

Otherwise, the RRDs and Statistics::Descriptive libraries in Perl can
probably go a long way towards what you might be wanting for reporting.

Dale

Except that Cacti/RRDTOOL is really just a great visualization tool; while you
can build stacks, it is not something that accurately meters data for
billing purposes. The right kind of tool to use would be a netflow or
network tap-based billing tool that actually meters/samples specific
datapoints at a specific interval and applies the billing business logic
for reporting to those sampled data points, instead of to smoothed averages
of approximations.

RRDTOOL is clearly not designed to report information accurately enough for
billing. To a great extent, RRDTOOL aggregates, averages, interpolates, and
smooths what it reports; see "Data Resampling" in
http://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html

The aggregation could be mitigated by creating the RRD file with a large
number of data rows in an RRA with steps=1 (e.g., for 5-minute polling,
288*(ndays) data rows: enough to cover the whole billing period plus some
number of days without consolidating), but that doesn't address the rest of
the issues with RRD, and including that many rows greatly increases the
.rrd file size.
I would look at Torrus or RTG before RRDTOOL for that, but even then...
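
For what it's worth, the create call for that kind of "no consolidation"
RRD would look roughly like this (a sketch via the rrdtool CLI from Python;
the DS names are placeholders, and 288 samples/day * 365 days = 105120 rows):

import subprocess

# Sketch: a single RRA with steps=1 keeps one row per primary data point,
# so nothing inside the billing window gets consolidated. The trade-off is
# the file size noted above.
subprocess.run([
    "rrdtool", "create", "cust-gi0-1.rrd",
    "--step", "300",
    "DS:traffic_in:COUNTER:600:0:U",
    "DS:traffic_out:COUNTER:600:0:U",
    "RRA:AVERAGE:0.5:1:105120",
], check=True)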

If data is not gathered using a mechanism that communicates a timestamp to
the poller, datapoints will still be imprecise; SNMP would be an example.
The cacti application may assume the SNMP response is current data, but on
the actual hardware the internal MIB counter may have actually been updated
10 seconds earlier, which means there will be small spikes in traffic rate
graphs that do not represent actual spikes in traffic.

RTG uses MySQL for its backend, so you can basically set up queries however you like, and you can use rtgplot to graph multiple interfaces as well.

It's a super good tool, and I think there is a group working on RTG2 at Google Code.
-Drew

In a message written on Fri, Jan 20, 2012 at 12:16:14AM -0600, Jimmy Hess wrote:

> Except that Cacti/RRDTOOL is really just a great visualization tool; while you
> can build stacks, it is not something that accurately meters data for
> billing purposes. The right kind of tool to use would be a netflow or
> network tap-based billing tool that actually meters/samples specific
> datapoints at a specific interval and applies the billing business logic
> for reporting to those sampled data points, instead of to smoothed averages
> of approximations.

To suggest that Netflow is more accurate than rrdtool seems rather strange
to me. It can be as accurate, but that is not the way most people
deploy it.

RRDTool pulls the SNMP counters from an interface and records them to a
file. With no aggregation, and assuming your device has accurate SNMP,
this should be 100% accurate. While you are right that the defaults for
RRDTOOL aggregate data (after a day, week, and month, approximately),
those aggregates can be disabled, keeping the raw data. I know several
ISPs that keep the raw data and use it for billing using these tools.

Netflow often suffers right at the source. If you want to bill off
netflow data, 1:1 netflow is almost required, while most ISPs do sampled
Netflow at 1:100 or 1:1000. Those sampling levels produce more
inaccuracy than RRDTool's aggregation function. What's more, once the
data is put into the Netflow collector, they all do aggregation as well,
just like RRDTool. Again, you can disable much of it with careful
configuration.

But let's compare apples to apples. Let's consider RRDTool configured
to not aggregate with 1:1 netflow configured to not aggregate. RRDTool
polls a monotonically increasing counter. Should a poll be missed, no
data is lost about the total number of bytes transferred. Thus you can
bill by the number of bytes transferred with 100% accuracy, even with
missed polls. If you bill by the bit-rate, you can interpolate a single
missing data point with high accuracy as well.
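
To illustrate the point about missed polls (a toy sketch, not anyone's
production billing code; the 64-bit wrap handling is the usual
ifHCInOctets assumption):

# Sketch: total octets billed from periodic reads of a monotonically
# increasing counter. A missed poll loses nothing, because the next
# successful delta still covers the gap.
def billable_octets(samples, counter_max=2**64):
    """samples: list of (timestamp, counter_value) tuples, oldest first."""
    total, prev = 0, None
    for ts, value in samples:
        if prev is not None:
            delta = value - prev
            if delta < 0:             # counter wrapped (or device rebooted)
                delta += counter_max
            total += delta
        prev = value
    return total

# A missed poll between t=300 and t=900 changes nothing about the total:
print(billable_octets([(0, 1000), (300, 4000), (900, 9000)]))   # 8000 octets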

Netflow is a continuous stream of UDP across the network. If a UDP
packet is lost between the router and the collector, there is no way to
reconstruct that data, and it is lost forever. Thus any network event
means you won't have the data to bill your customer, and you're pretty
much stuck always underbilling them with the data actually collected.

> If data is not gathered using a mechanism that communicates a timestamp to
> the poller, datapoints will still be imprecise; SNMP would be an example.
> The cacti application may assume the SNMP response is current data, but on
> the actual hardware the internal MIB counter may have actually been updated
> 10 seconds earlier, which means there will be small spikes in traffic rate
> graphs that do not represent actual spikes in traffic.

Most of the large ISPs I know of moved away from both of the solutions
above to proprietary, custom solutions. They SNMP-poll the counters and
store that data in a database with high-resolution counters, forever,
never aggregated. The necessary perl/python/ruby code to do that and
stick it in mysql or postgres is only a few pages long and easy to
audit.
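
The "few pages of code" claim is about right; a stripped-down sketch of
that approach is below (sqlite3 standing in for MySQL/Postgres, net-snmp's
snmpget assumed to be installed, and the hostname/community/ifIndex values
made up):

import sqlite3, subprocess, time

# Sketch: poll raw 64-bit octet counters over SNMP and store them untouched.
db = sqlite3.connect("billing.db")
db.execute("""CREATE TABLE IF NOT EXISTS raw_octets
              (polled_at INTEGER, device TEXT, ifindex INTEGER, in_octets INTEGER)""")

def snmp_counter(host, community, ifindex):
    # net-snmp's snmpget; -Oqv prints just the value.
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, "-Oqv",
         host, "IF-MIB::ifHCInOctets.%d" % ifindex])
    return int(out.strip())

def poll_once(targets):
    now = int(time.time())
    for host, community, ifindex in targets:
        db.execute("INSERT INTO raw_octets VALUES (?, ?, ?, ?)",
                   (now, host, ifindex, snmp_counter(host, community, ifindex)))
    db.commit()

# Run from cron every 5 minutes; billing then works off the raw deltas.
poll_once([("edge1.example.net", "public", 11)])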

Thanks all for the responses. I think I'm going to use cacti and plugins
to aggregate. Aggregated billing is something that would be nice
to have but wasn't required. It's nice to know there are concerns with
using cacti for this. My last question is whether there is any easy/automated
way to pull interfaces into cacti and configure graphs for them, either via
SNMP or by reading from a mysql DB. I suddenly remembered how much I hate
importing large routers into cacti and configuring the graphs.

No. This is one of cacti's major failings: there is no externally
accessible API. You're going to end up injecting SQL directly into the
cacti database and hoping that version upgrades don't screw up the schema
layout too much.

Nick

In a message written on Fri, Jan 20, 2012 at 10:36:38AM -0500, Keegan Holley wrote:

> using cacti for this. My last question is whether there is any easy/automated
> way to pull interfaces into cacti and configure graphs for them, either via
> SNMP or by reading from a mysql DB. I suddenly remembered how much I hate
> importing large routers into cacti and configuring the graphs.

I find using MRTG easier than Cacti for _automation_ purposes.
Its cfgmaker script will generate a config file for a single
router. I've written about five different versions of a small script
that's basically a customized cfgmaker so the graphs get named
with customer names or the like. The job can be fully automated
with a few hours of coding; run it out of cron to rebuild your interface
list automatically and you'll never miss a customer turn-up because
someone forgot to configure a graph.
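
A bare-bones version of that wrapper might look like this (a sketch only;
router names, community and paths are placeholders, and the cfgmaker flags
should be checked against your MRTG version):

import pathlib, subprocess

# Sketch: regenerate the MRTG config from a router list, suitable for cron.
ROUTERS = ["cust-edge1.example.net", "cust-edge2.example.net"]
COMMUNITY = "public"

def rebuild():
    parts = ["WorkDir: /var/www/mrtg\n"]
    for router in ROUTERS:
        # cfgmaker writes a per-router config to stdout; this is also the
        # spot to rewrite Target[] names into customer-based tokens.
        parts.append(subprocess.check_output(
            ["cfgmaker", "--no-down", "%s@%s" % (COMMUNITY, router)], text=True))
    pathlib.Path("/etc/mrtg/mrtg.cfg").write_text("\n".join(parts))

if __name__ == "__main__":
    rebuild()   # then run mrtg against the regenerated config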

Is there a plugin for MRTG that allows you to go back to specific times?
I like MRTG better for this as well, but cacti's graphs are much more
flexible.

Once upon a time, Leo Bicknell <bicknell@ufp.org> said:

> To suggest that Netflow is more accurate than rrdtool seems rather strange
> to me. It can be as accurate, but that is not the way most people
> deploy it.

Comparing Netflow to RRDTool is comparing apples to cabinets; one is a
source of information and one is a way of storing information.

> RRDTool pulls the SNMP counters from an interface and records them to a
> file.

No, RRDTool stores data given to it by a front end such as MRTG,
Cricket, Cacti, etc. That front end can fetch data from any number of
sources, including (but not limited to) SNMP. RRDTool then stores
information in its database.

> With no aggregation, and assuming your device has accurate SNMP,
> this should be 100% accurate. While you are right that the defaults for
> RRDTOOL aggregate data (after a day, week, and month, approximately),
> those aggregates can be disabled, keeping the raw data.

RRDTool does not store the raw data. Even for 5-minute intervals, it
adjusts the data vs. the timestamp to fit the desired interval. Since
you don't read every counter at the exact time of your interval, RRDTool
is always manipulating the numbers to fit. The only numbers that are
not changed before storing are the timestamp and value for the most
recent update (which get overwritten at each update); everything else is
adjusted to fit.
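
It's easy to see this normalization for yourself (a quick sketch against a
scratch file, using the rrdtool CLI from Python; timestamps and values are
arbitrary):

import subprocess

def rrd(*args):
    return subprocess.check_output(("rrdtool",) + args, text=True)

# Sketch: updates arrive 30 s after each 300 s step boundary, the way a real
# poller's would. What 'fetch' returns are time-weighted blends of the raw
# inputs, aligned to the step boundaries, not the values that were fed in.
rrd("create", "demo.rrd", "--start", "1000000000", "--step", "300",
    "DS:val:GAUGE:600:U:U", "RRA:AVERAGE:0.5:1:10")
for ts, v in [(1000000330, 100), (1000000630, 200), (1000000930, 300)]:
    rrd("update", "demo.rrd", "%d:%d" % (ts, v))
print(rrd("fetch", "demo.rrd", "AVERAGE", "-s", "1000000000", "-e", "1000000900"))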

I suggest reading
http://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html

Not an external API but scripts have been available for some time now:

http://www.cacti.net/downloads/docs/html/scripts.html
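
Along the same lines, the cacti tarball ships command-line helpers under
cli/ (add_device.php, add_graphs.php and friends) that take some of the
pain out of bulk imports. A rough sketch of driving them from Python; the
flag names here are from memory of the CLI docs and may differ between
cacti versions:

import subprocess

CACTI = "/var/www/cacti"   # adjust to your install path

def add_device(description, ip, community="public"):
    # Sketch: register a device via cacti's bundled CLI script.
    subprocess.run(["php", "-q", CACTI + "/cli/add_device.php",
                    "--description=" + description, "--ip=" + ip,
                    "--community=" + community], check=True)

add_device("cust-edge1", "192.0.2.10")
# add_graphs.php can then be looped over the interfaces that
# --list-snmp-fields / --list-snmp-values report for the new host id.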

Ian

> Once upon a time, Leo Bicknell <bicknell@ufp.org> said:
> > To suggest that Netflow is more accurate than rrdtool seems rather strange
> > to me. It can be as accurate, but that is not the way most people
> > deploy it.
>
> Comparing Netflow to RRDTool is comparing apples to cabinets; one is a
> source of information and one is a way of storing information.

I assumed he meant a kit that creates graphs with RRDTool.
Technically, mysql is the "way of storing information"; RRDTool processes
it and has the ability to make it pretty for us humans.

> > RRDTool pulls the SNMP counters from an interface and records them to a
> > file.
>
> No, RRDTool stores data given to it by a front end such as MRTG,
> Cricket, Cacti, etc. That front end can fetch data from any number of
> sources, including (but not limited to) SNMP. RRDTool then stores
> information in its database.

Same as above

> > With no aggregation, and assuming your device has accurate SNMP,
> > this should be 100% accurate. While you are right that the defaults for
> > RRDTOOL aggregate data (after a day, week, and month, approximately),
> > those aggregates can be disabled, keeping the raw data.
>
> RRDTool does not store the raw data. Even for 5-minute intervals, it
> adjusts the data vs. the timestamp to fit the desired interval. Since
> you don't read every counter at the exact time of your interval, RRDTool
> is always manipulating the numbers to fit. The only numbers that are
> not changed before storing are the timestamp and value for the most
> recent update (which get overwritten at each update); everything else is
> adjusted to fit.

I think every graphing tool does this. I pretty much ignored this point,
though, since I was asking about aggregating data from multiple objects,
not about aggregating data over time.

Cheers

It also has another slightly subtle but hugely useful advantage: the
primary index reference of a graph does not refer to an interface name or a
number, but can be defined as an arbitrary unique token. This is
ridiculously useful when it comes to third-party scripting and moving
customers around the place.

Nick

> RTG uses MySQL for its backend, so you can basically set up queries
> however you like, and you can use rtgplot to graph multiple interfaces
> as well.
>
> It's a super good tool, and I think there is a group working on RTG2 at
> Google Code.

Another RTG user! I didn't know many of us existed!

RTG is a great tool. Its design (Perl, PHP and MySQL) lends itself to being modified at will; integration with tools like PHP Network Weathermap is very straightforward (there is a WeatherMapDataSource_rtg class for it floating around on Pastebin), and the MySQL backend makes it super flexible. There's no aggregation of data, unless you hack it in yourself with some fancy queries.

RTG's data is ideal for MySQL partitioning, and there are some indexes that need to be added. But once you get those things in place, it becomes fast and powerful, and it's easy to drop old data without a lengthy query (just drop the partition). The fact that each SNMP device gets its own table is also a big performance win over the more popular tools.

The web interface allows for interface aggregation, and the code for doing that could probably be reverse engineered easily enough for other reporting mechanisms as well.
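
For anyone poking at it, the aggregation boils down to a query along these
lines. This is a guess at the usual RTG layout (one ifInOctets_<rid> table
per device with id/dtime/counter columns holding per-interval deltas), so
check your own schema before billing anyone off it:

import MySQLdb   # any DB-API driver for MySQL will do

# Sketch: sum the per-sample octet deltas of a customer's interfaces over a
# billing period. Table name, interface ids and dates are all placeholders.
QUERY = """
SELECT SUM(counter)
FROM   ifInOctets_12
WHERE  id IN (%s, %s, %s)
AND    dtime BETWEEN %s AND %s
"""

conn = MySQLdb.connect(db="rtg", user="rtg", passwd="secret")
cur = conn.cursor()
cur.execute(QUERY, (101, 102, 103, "2012-01-01", "2012-02-01"))
print(cur.fetchone()[0])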

Nathan Eisenberg

On this point (of nice aggregation UIs), is anyone here using Graphite
as a backend for their time series data stores? You have to
supply/write the poller yourself, but it seems an ideal backend for a
"just graph everything" approach, and it lets the poller use SNMP
get-bulk requests, which I haven't seen other pollers (rtg/mrtg/spine)
doing.
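
(For reference, feeding Graphite is the easy part once you have a poller:
carbon's plaintext listener takes one "path value timestamp" line per
metric on TCP 2003. A minimal sketch with made-up host and metric names:)

import socket, time

CARBON = ("graphite.example.net", 2003)   # placeholder host

def send_points(points):
    """points: iterable of (metric_path, value) pairs."""
    now = int(time.time())
    payload = "".join("%s %s %d\n" % (path, value, now) for path, value in points)
    with socket.create_connection(CARBON) as sock:
        sock.sendall(payload.encode())

send_points([("net.edge1.Gi0-1.in_octets", 123456789)])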

~Matt

I'm not personally, but I know some of our support clients are happily using it along with OpenNMS' support for outboarding of data storage via TCP and Google protobuf.

-jeff

I agree with Drew -- I have several functions that do their best to
correlate readings among multiple interfaces, combine them with other
readings near the same time intervals, and output a single set of aggregate
bandwidth data.

One of RTG's big problems is scalability -- as you monitor more and more
devices, going further and further back in time, you end up with a
gigantic MySQL dataset that can be difficult to manage. Fortunately, there
are open-source tools to help manage this. There's a Ruby program that
automates consolidation of multiple rows into single rows based on
configuration data, allowing you to keep 5-minute readings of interface
data for two months and then condense them to 1-hour readings after that,
with the flexibility to identify specific tables and specific timeframes to
give you maximum control.
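
(The roll-up itself is simple in principle; a schema-agnostic sketch of
condensing 5-minute deltas into hourly rows:)

from collections import defaultdict

# Sketch: collapse 5-minute (timestamp, octet_delta) rows into hourly totals,
# the same shape of consolidation the Ruby tool performs against RTG tables.
def consolidate_hourly(rows):
    hourly = defaultdict(int)
    for ts, delta in rows:
        hourly[ts - (ts % 3600)] += delta
    return sorted(hourly.items())

# Twelve 5-minute samples of 1000 octets become one hourly row of 12000:
print(consolidate_hourly((i * 300, 1000) for i in range(12)))   # [(0, 12000)]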