Alternative to NetFlow for Measuring Traffic Flows

Hi all -

Here is the problem: Everyone wants to know how much traffic would ultimately be passed in peering relationships at an IX before signing up/building into an IX.

I recently heard an interesting solution for estimating the traffic volume destined to an AS in the absence of NetFlow or the like. Measuring traffic via sampling has been difficult for a variety of reasons (lack of staff resources, capabilities of the interface cards, expensive software, etc.), and from talking to Peering Coordinators I found that fewer than 1 in 20 ISPs actually do the traffic measurements. In the absence of data, ISPs are often left to intuition, guessing that a particular AS would be a good peering candidate.

Here is the Solution:

Assuming that:
1) You are multi-homed
2) You have some ideas of who you would want to peer with
3) <more assumptions here I'm sure>

1) You adjust routing to prefer one transit provider or the other for the target AS
2) You shift traffic for that AS from one transit provider to the other, noting the change in the loads on the transit providers.

If you do this at peak time you can get a rough estimate of the peak traffic to this AS, and therefore a rough order-of-magnitude estimate of the amount of traffic that would go to this AS in a peering relationship. ("Rough estimate" here means determining whether the traffic volume is likely to be 2 Mbps vs. 20 Mbps vs. 200 Mbps.)
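The arithmetic behind that shift-and-measure trick can be sketched in a few lines of Python. All of the readings below are invented for illustration (the function name and numbers are mine, not from the thread); in practice they would come from MRTG/SNMP interface counters:

```python
# Estimate an AS's traffic from the load change seen when its routes
# are shifted from transit provider A to transit provider B.
# Readings are hypothetical 5-minute averages in Mbps (e.g. from MRTG).

def estimate_as_traffic_mbps(a_before, a_after, b_before, b_after):
    """Average the drop on transit A with the rise on transit B;
    averaging smooths out unrelated load fluctuations a little."""
    delta_a = a_before - a_after   # traffic that left transit A
    delta_b = b_after - b_before   # traffic that showed up on transit B
    return (delta_a + delta_b) / 2.0

# Hypothetical peak-time readings:
print(estimate_as_traffic_mbps(a_before=310.0, a_after=288.0,
                               b_before=145.0, b_after=167.0))  # 22.0 Mbps
```

If the two deltas disagree wildly, that itself is a hint that background traffic moved during the measurement window.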

Interesting idea. Comments?

The other approach some ISPs use is to set up a "trial" peering session, usually using a private cross connect to measure the traffic volume and relative traffic ratios. Then both sides can get an idea of the traffic before engaging in a contractual Settlement-Free Peering relationship.

Bill

Hi Bill,

Impressive numbers but of course, slackers aside, if it was your connection
and resources wouldn't you want more accurate information than just a guess?
This may be effective for an IX decision if you created some sort of a map
based on ALL the ASNs of the people on the peering switch.. but in most
cases anyone pushing any real traffic will probably not have fine-grained
enough samples to determine a peering relationship based on a single AS
with this method. Maybe I'm wrong but hey, if you are taking 200 megs from any
one ASN I would hope you knew about it.

Interesting idea. Comments?

Again it seems too iffy. What if you get a short DoS when you shift an ASN..
how much of a chump will you look like when you need that peer to be 1 Gbps
and you hook up and it's only pulling 2 Mbps?

The other approach some ISPs use is to set up a "trial" peering session,
usually using a private cross connect to measure the traffic volume and
relative traffic ratios. Then both sides can get an idea of the traffic
before engaging in a contractual Settlement-Free Peering relationship.

I like this one the best if I didn't have NetFlow stats... however I doubt
everyone will allow this because of time, money, resources, security, etc.
I tend to look at peering as something you need to know when to do because
the data tells you so. In this industry as it stands now, why would you NOT
run NetFlow stats to give you this information? All you are doing is
wasting more money paying for transit that could be offloaded to peering.
And the flipside is also true.. why even worry about peering if you can't
get more than a meg or two max to each AS?

Ouch. Among other problems, you really have no idea what is peak time for
that peer, which may be very different from your overall peak time.

Is there some reason you don't have netflow available, or are you just
trying to work around the known problems of it, such as not being able to
get a true reading due to best path issues?

I am quite happy with my method, using netflow export projected against an
external routing table. It gives pretty accurate results, *IF* you have
the correct external routing table. For two potential peers this is easy
to get, but for other uses it is sometimes fairly difficult to get an
accurate and current table of someone's customer routes. Unfortunately it
seems many people view this as "NDAable information", and won't make it
accessible via a route-views type service.
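The core of "netflow projected against an external routing table" is just matching flow destinations against the candidate peer's prefixes. A minimal sketch with Python's stdlib `ipaddress` module; the prefixes, flow records, and function name are all invented for illustration:

```python
import ipaddress

# Sketch: project per-flow byte counts onto a candidate peer's routing
# table. "peer_routes" stands in for the externally obtained table of
# the peer's customer routes; all prefixes and counts are invented.
peer_routes = [ipaddress.ip_network(p) for p in
               ("192.0.2.0/24", "198.51.100.0/22")]

def bytes_toward_peer(flows, routes):
    """Sum bytes for flows whose destination falls inside any peer route."""
    total = 0
    for dst, nbytes in flows:
        addr = ipaddress.ip_address(dst)
        if any(addr in net for net in routes):
            total += nbytes
    return total

flows = [("192.0.2.17", 5000),     # inside 192.0.2.0/24
         ("203.0.113.9", 9000),    # no match: stays on transit
         ("198.51.100.130", 700)]  # inside 198.51.100.0/22
print(bytes_toward_peer(flows, peer_routes))  # 5700
```

A production version would use a proper radix/longest-prefix-match tree rather than a linear scan, but the projection idea is the same.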

Also, that method has the same "knowing the routes" problem as netflow.
Wherever you are getting your list of the ASN's routes, there
is pretty much no way they are accurate (for an ASN of ANY size).

You would have to statically route (or otherwise inject routes with a
specific nexthop) a list of their customer prefixes that would have to be
manually transmitted.

If you are interested in traffic *to* a particular destination, surely you could just tweak localpref on routes based on an as-path filter?

If you are interested in traffic *from* a particular destination (you have a network full of eyes, not content) then this approach is not useful anyway.

Joe

And then quantify it how? I.e., useful NetFlow-like "x Mbps to AS X, y Mbps to
AS Y" statistics?

I think the idea was to say "well, from the mrtg graph, the difference between this circuit with all my _9327_ traffic and this circuit without any _9327_ traffic, at what I might reasonably estimate their peak time to be, looks to be about 2 megs or so".

It's a pretty crude measure, but it does have the advantage of requiring no more than mrtg and a route-map to set up.
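For the curious, the `_9327_`-style filter behind that route-map is an IOS AS-path regex, and its matching behavior can be sanity-checked in plain Python. This is a rough translation, not exact IOS semantics (in IOS, `_` matches any delimiter; on a space-separated path string a start/space/end alternation is close enough):

```python
import re

# Approximate the IOS AS-path regex "_9327_" on space-separated AS paths.
as_filter = re.compile(r"(^| )9327( |$)")

paths = ["701 9327", "701 3356 15169", "9327 64512", "19327 701"]
matched = [p for p in paths if as_filter.search(p)]
print(matched)  # ['701 9327', '9327 64512'] -- note 19327 is NOT caught
```

The delimiter matching is the whole point: a bare `9327` regex would also catch AS 19327 and AS 93271.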

Joe

Hi all -

Here is the problem: Everyone wants to know how much traffic would
ultimately be passed in peering relationships at an IX before signing
up/building into an IX.

I heard an interesting solution recently to estimating the traffic volume
destined to an AS in the absence of NetFlow or the like.

I have an amazingly simple proposition - as opposed to guesstimating the data
or coming up with excuses why not to use NetFlow, get NetFlow data for your own
network.

Alex

Impressive numbers but of course, slackers aside, if it was your connection
and resources wouldn't you want more accurate information than just a guess?

Yes, but I am also sympathetic to the challenges facing ISPs in this economy, and the challenges with large networks where there are so many ingress/egress points that getting sampling in place is problematic. I hear from some Tier 1 ISPs that in some cases sampling is not available on NICs that are too new or too old. In some cases there are simply too many points to measure, requiring too much disk, time, processing, etc. I heard stories of those that process the data monthly and do so at great expense, with occasional crashes of the weekend jobs. Sometimes the quick and dirty approach is easier. Doing the research, it was surprising to find how many of the largest ISPs in the world don't/can't do the detailed traffic analysis.

<snip>

> Interesting idea. Comments?

Again it seems too iffy. What if you get a short DoS when you shift an ASN..
how much of a chump will you look like when you need that peer to be 1 Gbps
and you hook up and it's only pulling 2 Mbps?

Good point - another assumption: (3) that the traffic follows a normal, predictable sinusoidal pattern, such that the peak for the target AS matches the peak of the rest of the traffic.

> The other approach some ISPs use is to set up a "trial" peering session,
> usually using a private cross connect to measure the traffic volume and
> relative traffic ratios. Then both sides can get an idea of the traffic
> before engaging in a contractual Settlement-Free Peering relationship.

I like this one the best if I didn't have NetFlow stats... however I doubt
everyone will allow this because of time, money, resources, security, etc.

Yes, the Empirical Approach is most accurate but, besides the cost of implementing the trial peering, there are examples of Tier 2 ISPs trying to game the trial with a Tier 1 ISP in order to obtain the peering relationship. I heard stories of some pretty wacky routing and traffic engineering in order to demonstrate during the trial that ratios and traffic volumes fell within a certain range. ( The "Art of Peering" documents a few of these tactics.) I can understand why the Tier 1's are hesitant to do the trial peering even when they don't have the data to refute the "peering worthiness".

I tend to look at peering as something you need to know when to do because
the data tells you so. In this industry as it stands now why would you NOT
run netflow stats to give you this information? All you are doing is
wasting more money paying for transit that could be offloaded to peering.

Me too, but differentiate between Tier 1 and Tier 2 by their motives: Tier 2s want to peer broadly to reduce transit fees, while Tier 1s by definition don't pay transit fees to anyone.

And the flipside is also true.. why even worry about peering if you can't
get more than a meg or two max to each AS?

I have found peering to have additive value; a lot of 1-2 Mbps peering sessions can save as much money for you as a single large traffic peer. The more traffic, the stronger the case for peering.

Bill

> based on ALL the ASN's of the people on the peering switch.. but in most
> cases anyone pushing any real traffic will probably not have fine grained
> samples enough to determine a peering relationship based on a single AS
> with this method. Maybe I'm wrong but hey, if you are taking 200 megs from any
> one ASN I would hope you knew about it.

Also, that method has the same "knowing the routes" problem as netflow.
Wherever you are getting your list of the ASN's routes, there
is pretty much no way they are accurate (for an ASN of ANY size).

The vast majority of the routes will be an intersection of routes announced
by the AS to other AS (including looking glasses).
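That intersection idea is easy to demonstrate. A toy sketch, with hypothetical snapshots of what the AS announces as seen from a few vantage points (all prefixes invented):

```python
# Approximate an AS's stable route set by intersecting its announcements
# as seen from several vantage points (e.g. looking-glass snapshots).
views = [
    {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"},
    {"192.0.2.0/24", "198.51.100.0/24"},
    {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/25"},
]
likely_routes = set.intersection(*views)
print(sorted(likely_routes))  # ['192.0.2.0/24', '198.51.100.0/24']
```

Prefixes that appear only in some views (traffic-engineering more-specifics, leaks) drop out, leaving the consistently announced core.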

Alex

My total traffic is Z; my traffic to AS X is Px%, and my traffic to AS Y is Py%.
Py is 70x Px. I should therefore attempt to interconnect with Y.
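In code, that comparison is just a ranking by share of the total. The byte counts below are invented to match the 70x ratio in the example:

```python
# Rank destination ASes by share of total traffic (counts hypothetical).
traffic_by_as = {"AS X": 1_000_000, "AS Y": 70_000_000, "AS W": 4_000_000}
total = sum(traffic_by_as.values())

ranked = sorted(traffic_by_as, key=traffic_by_as.get, reverse=True)
for asn in ranked:
    print(f"{asn}: {100 * traffic_by_as[asn] / total:.1f}% of total")
# AS Y dwarfs AS X, so AS Y is the interconnect candidate.
```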

Alex

> > based on ALL the ASN's of the people on the peering switch.. but in most
> > cases anyone pushing any real traffic will probably not have fine grained
> > samples enough to determine a peering relationship based on a single AS
> > with this method. Maybe I'm wrong but hey, if you are taking 200 megs from any
> > one ASN I would hope you knew about it.
>
> Also, that method has the same "knowing the routes" problem as netflow.
> Wherever you are getting your list of the ASN's routes, there
> is pretty much no way they are accurate (for an ASN of ANY size).

The vast majority of the routes will be an intersection of routes announced
by the AS to other AS (including looking glasses).

oops, this should be read as "by the AS to other AS' (including the data you
can pull from looking glasses)."

Alex

It is also useful as a supplement to netflow statistics, as a sort of
verification of your flow data. Sometimes due to design, operating
conditions, etc., netflow data is not always the most reliable and/or
meaningful.

As an example:

You run two main types of border router platforms. On one platform you
must sample netflow @ 1% due to performance limitations. On the other
platform there is no sampling functionality built into the software.
This creates an immediate skew of data, unless software is created to
sample the flows coming off the second platform.
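A sketch of the normalization that this skew forces on you before the two platforms' numbers can be merged (sampling rates and counts are invented):

```python
# Normalize byte counts before merging data from a 1-in-100 sampled
# platform with data from an unsampled one; raw counts would skew 100:1.
def normalized_bytes(records):
    """records: (bytes_seen, one_in_n) pairs; one_in_n=100 means 1%
    sampling, one_in_n=1 means every flow was exported."""
    return sum(nbytes * one_in_n for nbytes, one_in_n in records)

platform_a = [(1_200, 100)]   # sampled 1-in-100: scale up 100x
platform_b = [(95_000, 1)]    # every flow exported
print(normalized_bytes(platform_a + platform_b))  # 215000
```

Note the scale-up also multiplies the sampling error, which is part of why sampled and unsampled data never quite agree.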

Now take into account that your traffic is mainly outbound from your
network, which means that you need to ignore vendor best practice
and enable flow caching on your core (internal) facing interfaces to
measure the traffic flowing out of your network.

So, in order for you to get any kind of traffic statistics for a peer,
you've got to spend many hours distilling data manually, doing AS
aggregations, and creating a possibly unstable networking environment.

No big deal, right?

It may be crude, but sometimes it can be the most reliable _available_
method to tell how much traffic is going to the ISP and ISP customers.

Joe

Assume you are provider A, and you are considering peering with provider
B. Assume Provider B has customer Z, who buys transit from Provider B and
Provider C. Assume you already peer with provider C.

You have no way to know if customer Z will be part of your routes to
Provider B, or if you will prefer them over provider C, without having the
route list.

This is a very common situation if you have any decent amount of peering,
and/or if you are considering peering with a provider who has any
reasonable number of multihomed customers. As we've already proved in
previous nanog emails, the top 20 route-announcing providers added
together have enough routes to cover the internet around 8 times over.
Even looking glasses may not contain all the paths available.

Projecting actual IP traffic onto actual IP routes is the only way to do
it.

Quantifiable Proof and "Peering Profiles"...see below.

> I think the idea was to say "well, from the mrtg graph, the difference
> between this circuit with all my _9327_ traffic and this circuit
> without any _9327_ traffic, at what I might reasonably estimate their
> peak time to be, looks to be about 2 megs or so".
>
> It's a pretty crude measure, but it does have the advantage of
> requiring no more than mrtg and a route-map to set up.

Right, it is crude, but in an economy where business decisions require "Quantifiable *Proof*", this is quantifiable and easy to do. Some Peering Coordinators are now putting together business plans for peering at the IX that include the number of Mbps of peering traffic, and e-mail confirmation from the peers at the IX that they will indeed peer with them there. Smart customers; if they exceed the breakeven point, then peering makes sense. A lot more work up front than it used to be.
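The break-even arithmetic behind such a business plan is simple enough to sketch. Every price here is hypothetical; plug in your own IX and transit costs:

```python
# Back-of-the-envelope break-even for an IX business case.
ix_monthly_cost = 3_000.0   # IX port + cross-connect + colo share, $/month
transit_price = 100.0       # blended transit price, $/Mbps/month

breakeven_mbps = ix_monthly_cost / transit_price
print(breakeven_mbps)       # 30.0 Mbps must move off transit to break even

confirmed_peer_mbps = 45.0  # measured traffic to peers who confirmed
print(confirmed_peer_mbps > breakeven_mbps)  # True -> peering makes sense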

It is also useful as a supplement to netflow statistics, as a sort of
verification of your flow data. Sometimes due to design, operating
conditions, etc., netflow data is not always the most reliable and/or
meaningful.

As an example:

You run two main types of border router platforms. On one platform you
must sample netflow @ 1% due to performance limitations. On the other
platform there is no sampling functionality built into the software.
This creates an immediate skew of data, unless software is created to
sample the flows coming off the second platform.

Now take into account that your traffic is mainly outbound from your
network, which means that you need to ignore vendor best practice
and enable flow caching on your core (internal) facing interfaces to
measure the traffic flowing out of your network.

So, in order for you to get any kind of traffic statistics for a peer,
you've got to spend many hours distilling data manually, doing AS
aggregations, and creating a possibly unstable networking environment.

No big deal, right?

It may be crude, but sometimes it can be the most reliable _available_
method to tell how much traffic is going to the ISP and ISP customers.

Joe is absolutely right here, and this still represents a common scenario and problem for the peering community.

Another approach I have been thinking about is to generate "Peering Profiles" for the community...here is how it works. Let's say I work with a few Internet Gaming companies and find that the netflow stats show a certain pattern, or profile of traffic destinations. Maybe I find that
2% to Cox
3% to Shaw
2% to Comcast
5% to Roadrunner
2% to Adelphia
and the next top 20 ASes represent the next 10% of traffic.

Anonymized, this "Peering Profile" for Internet Gaming companies can probably be applied to other Internet Gaming companies and can provide a rough idea of good targets for peering and how much traffic can be expected at a peering point, as a percentage of their total traffic. Empirically, these top traffic destinations and volumes have been large enough, tens of Mbps each, generally more than enough to justify peering at an IX where the breakeven point is 10-30 Mbps. The design of the tool/template is pretty obvious from there.
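Applying such a profile is a one-liner per network. A sketch using the example percentages above; the total traffic and the breakeven figure are assumptions for illustration:

```python
# Apply an anonymized "Peering Profile" to another company's total
# traffic to estimate per-network peering volumes.
profile = {"Cox": 0.02, "Shaw": 0.03, "Comcast": 0.02,
           "Roadrunner": 0.05, "Adelphia": 0.02}
total_mbps = 800.0      # hypothetical total outbound traffic
breakeven_mbps = 20.0   # assumed point in the 10-30 Mbps range

estimates = {net: total_mbps * share for net, share in profile.items()}
worthwhile = [net for net, mbps in estimates.items()
              if mbps >= breakeven_mbps]
print(estimates)
print(worthwhile)  # ['Shaw', 'Roadrunner'] at these assumed numbers
```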

Side Note: See all the trouble we go through because traffic flow measurement is still non-trivial? If netflow data is available at the ingress/egress points, I was pointed to http://ehnt.sourceforge.net/ as a good freeware tool for evaluating and translating raw netflow data.

Bill

> > Also, that method has the same "knowing the routes" problem as netflow.
> > Wherever you are getting your list of the ASN's routes, there
> > is pretty much no way they are accurate (for an ASN of ANY size).
>
> The vast majority of the routes will be an intersection of routes
> announced by the AS to other AS (including looking glasses).

Assume you are provider A, and you are considering peering with provider
B. Assume Provider B has customer Z, who buys transit from Provider B and
Provider C. Assume you already peer with provider C.

You have no way to know if customer Z will be part of your routes to
Provider B, or if you will prefer them over provider C, without having the
route list.

This is a standard problem, resolved with set theory. Pick your set.
Measure. Pick your set again, measure. Repeat N times. Decide which set of
results you accept as more likely. Use them.

Alex

Right, it is crude, but in an economy where business decisions
require "Quantifiable *Proof*", this is quantifiable and easy to
do. Some Peering Coordinators are putting together business
plans now for peering at the IX that includes the #'s of Mbps of
peering traffic, and e-mail confirmation from the peers at the
IX that they will indeed peer with them at the IX. Smart customers;
if they exceed the breakeven point then peering makes sense. A
lot more work up front than it used to be.

Business decisions surrounding bandwidth and connectivity are still
in the early years of development. A few, highly technical people
understand it. Unfortunately, most of those people do not also
know how financial ROI works, or how to define a project or strategy
that meets business hurdle rates.

I personally feel this is a gap similar to how internet marketing
started identifying measurements such as `click-throughs' and
application-oriented numbers to fulfill their statistical needs
for building partnerships and identifying their relationship to their
market. Clearly, just by looking at the names they chose for things
(`click-throughs'?), they had no idea what they were doing, and
likely were also not willing to listen to their technical counterparts.
Engineers don't know anything about business partnerships and
relationships to their market, right?

As a business unit manager responsible for Internet connectivity,
one is obligated to look at their WACC / discount/hurdle rates
and determine the value of future returns. Find out the WACC and
marginal tax rate of your company. Find a financial controller,
or someone who manages finance, and locate information on how capital
expenditures are evaluated and depreciated (over how long?). Find out
what metrics they want and need for technology expenditures that
involve both capex and opex in the same budget/project. What are
the business expectations?
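Translating projected peering savings into that financial language is mostly a discounted-cash-flow exercise. A minimal sketch; the WACC, upfront cost, and savings figures are all hypothetical:

```python
# Discount projected peering savings at the company's WACC to get an
# NPV that the finance side will recognize (all figures hypothetical).
def npv(rate, cashflows):
    """cashflows[0] occurs at t=0 (typically the upfront cost, negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

wacc = 0.12
flows = [-50_000.0] + [30_000.0] * 3   # router/port/install, then 3 years
                                       # of transit offloaded to peering

value = npv(wacc, flows)
print(round(value, 2))   # positive -> the project clears the hurdle rate
```

A fuller model would also fold in depreciation and the marginal tax rate mentioned above, but a positive NPV at the WACC is the first gate.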

Another approach I have been thinking about is to generate "Peering
Profiles" for the community...here is how it works. Let's say I
work with a few Internet Gaming companies and find that the
netflow stats show a certain pattern, or profile of traffic
destinations. Maybe I find that
2% to Cox
3% to Shaw
2% to Comcast
5% to Roadrunner
2% to Adelphia
and the next top 20 ASes represent the next 10% of traffic.

Anonymized, this "Peering Profile" for Internet Gaming companies
can probably be applied to other Internet Gaming companies and
can provide a rough idea of good targets for peering and how much
traffic can be expected at a peering point, as a percentage of
their total traffic. Empirically, these top traffic destinations
and volumes have been large enough, tens of Mbps each, generally
more than enough to justify peering at an IX where the breakeven
point is 10-30 Mbps. The design of the tool/template is pretty
obvious from there.

Bill, I fully agree with your methods and think they are wonderful.
If there is any quantifiable proof, it needs to be identified and
executed on. Get some numbers, any numbers. Take some tcpdump
samples, or even load a permit acl on your routers to determine
estimates. What is the problem you are trying to solve? My answer:
an internal business unit agreement that a minimum percentage of
costs can be reduced through peering. Do this in any way that you
can.

Why is this so difficult? Probably because the disagreeable parties
aren't talking the same language. I also feel that in many
businesses, technology (such as bandwidth) is not taken seriously.
Projects that include capex+opex paid to a variety of vendors (creating
vendor dependence), with new "extra" routers (equipment), seemingly
costly "extra" exchange point connectivity, and "extra" racks and
power requirements with monthly recurring charges - well, that's
just not intuitively cost-effective, now is it?

Ask the right questions. If managers do not want to do peering
because it doesn't meet some marketing or partnership requirement,
then take some different angles. If managers don't believe that
peering can actually save money, come up with the numbers and the
financial language they are used to.

If you are told not to come up with technology estimates, or simply
can't because you don't have the time - then consider using someone
else's work and time (like Bill Norton). Augment/replace estimates
(and even really accurate NetFlow data) with externally researched
estimates and national or global averages.

BTW: Yes, I believe NetFlow is an excellent tool, and I use it
myself for determining who would make a good peer (and ras is correct
that using an external routing table along with the netflow data
improves this even further). NetFlow is easy. Set it up and use
it, or get the needed help from your vendor. If you determine that,
for your environment, NetFlow is not easy, then use something
else.

dre

s/Prothat/Projects that/

stupid vi. ;>

dre