common method to count traffic volume on IX

Hi,

Many Internet exchange points publish graphs showing the aggregated
traffic volume on the IX. For example:

Netnod: http://www.netnod.se/ix-stats/sums/
AMS-IX: https://www.ams-ix.net/technical/statistics
LINX: https://www.linx.net/pubtools/trafficstats.html

Is there a common method to count this traffic on the switch fabric?
Do you just read all the switch interfaces' "packets input" counters at
a fixed interval to get the aggregated input traffic, and all the
"packets output" counters to get the aggregated output traffic?

regards,
Martin
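(Editor's note: a minimal sketch of the counter-difference approach the question describes. The port names and counter values below are invented for illustration; a real IXP would poll something like ifHCInOctets / ifHCOutOctets via SNMP rather than hard-code numbers.)

```python
# Read per-port "input octets" counters twice, INTERVAL seconds apart,
# then sum the deltas across all member-facing ports and convert to bps.
# All values here are made up; in practice they come from SNMP polls.

INTERVAL = 300  # seconds between the two counter reads

# first and second read of the input-octet counter per member-facing port
read1_in = {"port1": 10_000_000, "port2": 25_000_000, "port3": 7_500_000}
read2_in = {"port1": 40_000_000, "port2": 85_000_000, "port3": 22_500_000}

def aggregate_bps(first, second, interval):
    """Sum the per-port counter deltas and convert octets to bits/s."""
    total_octets = sum(second[p] - first[p] for p in first)
    return total_octets * 8 / interval

print(aggregate_bps(read1_in, read2_in, INTERVAL))  # aggregate input bps
```

The same loop run over the "output octets" counters gives the aggregated output side of the graph. (Real code also has to handle counter wrap and reboots, which this sketch ignores.)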

Yes, as far as I understand it's the sum of all traffic on all customer-facing ports.

Most IXPs count this as the sum of all ingress packets over a period of
300 seconds. A small number of IXPs do things differently, e.g. use a
different sampling interval or count traffic on inter-switch links.

Nick

I am unaware of any IXP that uses a smaller sampling period (presumably in an attempt to make their IXP look bigger) other than DE-CIX.

Is there another one?

And yes, DE-CIX is more than well aware everyone thinks this is .. uh .. let's just call it "silly" for now, although most would use far more disparaging words. Which is probably why no serious IXP does it.

It's not silly - it's just not what everyone else does, so it's not
possible to directly compare stats with other IXPs. I'm all in favour of
using short (but technically sensible) sampling intervals for internal
monitoring, but there are good reasons to use 300s / ingress sum for
prettypics intended for public consumption.

Nick

Thanks for all the replies!

Nick,

counting traffic on inter-switch links is kind of cheating, isn't it?
I mean if "input bytes" and "output bytes" on all the ports facing the
IX members are already counted, then counting traffic on links between
the switches in the fabric will count some of the traffic multiple times.

Patrick,

how does a smaller sampling period help to show more traffic volume on
the switch fabric? Or do you mean that with shorter sampling periods
the traffic peaks are not averaged out, and thus the peak input and
output levels remain higher?

regards,
Martin
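(Editor's note: a toy numeric example of the peak effect Martin is asking about. The traffic series is invented: steady traffic with a one-minute burst, averaged over 60 s versus 300 s windows.)

```python
# Why shorter averaging intervals show higher peaks: a one-minute burst
# that a 300 s average dilutes.  Per-second traffic in bps: 4 minutes at
# 1 Gbit/s, then a 1-minute burst at 5 Gbit/s.  All numbers invented.

series = [1_000_000_000] * 240 + [5_000_000_000] * 60

def peak_average(samples, window):
    """Peak of the averages over non-overlapping windows of `window` s."""
    return max(
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples), window)
    )

print(peak_average(series, 60))   # 60 s windows keep the burst visible
print(peak_average(series, 300))  # one 300 s window averages it away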

And yes, DE-CIX is more than well aware everyone thinks this is .. uh ..
let's just call it "silly" for now, although most would use far more
disparaging words. Which is probably why no serious IXP does it.

It's not silly

We disagree.

it's just not what everyone else does

I don't think anyone else does 2 minutes, but happy to be educated otherwise.

so it's not
possible to directly compare stats with other IXPs. I'm all in favour of
using short (but technically sensible) sampling intervals for internal
monitoring, but there are good reasons to use 300s / ingress sum for
prettypics intended for public consumption.

Your IXP (network, whatever), your decision. Use 2-second timers for all I care.

Unfortunately, DE-CIX has done exactly what you said - compared themselves to other IXPs using that apples-to-oranges comparison. There are words for that sort of thing, but they are impolite, and I otherwise like the people at DE-CIX, so I shall let each NANOG-ite decide how to view such, um, tactics.

The graph has a bigger peak, and DE-CIX has claimed "see, we are bigger" using such graphs. Not only did they not caveat the fact they were using a non-standard sampling method, they have refused to change when confronted or even say what their traffic would be with a 300 second timer.

In a message written on Tue, Sep 17, 2013 at 07:11:23PM +0300, Martin T wrote:

counting traffic on inter-switch links is kind of cheating, isn't it?
I mean if "input bytes" and "output bytes" on all the ports facing the
IX members are already counted, then counting traffic on links between
the switches in fabric will count some of the traffic multiple times.

Sounds like a marketing opportunity.

customer--s1--s2--s3--s4--s5--s6--s7--s8--s9--s10--customer

Presto, highest volume IX!

Maybe I should patent that idea.
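(Editor's note: Leo's joke topology makes Martin's double-counting point concrete. A sketch with invented numbers: one 1 Gbit/s flow crossing the s1..s10 chain, counted once at member ingress versus once per inter-switch link as well.)

```python
# One flow from member A to member B across the ten-switch chain above.
# The s1..s10 chain has nine inter-switch links; counting traffic on
# them sees the same bits once per hop.  Numbers are invented.

flow_bps = 1_000_000_000   # one 1 Gbit/s flow, member A -> member B
isl_hops = 9               # inter-switch links between s1 and s10

ingress_only = flow_bps                 # counted once, at A's ingress port
with_isls = flow_bps * (1 + isl_hops)   # ingress plus every ISL crossing

print(ingress_only, with_isls)  # 1 Gbit/s of real traffic, 10 Gbit/s "measured"
```

Hence the joke: the longer the switch chain, the "bigger" the exchange.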

That's easy to counter: just estimate some characteristics of the distribution from the sample, then apply extreme value theory to renormalize to 300 s.

(My math background talking. I once got similar stuff written into an ITU-T recommendation for provisioning trunk groups based on limited traffic samples.)

Tom T.

"Why do you have 10 48-port switches, 239 VLANs, but only 2 peers?"

"Uhh... for accounting reasons."

In a message written on Tue, Sep 17, 2013 at 07:11:23PM +0300, Martin T wrote:

counting traffic on inter-switch links is kind of cheating, isn't it? I mean if "input bytes" and "output bytes" on all the ports facing the IX members are already counted, then counting traffic on links between the switches in fabric will count some of the traffic multiple times.

I don't know of any IXP that does this. Industry standard is as you and others wrote before: the 5-minute counter difference on all customer-facing ports, publishing both input and output bps and pps.
I guess MRTG is to 'blame' for these values more than anything.

* bicknell@ufp.org (Leo Bicknell) [Tue 17 Sep 2013, 20:52 CEST]:

Sounds like a marketing opportunity.

customer--s1--s2--s3--s4--s5--s6--s7--s8--s9--s10--customer

Presto, highest volume IX!

Highest latency too, and here's to hoping all those devices actually work. It'll sure be an interesting exercise to find out which switch in the path dropped a frame. You might as well just multiply your stats to get the same effect.

  -- Niels.


On 17/09/2013 20:15, Patrick W. Gilmore wrote:

Thanks for all the replies!

Nick,

counting traffic on inter-switch links is kind of cheating, isn't it?
I mean if "input bytes" and "output bytes" on all the ports facing the
IX members are already counted, then counting traffic on links between
the switches in fabric will count some of the traffic multiple times.

Patrick,

how does smaller sampling period help to show more traffic volume on
switch fabric? Or do you mean that in case of shorter sampling periods
the traffic peaks are not averaged out and thus peak in and peak out
traffic levels remain higher?

Hi,

Good reading, to get an idea:

https://www1.ethz.ch/csg/people/dimitroc/papers/p95pam.pdf

Section 3, mainly.

Cheers,

mh


somehow, a serious case of testosterone poisoning combined with insane
goal drift has hit a number of the large european exchanges. instead of
the goal being how well they serve their local communities, they have
gone wild with sleazy means of having traffic contests, doing really
sick attempts at techno-colonial expansion into foreign countries and
continents, ... instead of running a public service, they think they
are running competitive commercial enterprises. imiho, the members
should be up in arms.

if you are jealous of commercial expansion, then send your resume to
equinix. sheesh!

randy

Serious question, at an IXP shouldn't IN = OUT nearly perfectly?

Most exchanges do everything possible to eliminate broadcast packets, and they don't allow multicast on the unicast VLANs. So, properly behaved, you have a bunch of routers speaking unicast to each other. The only way to get a difference is if there is packet loss: IN - loss = OUT.

* bicknell@ufp.org (Leo Bicknell) [Wed 18 Sep 2013, 19:23 CEST]:

If you host multicast on your unicast peering LAN, then this will be
affected by the unicast:multicast ratio and the number of recipient ports.
Most networks which support multicast will also support multicast pruning,
so in reality this counts for very little.

Most IXPs rely on unicast flooding to determine forwarding paths, which
adds a little to the outbound numbers. So on these IXPs, outbound
aggregate is usually a tiny amount larger than inbound aggregate. The
larger the network, the smaller this effect. And on networks which
precompute forwarding paths, the in and out aggregate figures will be equal
+/- counter entropy.

Nick
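(Editor's note: a rough numeric sketch of the flooding effect Nick describes, with invented numbers. A frame flooded as unknown-unicast leaves on every member port except the one it arrived on, so aggregate output slightly exceeds aggregate input.)

```python
# Invented figures: 200 member ports, 400 Gbit/s aggregate ingress, and
# an assumed 1 in 10,000 share of the traffic flooded to all other ports.

member_ports = 200
total_in_bps = 400_000_000_000        # 400 Gbit/s aggregate ingress
flooded = total_in_bps // 10_000      # assumed flooded share (integer bps)

# non-flooded traffic egresses exactly once; flooded traffic egresses on
# every member port except the ingress one (N - 1 copies)
total_out_bps = (total_in_bps - flooded) + flooded * (member_ports - 1)

print(total_out_bps - total_in_bps)   # the small outbound excess
```

Even with a generous flooding share the excess stays a tiny fraction of the aggregate, matching Nick's "outbound is usually a tiny amount larger than inbound".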

* randy@psg.com (Randy Bush) [Wed 18 Sep 2013, 04:39 CEST]:

somehow, a serious case of testosterone poisoning combined with insane
goal drift has hit a number of the large european exchanges. instead of
the goal being how well they serve their local communities, they have
gone wild with sleazy means of having traffic contests, doing really
sick attempts at techno-colonial expansion into foreign countries and
continents, ... instead of running a public service, they think they
are running competitive commercial enterprises. imiho, the members
should be up in arms.

if you are jealous of commercial expansion, then send your resume to
equinix. sheesh!

Wow Randy, you really misunderstand the situation in Europe and the reasons behind the horizon expansions, and I'm surprised by your advocacy of American hegemony in a market where that really doesn't exist (those of independent not-for-profit internet exchanges).

If only you worked for a company that allowed you input into the decision processes of all these member-driven associations!

  -- Niels.

Ding ding ding! And that's why honest IXPs graph both, to show that they have no packet loss on their inter-switch links.

It depends on what is being measured. At TorIX we'll see deviations between in and out on our aggregate graph. As we combine all peer ports to form the aggregate graph, any large deviations are almost always due to peers who have reached a capacity limit on their port (which is not always the port speed, btw; the transport behind the port counts too). Another common reason is the difference in measurement times across all ports.

-- Stephen