Rapidly-variable routing on the time scale of seconds to minutes?

We did a "traceroute" end-to-end routing measurement in 2004 and found about
5-10% of measurements exhibiting rapidly-variable routing on the time scale
of a single traceroute (seconds to minutes). In other words, the packets
belonging to a single traceroute took multiple paths.

Vern Paxson mentioned in 1997 one mechanism that can lead to this "route
fluttering" behavior: "route splitting", which is explicitly allowed in
RFC 1812 - Requirements for IP Version 4 Routers.

Route changes on such a short time scale for packets in the same flow could
be troublesome. But the occurrence of such behavior does not seem to have
decreased over the past years, at least in our measurements. Does anyone
know how to explain this behavior? Thanks!

An example traceroute record containing the fluttering is shown below (see
the 5th hop)

Fri Apr 09 09:35:35 2004

1 cisfhfb.fh-friedberg.de (212.201.24.1) 1.095 ms 0.402 ms 0.321 ms
2 ar-frankfurt2.g-win.dfn.de (188.1.42.9) 120.105 ms 198.766 ms 200.040 ms
3 cr-frankfurt1-ge5-0.g-win.dfn.de (188.1.80.1) 2.093 ms 2.142 ms 2.087 ms
4 so-6-0-0.ar2.FRA2.gblx.net (208.48.23.141) 2.461 ms 2.349 ms 2.333 ms
5 pos5-0-2488M.cr2.FRA2.gblx.net (67.17.65.53) 2.448 ms
  pos6-0-2488M.cr1.FRA2.gblx.net (67.17.65.77) 2.368 ms 2.281 ms
6 so3-0-0-2488M.ar2.FRA3.gblx.net (67.17.65.82) 2.676 ms 2.750 ms
  so2-0-0-2488M.ar2.FRA3.gblx.net (67.17.65.58) 2.569 ms
7 ge-7-2.Frankfurt1.Level3.net (195.122.136.245) 10.971 ms 10.967 ms 10.882 ms
8 ae-0-55.mp1.Frankfurt1.Level3.net (195.122.136.97) 11.488 ms 11.417 ms 11.353 ms
9 so-0-0-0.mp1.London2.Level3.net (212.187.128.61) 27.203 ms 27.042 ms 27.048 ms
10 so-1-0-0.bbr1.Washington1.Level3.net (212.187.128.138) 91.004 ms 91.006 ms 90.977 ms
11 ge-0-0-0.mpls1.Honolulu2.Level3.net (4.68.128.13) 212.254 ms 212.321 ms 212.351 ms
12 so-7-0.hsa1.Honolulu2.Level3.net (4.68.112.90) 212.407 ms 212.250 ms 212.365 ms
13 s1.lavanet.bbnplanet.net (4.24.134.18) 212.609 ms 212.372 ms 213.270 ms
14 malasada.lava.net (64.65.64.17) 212.260 ms 212.460 ms 212.226 ms

Best regards,

Charles

We did a "traceroute" end-to-end routing measurement in 2004 and found about
5-10% of measurements exhibiting rapidly-variable routing on the time scale
of a single traceroute (seconds to minutes). In other words, the packets
belonging to a single traceroute took multiple paths.

[...]

Route changes on such a short time scale for packets in the same flow could
be troublesome. But the occurrence of such behavior does not seem to have
decreased over the past years, at least in our measurements. Does anyone
know how to explain this behavior? Thanks!

Yes, this is normal per-flow load balancing on parallel backbone lines.
Usually, "flows" are defined via a hash on L3 and possibly L4
addressing information (IP source/dest, and TCP/UDP port source/dest,
ICMP code, etc.). If the "flow hash" contains L4 information, every
traceroute probe packet is considered a different "flow" and you see
exactly that:

5 pos5-0-2488M.cr2.FRA2.gblx.net (67.17.65.53) 2.448 ms
pos6-0-2488M.cr1.FRA2.gblx.net (67.17.65.77) 2.368 ms 2.281 ms
6 so3-0-0-2488M.ar2.FRA3.gblx.net (67.17.65.82) 2.676 ms 2.750 ms
so2-0-0-2488M.ar2.FRA3.gblx.net (67.17.65.58) 2.569 ms

You would NOT see the same effect with packets of e.g. the same TCP
session, so this (multipath forwarding) is usually no problem (as no
reordering happens within a TCP or UDP flow). So your analysis results
(traceroute) are misleading for most real-life applications. I agree
that it's irritating, and I personally favor using aggregated
SONET/Ethernet devices (IEEE 802.3ad) to bundle parallel lines if
possible.
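Daniel's flow-hash description can be sketched in a few lines of Python. This is a toy model only: the actual hash functions router vendors use are proprietary, and the addresses and ports below are illustrative. It shows why all packets of one TCP session stay on one link.

```python
import hashlib

# Toy per-flow load balancer: hash the L3/L4 5-tuple fields onto one
# of N parallel links. Real router hashes are vendor-specific; this
# sketch only illustrates the principle.
def pick_link(src_ip, dst_ip, sport, dport, n_links=2):
    key = f"{src_ip}|{dst_ip}|{sport}|{dport}".encode()
    return hashlib.sha256(key).digest()[0] % n_links

# All packets of one TCP session share the same 5-tuple, so the hash
# always selects the same link and no intra-flow reordering occurs.
# (Illustrative addresses, not from any real measurement.)
flow = ("212.201.24.5", "64.65.64.17", 43123, 80)
links_used = {pick_link(*flow) for _ in range(100)}
assert len(links_used) == 1
```

The key point is that link selection is a pure function of the flow identifier: as long as the 5-tuple is constant, the path is constant.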

Best regards,
Daniel

Charles Shen wrote:

An example traceroute record containing the fluttering is shown below (see
the 5th hop)

Fri Apr 09 09:35:35 2004

1 cisfhfb.fh-friedberg.de (212.201.24.1) 1.095 ms 0.402 ms 0.321 ms
2 ar-frankfurt2.g-win.dfn.de (188.1.42.9) 120.105 ms 198.766 ms 200.040 ms
3 cr-frankfurt1-ge5-0.g-win.dfn.de (188.1.80.1) 2.093 ms 2.142 ms 2.087 ms
4 so-6-0-0.ar2.FRA2.gblx.net (208.48.23.141) 2.461 ms 2.349 ms 2.333 ms
5 pos5-0-2488M.cr2.FRA2.gblx.net (67.17.65.53) 2.448 ms
  pos6-0-2488M.cr1.FRA2.gblx.net (67.17.65.77) 2.368 ms 2.281 ms

That sure looks like ECMP to me. Equal-Cost Multi-Path. This is NOT anything new. What's the big deal?

6 so3-0-0-2488M.ar2.FRA3.gblx.net (67.17.65.82) 2.676 ms 2.750 ms
so2-0-0-2488M.ar2.FRA3.gblx.net (67.17.65.58) 2.569 ms

Same here.

Daniel Roesen wrote:

You would NOT see the same effect with packets of e.g. the same TCP
session, so this (multipath forwarding) is usually no problem (as for
TCP and UDP applications there is no reordering happening). So your
analysis results (traceroute) are misleading for most real-life
applications. I agree that it's irritating and I personally favor
using aggregated SONET/Ethernet devices (IEEE 802.3ad) to bundle
parallel lines if possible.

Best regards,
Daniel

Good point Daniel.

Perhaps the researchers should be using Layer Four traceroute.

John

Please see inline.

From: John Fraizer [mailto:nanog@enterzone.net]
Sent: Monday, January 31, 2005 8:21 AM
To: Charles Shen; nanog@merit.edu
Subject: Re: Rapidly-variable routing on the time scale of
seconds to minutes?

Charles Shen wrote:
> An example traceroute record containing the fluttering is shown below
> (see the 5th hop)
>
> Fri Apr 09 09:35:35 2004
>
> 1 cisfhfb.fh-friedberg.de (212.201.24.1) 1.095 ms 0.402 ms 0.321 ms
> 2 ar-frankfurt2.g-win.dfn.de (188.1.42.9) 120.105 ms 198.766 ms 200.040 ms
> 3 cr-frankfurt1-ge5-0.g-win.dfn.de (188.1.80.1) 2.093 ms 2.142 ms 2.087 ms
> 4 so-6-0-0.ar2.FRA2.gblx.net (208.48.23.141) 2.461 ms 2.349 ms 2.333 ms
> 5 pos5-0-2488M.cr2.FRA2.gblx.net (67.17.65.53) 2.448 ms
>   pos6-0-2488M.cr1.FRA2.gblx.net (67.17.65.77) 2.368 ms 2.281 ms

That sure looks like ECMP to me. Equal-Cost Multi-Path. This is NOT
anything new. What's the big deal?

From the responses, the answer to "the rapidly-variable routing on the time
scale of seconds to minutes" seems to be:

1. It could be link-layer load balancing, with the two interfaces belonging
to the same router.
2. It could be per-flow load balancing where flows are defined via both L3
and L4 info, so the traceroute probes do not reflect the path a single flow
would take.

My question is then: would it be safe to argue that the above two causes
explain all (or most of?) the observed "fluttering" routers? (some examples
listed below) What we are concerned about is per-packet load balancing
(packets in the same flow go through different paths), which will cause
trouble to protocols that install state information in routers along the
flow path.

Example pairs:

144.223.27.146 sl-telia1-1-0.sprintlink.net
144.232.230.30 sl-telia1-4-0.sprintlink.net

216.140.0.66 s3-0-0.a1.hywr.broadwing.net
216.140.0.70 s4-0-0.a1.hywr.broadwing.net

67.17.65.53 pos5-0-2488M.cr2.FRA2.gblx.net
67.17.65.77 pos6-0-2488M.cr1.FRA2.gblx.net

67.17.65.54 so5-0-0-2488M.ar2.FRA2.gblx.net
67.17.65.78 so4-0-0-2488M.ar2.FRA2.gblx.net

67.17.65.57 pos11-0-2488M.cr2.FRA2.gblx.net
67.17.65.81 pos11-0-2488M.cr1.FRA2.gblx.net

67.17.65.58 so2-0-0-2488M.ar2.FRA3.gblx.net
67.17.65.82 so3-0-0-2488M.ar2.FRA3.gblx.net

67.17.64.66 pos6-0-2488M.cr2.SFO1.gblx.net
67.17.74.157 pos8-0-2488M.cr1.SFO1.gblx.net

129.250.2.183 p16-3-0-0.r01.snjsca04.us.bb.verio.net
129.250.5.136 p16-7-0-0.r00.snjsca04.us.bb.verio.net

[ snip ]

From the responses, the answer to "the rapidly-variable routing on the time
scale of seconds to minutes" seems to be:

1. It could be link layer load balancing, with the two interfaces belonging
to the same router.
2. It could be per-flow load balancing where flows are defined via both L3
and L4 info, so the traceroute probes do not reflect the path a single flow
would take.

My question is then: would it be safe to argue that the above two causes
explain all (or most of?) the observed "fluttering" routers? (some examples
listed below) What we are concerned about is per-packet load balancing
(packets in the same flow go through different paths), which will cause
trouble to protocols that install state information in routers along the
flow path.

AFAIK, multiple routers showing up in a single hop of a traceroute response
is a sign of packet-by-packet load balancing, not flow-based balancing.

I could be wrong, though this was my past observation.

P.S.: What router-interacting applications are you using?

-J

I am talking about e.g. QoS reservation signaling applications.

James wrote:

AFAIK, multiple routers showing up in a single hop of a traceroute response
is a sign of packet-by-packet load balancing, not flow-based balancing.

I could be wrong, though this was my past observation.

P.S.: What router-interacting applications are you using?

-J

I would venture to guess that in 99% of the cases, it's not multiple routers showing up in a single hop but, rather, multiple interfaces on the same router.

John

Not necessarily, and in most cases probably not the case. Don't forget
that standard UNIX traceroute uses UDP, where the destination port of
the probes is incremented for each subsequent probe. So a per-flow
balancing hash that takes L4 header information into account will see
each traceroute probe as a distinct "flow".
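Daniel's point about incrementing destination ports can be sketched with a toy L4-aware hash (illustrative only; real implementations differ, and the addresses below are made up). Because classic traceroute increments the UDP destination port, conventionally starting at 33434, for each probe, successive probes fall into different hash buckets:

```python
import hashlib

# Toy L4-aware flow hash over the 5-tuple fields; real router hashes
# are vendor-specific. This only illustrates the mechanism.
def pick_link(src_ip, dst_ip, sport, dport, n_links=2):
    key = f"{src_ip}|{dst_ip}|{sport}|{dport}".encode()
    return hashlib.sha256(key).digest()[0] % n_links

# Classic UNIX traceroute increments the UDP destination port per
# probe (base 33434), so each probe hashes as a distinct "flow".
# Addresses are illustrative, not from any real measurement.
buckets = {pick_link("10.0.0.1", "10.0.0.2", 40000, 33434 + i)
           for i in range(64)}
assert buckets == {0, 1}  # probes spread over both parallel links
```

A TCP traceroute that holds the port pair fixed would keep the 5-tuple constant and thus see a single stable path through the same hash.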

Best regards,
Daniel

From the responses, the answer to "the rapidly-variable routing on
the time scale of seconds to minutes" seems to be:

1. It could be link layer load balancing, with the two interfaces
   belonging to the same router.
2. It could be per-flow load balancing where flows are defined via
   both L3 and L4 info, so the traceroute probes do not reflect the
   path a single flow would take.

That's no contradiction as far as I read it. Whether the two equal-cost
paths are terminated on the same router doesn't actually matter.

My question is then: would it be safe to argue that the above two
causes explain all (or most of?) the observed "fluttering" routers?

Setting aside seldom-observed, transient control-plane convergence
effects (IGP/BGP converging while traceroute is running), probably yes.

(some examples listed below)

Well, to see whether flow balancing is used, use e.g. TCP traceroute.
If you see "stable" results (all three probes of a hop matching) there
all the time, ...

What we are concerned about is per-packet load balancing
(packets in the same flow go through different paths), which will cause
trouble to protocols that install state information in routers along the
flow path.

Modern core router hardware like Juniper (IP2 ASIC) can't do classic
per-packet load balancing anymore at all, only per-flow balancing.

I'm not sure about the GSR platform, but as far as I remember,
per-packet balancing is not supported at all on Engine 2 line cards,
and carries a performance penalty otherwise.

Exec summary: I seriously doubt the larger shops do per-packet
balancing, either because their hardware can't do it at all
(Juniper-based cores) and/or because people know that per-packet load
balancing leads to packet reordering, which might make your customers
quite unhappy. It's generally a bad idea.
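The reordering problem Daniel describes can be seen with simple arithmetic: round-robin packets over two links whose one-way delays differ slightly, and later packets overtake earlier ones. The delay and spacing numbers below are made up for illustration:

```python
# Per-packet round robin over two links with slightly different
# one-way delays (illustrative numbers, in milliseconds).
delays_ms = [2.3, 2.9]

arrivals = []
for seq in range(6):
    link = seq % 2            # classic per-packet round robin
    send_time = seq * 0.1     # packets sent 0.1 ms apart
    arrivals.append((send_time + delays_ms[link], seq))

# Sort by arrival time to see the order the receiver observes.
received = [seq for _, seq in sorted(arrivals)]
assert received == [0, 2, 4, 1, 3, 5]  # packets arrive out of order
```

Per-flow hashing avoids this by pinning each flow to one link, trading perfect load distribution for in-order delivery within a flow.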

Best regards,
Daniel

Found a reference on that:
http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a0080087c5b.html

Bottom line: per-packet balancing works on E0 and E1 line cards, and
with caveats on E2. Not on newer ones. So Cisco is dropping that too.
Good to see.

Best regards,
Daniel