LAGing backbone links

Hello All,

I was wondering if anyone had any thoughts as to the best practices of
running multiple backbone links between 2 routers. In the past we've added
additional links as needed, then simply enabled IS-IS when they were good to
go. I'd then let IS-IS handle load balancing the traffic over the two
links. But I know that others out there would setup a LAG once they had
more than one link between two routers. Is there a best practice? Does it
matter? Any implications for an MPLS setup?

Thanks

Payam,

In general, if you're using relatively modern, medium- to higher-end equipment, it should "just work". Some things to watch out for in order of importance:
1) Be mindful of the number of component links you can put into a single LAG; this varies by platform. In general, higher-end routers/switches will take at least 16 component links in a single LAG, and in the last couple of years several vendors have begun shipping equipment and/or software that takes this up to 64 component links per LAG. (Depending on the platform, LAGs may let you build larger virtual links between adjacent devices than ECMP, which may be limited to 8 paths in a single ECMP group ... but, again, that all depends on the platform type.)
2) The distribution of flows across the component links in a single LAG can vary dramatically depending on the type of traffic you're pushing. Specifically, for /Internet/ (IPv4 or IPv6) over MPLS traffic, you will most likely get very good load distribution, given the pseudo-randomness of IP addresses and Layer-4 port information (in particular, source ports from clients toward servers). OTOH, if you have traffic in [very large] PWs, then LSRs/switches/routers typically can't look past the MPLS labels and inner Layer-2 encapsulation to find granular input keys for the load-balancing hash. Thus the hash will place all traffic for a single PW VC onto one component link of the LAG, and which link it lands on is effectively non-deterministic. The good news is that there is hope on the horizon in the form of:

... which, in short, expects the ingress PE to [try to] find granular input keys in the incoming traffic (e.g., input keys from an IP header contained within an Ethernet frame that will be transported as a PW VC over your MPLS core) and compute a hash of them, which gets placed into a "FAT PW" label that sits below the PW VC label. The idea is that core LSRs still load-balance based on the MPLS label stack (bottom-most through top-most labels), which should result in more even load distribution of PW VC flows over the component links in a LAG. This feature is just starting to appear in one vendor's equipment and will hopefully show up in others soon, as well. (Please bug your vendors for this! ;-)
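
Roughly (my sketch, not taken from the draft itself), the label stack on such a packet ends up looking like this, top of stack first, with the flow label sitting at the bottom of the stack below the PW VC label:

   +---------------------------------------+
   | Transport label(s)  (LDP and/or RSVP) |
   +---------------------------------------+
   | PW VC label                           |
   +---------------------------------------+
   | FAT PW flow label (hash of the        |
   |   customer frame's input keys)        |
   +---------------------------------------+
   | Customer Layer-2 frame (payload)      |
   +---------------------------------------+

Core LSRs hashing on the label stack then see a different flow label per customer flow, instead of one constant PW VC label for the whole pseudowire.
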
3) Depending on the vendor, you may have to explicitly configure the device to do load-balancing over LAGs or ECMP paths (e.g., Juniper & Brocade, possibly others). Generally, you have to tell the device what input keys to look for and/or how many MPLS labels to look past to find those input keys; e.g., in Juniper you configure forwarding-options -> hash-key -> family mpls -> label-1, label-2, payload -> ip, etc.
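
In Junos curly-brace form, that hierarchy looks roughly like the following (a sketch only; the exact leaves available under payload/ip, and whether you use hash-key or the newer enhanced-hash-key, vary by platform and release):

   forwarding-options {
       hash-key {
           family mpls {
               /* hash on the top two labels ... */
               label-1;
               label-2;
               /* ... and look past them at the IP payload */
               payload {
                   ip;
               }
           }
       }
   }

After committing something along these lines, the forwarding plane uses the labels plus the inner IP header (where it can find one) as hash input for LAG/ECMP load-balancing.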

Some other things to look out for:
4) Some vendors may use different hash algorithms for LAG vs. ECMP, so you may get "better" load-balancing from one compared to the other. Ask your vendor for details, as this may not be obvious from lab testing.
5) Some vendors may have a limit on the maximum number of MPLS labels they can look past to find, say, an IP payload to use as input keys for the load-hashing algorithm. This used to be a concern several years ago, but in general most medium- to high-end equipment can look past /at least/ 3 MPLS labels, which should cover you in the more common cases where either:
   a) You have IP/LDP/RSVP/RSVP-FRR, where the outermost label is an RSVP Bypass label while you're [briefly] running on a Bypass; or,
   b) You have VPN-label/LDP/RSVP, where you're moving IPVPN or 6PE, etc. traffic and using LDP-over-RSVP tunneling.

Anyway, HTH,

-shane

Some older equipment will unequally prefer certain links over others, depending on the number of members in the LAG, i.e. a 2-member LAG might load-balance equally under ideal conditions, but a 3-member LAG might naturally load-balance 2:2:1. This is particularly a problem if you have, say, an 8-member LAG and lose a single member, which could drop your overall throughput to the equivalent of 4 members.
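
To illustrate how that last case can happen (assuming, for the sake of the arithmetic, hardware that hashes flows into a fixed set of 8 buckets and then maps buckets onto members): with 7 members left, one member gets assigned 2 of the 8 buckets and therefore attracts about 25% of the offered traffic. Once that member saturates at its own line rate, the LAG as a whole tops out at 1 / 0.25 = 4 members' worth of throughput, even though 7 links are still up.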

Nick

Even newer gear does that. TurboIron 24X for example. Some Force10
switch model(s) as well, no clue how old though.

LAGs have one big advantage over ECMP: with gear implementing a
"minimum-links" feature, you can make sure the whole LAG is taken down
and removed from the IGP topology before its bandwidth falls below a
certain capacity, so that redundant (full!) capacity elsewhere can
automatically kick in.
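
For reference, a minimal sketch of how that looks in Junos (the interface and LAG names are just placeholders; the exact options hierarchy differs a bit between platforms, and other vendors have equivalent knobs):

   set interfaces xe-0/0/0 gigether-options 802.3ad ae0
   set interfaces xe-0/0/1 gigether-options 802.3ad ae0
   set interfaces xe-0/0/2 gigether-options 802.3ad ae0
   set interfaces xe-0/0/3 gigether-options 802.3ad ae0
   set interfaces ae0 aggregated-ether-options lacp active
   set interfaces ae0 aggregated-ether-options minimum-links 3

With minimum-links 3, if the number of up members drops below 3, ae0 itself goes down, the IGP withdraws the link, and traffic reroutes onto whatever full-capacity path exists elsewhere.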

With ECMP, traffic engineering and capacity/redundancy planning
become... "interesting". And that's aside from all the operational
problems with troubleshooting (traceroute/mtr do love such ECMP hells)
and the operational consequences of having a lot of adjacencies and links.

For all those reasons, I usually prefer LAGs (with LACP) over ECMP, even
when that means "more bugs" (vendors tend not to properly test all of
their features on LAGs).

Best regards,
Daniel

I believe this has been fixed in s/w version 4.2.00 on the TurboIron, and that it can now support arbitrary numbers of LAG members. Haven't tested it, though...

Nick

Interesting, as Fou^WBrocade's statement was that this is unfixable due
to a chipset (which is Broadcom) limitation.

Best regards,
Daniel

I asked them about this exact point, but my SE said it was a software restriction which was fixed as of 4.2.

Nick