L2VPN/L2transport, Cumulus Linux & hardware suggestion

Dear folks,

has anyone already tried running VXLAN/EVPN + “Bridge Layer 2 Protocol
Tunneling” on Cumulus Linux as a replacement for classic MPLS L2VPN/VPWS
(“xconnect”, l2circuit, VLL)?

I need to provide transparent Ethernet P2P virtual leased lines to my
customers, and these have to carry stuff like LLDP, STP, LACP, etc. The
transport L2 network is not THAT big: max hops between VTEPs is 4.
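
For context, this is roughly what I have in mind on the Cumulus side (only a
sketch pieced together from the docs, not tested; interface names, VNI numbers
and the exact bridge-l2protocol-tunnel syntax are my assumptions):

# /etc/network/interfaces (ifupdown2) - customer UNI port, L2 protocols tunneled
auto swp1
iface swp1
    bridge-access 100
    bridge-l2protocol-tunnel lacp,lldp,stp   # protocol list / separator is my guess from the docs

# VXLAN tunnel interface mapped to the same VLAN
auto vni100100
iface vni100100
    vxlan-id 100100
    vxlan-local-tunnelip 10.0.0.1
    bridge-access 100
    bridge-learning off

# VLAN-aware bridge tying the UNI and the VNI together
auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports swp1 vni100100
    bridge-vids 100

# The EVPN control plane would then come from FRR (advertise-all-vni under
# address-family l2vpn evpn), which I have left out here.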

Anyone have suggestions for the below hardware request?
#) 1-3U L2/L3 box
#) 48x SFP28 / 1/10/25G
#) 6x QSFP28 / 100G
#) VXLAN/EVPN with L2 tunneling support
or
#) MPLS VPWS/l2circuit
#) Dual PSU

thanks & best regards
Jürgen

Good luck with tunnelling LACP, no matter what boxes you have - LACP has (de facto) hard jitter requirements of under 1msec, or you'll be getting TCP resets coming out your ears due to mis-ordered packets.

For your requirements, although I hesitate to recommend them for enterprise/carrier use, Mikrotik's EoIP protocol does a much better job of this than most "carrier-grade" implementations.

Otherwise, Juniper and Arista both come to mind: Juniper has the EX4650 that matches your h/w specs, and Arista has, oh, at least half a dozen boxes of various specs that comply, too. Not 100% sure the Juniper EX does 25G, now that I think of it.

Adam Thompson
Consultant, Infrastructure Services
MERLIN
100 - 135 Innovation Drive
Winnipeg, MB, R3T 6A8
(204) 977-6824 or 1-800-430-6404 (MB only)
athompson@merlin.mb.ca
www.merlin.mb.ca

Dear Adam,

yeah, forget about LACP - the bigger problem is all the LLDP and STP stuff
that gets interpreted at the UNI port. LACP is a bad example - but there are
many other frames and protocols which must work. It could be that a customer
wants to run MPLS+LDP on his VLL (for whatever reason ...).

For your requirements, although I hesitate to recommend them for
enterprise/carrier use, Mikrotik's EoIP protocol does a much better job of
this than most "carrier-grade" implementations.

Not at wire speed ... and not without causing other issues (single-threaded
load, etc.).

Juniper has the EX4650 that matches your h/w specs,... Not 100% sure the
Juniper EX does 25G, now that I think of it.

Yeah, the EX4650 does it: 48x 1/10/25G + 6x 100G + MPLS
It also supports Ethernet over MPLS (at least they say so here:
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/mpls-overview.html#id-mpls-feature-support-on-qfx-series-and-ex4600-switches)
but on some of their pages they mention that MPLS-based CCC is not supported:
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/mpls-overview.html#jd0e2531

" ... MPLS-based circuit cross-connects (CCC) are not supported—only
circuit-based pseudowires are supported. ..."
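
If it really is only LDP-signalled pseudowires, I guess the config would end
up looking like the usual Junos l2circuit setup, something along these lines
(a sketch only; addresses and interface names are placeholders, and I have not
verified any of this on the EX4650):

# core-facing interface with MPLS + LDP
set interfaces et-0/0/48 unit 0 family inet address 192.0.2.1/31
set interfaces et-0/0/48 unit 0 family mpls
set protocols mpls interface et-0/0/48.0
set protocols ldp interface et-0/0/48.0
set protocols ldp interface lo0.0

# customer-facing UNI, port-based ethernet-ccc encapsulation
set interfaces xe-0/0/1 encapsulation ethernet-ccc
set interfaces xe-0/0/1 unit 0

# LDP-signalled pseudowire towards the remote PE loopback
set protocols l2circuit neighbor 10.255.0.2 interface xe-0/0/1.0 virtual-circuit-id 100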

There is also the QFX5120-48Y - 48x 1/10/25G + 8x 100G + MPLS
In the past QFX wasn't the best idea for MPLS topics ... has this changed?

and Arista has, oh, at least half a dozen boxes of various spec that
comply, too.

Yeah, I already know them (I do have some older 7050S). They call it "VXLAN P2P
Pseudowire", but there is absolutely nothing about it in their CLI documentation :(.
Looks like the feature is only supported on the 7280 platform.

Possible options:
7280SR2-48YC6

Do you have any experience with what they call "VXLAN P2P Pseudowire"? I
can't even find a config example on the net :(

thanks & best regards
Jürgen

Hey Adam,

Good luck with tunnelling LACP, no matter what boxes you have - LACP has (de facto) hard jitter requirements of under 1msec, or you'll be getting TCP resets coming out your ears due to mis-ordered packets.

Can you elaborate on this? Where is LACP jitter defined, and for what
purpose? We push packets around the globe with sub-200us jitter on any
given day, so 1000us isn't a particularly hard goal for us.

The only reason I could imagine someone caring about jitter here is if
the protocol measured delay (LACP doesn't), relied on that delay to
remain static, and then balanced per-packet or per-byte or otherwise
between multiple links.
However, we of course put all packets from a given TCP session onto the
same LACP member interface, so from the TCP session's POV, each LACP
bundle is exactly one interface. Per-packet balancing on LACP is
possible via special configuration, but anyone who does it doesn't care
about reordering regardless of jitter, because even with very stable
jitter the paths may be of unequal length and cause reordering.

LACP hellos are sent every 1s when in fast mode, with a 3s timeout,
which also isn't particularly tight. We do have customers running LACP
over MPLS pseudowires over great distances.

Hmmh - this is odd.

We once provided a customer with an EoMPLS pw between Johannesburg and
London, which tunneled a number of L2CPs, including LACP.

Worked well, and I'd say jitter varied but never exceeded 20ms.

Mark.

Good luck with tunnelling LACP, no matter what boxes you have - LACP
has (de facto) hard jitter requirements of under 1msec, or you'll be
getting TCP resets coming out your ears due to mis-ordered packets.

Errr.... sorry, but last I heard, TCP was supposed to handle out-of-order packets and reorder them before handing them to the upper layer.
Not to mention that hashing almost systematically ensures all packets of the same TCP stream are sent on the same link in a LAG (and on most, if not all, ECMP implementations).

Mikrotik
"carrier-grade"

.....

Yes, however NewReno and the like are tuned for the practical Internet.
The practical Internet has a lot more packet loss than reordering, so the
TCP algorithm treats enough reordering (three duplicate ACKs trigger fast
retransmit) as packet loss, causing an immediate resend and destroying
your performance.

However, as you state, TCP will only ever see single-port LACP interfaces.

True, but TCP is unaware of whether the interface is a LAG or a native
port. It's just another tube.

We tested per-packet load balancing on the MX Trio line cards. The
traffic spread is perfect, but the OoO experience is atrocious.

Either settle for per-flow load balancing, move to a faster native port,
or stick with ECMP at the IP layer.

For instance, this is why we don't do LACP for backbones anymore. It is
far more reliable to have individual IP links, and let ECMP do its thing.

The only place we run LAGs in our network is 802.1Q trunks between
router and switch. But the moment those get to 4x 10Gbps, we go native
100Gbps (which has the added benefit of making per-service policing on
the router easier).

Mark.

If jitter were defined anywhere vis-à-vis LACP, it would be de jure, not de facto as I said.

Yes, if you have guaranteed that TCP sessions hash uniquely to a single link in your network, you might be able to successfully tunnel LACP (or EtherChannel, or any other L1 link-bonding technique). The last time I attempted to do this on my network, I discovered that guarantee wasn’t nearly as ironclad as I expected. I don’t remember the gory details, at this remove, sorry. Maybe it wasn’t TCP? Maybe it wasn’t the default hashing algorithm? Dunno.

-Adam

Hey Adam,

Good luck with tunnelling LACP, no matter what boxes you have - LACP has (de facto) hard jitter requirements of under 1msec, or you’ll be getting TCP resets coming out your ears due to mis-ordered packets.

Can you elaborate on this? Where is LACP jitter defined, and for what
purpose? We push packets around the globe with sub-200us jitter on any
given day, so 1000us isn't a particularly hard goal for us.

The only reason I could imagine someone caring about jitter here is if
the protocol measured delay (LACP doesn't), relied on that delay to
remain static, and then balanced per-packet or per-byte or otherwise
between multiple links.
However, we of course put all packets from a given TCP session onto the
same LACP member interface, so from the TCP session's POV, each LACP
bundle is exactly one interface. Per-packet balancing on LACP is
possible via special configuration, but anyone who does it doesn't care
about reordering regardless of jitter, because even with very stable
jitter the paths may be of unequal length and cause reordering.

LACP hellos are sent every 1s when in fast mode, with a 3s timeout,
which also isn't particularly tight. We do have customers running LACP
over MPLS pseudowires over great distances.

If jitter were defined anywhere vis-à-vis LACP, it would be _de jure_, not _de facto_ as I said.

I suspect the de-facto domain you are thinking of has a modest population.
Jitter would only matter in a case where the protocol measures delay and
artificially adds a static delay to compensate. Since this is not the case
for LACP (some balancing solutions do latency compensation), jitter is
immaterial.

Yes, if you have *guaranteed* that TCP sessions hash uniquely to a single link in your network, you might be able to successfully tunnel LACP (or EtherChannel, or any other L1 link-bonding technique). The last time I attempted to do this on my network, I discovered that guarantee wasn't nearly as ironclad as I expected. I don't remember the gory details, at this remove, sorry. Maybe it wasn't TCP? Maybe it wasn't the default hashing algorithm? Dunno.

A software device connected directly has an order of magnitude more
jitter than an operator pseudowire across the globe, so adding or not
adding a tunnel is not at all indicative of the amount of jitter, which
still is not a metric that LACP cares about.

The Internet works because hashing works. It's not perfect, but it's good
enough that in the practical Internet most links you traverse rely on the
hash working, be it ECMP or LAG.
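
To be concrete about what "hashing works" means, here is a minimal conceptual
sketch (not any vendor's actual algorithm, just the idea that a given 5-tuple
always maps to the same member link):

# Per-flow hashing as done by LAG/ECMP, conceptually. Every packet of a
# given 5-tuple hashes to the same member, so a single TCP session is
# never sprayed (and thus reordered) across members.
import hashlib

def member_for_flow(src_ip, dst_ip, proto, src_port, dst_port, n_members):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members

# All packets of this flow land on the same member of a 4-link LAG:
print(member_for_flow("192.0.2.1", "198.51.100.7", 6, 51515, 443, 4))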

I do run the 7280SR2-48YC6, but I don’t do VPLS or pseudowires on them right now so I can’t help directly with that.

Based on my experience with Arista so far, it’ll be perfectly well documented, just for a different platform, and in a blog post instead of in the user manual. :(

(Note to anyone from Arista lurking on the list: your User Manual sucks rocks because it’s wildly incomplete. Please put some of the effort that goes into those EOS Central blog posts, into the manual instead.)

As to the Juniper, I’m a client on a Juniper-based VPLS system, and the only thing it consistently intercepts is LLDP… which I’m actually OK with, mostly. Other BPDUs and Ethernet protocols get passed through (the ones we’ve tested, so far). We have heard of some feature limitations on the EX4650; no CCC is unfortunate. I don’t have any experience with the QFX series as an operator or customer, so I can’t comment.

Adam Thompson
Consultant, Infrastructure Services
MERLIN
100 - 135 Innovation Drive
Winnipeg, MB, R3T 6A8
(204) 977-6824 or 1-800-430-6404 (MB only)
athompson@merlin.mb.ca
www.merlin.mb.ca

(re-adding Adam's text that didn't get quoted, but matters)

You get the RESETs from people that do anycast, when your broken ECMP hashing splits the packets between multiple upstream providers. This might cause parts of your TCP stream to end up at entirely different destinations. Probably not going to happen with LACP, but these things are related and often use the same knobs in the configuration.

Regards,

Baldur

The EX4650 does indeed do 25G.

Chris