Forwarding issues related to MACs starting with a 4 or a 6 (Was: [c-nsp] Wierd MPLS/VPLS issue)

Job_Snijders3 · December 2, 2016, 2:32pm

Hi all,

Ever since the IEEE started allocating OUIs (MAC address ranges) in a
randomly distributed fashion rather than sequentially, the operator
community has suffered enormously.

Time after time issues pop up related to MAC addresses that start with a
4 or a 6. I believe IEEE changed their strategy to attempt to
purposefully raise the chance of collisions with MAC squatters, to
encourage people to register and pay the fee.

The forwarded email at the bottom is yet another example of a widely
deployed, but fundamentally broken ASIC. The switch can't forward VPLS
frames which contain a payload where the inner packet is destined to a
MAC starting with a 4 or a 6. This is with the switch operating in pure
layer-2 mode; it doesn't know what MPLS or VPLS even are. The switch is
dropping packets on the floor, based on their _payload_. Try selling
such circuits to customers "discounted layer-2 service, some flows might
not be forwarded".

Had IEEE continued the sequential OUI allocations, it probably would've
taken many years before we ever reached MACs starting with a 4 or a 6,
but instead, in 2012 the first linecards started rolling out of
factories with MACs burned in which start with a 4 or a 6, and this took
some vendors by surprise.

There have been quite a few issues, both in hardware and software:

Brocade brought a 24x10GE linecard to the market in 2013/2014, with
limited FIB scale, meant for a BGP-free MPLS core, but the card can't
keep flows together on LACP bundles if the inner packets in a pseudowire
were destined for a 4 or 6 MAC. The result: out of order delivery,
hurting performance.
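
A minimal sketch of how such reordering can arise, assuming a hypothetical
load balancer that guesses the payload type from the first nibble behind the
label stack and then hashes the bytes where IPv4 addresses would sit. The
function names, offsets and CRC hash are illustrative assumptions, not
Brocade's implementation.

```python
import struct
import zlib

def pick_member(pw_payload: bytes, n_links: int) -> int:
    """Hypothetical LAG hash that guesses the payload type from its first nibble."""
    first_nibble = pw_payload[0] >> 4
    if first_nibble == 4:
        # Misread as IPv4: hash the bytes at the IPv4 src/dst address offsets.
        key = pw_payload[12:20]
    else:
        # Otherwise hash the first 12 bytes (dst MAC + src MAC in this sketch).
        key = pw_payload[0:12]
    return zlib.crc32(key) % n_links

def eth_frame(dst_mac: bytes, src_mac: bytes, ip_id: int) -> bytes:
    """Ethernet frame carrying a stub IPv4 header whose ID changes per packet."""
    ipv4_start = struct.pack("!BBHH", 0x45, 0, 84, ip_id)
    return dst_mac + src_mac + b"\x08\x00" + ipv4_start + bytes(14)

src = bytes.fromhex("001122334455")        # made-up addresses
safe_dst = bytes.fromhex("00aabbccddee")
risky_dst = bytes.fromhex("40aabbccddee")

# One flow per destination, 64 packets each, spread over a 4-link bundle.
safe_links = {pick_member(eth_frame(safe_dst, src, i), 4) for i in range(64)}
risky_links = {pick_member(eth_frame(risky_dst, src, i), 4) for i in range(64)}
print("links used:", len(safe_links), "vs", len(risky_links))
```

In this sketch the bytes misread as IPv4 addresses are really the EtherType
plus the start of the inner IP header, including the per-packet ID field, so
a single flow towards the 4-prefixed MAC is sprayed across several links.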

Cisco ASR 9k's had a bug where, if a payload started with a 6, the router
assumed it was an IPv6 packet, compared the calculated packet length with
the packet length in the packet, and obviously failed because an Ethernet
frame is not an IPv6 packet. The result: packets dropped on the floor.
(Fixed in 4.3(0.32)I)
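
A rough sketch of that failure mode, assuming a forwarding path that applies
an IPv6 length sanity check to anything whose first nibble is 6. The
functions are hypothetical and only model the behaviour described above, not
the actual ASR 9k code.

```python
import struct

IPV6_HEADER_LEN = 40

def looks_like_valid_ipv6(pw_payload: bytes) -> bool:
    """IPv6 sanity check: version nibble is 6 and the length field adds up."""
    if len(pw_payload) < IPV6_HEADER_LEN or pw_payload[0] >> 4 != 6:
        return False
    (payload_len,) = struct.unpack("!H", pw_payload[4:6])
    return IPV6_HEADER_LEN + payload_len == len(pw_payload)

def forward(pw_payload: bytes) -> str:
    """Hypothetical buggy decision: a leading 6 must be well-formed IPv6."""
    if pw_payload[0] >> 4 == 6 and not looks_like_valid_ipv6(pw_payload):
        return "drop"
    return "forward"

def eth_frame(dst_mac_hex: str) -> bytes:
    """64-byte Ethernet frame with a made-up source MAC and zeroed payload."""
    return (bytes.fromhex(dst_mac_hex) + bytes.fromhex("001122334455")
            + b"\x08\x00" + bytes(50))

risky = eth_frame("60aabbccddee")  # destination MAC starts with a 6
safe = eth_frame("00aabbccddee")
print(forward(risky), forward(safe))
```

Here bytes 4 and 5 of the destination MAC get read as the IPv6 payload
length, which will almost never match the real frame size.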

The Nexus 9000 issue described at the top of this mail. Brocade IronWare
had an issue related to packet reordering for flows inside pseudowires,
fixed in 2013/2014. There are probably many more examples out there in
the wild, slowly driving operators insane.

At this moment, some issues related to MACs starting with a 4 or a 6 can
be mitigated if you enable Pseudowire Control-Word (RFC 4385) _AND_
Flow-Aware Transport (RFC 6391). You need both to mitigate certain issues
in multi-vendor networks (for instance if you have Cisco edge + Juniper
core). But what to do when the ASIC won't forward the payload? As an ISP
you often don't control the payload.
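
To illustrate why the control word helps against first-nibble guessing: the
RFC 4385 generic control word starts with four zero bits, so a device peeking
behind the label stack sees neither a 4 nor a 6. The sketch below is an
assumption-laden toy (the `guessed_type` heuristic is hypothetical, and the
flags, FRG and length fields are left at zero); it does not model Flow-Aware
Transport.

```python
import struct

def control_word(sequence: int = 0) -> bytes:
    """RFC 4385 generic control word with flags, FRG and length left at zero."""
    return struct.pack("!HH", 0x0000, sequence & 0xFFFF)

def guessed_type(pw_payload: bytes) -> str:
    """Hypothetical heuristic: classify the payload by its first nibble."""
    nibble = pw_payload[0] >> 4
    return {4: "ipv4", 6: "ipv6"}.get(nibble, "not-ip")

# Made-up Ethernet frame whose destination MAC starts with a 4.
frame = (bytes.fromhex("40aabbccddee") + bytes.fromhex("001122334455")
         + b"\x08\x00" + bytes(46))

without_cw = guessed_type(frame)
with_cw = guessed_type(control_word() + frame)
print(without_cw, with_cw)
```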

Unfortunately, I don't think we've seen the end of this. The linecards
bought in 2012 will trickle down to the grey/second-hand market about
now, often without accompanying support contracts. In a world with
increased complexity in our interconnectedness, and lack of visibility
into the underlying infrastructure (think remote peering, cloud
connectivity, resellers reselling layer-2) it will hurt when some
flows inexplicably fail to arrive.

Dear IEEE, please pause assigning MAC addresses that start with a 4 or a
6 for the next 6 years. Or at least, next time you change the policy,
consult the operational community. This 4/6 MAC issue was well
documented in BCP128 back in 2007. The control-word drafts mentioned
that there would be dragons related to 4 and 6 back in 2004.

Dear Vendors, take this issue more seriously. Realise that for operators
these issues are _extremely_ hard to debug, this is an expensive time
sink. Some of these issues are only visible under very specific, rare
circumstances, much like chasing phantoms. So take every vague report of
"mysterious" packetloss, or packet reordering at face value and
immediately dispatch smart people to delve into whether your software or
hardware makes wrong assumptions based on encountering a 4 or a 6
somewhere in the frame.

And you, my fellow operators, please continue to publicly document these
issues and possible workarounds.

Kind regards,

Job

resources:

c-nsp thread "Wierd MPLS/VPLS issue": https://puck.nether.net/pipermail/cisco-nsp/2016-December/thread.html
https://www.nanog.org/meetings/nanog57/presentations/Tuesday/tues.general.SnijdersWheeler.MACaddresses.14.pdf
BCP128: https://tools.ietf.org/html/bcp128

----- Forwarded message from Simon Lockhart <simon@slimey.org> -----

Christopher_Morrow · December 2, 2016, 3:29pm

you'd think standard testing of traffic through the asic path somewhere
between 'let's design an asic!' and 'here's your board ms customer!' would
have found this sort of thing, no? or does testing only use 1 mac address
ever?

Simon_Lockhart1 · December 2, 2016, 4:02pm

Well, it's actually payload, rather than src/dst MAC used for forwarding, so
there's quite a few more combinations to look for...

2^(8*9216) is quite a lot of different packets to test through the forwarding
path... But, wait, that assumes every bit combination for 9216 byte packets,
but the packet might be shorter than that... So multiply that by (9216-64).

Anyone want to work out how many years that'd take to test, even at 100G?

Simon

Christopher_Morrow · December 2, 2016, 4:07pm

> > you'd think standard testing of traffic through the asic path somewhere
> > between 'let's design an asic!' and 'here's your board ms customer!' would
> > have found this sort of thing, no? or does testing only use 1 mac address
> > ever?
>
> Well, it's actually payload, rather than src/dst MAC used for forwarding, so
> there's quite a few more combinations to look for...
>
> 2^(8*9216) is quite a lot of different packets to test through the forwarding
> path... But, wait, that assumes every bit combination for 9216 byte packets,
> but the packet might be shorter than that... So multiply that by (9216-64).

but most/all forwarding asics (aside from perhaps extreme's?) only deal
with the first N bits in the header (128 or so..) so... not quite as many
right?
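
For a sense of scale, here is a back-of-the-envelope sketch of both
estimates. It assumes, generously, that one test pattern can be sent per
minimum-size frame slot at 100G (64 bytes plus 20 bytes of preamble and
inter-frame gap); the constants and function names are illustrative.

```python
import math

MAX_FRAME = 9216                  # bytes, jumbo frame size used in the thread
LINE_RATE = 100e9                 # bits per second
MIN_WIRE_BITS = (64 + 20) * 8     # 64-byte frame plus preamble and inter-frame gap
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def log10_patterns(bits: int) -> float:
    """log10 of the number of distinct bit patterns of the given width."""
    return bits * math.log10(2)

def log10_years(bits: int) -> float:
    """log10 of the years needed to send every pattern once at line rate."""
    patterns_per_year = (LINE_RATE / MIN_WIRE_BITS) * SECONDS_PER_YEAR
    return log10_patterns(bits) - math.log10(patterns_per_year)

full = log10_years(8 * MAX_FRAME)   # every possible 9216-byte frame
header_only = log10_years(128)      # only the first 128 bits vary
print(f"all 9216-byte frames: about 10^{full:.0f} years")
print(f"first 128 bits only:  about 10^{header_only:.0f} years")
```

Either way exhaustive testing is out of reach, which is the argument for
targeted test cases such as the known 4/6 first-nibble patterns.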

Christopher_Morrow · December 2, 2016, 4:08pm

and REALLY they could have just started ~9 yrs ago: "Hey, maybe this 4/6
thing is really a problem? how about we add 2 other things to our testing
framework?"

instead of: "High Five! First to market!"

Alia_Atlas · December 2, 2016, 4:16pm

> > > you'd think standard testing of traffic through the asic path somewhere
> > > between 'let's design an asic!' and 'here's your board ms customer!' would
> > > have found this sort of thing, no? or does testing only use 1 mac address
> > > ever?
> >
> > Well, it's actually payload, rather than src/dst MAC used for forwarding, so
> > there's quite a few more combinations to look for...
> >
> > 2^(8*9216) is quite a lot of different packets to test through the forwarding
> > path... But, wait, that assumes every bit combination for 9216 byte packets,
> > but the packet might be shorter than that... So multiply that by (9216-64).
>
> but most/all forwarding asics (aside from perhaps extreme's?) only deal
> with the first N bits in the header (128 or so..) so... not quite as many
> right?

This sounds related to the well-known (for at least 10+ years) issue of
guessing the type of IP packet by looking at the first nibble of the
encapsulated packet. Take a quick look at RFC 7325, section 2.4.5.1 bullet 6.
This is what using the pseudowire control word is meant to protect against.

I don't know if that's an option for networks using this.

Regards,
Alia

Nick_Hilliard3 · December 2, 2016, 4:59pm

Job Snijders wrote:

Dear IEEE, please pause assigning MAC addresses that start with a 4 or a
6 for the next 6 years.

Disagree that this is an IEEE problem. This is a problem that vendors
need to work around. There is limited MAC space, and deprecating 1/8 of
it due to the inability of vendors to cope properly with it seems like a
really bad long term idea.

It seems that the problem that cropped up on cisco-nsp is that a layer 2
switch, the Nexus 92160 (and possibly everything else which uses the
same forwarding ASIC), cannot forward vpls frames with a 4 or 6 buried
at a specific location inside the contents of the frame.

This is an extraordinary bug which renders the hardware useless in
specific circumstances. What makes it worse is that this is a well
known corner case which should have been shaken out during design, if
not found during QA.

Nick

Leo_Bicknell1 · December 2, 2016, 5:32pm

In a message written on Fri, Dec 02, 2016 at 03:32:13PM +0100, Job Snijders wrote:

Dear Vendors, take this issue more seriously. Realise that for operators
these issues are _extremely_ hard to debug, this is an expensive time
sink. Some of these issues are only visible under very specific, rare
circumstances, much like chasing phantoms. So take every vague report of
"mysterious" packetloss, or packet reordering at face value and
immediately dispatch smart people to delve into whether your software or
hardware makes wrong assumptions based on encountering a 4 or a 6
somewhere in the frame.

I also do not think this is an IEEE/MAC assignment problem. This
is a "vendor's box can't forward a particular payload" problem.

If I had boxes with this issue, I would be talking to my vendor
about how:

a) They were going to replace every single one of them with something
that does not have the bug.

b) What discount I would get on maintenance/support for having to swap
all of the devices.

Then I would follow it up with the other vendors I'm talking to
about all of my future purchases if they are unable to produce boxes
that work. And if the vendor who supplied these did not fix it, I
would give them no more business.

Job_Snijders3 · December 2, 2016, 7:50pm

Yes the vendors are doing a poor job. I also appreciate the argument
that IEEE just manages that number space and we should consider these
'just numbers' and the vendors need to make do. On the other hand if
IEEE had just stuck to the original allocation plan, you and I wouldn't
be dealing with this garbage situation.

IEEE told one of my friends: "We changed our allocation methods to
prevent vendors using unregistered mac addresses."

Does the cost of some squatters on poorly usable MAC space outweigh the
cost of the community spending countless hours tracking down where those
dropped packets went?

IEEE could've shown more restraint by (temporarily, until IPv4 is dead?)
avoiding 4 and 6 and still accomplished some of their goal (if this
dubious strategy even is effective).

I consider this a cascading failure. Clearly IEEE's change had a ripple
effect, and surprised a number of implementers, and ended up hurting us.

Kind regards,

Job

Nick_Hilliard3 · December 2, 2016, 9:16pm

Job Snijders wrote:

I consider this a cascading failure. Clearly IEEE's change had a ripple
effect, and surprised a number of implementers, and ended up hurting us.

this would be credible if this were a previously unknown problem, but it
isn't. It's been known for years that you need to be careful when
handling mpls encapsulated packets which encapsulate L2 frames and where
the destination mac address starts with 4 or 6. This is not a new problem
and because it's not new, there is no good reason for vendors to make
the same mistakes again and again. TBH, it beggars belief that new L2
hardware is being thrown out the door which is unable to forward frames
of this form due to hardware limitations, and that it's apparently
unfixable.

Nick

Sukumar_Subburayan_s · December 2, 2016, 9:23pm

All,

I just want to come back on behalf of Cisco on this. We just investigated this issue and the issue is not an ASIC bug, but a flag set wrong by SW.
We will reach out to the original customer through TAC who posted this in NSP to resolve this issue.

sukumar

Nick_Hilliard3 · December 2, 2016, 9:26pm

Sukumar Subburayan (sukumars) wrote:

I just want to come back on behalf of Cisco on this. We just
investigated this issue and the issue is not an ASIC bug, but a flag
set wrong by SW. We will reach out to the original customer through
TAC who posted this in NSP to resolve this issue.

oh cool - this is great. Thanks for following up and clarifying.

Nick

Simon_Lockhart1 · December 2, 2016, 9:58pm

Sukumar,

Can I just publicly say thank you for taking the time to investigate my issue,
and identify that a fix is possible.

Looking forward to having a working Nexus...

Simon

Saku_Ytti1 · December 2, 2016, 11:17pm

Some devices by default look inside pseudowires to find IP inside
them. In this case even the control word won't help; you'll also need to
disable looking inside the pseudowire.

Bandy_Rush1 · December 2, 2016, 11:31pm

I just want to come back on behalf of Cisco on this. We just
investigated this issue and the issue is not an ASIC bug, but a flag
set wrong by SW.

damn! you just took all the fun out of lynching ieee. sheesh!
</sarcasm>

randy

Job_Snijders3 · December 6, 2016, 12:28pm

Folks on NLNOG found another gem: http://mailman.nlnog.net/pipermail/nlnog/2016-December/002637.html

Liberal translation below. The big take-away for operators is that
service providers need to make it part of the MPLS pseudowire
troubleshooting procedure to ask the customer which MACs are involved
and raise the red flag when a 4 or 6 is involved.
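
As a sketch of what that troubleshooting step could look like, the helper
below flags customer MACs whose first hex digit is 4 or 6. The function names
and the example addresses are made up for illustration.

```python
def first_nibble(mac: str) -> int:
    """Return the first hex digit of a MAC written with ':', '-' or '.' separators."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(digits[0], 16)

def is_risky(mac: str) -> bool:
    """True when the MAC starts with a 4 or a 6."""
    return first_nibble(mac) in (4, 6)

# Hypothetical list of MACs reported by a customer.
customer_macs = ["00:1b:21:aa:bb:cc", "40:a6:77:01:02:03", "6487.880a.0b0c"]
flagged = [mac for mac in customer_macs if is_risky(mac)]
print(flagged)
```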

Mike_Jones1 · December 6, 2016, 12:50pm

MACs that didn't make it through the switch when running 4.12.3.1:

    4*:**:**:**:**:**
    6*:**:**:**:**:**
    *4:**:**:**:**:**
    *6:**:**:**:**:**
    **:**:*B:**:6*:**
    **:**:*F:**:4*:**

Can anyone explain the last 2 for me?

I was under the impression that this bug was mainly caused by some
optimistic attempt to detect raw IPv4 or IPv6 payloads by checking for
a version at the start of the frame. This does not explain why it
would be looking at the 5th octet.

I also would assume that there must be something else to the last 2
examples beyond just the B or F and 4 or 6 because otherwise it would
match way too many addresses to have not been noticed before. Perhaps
the full MAC address looks like some other protocol with a 4 byte
header?

Thanks,
Mike
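
To check a given address against the six patterns listed above, a small
matcher along these lines may help. It assumes each `*` stands for exactly
one hex digit, which is my reading of the list rather than something the
original report states.

```python
import re

FAILING_PATTERNS = [
    "4*:**:**:**:**:**",
    "6*:**:**:**:**:**",
    "*4:**:**:**:**:**",
    "*6:**:**:**:**:**",
    "**:**:*B:**:6*:**",
    "**:**:*F:**:4*:**",
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a wildcard pattern into a regex, treating '*' as one hex digit."""
    body = "".join(
        "[0-9a-f]" if ch == "*" else re.escape(ch.lower()) for ch in pattern
    )
    return re.compile(body)

COMPILED = [(p, pattern_to_regex(p)) for p in FAILING_PATTERNS]

def matching_patterns(mac: str) -> list[str]:
    """Return every listed pattern that the colon-separated MAC matches."""
    mac = mac.lower()
    return [p for p, rx in COMPILED if rx.fullmatch(mac)]

print(matching_patterns("00:00:0b:00:60:00"))
```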

Saku_Ytti1 · December 6, 2016, 1:50pm

Expect also per-vendor behaviour on ethertype values, result from one
vendor: http://ytti.fi/ether_type.png

Granted, these are technically not ethertypes at all but 802.3 frame
lengths; still, some other vendors don't care and pass each of these
transparently. Here we can observe blackholing and policing depending
on the 802.3 frame length value.
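
For reference, the distinction between a length and an EtherType in that
16-bit field follows the usual IEEE 802.3 convention: values up to 1500 are
lengths, values from 0x0600 (1536) upwards are EtherTypes, and the range in
between is undefined. A minimal sketch, with a function name of my own
choosing:

```python
def classify_type_length(value: int) -> str:
    """Classify the 16-bit Length/Type field of an Ethernet frame."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("the Length/Type field is 16 bits wide")
    if value <= 1500:
        return "802.3 length"
    if value >= 0x0600:
        return "EtherType"
    return "undefined"

for value in (46, 1500, 1510, 0x0800, 0x86DD, 0x8847):
    print(hex(value), classify_type_length(value))
```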

The same vendor here experiences packet loss on pseudowires if the
ethertype says the payload is ipv4, ipv6, mpls or vlan and the packet
/does not/ contain said payload, potentially because the NPU time-cost
increases too much.

Vendor never really explained either behaviour.

Another behavioural difference is that some vendors don't accept bad
source addresses, like an MCAST source address, while other vendors do.
Pseudowire behaviour is highly dependent on hardware and software
release in corner cases. It's easy to argue that bad MACs should be
dropped, but it's also easy to argue that perhaps you're testing
things, and you expect a transparent pipe so that you can test whether
your SUT accepts bad MACs or not.
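
A multicast (group) MAC is identified by the least significant bit of the
first octet, so a test frame with a "bad" source address is easy to
recognise. A small sketch, with a hypothetical helper name:

```python
def is_group_address(mac: str) -> bool:
    """True when the I/G bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)

# A source MAC with the group bit set is what some vendors refuse to forward.
print(is_group_address("01:00:5e:00:00:01"), is_group_address("00:1b:21:aa:bb:cc"))
```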

Leo_Bicknell1 · December 6, 2016, 1:58pm

In a message written on Fri, Dec 02, 2016 at 08:50:40PM +0100, Job Snijders wrote:

IEEE told one of my friends: "We changed our allocation methods to
prevent vendors using unregistered mac addresses."

Does the cost of some squatters on poorly usable MAC space outweigh the
cost of the community spending countless hours tracking down where those
dropped packets went?

That's the wrong question to ask.

The right question is, what could have been done to prevent this entire
situation?

This problem has occurred in all sorts of number spaces before.
There have been squatters in almost every number space, boxes
"optimized" based on the pattern of allocation, code bugs that went
unnoticed due to part of the number space not being used. It's
happened to MACs, IPs, ports, even protocol numbers.

One of the answers is to better allocate numbers. Starting at the
bottom and working up is almost never the optimal solution. Various
sparse allocation strategies exist which ensure a wider range of
addresses are used early, there is a greater chance of whacking a
squatter early, and that the number space ends up more efficiently
used in many cases.

Had the IEEE allocated MACs starting with 0, then 2, then 4, then 6,
then 8, then 10, then 12, then 14, this problem would have likely been
identified early on in vendor labs when testing the pseudowire code
and would have prevented the "hack" of looking deeper in the packet
and guessing because too many 4 and 6 MACs were already deployed.
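
As a toy illustration of the difference, the sketch below compares a plain
sequential counter with one well-known sparse strategy, a bit-reversed
counter, over a 24-bit OUI space. It ignores the I/G and U/L bits and is not
a description of how the IEEE actually allocates; all names are illustrative.

```python
OUI_BITS = 24  # width of an OUI; the I/G and U/L bits are ignored in this toy

def bit_reversed(index: int, width: int = OUI_BITS) -> int:
    """Reverse the lowest `width` bits of index."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def sequential_first_use(nibble: int) -> int:
    """Allocations made before a sequential counter reaches this first nibble."""
    return nibble << (OUI_BITS - 4)

def sparse_first_use(nibble: int) -> int:
    """Allocations made before a bit-reversed counter hits this first nibble."""
    k = 0
    while bit_reversed(k) >> (OUI_BITS - 4) != nibble:
        k += 1
    return k

for nibble in (4, 6):
    print(hex(nibble), sequential_first_use(nibble), sparse_first_use(nibble))
```

In this model a sequential allocator hands out millions of OUIs before the
first one starting with 4 appears, while the sparse allocator reaches 4 and 6
within its first handful of assignments.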

Alexandru_Suciu1 · December 7, 2016, 12:19pm

The root cause for that issue is most likely due to the following bug:

BUG65077 : On the DCS-7150 series, the MPLS label of a frame may be
incorrectly overwritten by a DSCP field update in the ASIC. Fixed in
4.11.7 , 4.12.6 , 4.13.0 .

It was not related to the MAC values but rather to the incorrect parsing of
the MPLS header.