Cisco 7600 PFC3B(XL) and IPv6 packets with fragmentation header

Just thought I'd share some operational info.

PFC3B will by default punt IPv6 packets with fragmentation header to RP and route them there, with the obvious performance penalty this incurs.

Workaround is to change this behaviour, meaning ACLs won't work for packets with fragmentation header anymore:

   #platform ipv6 acl fragment hardware ?
     drop Drop IPv6 fragments at hardware
     forward Forward IPv6 fragments at hardware

PFC3C is supposed to not be affected.

A lot of Teredo and 6to4 traffic has fragmentation headers, so this actually is a real problem. We discovered this at our Teredo relay upstream router.

Just thought I'd share some operational info.

PFC3B will by default punt IPv6 packets with fragmentation header to RP and
route them there, with the obvious performance penalty this incurs.

when will vendors learn that punting to the RE/RP/smarts for packets
in the fastpath is ... not just 'unwise' but wholesale stupid? :(

Workaround is to change this behaviour, meaning ACLs won't work for packets
with fragmentation header anymore:

#platform ipv6 acl fragment hardware ?
drop Drop IPv6 fragments at hardware
forward Forward IPv6 fragments at hardware

your recommendation is to ... forward? (or perhaps not 'recommendation' but:
"Forward means do not pass go, just ship out the proper egress interface.
drop means ... send to hell")

If you do nothing the default behavior is to send the packet to the
RP... why? (why would you want this packet sent to the RP? it's got a
valid destination, no? so deliver it out the egress interface?)

thanks!
-chris

I was told it's because PFC3B can't look far enough into the packet to determine what the payload is (TCP/UDP etc.) and which port it uses; only the RP can do ACL handling of such a packet.

So if you configure "forward", people can put a fragmentation header on the packet and skip past your ACL.
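
To illustrate the limitation described above, here is a minimal Python sketch of walking the IPv6 extension header chain with a bounded lookup window. It is not how the PFC3B is implemented; the function name `find_l4`, the helper `ipv6_header`, and the `window` parameter are assumptions made up for this example.

```python
# Extension headers this sketch knows how to skip (IANA protocol numbers).
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
TCP, UDP = 6, 17
IPV6_HEADER_LEN = 40


def ipv6_header(next_header: int) -> bytes:
    """Hypothetical helper: a 40-byte IPv6 header with zeroed addresses."""
    return bytes([0x60, 0, 0, 0, 0, 0, next_header, 64]) + bytes(32)


def find_l4(packet: bytes, window: int):
    """Return (protocol, offset) of the upper-layer header, or None if it
    cannot be located within the first `window` bytes of the packet."""
    visible = min(window, len(packet))
    if visible < IPV6_HEADER_LEN:
        return None
    next_header = packet[6]
    offset = IPV6_HEADER_LEN
    while next_header in (HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS):
        if offset + 8 > visible:
            return None  # extension header starts beyond the window
        if next_header == FRAGMENT:
            # Fragment offset is the top 13 bits of bytes 2-3.
            frag_offset = int.from_bytes(packet[offset + 2:offset + 4], "big") >> 3
            if frag_offset != 0:
                return None  # non-first fragment carries no L4 header
            length = 8
        else:
            length = (packet[offset + 1] + 1) * 8
        next_header = packet[offset]
        offset += length
    if offset + 4 > visible:
        return None  # ports (first 4 bytes of TCP/UDP) are out of view
    return next_header, offset


# 32 bytes of destination options push the TCP header to offset 72.
pushed = ipv6_header(DEST_OPTS) + bytes([TCP, 3]) + bytes(30) + bytes(20)
print(find_l4(pushed, window=64))   # L4 header is past a 64-byte window
print(find_l4(pushed, window=256))  # visible with a wider window
```

When `find_l4` returns `None`, an L4 ACL has nothing to match on in this sketch, so the only choices are to punt, drop, or forward blindly, which is the trade-off behind the `drop`/`forward` knob.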

What to do with IP options or IPv6 hop-by-hop options? What to do with IPv6
packets which contain options which push TCP/UDP past your lookup view?

Punting transit is not only not stupid but also necessary in hardware routers
which cannot handle every case in hardware (which is all routers).
There should just be an adequate way to limit these, and there should be a
default limit.

when will vendors learn that punting to the RE/RP/smarts for packets
in the fastpath is ... not just 'unwise' but wholesale stupid? :(

What to do with IP options or IPv6 hop-by-hop options? What to do with IPv6
packets which contain options which push TCP/UDP past your lookup view?

a switch to be used that stops processing this sort of thing, in an
internet core (and honestly most enterprise core) routers, all I want
is packet-in/packet-out. there's no need for anything else, stop
trying to send line-rate packets to the cpu.

Punting transit is not only not stupid but also necessary in hardware routers
which cannot handle every case in hardware (which is all routers).

no. all you need is a default 'do not process these, just fwd them'
switch. (or, a switch at any rate that the operator can select one way
or the other, they SHOULD know what is the best for their deployment).

There should just be an adequate way to limit these, and there should be a
default limit.

I really think zero limit is the right limit... (for a large number of
deployments)

a switch to be used that stops processing this sort of thing, in an
internet core (and honestly most enterprise core) routers, all I want
is packet-in/packet-out. there's no need for anything else, stop
trying to send line-rate packets to the cpu.

This would break e.g. RSVP. For some instances dropping all of them in hardware
is an option, for other instances ignoring and forwarding without understanding
is ok, but in some situations you simply must punt.

no. all you need is a default 'do not process these, just fwd them'
switch. (or, a switch at any rate that the operator can select one way
or the other, they SHOULD know what is the best for their deployment).

It would also break L4 ACLs in certain situations, as well as RSVP as already
explained. And probably other things I'm not aware of. I'm unsure blind forwarding
is the best option. I'm all for giving operators options, but calling it stupid
that vendors punt something is misguided.

I really think zero limit is the right limit... (for a large number of
deployments)

Traceroute would also break. Unpoliced punting certainly is extremely unwise,
but punting to a level that does not introduce significant CPU load, should be
safest default.
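
A policed punt path of the kind argued for here is usually some form of token bucket. The following is a minimal Python sketch under that assumption; the class name `PuntPolicer` and its parameters are invented for illustration and do not correspond to any vendor's implementation.

```python
class PuntPolicer:
    """Token bucket: allow at most `rate_pps` punts per second on average,
    with bursts of up to `burst` packets."""

    def __init__(self, rate_pps: float, burst: int):
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous decision

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` (in seconds)
        may be punted to the CPU, False if it should be dropped."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


policer = PuntPolicer(rate_pps=100, burst=10)
# 15 packets arriving at the same instant: only the burst is punted.
punted = sum(policer.allow(now=0.0) for _ in range(15))
print(punted)
```

The open question raised in the thread remains: the sketch says nothing about whether such a bucket should exist per port, per linecard, or per chassis.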

explained. And probably other things I'm not aware of. I'm unsure blind forwarding
is the best option. I'm all for giving operators options, but calling it stupid
that vendors punt something is misguided.

after this long, yes... this is just dumb, there's no reason that the
default should be punt. There are cases (you've brought up a few)
where it's required today because of design limitations, there really
shouldn't be cases like this anymore. this isn't our first rodeo,
'lessons learned' and all that...

I really think zero limit is the right limit... (for a large number of
deployments)

Traceroute would also break. Unpoliced punting certainly is extremely unwise,

traceroute could certainly be handled in the fastpath.

but punting to a level that does not introduce significant CPU load, should be
safest default.

what is that limit? from a single port? from a single linecard? from a
chassis? how about we remove complexity here and just deal with this
in the fastpath?

My point in calling this all 'stupid' is that by now we all have been
burned by this sort of behavior, vendors have heard from all of us
that 'this is really not a good answer', enough is enough please stop
doing this.

-chris

after this long, yes... this is just dumb, there's no reason that the
default should be punt. There are cases (you've brought up a few)
where it's required today because of design limitations, there really
shouldn't be cases like this anymore. this isn't our first rodeo,
'lessons learned' and all that...

Certainly possible, but will you pay the premium? I won't. To implement IPv6
according to the standard, your lookup engine needs an MTU-wide view, so up to
65kB. The most common view today is probably 64B, and the highest I know of is 256B.
And for the corner cases where this isn't enough, I'm happy to handle it in
software rather than pay a premium to do it all in hardware.

traceroute could certainly be handled in the fastpath.

Yup. But again, who would pay for this? I cannot be DoSed by TTL exceeds, as
there is a sufficient protection mechanism in my hardware. So I would not pay
a premium for this feature.

what is that limit? from a single port? from a single linecard? from a
chassis? how about we remove complexity here and just deal with this
in the fastpath?

It would increase cost and complexity greatly. If I could get it for free, then
I would take it, but I have a lot more important things I want router vendors to
fix first. What I do wish vendors would do is test boxes with attack vectors and
implement sane defaults (IOS-XR is relatively good in this respect, or maybe it
just looks that way as the rest of them are really bad with their defaults).

Very recently I had a chat with a GSR owner who was happy about how GSR/IOS is a
solid DDoS-resistant platform, while actually it is impossible to protect GSR/IOS
(outside iACL), as none of the protections (rACL/CoPP) are implemented in hardware.
The 7600 is reasonably good for its age in this matter.
But even modern examples, like the MX80, completely fail with defaults. I killed an
MX80 in the lab with a bit over 5Mbps of IP options. Protection is quite easy, but
still most people do not do it, so vendors really should ship boxes with saner
defaults.

traceroute could certainly be handled in the fastpath.

which traceroute? icmp? udp? tcp? Traceroute is not a single protocol.

what is that limit? from a single port? from a single linecard? from a
chassis? how about we remove complexity here and just deal with this
in the fastpath?

on a pfc3, the mls rate limiters deal with handling all punts from the
chassis to the RP. It's difficult to handle this in any other way.

My point in calling this all 'stupid' is that by now we all have been
burned by this sort of behavior, vendors have heard from all of us
that 'this is really not a good answer', enough is enough please stop
doing this.

"This is a Hard Problem". There is a balance to be drawn between hardware
complexity, cost and lifecycle. In the case of the PFC3, we're talking
about hardware which was released in 2000 - 11 years ago. The ipv6
fragment punting problem was fixed in the pfc3c, which was released in
2003. I'm aware that cisco is still selling the pfc3b, but they really
only push the rsp720 for internet stuff (if they're pushing the 6500/7600
line at all).

Nick

They are pushing sup2T - however more for enterprise ip layer (6500 series).
   Regards,
     Janos Mohacsi

Path MTU discovery would also break... oh wait, that's usually broken anyway.

-Vinny

they are now, yes. But until the sup2t started becoming available a couple
of weeks ago the only option for the 6500 was a sup720. You're right that
this was only pushed on the enterprise market.

Of course, if you wanted a 10g capable service provider router and didn't
want an asr9k, they were pushing the 7600 because the 6500 is a switch and
the 7600 is a router and the two are totally different, no really you've
gotta believe it. But at least the rsp720 could handle ipv6 fragments better.

Nick

traceroute is really an example of 'packet expired, send
unreachable'... that, today is basically:
  o grab 64bytes of header (or something similar)
  o shove that in a payload
  o use the src as the dst
  o stick my src on
  o set icmp
  o crc and fire

there's not really any need to do this in the slow path, is there?
-chris
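
The steps listed above can be sketched in a few lines of Python for the IPv4 case. Strictly speaking, the message a router sends when TTL expires is ICMP Time Exceeded (type 11, code 0) rather than an unreachable. The function names `checksum` and `time_exceeded` and the `quote_len` parameter are assumptions for this example, and real forwarding hardware would of course not be written this way.

```python
import struct


def checksum(data: bytes) -> int:
    """Standard Internet checksum (one's complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def time_exceeded(expired: bytes, router_addr: bytes, quote_len: int = 64) -> bytes:
    """Build an IPv4 ICMP Time Exceeded reply for an expired IPv4 packet."""
    quoted = expired[:quote_len]   # grab the leading bytes of the packet
    original_src = expired[12:16]  # the expired packet's src becomes our dst

    # ICMP header: type 11, code 0, checksum, 4 unused bytes, then the quote.
    icmp = struct.pack("!BBHI", 11, 0, 0, 0) + quoted
    icmp = struct.pack("!BBHI", 11, 0, checksum(icmp), 0) + quoted

    # IPv4 header: version/IHL, TOS, total length, ID, flags/fragment offset,
    # TTL 64, protocol 1 (ICMP), checksum placeholder, src, dst.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(icmp), 0, 0,
                     64, 1, 0, router_addr, original_src)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    return ip + icmp


# Example: a 28-byte UDP packet from 192.0.2.1 that expired at 203.0.113.1.
expired = bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 1, 17, 0, 0,
                 192, 0, 2, 1, 198, 51, 100, 7]) + bytes(8)
reply = time_exceeded(expired, bytes([203, 0, 113, 1]))
```

Nothing in this sequence needs routing state beyond the router's own address, which is the point being made about keeping it out of the slow path.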

if I turn my head to the side I can almost believe you.

there are unconfirmed rumours that icmp ping and traceroute are handled by
hardware on the asr1k. I don't know if they are true. But you're right -
it would be good to support this without resorting to hammering the routing
engine. I don't really like the idea of punters running traceroutes
reducing my bgp convergence time.

Nick

traceroute is really an example of 'packet expired, send
unreachable'... that, today is basically:
o grab 64bytes of header (or something similar)
o shove that in a payload
o use the src as the dst
o stick my src on
o set icmp
o crc and fire

there's not really any need to do this in the slow path, is there?

there are unconfirmed rumours that icmp ping and traceroute are handled by
hardware on the asr1k. I don't know if they are true. But you're right -

some platforms do some/all of this in hardware, yes. (I forget the matrix)

it would be good to support this without resorting to hammering the routing
engine. I don't really like the idea of punters running traceroutes
reducing my bgp convergence time.

this is exactly why punting anything NOT management and/or
routing-protocols should be banned. Thanks for making that point
explicitly.

-chris

Yes, but keep in mind that this particular issue has to do with an ASIC which is several years old and which contains other significant handicaps as well (viz. NetFlow caveats, no per-interface uRPF mode, etc.).

So, complaining about most anything on this particular ASIC isn't going to accomplish much, unfortunately. The key is to a) evaluate newer ASICs on more operationally useful platforms in order to see how they handle this sort of thing (EARL8 should be fine, AFAICT) and b) put the appropriate requirements into RFPs so that vendors have a monetary value associated with doing the right thing.

And this is the requirement which should be placed in RFPs, along with other specific requirements for ACL handling, flow telemetry functionality, uRPF, et al.

If folks want to influence vendors to do the Right Thing, they have to expend the time and effort to quantify and qualify said Right Thing(s), and then put it into RFP requirements. Otherwise, complaining post-procurement isn't generally going to accomplish much.

yes, my bitchfest was also a 'could we all start asking for this, now?' ... :)