ICMPv6 "too-big" packets ignored (filtered ?) by Cloudflare farms

hello,

    I confess to using IPv6 behind a 6in4 tunnel, because the "Business-Class" service
    of the operator concerned doesn't handle IPv6 yet.

    As such, I realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages
    (RFC 4443) seem to be ignored or filtered at ~60% of Cloudflare's HTTP farms.

    As a result, random sites such as http://nanog.org/ or https://www.ansible.com/
    are barely reachable whenever small MTUs are involved...

    support@cloudflare answered that because I'm not the owner of the site concerned,
    and for security reasons, they wouldn't investigate further.

    Are there security concerns with ICMP "too-big"?

    regards,

Hey Jean,

    I confess to using IPv6 behind a 6in4 tunnel, because the "Business-Class" service
    of the operator concerned doesn't handle IPv6 yet.

    As such, I realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages
    (RFC 4443) seem to be ignored or filtered at ~60% of Cloudflare's HTTP farms.

Might be related to this:
https://blog.cloudflare.com/path-mtu-discovery-in-practice/

If you run ECMP, then the hash algorithms make no guarantee that ICMP
messages generated by transit devices reach the correct host.
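
To make the failure mode concrete, here is a minimal sketch, with made-up
addresses and a toy hash rather than any vendor's real algorithm: a 5-tuple
ECMP hash puts a router-originated PTB in a different bucket than the TCP
flow it refers to, because the PTB arrives with the router's source address
and no TCP ports.

# Toy illustration: a 5-tuple ECMP hash maps a TCP flow and the
# router-generated PTB for that flow to different backends, because
# the PTB's outer header carries the router's source address and no
# TCP ports.  Addresses and the hash are invented for the example.
import hashlib

def ecmp_pick(*fields, buckets=8):
    # Stand-in for a real ECMP hash: digest the header fields.
    digest = hashlib.sha256("|".join(map(str, fields)).encode()).digest()
    return digest[0] % buckets

# Client's TCP segment towards the (hypothetical) server address:
tcp_flow = ("2001:db8:c::1", "2001:db8:a::80", 6, 54321, 443)

# PTB a transit router sends back towards the same server address:
ptb = ("2001:db8:b::2", "2001:db8:a::80", 58)   # ICMPv6, no ports

print(ecmp_pick(*tcp_flow))  # backend serving the TCP session
print(ecmp_pick(*ptb))       # most likely a different backend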

Hey Jean,

    I confess to using IPv6 behind a 6in4 tunnel, because the "Business-Class" service
    of the operator concerned doesn't handle IPv6 yet.

    As such, I realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages
    (RFC 4443) seem to be ignored or filtered at ~60% of Cloudflare's HTTP farms.

Might be related to this:
https://blog.cloudflare.com/path-mtu-discovery-in-practice/

If you run ECMP, then the hash algorithms make no guarantee that ICMP
messages generated by transit devices reach the correct host.

Then Cloudflare should negotiate MSSes that don't generate PTBs if
they have installed broken ECMP devices. The simplest way to do that
is to set the interface MTUs to 1280 on all the servers. Why should
the rest of the world have to put up with their inability to purchase
devices that work with RFC-compliant data streams?

Mark
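
For reference, the arithmetic behind the suggestion above, as a quick
sketch (the constants are just the fixed IPv6 and base TCP header sizes):

# An MSS derived from the IPv6 minimum MTU can never provoke a PTB,
# because every IPv6 link is required to carry 1280-byte packets.
IPV6_MIN_MTU = 1280
IPV6_HEADER = 40      # fixed IPv6 header
TCP_HEADER = 20       # base TCP header, no options
print(IPV6_MIN_MTU - IPV6_HEADER - TCP_HEADER)   # 1220-byte MSS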

I've had this issue with cdnjs.cloudflare.com for the longest time at my
house. But as some of you may recall, my little unwanted TCP MSS hack
for IPv6 last weekend fixed that issue for me.

Not ideal, and I so wish IPv6 would work as designed, but...

Mark.

Then Cloudflare should negotiate MSSes that don't generate PTBs if
they have installed broken ECMP devices. The simplest way to do that
is to set the interface MTUs to 1280 on all the servers. Why should
the rest of the world have to put up with their inability to purchase
devices that work with RFC-compliant data streams?

I've had this issue with cdnjs.cloudflare.com for the longest time at my
house. But as some of you may recall, my little unwanted TCP MSS hack
for IPv6 last weekend fixed that issue for me.

Not ideal, and I so wish IPv6 would work as designed, but…

It does work as designed except when crap middleware is added. ECMP
should be using the flow label with IPv6. It has the advantage that
it works for non-0-offset fragments as well as 0-offset fragments and
also works for transports other than TCP and UDP. This isn’t a protocol
failure. It is shitty implementations.
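
Sketching that argument with toy values (invented addresses and a stand-in
hash, not a real router's): a hash over source, destination, and flow label
can be computed for every packet of a flow, including non-0-offset fragments
and non-TCP transports, so each one lands in the same bucket.

# A (src, dst, flow label) hash uses only fields from the fixed IPv6
# header, which every fragment carries -- unlike ports, which exist
# only in the 0-offset fragment.  Toy values throughout.
import hashlib

def flow_hash(src, dst, flow_label, buckets=8):
    key = f"{src}|{dst}|{flow_label:05x}".encode()
    return hashlib.sha256(key).digest()[0] % buckets

src, dst, label = "2001:db8::1", "2001:db8::2", 0x8a3f1

# 0-offset fragment, later fragments, or an SCTP packet of the same
# flow: the hash inputs are always present, so the bucket is stable.
print(flow_hash(src, dst, label))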

That's what I mean... we find ways to break protocols ourselves.

Mark.

Then Cloudflare should negotiate MSSes that don't generate PTBs if
they have installed broken ECMP devices. The simplest way to do that
is to set the interface MTUs to 1280 on all the servers. Why should
the rest of the world have to put up with their inability to purchase
devices that work with RFC-compliant data streams?

I've had this issue with cdnjs.cloudflare.com for the longest time at my
house. But as some of you may recall, my little unwanted TCP MSS hack
for IPv6 last weekend fixed that issue for me.

Not ideal, and I so wish IPv6 would work as designed, but…

It does work as designed except when crap middleware is added. ECMP
should be using the flow label with IPv6. It has the advantage that
it works for non-0-offset fragments as well as 0-offset fragments and
also works for transports other than TCP and UDP. This isn’t a protocol
failure. It is shitty implementations.

Your mobile carrier's stateless TCP accelerator should stop sending ACKs with a zero flow label, so we can actually identify them as part of the same flow...

There is a lot of headwind in the real world against using the flow label as a hash component.

Out of curiosity, does that imply you are aware of non-broken ECMP
devices which are able to hash on the embedded original packet?

Then Cloudflare should negotiate MSSes that don't generate PTBs if
they have installed broken ECMP devices. The simplest way to do that

Out of curiosity, does that imply you are aware of non-broken ECMP
devices which are able to hash on the embedded original packet?

Parsing the ICMP payload was something we considered in RFC 7690, but it wasn't one of the approaches we pursued (we broadcast the PTB to all hosts on the segment(s) behind the load balancers in our original implementation).

It actually seems like it is becoming feasible to do in an Ethernet switch ASIC like Tofino, if that is what you want to burn real estate on. Whether it is worthwhile is another matter.
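
For the curious, "hashing on the embedded original packet" amounts to
something like the following sketch (fixed offsets only, no extension-header
or truncation handling; a sketch, not production code):

# Recover the original flow's addresses and ports from an ICMPv6
# Packet Too Big message.  The invoking packet starts right after the
# 8-byte ICMPv6 header, so with no extension headers everything sits
# at a fixed offset.
import struct

def ptb_embedded_tuple(ipv6_packet: bytes):
    assert ipv6_packet[6] == 58        # Next Header: ICMPv6
    icmp = ipv6_packet[40:]            # skip the fixed IPv6 header
    assert icmp[0] == 2                # ICMPv6 type 2: Packet Too Big
    orig = icmp[8:]                    # the embedded original packet
    src = orig[8:24]                   # original source address
    dst = orig[24:40]                  # original destination address
    proto = orig[6]                    # original Next Header
    sport, dport = struct.unpack("!HH", orig[40:44])  # TCP/UDP only
    return src, dst, proto, sport, dport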

It does work as designed except when crap middleware is added. ECMP
should be using the flow label with IPv6. It has the advantage that
it works for non-0-offset fragments as well as 0-offset fragments and
also works for transports other than TCP and UDP. This isn’t a protocol
failure. It is shitty implementations.

Out of curiosity, which operating systems put anything useful (for use
in ECMP) into the flow label of IPv6 packets? At the moment, I only
have access to CentOS 6 and CentOS 7 machines, and both of them set the
flow label to zero for all traffic.
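
For what it's worth, newer Linux kernels (from around 3.17, if memory
serves) grew a net.ipv6.auto_flowlabels sysctl that fills the label in
automatically; the CentOS 6 and 7 kernels predate it. A quick check on a
Linux box:

# Report whether this Linux kernel can auto-generate IPv6 flow labels.
# A non-zero value means outgoing flows get a usable label for ECMP;
# the knob is simply absent on older kernels such as CentOS 6/7.
from pathlib import Path

knob = Path("/proc/sys/net/ipv6/auto_flowlabels")
if knob.exists():
    print("auto_flowlabels =", knob.read_text().strip())
else:
    print("no auto_flowlabels support in this kernel")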

There is also the problem that the device generating the Packet Too
Big ICMP, is not the same as the end host that the big packet was
destined for, and does not know what flow label the end host would
have set in its TCP responses. RFC 6437 is also explicit that:

   o  Forwarding nodes such as routers and load distributors MUST NOT
      depend only on Flow Label values being uniformly distributed.  In
      any usage such as a hash key for load distribution, the Flow Label
      bits MUST be combined at least with bits from other sources within
      the packet, so as to produce a constant hash value for each flow
      and a suitable distribution of hash values across flows.

In practice, that means using at least the source and destination IPv6
addresses in addition to the flow label. But the ICMP packet has a different
source address than the TCP responses from the end host.
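
A sketch of that mismatch, again with invented values: even a hash that
combines the flow label with the addresses, as RFC 6437 requires, puts the
router's PTB in a different bucket than the packets of the flow it belongs
to, because the router can copy neither the right source address nor the
right flow label into the PTB's outer header.

# Unlike the 5-tuple case earlier in the thread, this hash is
# label-aware -- and it still fails for router-originated ICMP.
# All values are invented for illustration.
import hashlib

def rfc6437_hash(src, dst, flow_label, buckets=8):
    # Flow label combined with other header bits, per RFC 6437.
    key = f"{src}|{dst}|{flow_label:05x}".encode()
    return hashlib.sha256(key).digest()[0] % buckets

print(rfc6437_hash("2001:db8:c::1", "2001:db8:a::1", 0x12345))  # TCP flow
print(rfc6437_hash("2001:db8:e::9", "2001:db8:a::1", 0x00000))  # router's PTB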

A further problem is that the TCP responses from the destination end host
might not even be *passing* the router that generates a Packet Too Big
ICMP error. In an anycast scenario, that router might have a route to
the sending IPv6 address that goes to a different datacenter than the
host that sent the large packet. E.g., consider the following network:

      A1      A2
      |       |
     DC1     DC2
     / \     /
    /   \   /
   /     \ /
  R1      R2
   \      /
    \    /
     \  /
      R3
      |
      B

A1 and A2 are hosts in different datacenters, using the same anycast
address A. Host B initiates a TCP session with address A, R3 selects
the route via R1, and thus reaches A1 in datacenter DC1. A1 sends a
large packet towards B, but the router in DC1 elects to send that via
R2. R2 generates a PTB ICMP, but has its best route to address A
towards DC2...

  /Bellman

It is definitely possible in all relevant existing NPUs like Trio,
Solar, FP, EZChip, Lightspeed et al., since the embedded original
packet is within visibility of the lookup engine and sits at a fixed
offset. So it is not only possible but also cheap.

Out of curiosity, which operating systems put anything useful (for use
in ECMP) into the flow label of IPv6 packets? At the moment, I only
have access to CentOS 6 and CentOS 7 machines, and both of them set the
flow label to zero for all traffic.

FreeBSD 11.2-STABLE.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

Did you submit a bug report?

Stephen Satchell <list@satchell.net> writes:

For those who might need this feature, and have a Red Hat contract, a
suggestion:

If you submit a ticket, someone at Red Hat might backport the patch for you.

Please see: https://tools.ietf.org/html/rfc5927

and also: https://tools.ietf.org/html/rfc8021

Thanks,

Not to play devil's advocate, but the IETF got around to publishing a
spec for ECMP use of Flow Labels only a few years ago.

For quite a while, they were unusable... and might still be, for some
implementations.

Then Cloudflare should negotiate MSSes that don't generate PTBs if
they have installed broken ECMP devices. The simplest way to do that
is to set the interface MTUs to 1280 on all the servers. Why should
the rest of the world have to put up with their inability to purchase
devices that work with RFC-compliant data streams?

I've had this issue with cdnjs.cloudflare.com for the longest time at my
house. But as some of you may recall, my little unwanted TCP MSS hack
for IPv6 last weekend fixed that issue for me.

Not ideal, and I so wish IPv6 would work as designed, but…

It does work as designed except when crap middleware is added. ECMP
should be using the flow label with IPv6. It has the advantage that
it works for non-0-offset fragments as well as 0-offset fragments and
also works for transports other than TCP and UDP. This isn’t a protocol
failure. It is shitty implementations.

Not to play devil's advocate, but the IETF got around to publishing a
spec for ECMP use of Flow Labels only a few years ago.

For quite a while, they were unusable... and might still be, for some
implementations.

And if it is still using the quintuple, the PTB carries all the necessary
information, for unfragmented packets and 0-offset fragments (though with a
working TCP stack there shouldn't be fragments at all), to be passed through.

* Jean-Daniel Pauget

    I confess to using IPv6 behind a 6in4 tunnel, because the "Business-Class" service
    of the operator concerned doesn't handle IPv6 yet.

    As such, I realised that, as far as I can tell, ICMPv6 "Packet Too Big" messages
    (RFC 4443) seem to be ignored or filtered at ~60% of Cloudflare's HTTP farms.

    As a result, random sites such as http://nanog.org/ or https://www.ansible.com/
    are barely reachable whenever small MTUs are involved...

Hi Jean-Daniel.

If you're using tunnels, you'll want to have your tunnel endpoint
adjust down the TCP MSS value to match the MTU of the tunnel interface.
That way, you'll avoid problems with Path MTU Discovery. Even in those
situations where PMTUD does work fine, doing TCP MSS adjustment will
improve performance as the server does not need to spend an RTT to
discover your reduced MTU.

(This isn't really an IPv6 issue, by the way - ISPs using PPPoE will
typically perform MSS adjustment for IPv4 packets too.)

If you're using Linux as your tunnel endpoint, try:

ip6tables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Tore

hello,

    Tore Anderson, you're right: clamping the MSS is very effective and almost
    certainly solves most of the problems.

    Now for UDP, I don't know yet how things like QUIC can be handled...

    regards,
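
On the UDP question, the same arithmetic as with TCP applies, sketched
below: there is no MSS to clamp at the tunnel endpoint, so a QUIC-style
transport has to cap its own datagram size, and payloads that fit the IPv6
minimum MTU can never trigger a PTB.

# UDP has no MSS, so a tunnel endpoint can't rewrite anything; the
# transport itself has to choose a safe datagram size.  Payloads that
# fit the IPv6 minimum MTU of 1280 can never trigger a PTB.
IPV6_MIN_MTU = 1280
IPV6_HEADER = 40
UDP_HEADER = 8
print(IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER)   # 1232-byte payload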