PMTU and Broken Servers

I've recently had the pleasure of troubleshooting a problem I don't
normally have to deal with, and the results don't quite make sense
to me. I'm hoping someone can enlighten me as to what is going on.
A diagram:

server---internet---fw---tunnelbox1----tunnelbox2----user

The tunnel between the tunnelboxes has a lower (1480) MTU. Originally
the user couldn't access some servers; it turned out the firewall was
filtering ICMP Can't Fragment messages, preventing PMTU discovery from
working in the server->user direction (tunnelbox1 would generate Can't
Fragment, firewall would filter).

That's been corrected. Going to a server I control I see good PMTU
in both directions between the server and the user. However, there
are still a number of web servers for popular sites that behave
just like the firewall was still filtering Can't Fragments. The
theory is that the servers are behind a firewall/load balancer that
is filtering them on the server side -- but I find it slightly odd
(emphasis on the slightly) that someone would turn on PMTU discovery,
and then filter it out right in front of the boxes where they turned
it on. Also, it seems to me most DSL users are behind PPPoE links
with lower MTU, and should get hit by the same problem.

The temporary hack is to have tunnelbox1 clear the DF bit on all
incoming packets, which just causes the packets to get fragmented
going down the tunnel. A minor performance hit, but it works.
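
To put rough numbers on that fragmentation cost, here is a minimal sketch
of how an IPv4 packet is split once DF is cleared. It assumes a 20-byte IP
header with no options and a full 1500-byte packet from the server; the
function name `fragment_sizes` is made up for illustration and does not
come from any tunnel box's software.

```python
def fragment_sizes(total_len, mtu, ihl=20):
    """Return a list of (payload_offset, fragment_total_length, more_fragments).

    total_len: length of the original IPv4 packet, header included.
    mtu:       MTU of the link the packet has to cross.
    ihl:       IP header length in bytes (assumed 20, i.e. no options).
    """
    if total_len <= mtu:
        # Fits as-is, nothing to fragment.
        return [(0, total_len, False)]

    payload = total_len - ihl
    # Every fragment except the last must carry a multiple of 8 payload
    # bytes, because the fragment offset field counts 8-byte units.
    max_payload = (mtu - ihl) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    frags = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        frags.append((offset, ihl + size, more))
        offset += size
    return frags


# A 1500-byte packet crossing the 1480 MTU tunnel.
print(fragment_sizes(1500, 1480))
```

Under those assumptions each full-size packet becomes one 1476-byte
fragment plus one 44-byte fragment, so the tunnel carries two packets for
every one the server sends.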

This is a new problem to me, but I'm sure people have run into it
before. Are the servers really that broken (PMTU enabled, ICMP
Can't Fragment filtered)? Does the head end box of DSL services
generally do something to work around this (i.e., clear the DF bit)?
Am I just being an idiot and missing something obvious?

> I've recently had the pleasure of troubleshooting a problem I don't
> normally have to deal with, and the results don't quite make sense
> to me. I'm hoping someone can enlighten me as to what is going on.
> A diagram:

I had a rant about this a few months back (as many others have done before me);
it's a combination of ICMP filtering and RFC1918 links on the Internet that causes
this.

> server---internet---fw---tunnelbox1----tunnelbox2----user
>
> The tunnel between the tunnelboxes has a lower (1480) MTU. Originally
> the user couldn't access some servers; it turned out the firewall was
> filtering ICMP Can't Fragment messages, preventing PMTU discovery from
> working in the server->user direction (tunnelbox1 would generate Can't
> Fragment, firewall would filter).
>
> That's been corrected. Going to a server I control I see good PMTU
> in both directions between the server and the user. However, there
> are still a number of web servers for popular sites that behave
> just like the firewall was still filtering Can't Fragments. The
> theory is that the servers are behind a firewall/load balancer that
> is filtering them on the server side -- but I find it slightly odd
> (emphasis on the slightly) that someone would turn on PMTU discovery,
> and then filter it out right in front of the boxes where they turned
> it on. Also, it seems to me most DSL users are behind PPPoE links
> with lower MTU, and should get hit by the same problem.
>
> The temporary hack is to have tunnelbox1 clear the DF bit on all
> incoming packets, which just causes the packets to get fragmented
> going down the tunnel. A minor performance hit, but it works.

Consider this a permanent hack if you want to keep things working on the
tunnel.

> This is a new problem to me, but I'm sure people have run into it
> before. Are the servers really that broken (PMTU enabled, ICMP
> Can't Fragment filtered)?

Absolutely.

> Does the head end box of DSL services
> generally do something to work around this (i.e., clear the DF bit)?

I've wondered this too. I'm not sure, but they clearly do something; perhaps they
encapsulate the packets in fragments, then recombine without altering the
original packet?

> Am I just being an idiot and missing something obvious?

Steve

> This is a new problem to me, but I'm sure people have
> run into it before. Are the servers really that broken
> (PMTU enabled, ICMP Can't Fragment filtered)? Does the
> head end box of DSL services generally do something to
> work around this (i.e., clear the DF bit)? Am I just
> being an idiot and missing something obvious?

I first saw this about four years ago with a web site running behind
a load balancing device. It was -- and probably still is -- another
issue of default configuration hell. The web servers were configured
by default to do Path MTU discovery, while the load balancer had
no concept of passing the ICMP Need Fragment packet back to the
appropriate server.

(There may still be no good way to do this; if I remember right,
the ICMP Need Fragment packet contains only IPs and not ports;
the host sending the ICMP packet will be using its IP and the outside
IP of the load balancer, giving the load balancer no good way to
determine where to pass the ICMP packet, unless the load balancer
is guaranteeing that all data from a particular IP goes to a particular
server -- also not a default configuration.)

It's a hard call for which to make the default; PMTU makes sense,
obviously, unless you're running behind a load balancer. It's another
one of those things that probably isn't documented anywhere, or if it is,
it's buried in an appendix that nobody gets to.

The only solution is to mail the folks maintaining the web sites you
can't get to with a short explanation of what you think the problem is,
and hope they look into it and fix it. Not unlike smurf relays and
networks that don't filter outgoing source addresses. }:>

-dalvenjah

I've had the problem before. Not all routers handle PMTU correctly.

Curtis

You mean there are routers which get a large packet and silently drop it rather
than return an ICMP?

Curious to know which vendors? (read: fundamentally broken!)

Steve

Thus spake "Stephen J. Wilcox" <steve@telecomplete.co.uk>

> You mean there are routers which get a large packet and silently drop it
> rather than return an ICMP?
>
> Curious to know which vendors? (read: fundamentally broken!)

Well, most core routers rate-limit the ICMP messages they generate, so any
given packet may not result in a Needs-Fragmentation error.

If the result is consistent, however, you're likely dealing with an ACL or
broken loadbalancer as Leo describes:

> > However, there
> > are still a number of web servers for popular sites that behave
> > just like the firewall was still filtering Can't Fragments. The
> > theory is that the servers are behind a firewall/load balancer that
> > is filtering them on the server side -- but I find it slightly odd
> > (emphasis on the slightly) that someone would turn on PMTU discovery,
> > and then filter it out right in front of the boxes where they turned
> > it on. Also, it seems to me most DSL users are behind PPPoE links
> > with lower MTU, and should get hit by the same problem.

The problem here is that the Needs-Frag error comes back as an ICMP, and
many load balancers don't bother looking inside at the offending packet to
determine which server to forward the error to. Why do these people use
PMTUD? It's on by default, and you have to muck with the registry (or the
unix equivalent) to disable it, at which point you're better off enabling
PMTU Black Hole Detection. Hopefully BHD will also be default someday.
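
As a rough illustration of what "looking inside at the offending packet"
involves: the ICMP error carries the IP header and the first 8 bytes of
the datagram that was dropped, which for TCP or UDP includes both port
numbers. The sketch below pulls those fields out. It is only a sketch;
the function name `parse_frag_needed`, the sample addresses (taken from
the documentation ranges) and the ports are made up, and it assumes the
input starts at the ICMP type byte.

```python
import socket
import struct


def parse_frag_needed(icmp):
    """Parse an ICMP type 3 code 4 (Fragmentation Needed, DF set) message.

    Returns a dict describing the dropped packet, or None if the bytes
    are not a Fragmentation Needed message or are too short.
    """
    if len(icmp) < 8:
        return None
    icmp_type, code, _cksum, _unused, next_hop_mtu = struct.unpack(
        "!BBHHH", icmp[:8])
    if icmp_type != 3 or code != 4:
        return None

    inner = icmp[8:]                  # header of the packet that was dropped
    if len(inner) < 20:
        return None
    ihl = (inner[0] & 0x0F) * 4       # inner IP header length in bytes
    if ihl < 20 or len(inner) < ihl + 4:
        return None

    proto = inner[9]
    info = {
        "next_hop_mtu": next_hop_mtu,
        "proto": proto,
        "src": socket.inet_ntoa(inner[12:16]),
        "dst": socket.inet_ntoa(inner[16:20]),
        "sport": None,
        "dport": None,
    }
    if proto in (6, 17):              # TCP or UDP: ports come first
        info["sport"], info["dport"] = struct.unpack(
            "!HH", inner[ihl:ihl + 4])
    return info


# Hypothetical sample: a router reports MTU 1480 for a packet that went
# from a web server (port 80) towards a client.
inner_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0, 0x4000, 64, 6, 0,
                       socket.inet_aton("192.0.2.10"),
                       socket.inet_aton("198.51.100.7"))
inner_tcp = struct.pack("!HHI", 80, 51515, 1)
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1480) + inner_ip + inner_tcp

info = parse_frag_needed(sample)
print(info)
```

With the source and destination address and port of the dropped packet in
hand, a device that keeps per-connection state could in principle look up
which real server owns that flow and forward the error there.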

Most network folk have found it's easier to provide 1500 MTU than to educate
all of the server operators and end users as to what's going wrong with
PMTU. This is also, IMHO, the only significant reason jumbo frames aren't
in widespread use -- we have no reliable means of coping with networks that
remain at 1500 MTU.

S

I had a problem where an NXNetworks VPN router didn't process the results
properly. I couldn't put my finger on exactly whose router was causing
the trouble, but using freeswan to a freeswan I was able to test my theory
as I gradually increased the MTU on my connection until I got a failure.
One end of the VPN is on a RoadRunner connection and the other was on a
Prexar connection. The route in between is anyone's guess, but I think,
at the time, Prexar was trying to push traffic over their Cable and
Wireless connection. Now that C&W is gone, I'll have to try it again.
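
For anyone repeating that kind of test, the search for the largest size
that still gets through can be sketched as below. Note this uses
bisection rather than the gradual increase described above, and it
assumes that if one size passes then every smaller size passes too. The
`probe` callback is hypothetical: it stands for whatever sends a
don't-fragment packet of the given size across the path and reports
whether it arrived.

```python
def find_largest_passing(probe, lo=576, hi=1500):
    """Return the largest size in [lo, hi] for which probe(size) is true.

    probe: callable taking a packet size in bytes and returning True if a
           packet of that size made it across the path (hypothetical).
    Returns None if even the smallest size fails.
    """
    if not probe(lo):
        return None
    # Invariant: probe(lo) is known to pass; sizes above hi are unknown
    # or known to fail.
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the loop always shrinks
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


# Stand-in probe for a path whose smallest link has a 1480-byte MTU.
def fake_probe(size):
    return size <= 1480


print(find_largest_passing(fake_probe))
```

On a real path the probe would have to cope with loss and with routers
that rate-limit or never send the ICMP error, so a single failure is not
proof that the size is too big.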

Curtis

Okay, so we're not actually saying the TCP stack is broken, as I interpreted
your previous email; we mean there are routers with broken (user) config on them,
i.e. dropping ICMP frag messages. Sorry!

Steve

Most of the equipment in between would be Cisco, Juniper and Redback. I
doubt that any of that equipment has broken stacks, just configured to
not send ICMP replies so PMTU discovery will break. IPSEC is rather
picky. :-)

curtis

* stephen@sprunk.org (Stephen Sprunk) [Mon 12 May 2003, 19:24 CEST]:

> Most network folk have found it's easier to provide 1500 MTU than to
> educate all of the server operators and end users as to what's going
> wrong with PMTU. This is also, IMHO, the only significant reason jumbo
> frames aren't in widespread use -- we have no reliable means of coping
> with networks that remain at 1500 MTU.

That was already the case when the FDDI MAEs were still in operation
with their 4470 byte MTUs, where the Gigaswitches didn't have IP
addresses they could send ICMP Fragmentation Needed messages from
when having to bridge large frames from FDDI to Ethernet...

Regards,

  -- Niels.