PMTU and Broken Servers

Blah...forgot to send this to NANOG as well...it seems there is a lack
of understanding of why this works in some situations and not in others,
so I've attempted to spread some knowledge here.

Also Sprach Leo Bicknell

but I find it slightly (emphasis on the slightly) that someone would
turn on PMTU discovery, and then filter it out right in front of the
boxes where they turned it on.

As someone else mentioned...it happens all the time...disconnect between
server and network admins probably, or something along those lines (or
just general cluelessness)

Also, it seems to me most DSL users are behind PPPoE links with lower
MTU, and should get hit by the same problem.

No. The trick, here, is that the PPPoE (typically) terminates on the
same system that's terminating the TCP connection, so the PPPoE end
system can see that the PMTU is going to be, at most, 1492, so it can
use a lower Maximum Segment Size in TCP to start the whole scenario off
at the 1492 MTU size and try to go down from there. You're seeing the
problem because the tunnel is not terminated on the system that's also
terminating the TCP connection, so the TCP processing can't know about
the 14xx MTU somewhere out there except through PMTU (which is broken in
this case), so it can't set the corresponding MSS to compensate for it
initially.
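
To put numbers on the MSS point: a minimal sketch, assuming plain IPv4 and TCP headers with no options (20 bytes each). The function name is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented packet of size mtu."""
    return mtu - IPV4_HEADER - TCP_HEADER


ethernet_mss = mss_for_mtu(1500)  # what a host on plain Ethernet would announce
pppoe_mss = mss_for_mtu(1492)     # what a host terminating PPPoE itself can announce

print(ethernet_mss, pppoe_mss)
```

The PPPoE end system announces the smaller value in its SYN, so the far end never sends a segment too big for the PPPoE link and PMTU discovery isn't needed for that hop. A host sitting behind a separate tunnel box only knows its own 1500-byte Ethernet MTU, so it announces the larger value.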

The temporary hack is to have tunnelbox1 clear the DF bit on all
incoming packets, which just causes the packets to get fragmented going
down the tunnel. A minor performance hit, but it works.
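
A rough sketch of what clearing DF costs, assuming a tunnel MTU of 1476 (a made-up figure for illustration; the real tunnel MTU here is only known as 14xx) and a 20-byte IPv4 header with no options. The function name is hypothetical.

```python
def fragment_sizes(packet_len, mtu, header_len=20):
    """Return the on-the-wire sizes of the IPv4 fragments for one packet.

    Every fragment except the last must carry a payload that is a
    multiple of 8 bytes, because fragment offsets are in 8-byte units.
    """
    if packet_len <= mtu:
        return [packet_len]  # fits as-is, no fragmentation

    max_chunk = (mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")

    payload = packet_len - header_len
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(chunk + header_len)  # each fragment gets its own IP header
        payload -= chunk
    return sizes


# Hypothetical tunnel MTU of 1476; a full-size 1500-byte packet arrives.
print(fragment_sizes(1500, 1476))
```

Under those assumptions every full-size packet turns into one large fragment plus one tiny one, which is where the performance hit comes from.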

An only slightly better hack would be to have the tunnel and/or firewall
twiddle the MSS on outgoing TCP connections to compensate for the lower
tunnel MTU. Still pretty gross, but won't have as much of an effect on
the TCP performance.
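
Roughly what "twiddling the MSS" means, as a sketch only: walk the TCP options of a SYN and lower the MSS option (kind 2, length 4) if it exceeds a limit. The function name and the 1436 limit (a hypothetical 1476 tunnel MTU minus 40 bytes of headers) are assumptions for illustration. A real box would also have to fix up the TCP checksum, which is left out here.

```python
import struct

END_OF_OPTIONS = 0  # TCP option kind: end of option list
NOP = 1             # TCP option kind: one-byte padding
MSS_KIND = 2        # TCP option kind: Maximum Segment Size


def clamp_mss(options, max_mss):
    """Return a copy of the TCP options with any MSS above max_mss lowered to it."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == END_OF_OPTIONS:
            break
        if kind == NOP:
            i += 1  # single-byte option, no length field
            continue
        if i + 1 >= len(out):
            break  # truncated option, leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, stop parsing
        if kind == MSS_KIND and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Options from a typical SYN: MSS 1460, NOP, NOP, SACK-permitted.
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2])
clamped = clamp_mss(syn_options, 1436)
print(struct.unpack_from("!H", clamped, 2)[0])
```

Only the SYN and SYN/ACK carry the MSS option, so the box in the middle touches a couple of packets per connection rather than fragmenting every full-size one.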

Are the servers really that broken (PMTU enabled, ICMP Can't Fragment
filtered)?

Yes. The last time I ran into this, my test site was www.harvard.edu
(!)...though that's been a year or more ago, so they may have resolved
their issues since then. We ran into plenty more sites that had the
problem, but that's the one that sticks out in my mind because, like I
said, it was the one that I used as a site to try to connect to as a
test.

Does the head end box of DSL services generally do something to work
around this (ie, clear the DF bit)? Am I just being an idiot and
missing something obvious?

I wouldn't say idiot, or missing anything obvious...but you were missing
the whole MSS issue. I've never thought the behavior was intuitive or
obvious...but once you think about it, it becomes a "Why didn't I think
of that?" sorta thing.

In a message written on Thu, May 08, 2003 at 11:12:19AM -0400, Jeff McAdams wrote:

An only slightly better hack would be to have the tunnel and/or firewall
twiddle the MSS on outgoing TCP connections to compensate for the lower
tunnel MTU. Still pretty gross, but won't have as much of an effect on
the TCP performance.

I did leave out some details. We tried Cisco code that mucks with MSS,
and that part worked but the image had other issues for us. :(

Of course, I'm also highly annoyed that you can't raise the MTU on
a Cisco tunnel. You can raise the MTU on T1's, so you can add the
tunnel overhead, but you can't then raise the MTU of the tunnel
itself. In a couple cases we could make it so the tunnel had an
MTU of 1500 if we could change it. *sigh*

"ip mtu 1500".

Works on most newer IOS:es.

In a message written on Thu, May 08, 2003 at 05:43:16PM +0200, Mikael Abrahamsson wrote:

"ip mtu 1500".

Works on most newer IOS:es.

Could you define newer? My limited testing with 12.0S and 12.2something
shows both returning MTU can't be set on a tunnel interface.

You cannot change the MTU of the tunnel, but you can change the "ip mtu"
of the tunnel. I've done this on 7200 and 1600 that are running IOSes that
are at least a year old. The feature is at least 2-3 years old, because
that's the first time I encountered it.

The router will do re-assembly on the tunnel level to be able to transfer
a 1500 byte sized IP packet non-fragmented.

Be warned, while fragmentation follows some form of fast path, reassembly
is process switched. A beefy 7206 will fragment 300Mbps easily, but will
reassemble around 30Mbps.

Hello Leo,

Could you define newer? My limited testing with 12.0S and 12.2something
shows both returning MTU can't be set on a tunnel interface.

concrete version number: 12.0(17)ST5. Just tried on a test 7200 with that code.

7204-R3(config-if)#mtu 1500
% Interface Tunnel1 does not support adjustable maximum datagram size

7204-R3(config-if)#ip mtu 1500
7204-R3(config-if)#

And from a tcpdump-ed ping I'm convinced it works :)

Regards, Marc