RE: SOLVED! The cause of puzzling TCP (eg. WHOIS) connection failures with some hosts

Mark Kosters writes:

> The problem has to do with the failure of a host to fragment larger
> packets on demand (i.e. when the other host sends an ICMP "needs frag"
> notification). This may be because the ICMP packet never gets through
> (perhaps someone who didn't understand TCP/IP and ICMP and everything
> else related implemented a filter on all "abnormal" ICMP packets); or it
> may be because the receiving host doesn't understand the ICMP "needs
> frag" request (and also doesn't implement path MTU discovery, or have I
> got that backwards?).
> No matter what the problem really is, I'm sure a *lot* of people would
> be much happier if this problem were fixed, specifically for the WHOIS
> service (though I've also had troubles receiving HTTP too). I got quite
> a few replies about similar experiences when I first posted about this
> on NANOG recently.

Thanks Greg for the good information.

The InterNIC load balancers (BigIP made by F5 Labs) do have a problem with
path MTU discovery. We have taken a short term fix of turning off path MTU
discovery on the hosts behind BigIP until F5 issues a fix.



Mark Kosters InterNIC Registration Services

Firstly, thanks to Greg Woods and Mark Kosters for isolating this
problem. Now that they have, the fix is fairly straightforward, and
we'll be working with Mark to deploy that shortly.

We will make available a BIG/ip software patch that allows path MTU
discovery for servers situated behind it. Whenever BIG/ip receives an
ICMP "needs frag" message, it will replicate it to each of the servers
that may need it at that time. This replication is necessary because
a specific client may have concurrent connections to more than one
server behind BIG/ip.
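
As a rough illustration only (this is not BIG/ip's actual code), the
replication step can be sketched in Python. The sketch assumes the load
balancer keeps a table of which servers each client currently has
connections to, and that it identifies the client from the destination
address of the original IP header embedded in the ICMP message. The names
ConnectionTable, replicate_needs_frag and orig_dst_ip are hypothetical;
the ICMP type and code values (3 and 4) are the standard ones for
"fragmentation needed and DF set".

```python
from collections import defaultdict

# Standard ICMP values: type 3 = destination unreachable,
# code 4 = fragmentation needed and DF set ("needs frag").
ICMP_DEST_UNREACH = 3
ICMP_FRAG_NEEDED = 4


class ConnectionTable:
    """Hypothetical table of which servers each client is connected to."""

    def __init__(self):
        self._servers_by_client = defaultdict(set)

    def add(self, client_ip, server_ip):
        """Record an active connection from client_ip to server_ip."""
        self._servers_by_client[client_ip].add(server_ip)

    def remove(self, client_ip, server_ip):
        """Forget a connection once it has closed."""
        servers = self._servers_by_client.get(client_ip)
        if servers is None:
            return
        servers.discard(server_ip)
        if not servers:
            del self._servers_by_client[client_ip]

    def servers_for(self, client_ip):
        """Return the servers this client is connected to, in sorted order."""
        return sorted(self._servers_by_client.get(client_ip, ()))


def replicate_needs_frag(table, icmp_type, icmp_code, orig_dst_ip, next_hop_mtu):
    """Return (server_ip, next_hop_mtu) pairs to which a copy should be sent.

    orig_dst_ip is the destination address of the original packet quoted
    inside the ICMP message, i.e. the client whose path needs a smaller MTU.
    Any ICMP message other than "needs frag" yields no copies.
    """
    if (icmp_type, icmp_code) != (ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED):
        return []
    return [(server, next_hop_mtu) for server in table.servers_for(orig_dst_ip)]
```

With one client connected to two servers, a single "needs frag" message
for that client produces two copies, one per server, while a client
connected to a different server is unaffected.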

By their very nature, load balancers do not play by the same rules as
"normal" network devices such as routers and hosts. Except where we
have implemented specific support, BIG/ip blocks all ICMP messages
from reaching servers situated behind it.

Chris Mauritz writes:

Actually, this isn't a BigIP problem. It's a BSDI problem (the underlying
OS for the BigIP box). I believe BSDI has a patch available for BSD/OS 4.0
(maybe even 3.1). While you can't benefit from this patch directly, you
can perhaps nudge your F5 rep about expediting a patch for your boxen.

Although BIG/ip is based, in part, on BSDI, these patches are not
applicable because BIG/ip's packet processing logic is unique.
So in this case, it *is* BIG/ip's problem. It's not the first
problem, and it won't be the last. As Internet applications and
infrastructure evolve, and BIG/ip's installed base grows and feature
set expands, we are constantly discovering new ways to improve the
product. The best suggestions come from our customers and from
experts like the NANOG members.

If anyone has any questions or comments about F5 Labs products
(including their behavior on the net), please feel free to contact me.


Rob Gilde
Principal Engineer
F5 Labs, Inc.