should TCPs do MTU black hole detection?

The IETF's tcp-impl (TCP implementation) working group has a draft document
discussing problems with path MTU discovery:

  http://www.ietf.org/internet-drafts/draft-ietf-tcpimpl-pmtud-02.txt

The main issue we're trying to decide is whether the draft should advocate
"black hole detection". That is, when a TCP is doing PMTU discovery but the
necessary ICMPs are either not being generated somewhere or are being
filtered out before the TCP receives them, the TCP notices that it's losing
multiple packets of the same size, so it tries sending smaller segments,
even though it hasn't received a "Datagram Too Big" ICMP.
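To make the mechanism concrete, here is a minimal Python sketch of the sender-side heuristic just described. It is not taken from the draft: the class and method names, the threshold of three consecutive full-size losses, and the candidate MTU list (loosely modeled on the plateau table in RFC 1191) are all illustrative assumptions. MSS is taken as path MTU minus 40 bytes (a 20-byte IPv4 header plus a 20-byte TCP header, no options).

```python
from dataclasses import dataclass, field

IP_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options

# Illustrative candidate MTUs, largest first (assumption, loosely after RFC 1191).
PLATEAUS = [1500, 1492, 1006, 508, 296, 68]


@dataclass
class BlackHoleDetector:
    path_mtu: int = 1500
    threshold: int = 3  # hypothetical: consecutive full-size losses before backing off
    _losses: int = field(default=0, init=False)

    @property
    def mss(self) -> int:
        return self.path_mtu - IP_TCP_HEADERS

    def on_icmp_too_big(self, next_hop_mtu: int) -> None:
        # Ordinary PMTU discovery: a "Datagram Too Big" ICMP actually arrived.
        self.path_mtu = min(self.path_mtu, next_hop_mtu)
        self._losses = 0

    def on_ack(self) -> None:
        # Anything getting through means the path is not a black hole.
        self._losses = 0

    def on_timeout(self, segment_size: int) -> None:
        # Only losses of full-size segments count as black-hole evidence.
        if segment_size < self.mss:
            return
        self._losses += 1
        if self._losses >= self.threshold:
            # No ICMP seen, but repeated same-size losses: guess a smaller MTU.
            smaller = [p for p in PLATEAUS if p < self.path_mtu]
            if smaller:
                self.path_mtu = smaller[0]
            self._losses = 0
```

With the defaults, three timeouts on 1460-byte segments drop the path MTU from 1500 to 1492 (MSS 1452), while a real "Datagram Too Big" ICMP still takes precedence through on_icmp_too_big(). Requiring several consecutive losses is what separates a suspected black hole from ordinary congestion loss, at the cost of slow, repeated back-offs when the path really is broken.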

The plus of black hole detection is that it can work around a problem that
is sometimes very hard to debug. The minus is that it masks problems that
should instead be fixed.

To help resolve this issue, I'm wondering whether the ISP community has a
clear preference for either yes-do-detection or no-we-want-the-problems-fixed.
Comments appreciated.

  Thanks,

    Vern

Unfortunately, the MTU problem can be caused by the client's network admin as
well as by the ISP; it's very difficult to explain to these admins what's
wrong, and MTU discovery is not part of the traditional IP approach. This means
that black-hole detection should be implemented anyway, to prevent the loss of
connectivity we sometimes see now when some MS-based server or client refuses
to allow IP fragmentation.

I think that most ISP's would prefer that problems were fixed. However, we
also know this doesn't happen very often unless provoked (usually by
customers).

Can you provide more detail as to what problems would be masked or
otherwise ignored if TCP implementations started to accommodate the lack
of Path-MTU discovery?

>> I'm wondering whether the ISP community has a clear preference for either
>> yes-do-detection or no-we-want-the-problems-fixed.
>> Comments appreciated.

> I think that most ISP's would prefer that problems were fixed.

the choice seems

  o when it breaks, the noc gets the call, debugs it, and it gets fixed

  o when it breaks, the software does successive guesswork back-offs until
    it makes it through. the performance sucks big-time, the customer
    thinks the isp is at fault, but the noc does not get called and the
    real problem never gets fixed.

randy

>>> yes-do-detection or no-we-want-the-problems-fixed.
>>> Comments appreciated.
>> I think that most ISP's would prefer that problems were fixed.

> the choice seems
>
>   o when it breaks, the noc gets the call, debugs it, and it gets fixed

'It gets fixed' only if the problem is inside the network controlled by the
NOC. If not??

>   o when it breaks, the software does successive guesswork back-offs until
>     it makes it through. the performance sucks big-time, the customer
>     thinks the isp is at fault, but the noc does not get called and the
>     real problem never gets fixed.
>
> randy

Aleksei Roudnev, Network Operations Center, Relcom, Moscow
(+7 095) 194-19-95 (Network Operations Center Hot Line),(+7 095) 230-41-41, N 13729 (pager)
(+7 095) 196-72-12 (Support), (+7 095) 194-33-28 (Fax)

Thus spake Randy Bush

> the choice seems
>
> o when it breaks, the noc gets the call, debugs it, and it gets fixed

                                                           ^^^^^^^^^^^^^
Optimist!

> o when it breaks, the software does successive guesswork back-offs until
>    it makes it through. the performance sucks big-time, the customer
>    thinks the isp is at fault, but the noc does not get called and the
>    real problem never gets fixed.

This one is pretty accurate though.

>>>> yes-do-detection or no-we-want-the-problems-fixed.
>>>> Comments appreciated.

>>> I think that most ISP's would prefer that problems were fixed.

>> the choice seems
>>   o when it breaks, the noc gets the call, debugs it, and it gets fixed

> 'It gets fixed' only if the problem is inside the network controlled by the
> NOC. If not??

noc tells user to lower mtu

>> o when it breaks, the noc gets the call, debugs it, and it gets fixed

>                                                          ^^^^^^^^^^^^^
> Optimist!

but at least the user knows where the problem lies, as opposed to

>> o when it breaks, the software does successive guesswork back-offs until
>>    it makes it through. the performance sucks big-time, the customer
>>    thinks the isp is at fault, but the noc does not get called and the
>>    real problem never gets fixed.

> This one is pretty accurate though.

where the user thinks the isp sucks. and the latter will be forever
increasing entropy. the net as experienced just gets worse and worse.

randy

[ On Thursday, November 18, 1999 at 22:03:21 (-0500), Randy Bush wrote: ]

Subject: Re: should TCPs do MTU black hole detection?

>>> o when it breaks, the noc gets the call, debugs it, and it gets fixed
>>                                                          ^^^^^^^^^^^^^
>> Optimist!

> but at least the user knows where the problem lies, as opposed to

I certainly agree with you here Randy!

However as a reasonably expert user there are times when I want the
ability to have my equipment try to work around even ultra-stupid
configurations elsewhere on the net, especially when those of us
experiencing the problem are far more rare than those who do not.

Some time ago, when I first began to personally experience this problem
with path MTU discovery on my home network, I discovered through analysis
of the upstream packet traces that it was almost always possible for the
router sending the needs-frag reply to realize when its attempts to do so
were futile, and thus to enforce fragmentation anyway. While this may
make some protocols unusable anyway, it could at least allow me to limp
along and to use that same network to communicate with the offending
people at the other end using protocols affected by this problem, such as
SMTP, FTP, HTTP, etc.
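The message doesn't spell the algorithm out, so what follows is only one plausible reading, sketched in Python: a router that has already sent several needs-frag ICMPs for the same host pair, and keeps receiving oversized packets with DF set, concludes that its ICMPs are being filtered and fragments anyway. The class name, the per-host-pair keying, the threshold of three ICMPs and the string return values are all assumptions for illustration.

```python
# Hypothetical flow key: (source address, destination address).
FlowKey = tuple[str, str]


class FutileNeedsFragDetector:
    """Sketch of a router noticing that its needs-frag ICMPs are ignored.

    Assumption: once `max_icmps` needs-frag replies have been sent for a
    host pair and oversized DF packets still arrive, the ICMPs are presumed
    filtered, so the router fragments anyway (overriding DF).
    """

    def __init__(self, link_mtu: int, max_icmps: int = 3) -> None:
        self.link_mtu = link_mtu
        self.max_icmps = max_icmps
        self._icmps_sent: dict[FlowKey, int] = {}

    def handle(self, src: str, dst: str, length: int, df: bool) -> str:
        """Return 'forward', 'fragment' or 'needs-frag' for one packet."""
        if length <= self.link_mtu:
            return "forward"
        if not df:
            return "fragment"  # ordinary IPv4 fragmentation
        key = (src, dst)
        sent = self._icmps_sent.get(key, 0)
        if sent >= self.max_icmps:
            # Earlier ICMPs evidently never reached the sender.
            return "fragment"
        self._icmps_sent[key] = sent + 1
        return "needs-frag"  # drop the packet and send the ICMP
```

Note that the "fragment" branch for DF packets deliberately overrides the sender's Don't Fragment request, which is also why it keeps the broken filter invisible to the admins who should fix it.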

Unfortunately I have not yet had time (or in this case the need -- I've
since increased my link's MTU to 1500 :-) to implement this algorithm
in my own upstream router (for which, thankfully, I do have the root
password! ;-).

BTW, I think my algorithm provides a much more efficient work-around for
the problem than black-hole detection. Unfortunately it also makes it
harder to convince the offending admins to fix their stupid filter
definitions (or alternatively to at least turn off PMTUd in their servers).

What I'd really like to do is find some way to enhance my algorithm in
such a way that it could send a tiny tactical nuke down the wire after
my connection to/through the offending network safely closes. I.e. I
want to cause grief to any ICMP filter that has caused my router to have
to work around its stupidity! ;-)

fox

Randy Bush wrote:

>>> I'm wondering whether the ISP community has a clear preference for either
>>> yes-do-detection or no-we-want-the-problems-fixed.
>>> Comments appreciated.
>> I think that most ISP's would prefer that problems were fixed.

> the choice seems
>
>   o when it breaks, the noc gets the call, debugs it, and it gets fixed

Well Randy, ideally yes, but reality sucks, as we know. We had this crap as
well and tried to get folks to fix it; important sites just don't care, and
yes, we are usually talking about broken web stuff here. It's hard to get
their ISP to make them fix it. I am asking myself whether we can nuke it
elsewhere; nothing obvious comes to mind right off.

>   o when it breaks, the software does successive guesswork back-offs until
>     it makes it through. the performance sucks big-time, the customer
>     thinks the isp is at fault, but the noc does not get called and the
>     real problem never gets fixed.

Agreed. Regardless of what I mention above: nuke 'em, i.e. fix it.
Dave