RE: MTU path discovery and IPSec

The problem is described pretty clearly at
http://www.cisco.com/warp/public/105/56.html. The issue I have
experienced is that fragmentation can lead to performance impacts that are
unacceptable.

I wish we could start a clue campaign informing people why ICMP should not
be summarily dumped at the firewall.

Chris Proctor
EPIK Communications

Do not just blame random companies' firewalls for dumping ICMP. There are
some very well known hosting groups that filter ICMP at the edge of their
networks in their routers.

It gets even worse when their server admins decide to leave PMTU discovery
on. Sort of defeats the purpose...

Given the nastiness of ICMP DDoS attacks of late, it might be better to hit
the server and client admins with the clue bat about not using PMTU
discovery (which also extends to the writers of the apps and OSes). Frag.
is in the fast path of just about every current version of brand C code, so
giving the tunneling folks the OK to frag the packet might be preferred to
forcing them to mess about with alternate options.

David

You could drop ICMP packets at your firewall if the firewalls properly
implemented stateful inspection of ICMP packets. The problem is that few
firewalls include ICMP responses in their stateful analysis. So you are
left with two bad choices: permit "all" ICMP packets or deny "all" ICMP
packets.

Actually, any halfway decent firewall allows you to permit certain ICMP
type codes while rejecting others. Not a perfect solution, but, for the
most part, there aren't a lot of fragmentation-needed exploits running
around. (In fact, I'm hard pressed to imagine how a Frag needed packet
for an invalid session could do much of anything).

Owen

There are expert modes where you can apply the name, source, destination,
protocol, time, comments, rank, state, action, and track fields
for more stabilized dedicated connections.

I am certain there are more, depending on the vendor.

-Henry

You can use a forged 'frag needed' to stomp an existing connection of the
victim's down to 64 byte MTU or similar silliness, but other than sheer
"it's a packet" DDoS effects, I can't think of a malicious use for one for
an invalid session either....

Agreed. However, the former pretty much requires knowledge, a lot of packets,
or a really lucky set of guesses.

Owen

> Given the nastiness of ICMP DDoS attacks of late, it might be better to hit
> the server and client admins with the clue bat about not using PMTU
> discovery (which also extends to the writers of the apps and OSes).

This idea that some protocol has been used for some form of attack means
that we should for now and evermore block that protocol leads clearly to a
network with all protocols blocked. No, I don't buy the argument that
icmp (at least most forms of it) should be blocked.

> Frag.
> is in the fast path of just about every current version of brand C code, so
> giving the tunneling folks the OK to frag the packet might be preferred to
> forcing them to mess about with alternate options.

Fragmentation should be an ok eventuality for some traffic, but there are
a couple of points that make it more painful than it might seem:

1. Encapsulated traffic (such as most VPNs - GRE, IPsec, etc.) often
results in packets that subsequently need to be fragmented. That
typically yields lots of 1500-byte packets followed by 80-byte packets.

2. I really don't know how NAPT routers deal with fragments. These guys
depend on the port information in a packet to reliably determine the
target of inbound traffic. But there is no port information in anything
other than fragment 1. When they receive a frag other than 1 they don't
definitely know who to deliver it to. They have to either guess or drop
the packet. Ugh, in both cases.

(And note that frag 1 often is not the first fragment to arrive at
downstream nodes. In my example in (1), frequently frag 2 will reach
places before frag 1 does (if any router along the path reorders its
transmit queue based on packet size).)
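To make the figures in point (1) concrete, here is a minimal sketch of where the 1500/80 pattern comes from. It assumes a 20-byte IPv4 header with no options and a hypothetical 60 bytes of encapsulation overhead (a number picked only so the result matches the sizes quoted above); the function name is made up for illustration.

```python
def fragment_sizes(total_len, mtu, header_len=20):
    """Total length of each fragment when a router cuts right at the MTU."""
    if total_len <= mtu:
        return [total_len]
    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload octets.
    chunk = (mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        take = min(chunk, payload)
        sizes.append(header_len + take)
        payload -= take
    return sizes


ENCAP_OVERHEAD = 60  # hypothetical tunnel overhead, for illustration only

# A full-size 1500-byte packet grows to 1560 bytes inside the tunnel and
# then has to cross an ordinary 1500-byte link.
print(fragment_sizes(1500 + ENCAP_OVERHEAD, 1500))  # [1500, 80]
```

Only the first of those two fragments carries the TCP or UDP header, which is the root of the NAPT problem in point (2).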

Tony Rall

Tony Rall wrote:

<snipped>

> (And note that frag 1 often is not the first fragment to arrive at
> downstream nodes. In my example in (1), frequently frag 2 will reach
> places before frag 1 does (if any router along the path reorders its
> transmit queue based on packet size).)

I agree with all I have snipped.
I was wondering, would it not be wiser for fraggers to frag in half instead of just at the overflow?

For instance, suppose a router has to fragment a 1500-byte packet to go over 1476 GRE. Instead of having a big packet/little fragment, why not just divide in half?
This would give them more equal buffer treatment, but an even bigger potential win is to avoid perhaps a second (maybe IPsec?) fragmenting later on down the pipe.

Once you are going to do it, do it right. It is not as if you're decreasing header overhead by producing small fragment packets. And I am assuming the whole packet is already in the buffer when it comes time to fragment it.

There's 2 cases here:

1) This is the final frag on the path - if PMTUD is in use, we want to frag
right at the overflow so the connection can use the max (so if we're fragging
from 1500 down to 1410, they end up with 1410 rather than 750).

2) There's an even more restrictive frag further downstream. We frag from 1500
to 1460, and somebody else frags from 1460 down to 1410. If you frag at overflow,
you end up with a PMTU of 1410. If you fragged it in half, you avoid the second
frag but end up with a PMTU of 750.

After several dozen packets, the difference between 750 and 1410 will start to become
noticeable.....

That's not how PMTUD works. If DF is set, you discard the packet and
report back with ICMP. If DF is not set, you frag the packet - but
that's not PMTUD, because no report ever goes back to the sender.
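The distinction drawn above can be sketched as a forwarding decision. This is only an illustration: the function and the action names are hypothetical, and the only protocol detail assumed is that the "fragmentation needed" message (ICMP type 3, code 4) reports the next-hop MTU, as described in RFC 1191.

```python
def forward(packet_len, df_set, next_hop_mtu):
    """Return (action, reported_mtu) for a packet heading to the next hop."""
    if packet_len <= next_hop_mtu:
        return ("forward", None)
    if df_set:
        # PMTUD case: discard the packet and tell the sender which MTU would
        # have fit (ICMP type 3, code 4 carries the next-hop MTU).
        return ("drop_and_send_icmp_frag_needed", next_hop_mtu)
    # DF clear: fragment silently. Nothing goes back to the sender, so the
    # sender learns nothing about the path MTU.
    return ("fragment", None)


print(forward(1500, True, 1476))
print(forward(1500, False, 1476))
```

Note that the way the router chooses to cut the packet only matters in the second branch; in the first branch no fragments exist at all.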

Barney Wolff wrote:

> That's not how PMTUD works. If DF is set, you discard the packet and
> report back with ICMP. If DF is not set, you frag the packet - but
> that's not PMTUD, because no report ever goes back to the sender.

Probably better to say that in this day and age PMTUD doesn't work, and the best interoperability feature of IP, fragmentation, is becoming useless as the Internet remains pegged to a 1500 MTU everywhere, with evil hacks everywhere else to keep things working (MSS adjustment/clamping, DF bit clearing).
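For readers who have not met the MSS hack mentioned above, here is a rough sketch of the arithmetic behind clamping. It assumes IPv4 and TCP headers of 20 bytes each with no options; the function name is hypothetical, and the 1476 figure is the GRE MTU used elsewhere in this thread.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def clamped_mss(advertised_mss, path_mtu):
    """MSS a middlebox would write into a passing SYN so that full-size
    segments still fit in path_mtu without fragmentation."""
    return min(advertised_mss, path_mtu - IPV4_HEADER - TCP_HEADER)


# A host on Ethernet advertises 1460; the tunnel in the path only fits 1476.
print(clamped_mss(1460, 1476))  # 1436
```

This only helps TCP, and only when the device doing the rewriting sees the SYN, which is part of why it counts as a hack rather than a fix.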

Is there any discussion on better alternatives to PMTUD, such as leaving off DF and adding a new, rate-limited ICMP subtype to inform senders that they've been fragged and at what size (call it reverse PMTUD)? Or how about a new TCP option (call it MSSr/s, maximum segment size sent/received) for the receiver to tell the sender if packet sizes are less than expected/fragged (again with DF off)?

Does IPv6 really do away with fragmenting? Is there any current discussion on all this?

I see I have to go do some research.

Oh, so we compute ONE number if DF is set, saying what number we think they
should use - but if DF *isn't* set, we use a different number. Sounds like more
complicated code that's just there so it can sink its teeth into the rump of the
first banana-eating NOC dweller that has to figure out what's wrong....

Unless of course there's a *reason* we want it different? Though it escapes me what
it might be....

Joe Maimon wrote:

> I agree with all I have snipped.
> I was wondering, would it not be wiser for fraggers to frag in half
> instead of just at the overflow?
>
> For instance, suppose a router has to fragment a 1500-byte packet to go
> over 1476 GRE. Instead of having a big packet/little fragment, why not
> just divide in half?
> This would give them more equal buffer treatment, but an even bigger
> potential win is to avoid perhaps a second (maybe IPsec?) fragmenting
> later on down the pipe.
>
> Once you are going to do it, do it right. It is not as if you're
> decreasing header overhead by producing small fragment packets. And I am
> assuming the whole packet is already in the buffer when it comes time to
> fragment it.

Programmers are lazy.

Exercise for the reader:

Devise an algorithm that will take an arbitrarily sized packet 20-65535
octets and an arbitrarily sized MTU, > 576 octets, and split the
packet into the minimum number of "n" fragments where each fragment is
(1) less than the MTU, (2) no two fragments differ by more than 8 octets,
and the fragments obey the IP fragmentation rules, (3) data payload must
end on an 8-octet boundary for all but the last fragment and (4) each
fragment has an exact copy of the original header except for differences
in the fragmentation fields and checksum.

Compare to the algorithm of cutting the data in to "m" (mtu - ip_hl)-
chunks and putting the leftovers into the final fragment.
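One possible answer to the exercise, next to the simple algorithm it is compared with, is sketched below. It is not taken from any router implementation. It assumes a fixed 20-byte header with no options, reads condition (1) as "no larger than the MTU", and returns only the total length of each fragment rather than building real headers. Both function names are made up.

```python
def overflow_fragments(total_len, mtu, header_len=20):
    """Cut full-size chunks and put the leftovers in the final fragment."""
    if total_len <= mtu:
        return [total_len]
    payload = total_len - header_len
    chunk = (mtu - header_len) // 8 * 8  # largest 8-octet-aligned payload
    sizes = []
    while payload > 0:
        take = min(chunk, payload)
        sizes.append(header_len + take)
        payload -= take
    return sizes


def balanced_fragments(total_len, mtu, header_len=20):
    """Same number of fragments, but sized within 8 octets of each other."""
    if total_len <= mtu:
        return [total_len]
    payload = total_len - header_len
    chunk = (mtu - header_len) // 8 * 8
    n = -(-payload // chunk)        # minimum fragment count (ceiling division)
    units = -(-payload // 8)        # payload in 8-octet units, rounded up
    pad = units * 8 - payload       # octets missing from the final unit
    base, extra = divmod(units, n)
    # Put the larger fragments last so the short final unit cannot push the
    # spread between any two fragments beyond 8 octets.
    per_frag = [base] * (n - extra) + [base + 1] * extra
    sizes = [header_len + u * 8 for u in per_frag]
    sizes[-1] -= pad
    return sizes


print(overflow_fragments(1500, 1476))  # [1476, 44]
print(balanced_fragments(1500, 1476))  # [756, 764]
```

The balanced version is a few lines longer and needs the total length before the first fragment can be sent, which is the crux of the "lazy programmer" argument. Whether the extra arithmetic matters in a forwarding path is a separate question this sketch does not answer.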

As I have said previously, some reasons are that
A) You're fragmenting the packet anyway, thus there will be extra header overhead. Splitting that overhead into 1 big and 1 small packet does not seem to be a performance win**.
B) Fragmenting into equal sizes may mean that equipment can treat them more equally and may reduce out-of-order fragments, which is easier on state-keeping devices.
C) Equal buffer treatment may mean easier handling of switching and reassembly; I haven't thought this through.
D) And the best part: avoid adding insult to injury by lessening the chance that further fragmentation will occur on the packet. Picture a packet coming in from ATM to Ethernet to PPPoE through IPsec. How many fragments is that? How much overhead?

As far as code goes, how is that a problem? One assumes the length of the packet is there already. So all we have to do is divide it in half and use that number instead of the value of next_hop_mtu.

And we use different numbers because when DF is set our only option is telling the sender to lower its packet size. Lower to what? Well, to what we know is good. How do we know the next hop isn't even lower? Well, we should know if it's in the same AS; otherwise we just do our best. And besides, PMTUD is a performance-oriented feature. One would like to avoid compromising the performance gains. The precise maximum path MTU is exactly what the sender wants to find out. So give it.

But IP without DF is best attempt delivery. So do whatever will be best compromise. And we are fragmenting anyway... (GOTO START)

**But, one case where this could be undesired is by causing buffer fragmentation.

Crist Clark wrote:

> Programmers are lazy.
>
> Exercise for the reader:
>
> Devise an algorithm that will take an arbitrarily sized packet 20-65535
> octets and an arbitrarily sized MTU, > 576 octets, and split the
> packet into the minimum number of "n" fragments where each fragment is
> (1) less than the MTU, (2) no two fragments differ by more than 8 octets,
> and the fragments obey the IP fragmentation rules, (3) data payload must
> end on an 8-octet boundary for all but the last fragment and (4) each
> fragment has an exact copy of the original header except for differences
> in the fragmentation fields and checksum.
>
> Compare to the algorithm of cutting the data in to "m" (mtu - ip_hl)-
> chunks and putting the leftovers into the final fragment.

How about only going to the bother if 'n' would only be 2 in either algorithm? That should keep things nice and simple for all the lazy programmers.

And we wonder why there are so many security holes.

As for the rest, I do not see the real difference. And now I will shut up about implementation until (if ever) I can write some.

Crist Clark wrote:

> Programmers are lazy.
>
> Exercise for the reader:
>
> Devise an algorithm that will take an arbitrarily sized packet 20-65535
> octets and an arbitrarily sized MTU, > 576 octets, and split the
> packet into the minimum number of "n" fragments where each fragment is
> (1) less than the MTU, (2) no two fragments differ by more than 8 octets,
> and the fragments obey the IP fragmentation rules, (3) data payload must
> end on an 8-octet boundary for all but the last fragment and (4) each
> fragment has an exact copy of the original header except for differences
> in the fragmentation fields and checksum.
>
> Compare to the algorithm of cutting the data in to "m" (mtu - ip_hl)-
> chunks and putting the leftovers into the final fragment.

I've got to jump in and display my considerable ignorance here.

Are there not machines in service now that start blatting bits out
(when able) before the whole packet has been received?

Given that to be correct, it would seem to be Really Hard to whack up
a packet into equal-sized chunks (given that it is otherwise a Good
Thing To Do) on-the-fly. Lazy programmers (which I have long taught
are the Best Kind) will blat out bits until the buffer is full,
start a new buffer, rinse, lather, repeat until the input buffer
is exhausted.

Where did I go into the ditch?

Laurence F. Sheldon, Jr. wrote:

> I've got to jump in and display my considerable ignorance here.
>
> Are there not machines in service now that start blatting bits out
> (when able) before the whole packet has been received?
>
> Given that to be correct, it would seem to be Really Hard to whack up
> a packet into equal-sized chunks (given that it is otherwise a Good
> Thing To Do) on-the-fly. Lazy programmers (which I have long taught
> are the Best Kind) will blat out bits until the buffer is full,
> start a new buffer, rinse, lather, repeat until the input buffer
> is exhausted.
>
> Where did I go into the ditch?

Maybe because IP tells the length of the packet up front?

Then re-read RFC 792:

   The ICMP messages typically report errors in the processing of
   datagrams. To avoid the infinite regress of messages about messages
   etc., no ICMP messages are sent about ICMP messages. Also ICMP
   messages are only sent about errors in handling fragment zero of
   fragemented datagrams. (Fragment zero has the fragment offeset equal
   zero).

and think which is more likely to silently fail - fragmenting a 1500 byte
packet 750/750, or 1430/70, if the *second* frag is in possible danger
of being lost due to buffer congestion or similar?

Joe Maimon wrote:

> Tony Rall wrote:
> <snipped>

<snipped>

> I was wondering would it not be wiser for fraggers to frag in half instead of just the overflow?

<snip>

I noticed today this URL

Interesting part down on the page:

          Uniform Fragmentation

Packets are fragmented into equally sized units to prevent further downstream fragmentation.