Strange public traceroutes return private RFC1918 addresses

> A more important question is what will happen as we move out
> of the 1500 byte Ethernet world into the jumbo gigE world. It's
> only a matter of time before end users will be running gigE
> networks and want to use jumbo MTUs on their Internet links.

The performance gain achieved by using jumbo frames outside of very
specific LAN scenarios is highly questionable, and they're still not
standardized. Are "jumbo" Internet MTUs seen as a pressing issue by
ISPs and vendors these days?

-Terry

  for some, yes. running 1ge is fairly common and 10ge is
  maturing. bleeding edge 40ge is available ... and 1500byte
  mtu is -not- an option.

--bill

bill wrote:

for some, yes. running 1ge is fairly common and 10ge is
maturing. bleeding edge 40ge is available ... and 1500byte
mtu is -not- an option.

Me wonders why people ask for 40 byte packets at linerate if the mtu is
supposedly larger?

Pete

In a message written on Tue, Feb 03, 2004 at 08:15:13AM -0600, Terry Baranski wrote:

The performance gain achieved by using jumbo frames outside of very
specific LAN scenarios is highly questionable, and they're still not
standardized. Are "jumbo" Internet MTUs seen as a pressing issue by
ISPs and vendors these days?

While the rate of request is still very low, I would say we get
more and more requests for jumbo frames every day. The pressing
application today is "larger" frames; that is, don't think of two hosts
talking 9000 MTU frames to each other, but rather think of IPSec or
other tunneling boxes talking 1600 byte packets to each other so
they don't have to split 1500 byte Ethernet packets in half. Since
most POS is 4470, adding a jumbo frame GigE edge makes this application
work much more efficiently, even if it doesn't enable jumbo (9k)
frames end to end. The interesting thing here is that it means there
absolutely is a PMTU issue: a 9K edge with a 4470 core.
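
In rough numbers (a Python sketch; the per-tunnel overheads are typical
values, and the ESP figure varies with cipher and padding):

    INNER = 1500                    # the inner Ethernet-sized packet
    OVERHEAD = {
        "IP-in-IP": 20,             # one extra IPv4 header
        "GRE": 24,                  # outer IPv4 plus 4 byte GRE header
        "IPsec ESP (approx.)": 56,  # outer IPv4 plus ESP header/IV/pad/ICV
    }
    for name, extra in OVERHEAD.items():
        print(f"{name}: an edge MTU of {INNER + extra} or more avoids "
              f"splitting {INNER} byte packets")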

There is also a lot of work going on in academic networks that uses
jumbo frames. I suspect in a few more years this will make it into
more common applications.

In a message written on Tue, Feb 03, 2004 at 04:40:15PM +0200, Petri Helenius wrote:

Me wonders why people ask for 40 byte packets at linerate if the mtu is
supposedly larger?

This is a problem that is going to get worse. If you support IP you
have to support a 40 byte packet. As long as that exists, DDOS
tools will use 40 byte packets, knowing more lookups are harder on
the software/hardware in routers. At the same time I suspect software
is going to continue to slowly move to larger and larger packets,
because at the higher data rates (eg 40 gige) it makes a huge difference
in host usage. You can fit 6 times the data in a 9K packet that
you can in a 1500 byte packet, which means 1/6th the interrupts, DMA
transfers, ACL checks, etc, etc, etc.
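
The packet-rate side of that is easy to put numbers on (a Python sketch;
line rate only, Ethernet framing and preamble ignored):

    LINK = 40e9  # 40 gige, in bits per second
    for size in (40, 1500, 9000):
        mpps = LINK / (size * 8) / 1e6
        print(f"{size:5d} byte packets at line rate: {mpps:8.2f} Mpps")

The step from 1500 to 9000 bytes is exactly the factor of 6 above, and the
40 byte case shows why small packets are the worst-case spec.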

* pete@he.iki.fi (Petri Helenius) [Tue 03 Feb 2004, 15:42 CET]:

Me wonders why people ask for 40 byte packets at linerate if the mtu
is supposedly larger?

Support for the worst-case scenario. The same reason you spec support for
a BIGINT-line ACL without excessive impact on forwarding capacity.

  -- Niels.

Niels Bakker wrote:

* pete@he.iki.fi (Petri Helenius) [Tue 03 Feb 2004, 15:42 CET]:

Me wonders why people ask for 40 byte packets at linerate if the mtu
is supposedly larger?
   
Support for the worst-case scenario. The same reason you spec support for
a BIGINT-line ACL without excessive impact on forwarding capacity.

Why large MTU then? Most modern ethernet controllers don't care if you're
sending 1500 or 9000 byte packets (with proper drivers taking advantage of
the features there). If you're paying for 40 byte packets anyway, there is
no incentive to ever go beyond 1500 byte MTU.

Pete

Leo Bicknell wrote:

because at the higher data rates (eg 40 gige) it makes a huge difference
in host usage. You can fit 6 times the data in a 9K packet that
you can in a 1500 byte packet, which means 1/6th the interrupts, DMA
transfers, ACL checks, etc, etc, etc.

This is wrong. Interrupt moderation has been there for quite a while; DMA
is chained and predictive.

ACL checks I can agree on, but if you are optimizing the system, what do
you need ACLs for anyway, when you can make the applications secure in the
first place?

Pete

In a message written on Tue, Feb 03, 2004 at 08:40:22PM +0200, Petri Helenius wrote:

If you're paying for 40 byte packets anyway, there is no incentive to
ever go beyond 1500 byte MTU.

With a 20 byte IP header:

A 40 byte packet is 50% data.

A 1500 byte packet is 98.7% data.

A 9000 byte packet is 99.7% data.

Anyone who pays by the bit should like large packets better than
small packets, as you pay for less "overhead" bandwidth.

Note that a 1500 byte IP in IP packet becomes 1520, and then gets
fragmented to 1500 and a 40 byte packet (20 data, 20 header). That's
only 97.3% efficient, whereas a single 1520 byte packet, if it
could be carried, is 98.7% efficient.

Obviously we're talking in small numbers, but to a lot of VPN vendors
a 1.4% improvement in bandwidth usage, bus usage, or avoiding the
path through the device that fragments a packet in the first place
is a big win.
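
Checking the arithmetic with a quick Python sketch (20 byte IP header
assumed; the percentages above round these slightly):

    def data_share(size, header=20):
        return (size - header) / size

    for size in (40, 1500, 9000):
        print(f"{size:5d} byte packet: {data_share(size):.2%} data")

    # IP in IP: 1500 grows to 1520; over a 1500 MTU link it fragments
    # into 1500 + 40 = 1540 bytes of wire traffic for the same packet.
    print(f"fragmented: {1500 / 1540:.2%}   single 1520: {1500 / 1520:.2%}")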

Why large MTU then? Most modern ethernet controllers don't care if you're
sending 1500 or 9000 byte packets (with proper drivers taking advantage of
the features there). If you're paying for 40 byte packets anyway, there is
no incentive to ever go beyond 1500 byte MTU.

I think it's partially due to removal of overhead and improvements you get
out of TCP (bearing in mind it uses windowing and slow start).

A bit of data on this link that I googled up:

http://www-iepm.slac.stanford.edu/monitoring/bulk/10ge/20030303/tests.html
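
As a rough illustration of the slow start point (a Python sketch; the 1 MB
target and the 40 bytes of TCP/IP headers are assumptions):

    import math

    TARGET = 1 << 20  # say we want ~1 MB in flight
    for mss in (1460, 8960):  # MSS for a 1500 / 9000 byte MTU
        rtts = math.ceil(math.log2(TARGET / mss))
        print(f"MSS {mss}: about {rtts} RTTs of slow start to reach {TARGET} bytes")

Slow start roughly doubles the window every round trip starting from about
one MSS, so the larger MTU gets there several RTTs sooner.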

Leo Bicknell wrote:

because at the higher data rates (eg 40 gige) it makes a huge difference
in host usage. You can fit 6 times the data in a 9K packet that you
can in a 1500 byte packet, which means 1/6th the interrupts, DMA
transfers, ACL checks, etc, etc, etc.

* pete@he.iki.fi (Petri Helenius) [Tue 03 Feb 2004, 19:47 CET]:

This is wrong. Interrupt moderation has been there for quite a while,
DMA is chained and predictive.

Just like the extra chopping up of the data you want to send into more
packets, these are things you have to do a few extra times. That takes
time. There is no way around this. What Leo wrote is in no way wrong.

ACL checks I can agree on, but if you are optimizing the system, what
do you need ACLs for anyway, when you can make the applications
secure in the first place?

You're trolling, right?

  -- Niels.

Stephen J. Wilcox wrote:

Why large MTU then? Most modern ethernet controllers don't care if you're
sending 1500 or 9000 byte packets (with proper drivers taking advantage of
the features there). If you're paying for 40 byte packets anyway, there is
no incentive to ever go beyond 1500 byte MTU.
   
I think it's partially due to removal of overhead and improvements you get out of TCP (bearing in mind it uses windowing and slow start).

Sure, if you control both endpoints. If you don't, and receivers have
small (4k, 8k or 16k) window sizes, your performance will suffer.
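
In rough numbers the receive window caps throughput at window/RTT no
matter what the MTU is (a Python sketch; the 50 ms RTT is an assumed
wide-area figure):

    RTT = 0.05  # assumed 50 ms round trip
    for window in (4096, 8192, 16384):
        mbps = window * 8 / RTT / 1e6
        print(f"{window:5d} byte window: at most {mbps:.2f} Mbit/s, any MTU")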

Maybe we should define if we're talking about record breaking attempts
or real operationally useful things here.

Pete

Niels Bakker wrote:

Just like the extra chopping up of the data you want to send into more
packets, it's things you have to do a few extra times. That takes time.
There is no way around this. What Leo wrote is in no way wrong.

Maybe we need to define what the expression "huge difference" means in this
context. Previously it has been pegged at a 1.4% difference, which in my
opinion qualifies as overstatement of the day.

If we were talking about a 20% or larger difference here, the pain from a
larger MTU might be tolerable.

ACL checks I can agree on, but if you are optimizing the system, what
do you need ACLs for anyway, when you can make the applications
secure in the first place?
   
You're trolling, right?

No. I'll trust my digital signatures over the source IP filters any day.

Pete

In a message written on Tue, Feb 03, 2004 at 09:53:30PM +0200, Petri Helenius wrote:

Sure, if you control both endpoints. If you don't, and receivers have
small (4k, 8k or 16k) window sizes, your performance will suffer.

Maybe we should define if we're talking about record breaking attempts
or real operationally useful things here.

Google and Akamai are just two examples of companies with hundreds
of thousands of machines where they move large amounts of data
between them and have control of both ends. Many corporations are
now moving off-site backup data over the Internet, in large volumes
between two end points they control.

The Internet is not just web servers feeding dial-up clients.

By definition of this discussion about using large MTUs, we are assuming
that packets are arriving at >1500 bytes, and therefore that we do have
control of the endpoints and they are set to use jumbos.

Steve

Leo Bicknell wrote:

Google and Akamai are just two examples of companies with hundreds
of thousands of machines where they move large amounts of data
between them and have control of both ends. Many corporations are
now moving off-site backup data over the Internet, in large volumes
between two end points they control.

Makes me wonder if either of the companies mentioned wants to take on the
operational and support burden of increasing the MTU across maybe the most
diverse set of paths in any environment. If I were either of them, I would
probably never send even a 1500 byte packet, but live somewhere in the
low-1400 range.

Pete

Leo Bicknell wrote:

Since most POS is 4470, adding a jumbo frame GigE edge makes
this application work much more efficiently, even if it doesn't
enable jumbo (9k) frames end to end. The interesting thing
here is that it means there absolutely is a PMTU issue: a 9K edge
with a 4470 core.

This brings up the question of what other MTUs are common on the
Internet, as well as which ones are simply defaults (i.e., could easily
be increased) and which ones are the result of device/protocol
limitations.

And why 4470 for POS? Did everyone borrow a vendor's FDDI-like default,
or is there a technical reason? PPP seems able to use 64k packets (as,
incidentally, can the frame-based version of GFP, POS's likely
replacement).

-Terry

9k isn't an absolute necessity, especially for x86. I believe the
original reason for 9k, as picked by Alteon, was to support the 8192 byte
page size on the Alpha. As long as there is enough room to squeeze in an
x86 memory page (4096 bytes of payload) plus some room for headers, the
important goal of jumbo frames (which is NOT to lower the packet/sec
count; that is only a mild by-product for those who are still doing things
wrong) is achieved. This would also eliminate the problem of IPSec, GRE,
and other forms of tunneling, which may or may not be applied, breaking
things where PMTUD is blocked, since the "standard" payload packet for TCP
would only be 4136 octets (leaving plenty of room for other "stuff").
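
The header math behind that (a Python sketch; 20 byte IP and TCP headers
with no options assumed):

    PAGE = 4096            # one x86 memory page of payload
    IP_HDR = TCP_HDR = 20  # header sizes without options
    packet = PAGE + IP_HDR + TCP_HDR
    print(packet)          # 4136, the "standard" TCP payload packet
    print(4470 - packet)   # 334 bytes of headroom under a 4470 POS MTU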

The 4470 MTU of POS meets this requirement perfectly, and the world of end
to end connectivity would be an infinitely better place if everyone could
expect to pass 4470 through the Internet. But alas, there are probably too
many people running GigE in the core that doesn't support jumbo frames,
let alone a standardized jumbo frame size (thanks to various vendor
hijinks), to truly make use of POS's MTU these days.