Jumbo frame question

Hi

Does anyone have experience designing / implementing a jumbo-frame-enabled
network?

I am working on a project to better utilize a fiber link between the east
coast and the west coast, built on Juniper devices.

Based on the default TCP windows in Linux / Windows, the ~80ms latency
between the east coast and the west coast, and the default MTU of 1500, the
maximum throughput of a single TCP session is around ~3Mbps, which is too
slow for backing up the huge amount of data between the two sites.
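(For reference, the arithmetic behind that figure, assuming roughly a 32 KB
effective window; your stack's defaults may differ:

irb(main):001:0> 32768 * 8 / 0.080          # window bits / RTT = throughput
=> 3276800.0
irb(main):002:0> 1_000_000_000 * 0.080 / 8  # window needed to fill 1 Gb/s
=> 10000000.0

i.e. ~3.3 Mbps per session, and you'd need a ~10 MB window to fill a gig.)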

The following is the topology that we are using right now:

Host A NIC (MTU 9000)
  <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216)
  <--- GigLAN ---> (MTU 9018) J-6350 cluster A (MTU 9018)
  <--- fiber link across sites ---> (MTU 9018) J-6350 cluster B (MTU 9018)
  <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216)
  <--- GigLAN ---> (MTU 9000) NIC Host B

I was testing connectivity from Host A to J-6350 cluster A using ICMP ping
with size 8000 and the DF bit set, but the ping failed.
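(On Linux the test was something like

    ping -M do -s 8000 <cluster-A-address>

where -M do sets the DF bit and -s is the ICMP payload size, so the IP
packet on the wire is 8000 + 28 = 8028 bytes.)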

Does anyone have experience with this? Please advise.

Thanks :)

TCP maximum window sizes.

Application socket buffer sizes.

Fix those and re-test!
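On Linux, something like the following (a sketch; the 16 MB ceilings are
illustrative, sized well past the ~10 MB bandwidth-delay product of
1 Gb/s x 80 ms):

    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"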

Adrian

This helps tons.

speedguide.net has some registry 'tweaks' for different versions of Windows.

Also, Win7 has the ability to turn on a FAST TCP-style congestion management algorithm called Compound TCP. I haven't tried the Windows version, so ymmv, but I have had great success changing the congestion avoidance algorithm on other devices.
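On Linux the analogous knob is a sysctl; a sketch (htcp is just one
example, and what's available depends on your kernel modules):

    sysctl net.ipv4.tcp_available_congestion_control    # what the kernel offers
    sysctl -w net.ipv4.tcp_congestion_control=htcp      # pick per your testing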

-wil

You might want to read this:
http://kb.pert.geant.net/PERTKB/JumboMTU

-Hank

> Does anyone have experience designing / implementing a jumbo-frame-enabled
> network?
>
> [...] Based on the default TCP windows in Linux / Windows, the ~80ms
> latency between the east coast and the west coast, and the default MTU of
> 1500, the maximum throughput of a single TCP session is around ~3Mbps,
> which is too slow for backing up the huge amount of data between the two
> sites.

There are a lot of stack tweaks you can make, but the real answer is a
larger MTU in addition to those tweaks. Our network is completely
9000 MTU internally. We don't deploy any servers anymore with MTU 1500.
MTU 1500 is just plain stupid with any network >100Mb Ethernet.
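(On a Linux host that's a one-liner, assuming the NIC driver and every
switchport in the path support it; eth0 here is a placeholder:

    ip link set dev eth0 mtu 9000
)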

Where would the world be if we weren't stuck at 1500 MTU? I've always kind of
wondered: what if it had been larger from the start....

We keep getting faster switchports, but the MTU is still 1500! I'm sure
someone has done testing comparing a 10/100 switch with jumbo frames enabled
against a 10/100/1000 switch using the regular 1500 MTU.....

1500 MTU made sense when networks were 10 megabit/s.

Now that we have gig and 10GE (and soon general availability of 100GE), I
don't understand why 9000 makes people excited. If we're going to make a
serious effort towards a larger MTU, let's make it 150000 (100x), or at
least 64k.

A 6x size difference isn't that much, and it's going to involve a lot of
work to make it happen, so if we're going to do that work, let's do it
properly.

Is there anyone on the list from facebook? Email me directly please.

George Roettger

I have the "opposite problem". I use iperf to test WAN and VPN
throughput and packet loss, but find that the sending Linux system
starts out with the expected MTU / MSS and then ramps the packet
size up to way beyond 1500. The result is that network equipment must
fragment the packets. On higher-bandwidth circuits there are a lot of
re-transmits that mask any real packet loss that might exist in the
path.

I have tried multiple methods to clamp the MTU, but nothing has worked
so far. This leads me to wonder how often real bulk transfer
applications start using jumbo packets that just end up getting
fragmented downstream.
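For concreteness, the kind of methods I mean (illustrative Linux commands
only; the addresses are made up):

    ip route replace 192.0.2.0/24 via 198.51.100.1 mtu lock 1500
    iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN \
        -j TCPMSS --clamp-mss-to-pmtu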

The jumbo packets from iperf occur on various versions of the Linux
kernel and different distributions. It might only happen on GigE.

Suggestions on clamping the MTU are welcome.

Thanks,

Jon

Jon,

Do you have something blocking Path MTU Discovery? Unless I'm off base on this, that should be taking care of your issue.
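(A quick way to check from the Linux side is tracepath, which reports the
discovered PMTU hop by hop:

    tracepath <remote-host>
)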

-Richard

Hey George,

> 9000 MTU internally. We don't deploy any servers anymore with MTU 1500.
> MTU 1500 is just plain stupid with any network >100Mb Ethernet.

I'm a big proponent of high MTU backbones, to preserve a user MTU of 1500
while adding, say, GRE or IPsec overhead. But calling it plain stupid to run
an MTU of 1500 is quite the overstatement.

irb(main):001:0> 1460.0/(38+1500)   # TCP payload / (frame + 38B Ethernet framing overhead)
=> 0.949284785435631
irb(main):002:0> 8960.0/(38+9000)   # 8960 = 9000 - 40B of TCP/IP headers
=> 0.991369772073468
irb(main):003:0>

You are theoretically winning 4.2%, and only internally in your network, so
maybe you'll be able to capitalize on that 4.2% for backup traffic or so.
Doesn't seem like that critical a win, to be honest.

That's only half the calculation. The *other* half is if you have gear that
has a packets-per-second limit: go to 9000 MTU and you can move 6 times as
much data at the same packets-per-second. Anybody who's ever had to trim a
complicated ACL because it saturated the CPU knows what I mean.
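In the same irb spirit, at line-rate GigE (counting 38 bytes of framing
overhead per packet):

irb(main):001:0> 1_000_000_000 / ((1500 + 38) * 8)
=> 81274
irb(main):002:0> 1_000_000_000 / ((9000 + 38) * 8)
=> 13830

That's roughly the 6x drop in packets-per-second for the same bit rate.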

Academically speaking, it's an interesting topic. Of course the actual time
to copy a packet is not constant, so you are not going to see a linear
increase in bandwidth. It would be very nice to see a graph of, say, a VXR
with an ACL long enough to cap the 1500B rate very low, and then results for
packet sizes of 3000, 6000, and 9000.
If this is something you regularly need to consider operationally, do you
happen to have such numbers, and if not, would it be too much work for you
to provide them?

In my world, we've been running hardware lookup engines since 2003, so we
really don't need to care about features affecting lookup speed.

> 1500 MTU made sense when networks were 10 megabit/s.
>
> Now that we have gig and 10GE (and soon general availability of 100GE), I
> don't understand why 9000 makes people excited. If we're going to make a
> serious effort towards a larger MTU, let's make it 150000 (100x), or at
> least 64k.

the reason ieee has not allowed upping of the frame size is that the crc
is at the prudent limits at 1500. yes, we do another check above the
frame (uh, well, udp4 may not), but the ether spec can not count on
that.

randy


The CRC loses its effectiveness at around 12K bytes, so yeah, 64K bytes
would probably require a change to detect all possible double-bit errors.
But 9K bytes is still within the effective range of the current CRC
algorithm.

From Dykstra:

"'Jumbo frames' extends ethernet to 9000 bytes. Why 9000? First because
ethernet uses a 32 bit CRC that loses its effectiveness above about
12000 bytes. And secondly, 9000 was large enough to carry an 8 KB
application datagram (e.g. NFS) plus packet header overhead. Is 9000
bytes enough? It's a lot better than 1500, but for pure performance
reasons there is little reason to stop there. At 64 KB we reach the
limit of an IPv4 datagram, while IPv6 allows for packets up to 4 GB in
size. For ethernet however, the 32 bit CRC limit is hard to change, so
don't expect to see ethernet frame sizes above 9000 bytes anytime soon."

But it actually washes: with a larger packet size you have fewer packets,
so while you might have a higher "false pass" rate per packet on the larger
packets, the false pass rate for a given amount of data is virtually
unchanged.

http://staff.psc.edu/mathis/MTU/arguments.html#crc

<http://staff.psc.edu/mathis/MTU/arguments.html#crc> seems to disagree?

10/100 switches and NICs pretty much universally do not support jumbos.

Joel's widget number 2

I wasn't there, but I paid some attention to the discussions of
jumbos when they would frequently pop up on comp.dcom.lans.ethernet.
Rich Seifert, who was involved, would jeer at jumbos and point out the
potential problems. A search in that group for his name and jumbo
frames should bring up some useful background.

In a nutshell, as I recall, one of the prime motivating factors for not
standardizing jumbos was interoperability with the installed base, which
would penalize other parts of the network (e.g. routers having to perform
fragmentation) for the benefit of a select few (e.g. modern
server-to-server comms).

I also seem to recall Rich once said something to the effect that it might
have been nice if larger frames had been supported at the onset of
Ethernet's initial development, but alas, such is life and it's simply too
late now. The "installed base defeats us".

John

Given that IPv6 doesn't support routers performing fragmentation, and many packets are sent with the DF bit set anyway, standardized jumbos would be nice. Just because the Internet as a whole may not support them, and ethernet cards may not exceed 1500 by default, doesn't mean a standard shouldn't be written for those instances where jumbo frames are desired.

Let's be honest, there are huge deployments of baby giants out there. Verizon, for one, requires 1600-byte support for cell towers (tested at 1600 bytes for them, so slightly larger for transport gear, depending on what wrappers are placed over that). None of this implies larger-than-1500-byte IP, but it does imply a larger L2 MTU.

There are many in-house setups which use jumbo frames, and having a standard for interoperability of those devices would be welcome. I'd personally love to see standards across the board for MTU, from logical to physical, supporting even tiered MTU with future-proof overheads for vlans, mpls, and ppp, intermixed in a large number of ways and layers (IP MTU support for X sizes, overhead support for Y sizes).

Jack

The level of undetected errors under TCP or UDP checksums can be high. The summation scheme is remarkably vulnerable to bus-related bit errors: as much as 2% of parallel-bus bit errors might go undetected. Use of SCTP, TLS, or IPsec can supplant the weak TCP/UDP summation error detection. While jumbo frames reduce the serial error detection rate of the IEEE CRC (restored for jumbo frames by SCTP's CRC32c), serial detection is less of a concern than bus-related bit error detection. CRC32c addresses both the bus and jumbo-frame error detection problems and is found in 10Gb/s NICs and math coprocessors.
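A quick sketch of how weak the summation scheme is (Ruby, with a
hypothetical 4-byte payload; offsetting errors cancel in the
ones'-complement sum but not in a CRC):

require 'zlib'

# 16-bit ones'-complement sum, as used by the TCP/UDP checksum
def inet_checksum(bytes)
  sum = bytes.each_slice(2).sum { |hi, lo| (hi << 8) | (lo || 0) }
  sum = (sum & 0xffff) + (sum >> 16) while sum > 0xffff
  ~sum & 0xffff
end

good = [0x12, 0x34, 0x56, 0x78]
bad  = [0x12, 0x35, 0x56, 0x77]   # +1 in one 16-bit word, -1 in another

inet_checksum(good) == inet_checksum(bad)                  # => true: undetected
Zlib.crc32(good.pack('C*')) == Zlib.crc32(bad.pack('C*'))  # => false: caught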

-Doug