Jumbo frame Question

From: Harris Hui <harris.hui@hk1.ibm.com>
Date: Fri, 26 Nov 2010 08:13:57 +0800

Hi

Does anyone have experience designing / implementing a jumbo-frame-enabled
network?

I am working on a project to better utilize a fiber link between the east
coast and the west coast with Juniper devices.

Based on the default TCP window in Linux / Windows, the latency between the
east coast and the west coast (~80 ms), and the default MTU of 1500, the
maximum throughput of a single TCP session is around ~3 Mbps, which is too
slow for backing up the huge amount of data between the two sites.
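
For context, a single stream is roughly limited to window / RTT. A quick
back-of-the-envelope check (the 32 KB effective window below is only an
assumption chosen to match the ~3 Mbps figure; the same arithmetic applies
to whatever window the stack actually uses):

    # single-stream TCP throughput is bounded by window size / round-trip time
    window_bytes = 32 * 1024        # assumed effective window
    rtt_seconds = 0.080             # ~80 ms coast-to-coast RTT
    throughput_mbps = window_bytes * 8 / rtt_seconds / 1e6
    print(f"~{throughput_mbps:.1f} Mbps")   # prints ~3.3 Mbps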

The following is the topology that we are using right now.

Host A NIC (MTU 9000)
  <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216)
  <--- GigLAN ---> (MTU 9018) J-6350 cluster A (MTU 9018)
  <--- fiber link across sites ---> (MTU 9018) J-6350 cluster B (MTU 9018)
  <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216)
  <--- GigLAN ---> (MTU 9000) NIC - Host B

I tried to test connectivity from Host A to J-6350 cluster A using an ICMP
ping with size 8000 and the DF bit set, but the ping failed.

Does anyone have experience with this? Please advise.

Thanks :)

MTU is only one issue. System tuning and a clean path are also
critical. Getting good data streams between two systems that far apart
is not easy, but with reasonable effort you can get 300 to 400 Mbps.

If an 8000-byte ping fails, that says that SOMETHING is not jumbo
enabled, but it's hard to tell what. That assumes no firewall or other
device is blocking ICMP; I take it that 1400-byte pings work. Try
hop-by-hop tests.
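
A minimal sketch of that hop-by-hop check, assuming Linux iputils ping
(where "-M do" sets the DF bit and "-s" gives the ICMP payload size, so an
8972-byte payload makes a 9000-byte packet); the hop addresses below are
placeholders:

    #!/usr/bin/env python3
    # Ping each hop with DF set and a near-9000-byte packet to find the first
    # device in the path that is not passing jumbo frames.
    import subprocess

    hops = ["192.0.2.1", "192.0.2.2"]   # replace with your real next hops
    payload = 8972                      # 8972 + 8 (ICMP) + 20 (IP) = 9000 bytes

    for hop in hops:
        result = subprocess.run(
            ["ping", "-c", "3", "-M", "do", "-s", str(payload), hop],
            capture_output=True, text=True)
        ok = result.returncode == 0
        print(f"{hop}: {'jumbo ok' if ok else 'FAILED (small MTU or ICMP blocked?)'}")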

I should also mention that some DWDM gear needs to be configured to
handle jumbos. We've been bitten by that. You tend to assume that layer
1 gear won't care about layer 2 issues, but the input is an Ethernet
interface.

Finally, host tuning is critical. You talk about the "default" window size,
but modern stacks auto-tune the window size. For lots of information on
tuning and congestion management, see http://fasterdata.es.net. We move
terabytes of data between CERN and the US and have to make sure that the
10GE links run at close to capacity and that streams of more than 1 Gbps
will work. (It's not easy.)
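
As a rough sketch of what to check on a Linux host (these are the standard
Linux sysctl names; whether the shipped maximums are big enough depends on
your path's bandwidth-delay product):

    #!/usr/bin/env python3
    # Print the Linux sysctls that govern TCP buffer sizes and auto-tuning.
    # On an 80 ms path, 1 Gbps needs roughly 0.08 * 1e9 / 8 = 10 MB in flight,
    # so the maximums in tcp_rmem / tcp_wmem should be at least that large.
    keys = [
        "net.ipv4.tcp_window_scaling",  # RFC 1323 window scaling
        "net.ipv4.tcp_timestamps",      # RFC 1323 timestamps
        "net.ipv4.tcp_rmem",            # min / default / max receive buffer
        "net.ipv4.tcp_wmem",            # min / default / max send buffer
        "net.core.rmem_max",            # ceiling for explicit SO_RCVBUF requests
        "net.core.wmem_max",            # ceiling for explicit SO_SNDBUF requests
    ]
    for key in keys:
        path = "/proc/sys/" + key.replace(".", "/")
        try:
            with open(path) as f:
                print(f"{key} = {f.read().strip()}")
        except OSError:
            print(f"{key}: not present on this kernel")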

We move hundreds of TB around from one side of the planet to the
other on a regular basis. Kevin's link has some really good resources
listed on it. I can't stress enough the requirement for doing BOTH
OS-level kernel tuning (make sure that RFC1323 extensions are
enabled, make sure you have big enough maximum send and receive
buffers; if your OS does auto-tuning, make sure the maximums are set big
enough to support all the data you'll want to have in flight at any one
time) AND application-level adjustments. One of the biggest stumbling
blocks we run across is people who have done their OS tuning but then try
to use stock SSH/SCP for moving files around. It doesn't matter how much
tuning you do in the OS; if your application only has a 1 MB or 64 KB
buffer for data handling, you just won't get the throughput you're looking
for.
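
To make the application-side point concrete, here is a minimal sketch of
asking for large per-connection buffers instead of living with a small
built-in default (the 16 MB figure is an arbitrary example; the kernel
caps explicit requests at net.core.rmem_max / wmem_max, and setting the
buffers explicitly turns off Linux auto-tuning for that socket, so size
them to your bandwidth-delay product):

    #!/usr/bin/env python3
    # Request large socket buffers so the kernel can keep a full
    # bandwidth-delay product's worth of data in flight on one connection.
    import socket

    BUF = 16 * 1024 * 1024   # 16 MB, an arbitrary example size

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)

    # What the kernel actually granted (Linux reports about double the request,
    # capped by net.core.wmem_max / rmem_max); then connect and transfer as usual.
    print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("recv buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))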

But with proper OS and application layer tuning, you can move a lot of
data even over stock 1500-byte frames; don't be distracted by jumbo frames,
they're a red herring when it comes to actually moving large volumes of
data around. (Yes, yes, they're not completely irrelevant, for the pedants
in the audience, but they're not required by any means.)

Matt