Best utilizing fat long pipes and large file transfer

Date: Thu, 12 Jun 2008 15:37:47 -0700
From: Sean Knox <sean@craigslist.org>

Hi,

I'm looking for input on the best practices for sending large files over
a long fat pipe between facilities (gigabit private circuit, ~20ms RTT).
I'd like to avoid modifying TCP windows and options on end hosts where
possible (I have a lot of them). I've seen products that work as
"transfer stations" using "reliable UDP" to get around the windowing
problem.

I'm thinking of setting up servers with optimized TCP settings to push
big files around data centers but I'm curious to know how others deal
with LFN+large transfers.

Not very fat or very long. I need to deal with 10GE over 200 ms (or more).

These should be pretty easy, but as you realize, you will need large
enough windows to keep the traffic in transit from filling the window
and stalling the flow. The laws of physics (speed of light) are not
forgiving.
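
To put numbers on "large enough windows": the window needed to keep a path full is the bandwidth-delay product. Below is a minimal sketch of that arithmetic for the two paths mentioned in this thread. The function name is just an illustration, and the result is a lower bound that ignores loss and protocol overhead.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the path full (bandwidth x RTT)."""
    return bandwidth_bps * rtt_seconds / 8


# The two paths discussed in this thread.
gige_20ms = bdp_bytes(1e9, 0.020)     # gigabit circuit, ~20 ms RTT
tenge_200ms = bdp_bytes(10e9, 0.200)  # 10GE, 200 ms RTT

print(f"1 Gbit/s x 20 ms  : {gige_20ms / 1e6:.1f} MB in flight")
print(f"10 Gbit/s x 200 ms: {tenge_200ms / 1e6:.1f} MB in flight")
```

That works out to about 2.5 MB for the gigabit/20 ms circuit and about 250 MB for 10GE at 200 ms, which is why the second case is the harder one.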

There is a project from Martin Swany at U-Delaware (with Guy Almes and
Aaron Brown) to do exactly what you are looking for.

and
http://www.internet2.edu/pubs/phoebus.pdf
ESnet, Internet2 and Geant demonstrated it at last November's
SuperComputing Conference in Reno.

The idea is to use tuned proxies that are close to the source and
destination and are optimized for the delay. Local systems can move data
through them without dealing with the need to tune for the
delay-bandwidth product. Note that this "man in the middle" may not
play well with many security controls which deliberately try to prevent
it, so you still may need some adjustments.
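
For illustration only, here is a rough sketch of what such a tuned relay could look like. It is not Phoebus or any product named in this thread; the names `tune`, `pump`, `handle`, and `serve` and the 4 MB buffer size are assumptions. The idea is that local hosts connect to the relay with default settings, and only the relay's long-haul socket asks for large buffers.

```python
import socket
import threading

BUF_BYTES = 4 * 1024 * 1024  # assumed value; size it to the path's bandwidth-delay product


def tune(sock):
    """Request large socket buffers; the kernel may clamp these to its own limits."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)


def pump(src, dst):
    """Copy bytes one way until EOF, then half-close the receiving side."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, upstream_addr):
    """Relay one local connection over a tuned long-haul connection."""
    upstream = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tune(upstream)  # before connect(), so the handshake can reflect the larger buffer
    upstream.connect(upstream_addr)
    back = threading.Thread(target=pump, args=(upstream, client), daemon=True)
    back.start()
    pump(client, upstream)
    back.join()
    client.close()
    upstream.close()


def serve(listen_addr, upstream_addr):
    """Accept local connections forever and relay each one in its own thread."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(listen_addr)
        srv.listen()
        while True:
            client, _ = srv.accept()
            threading.Thread(target=handle, args=(client, upstream_addr),
                             daemon=True).start()
```

Something like `serve(("0.0.0.0", 9000), ("far-relay.example", 9000))` would start it; both addresses are placeholders. A matching relay at the far end, plus raised system-wide buffer limits on the two relay hosts, would still be needed.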

and for those of us who are addicted to simple rsync, or whatever over
ssh, you should be aware of the really bad openssh windowing issue.

randy

Karl Auerbach wrote:

Randy Bush wrote:

and for those of us who are addicted to simple rsync, or whatever over
ssh, you should be aware of the really bad openssh windowing issue.

I was not aware of this. Do you have a pointer to a description?

see the work by rapier and stevens at psc

    <http://www.psc.edu/networking/projects/hpn-ssh/>

this is why rsync starts off at a blazing pace and then takes a serious
dive.
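
To illustrate why an application-level window matters even when TCP itself is tuned: a sender that may only have a fixed number of bytes outstanding per round trip can never exceed window/RTT. The sketch below assumes a 64 KiB window purely as an illustrative figure; it is not a statement about any particular OpenSSH release.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most window_bytes are in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


ASSUMED_WINDOW = 64 * 1024  # illustrative fixed window, in bytes

for rtt_ms in (1, 20, 200):
    cap = max_throughput_bps(ASSUMED_WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms: at most {cap / 1e6:.1f} Mbit/s")
```

Under that assumption the cap at 20 ms RTT is roughly 26 Mbit/s, a small fraction of a gigabit circuit, no matter how large the TCP buffers underneath are.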

randy

And while I certainly like open source solutions, there are plenty of
commercial products that do things to optimize this. Depending on the type
of traffic the products do different things. Many of the serial-byte
caching variety (e.g. Riverbed/F5) now also do connection/flow optimization
and proxying, while many of the network optimizers now are doing serial-byte
caching.

I also for a while was looking for multicast based file transfer tools, but
couldn't find any that were stable. I'd be interested in seeing the names
of some of the projects Robert is talking about- perhaps I missed a few when
I looked.

One simple solution: split the file and send all the parts at the same
time. This helps a fair bit, and is easy to implement.
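
A minimal sketch of the split-and-send-in-parallel idea, assuming a caller-supplied `send_chunk(offset, data)` function (hypothetical) that ships one piece over its own connection. Each stream is still limited by its own window, but several streams together fill more of the pipe.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, parts):
    """Return (offset, length) pairs covering [0, size) in near-equal pieces."""
    base, extra = divmod(size, parts)
    ranges, offset = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def send_in_parallel(path, parts, send_chunk):
    """Read each range of the file and hand it to send_chunk concurrently."""
    def worker(rng):
        offset, length = rng
        with open(path, "rb") as f:
            f.seek(offset)
            # Reads the whole piece at once for brevity; a real sender
            # would stream it in smaller blocks.
            send_chunk(offset, f.read(length))

    ranges = split_ranges(os.path.getsize(path), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        list(pool.map(worker, ranges))  # list() surfaces any worker exception
```

The receiver writes each piece at its offset, so arrival order does not matter. How many parts to use depends on the per-stream window and the path's bandwidth-delay product.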

Few things drive home the issues with TCP window scaling better than moving
a file via ftp and then via ttcp. Sure, you don't always get all the file,
but it does get there fast!

--D

Randy Bush <randy@psg.com> writes:

and for those of us who are addicted to simple rsync, or whatever over
ssh, you should be aware of the really bad openssh windowing issue.

As a user of hpn-ssh for years, I have to wonder if there is any
reason (aside from the sheer cussedness for which Theo is infamous)
that the window improvements at least from hpn-ssh haven't been
backported into mainline openssh? I suppose there might be
portability concerns with the multithreaded ciphers, and there's
certainly a good argument for not supporting NONE as a cipher type out
of the box without a recompile, but there's not much excuse for the
fixed size tiny buffers - I mean, it's 2008 already...

-r

OpenBSD has relayd(8), a versatile tool which can be used here.
There is support for proxying TCP connections. These can be modified
in a few ways - socket options (nodelay, sack, socket buffer) can
be adjusted on the relayed connection, also SSL can be offloaded.
It works with the firewall state table and can retain the original
addresses. Parts of this are only in development snapshots at present
but will be in the 4.4 release.

Robert E. Seastrom wrote:

As a user of hpn-ssh for years, I have to wonder if there is any
reason (aside from the sheer cussedness for which Theo is infamous)
that the window improvements at least from hpn-ssh haven't been
backported into mainline openssh? I suppose there might be
portability concerns with the multithreaded ciphers, and there's
certainly a good argument for not supporting NONE as a cipher type out
of the box without a recompile, but there's not much excuse for the
fixed size tiny buffers - I mean, it's 2008 already...

Fedora 8 and 9 and Ubuntu 8.04 include an upstream OpenSSH which includes
the large window patches. The OpenSSH 4.7 ChangeLog contains:

Other changes, new functionality and fixes in this release:

...

* The SSH channel window size has been increased, and both ssh(1)
   sshd(8) now send window updates more aggressively. These improves
   performance on high-BDP (Bandwidth Delay Product) networks.

Cheers, Glen

Glen Turner <gdt@gdt.id.au> writes:

Fedora 8 and 9 and Ubuntu 8.04 include an upstream OpenSSH which includes
the large window patches. The OpenSSH 4.7 ChangeLog contains:

Other changes, new functionality and fixes in this release:

...

* The SSH channel window size has been increased, and both ssh(1)
   sshd(8) now send window updates more aggressively. These improves
   performance on high-BDP (Bandwidth Delay Product) networks.

Turns out that the Mac does too. Haven't checked FreeBSD 7 yet, but
6.x is definitely lagging.

Thanks for the clue,

-r