Best utilizing long fat pipes for large file transfers

Hi Sean,
since Thursday, we have copied several ~300 GB packages of files from Prague to San Diego using RBUDP, and it worked great (~200 ms delay, 10 GE flat Ethernet, end machines connected via 1 GE).

Each scenario needs some planning. You have to answer several questions:
1) What is the performance of the storage subsystem? (Sometimes you need to connect external hard drives or a tape robot.)
2) How many files you need to transfer?
3) How big are these files?
4) What is the consistency scenario (file consistency or package consistency)?
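As a rough illustration of why question 1 matters: the end-to-end rate of a transfer is bounded by its slowest stage (source disk read, network, destination disk write). The helper below is hypothetical, and it assumes the three stages overlap fully and ignores protocol overhead and per-file costs; the numbers in the usage note are illustrative, not measurements.

```python
def transfer_hours(size_gb: float, disk_read_mbps: float,
                   network_mbps: float, disk_write_mbps: float) -> float:
    """Estimate transfer time in hours for size_gb gigabytes (10**9 bytes).

    All rates are in megabits per second (10**6 bits/s). The effective rate
    is assumed to be the slowest of the three stages (an assumption: real
    transfers also lose time to protocol overhead and per-file handling).
    """
    bottleneck_mbps = min(disk_read_mbps, network_mbps, disk_write_mbps)
    # bytes * 8 -> bits; divide by bits per second -> seconds
    seconds = size_gb * 1e9 * 8 / (bottleneck_mbps * 1e6)
    return seconds / 3600
```

For example, transfer_hours(300, 1600, 1000, 700) gives about 0.95 hours: with a 700 Mb/s bottleneck, a 300 GB package takes a little under an hour, no matter how fast the other stages are.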

For example, I've sent some film data: a lot (~30,000) of 10 MB DPX files. Consistency was package-based. At the beginning the hard drives were connected via iLink (the data arrived on this media), then they were moved to eSATA (I went to a shop, bought another drive and connected it to the export machine). The main tuning for RBUDP was to buy another hard drive and tar these files.
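The "tar these files" step might look like this with Python's standard tarfile module: bundling ~30,000 small files into one archive turns many small transfers into one large sequential one, and a single checksum over the archive gives a package-level consistency check. The function names and paths are hypothetical, and using SHA-256 for the check is an assumption, not something described above.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so huge archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_directory(src_dir: str, archive_path: str) -> str:
    """Bundle every file under src_dir into one uncompressed tar archive.

    Returns the SHA-256 hex digest of the finished archive (hypothetical
    package-consistency check: compare it on the receiving side).
    """
    # Mode "w" writes an uncompressed tar, so no CPU is spent on compression.
    with tarfile.open(archive_path, "w") as tar:
        # tar.add recurses into directories; arcname keeps paths relative.
        tar.add(src_dir, arcname=Path(src_dir).name)
    return sha256_of(archive_path)
```

On the receiving side, running sha256_of on the arrived archive and comparing digests would confirm the whole package before unpacking.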


Many thanks for great replies on and off-list.

The suggestions basically fell into these options:

1. tune TCP on all hosts you wish to transfer between
2. create tuned TCP proxies and transfer through those hosts
3. set up a socat (netcat++) proxy and send through that host
4. use an alternative to plain netcat/scp for large file transfers
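For option 1, the usual starting point is the bandwidth-delay product (BDP): a single TCP stream can only keep the pipe full if its window is at least bandwidth × round-trip time. A minimal sketch of the arithmetic follows. Treating the delays quoted in this thread as round-trip times is an assumption.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Bytes that must be in flight to keep a link of bandwidth_bps busy."""
    return round(bandwidth_bps * rtt_seconds / 8)


def max_single_stream_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Throughput ceiling (bits/s) of one TCP stream limited by its window."""
    return window_bytes * 8 / rtt_seconds
```

At 1 Gb/s and 200 ms, bdp_bytes(1e9, 0.2) is 25,000,000 bytes (25 MB); at 20 ms it is 2.5 MB. Conversely, a 64 KiB window over a 200 ms path caps one stream at roughly 2.6 Mb/s. On Linux, the socket buffer limits that allow windows of this size are set via the net.core.rmem_max/wmem_max and net.ipv4.tcp_rmem/tcp_wmem sysctls.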

My needs are pretty simple: occasionally I need to push large database files (300 GB+) between Linux hosts. #4 seems like the best option for me.

People suggested a slew of tools for this: RBUDP, GridFTP, bbcp, and many others, which either send over reliable UDP or break large transfers into multiple streams. Because it was easy to use right away, I tried RBUDP and was able to copy a tarball at about 700 Mb/s over a 20 ms delay link, which, factoring in destination disk write speed, isn't too bad a starting point. GridFTP and bbcp look very useful too; I'll be exploring them as well. The presentation links Kevin O. sent were very interesting.
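The "break large transfers into multiple streams" idea can be illustrated by how a file might be divided into byte ranges, one per stream. This is a hypothetical helper that shows only the partitioning; it is not how GridFTP or bbcp actually schedule their streams.

```python
def split_ranges(file_size: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into `streams` contiguous (offset, length) pieces.

    The first (file_size % streams) pieces get one extra byte, so lengths
    differ by at most one and together cover the whole file exactly once.
    """
    if streams < 1:
        raise ValueError("streams must be >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

For example, split_ranges(10, 3) returns [(0, 4), (4, 3), (7, 3)]. Each range would then be read after a seek and sent over its own TCP connection, so that, roughly, the per-stream window limit is multiplied by the number of streams.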

I've looked at HPN-SSH before but haven't played with it much. I'll definitely try it out based on the feedback from this thread.

Thanks again.


Sean Knox wrote: