923 Mbits/s across the ocean

I am not normally on this list, but someone kindly gave me copies of some of the email concerning the Internet2 Land Speed Record, so I have joined the list.

As one of the PIs of the record, I thought it might be useful to comment on a few interesting items I have seen, and no, I am not trying to flame anybody:

"Give em a million dollars, plus fiber from here to anywhere and let me muck with the TCP algorith, and I can move a GigE worth of traffic too - Dave"

You are modest in your budgetary request. Just the Cisco router (GSR 12406) we had on free loan listed at close to a million dollars, and the OC192 links from Sunnyvale to Chicago alone would have cost what was left of the million, per month.

We used a stock TCP (the Linux kernel TCP). We did, however, use jumbo frames (9000-byte MTUs).
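(For a sense of why the 9000-byte MTUs matter, here is a rough, purely illustrative calculation of how many packets per second a host has to handle to fill a 1 Gbit/s link; header overhead and interrupt coalescing are ignored.)

    # Rough packet-rate arithmetic, illustrative only: packets per second needed
    # to fill a 1 Gbit/s link at each MTU (header overhead ignored).
    LINK_BPS = 1e9  # 1 Gbit/s

    for mtu in (1500, 9000):
        pkts_per_sec = LINK_BPS / (mtu * 8)
        print(f"MTU {mtu:4d}: ~{pkts_per_sec:,.0f} packets/s")

    # MTU 1500: ~83,333 packets/s
    # MTU 9000: ~13,889 packets/s  (roughly 6x fewer per-packet events for the host)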

In response to Richard A Steenbergen: we are not "now living in a tropical foreign country, with lots and lots of drugs and women", but then the weather in California is great today.

"What am I missing here, theres OC48=2.4Gb, OC192=10Gb ..."

We were running host to host (end-to-end), with a single stream, on common off-the-shelf equipment; there are not too many (I think none) >1GE host NICs available today that are in production (i.e. obtainable without signing a non-disclosure agreement).

"Production commercial networks ... Blow away these speeds on a regular basis".
See the above remark about end-to-end, application-to-application, single-stream transfers.

"So, you turn down/off all the parts of TCP that allow you to share bandwidth ..."
We did not mess with the TCP stack; it was stock, off the shelf.

"... Mention that "Internet speed records" are measured in terabit-meters/sec."
You are correct, this is important, but reporters want a sound bite and typically only focus on one thing at a time. I will make sure next time I talk to a reporter to emphasize this. Maybe we can get some mileage out of Petabmps (petabit metres per second); it sounds good.
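(For scale, and purely as an illustration: taking an assumed path length of roughly 10,000 km for the Sunnyvale-to-Europe run, the distance-weighted figure comes out at around ten petabit-metres per second.)

    # Illustrative only: the 10,000 km path length is an assumption, not from the post.
    rate_bps   = 923e6        # the quoted 923 Mbits/s
    distance_m = 10_000e3     # assumed Sunnyvale-to-Europe path length

    print(f"~{rate_bps * distance_m / 1e15:.1f} petabit-metres/second")   # ~9.2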

"What kind of production environment needs a single TCP stream of data at 1Gbits/s over a 150ms latency link?"
Today High Energy Particle Physics needs hundreds of Megabits/s between California and Europe (Lyon, Padova and Oxford) to deliver data on a timely basis from the experiment site at SLAC to regional computer sites in Europe. Today, on production academic networks (with sustainable rates of 100 to a few hundred Mbits/s), it takes about a day to transmit just over a TByte of data, which just about keeps up with the data rates. The data generation rates are doubling per year, so within 1-3 years we will need speeds like those in the record on a production basis. We needed to ensure we can achieve the needed rates, and to find out whether we can do it with off-the-shelf hardware, how the hosts and OS' need configuring, how to tune the TCP stack and how newer stacks perform, what the requirements for jumbo frames are, etc. Besides High Energy Physics, other sciences are beginning to grapple with how to replicate large databases across the globe; such sciences include radio-astronomy, human genome, global weather, seismic ...
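(As a back-of-the-envelope check on those numbers: a TByte a day is roughly 90 Mbits/s sustained, and doubling each year pushes that towards a Gbit/s within about three years.)

    # Back-of-the-envelope: sustained rate implied by ~1 TByte/day, then doubled yearly.
    rate_bps = 1e12 * 8 / 86400          # bytes/day -> bits/s
    print(f"today: ~{rate_bps / 1e6:.0f} Mbits/s")               # ~93 Mbits/s
    for years in (1, 2, 3):
        print(f"+{years}y:   ~{rate_bps * 2**years / 1e6:.0f} Mbits/s")
    # +1y: ~185 Mbits/s, +2y: ~370 Mbits/s, +3y: ~741 Mbits/s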

The spud gun is interesting, but given the distances, a 747 freighter packed with DST tapes or disks is probably a better idea. Assuming we fill the 747 with, say, 50 GByte tapes (disks would probably be better), and it takes 10 hours to fly from San Francisco (BTW Sunnyvale is near San Francisco, not near LA, as one person talking about retiring to better weather might lead one to believe), the bandwidth is about 2-4 Tbits/s. However, this ignores the reality of labelling, writing the tapes, removing them from the silo robot, packing, getting to the airport, loading, unloading, getting through customs etc. In reality the latency is closer to 2 weeks. Even worse, if there is an error (heads not aligned etc.) then the retry latency is long and the effort involved considerable. Also, the network solution lends itself much better to automation; in our case it saved a couple of full-time-equivalent people at the sending site who distribute the data on a regular basis to our collaborator sites in France, the UK and Italy.
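(A rough version of that tape-on-a-747 arithmetic, for the curious. The payload and per-tape mass below are assumptions; only the 50 GByte tapes, the 10-hour flight and the ~2-week handling latency come from the text above.)

    # Sneakernet arithmetic. Payload and tape mass are assumed round numbers.
    PAYLOAD_KG = 100_000        # assumed 747 freighter payload
    TAPE_KG    = 0.25           # assumed mass per tape cartridge
    TAPE_BITS  = 50e9 * 8       # 50 GByte per tape
    FLIGHT_S   = 10 * 3600      # 10-hour flight
    HANDLING_S = 14 * 86400     # ~2 weeks of writing, packing, customs, reading back

    bits = (PAYLOAD_KG / TAPE_KG) * TAPE_BITS
    print(f"in-flight bandwidth:  ~{bits / FLIGHT_S / 1e12:.1f} Tbits/s")   # ~4.4 Tbits/s
    print(f"effective bandwidth:  ~{bits / HANDLING_S / 1e9:.0f} Gbits/s")  # ~132 Gbits/s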

The remarks about window size and buffers are interesting also. It is true that large windows are needed. To approach 1Gbits/s we require 40MByte windows. If this is going to be a problem, then we need to raise questions like this soon and figure out how to address them (add more memory, use other protocols etc.). In practice, to approach 2.5Gbits/s requires 120MByte windows.
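(Those window sizes line up with the bandwidth*delay product for the path; taking the 150 ms mentioned earlier in the thread as the round-trip time, a quick check:)

    # Bandwidth*delay product for a ~150 ms RTT path, versus the windows quoted above.
    RTT = 0.150   # seconds

    for rate_bps, window_mb in ((1e9, 40), (2.5e9, 120)):
        bdp_mb = rate_bps * RTT / 8 / 1e6
        print(f"{rate_bps/1e9:.1f} Gbits/s: BDP ~{bdp_mb:.0f} MB, "
              f"window {window_mb} MB (~{window_mb/bdp_mb:.1f}x BDP)")

    # 1.0 Gbits/s: BDP ~19 MB, window 40 MB (~2.1x BDP)
    # 2.5 Gbits/s: BDP ~47 MB, window 120 MB (~2.6x BDP)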

I am quite happy to concede that this does not need to be about some jocks beating a record. I do think it is important to catch the public's attention as to why high speeds are important and that they are achievable today, application to application (it would also be useful to estimate when such speeds will be available to universities, large companies, small companies, the home etc.). For techies it is important to start to understand the challenges the high speeds raise, e.g. CPU and router memories, bugs in TCP, the OS, applications etc., new TCP stacks, new (possibly UDP-based) protocols such as tsunami, the need for 64-bit counters in monitoring, effects of the NIC card, jumbo-frame requirements etc., and what is needed to address them. It is also important to put it in meaningful terms (such as 2 full-length DVD movies in a minute, which could also increase the "cease and desist" legal messages shipped ;-)).
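(That sound bite roughly checks out: assuming 4.7 GByte single-layer DVDs and the record's 923 Mbits/s, two movies take a bit over 80 seconds.)

    # Sanity check on "two DVDs in a minute", assuming 4.7 GByte single-layer discs.
    seconds_per_dvd = 4.7e9 * 8 / 923e6
    print(f"~{seconds_per_dvd:.0f} s per DVD, ~{2 * seconds_per_dvd:.0f} s for two")
    # ~41 s per DVD, ~81 s for two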

Hope that helps, and thanks to you guys in NANOG for providing today's high-speed networks.

Date: Sat, 08 Mar 2003 10:04:20 -0800
From: "Cottrell, Les"

The remarks about window size and buffers are interesting
also. It is true that large windows are needed. To approach
1Gbits/s we require 40MByte windows. If this is going to be
a problem, then we need to raise questions like this soon and
figure out how to address them (add more memory, use other
protocols etc.). In practice, to approach 2.5Gbits/s requires
120MByte windows.

Yup. About 2x to 2.5x the bandwidth*delay product.

I'm still curious about insane SACK or maybe NACK. Spray TCP
packets hoping they arrive (good odds), and wait to hear what
made or didn't make it. Let the receiving end have the large
buffers... sending machines generally must handle a greater
number of sessions. ECN also would be a nice way of telling a
sender to back off, [hopefully] proactively avoiding packet loss.

It certainly seems a shame to require big sending buffers and
slow down entire streams just in case a small bit gets lost.
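(Purely as a toy illustration of the spray-and-NACK idea above, and not anything the record run did (that was stock TCP): the receiver remembers what arrived and only the holes get resent. Every name below is made up for the sketch.)

    # Toy in-memory sketch of spray-and-NACK: blast numbered blocks, then resend
    # only the blocks the receiver reports missing. Not a real protocol, just the mechanics.
    import random

    def spray(blocks, loss=0.001):
        """Deliver each block independently, dropping a small random fraction."""
        return {seq for seq in blocks if random.random() > loss}

    TOTAL = 100_000                       # blocks in the "file"
    received = spray(range(TOTAL))        # initial blast, no per-packet ACKs

    rounds = 0
    while len(received) < TOTAL:
        rounds += 1
        missing = [seq for seq in range(TOTAL) if seq not in received]   # the NACK list
        print(f"round {rounds}: NACKing {len(missing)} of {TOTAL} blocks")
        received |= spray(missing)        # retransmit only what was NACKed

    print(f"complete after {rounds} retransmission round(s)")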

Eddy

You are modest in your budgetary request. Just the Cisco router (GSR
12406) we had on free loan listed at close to a million dollars, and the
OC192 links from Sunnyvale to Chicago alone would have cost what was left
of the million, per month.

No, your budget folks have no clue, which they clearly demonstrate. Anyone
here who buys Cisco at list prices works for a company that for some
reason wants to waste money. We pay about 10 cents on the dollar.

Anyone leasing OC-192 at that price, as opposed to lighting it up, is
smoking.

"What am I missing here, theres OC48=2.4Gb, OC192=10Gb ..."

We were running host to host (end-to-end), with a single stream, on common
off-the-shelf equipment; there are not too many (I think none) >1GE host
NICs available today that are in production (i.e. obtainable without signing
a non-disclosure agreement).

Again, if this is all available today, what is so new that you guys have
done, apart from blowing tons of money?

The remarks about window size and buffers are interesting also. It is true
that large windows are needed. To approach 1Gbits/s we require 40MByte
windows. If this is going to be a problem, then we need to raise questions
like this soon and figure out how to address them (add more memory, use other
protocols etc.). In practice, to approach 2.5Gbits/s requires 120MByte windows.

I am quite happy to concede that this does not need to be about some jocks
beating a record. I do think it is important to catch the public's
attention as to why high speeds are important and that they are achievable
today, application to application (it would also be useful to estimate when
such speeds will be available to universities, large companies, small
companies, the home etc.). For techies it is important to start to
understand the challenges the high speeds raise, e.g. CPU and router
memories, bugs in TCP, the OS, applications etc., new TCP stacks, new
(possibly UDP-based) protocols such as tsunami, the need for 64-bit
counters in monitoring, effects of the NIC card, jumbo-frame requirements
etc., and what is needed to address them. It is also important to put it
in meaningful terms (such as 2 full-length DVD movies in a minute, which
could also increase the "cease and desist" legal messages shipped ;-)).

High speeds are not important. High speeds at a *reasonable* cost are
important. What you are describing is a high speed at an *unreasonable*
cost.

Alex

On Sat, Mar 08, 2003 at 03:29:56PM -0500, alex@yuriev.com quacked:

High speeds are not important. High speeds at a *reasonable* cost are
important. What you are describing is a high speed at an *unreasonable*
cost.

To paraphrase many a California surfer, "dude, chill out."

The bleeding edge of performance in computers and networks is always
stupidly expensive. But once you've achieved it, the things you
did to get there start to percolate back into the consumer stream,
and within a few years, the previous bleeding edge is available
in the current O(cheap) hardware.

A Cisco 7000 used to provide the latest and greatest performance
in its day, for a rather considerable cost. Today, you can get a
box from Juniper for the same price you paid for your 7000 that
provides a few orders of magnitude more performance.

But to get there, you have to be willing to see what happens when
you push the envelope. That's the point of the LSR, and a lot of
other research efforts.

  -Dave

To paraphrase many a California surfer, "dude, chill out."

When none of my taxes goes to silly projects, I will chill out.

It has been stated by the people who participated in this research that

(a) they bought hardware at prices that help Cisco make its quarters;
(b) they spent millions of dollars on OC-192 links when they did not
need them;
(c) they did not come up with anything new, apart from a "proof" that they
achieved that speed.

The bleeding edge of performance in computers and networks is always
stupidly expensive. But once you've achieved it, the things you
did to get there start to percolate back into the consumer stream,
and within a few years, the previous bleeding edge is available
in the current O(cheap) hardware.

That is all great if they *actually* *developed* something. However, they
did not. They bought off-the-shelf products at list prices, plugged them in,
ran slightly tweaked kernels, helped Qwest/Globalcrossing etc. prop up their
quarters, and announced "we did it".

A Cisco 7000 used to provide the latest and greatest performance
in its day, for a rather considerable cost. Today, you can get a
box from Juniper for the same price you paid for your 7000 that
provides a few orders of magnitude more performance.

But to get there, you have to be willing to see what happens when
you push the envelope. That's the point of the LSR, and a lot of
other research efforts.

That's the argument the Pentagon used to justify buying $40 lightbulbs.
It does not work, sorry.

Alex

We used a stock TCP (the Linux kernel TCP). We did, however, use jumbo
frames (9000-byte MTUs).

What kind of difference did you see as opposed to standard 1500 byte
packets? I did some testing once and things actually ran slightly faster
with 1500 byte packets, completely contrary to my expectations... (This
was UDP and just 0.003 km rather than 10,000, though.)
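(For anyone wanting to repeat that kind of comparison, a minimal single-host sketch is below. It only times the local UDP send path over loopback, so it says nothing about jumbo frames on a real WAN; the port number and packet count are arbitrary.)

    # Minimal UDP blast timer: how fast can the local stack push datagrams of each size?
    # Loopback only; payloads are the maxima for 1500- and 9000-byte MTUs (28 header bytes).
    import socket, time

    def send_rate_mbps(payload, count=50_000, addr=("127.0.0.1", 9999)):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        data = b"x" * payload
        start = time.time()
        for _ in range(count):
            sock.sendto(data, addr)
        elapsed = time.time() - start
        sock.close()
        return payload * count * 8 / elapsed / 1e6

    for size in (1472, 8972):
        print(f"payload {size} bytes: ~{send_rate_mbps(size):.0f} Mbits/s (send path only)")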

The remarks about window size and buffers are interesting also. It
is true that large windows are needed. To approach 1Gbits/s we require
40MByte windows. If this is going to be a problem, then we need to
raise questions like this soon and figure out how to address them (add
more memory, use other protocols etc.). In practice, to approach
2.5Gbits/s requires 120MByte windows.

So how much packet loss did you see? Even with a few packets in a
million lost, this would bring your transfer rate way down and/or you'd
need even bigger windows.

However, bigger windows mean more congestion. When two of those boxes
start pushing traffic at 1 Gbps with a 40 MB window, you'll see 20 MB
worth of lost packets due to congestion in a single RTT.
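(That sensitivity to loss can be put in numbers with the standard Mathis et al. approximation for steady-state Reno throughput, rate ~= (MSS/RTT) * 1.22 / sqrt(loss). The loss rates below are hypothetical, since the thread doesn't give the run's actual figures.)

    # Mathis et al. approximation: throughput ~= (MSS / RTT) * 1.22 / sqrt(loss).
    # 150 ms RTT and a roughly 9000-byte-MTU segment; the loss rates are hypothetical.
    from math import sqrt

    RTT, MSS = 0.150, 8960    # seconds, bytes

    for p in (1e-4, 1e-5, 1e-6, 1e-7):
        rate_mbps = MSS * 8 / RTT * 1.22 / sqrt(p) / 1e6
        print(f"loss {p:.0e}: ~{rate_mbps:6.0f} Mbits/s")

    # loss 1e-04: ~    58 Mbits/s
    # loss 1e-05: ~   184 Mbits/s
    # loss 1e-06: ~   583 Mbits/s
    # loss 1e-07: ~  1844 Mbits/s

So to sustain ~1 Gbit/s with plain Reno on such a path, loss has to stay well below one packet in a million, which is exactly the point being made above.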

A test where the high-bandwidth session or several high-bandwidth
sessions have to live side by side with other traffic would be very
interesting. If this works well it opens up possibilities of doing this
type of application over real networks rather than (virtual)
point-to-point links where congestion management isn't an issue.

That is not the argument used to justify buying $40 lightbulbs. They do
not actually purchase $40 lightbulbs; the prices that you see in rag-magazine
reports have to do with how the budgets are handled. If you can
budget a multi-billion-dollar organization and put in reasonable price
and performance controls, there are many schools that would hire you
after you revolutionize public administration and the DoD...