TCP and WAN issue

To all,

I have an east coast and a west coast data center connected with a DS3. I am running into issues with streaming data via TCP and was wondering: besides hardware acceleration, are there any options for increasing throughput and maximizing the bandwidth? How can I overcome the TCP stack limitations inherent in Windows (registry tweaks don't seem to function too well)?

Philip

You might take a look through RFC 2488/BCP 28, if you haven't already. The circuit propagation delays in the scenarios painted by that document are far higher than yours, but the principles are the same.

Joe

You should certainly look at your MTU and MSS values to ensure there are no difficulties of that sort. Is there any other factor, such as slow DNS server (or other lookup-type service) responses, that could be contributing to the perceived slowness? How about tuning the I/O buffers on the relevant routers? Can you tune the I/O buffers on the servers?

And what about your link utilization? Is the DS3 sufficient? Take a look at pps and bps, and take a look at your average packet sizes (NetFlow can help with this). Are your apps sending lots of smaller packets, or are you getting nice, large packet sizes?

Finally, if none of the above help, you could look at something like SCTP, if your servers and apps support it. But I'd go through the above exercise, first.

"... in *the* scenarios..." I am having trouble with words, today.

Philip,

I have an east coast and a west coast data center connected with a DS3. I
am running into issues with streaming data via TCP and was wondering:
besides hardware acceleration, are there any options for increasing
throughput and maximizing the bandwidth? How can I overcome the TCP
stack limitations inherent in Windows (registry tweaks don't seem to
function too well)?

I don't know the RTT, but you should have at least 300 kByte buffers on
the end hosts for a 60 ms RTT path to reach 40 Mbps TCP throughput. (This
requires window scaling as well.) Is this what you were trying to tune on
your Windows hosts?
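A quick sketch of that arithmetic in Python, for anyone who wants to plug in their own numbers; the rates and RTTs below are just the figures being discussed in this thread:

# The socket buffer (TCP window) has to cover the bandwidth-delay product
# to keep the pipe full: buffer_bytes = rate_bps * rtt_s / 8

def required_buffer_bytes(rate_bps, rtt_s):
    return rate_bps * rtt_s / 8

print(required_buffer_bytes(40e6, 0.060))  # ~300,000 bytes, as above
print(required_buffer_bytes(45e6, 0.080))  # ~450,000 bytes for a full DS3 at 80 ms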

Is your DS3 free of errors? Even a very low packet loss can degrade TCP
performance badly.

You'll find a lot of useful information about TCP performance in the
GEANT2 PERT Knowledge Base at http://www.kb.pert.geant2.net/

Andras

You will have problems obtaining anything more than 5-7 Mbit/s based on 1500-byte Ethernet packets and an RTT of 70-90 ms. You can increase your window size or use jumbo Ethernet frames. Almost all GigE gear supports jumbo frames. I'm not sure of your application, but without OS tweaks, each stream is limited to 5-7 Mbit/s. You can open multiple streams between the same two hosts, or you can use multiple hosts to transfer your data. You can utilize the entire DS3, but not without OS TCP stack tweaks or a move to jumbo frames. You can also use UDP or another connectionless transport to move the data between sites. Good luck.

-Robert

Tellurian Networks - Global Hosting Solutions Since 1995
http://www.tellurian.com | 888-TELLURIAN | 973-300-9211
"Well done is better than well said." - Benjamin Franklin

I have an east coast and a west coast data center connected
with a DS3. I am running into issues with streaming data via
TCP and was wondering: besides hardware acceleration, are there
any options for increasing throughput and maximizing the
bandwidth?

Use GigE cards on the servers with a jumbo MTU and only buy IP network
access from a service provider who supports jumbo MTUs end-to-end
through their network.

How can I overcome the TCP stack limitations
inherent in Windows (registry tweaks don't seem to function too well)?

Install a pair of Linux servers and use them to send/receive the data
over the WAN.

Also, do some googling for the Internet Land Speed Record and read pages like
this one:
http://www.internet2.edu/lsr/history.html

And read up on scaling MTU size with bandwidth here:
http://www.psc.edu/~mathis/MTU/arguments.html

--Michael Dillon

<michael.dillon@bt.com> writes:

Use GigE cards on the servers with a jumbo MTU and only buy IP network
access from a service provider who supports jumbo MTUs end-to-end
through their network.

I'm not sure that I see how jumbo frames help (very much). The
principal issue here is the relatively large bandwidth-delay
product, right? So you need large TCP send buffers on the sending
side, a large (scaled) receive window on the receiver side, and
turn on selective acknowledgement (so that you don't have to
resend the whole send buffer if a packet gets dropped).

At 45 Mb/s and 120 ms RTT, you need to be able to have ca. 700 KBytes
of data "in flight"; round up and call it a megabyte.

Having said that, I too have tried to configure Windows to use
a large send buffer, and failed. (In my case, it was Windows
machines at a remote location sending to Linux machines.)
I'm not a Windows person; maybe I didn't try hard enough. In
the event, I threw up my hands and installed a Linux proxy server
at the remote site, appropriately configured, and went home happy.

Jim Shankland

You should talk to the vendor (Microsoft) and ask them how to tweak their product to work properly over the WAN.

Don't let them get away with a substandard product when it comes to WAN optimization. If you can get Microsoft to clean up their act, you'll have done ISPs a great service, because then we can stop trying to convince customers that it's not the ISP's fault that they get bad speeds with their Windows PCs.

> Use GigE cards on the servers with a jumbo MTU and only buy IP network
> access from a service provider who supports jumbo MTUs end-to-end
> through their network.

I'm not sure that I see how jumbo frames help (very much). The
principal issue here is ...

The people who know what helps are the ones who have been setting the
Internet land speed records. They typically use frames larger than 1500 bytes.
As for the principal issue, well, if there are several factors that will
contribute to solving the problem, I think that you get better results
if you attack all of them in parallel. Then, if you learn that there
really is one principal factor and you need some management approval to
move on that issue, you will have laid the groundwork for making a
business case because you've already done all the other things.

--Michael Dillon

<michael.dillon@bt.com> writes:

[...]
if there are several factors that will contribute to solving the
problem, I think that you get better results if you attack all of them
in parallel.

Well, I guess; except that "only buy IP network access from a service
provider who supports jumbo MTUs end-to-end through their network"
may be a much bigger task than tuning your TCP stack.

Jumbo frames seem to help a lot when trying to max out a 10 GbE link,
which is what the Internet land speed record guys have been doing.
At 45 Mb/s, I'd be very surprised if it bought you more than 2-4%
in additional throughput. It's worth a shot, I suppose, if the
network infrastructure supports it.

On a coast-to-coast DS-3, a TCP stack that's correctly tuned for a
high bandwidth-delay product environment, on the other hand, is
likely to outperform an untuned stack by a factor of 10 or so
in bulk transport over a single TCP session. (Though, as somebody
pointed out, tuning may have to occur all the way up the application
stack; there are, e.g., ssh patches out there for high-BDP environments.)

So I guess, sure, try anything you can; but I know what I'd try
first :-).

Jim Shankland

Jim Shankland wrote:

<michael.dillon@bt.com> writes:

Use GigE cards on the servers with a jumbo MTU and only buy IP network
access from a service provider who supports jumbo MTUs end-to-end
through their network.

I'm not sure that I see how jumbo frames help (very much).

Jumbograms don't change your top speed, but they do mean you accelerate through slow start more quickly. If there is non-congestion-based packet loss on a link, you can end up with slow start being cut short and left waiting for linear increase, which can mean it takes hours to reach steady state instead of minutes. Jumbograms reduce this by a factor of 6, which of course helps (60 minutes -> 10 minutes...).
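A back-of-the-envelope sketch of that ramp-up effect in Python (it ignores slow start, delayed ACKs and the like, and uses the DS3/80 ms figures from this thread):

# Additive increase grows cwnd by roughly one MSS per RTT, so the time to
# climb back up to the bandwidth-delay product scales inversely with the MSS.

def ramp_time_s(rate_bps, rtt_s, mss_bytes):
    bdp_bytes = rate_bps * rtt_s / 8
    segments_needed = bdp_bytes / mss_bytes
    return segments_needed * rtt_s  # about one RTT per segment of growth

for mss in (1460, 8960):  # standard vs. jumbo-frame MSS
    print(mss, round(ramp_time_s(45e6, 0.080, mss)), "seconds")
# roughly 25 s vs. 4 s, the factor-of-six difference mentioned above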

<snip other good advice>

At 45 Mb/s and 120 ms RTT, you need to be able to have ca. 700 KBytes
of data "in flight"; round up and call it a megabyte.

I have written a calculator to help people explore these issues:
http://wand.net.nz/~perry/max_download.php

It also includes TFRC to show how non-congestion-related packet loss impacts your performance too (got a dodgy wireless hop in there somewhere? Well, expect everything to be glacially slow...).
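For a rough feel for that loss sensitivity without the calculator, here is the simple Mathis et al. approximation (not the full TFRC equation the calculator uses) in Python; the MSS and RTT are illustrative:

from math import sqrt

# Loss-limited TCP throughput is approximately (MSS / RTT) * (1.22 / sqrt(p)).

def loss_ceiling_bps(mss_bytes, rtt_s, loss_prob):
    return (mss_bytes * 8 / rtt_s) * (1.22 / sqrt(loss_prob))

for p in (1e-3, 1e-4, 1e-5):
    print(p, round(loss_ceiling_bps(1460, 0.080, p) / 1e6, 1), "Mb/s ceiling")
# even 0.1% loss caps a 1460-byte-MSS, 80 ms path at around 5-6 Mb/s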

Having said that, I too have tried to configure Windows to use
a large send buffer, and failed. (In my case, it was Windows
machines at a remote location sending to Linux machines.)
I'm not a Windows person; maybe I didn't try hard enough. In
the event, I threw up my hands and installed a Linux proxy server
at the remote site, appropriately configured, and went home happy.

I've never really been a Windows guy either, and I've never had a Windows machine in a position where it needed to be tuned. Of course, most of the tuning is just upping the rwin. Apparently Vista has a larger default rwin and an optional "Compound TCP" congestion control algorithm designed for use over high bandwidth-delay WAN links, if upgrading Windows is an option.

I have an east coast and a west coast data center connected with a DS3. I am
running into issues with streaming data via TCP and was wondering: besides
hardware acceleration, are there any options for increasing throughput and
maximizing the bandwidth? How can I overcome the TCP stack limitations
inherent in Windows (registry tweaks don't seem to function too well)?

even on "default settings" on a modern TCP stack, getting close to
path-line-rate on a 80msec RTT WAN @ DS3 speeds with a single TCP stream should
not be that difficult.

the Windows TCP stack as of Windows XP SP2 has some fairly decent defaults. it
will do RFC1323 / large windows / SACK., but all of these can be tuned with
registry settings if you wish.

with a particular implementation of FCIP (Fibre Channel over IP) i worked on,
we could pretty much sustain a single TCP stream from a single GbE port at
wire-rate GbE with RTT up to 280 msec with minimal enhancements to TCP.
at that point it started to get difficult because you had close to 32MB of data
in transit at any given time, which is the current standard limit for
how large you can grow a TCP window.

i think the first thing you should do is ascertain that there are no problems with
your LAN or WAN, i.e. that there are no drops being recorded, no duplex
mismatch anywhere, etc.

i suggest you fire up "ttcp" on a host on each end and see what throughput you
get. with both tcp & udp you should be able to get close to 5.5 to 5.6 MB/s.
if you can't, i'd suggest looking into why & addressing the root cause.

once you've done that, it's then a case of ensuring the _applications_ you're
using can actually "fill the pipe" and aren't latency-sensitive at that
distance.

You might want to look at this classic by Stanislav Shalunov

http://shlang.com/writing/tcp-perf.html

Marshall

I was under the impression that XP’s default window size was 17,520 bytes, rfc1323 options disabled.

Assuming 80 ms and 45 Mb/s, I come up with a window size of 440 kBytes required to fill the pipe. At the Windows default I would only expect to see roughly 220 kBytes/s (under 2 Mb/s) over that same path.

I think even modern *nix OSs tend to have default window sizes in the 64 kB region, still not enough for that bandwidth-delay product.
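The same arithmetic in both directions, as a quick Python sketch (the 17,520-byte figure is the XP default mentioned above):

rate_bps, rtt_s = 45e6, 0.080
print(rate_bps * rtt_s / 8)  # ~450,000 bytes of window needed to fill the DS3
print(17520 / rtt_s)         # ~219,000 bytes/s (under 2 Mb/s) at the default window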

Use GigE cards on the servers with a jumbo MTU and only buy IP network
access from a service provider who supports jumbo MTUs end-to-end
through their network.

To check MTU on transit paths, try mturoute, as well as the MTU eye-chart.

-Hank Nussbacher
http://www.interall.co.il

Marshall Eubanks wrote:

You might want to look at this classic by Stanislav Shalunov

http://shlang.com/writing/tcp-perf.html

The description on this website is very good.

Disclaimer: I'm a FreeBSD TCP/IP network stack kernel hacker.

To quickly sum up the facts and to dispel some misinformation:

  - TCP is limited by the delay-bandwidth product and the socket buffer sizes.
  - for a T3 with 70ms your socket buffer on both ends should be 450-512KB.
  - TCP is also limited by the round trip time (RTT).
  - if your application is working in a request/reply model, no amount of
    bandwidth will make a difference. The performance is then entirely
    dominated by the RTT. The only solution would be to run multiple
    sessions in parallel to fill the available bandwidth.
  - Jumbo Frames have definitely zero impact on your case as they don't
    change any of the limiting parameters and don't make TCP go faster.
    There are certain very high-speed and LAN (<5ms) cases where it may
    make a difference, but not here.
  - Your problem is not machine or network speed, only tuning.

Change these settings on both ends and reboot once to get better throughput:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"SackOpts"=dword:0x1 (enable SACK)
"TcpWindowSize"=dword:0x7D000 (512000 Bytes)
"Tcp1323Opts"=dword:0x3 (enable window scaling and timestamps)
"GlobalMaxTcpWindowSize"=dword:0x7D000 (512000 Bytes)

http://www.microsoft.com/technet/network/deploy/depovg/tcpip2k.mspx
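If you would rather script those changes than click through regedit, here is a minimal sketch using Python's winreg module; it assumes Administrator rights, uses exactly the key path and values listed above, and a reboot is still needed afterwards:

import winreg

PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUES = {
    "SackOpts": 0x1,                    # enable SACK
    "TcpWindowSize": 0x7D000,           # 512000 bytes
    "Tcp1323Opts": 0x3,                 # window scaling and timestamps
    "GlobalMaxTcpWindowSize": 0x7D000,  # 512000 bytes
}

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PARAMS, 0,
                    winreg.KEY_SET_VALUE) as key:
    for name, value in VALUES.items():
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)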

And, of course, if you have Ethernet duplex or other mismatch issues anywhere along the
path, performance will be bad.

Regards
Marshall

Andre Oppermann gave the best advice so far IMHO.
I'll add a few points.

To quickly sum up the facts and to dispel some misinformation:

- TCP is limited by the delay-bandwidth product and the socket buffer
   sizes.

Hm... what about: The TCP socket buffer size limits the achievable
throughput-RTT product? :-)

- for a T3 with 70ms your socket buffer on both ends should be
   450-512KB.

Right. (Victor Reijs' "goodput calculator" says 378kB.)

- TCP is also limited by the round trip time (RTT).

This was stated before, wasn't it?

- if your application is working in a request/reply model, no amount
   of bandwidth will make a difference. The performance is then
   entirely dominated by the RTT. The only solution would be to run
   multiple sessions in parallel to fill the available bandwidth.

Very good point. Also, some applications have internal window
limitations. Notably SSH, which has become quite popular as a bulk
data transfer method. See http://kb.pert.geant2.net/PERTKB/SecureShell

- Jumbo Frames have definitely zero impact on your case as they
   don't change any of the limiting parameters and don't make TCP go
   faster.

Right. Jumbo frames have these potential benefits for bulk transfer:

(1) They reduce the forwarding/interrupt overhead in routers and hosts
by reducing the number of packets. But in your situation it is quite
unlikely that the packet rate is a bottleneck. Modern routers
typically forward even small packets at line rate, and modern
hosts/OSes/Ethernet adapters have mechanisms such as "interrupt
coalescence" and "large send offload" that make the packet size
largely irrelevant. But even without these mechanisms and with
1500-byte packets, 45 Mb/s shouldn't be a problem for hosts built in
the last ten years, provided they aren't (very) busy with other
processing.

(2) As Perry Lorier pointed out, jumbo frames accelerate the "additive
increase" phases of TCP, so you reach full speed faster both at
startup and when recovering from congestion. This may be noticeable
when there is competition on the path, or when you have many smaller
transfers such that ramp-up time is an issue.

(3) Large frames reduce header overhead somewhat. But the improvement
going from 1500-byte to 9000-bytes packets is only 2-3%, from ~97%
efficiency to ~99.5%. No orders of magnitude here.
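(Checking those percentages in Python, assuming 40 bytes of IP+TCP header per packet and no options:)

for mtu in (1500, 9000):
    print(mtu, round((mtu - 40) / mtu * 100, 1), "% payload efficiency")
# 1500 -> ~97.3%, 9000 -> ~99.6%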

   There are certain very high-speed and LAN (<5ms) cases where it
   may make a difference but not here.

Cases where jumbo frames might make a difference: When the network
path or the hosts are pps-limited (in the >Gb/s range with modern
hosts); when you compete with other traffic. I don't see a relation
with RTTs - why do you think this is more important on <5ms LANs?

- Your problem is not machine or network speed, only tuning.

Probably yes, but it's not clear what is actually happening. As so
often happens, the problem is described with very little detail, so
experts (and "experts" :-) have a lot of room to speculate.

This was the original problem description from Philip Lavine:

    I have an east coast and west coast data center connected with a
    DS3. I am running into issues with streaming data via TCP

In the meantime, Philip gave more information, about the throughput he
is seeing (no mention how this is measured, whether it is total load
on the DS3, throughput for an application/transaction or whatever):

    This is the exact issue. I can only get between 5-7 Mbps.

And about the protocols he is using:

    I have 2 data transmission scenarios:

    1. Microsoft MSMQ data using TCP
    2. "Streaming" market data stock quotes transmitted via a TCP
       sockets

It seems quite likely that these applications have their own
performance limits in high-RTT situations.

Philip, you could try a memory-to-memory-test first, to check whether
TCP is really the limiting factor. You could use the TCP tests of
iperf, ttcp or netperf, or simply FTP a large-but-not-too-large file
to /dev/null multiple times (so that it is cached and you don't
measure the speed of your disks).
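If none of those tools are handy, even a crude memory-to-memory test will do. A sketch in Python (port, buffer size and transfer size are arbitrary placeholders; run receiver() on one host and sender() pointed at it from the other):

import socket, time

PORT = 5001                  # arbitrary placeholder
BUF = 1 << 20                # ask for ~1 MB socket buffers (OS may clamp this)
CHUNK = b"\0" * 65536
TOTAL = 256 * 1024 * 1024    # 256 MB generated in memory, no disks involved

def receiver():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    while conn.recv(65536):  # discard everything; we only care about the rate
        pass

def sender(host):
    s = socket.socket()
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
    s.connect((host, PORT))
    start, sent = time.time(), 0
    while sent < TOTAL:
        s.sendall(CHUNK)
        sent += len(CHUNK)
    s.close()
    print(round(sent * 8 / (time.time() - start) / 1e6, 1), "Mb/s")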

If you find that this, too, gives you only 5-7 Mb/s, then you should
look at tuning TCP according to Andre's excellent suggestions quoted
below, and check for duplex mismatches and other sources of
transmission errors.

If you find that the TCP memory-to-memory-test gives you close to DS3
throughput (modulo overhead), then maybe your applications limit
throughput over long-RTT paths, and you have to look for tuning
opportunities on that level.

The original poster was talking about a streaming application: increasing the frame size can mean it takes longer for data to fill a packet before it hits the wire, increasing the actual latency seen by your application.

Probably doesn't matter when the stream is text, but as voice and video get pushed around via IP more and more, this will matter.