Long-haul 100Mbps EPL circuit throughput issue

Hello NANOG,

We've been dealing with an interesting throughput issue with one of our
carriers. Specs and topology:

100Mbps EPL over fiber from a national carrier. We run MPLS to the CPE, providing
a VRF circuit that carries the customer back to our data center across our MPLS
network. The circuit has 75 ms of latency since it's around 5,000 km long.

Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco
2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <->
Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test
machine in customer's VRF

We can fill the link with UDP traffic using iperf, but with TCP we can reach
80-90% and then the throughput drops to 50% and slowly climbs back up to 90%.
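
For reference, the tests amount to roughly the following iperf runs
(192.0.2.10 stands in for the far-end test machine):

  # UDP: push a full 100 Mbps and check for loss at the far end
  iperf -c 192.0.2.10 -u -b 100M -t 60 -i 5

  # TCP: single stream, reporting every 5 seconds
  iperf -c 192.0.2.10 -t 60 -i 5

with a matching iperf -s (and iperf -s -u for the UDP run) on the far side.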

Has anyone dealt with this kind of problem in the past? We've tried forcing
the ports to 100-FD at both ends and policing the circuit on our side, and
we've called the carrier and escalated to L2/L3 support. They also tried
policing the circuit, but as far as I know they didn't modify anything else.
I've asked our support team to have them look for underrun errors on their
Cisco switch, and they can see some. They're pretty much in the same boat as
us and aren't sure where to look.

Thanks
Eric

hi eric

> ...
>
> Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco
> 2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <->
> Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test
> machine in customer's VRF
>
> We can fill the link with UDP traffic using iperf, but with TCP we can reach
> 80-90% and then the throughput drops to 50% and slowly climbs back up to 90%.

if i was involved with these tests, i'd start looking for "not enough tcp send
and tcp receive buffers"

for flooding at 100Mbit/s, you'd need about 12MB buffers ...

udp does NOT care too much about dropped data due to the buffers,
but tcp cares about "not enough buffers" .. somebody resend packet# 1357902456 :-)

at least double or triple the buffers needed to compensate for all kinds of
network whackyness:
data in transit, misconfigured hardware-in-the-path, misconfigured iperfs,
misconfigured kernels, interrupt handling, etc, etc
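
something like this on both linux test boxes is one way to get there
(192.0.2.10 is a placeholder for the far-end box, 16MB is just a ballpark):

  # raise the per-socket buffer ceiling the kernel will allow
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  # min / default / max for tcp buffer autotuning
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

  # then ask iperf for a big window explicitly
  iperf -c 192.0.2.10 -w 8M -t 60 -i 5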

- how many "iperf flows" are you also running ??
  - running dozens or 100s of them does affect thruput too

- does the same thing happen with socat ??

- if iperf and socat agree with network thruput, it's the hw somewhere

- slowly increasing thruput doesn't make sense to me ... it sounds like
something is caching some of the data

magic pixie dust
alvin

Eric,

I have seen that happen.

1st, double-check that the gear is truly full duplex... it seems like it may
claim it is and you just discovered it is not. That's always been an issue
with manufacturers claiming they are full duplex, and on short distances
it's not so noticeable.

Try to iperf in both directions at the same time and it becomes obvious.
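
For example, assuming Linux test hosts and iperf2 on both ends (interface
name and address below are placeholders):

  # confirm what each NIC actually negotiated
  ethtool eth0 | grep -E 'Speed|Duplex'

  # run traffic in both directions at once; a duplex mismatch shows up as
  # heavy loss and retransmits as soon as the reverse stream starts
  iperf -c 192.0.2.10 -d -t 60 -i 5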

Thank You
Bob Evans
CTO

Along with the receive window/buffer needed for your particular
bandwidth-delay product, it appears you're also seeing TCP moving from
slow start to a congestion avoidance mechanism (Reno, Tahoe, CUBIC, etc.).
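
You can watch this happening on the Linux sender while the test runs, for
example (destination address is a placeholder):

  # which congestion control algorithm the sender is using
  sysctl net.ipv4.tcp_congestion_control

  # per-connection state: cwnd, ssthresh, rtt, retransmissions
  ss -tin dst 192.0.2.10

Watching cwnd collapse and then grow back matches the 80-90% -> 50% -> 90%
pattern described in the original post.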

Greg Foletta
greg@foletta.org

With default window size of 64KB, and a delay of 75 msec, you should only
get around 7Mbps of throughput with TCP.

You would need a window size of about 1MB in order to fill up the 100 Mbps
link.

1/0.75 = 13.333 (how many RTTs in a second)
13.333 * 65535 * 8 = 6,990,225.24 (about 7Mbps)

You would need to increase the window to 1,048,560 bytes in order to get
around 100Mbps.

13.333 * 1,048,560 * 8 = 111,843,603.84 (about 100 Mbps)

Pablo Lucena
Cooper General Global Services
Network Administrator
Office: 305-418-4440 ext. 130
plucena@coopergeneral.com


I realized I made a typo:

1/0.075 = 13.333

not

1/0.75 = 13.333

switch.ch has a nice bandwidth delay product calculator.
https://www.switch.ch/network/tools/tcp_throughput/

Punching in the link spec from the original post gives pretty much exactly what you said, Pablo, including that it'd get ~6.999 Mbps with a default 64k window.

BDP (100 Mbit/sec, 75.0 ms) = 0.94 MByte
required tcp buffer to reach 100 Mbps with RTT of 75.0 ms >= 915.5 KByte
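
The arithmetic behind those numbers is just rate times RTT; a quick check on
any Linux box (awk used purely as a calculator):

  # 100 Mbit/s * 0.075 s of RTT, divided by 8 bits per byte
  awk 'BEGIN { bytes = 100e6 * 0.075 / 8;
               printf "%.0f bytes (%.1f KiB)\n", bytes, bytes/1024 }'
  # -> 937500 bytes (915.5 KiB)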

Theo

Hi Pablo,

Modern TCPs support and typically use window scaling (RFC 1323). You
may not notice it in packet dumps because the window scaling option is
negotiated once for the connection, not repeated in every packet.
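
If you want to confirm it, capture just the handshake; the wscale factor only
shows up in the TCP options of the SYN and SYN-ACK (the interface name and
iperf's default port 5001 below are assumptions):

  # window scaling should be on by default on any recent Linux kernel
  sysctl net.ipv4.tcp_window_scaling

  # look for "wscale N" in the options of the SYN/SYN-ACK
  tcpdump -ni eth0 'tcp[tcpflags] & tcp-syn != 0 and port 5001'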

Regards,
Bill Herrin

> Modern TCPs support and typically use window scaling (RFC 1323). You
> may not notice it in packet dumps because the window scaling option is
> negotiated once for the connection, not repeated in every packet.

Absolutely. Most host OSes should support this by now. Some test utilities,
however, like iperf (at least the versions I've used), default to a 16-bit
window size.
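
So it's worth forcing the window when testing, along the lines of (address is
a placeholder):

  # ~1 MB socket buffer, a few parallel streams, 60 second run
  iperf -c 192.0.2.10 -w 1M -P 4 -t 60 -i 5

iperf reports the window it actually got back from the kernel, so if it prints
something smaller than requested, the kernel's socket buffer limits are still
in the way.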

The goal of my response was to point out that TCP relies on windowing,
unlike UDP, which explains the discrepancy.

This is a good article outlining these details:

https://www.edge-cloud.net/2013/06/measuring-network-throughput/


Hi Eric,

Sounds like a TCP problem off the top of my head. However, just throwing it
out there: we use a mix of wholesale access circuit providers and carriers
for locations we haven't PoP'ed, and we are an LLU provider (a CLEC in US
terms). For such issues I have been developing an app to test below TCP/UDP,
and for pseudowire testing, etc.:

https://github.com/jwbensley/Etherate

It may or may not shed some light when there's an underlying problem
(although yours sounds TCP-related).

Cheers,
James.