Can someone explain how a TCP conversation could degenerate into congestion
avoidance on a long fat pipe if there is no packet/segment loss or out-of-order
segments?
Here is the situation:
WAN = 9 Mbps ATM connection between NY and LA (70 ms delay)
LAN = Gig Ethernet
Receiver: LA server = Win2k3
Sender: NY server = Linux 2.4
Data transmission typical = bursty but never more than 50% of CIR
Segment sizes = 64k to 1460k but mostly less than 100k
Typical Problem Scenario: Data transmission is humming along consistently at
2 Mbps; all of a sudden, transmission rates drop to nothing, then pick up again
after 15-20 seconds. Prior to the drop off (based on packet capture) there is
usually a DUP ACK/SACK coming from the receiver, followed by the Retransmits
and congestion avoidance. What is strange is there is nothing prior to the
drop off that would be an impetus for congestion (no high BW utilization or
packet loss).
Also, are there any known TCP issues between the Linux 2.4 kernel and Windows
2003 SP1? Mainly, are there issues regarding the handling of SACK, DUP ACKs,
and Fast Retransmits?
Of course we all know that this is not an application issue, since developers
write flawless socket code, but if it is a network issue, how is it caused?
Philip
In order to solve this, you need to see a trace from both sides of the
WAN. Which side is your trace from? Can you see the original ACK on
both ends?
If the receiver is sending a DUP ACK, then the sender either never
received the first ACK or it didn't receive it within the timeframe it
expected.
Brian
or received it out of order.
Yes, a tcpdump trace is the first step.
> Can someone explain how a TCP conversation could degenerate into
> congestion avoidance on a long fat pipe if there is no packet/segment
> loss or out-of-order segments?
> Here is the situation:
> WAN = 9 Mbps ATM connection between NY and LA (70 ms delay)
Do you know there is no cell loss on your ATM path? Have you
also accounted for SAR overhead?
Do you know if they use any sort of cell-chaining technology
in their network to reduce overhead?
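For the SAR question: here is a rough sketch of the AAL5 cell tax. The only
figure taken from the thread is the 9 Mbps line rate; the cell and trailer
sizes are standard ATM/AAL5 constants, and the 1500-byte packet is an
illustrative example.

```python
import math

# Rough AAL5 cell-tax estimate for a 9 Mbps ATM circuit (the 9 Mbps
# figure is from the thread; cell/trailer sizes are standard ATM/AAL5).
CELL = 53          # bytes per ATM cell on the wire
PAYLOAD = 48       # payload bytes per cell
AAL5_TRAILER = 8   # AAL5 trailer bytes per PDU

def goodput(line_rate_bps, pdu_bytes):
    """Usable bit rate left for the PDU after segmentation into 53-byte cells."""
    cells = math.ceil((pdu_bytes + AAL5_TRAILER) / PAYLOAD)
    return line_rate_bps * pdu_bytes / (cells * CELL)

# A 1500-byte IP packet (1460-byte TCP segment) on the 9 Mbps circuit:
print(round(goodput(9_000_000, 1500)))   # roughly 7.96 Mbps usable, an ~12% tax
```

So even before any loss, a "9 Mbps" circuit carries noticeably less than
9 Mbps of IP traffic.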
- Jared
Just because there wasn't any congestion reason that *you* could see from where
you had your instrumentation doesn't mean it's 100% congestion free end-to-end.
(Feel free to hit delete if you actually *do* have instrumentation looking in
both directions on every segment involved.)
Who knows, maybe a few packets got corrupted on the wire, and the TCP checksum
actually caught it and dropped the offending packets.
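For reference, the corruption check in question is the 16-bit one's-complement
Internet checksum (RFC 1071); a minimal sketch (the real TCP checksum also
covers a pseudo-header, omitted here):

```python
# 16-bit one's-complement Internet checksum (RFC 1071), the mechanism
# that silently discards segments corrupted on the wire.
def inet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Flipping a single bit changes the checksum, so the segment is dropped
# with no error reported to the sender -- it just looks like loss:
payload = b"TCP payload"
corrupt = bytes([payload[0] ^ 0x01]) + payload[1:]
print(inet_checksum(payload) != inet_checksum(corrupt))   # True
```

From TCP's point of view, a checksum discard is indistinguishable from a drop
in the network, so it triggers the same retransmit/congestion-avoidance path.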
> consistently at 2 Mbps, all of a sudden transmission rates drop to
> nothing then pick up again after 15-20 seconds. Prior to the drop off
> (based on packet capture) there is usually a DUP ACK/SACK coming
> from the receiver followed by the Retransmits and congestion
> avoidance. What is strange is there is nothing prior to the drop off
> that would be an impetus for congestion (no high BW utilization or
> packet loss).
Perhaps you're filling buffers by flowing from a 1 Gbps link into a 9 Mbps circuit. The dropped packets then induce slow start and congestion avoidance.
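A back-of-the-envelope sketch of that speed mismatch, using the thread's two
rates (the 64 KB burst size is an assumed example, not from the capture):

```python
# Queue growth when a 1 Gbps LAN bursts into a 9 Mbps WAN circuit.
# Rates are from the thread; the burst size is an assumed example.
LAN = 1_000_000_000   # sender's access rate, bps
WAN = 9_000_000       # bottleneck circuit rate, bps

def queue_after_burst(burst_bytes):
    """Bytes queued at the bottleneck after a burst arrives at LAN rate."""
    t = burst_bytes * 8 / LAN     # seconds the burst takes to arrive
    drained = WAN * t / 8         # bytes the circuit drains meanwhile
    return burst_bytes - drained

# A single 64 KB application write queues almost all of itself,
# since the circuit drains only ~0.9% of it during the burst:
print(int(queue_after_burst(64 * 1024)))
```

If the router's buffer on that interface is smaller than the burst, the tail
gets dropped even though average utilization looks low, which matches the
"no high BW utilization" observation.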
Joe
> Who knows, maybe a few packets got corrupted on the wire, and
> the TCP checksum actually caught it and dropped the offending packets.
Or there could be flags in the bitstream...
--Michael Dillon
Philip Lavine wrote:
> Can someone explain how a TCP conversation could degenerate into
> congestion avoidance on a long fat pipe if there is no packet/segment
> loss or out-of-order segments?
> [snip]
> Of course we all know that this is not an application issue, since
> developers write flawless socket code, but if it is a network issue,
> how is it caused?
Duplex mismatch on an intermediate ethernet segment?
So, when you say "pickup again after 15-20 seconds" do you mean that
it takes 15-20 seconds to ramp back up to the original speed or that
the line is basically idle for 15-20 seconds before any packets start
flowing again? If the latter, I'd suggest that you take a look at the
apps some more.
Actually, you might want to try and duplicate the issue with
identical machines sitting next to each other and a piece of cable
between them...
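If the line really is idle for the full 15-20 seconds, classic exponential
RTO backoff after a few consecutive losses would fit; a minimal sketch (the
1 s initial RTO is an assumption here, not a measured value from the trace):

```python
# Illustrative exponential RTO backoff: if the original segment and
# several retransmissions are all lost, the sender sits quiet while
# the timeout doubles each round. Initial RTO of 1 s is assumed.
def stall_time(initial_rto, lost_retransmits):
    """Total quiet time when the first N (re)transmissions are lost."""
    total, rto = 0.0, initial_rto
    for _ in range(lost_retransmits):
        total += rto
        rto *= 2          # back off exponentially on each timeout
    return total

# Four consecutive losses with a 1 s initial RTO:
print(stall_time(1.0, 4))   # 1 + 2 + 4 + 8 = 15 seconds
```

So a stall in the observed 15-20 second range needs only a handful of
back-to-back losses of the same segment, which a brief buffer overflow at
the bottleneck could easily produce.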
Philip Lavine wrote:
> Can someone explain how a TCP conversation could degenerate into
> congestion avoidance on a long fat pipe if there is no packet/segment
> loss or out-of-order segments?
> [snip]
>
> Duplex mismatch on an intermediate ethernet segment?
Oooh, I like that one....