I'm losing the will to live with this networking headache! Please feel free
to point me at a Linux list if NANOG isn't suitable. I'm at a loss where
else to ask.
I've diagnosed some traffic oddities and after lots of head-scratching,
reading and trial and error I can say with certainty that:
With and without shaping and over different bandwidth providers using the
e1000 driver for an Intel PRO/1000 MT Dual Port Gbps NIC (82546EB) I can
replicate full, expected throughput with UDP but consistently only get
300kbps - 600kbps throughput _per connection_ for outbound TCP (I couldn't
find a tool I trusted to replicate ICMP traffic). Multiple connections are
cumulative: each additional connection adds roughly another 300kbps - 600kbps.
Inbound is slightly erratic at holding a consistent speed but manages 15Mbps
as expected, a far cry from 300kbps - 600kbps.
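(For reference, the sort of test I've been running can be reproduced with
something like iperf; these aren't necessarily my exact commands or tool, just
the general shape, and <far-end> is a placeholder for the box at the other
side:)

  # on the far end: iperf -s   (add -u for the UDP test)
  iperf -c <far-end> -u -b 15M -t 30    # UDP at a fixed rate
  iperf -c <far-end> -t 30              # a single TCP stream
  iperf -c <far-end> -t 30 -P 4         # several TCP streams in parallel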
The router is a quad-core box sitting at no load and there's very little
traffic being forwarded back and forth. The e1000 driver is compiled into the
kernel as 'built-in' with its parameters left at the defaults. NAPI is not
enabled though (enabling it requires a reboot, which is a problem as this box
is in production).
The only other change to the box is that over Christmas IPtables
(ip_conntrack and its associated modules mainly) was loaded into the kernel
as 'built-in'. There's no sign of packet loss on any tests, and I raised the
conntrack connection limit to suit the amount of RAM. Has anyone come across
IPtables causing throughput issues even with no rules loaded?
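(For what it's worth, this is roughly how I checked and raised the limit; the
exact sysctl and /proc names vary between kernel versions, and 65536 is just
an illustrative value, not my actual setting:)

  sysctl net.ipv4.netfilter.ip_conntrack_max            # current limit (older kernels)
  wc -l /proc/net/ip_conntrack                          # entries currently being tracked
  sysctl -w net.ipv4.netfilter.ip_conntrack_max=65536   # raise the limit
  # newer kernels use net.netfilter.nf_conntrack_max and /proc/net/nf_conntrack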
I've also changed the following kernel parameters with no luck:
It feels to me like a buffer limit is being reached 'per connection'. The
throughput spikes at around 1.54Mbps, then TCP backs off to about 300kbps -
600kbps. What am I missing? Is NAPI really that essential for such low
traffic? A very similar build moved far higher throughput on cheap NICs.
MTU is 1500, txqueuelen is 1000.
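(In case it helps anyone suggest something: the counters I've been watching
for signs of the transmit queue overflowing are these, and so far they're
consistent with the lack of packet loss mentioned above:)

  tc -s qdisc show dev eth0    # per-qdisc sent/dropped/overlimit counters
  ip -s link show eth0         # per-interface RX/TX error and drop counters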
I'm losing the will to live with this networking headache! Please feel free
to point me at a Linux list if NANOG isn't suitable. I'm at a loss where
else to ask.
The linux-net mailing list might indeed be more appropriate.
With and without shaping and over different bandwidth providers using the
e1000 driver for an Intel PRO/1000 MT Dual Port Gbps NIC (82546EB) I can
replicate full, expected throughput with UDP but consistently only get
300kbps - 600kbps throughput _per connection_ for outbound TCP
I've seen this behavior as the result of duplex mismatches.
(The TCP settings are an end-system matter and do not affect how the
router forwards traffic.)
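A quick way to check that on the Linux side is something along these lines,
and then compare against what the switch port itself reports:

  ethtool eth0       # look at the Speed, Duplex and Auto-negotiation lines
  mii-tool -v eth0   # older alternative that works on some NICs/kernels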
Thanks loads for the quick replies. I'll try and respond individually.
Lee > I recently disabled tcp_window_scaling and it didn't solve the
problem. I don't know enough about it. Should I enable it again? Settings
differing from defaults are copied in my first post.
Mike > Strangely I'm not seeing any errors on either the ingress or egress
NICs:
and it may be worth noting that flow control is on. Is this a reasonable
level of pause frames to be seeing? They seem to be higher on non-routing
boxes.
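(For context, the numbers I'm quoting come from something like the following;
the exact counter names are driver-specific, this is roughly what the e1000
exposes:)

  ethtool -S eth0 | grep -i flow_control      # xon/xoff pause frame counters
  ethtool -S eth0 | grep -iE 'err|drop|miss'  # error and drop counters
  ethtool -a eth0                             # whether pause is negotiated at all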
Thanks, Nickola.
What's your opinion on these settings? Do you recommend switching off "tcp
segmentation offload"?
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
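If it does turn out to be worth trying, my understanding is that it can be
toggled on the fly with ethtool, something like the following (untested on
this box):

  ethtool -K eth0 tso off   # disable TCP segmentation offload
  ethtool -k eth0           # confirm the new offload settings
  ethtool -K eth0 tso on    # put it back afterwards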
Thanks loads for the quick replies. I'll try and respond individually.
Lee > I recently disabled tcp_window_scaling and it didn't solve the
problem. I don't know enough about it. Should I enable it again? Settings
differing from defaults are copied in my first post.
I don't know if the TCP window size makes any difference when the box
is acting as a router. But when UDP works as expected and each
additional TCP connection gets 300-600kbps, the first thing I'd look at
is the window size. If it was a duplex mismatch, additional TCP
connections would make things worse instead of each getting 300-600kbps
of bandwidth.
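To put rough numbers on it (the RTTs here are purely illustrative, not
measured from your network): per-connection TCP throughput tops out at about
window size / round-trip time. An 8KB effective window over a 100ms path
gives 8 * 1024 * 8 / 0.1 = ~650kbps, right in the range you're seeing, while
a 64KB window over the same path would allow roughly 5Mbps.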
The only other change to the box is that over Christmas IPtables
(ip_conntrack and its associated modules mainly) was loaded into the kernel
If all else fails, backing out recent changes usually works.
Thanks very much, Lee. My head's whirring. Am I right in thinking that by
turning on scaling (which I just did) the window size is set automatically?
I'll do some more reading.
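(The knobs I'm reading up on are these; as I understand it, with receive
autotuning enabled the kernel sizes the window itself within the
tcp_rmem/tcp_wmem limits:)

  sysctl net.ipv4.tcp_window_scaling
  sysctl net.ipv4.tcp_moderate_rcvbuf          # receive-side autotuning
  sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem   # min/default/max buffer sizes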
I'm looking at TSO too, as mentioned above by Nickola. I'll maybe risk
changing it with ethtool during a quiet moment on the network.
I've just discovered the netstat -s command, which gives loads more info
than anything else I've come across. Any pointers about window size or TSO
from the output would be appreciated.
Thanks again,
I'm looking at TSO too, as mentioned above by Nickola. I'll maybe risk
changing it with ethtool during a quiet moment on the network.
Turning off offloading might indeed be something to try.
Regarding the negotiation issue, can you look at the other end of the
link and check what it's saying?
Looking at "netstat -s" statistics at the endpoint (not the router)
could be illuminating, too. I haven't got any expertise in this area,
but TCP problems can often be diagnosed by looking at tcpdump/packet
captures and analyzing them using tcptrace (and the special xplot
variant which can plot tcptrace output).
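Something along these lines would give you graphs you can eyeball for window
limits and retransmissions (the output file names are from memory and depend
on the connection captured; <test-host> is a placeholder):

  tcpdump -i eth0 -s 0 -w transfer.pcap host <test-host>   # capture one test transfer
  tcptrace -G transfer.pcap                                 # writes .xpl graph files
  xplot a2b_tsg.xpl                                         # time-sequence graph, one direction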
Thanks for all the answers. I'm currently going down the path of looking at
IPtables' conntrack slowing the forwarding rate.
If I can't find any more docs then I'll boot the router with a kernel
without any IPtables built-in and see if that's it.
As Lee said, "rollback"! That's the last change to the box. If I can rule
out the logging of traffic by conntrack slowing down the forwarding, then I
can look into the hardware further.
Thanks, Karl, Allen and Nickola.
I failed over to another router last night and briefly had full expected
throughput, but this morning, despite dropping providers and moving between
routers again for trial and error, I still see _outbound_ TCP at about the
same 300 - 600kbps per session.
I eliminated the conntrack modules first, then iptables as a whole. I've
eliminated TSO and checksumming (which caused very sticky connections) on
the e1000 NIC.
The failover router has a slightly older kernel and was working before
Christmas, so it's most likely not a kernel version issue. I've also tried
removing FIB_TRIE as a stab in the dark, with no success. And the failover
router connects to the front-facing switch using FE rather than GE, so I've
eliminated the NICs and connection speeds as well.
The only constant is the front-facing switch (it's negotiating perfectly at
full duplex though), so all I can think of is removing that from the equation.
It's definitely only _outbound_ TCP getting buffered though! I've pushed
92Mbps over a FE link with UDP and uploaded at 16Mbps on a 16Mbps link.
Any last ideas before I cause headaches by removing switches would be
appreciated.
Thanks for the suggestions everyone.
I've got to the bottom of the problem now (I'm sure there will be a
collective sigh of relief from the list because of the noise this thread
generated :-)).
I installed two brand new, low-spec 3Com switches: one at the 'front' of the
network and one 'behind' the routers. They are the same model, with the same
latest firmware and the same config (saved to and then copied off disk); only
their IPs were different.
The front switch was the problem. As two final tests before removing it, we
switched off unicast/multicast broadcast control and flow control
(simultaneously on the same port of the switch behind the routers), because
pause frames were showing, though not a massive amount in percentage terms.
The switch behind the routers, however, is serving the same bandwidth equally
well! We've put an ancient switch in place of the front switch and it's been
working perfectly so far.
My lesson learned is to change as little as possible at once! That said,
recent network changes were spread about a month apart, and this very odd
issue was far easier to dismiss than believe due to its bizarre nature,
especially when providers' network conditions change while testing is being
done.
I really appreciate all the input and have learnt loads, possibly just not
in the way I would have liked to.
TCP offloading should be a suspect. Any current PC hardware should be able
to deal with huge amounts of traffic without any offloading. Start by turning
it off, so everything is handled by Linux directly. Even if the problem
persists, it's easier to troubleshoot without magic black boxes (TOE) in
between.