RFC 793 arbitrarily defines the MSL as 2 minutes, which makes 2MSL
(how long to hold a socket in TIME_WAIT state before cleaning up)
4 minutes.
Linux is a little more reasonable about this and has it baked into the
source as 60 seconds in "/usr/src/linux/include/net/tcp.h":
#define TCP_TIMEWAIT_LEN (60*HZ)
Since there is no way to change this through /proc (probably a good
idea, to keep users from messing with it), I am considering rebuilding
a kernel with a lower TCP_TIMEWAIT_LEN to deal with the following
problem.
With a 60 second timeout on TIME_WAIT, local ports are tied up and
cannot be used for new outgoing connections (in this case from a proxy
server). The default local port range on Linux can easily be
adjusted; but even when bumped up to a range of 32K ports, the 60
second timeout means you can only sustain about 500 new connections
per second before you run out of ports.
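To see how close a box is to that limit, one can compare the number of
sockets in TIME_WAIT against the size of the local port range. A
minimal Python sketch, assuming the usual Linux /proc/net/tcp layout
(socket state in the fourth column, with 06 meaning TIME_WAIT) and the
two-number format of /proc/sys/net/ipv4/ip_local_port_range. The
function names are made up for illustration:

```python
TIME_WAIT_STATE = "06"  # hex state code /proc/net/tcp uses for TIME_WAIT


def port_range_size(range_text):
    """Number of ports in an ip_local_port_range string such as '32768 60999'."""
    low, high = (int(x) for x in range_text.split())
    return high - low + 1


def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME_WAIT in the text contents of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # fields: slot, local address, remote address, state, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT_STATE:
            count += 1
    return count
```

Both functions take the file contents as a string, so they can be fed
the result of open("/proc/net/tcp").read() and
open("/proc/sys/net/ipv4/ip_local_port_range").read(). Note that
/proc/net/tcp only lists IPv4 sockets.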
There are two options to try to deal with this, tcp_tw_reuse and
tcp_tw_recycle, but both seem to be less than ideal. tcp_tw_reuse
doesn't appear to be effective when you're sustaining 500+ new
connections per second rather than handling a small burst.
tcp_tw_recycle seems like too big of a hammer and has been reported
to cause problems with NATed connections.
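For reference, the current state of both knobs can be read from
/proc/sys/net/ipv4. The following is only a sketch: read_sysctl and
tw_settings are hypothetical helper names, and a missing file is
reported as None, since some kernels do not provide tcp_tw_recycle
at all.

```python
import os


def read_sysctl(name, base="/proc/sys/net/ipv4"):
    """Return the integer value of a sysctl file, or None if it cannot be read."""
    path = os.path.join(base, name)
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (FileNotFoundError, PermissionError):
        return None


def tw_settings(base="/proc/sys/net/ipv4"):
    """Collect the two TIME_WAIT-related settings discussed above."""
    names = ("tcp_tw_reuse", "tcp_tw_recycle")
    return {name: read_sysctl(name, base) for name in names}
```

Calling tw_settings() with no arguments reads the live values; the
base parameter is only there so the function can be pointed at a
different directory for testing.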
The best solution seems to be keeping TIME_WAIT in place, but with a
shorter timeout.
A 30 second timeout would get you to about 1000 connections a second;
15 seconds to about 2000; and 10 seconds to about 3000.
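These figures are just the number of available ports divided by the
TIME_WAIT timeout. A small Python sketch of the arithmetic, assuming a
range of exactly 32K (32768) ports and that every closed connection
holds its local port for the full timeout; the function name is made
up for illustration:

```python
def max_new_connections_per_sec(port_count, time_wait_secs):
    """Sustained rate at which ports are freed if each one is held for the timeout."""
    return port_count / time_wait_secs


for timeout in (60, 30, 15, 10):
    rate = max_new_connections_per_sec(32 * 1024, timeout)
    print(f"{timeout:2d} s -> about {rate:.0f} new connections/s")
# 60 s -> about 546 new connections/s
# 30 s -> about 1092 new connections/s
# 15 s -> about 2185 new connections/s
# 10 s -> about 3277 new connections/s
```

The round numbers quoted above are these values rounded down to leave
some headroom.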
A few questions:
Does anyone have any data on how typical it is for TIME_WAIT to be
necessary beyond 10 seconds on a modern network?
Has anyone done any research on how low you can safely make TIME_WAIT?
Is this a terrible idea? What alternatives are there? Keep in mind
that the source of the problem is a proxy server making outgoing
connections, so things like SO_REUSEADDR, which work for reusing
sockets for incoming connections, don't seem to do much in this case.
Is anyone running large proxies or load balancers in this situation?
If so, what is your solution?