Flash crowds and network management

It was small, as most web pages are today. I don't think the government
servers were hit even close to as hard as some of the big news sites. I
bet CNN got 10x the hits the LOC site had. NetRail hosted the servers
for the US Treasury and USDA. We gave them a 100BaseT Ethernet
connection into a core router, but it never was a big deal because their
servers would die way before the link utilization ever got high. I have
found many government sites are like that. I would not be surprised if
LOC's servers died before the links maxed out.

I tend to take a more holistic approach to customer service. A
customer's servers dying due to network load concerns me as a network
manager, even though the link utilization never got high.

I wouldn't jump to the conclusion that the government servers were not
hit even close to as hard as some of the big news sites; it depends on
your definition of 'hit.' Hits for traffic management purposes are
different from hits for advertising purposes. Your servers stop
counting 'hits' when they are down, but that doesn't mean the requests
stop. Outbound traffic goes down when the servers die, but inbound
traffic doesn't.

I'd like to chat someday with the webserver managers at some of the
large media web farms, not necessarily with a reporter listening in,
about what they were seeing. But at the moment there isn't a really
good way for us to communicate in real time about what is going on.

Think of the worst SYN or SMURF attack you've ever seen and then
combine them. The Internet doesn't have the equivalent of the "choke"
exchanges found in local telephone networks. And if you think the
phone network is any more reliable, remember what happened when Garth
Brooks tickets went on sale in the Capital a few years ago.

A useful addition to CAR would be a clean way to limit SYN packets per
second while letting other traffic through, so that once a person gets
connected their transfer clears out as fast as possible. Using the
'established' keyword in an access list and guessing at the size of SYN
packets gets you part of the way there. For traffic management
purposes, you want to distribute the choke points around the entire
backbone rather than concentrate them at the one hot interface. The
queue discipline gets a bit weird because of "duplicate" SYN packets
and essentially zero packet inter-arrival spacing. I have a gut feeling
that robots were a problem, but editing access-lists in the middle of a
storm wasn't a good solution.
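For reference, the access-list trick can be combined with CAR roughly like this on a Cisco router (a sketch only; the interface name, ACL number, server address, and rate/burst values are made-up placeholders). Since classic IOS ACLs can't match the SYN flag directly, the list denies 'established' traffic so that only connection-setup packets fall into the rate-limited class:

```
! Hypothetical interface and numbers; tune the rates to the link.
interface Serial0
 ! Police only the packets permitted by access-list 152.
 rate-limit input access-group 152 64000 8000 8000 conform-action transmit exceed-action drop
!
! 'established' matches packets with ACK or RST set, i.e. traffic on
! connections that are already up. Deny those from the class so the
! remaining TCP packets (mostly SYNs) are what gets rate-limited.
access-list 152 deny tcp any host 192.0.2.10 established
access-list 152 permit tcp any host 192.0.2.10
```

As noted above, this only gets part of the way there: CAR meters bits per second, not SYNs per second, so you still end up guessing at SYN packet sizes to pick the rate.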