TCP session disconnection caused by Code Red?

I've been told (but not given permission to forward details of
who/how/what) that some major sites with a single router
and relatively flat network topology are dying due to the ARP
request flood that is being generated by Code Red scans on the
inside of their border router choking the router. Check the
rate of ARP requests coming off your border router and see if
it seems excessive; if so, that may be it.
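
One rough way to put a number on "excessive" is to count ARP requests in a packet capture taken on the inside of the border router. The sketch below is only an illustration: the function names are made up here, it assumes untagged Ethernet II frames (no 802.1Q header), and it expects the caller to supply (timestamp, raw frame) pairs from whatever capture tool is in use.

```python
import struct

ETHERTYPE_ARP = 0x0806   # EtherType value for ARP
ARP_OP_REQUEST = 1       # ARP opcode for "request"

def is_arp_request(frame: bytes) -> bool:
    """Return True if a raw, untagged Ethernet II frame carries an ARP request."""
    if len(frame) < 22:                      # 14-byte Ethernet header + 8 bytes of ARP
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return False
    (opcode,) = struct.unpack("!H", frame[20:22])   # opcode sits 6 bytes into the ARP header
    return opcode == ARP_OP_REQUEST

def arp_request_rate(samples, window_seconds: float) -> float:
    """Average ARP requests per second over a capture window.

    `samples` is an iterable of (timestamp, frame_bytes) pairs; the
    timestamps are ignored here and only the window length is used.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    count = sum(1 for _ts, frame in samples if is_arp_request(frame))
    return count / window_seconds
```

What counts as too high depends on the router; comparing the figure against a capture from a normal day is probably more useful than any fixed threshold.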

-george william herbert
gherbert@retro.com

Something worth looking at if you are running Cisco gear
(I believe the original poster was):

http://www.cisco.com/warp/public/63/ts_codred_worm.html

Regards,
Kevin

George William Herbert wrote:

I've been told (but not given permission to forward details of
who/how/what) that some major sites with a single router
and relatively flat network topology are dying due to the ARP
request flood that is being generated by Code Red scans on the
inside of their border router choking the router. Check the
rate of ARP requests coming off your border router and see if
it seems excessive; if so, that may be it.

2 points:

1. RFC826 appears to mandate only positive ARP caching. I can't
   see a reason why negative ARP caching shouldn't work this
   way:

   Keep only one ARP request in flight at a time. Retry ARPs
   a maximum of [5] times, separated by at least [1] second.
   After that, cache non-existence of a h/w address for that
   IP address for the normal positive caching time. If you see any
   IP traffic inbound on that interface with that IP address,
   remove the negative cache. However, to get a positive cache
   entry you still need a valid ARP response (promiscuous or not).

   More formally, when address resolution is required:

   a) Look up IP address in ARP table
      i) If entry is PRESENT (i.e. h/w address OK)
           return this value.
      ii) If entry is NEXIST return ARP failure
           immediately (i.e. as a router, drop into
           the code where no route is found - on Cisco
           this would be rate-limited unreachables)
      iii) If entry is INCOMPLETE[n] (for any n), go to (b),
           transmitting another ARP packet ONLY if the entry
           is fully aged (i.e. its [t1] aging interval,
           described below, has expired); otherwise perform
           your RFC826 compatible / current operation
           without transmitting another ARP packet
      iv) If entry is absent, transmit ARP packet
           as normal, set entry to INCOMPLETE[0] and go to (b)
   b) [this is the action we perform if we don't yet
      know the h/w address]. RFC826 suggests returning and
      allowing a higher layer to retransmit, though I
      suppose blocking is theoretically possible

   If a valid ARP response is received (promiscuous or
   otherwise), remove any existing entry, and generate
   a PRESENT entry.

   If /any/ packet is received with a valid source IP
   address, remove the NEXIST entry for that address if present
   (in the ARP table for the interface on which it was received only)
   [this check is arguably too thorough as it will remove
   valid NEXIST entries for IP addresses that exist, but behind
   a router on the current subnet, rather than on it directly,
   though this is (a) better than nothing, and (b) required
   to support proxy ARP properly; note that you can't rely
   on the MAC address being that of the IP though - still have
   to ARP]

   Age INCOMPLETE[n] states to INCOMPLETE[n+1] states after
   [t1] seconds (probably about 1 second), for n<N, and to
   NEXIST for n>=N (N is probably about 5)

   Age NEXIST state to deleted after about [t2] seconds (where
   t2 is probably close to the arp timeout - i.e. about 300)

   INCOMPLETE essentially means PENDING

2. It has been observed that Cisco products in particular do not
   handle ARP storms well. Even worse is the Catalyst 5[50]00. This
   may have been fixed since I saw it. The application in which I
   saw it seriously merited having a Linux box or similar 'proxy'-ARP
   all non-existent addresses to null. You can probably achieve the
   same result with static ARP entries to a non-existent h/w address.
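
The state machine in point 1 can be sketched roughly as follows. This is a simplified illustration, not anyone's shipping implementation: the class and method names are invented here, aging is driven by lookups rather than by a background timer, PRESENT entries are never expired, and the caller is assumed to send the ARP packet whenever `send_arp` comes back true.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PRESENT, INCOMPLETE, NEXIST = "PRESENT", "INCOMPLETE", "NEXIST"

@dataclass
class Entry:
    state: str
    stamp: float               # time of the last ARP transmission or state change
    n: int = 0                 # retry counter, as in INCOMPLETE[n]
    mac: Optional[str] = None

class NegativeArpCache:
    """Hypothetical cache; t1, max_retries (N) and t2 follow the proposal's [t1], N, [t2]."""

    def __init__(self, t1: float = 1.0, max_retries: int = 5, t2: float = 300.0):
        self.t1, self.max_retries, self.t2 = t1, max_retries, t2
        self.table: Dict[str, Entry] = {}

    def resolve(self, ip: str, now: float) -> Tuple[str, Optional[str], bool]:
        """Return (result, mac, send_arp) for one lookup at time `now`."""
        e = self.table.get(ip)
        if e and e.state == NEXIST and now - e.stamp >= self.t2:
            del self.table[ip]                  # negative entry has aged out
            e = None
        if e is None:                           # (a)(iv): absent, send the first ARP
            self.table[ip] = Entry(INCOMPLETE, now)
            return ("PENDING", None, True)
        if e.state == PRESENT:                  # (a)(i)
            return (PRESENT, e.mac, False)
        if e.state == NEXIST:                   # (a)(ii): fail at once, no ARP sent
            return (NEXIST, None, False)
        if now - e.stamp < self.t1:             # (a)(iii): a request is still in flight
            return ("PENDING", None, False)
        if e.n < self.max_retries:              # aged: move to INCOMPLETE[n+1] and retry
            e.n, e.stamp = e.n + 1, now
            return ("PENDING", None, True)
        e.state, e.stamp = NEXIST, now          # retries exhausted: cache the failure
        return (NEXIST, None, False)

    def on_arp_reply(self, ip: str, mac: str, now: float) -> None:
        """A valid ARP response replaces whatever entry existed."""
        self.table[ip] = Entry(PRESENT, now, mac=mac)

    def on_ip_packet_from(self, ip: str) -> None:
        """Any IP packet from `ip` on this interface clears a negative entry."""
        e = self.table.get(ip)
        if e and e.state == NEXIST:
            del self.table[ip]
```

With the defaults, a steady stream of packets to a dead address costs one initial ARP plus at most five retries, after which lookups fail locally until the NEXIST entry expires or traffic from that address is seen.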

Alex Bligh
Personal Capacity

Alex Bligh wrote:

1. RFC826 appears to mandate only positive ARP caching. I can't
   see a reason why negative ARP caching shouldn't work this
   way:

   Keep only one ARP request in flight at a time. Retry ARPs
   a maximum of [5] times, separated by at least [1] second.
   After that, cache non-existence of a h/w address for that
   IP address for the normal positive caching time.

The immediate problem with this is that it requires a *MUCH* larger ARP
cache. Rather than needing enough memory for a couple of thousand active
entries (the current norm for middle-of-the-road routers), you need enough
room for every possible address on every attached segment.
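
To get a feel for the size difference, the worst case can be counted directly from the attached prefixes. The helper below is a hypothetical illustration and the prefixes in the usage note are made-up examples; it simply assumes every usable host address on every attached segment ends up with a cache entry, positive or negative.

```python
import ipaddress

def worst_case_entries(prefixes):
    """Upper bound on ARP entries if every host address on each segment is cached."""
    total = 0
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix, strict=False)
        if net.prefixlen >= 31:
            total += net.num_addresses          # /31 and /32: no network/broadcast to exclude
        else:
            total += net.num_addresses - 2      # skip network and broadcast addresses
    return total
```

For example, `worst_case_entries(["10.0.0.0/22", "192.168.1.0/24"])` gives 1276, while a single /16 gives 65534, against the couple of thousand entries mentioned above.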

[unsubstantiated conjecture] This may be what's killing the cable networks.
If they are making room in the NAS ARP caches for the addresses that are
being probed, then they are making room by flushing the "real" ARP entries,
resulting in a constant flush/load cycle. [/uc, but exemplary of the problem
with negative ARP caching.]
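
The flush/load cycle in the conjecture above is easy to reproduce in a toy model. The sketch below is purely illustrative and makes assumptions the thread does not confirm: that the cache evicts least-recently-used entries, and that every probed address takes up a cache slot. All names and parameters are hypothetical.

```python
from collections import OrderedDict

def real_host_hit_rate(capacity, real_hosts, probes_per_round, rounds):
    """Fraction of real-host lookups served from a fixed-size LRU cache."""
    cache = OrderedDict()
    hits = lookups = 0
    next_probe = 0

    def touch(key):
        """Look up `key`, inserting it (and evicting the LRU entry) on a miss."""
        if key in cache:
            cache.move_to_end(key)
            return True
        cache[key] = True
        if len(cache) > capacity:
            cache.popitem(last=False)           # evict the least recently used entry
        return False

    for _ in range(rounds):
        for h in range(real_hosts):             # traffic to hosts that really exist
            lookups += 1
            hits += touch(("host", h))
        for _ in range(probes_per_round):       # worm scan: each probe hits a new address
            touch(("probe", next_probe))
            next_probe += 1
    return hits / lookups
```

With a 100-entry cache, 50 real hosts and no probes, `real_host_hit_rate(100, 50, 0, 10)` is 0.9; with 100 probes per round it drops to 0.0, because the scan pushes every real entry out before it is used again.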

Adding to this conjecture, I'm seeing VERY high ARP rates (ARP
broadcast packets) arriving via the cable modem in my office. I'm
also seeing a high rate of Code Red type attacks attempted against
the attached machines. The firewall is just catching and logging them.

Eric A. Hall wrote:

The immediate problem with this is that it requires a *MUCH* larger ARP
cache. Rather than needing enough memory for a couple of thousand active
entries (the current norm for middle-of-the-road routers), you need enough
room for every possible address on every attached segment.

Eric A. Hall http://www.ehsco.com/

  Weigh that against the advantages, however. If you have a large address
space for the segment with few attached hosts (the case where this is a
problem), you're better off with a lot of negative entries cached than with
a lot of active ARP attempts.

  One thing I see a lot of on segments with large address spaces is that the
quantity of ARP traffic can get high. Each ARP request causes an interrupt
on each attached host on the segment. I'd rather the router have a larger ARP
cache than the network have more broadcast traffic.

  I'm curious what kind of algorithms my routers currently use. If it's one
packet per second with five retries -- consider a network with a /22 that's
only half full. You could see as many as 512 broadcast packets a second just
from one router. Sounds like an interesting technique for getting
amplification by a factor of 5 -- 5 broadcast packets for every unicast
packet you send.
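
  The arithmetic behind that figure, as a small sketch. The function names are made up for illustration, and it assumes the scan touches every unused address and that the router ARPs once per second for each of them.

```python
def unused_addresses(prefix_len: int, fill_fraction: float) -> int:
    """Addresses in an IPv4 prefix that have no host behind them."""
    return round(2 ** (32 - prefix_len) * (1 - fill_fraction))

def peak_arp_broadcasts_per_second(prefix_len: int, fill_fraction: float,
                                   arp_interval_s: float = 1.0) -> float:
    """Broadcast rate if every unused address is ARPed for once per interval."""
    return unused_addresses(prefix_len, fill_fraction) / arp_interval_s

# A /22 holds 2**10 = 1024 addresses; half full leaves 512 unused.
rate = peak_arp_broadcasts_per_second(22, 0.5)
```

  Here `rate` comes out at 512.0 broadcasts per second, and with five retries each unicast probe to a dead address can cost up to five broadcasts.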

  Smarter rate limiting sounds like a win.

  DS