RE: Gb ethernet interface keeping dropping packet in ingress

Now, we do try to monitor some things like that. We have several crons
running that check the number of entries in the ARP tables of our CPE
devices at customer locations, as well as several crons dedicated to
specific tell-tale signs of various worms and virii.
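In rough terms, one of those crons boils down to something like this Python sketch (untested; hostnames, community string, and threshold are placeholders, and it assumes net-snmp's snmpwalk is on the box):

    #!/usr/bin/env python3
    # Sketch only: poll each CPE's ARP table size over SNMP and flag
    # devices whose entry count looks wrong. Hostnames, community string,
    # and threshold are placeholders; assumes net-snmp's snmpwalk binary.
    import subprocess

    ARP_TABLE_OID = "1.3.6.1.2.1.4.22.1.2"  # ipNetToMediaPhysAddress
    COMMUNITY = "public"                    # placeholder read-only community
    THRESHOLD = 50                          # placeholder "sane" ceiling per CPE

    def arp_entry_count(host):
        # snmpwalk prints one line per ARP entry on the device
        out = subprocess.run(
            ["snmpwalk", "-v2c", "-c", COMMUNITY, host, ARP_TABLE_OID],
            capture_output=True, text=True, check=True).stdout
        return len(out.splitlines())

    for cpe in ("cpe1.example.net", "cpe2.example.net"):  # placeholder list
        n = arp_entry_count(cpe)
        if n > THRESHOLD:
            print(f"{cpe}: {n} ARP entries -- possible scanner behind it")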

One that helped out a lot recently was the Nachi/Welchia search. It
caught 40% of our subscribers that were infected, and helped stop all
but 3 specific broadcast storms on our network. All the cron did was
look for the specific ICMP packet that the virus put out and flag the
connection in a list that is emailed to the NOC staff.
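The detection itself is simple. Nachi's probes were echo requests padded with 64 bytes of 0xAA, which is easy to match; here is a Python sketch of the idea (assuming scapy is available; the sampling window and reporting are placeholders):

    #!/usr/bin/env python3
    # Sketch only, assuming scapy is installed. Nachi/Welchia's scanner
    # sent ICMP echo requests padded with 64 bytes of 0xAA, which makes a
    # cheap signature; this samples for a minute and prints the offenders.
    from scapy.all import sniff, IP, ICMP

    suspects = set()

    def check(pkt):
        if pkt.haslayer(ICMP) and pkt[ICMP].type == 8:  # echo request
            if bytes(pkt[ICMP].payload) == b"\xaa" * 64:
                suspects.add(pkt[IP].src)

    # one cron run: sample 60 seconds, then report
    sniff(filter="icmp", prn=check, timeout=60, store=False)
    for src in sorted(suspects):
        print(src)  # in practice, append to the list mailed to the NOC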

Joe Johnson

From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Jeff Kell
Sent: Monday, September 13, 2004 10:46 PM
To: Joe Shen; nanog@merit.edu
Subject: Re: Gb ethernet interface keeping dropping packet in ingress

If you're sniffing one gigabit port from a switch with much higher
bandwidth, you're going to lose something. Our primary sensor sits on
an aggregation switch just prior to hitting the net, and with a 2Gb
fast etherchannel span port defined we see relatively little packet
loss. Of course, the more aggregate traffic you have, the higher the
probability you will max out the span port and its buffers.

Unless you're just drilling the heck out of the server farm(s) on that
switch, you won't lose all that much with an etherchannel of 2 Gig
ports. We have 2Gb etherchannel uplinks back to the core, and the most
the switch could throw at us would be 2Gb of etherchannel traffic. So
we are spanning the uplinks there.
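For the record, the span itself is nothing exotic. On an IOS-based Catalyst it looks roughly like this (interface names invented here, and SPAN syntax varies by platform and software version, so treat it as a sketch):

    ! Illustrative only: mirror the 2Gb etherchannel uplink to the sensor
    monitor session 1 source interface Port-channel1 both
    monitor session 1 destination interface GigabitEthernet2/1

The "both" keyword copies traffic in each direction, which is also what doubles your exposure to oversubscription on the destination port.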

Just as your switches/routers can be "oversubscribed," the 4506
backplane is only 6Gb/slot. We don't lose that much, and some of that
loss is due to buffer constraints on the switch. Not perfect, but it
works. In less critical environments, we can sniff with a 100Mb
interface and still do well.

The only caution here is that you can seldom catch local traffic. If
there's a local scanner (like Blaster started out to be) it doesn't
show up except for excessive ARPs. We have some cron'ed scripts that
periodically (1) look at connection counts in the PIX and, if they're
out of "range," quarantine the station to the Perfigo dungeon, and (2)
count ARP requests (just the dorms specifically right now); for every
1000 requests the script forks itself to start anew and analyzes the
number of ARPs per station. Local scanners get eaten up here really
quickly and they are also quarantined.
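As a Python sketch of the ARP-counting piece (assuming scapy; the per-station threshold is a placeholder, and it loops in place rather than forking the way ours does):

    #!/usr/bin/env python3
    # Sketch only, assuming scapy. Tally ARP requests per sender and,
    # after every 1000 ARP packets, report the heavy hitters. The batch
    # size matches the description above; the per-station threshold is
    # a placeholder.
    from collections import Counter
    from scapy.all import sniff, ARP

    BATCH = 1000   # re-analyze after every 1000 ARP packets
    HEAVY = 100    # placeholder: flag stations asking this often per batch

    counts = Counter()

    def tally(pkt):
        if pkt.haslayer(ARP) and pkt[ARP].op == 1:  # who-has (request)
            counts[pkt[ARP].psrc] += 1

    while True:
        counts.clear()
        sniff(filter="arp", prn=tally, count=BATCH, store=False)
        for station, n in counts.most_common():
            if n >= HEAVY:
                print(f"{station}: {n} ARP requests -- quarantine candidate")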

Not sure how this fits into NANOG; this is more of a local
ISP/University setting. I don't know that an ISP can do that much;
they're too busy keeping the packets flowing, and in the usual case
they can only be minimally intrusive on your traffic without special
arrangements. For special cases like Slammer, Blaster, and the initial
Bagle/MyDoom mix, some may have initiated ingress/egress filters
temporarily.

You should be able to handle an OC-12 with a gig interface or two on
the sensor (OC-12 is roughly 622Mb/s, so even a single gig port has
headroom). I wouldn't make any claims for an OC-48 or above.

Joe Johnson wrote:

Our list of crons is growing too...

One that helped out a lot recently was the Nachi/Welchia search. It
caught 40% of our subscribers that were infected, and helped stop all
but 3 specific broadcast storms on our network. All the cron did was
look for the specific ICMP packet that the virus put out and flag the
connection in a list that is emailed to the NOC staff.

We do the Nachi/Blaster 445 and ICMP pings with a route policy map on our core so as not to disturb the PIX with senseless traffic. We do at least catch the random Nachi probes (which are local); it didn't work so well with machines destined off the local subnet.
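The policy-routing piece, roughly, in IOS terms (ACL number, interface, and the Null0 choice are illustrative, not our exact config):

    ! Illustrative only: shunt worm-pattern traffic before it hits the PIX
    access-list 145 permit tcp any any eq 445
    access-list 145 permit icmp any any echo
    !
    route-map worm-trap permit 10
     match ip address 145
     set interface Null0
    !
    interface GigabitEthernet0/1
     ip policy route-map worm-trap

In practice you'd scope the ACL more tightly than "any any" unless you really mean to kill all 445 and echo traffic crossing that interface.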

We do extensive ingress/egress filtering at the border that keeps most junk from getting in or out. We're in the process of integrating this into our Perfigo system, but we've only had the Perfigo solution in place for a few months. It has helped by logically micro-managing each station onto its own logical /30 subnet, but virii/worms that don't care about gateways and so forth aren't really stopped if they catch a 0-day. Very few viruses do meaningful IP address guessing (they'll nail the local ranges first, but some go off-campus).

This is hopefully caught by a script under development that uses ipaudit (sourceforge.net) and keeps the top 10 traffic sources inbound/outbound, plus cumulative counts every 30 minutes of how many local hosts appear to be scanning, and likewise for the reverse.
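The scan-counting part reduces to a fan-out count per source, something like this Python sketch (the "src,dst" input format here is a stand-in, not ipaudit's real output, and both thresholds are invented):

    #!/usr/bin/env python3
    # Sketch only: counts distinct destinations per local source from
    # "src,dst" CSV records on stdin (a stand-in format -- check ipaudit's
    # actual output before trusting the parser) and prints the top 10,
    # flagging likely scanners.
    import csv
    import sys
    from collections import defaultdict

    LOCAL_PREFIX = "10."  # placeholder for our address space
    FANOUT = 200          # placeholder: this many peers looks like a scan

    peers = defaultdict(set)
    for src, dst in csv.reader(sys.stdin):
        if src.startswith(LOCAL_PREFIX):
            peers[src].add(dst)

    ranked = sorted(peers.items(), key=lambda kv: len(kv[1]), reverse=True)
    for host, dsts in ranked[:10]:  # top 10, as in the post
        flag = "  <-- scanning?" if len(dsts) > FANOUT else ""
        print(f"{host}: {len(dsts)} distinct destinations{flag}")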

We used to shut these ports down, but now we're having Perfigo lock them into a "quarantine" LAN where their situation is explained, with hooks to our SUS and antivirus tools (AdAware, Spybot) and contact numbers for the helpdesk if they need assistance.

So far, so good, but could be better.

Jeff