RE: What good is a noc team? How do you mitigate this? [was: How many backbones ...]


From: Gadi Evron [mailto:ge@linuxbox.org]
Sent: Thursday, December 02, 2004 3:21 PM
To: Chad Skidmore
Cc: Aaron Glenn; nanog@merit.edu
Subject: What good is a noc team? How do you mitigate this?
[was: How many backbones ...]

Okay, making this an operational issue. Say you are attacked.
Say it isn't even a botnet. Say a new worm is out and you are
getting traffic from 19 different class A's.

Who do you call? What do you block?

How can a noc team here help?

"Please block any outgoing connections from your network to
ours on port 25? Please?" I tried this once; it doesn't
help. I ended up blackholing an entire country just to
mitigate it a bit, for a few hours.

Any practical suggestions?

  Gadi.

Well, the easy answer is that it depends. Let's use SQL Slammer as
one example that might be comparable to the scenario you mention.
During Slammer some networks did stay up. We'd have to ask each one
of them what they did to know why, but I think I can guess at some
of the reasons. Shortly after Slammer there was a NANOG presentation
on it, and some discussion at the NSP-Sec BOF at that NANOG
regarding why some people survived and others didn't. What came out
of that was enlightening, if not obvious in hindsight.

1. Those providers that made use of contacts at other providers and
worked together, shared information, etc. were less affected than
those that did not.

2. Those providers that had various mechanisms in place for just such
an issue did better than those that did not. This included, but was
not limited to, darknet monitoring and quick reaction to darknet data
anomalies, automated and semi-automated sifting of NetFlow data,
pre-staged classification ACLs on at least key
backbone/peering/transit routers, and BGP-triggered (or otherwise
triggered) blackhole mechanisms.

3. Providers with dedicated incident response teams did better than
those without.

4. Those with grossly oversubscribed networks did worse than those
with sufficient bandwidth to handle the ebb and flow of traffic that
rides the Internet today. Good traffic engineering practices don't
mean that you have to purchase lots of excess bandwidth to make this
happen. Not being oversubscribed is also not just an issue of circuit
utilization. For example, make sure you have enough CPU on your
routers, line cards, and so on that you can turn on various features
to help track and mitigate an attack without making your routers
fall over.
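Point 4 is mostly arithmetic. Below is a back-of-the-envelope check in Python, a sketch rather than a capacity-planning tool; the function names and every number in the example are hypothetical, chosen only to illustrate the idea that headroom has to cover the attack *and* the cost of the features you turn on to fight it.

```python
def oversubscription_ratio(customer_ports_mbps: list[float], uplink_mbps: float) -> float:
    """Total capacity sold to customers divided by the uplink capacity behind it."""
    return sum(customer_ports_mbps) / uplink_mbps


def has_headroom(capacity: float, normal_peak: float, attack: float,
                 feature_cost: float = 0.0) -> bool:
    """True if normal peak load, attack load, and the extra load from enabling
    tracking/mitigation features all fit within capacity.

    All four arguments must use the same unit: Mb/s for a circuit, or a
    CPU (or packets-per-second) budget for a router or line card.
    """
    return normal_peak + attack + feature_cost <= capacity


# Hypothetical: 48 x 100 Mb/s customer ports behind a single 1000 Mb/s uplink.
ratio = oversubscription_ratio([100.0] * 48, 1000.0)
# Hypothetical circuit: 600 Mb/s normal peak plus a 300 Mb/s attack still fits.
link_ok = has_headroom(1000.0, 600.0, 300.0)
# Hypothetical CPU budget in percent: 40% normal, 25% from the attack,
# and 40% more to turn on extra accounting/filtering does not fit.
cpu_ok = has_headroom(100.0, 40.0, 25.0, feature_cost=40.0)
print(ratio, link_ok, cpu_ok)  # 4.8 True False
```

The second case is the one that bites people: the link survives, but the router falls over the moment you try to look at the traffic.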

So, armed with that data, you can reasonably expect the following.

With good darknet monitoring practices you would likely see a rapid
uptick in scanning, backscatter, etc. and could start investigating
the cause before the issue becomes service affecting. Maybe the
attack is so random that you don't see it on your darknet
monitoring, but you do see it in your packets-per-second (PPS) data
collection. In fact, more often than not we see indications of
miscreant activity in our PPS monitoring first.
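Spotting that "rapid uptick" can be automated with something as simple as comparing each new sample against a rolling baseline. A minimal Python sketch, assuming you already poll one number per interval (darknet hits or interface PPS); the class name, the 12-sample window, and the four-standard-deviation threshold are illustrative choices, not tuned values.

```python
from collections import deque
from statistics import mean, pstdev


class RateAnomalyDetector:
    """Flags samples that jump well above a rolling baseline.

    Works the same for darknet hit counts or PPS samples: feed it one
    number per polling interval. Window size and k are illustrative.
    """

    def __init__(self, window: int = 12, k: float = 4.0, min_floor: float = 1.0):
        self.history = deque(maxlen=window)  # last `window` non-anomalous samples
        self.k = k                           # std devs above the mean that count as anomalous
        self.min_floor = min_floor           # minimum spread, so a flat baseline isn't hair-trigger

    def observe(self, sample: float) -> bool:
        """Record a sample; return True if it looks anomalous vs. the baseline."""
        anomalous = False
        if len(self.history) == self.history.maxlen:  # only judge once the window is full
            mu = mean(self.history)
            sigma = max(pstdev(self.history), self.min_floor)
            anomalous = sample > mu + self.k * sigma
        if not anomalous:
            self.history.append(sample)  # keep attack traffic out of the baseline
        return anomalous


det = RateAnomalyDetector(window=5)
for pps in [100, 102, 98, 101, 99, 103, 500]:
    if det.observe(pps):
        print(f"anomaly: {pps} pps")  # fires only for 500
```

Leaving anomalous samples out of the baseline means a sustained attack keeps alarming instead of quietly becoming "normal"; the trade-off is that a legitimate, permanent jump in traffic also keeps alarming until someone resets the detector.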

The classification ACLs are a good way to turn the router into a poor
man's sniffer (assuming it isn't already so heavily loaded that it
falls over), so you can see what types of traffic you are dealing
with. Using MCI/UUNET's method you can track any spoofed traffic back
to where it enters your network pretty easily. I know that Chris and
company do it with amazing speed across AS 701. If it works for them,
it will likely work for the rest of you.
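A classification ACL is a list of permit-only entries, each matching one traffic type of interest; nothing is filtered, and the per-entry match counters tell you what the traffic mix looks like. The Python sketch below models only that first-match counting behavior, not any router's configuration syntax. The entry list is hypothetical; UDP port 1434 is in it because that is the port Slammer used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AclEntry:
    label: str
    proto: Optional[str] = None     # "tcp", "udp", "icmp", ...; None matches any protocol
    dst_port: Optional[int] = None  # None matches any destination port
    hits: int = 0                   # match counter, like the per-entry ACL counters

    def matches(self, proto: str, dst_port: Optional[int]) -> bool:
        if self.proto is not None and self.proto != proto:
            return False
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        return True


def classify(entries: list[AclEntry], packets) -> dict[str, int]:
    """Evaluate each (proto, dst_port) packet against the entries in order;
    the first match wins and only its counter increments. Every entry
    permits, so the counters are the only output."""
    for proto, dst_port in packets:
        for entry in entries:
            if entry.matches(proto, dst_port):
                entry.hits += 1
                break
    return {e.label: e.hits for e in entries}


def sample_entries() -> list[AclEntry]:
    """Hypothetical classification list; order matters because first match wins."""
    return [
        AclEntry("slammer", "udp", 1434),
        AclEntry("smtp", "tcp", 25),
        AclEntry("icmp", "icmp"),
        AclEntry("other-tcp", "tcp"),
        AclEntry("other-udp", "udp"),
        AclEntry("everything-else"),  # catch-all, like a final "permit ip any any"
    ]


print(classify(sample_entries(), [("udp", 1434), ("udp", 1434), ("tcp", 25), ("gre", None)]))
```

The catch-all entry at the end matters: without it, traffic you didn't anticipate would simply go uncounted.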

NetFlow data will likely lead you to the sources causing the most
pain, so you can go after those first. Fighting an attack isn't always
about making the attack go away. Often the key to not getting killed
is to find the "big guns" and get them silenced first. Sure, you're
still getting shot, but it isn't going to kill you, and you can take
some additional time to find the smaller guns. If the bulk of the
attack is coming from a few sources, get their security teams to deal
with it and take the pain away from you.
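Finding the "big guns" is a top-N query over flow data. A minimal Python sketch, assuming IPv4 flow records have already been exported and parsed into (source address, packet count) pairs; real NetFlow records carry many more fields, and collecting and decoding them is out of scope here. The `prefixlen` parameter (a hypothetical knob) rolls sources up to a covering prefix, e.g. /8 for the "class A" view in the original question. Addresses are from the documentation ranges.

```python
import ipaddress
from collections import Counter


def top_sources(flows, n=5, prefixlen=32):
    """Sum packets per IPv4 source (or per covering prefix) and return the
    n largest as (prefix, packets) pairs, biggest first."""
    totals = Counter()
    for src, packets in flows:
        # strict=False masks off host bits, so 192.0.2.10/24 becomes 192.0.2.0/24.
        prefix = ipaddress.ip_network(f"{src}/{prefixlen}", strict=False)
        totals[str(prefix)] += packets
    return totals.most_common(n)


flows = [
    ("192.0.2.10", 90000),
    ("198.51.100.7", 50000),
    ("192.0.2.10", 30000),
    ("203.0.113.5", 200),
    ("198.51.100.9", 100),
]
print(top_sources(flows, n=2))                # per host
print(top_sources(flows, n=1, prefixlen=24))  # per /24
```

Ranking by packets rather than bytes is deliberate here: small-packet floods like Slammer hurt routers through PPS long before they fill circuits.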

Armed with the data you glean from this approach, you will usually be
able to get a positive response from your upstreams or peers. If not,
make a quick note to yourself that you need to replace them once the
attack is over and done with. If all else fails, blackhole the host
under attack at your borders or, even better, on your upstream's
network via a BGP-triggered blackhole (if they don't support it, make
a note to replace them with someone who does when the attack is over).
You might sacrifice that host, but you'll save the rest of your
network and likely buy yourself some more time to trace back to the
source and kill it.
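A triggered blackhole works because of longest-prefix match. In the usual setup, every edge router has a static route sending one reserved next-hop address to Null0; a trigger router then announces the victim's /32 over BGP with that next hop, and the /32 beats the customer's aggregate everywhere it is accepted. The Python sketch below models only that forwarding effect, not BGP itself; the route table, the "core-1" next hop, and the function names are hypothetical.

```python
import ipaddress

DISCARD = "discard"  # stands in for a next hop that is statically routed to Null0


def lookup(rib, dst):
    """Longest-prefix match: among routes containing dst, the most specific wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, next_hop) for net, next_hop in rib.items() if addr in net]
    if not matches:
        return None  # no route at all
    return max(matches, key=lambda route: route[0].prefixlen)[1]


def trigger_blackhole(rib, victim):
    """Install a /32 for the victim pointing at the discard next hop, i.e. the
    effect the triggered route has on each edge router that accepts it."""
    rib[ipaddress.ip_network(f"{victim}/32")] = DISCARD


rib = {ipaddress.ip_network("203.0.113.0/24"): "core-1"}  # hypothetical customer aggregate
print(lookup(rib, "203.0.113.25"))  # core-1
trigger_blackhole(rib, "203.0.113.25")
print(lookup(rib, "203.0.113.25"))  # discard: the victim is sacrificed
print(lookup(rib, "203.0.113.26"))  # core-1: the rest of the /24 is untouched
```

Doing the same thing on your upstream's network is the better option because the attack traffic is then dropped before it ever reaches your transit links.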

I'm certainly not suggesting I have all the answers or that I have it
all figured out. I also realize that the world is not a rosy place
where inter-provider communication is perfect and I always get the
answers I need when I call. I'm just tired of seeing people play the
victim, complaining that the "Big Providers" won't protect them,
without looking in the mirror and deciding that today is the day they
take their network back and take care of themselves. Gadi and Aaron,
this isn't directed specifically at you, so please don't take it as a
personal flame.

I've personally had great luck getting quick reactions from a number
of providers during an ongoing attack. That certainly isn't always the
case, but more often than not it is. Some that have done a great job
in the past are ASes 701, 2914, 3549, 1239, and 3356. That isn't all of
them, just the ones that come to mind the quickest.

Chad

----------------------------
Chad E Skidmore
One Eighty Networks, Inc.
http://www.go180.net
509-688-8180