Nachi/Welchia Aftermath

Well folks, since the middle of August I've been tracking the spread of the Nachi/Welchia infection that took down so many networks, and our community's subsequent efforts to stop it.

Sadly, by my estimate, only about 20-30% of infected hosts were cleaned. After Jan 1, 2004 it appears that the thousands (millions?) of remaining infected hosts were rebooted and the worm removed itself. Network traffic has finally returned to normal.

What kind of effects did everyone see from this devastating worm and what lessons did we learn for preventing network downtime in the future?

Flow-based "attacks" can kill flow-based routers.

James Edwards
Routing and Security Administrator
jamesh@cybermesa.com

: What kind of effects did everyone see from this devastating worm and what
: lessons did we learn for preventing network downtime in the future?

Proper network design is a good thing... ;-)

scott

: : What kind of effects did everyone see from this devastating worm and what
: : lessons did we learn for preventing network downtime in the future?
:
:
: Proper network design is a good thing... ;-)

Before I get flamed, I should say that applies to end-user networks, not
the BANs (Big A$$ Networks) that're the norm on this list. I run a .edu
eyeball network nowadays, and I don't have to deliver everything to the
end user, nor send out everything from the end user, so I can block
whatever I need as long as my customers are happy. 99.9% of the time,
they don't know I'm blocking anything...

scott

lesson learned:
  stop using /makeshift/ layer3 switches (without naming vendor) to run
  L3 core

-J

Not all L3-switches are flow-based; prefix-based ones should do just fine.
Can people add to or correct this initial list?

Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A)
Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with
Sup2(A), Sup3(A/BXL)

Rubens

yes, I concur.. prefix-based ones (like FIB) are fine.

  unfortunately some models from some vendors (tsk tsk) that use the
  slow process path to reprogram the CAM per flow can be quite painful
  in situations like random-destination DoS attacks and worms..

  add the E vendor to your list too.. we had a Summit48i that loved the
  worm traffic

-J
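
A rough way to see why per-flow CAM programming hurts under random-destination traffic is to count how many packets would need a slow-path (CPU) lookup. The sketch below is only a toy model under stated assumptions: the prefix table, the traffic mix and the names `FIB`, `prefix_lookup` and `count_punts` are all made up for illustration and do not describe any vendor's implementation.

```python
import ipaddress
import random

# Hypothetical prefix table: a handful of routes, installed ahead of traffic.
FIB = [ipaddress.ip_network(p) for p in ("10.0.0.0/8", "192.168.0.0/16", "0.0.0.0/0")]


def prefix_lookup(dst):
    """Longest-prefix match against the pre-populated table (no per-flow state)."""
    return max((n for n in FIB if dst in n), key=lambda n: n.prefixlen)


def count_punts(destinations):
    """Count packets a destination-cache design would send to the slow path."""
    cache = {}
    punts = 0
    for dst in destinations:
        if dst not in cache:
            punts += 1                       # first packet to this destination: CPU lookup
            cache[dst] = prefix_lookup(dst)  # CPU installs an entry for later packets
    return punts


rng = random.Random(2004)  # fixed seed so the run is repeatable

# "Normal" traffic: many packets spread over a small set of popular destinations.
popular = [ipaddress.ip_address(rng.getrandbits(32)) for _ in range(50)]
normal = [rng.choice(popular) for _ in range(10_000)]

# Worm-like traffic: every packet aimed at a fresh random destination.
scan = [ipaddress.ip_address(rng.getrandbits(32)) for _ in range(10_000)]

normal_punts = count_punts(normal)
scan_punts = count_punts(scan)
print("normal traffic punts:", normal_punts)
print("scan traffic punts:  ", scan_punts)
```

With the cached design, normal traffic punts at most once per popular destination, while the scan punts on nearly every packet. A prefix-based box does the same `prefix_lookup` work in hardware for both traffic patterns, so the scan costs it nothing extra in this model.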

Where do the Extreme and Juniper fit into this?

> > Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A)
> > Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with
> > Sup2(A), Sup3(A/BXL)
>
> Where do the Extreme and Juniper fit into this?

Private and public answers to my question indicate that both the Summit 48i
and the Black Diamond from Extreme are flow-based. Juniper doesn't make
layer 3 switches, but their routers do prefix-based forwarding; Cisco routers
also do prefix-based forwarding in typical configurations.

Also of note: flow-based forwarding is not the only thing that makes an L3
device suffer under worm attacks. If a directly connected interface is an
Ethernet (or any other medium that is not point-to-point), ARPing for a lot
of new addresses per second can also do harm.

Rubens
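
The ARP point can be sketched the same way. The toy model below assumes a hypothetical connected /24 (`CONNECTED`) in which only 20 addresses answer ARP (`LIVE_HOSTS`), and it assumes the router sends a new ARP request for every packet whose next hop is still unresolved. Real routers usually rate-limit or hold down incomplete entries, so treat the numbers as an illustration only.

```python
import ipaddress
import random

CONNECTED = ipaddress.ip_network("192.0.2.0/24")   # hypothetical directly connected LAN
LIVE_HOSTS = {CONNECTED[i] for i in range(1, 21)}  # assume only 20 addresses answer ARP


def arp_requests(destinations):
    """Count ARP requests a router would send while forwarding these packets."""
    resolved = set()
    requests = 0
    for dst in destinations:
        if dst not in CONNECTED or dst in resolved:
            continue            # not on this LAN, or next hop already known
        requests += 1           # router must ARP before it can forward
        if dst in LIVE_HOSTS:
            resolved.add(dst)   # a reply arrives; later packets need no ARP
        # no reply: the entry stays unresolved, so the next packet ARPs again
    return requests


rng = random.Random(1)
live = sorted(LIVE_HOSTS)

# Normal traffic: 1000 packets, all to hosts that actually exist.
normal = [rng.choice(live) for _ in range(1000)]

# Sweep: four passes over every host address in the subnet, live or not.
sweep = [CONNECTED[i] for i in range(1, 255)] * 4

normal_arps = arp_requests(normal)
sweep_arps = arp_requests(sweep)
print("ARP requests, normal traffic:", normal_arps)
print("ARP requests, subnet sweep:  ", sweep_arps)
```

In this model the normal traffic needs at most one ARP request per live host, while the sweep keeps generating requests for addresses that never answer, which is the kind of CPU-bound work described above.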

> Also of note: flow-based forwarding is not the only thing that makes an L3
> device suffer under worm attacks. If a directly connected interface is an
> Ethernet (or any other medium that is not point-to-point), ARPing for a lot
> of new addresses per second can also do harm.

Nearly. Any frame that has to go to the CPU will harm your box. These tend to
be L2 occurrences (ARP storms are one example), which means connected Ethernets.
DoSing a router at L3 (e.g. smurf) will usually hurt too, as will, if you can
manage it, higher-level attacks (announcing/withdrawing thousands of BGP routes,
filling up NAT tables). Of course architectures differ, so YMMV.

Steve

Don't confuse "flow-based" with "slow-path initial lookup"; they aren't
the same.

I believe the 2948G-L3 and the 4908G-L3 are prefix/ASIC-based.
I believe the 3550-EMI is as well, but I'm not familiar with that
equipment.

> > lesson learned:
> > stop using /makeshift/ layer3 switches (without naming vendor) to run
> > L3 core

more generally... "if you want routing, buy a router."

i have a hybrid switer that i'm very happy with. at my house, that is.
(the idea of using one in commerce or production gives me cold shivers.)

Juniper does not make L3 switches, so they don't really compare.

Extreme's i-platform is currently destination-IP based with an initial
cache lookup. (I guess this counts as flow-based.)

> Where do the Extreme and Juniper fit into this?

> Juniper does not make L3 switches, so they don't really compare.

Others have said that too, but given where Junipers are used, I think they
sneak into the same category as the Cisco 6500/7600s and other high-end L3
switches.

> Extreme's i-platform is currently destination-IP based with an initial
> cache lookup. (I guess this counts as flow-based.)

I guess I just don't understand the architecture. What I really don't
understand is _why_ you'd bother with flow-based architecture over
prefix-based architecture..... am I looking green yet?

(If this isn't appropriate on-list, then feel free to reply off-list.)

Cheap + legacy.
Some gear doesn't want to die :)

Since these boxes are priced around $3000-$4000 or so and have multiple
gig ports and loads of 10/100 ports, they make a nice edge/distribution
box.

Extreme's i-chipset gear talks IS-IS, OSPF and BGP just fine, has 128
megs of memory, and does L2/L3 at wirespeed (once the flow is set up).
The L2 side is interesting because you can use the box for basically
everything: not L2 or L3, but both in the same box and on the same links.

Cisco Catalyst 4500 with Sup3/4 is also prefix based.

> more generally... "if you want routing, buy a router."

  amen.
  imho there can't be better routing equipment than a real router :)

-J

Unfortunately, that's not true. A router is anything which makes decisions by
performing a longest prefix match lookup against a layer 3 header, period.
That "I route with a router and switch with a switch" nonsense is tired,
usually covers for a lack of understanding of the issues involved, and
prevents you from reaching the correct conclusion which is "I route with
the device which is most appropriate for the task".
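
For readers who want that definition made concrete, here is a minimal longest-prefix-match sketch using Python's standard `ipaddress` module. The routing table and next-hop labels are hypothetical, and a linear scan like this only illustrates the decision rule; real forwarding planes use tries, TCAM or similar structures.

```python
import ipaddress

# Hypothetical routing table: prefix -> next-hop label (all values are made up).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("198.51.100.0/24"): "customer-a",
    ipaddress.ip_network("198.51.100.128/25"): "customer-b",
}


def longest_prefix_match(dst):
    """Return (prefix, next hop) for the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None  # no covering route (cannot happen here: there is a default)
    best = max(matches, key=lambda net: net.prefixlen)
    return best, ROUTES[best]


for dst in ("198.51.100.10", "198.51.100.200", "203.0.113.7"):
    print(dst, "->", longest_prefix_match(dst))
```

The second address is covered by all three routes, and the /25 wins because it is the most specific. Any box that makes this decision on the layer 3 header is routing by the definition above, whatever the label on the chassis says.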

There are some good routers, there are some bad routers, there are some
TERRIBLE routers, there are even some routers which are good at some
things and bad at others, but a router does not have to be a
switch-turned-router to suck (at a specific task) any more than a
switch-turned-router has to suck.

For example, would you rather have the reassuring consistency of a 7206VXR
which tops out at 300Mbps come rain or shine, or might you prefer to use a
Foundry BigIron which routes a couple gigabits under normal friendly
non-stressful conditions and sits at 1% CPU? Of course, depending on the
type of traffic and if you are from an older school of thinking your
answer might very well be "I'd take the VXR", but the reality is that
there is a lot more bandwidth out there than there used to be, and 300Mbps
might just be an insignificant amount of traffic that is coming off 1
server for some people.

Understanding the design limitations of ANY device, be it a software
router, an ASIC-based router with a prepopulated FIB, an ASIC-based router
with a CPU first lookup, a "hack on an Ethernet CAM" router, or two people
with tin cans and a string yelling at each other in binary, is the first
step to using it effectively. Understanding that the limitations of a
"layer 3 switch" may make it ENTIRELY inappropriate for core routing work
is a good beginning, understanding that a Juniper T640 may be entirely
inappropriate for edge work or datacenter ethernet aggregation is a good
middle ground, and understanding where and with what steps a "layer 3
switch" CAN be used effectively is even better still. Anyone who doesn't
understand this is probably working for a bankrupt or soon to be bankrupt
company.