Level3 routing issues?

James_Cowie · January 28, 2003, 3:34pm

> So far it's been visible as an apparently accidental byproduct of an
> attack with other goals. Are you willing to bet your bifocals that the
> same mechanism can't be weaponized and used against the routing
> infrastructure directly in the future?

Yet the question becomes the reasoning behind it. How much is a direct
result of the worm and how much is a result of actions taken by the NEs?

Good question. Null routing of traffic destined to a network with a BGP
interface on it will cause the session to drop. That is a BGP effect due
to engineers' actions, indirectly triggered by the worm.
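
To make that mechanism concrete, here is a minimal Python sketch (not from the thread; the function name and the documentation-range addresses are made up for illustration). It checks whether a prefix about to be null routed covers any address that a BGP session runs over. If it does, traffic to that address is discarded along with the attack traffic, and the session can be expected to time out.

```python
import ipaddress

def peers_hit_by_null_route(null_prefix, bgp_peer_addrs):
    """Return the BGP session addresses that fall inside a null-routed prefix."""
    net = ipaddress.ip_network(null_prefix)
    return [p for p in bgp_peer_addrs if ipaddress.ip_address(p) in net]

# Hypothetical session addresses, taken from documentation ranges.
peers = ["192.0.2.1", "198.51.100.9", "203.0.113.5"]

# Null routing 192.0.2.0/24 would also black-hole the session on 192.0.2.1.
print(peers_hit_by_null_route("192.0.2.0/24", peers))
```

Running a check like this before installing the null route is one way to tell apart the sessions an engineer is about to drop from the ones the worm traffic took down on its own.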

On the other hand, we also know (from private communications and from
other mailing lists.. ahem) that high rate and high src/dst diversity
of scans causes some network devices to fail (devices that cache flows, or
devices that suffer from cpu overload under such conditions).

Some BGP-speaking routers (not all, by any means, but some subpopulation)
found themselves pegged at 100% CPU on Saturday. Just one example:

http://noc.ilan.net.il/stats/ILAN-CPU/new-gp-cpu.html

Whether you believe "anthropogenic" explanations for the instability
depends on how fast you believe NEs can look, think, and type, compared
to the speed with which the BGP announcement and withdrawal rates are
observed to take off. For my part, I'd bet that the long slow exponential
decay (with superimposed spiky noise) is people at work. But the initial
blast is not.

Jack_Bates · January 28, 2003, 3:47pm

<snip>

On the other hand, we also know (from private communications and from
other mailing lists.. ahem) that high rate and high src/dst diversity
of scans causes some network devices to fail (devices that cache flows, or
devices that suffer from cpu overload under such conditions).

Some BGP-speaking routers (not all, by any means, but some subpopulation)
found themselves pegged at 100% CPU on Saturday. Just one example:

GP1 CPU LOAD (new-gp.ilan.net.il)

Was it not known that under certain conditions the router would flatline?
What precautionary measures were put into place in such an event to limit
the damage?

Whether you believe "anthropogenic" explanations for the instability
depends on how fast you believe NEs can look, think, and type, compared
to the speed with which the BGP announcement and withdrawal rates are
observed to take off. For my part, I'd bet that the long slow exponential
decay (with superimposed spiky noise) is people at work. But the initial
blast is not.

When the crisis is on you, it's too late. You are either prepared and know
exactly what to do at that critical moment or you don't. You either had a <5
minute response time to the crisis or you didn't. We also know (from private
communications and from other mailing lists.. yes, I'm a thief) that many
NEs were caught with their pants down, a mistake they aren't apt to make
again. It comes down to one's outlook. Do you just configure and maintain or
do you strive to push the envelope? Do you truly know your network?
Remember, it's a living, breathing thing. The complexity of variables makes
complete predictability impossible, and so we must learn to understand it
and how it reacts.

Then again, perhaps I'm a lunatic.

Jack Bates
BrightNet Oklahoma

Hank_Nussbacher · January 28, 2003, 4:17pm

From: <cowie@renesys.com>

<snip>
> On the other hand, we also know (from private communications and from
> other mailing lists.. ahem) that high rate and high src/dst diversity
> of scans causes some network devices to fail (devices that cache flows, or
> devices that suffer from cpu overload under such conditions).
>
> Some BGP-speaking routers (not all, by any means, but some subpopulation)
> found themselves pegged at 100% CPU on Saturday. Just one example:
>
> GP1 CPU LOAD (new-gp.ilan.net.il)
>
Was it not known that under certain conditions the router would flatline?

Yes. And so does Cisco.

What precautionary measures were put into place in such an event to limit
the damage?

A very reactive NOC. -Hank

Jared_Mauch · January 28, 2003, 4:28pm

I wonder how much of this was because of packets
destined *TO* the router. I don't know about you but I'm not
about to go put access-lists on all 600+ interfaces in some of
my routers. My push is for Cisco (and I'm sure others agree, as
well as the other vendors who don't have a similar feature today)
to port their "ip receive acl" to other important platforms. The
GSR is not the only router that needs to be protected on the Internet,
and they seem to be missing that bit of direction.

http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a00800a8531.html

Not putting this feature in the next releases of software
would be irresponsible on their part after the critical nature
of this attack, IMHO.
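
For readers unfamiliar with the idea, the following is a rough Python model of what a receive-path filter does, as opposed to per-interface access-lists. It is only a conceptual sketch, not Cisco's implementation: the addresses, the permit rules, and the names `ROUTER_ADDRS`, `PERMIT`, and `receive_path_verdict` are all assumptions for illustration, and matching on destination port alone is a simplification.

```python
import ipaddress
from typing import NamedTuple

class Packet(NamedTuple):
    src: str
    dst: str
    proto: str
    dport: int

# Hypothetical addresses owned by the router itself.
ROUTER_ADDRS = {"192.0.2.1", "192.0.2.129"}

# Hypothetical permit rules: (source network, protocol, destination port).
PERMIT = [
    (ipaddress.ip_network("198.51.100.0/24"), "tcp", 179),  # BGP peers
    (ipaddress.ip_network("203.0.113.0/28"), "tcp", 22),    # management hosts
]

def receive_path_verdict(pkt):
    """Classify a packet: 'transit', or 'permit'/'deny' if aimed at the router."""
    # Traffic merely passing through is never checked against this list.
    if pkt.dst not in ROUTER_ADDRS:
        return "transit"
    src = ipaddress.ip_address(pkt.src)
    for net, proto, dport in PERMIT:
        if src in net and pkt.proto == proto and pkt.dport == dport:
            return "permit"
    # Anything else destined TO the router is dropped before it costs CPU.
    return "deny"

print(receive_path_verdict(Packet("198.51.100.7", "192.0.2.1", "tcp", 179)))
print(receive_path_verdict(Packet("10.1.2.3", "192.0.2.1", "udp", 1434)))
```

The point of the design is that one list, keyed on "is this packet for me?", replaces a separate list on each of the 600+ interfaces.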

- jared

Haesu · January 28, 2003, 5:31pm

> GP1 CPU LOAD (new-gp.ilan.net.il)
>
> Was it not known that under certain conditions the router would flatline?
> What precautionary measures were put into place in such an event to limit
> the damage?

scheduler allocate

-hc