DOS?

Greetings,

It looks like all hell is breaking loose on some of the nation's
backbones. http://www.internethealthreport.com

The port counters on my AT&T DS3 were reading in the 250 megabit range,
that is a DS3, mind you.

Any source IP's I can add to the circular file would be appreciated.
Any ranges I find I'll echo back to the list.

Regards,
Christopher J. Wolff, VP CIO
Broadband Laboratories, Inc.
http://www.bblabs.com

It's an MS SQL worm that is sending and receiving UDP on 1434.
http://www.nextgenss.com/advisories/mssql-udp.txt appears to be relevant.

Anyone want to get involved in some sort of real-time chat (like IRC) to
discuss strategies? We're seeing some pretty big traffic, and related
problems, in multiple colos worldwide.

Doug

You need a filter similar to this (in junos format):

show configuration firewall filter filter-012503

term deny-dos {
    from {
        packet-length 404;
        protocol udp;
        destination-port 1434;
    }
    then {
        count codered-4;
        discard;
    }
}
term allow-rest {
    then accept;
}

--Phil
ISPrime

> Greetings,
>
> It looks like all hell is breaking loose on some of the nation's
> backbones. http://www.internethealthreport.com
>
> The port counters on my AT&T DS3 were reading in the 250 megabit range,
> that is a DS3, mind you.

Outbound? (can't imagine inbound counters breaking that badly)

> Any source IP's I can add to the circular file would be appreciated.
> Any ranges I find I'll echo back to the list.

Forget IPs. Just block port 1434 protocol UDP in *and* out.

Hi,

> It looks like all hell is breaking loose on some of the nation's
> backbones. http://www.internethealthreport.com

You are not the only one. I've been sitting here since 06:30 now. So far
I have discovered that a lot of Windows boxes send out UDP packets of 376
bytes to random addresses.

09:36:51.711380 802.1Q vlan#50 P0 213.136.0.251.3303 > 239.103.224.157.1434: udp 376 [ttl 1] (id 10818, len 404)
0x0000 0032 0800 4500 0194 2a42 0000 0111 e78e  .2..E...*B......
0x0010 d588 00fb ef67 e09d 0ce7 059a 0180 81db  .....g..........
0x0020 0401 0101 0101 0101 0101 0101 0101 0101  ................
0x0030 0101 0101 0101 0101 0101 0101 0101 0101  ................
0x0040 0101 0101 0101 0101 0101 0101 0101 0101  ................
0x0050 0101 ..
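
For anyone who wants to check the numbers in the capture above, here is a minimal Python sketch that decodes the IPv4 and UDP headers from the dumped bytes. It assumes the leading "0032 0800" in the dump is the tail of the 802.1Q tag (VLAN 50, EtherType IPv4), so the IP header starts at "4500". The names HEADER_HEX and decode are made up for this illustration.

```python
import socket
import struct

# Header bytes copied from the dump, starting at the IPv4 header.
# Assumption: the preceding "0032 0800" is the end of the 802.1Q tag.
HEADER_HEX = (
    "4500 0194 2a42 0000 0111 e78e "  # IPv4: version/IHL through checksum
    "d588 00fb ef67 e09d "            # source and destination address
    "0ce7 059a 0180 81db"             # UDP: ports, length, checksum
)


def decode(raw: bytes) -> dict:
    """Decode the fixed IPv4 header and the UDP header that follows it."""
    (ver_ihl, _tos, total_len, ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    sport, dport, udp_len, _ucsum = struct.unpack("!HHHH", raw[ihl:ihl + 8])
    return {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "ttl": ttl,
        "proto": proto,
        "ip_id": ident,
        "ip_total_len": total_len,
        "sport": sport,
        "dport": dport,
        "payload_len": udp_len - 8,  # UDP length includes its 8-byte header
    }


if __name__ == "__main__":
    for key, value in decode(bytes.fromhex(HEADER_HEX)).items():
        print(f"{key:>13}: {value}")
```

The decoded fields line up with the tcpdump summary line: 376 bytes of UDP payload plus 8 bytes of UDP header plus 20 bytes of IP header gives the 404-byte IP length that the packet-length filter matches on.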

Someone already posted this, but it's some crazy wormy thingy on port 1434
UDP.

What's to discuss? If you put something like

access-list 150 deny udp any any eq 1434 log-input
access-list 150 permit ip any any

on all your customer-facing ports you get to

1. filter out the disruptive traffic
2. see which customer systems are infected

This works well even on relatively underpowered Cisco 7200 boxes.
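
To get from the ACL log to a list of infected customers, something like the following Python sketch could tally denied packets per source. The sample lines are hypothetical and only approximate the general shape of IOS access-list log messages (with the interface and MAC that log-input adds); the addresses are documentation ranges, and LINE_RE and infected_sources are names invented here. Adjust the pattern to whatever your syslog actually contains.

```python
import re
from collections import Counter

# Hypothetical sample lines, roughly in the shape of IOS ACL log messages.
SAMPLE_LOG = """\
%SEC-6-IPACCESSLOGP: list 150 denied udp 192.0.2.10(3303) (FastEthernet0/1 0000.0c12.3456) -> 198.51.100.7(1434), 1 packet
%SEC-6-IPACCESSLOGP: list 150 denied udp 192.0.2.10(3304) (FastEthernet0/1 0000.0c12.3456) -> 203.0.113.9(1434), 1 packet
%SEC-6-IPACCESSLOGP: list 150 denied udp 192.0.2.77(1045) (FastEthernet0/2 0000.0c65.4321) -> 198.51.100.200(1434), 12 packets
"""

LINE_RE = re.compile(
    r"list (?P<acl>\S+) denied udp "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\(\d+\)"
    r"(?: \((?P<ifname>\S+) \S+\))?"  # interface and MAC, only with log-input
    r" -> \d+\.\d+\.\d+\.\d+\((?P<dport>\d+)\), (?P<count>\d+) packet"
)


def infected_sources(log_text: str, port: int = 1434) -> Counter:
    """Sum denied packet counts per (source IP, input interface)."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and int(match.group("dport")) == port:
            key = (match.group("src"), match.group("ifname"))
            hits[key] += int(match.group("count"))
    return hits


if __name__ == "__main__":
    for (src, ifname), packets in infected_sources(SAMPLE_LOG).most_common():
        print(f"{src:<15} {ifname or '-':<18} {packets} packets logged")
```

Keep in mind that the logged counts are a sample, not a full packet count, so this only tells you who is infected, not how much they are sending.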

Hi

> Any ranges I find I'll echo back to the list.

Not sure if you've received any NANOG mail yet. Don't worry about source
IPs, unless you're going to deny '0.0.0.0'.

block anything with a destination of udp 1434, find hosts pushing extreme
amounts of traffic, get them patched
(http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/MS02-039.asp)
and then wait for the rest of the internet to catch up...

--Rob

Hi, NANOGers.

] access-list 150 deny udp any any eq 1434 log-input

Be _very_ careful about enabling such logging. Some of the worm flows
have filled GigE pipes. I doubt you really want to log that; Netflow
is a better option in this case. Too much logging will raise the CPU
utilization to the point of creating a DoS on the router.

Thanks,
Rob.

As a general rule, yes. But:

" Access list logging does not show every packet that matches an entry.
Logging is rate-limited to avoid CPU overload. What logging shows you is
a reasonably representative sample, but not a complete packet trace.
Remember that there are packets you're not seeing.

Access lists and logging have a performance impact, but not a large one.
Be careful on routers running at more than about 80 percent CPU load, or
when applying access lists to very high-speed interfaces. "

( http://www.cisco.com/warp/public/707/22.html )

There doesn't seem to be a noticeable impact on CPU usage for a C12000
GigE linecard. Can you do Netflow rather than CEF on such a beast
without a performance penalty?

> ] access-list 150 deny udp any any eq 1434 log-input

> Be _very_ careful about enabling such logging. Some of the worm flows
> have filled GigE pipes. I doubt you really want to log that; Netflow
> is a better option in this case. Too much logging will raise the CPU
> utilization to the point of creating a DoS on the router.

As a general rule, yes. But:

" Access list logging does not show every packet that matches an entry.
Logging is rate-limited to avoid CPU overload. What logging shows you is
a reasonably representative sample, but not a complete packet trace.
Remember that there are packets you're not seeing.

either way, the logging for this, ESPECIALLY with log-input, is a
dangerous proposition. One thing to keep in mind is that the S-train
platforms are different in handling logging than the normal trains... so
S-train rate-limits (and bumps out them annoying messages about
rate-limited messages) while others punt as much to the route processor as
possible and happily saturate it :( (Don't log on like a 7500 for instance
if the packet rates are over like 5kpps...)

Access lists and logging have a performance impact, but not a large one.
Be careful on routers running at more than about 80 percent CPU load, or
when applying access lists to very high-speed interfaces. "

right, or on platforms not built to scale :) (like 7500 or smaller boxen)

( http://www.cisco.com/warp/public/707/22.html )

There doesn't seem to be a noticeable impact on CPU usage for a C12000
GigE linecard. Can you do Netflow rather than CEF on such a beast
without a performance penalty?

One thing to keep in mind is that perhaps you don't care about the logging
:) Just drop it and make your customers fix their borked boxes...

Date: Sat, 25 Jan 2003 09:43:24 +0100 (CET)
From: Sabri Berisha

You are not the only one. I've been sitting here since 06:30
now. So far I have discovered that a lot of Windows boxes
send out UDP packets of 376 bytes to random addresses.

The main body of the worm contains an infinite loop that spews a
0x178-byte (376-byte) payload.

Eddy

> " Access list logging does not show every packet that matches an entry.
> Logging is rate-limited to avoid CPU overload.

either way, the logging for this, ESPECIALLY with log-input, is a
dangerous proposition.

Are you saying that I shouldn't believe Cisco's own documentation?
Obviously, it's going to take _some_ CPU cycles, but I would expect the
box to remain operational.

One thing to keep in mind is that the S-train
platforms are different in handling logging than the normal trains...

Ok, I've been working with Cisco equipment for 8 years now and I can
configure them in my sleep, but all the version/image/train/feature set
is still voodoo to me. Obviously, the router caches the information it
wants to log for a while and then counts hits against the cache until it
actually logs. This should work very well, and it does as per my tests
on a heavily loaded 4500 router. So why would one type of IOS do this
right and another version that isn't immediately recognizable by the
version number as inferior do it wrong?

possible and happily saturate it :( (Don't log on like a 7500 for instance
if the packet rates are over like 5kpps...)

I think today's events show that CPU-based routers have no business
handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a
box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4
Gbps coming in so the box must be able to handle it to some usable
degree.

> There doesn't seem to be a noticeable impact on CPU usage for a C12000
> GigE linecard. Can you do Netflow rather than CEF on such a beast
> without a performance penalty?

One thing to keep in mind is that perhaps you don't care about the logging
:) Just drop it and make your customers fix their borked boxes...

That's why I want the logging: to see which customer is spewing out the
garbage. (-:

> > " Access list logging does not show every packet that matches an entry.
> > Logging is rate-limited to avoid CPU overload.

> either way, the logging for this, ESPECIALLY with log-input, is a
> dangerous proposition.

Are you saying that I shouldn't believe Cisco's own documentation?
Obviously, it's going to take _some_ CPU cycles, but I would expect the
box to remain operational.

Yes, you'd expect this to remain operational, but real-world
'testing' shows that not to be the case. If the attack has highly random
sources or destinations, a log message gets generated for each packet :( This
causes a little pain (or a lot, if you qualify dropping routing protocols as
a lot) on the router :( CPU spikes due to logging large floods are quite
common. This I know from very personal experience.

> One thing to keep in mind is that the S-train
> platforms are different in handling logging than the normal trains...

Ok, I've been working with Cisco equipment for 8 years now and I can
configure them in my sleep, but all the version/image/train/feature set
is still voodoo to me. Obviously, the router caches the information it

me too.

wants to log for a while and then counts hits against the cache until it

only for identical packets... so source A:123 -> Dest B:80 x500000 packets
gets logged 'once'. One log for the first packet and update logs at 5 min
intervals (which may be settable with some IOS command, which may only exist
in S-train code). If the attack randomizes sources, destinations, or
ports... there is effectively a new 'flow' for each packet and thus a new
log message for each... (again, in S-train code or 12.0(21)+ code this is
rate-limited to the RP and thus to the logs... somewhat, at least)

actually logs. This should work very well, and it does as per my tests
on a heavily loaded 4500 router. So why would one type of IOS do this
right and another version that isn't immediately recognizable by the
version number as inferior do it wrong?

S-train code has specific features that don't get propagated to other
trains because they aren't 'required' there or aren't applicable, or not
asked for.

> possible and happily saturate it :( (Don't log on like a 7500 for instance
> if the packet rates are over like 5kpps...)

I think today's events show that CPU-based routers have no business
handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a
box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4
Gbps coming in so the box must be able to handle it to some usable
degree.

that may be, but CPE isn't normally vendor J for t1/t3/oc3 customers...
never mind dsl/dial/cable customers, eh? The vast majority is cpu based
equipment. Whether or not that's a good thing is immaterial, no one is
going to upgrade all routing gear overnight :( (or in 2 years as we've
seen)

> > There doesn't seem to be a noticeable impact on CPU usage for a C12000
> > GigE linecard. Can you do Netflow rather than CEF on such a beast
> > without a performance penalty?

> One thing to keep in mind is that perhaps you don't care about the logging
> :) Just drop it and make your customers fix their borked boxes...

That's why I want the logging: to see which customer is spewing out the
garbage. (-:

well, then.. log vs log-input :) because log-input is more processing and
thus more pain. (And if it's 'inbound' on interfaces, the 'log-input' is
kinda pointless, eh?)

> wants to log for a while and then counts hits against the cache until it

only for identical packets... so source A:123 -> Dest B:80 x500000 packets
gets logged 'once'. One log for the first packet and update logs at 5 min
intervals (which may be settable with some IOS command, which may only exist
in S-train code). If the attack randomizes sources, destinations, or
ports... there is effectively a new 'flow' for each packet and thus a new
log message for each... (again, in S-train code or 12.0(21)+ code this is
rate-limited to the RP and thus to the logs... somewhat, at least)

It seems the flow recognition isn't that strict but I might just have
been lucky.

> actually logs. This should work very well, and it does as per my tests
> on a heavily loaded 4500 router. So why would one type of IOS do this
> right and another version that isn't immediately recognizable by the
> version number as inferior do it wrong?

S-train code has specific features that don't get propagated to other
trains because they aren't 'required' there or aren't applicable, or not
asked for.

Lovely when others decide what you require.

> I think today's events show that CPU-based routers have no business
> handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a
> box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4
> Gbps coming in so the box must be able to handle it to some usable
> degree.

that may be, but CPE isn't normally vendor J for t1/t3/oc3 customers...

CPE for T1 would be 2500, T3 3600, OC3 7200 or some such. All are fine
for day-to-day stuff but don't pack enough power to handle today's
events at line rate. But the difference is small enough that it can be
remedied by simply using faster CPUs. Those were available at the time
the boxes were introduced, but I assume a faster CPU would have
increased the cost price too much.

never mind dsl/dial/cable customers, eh?

Those are slow enough to be done in software easily.

The vast majority is cpu based
equipment. Whether or not that's a good thing is immaterial, no one is
going to upgrade all routing gear overnight :( (or in 2 years as we've
seen)

People are buying GE equipment left right and center too. It doesn't
make much sense to have more computing power in the ethernet chip (GE
over UTP takes a lot of processing power) than in the chip doing the
routing.

Maybe it's possible to find some middle ground, for instance by doing
some basic flow recognition and rate limiting in hardware but the actual
routing in software. That way, you can build a GE CPE router that can do
100 kpps which is enough for regular traffic but still have some
protection when there is a 1.4 Mpps DoS attack which would otherwise
have killed the CPU.
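
For a feel of where numbers like 1.4 Mpps come from, here is a back-of-the-envelope Python sketch of the maximum packet rate on an Ethernet link. It assumes untagged frames with the usual overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap) and a 64-byte minimum frame; max_pps and the constants are names chosen for this example.

```python
ETH_HEADER_FCS = 18  # 14-byte Ethernet header + 4-byte FCS (untagged frame)
PREAMBLE_IFG = 20    # 8-byte preamble/SFD + 12-byte inter-frame gap
MIN_FRAME = 64       # minimum Ethernet frame size, header and FCS included


def max_pps(link_bps: int, ip_len: int) -> int:
    """Upper bound on packets per second for IP packets of ip_len bytes."""
    frame = max(ip_len + ETH_HEADER_FCS, MIN_FRAME)
    bits_on_wire = (frame + PREAMBLE_IFG) * 8
    return link_bps // bits_on_wire


if __name__ == "__main__":
    gige, fe = 10**9, 10**8
    print("GE, minimum-size frames:", max_pps(gige, 46), "pps")
    print("GE, 404-byte worm packets:", max_pps(gige, 404), "pps")
    print("FE, 404-byte worm packets:", max_pps(fe, 404), "pps")
```

Under these assumptions a GE port tops out at about 1.49 Mpps with minimum-size frames, and a GE port full of 404-byte worm packets carries roughly 283 kpps, already well above a 100 kpps software forwarding budget.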

> That's why I want the logging: to see which customer is spewing out the
> garbage. (-:

well, then.. log vs log-input :) because log-input is more processing and
thus more pain. (And if it's 'inbound' on interfaces, the 'log-input' is
kinda pointless, eh?)

Good point. The reason it's there is that I didn't know what I was
dealing with when I enabled this logging and I wanted to see the MAC
addresses in case the source IP addresses were spoofed.

Are you saying that I shouldn't believe Cisco's own documentation?
Obviously, it's going to take _some_ CPU cycles, but I would expect the
box to remain operational.

Actually, Cisco's documentation is not always accurate, and it heavily
depends on IOS version, train, feature set, and hardware.

> One thing to keep in mind is that the S-train
> platforms are different in handling logging than the normal trains...

Ok, I've been working with Cisco equipment for 8 years now and I can
configure them in my sleep, but all the version/image/train/feature set
is still voodoo to me. Obviously, the router caches the information it
wants to log for a while and then counts hits against the cache until it
actually logs. This should work very well, and it does as per my tests
on a heavily loaded 4500 router. So why would one type of IOS do this
right and another version that isn't immediately recognizable by the
version number as inferior do it wrong?

As stated above, it depends on the code. When logging high volume, I
recommend turning off all logging facilities except the one you plan to use.
Multiple logging facilities will multiply the effect on the CPU for
some trains and versions, i.e., logging to console and syslog and running a
term mon is a very, very bad thing under heavy logging. This also depends on
what you are logging. Narrow the scope as much as possible, i.e., log only a
narrow customer selection at a time, then try the next.

> possible and happily saturate it :( (Don't log on like a 7500 for instance
> if the packet rates are over like 5kpps...)

I think today's events show that CPU-based routers have no business
handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a
box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4
Gbps coming in so the box must be able to handle it to some usable
degree.

Actually, you wouldn't expect to see 4 Gbps coming in. That would be full
saturation, which would imply serious performance degradation. Most networks
that I've dealt with stick to a 70-80% saturation rule. In addition, many of
the problems concerning this traffic weren't throughput issues. Each router
has a bandwidth limitation and a pps limitation. The worst DDoS I've had to
deal with didn't even show as a bandwidth spike on my circuits but exceeded
the pps capacity of the router. Luckily, such attacks are easily dealt with
using access-lists, as the router is optimized to block more pps than it is
designed to switch. This worm had both. The packets were small and the
bandwidth utilization was high. Blocking the packets would lower CPU
utilization to a manageable degree, while the bandwidth usage on each
infected circuit was localized to that circuit. How well a circuit dealt
with the load depended on the type of circuit, as different L2
protocols handle saturation differently. ATM is the ideal medium as the
latency remains lower than FE or GE at peak saturation. One's responsibility
is only to the edge of their controllable network, though. If you can't shut
off the ethernet port to an infected server, the customer is responsible for
that equipment. Ideally, you have one customer per circuit that you
control.

Jack Bates
Network Engineer
BrightNet Oklahoma

> I think today's events show that CPU-based routers have no business
> handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a
> box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4
> Gbps coming in so the box must be able to handle it to some usable
> degree.

Actually, you wouldn't expect to see 4 Gbps coming in.

You wouldn't expect it, but it simply happens anyway.

That would be full
saturation, which would imply serious performance degradation. Most networks
that I've dealt with stick to a 70-80% saturation rule.

Unfortunately worms (or denial of service attackers) don't play nice.

In addition, many of
the problems concerning this traffic weren't throughput issues. Each router
has a bandwidth limitation and a pps limitation. The worst DDOS I've had to
deal with didn't even show as a bandwidth spike on my circuits but exceeded
the pps of the router.

That's my point: if you can exceed the router's pps while staying within
the aggregate bandwidth for all ports on the box, you'll find yourself
in trouble at some point.

Luckily, such attacks are easily dealt with using
access-lists as the router is optimized to block more pps than it is
designed to switch. This worm had both.

First of all, I don't want to have to install a filter to make a router
usable again. Second, this one was easy to filter. We can't count on
always being that lucky.

circuit depended on how well it dealt with the loading as different L2
protocols handle saturation differently. ATM is the ideal medium as the
latency remains lower than FE or GE at peak saturation.

??? Latency is strictly a function of the average queue size, which is a
function of the number of bits coming in vs the number of bits going out
per unit of time.
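
As a rough illustration of that relationship, here is a simple fluid model in Python: the backlog grows at the difference between the arrival rate and the link rate until the buffer is full, and the queueing delay is the backlog divided by the link rate. The function name, the buffer size, and the rates in the example are all made up for illustration; real queues are burstier than this.

```python
def queue_delay_ms(arrival_bps: float, link_bps: float,
                   seconds: float, buffer_bits: float) -> float:
    """Queueing delay after `seconds` of sustained load, in milliseconds."""
    # Backlog grows only while more bits arrive than the link can send.
    backlog = max(0.0, (arrival_bps - link_bps) * seconds)
    # A full buffer drops the excess instead of queueing it.
    backlog = min(backlog, buffer_bits)
    # Time needed to drain the backlog ahead of a newly arrived packet.
    return backlog / link_bps * 1000.0


if __name__ == "__main__":
    link = 100e6      # 100 Mbit/s link
    buffer = 8e6      # hypothetical 1 MB output buffer
    for offered in (80e6, 100e6, 120e6):
        delay = queue_delay_ms(offered, link, seconds=0.1, buffer_bits=buffer)
        print(f"offered {offered / 1e6:.0f} Mbit/s -> {delay:.1f} ms queueing delay")
```

In this model nothing about the L2 technology appears in the result; only the bits in versus bits out and the buffer size matter.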

Iljitsch van Beijnum