UDP clamped on service provider links

Hi,

Is it true that UDP is often subjected to stiffer rate limits than TCP? Is
there a reason why this is so often done? Is this because UDP is stateless
and any script kiddie could launch a DoS attack with a UDP stream?

Given the state of affairs these days, how difficult is it going to be for
somebody to launch a DoS attack with some other protocol?

Glen

Hi,

Is it true that UDP is often subjected to stiffer rate limits than TCP?

I hear tell that some folk are engaging in this practice... You might
have seen this here little ditty:
  <http://tools.ietf.org/html/draft-byrne-opsec-udp-advisory-00>

You may have also put your ear to the tracks and seen a bunch of kids
using these 'you-dee-pee en-tee-pee' packets to fill up the tubes
across the lands... Sometimes they use not just 'en-tee-pee', but also
that old hoary bastard 'dee-en-ess' for their no-good traffic backup
propositions.

Is there a reason why this is so often done? Is this because UDP is stateless
and any script kiddie could launch a DoS attack with a UDP stream?

I understand, and I'm new here so bear with me, that there are
you-dee-pee services out there in the hinterlands which will say a
whole lot more to you than you said to them... like your worst
nightmare when it comes to small talk.

Given the state of affairs these days, how difficult is it going to be for
somebody to launch a DoS attack with some other protocol?

Not very hard at all... but here's your lipstick and there's the pig... :)

"It depends on the network." is really the only answer.

It's the kind of thing that happens quietly and often can be transient in
nature (e.g. temporary "big stick" filters to deal with an active attack).

As far as the reason it happens to UDP:

UDP is a challenge because it's easy to leverage for reflection attacks
where the source IP is spoofed to be the target.

The major targets are small services that are typically left open on host
systems. The big ones are NTP, DNS, and more recently SSDP (Universal Plug
and Play left open on consumer routers). Once in a while you see some
really old protocols open, like CHARGEN, but these are less common. Ones
like NTP and DNS are popular because a small request can trigger a large
response (i.e., an amplification attack) if services are not appropriately
locked down on the host.

A while back, a big one that caught a lot of people off guard was the NTP
MONLIST function, which resulted in up to 500:1 amplification.
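
For a rough sense of the arithmetic, here is a back-of-the-envelope sketch
in Python; the packet sizes are illustrative assumptions picked to line up
with the 500:1 figure, not measurements of any particular server:

# Rough amplification estimate for spoofed-source reflection attacks.
# All sizes below are illustrative assumptions, not measured values.

def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes the reflector sends at the victim per byte the attacker sends."""
    return response_bytes / request_bytes

# Hypothetical NTP 'monlist' exchange: one small query, up to ~100 large
# response packets listing recently seen clients.
ntp_monlist = amplification_factor(90, 100 * 450)

# Hypothetical open DNS resolver answering a large ANY query.
dns_any = amplification_factor(64, 3000)

# A TCP SYN only gets a SYN-ACK back until the handshake completes,
# which is why TCP reflection buys an attacker almost nothing.
tcp_syn = amplification_factor(40, 40)

print(f"NTP monlist ~{ntp_monlist:.0f}:1, DNS ANY ~{dns_any:.0f}:1, "
      f"TCP SYN ~{tcp_syn:.0f}:1")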

Hopefully rate limiting UDP traffic is something that doesn't happen often,
and when people do rate-limit it they ideally limit the scope to known
problem protocols (like NTP and DNS) and set the limits such that normal use
shouldn't be a problem. That said, I'm sure there are some who just
rate-limit everything (likely arguing that UDP is "mostly peer-to-peer
anyway"). It's a bad practice, no doubt.

TCP is still vulnerable to some level of reflection, but such attacks are
generally easy to mitigate, and because TCP's setup and teardown packets are
so small, they are not very effective for denial of service. Not much
happens traffic-wise until the source address has confirmed the connection,
which keeps spoofing from being as big a problem with TCP as it is with
UDP. Similarly, ICMP is generally not a problem because ICMP responses are
small by design.

Is it true that UDP is often subjected to stiffer rate limits than
TCP?

Yes, although I'm not sure how widespread this is in most, or even many,
networks. Probably not very widely deployed today, but restrictions and
limitations only seem to expand rather than recede.

I've done this, and not just for UDP, in a university environment. I
implemented it on all the ingress interfaces of user-facing subnets around
the time the Slammer worm came out. This was meant as a more general
solution to "capacity collapse" rather than strictly as a security measure,
because we were also struggling with capacity-filling apps like Napster at
the time, but Slammer was the tipping point. To summarize what we did for
aggregate rates from host subnets (these were generally 100 Mb/s IPv4
/24-/25 LANs):

  ICMP: 2 Mb/s
   UDP: 10 Mb/s
 MCAST: 10 Mb/s (separate UDP group)
  IGMP: 2 Mb/s
 IPSEC: 10 Mb/s (ESP - can't ensure flow control of crypto traffic)
   GRE: 10 Mb/s
 Other: 10 Mb/s for everything else except for TCP

If traffic stayed local within the campus network, the limits did not
apply. There were no limits for TCP traffic. We generally did not
apply limits to well-defined and generally well-managed server subnets.
We were aware that certain measurement tools might produce misleading
results, a trade-off we were willing to accept.
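
In case it helps to see the logic spelled out, here is a minimal
token-bucket sketch in Python of what that kind of aggregate per-protocol
policing amounts to; the rates mirror the list above, but everything else
(names, burst size, the way the TCP and campus-local exemptions are flagged)
is a hypothetical illustration - the real thing was of course done in
router/switch hardware, not software:

import time

# Illustrative per-protocol token-bucket policer; a hypothetical sketch,
# not a description of the actual deployment.

class TokenBucket:
    def __init__(self, rate_mbps: float, burst_bytes: int = 128_000):
        self.rate = rate_mbps * 1_000_000 / 8      # refill rate in bytes/sec
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = time.monotonic()

    def allow(self, size_bytes: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False                                # over the limit: drop

# Aggregate rates from the list above; keys are IP protocol numbers.
policers = {
    1:  TokenBucket(2),     # ICMP
    2:  TokenBucket(2),     # IGMP
    17: TokenBucket(10),    # UDP (multicast would get its own bucket)
    47: TokenBucket(10),    # GRE
    50: TokenBucket(10),    # ESP / IPSEC
}
default_policer = TokenBucket(10)                   # "Other"

def forward(ip_proto: int, size_bytes: int, stays_local: bool) -> bool:
    """Return True if the packet should be forwarded, False if policed."""
    if stays_local or ip_proto == 6:                # campus-local and TCP exempt
        return True
    return policers.get(ip_proto, default_policer).allow(size_bytes)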

As far as I could tell, the limits generally worked well and helped
minimize Slammer and more general problems. If ISPs could implement a
similar mechanism, I think this could still be a reasonable approach today.
Perhaps it is more necessary than ever before, but a big part of the
problem is that the networks where you'd really want to see this sort
of thing implemented won't do it.

Is there a reason why this is so often done? Is this because UDP
is stateless and any script kiddie could launch a DoS attack with a
UDP stream?

State, some form of sender verification, and the fact that UDP and most
other commonly used protocols besides TCP do not generally react to
implicit congestion signals (usually drops).

Given the state of affairs these days, how difficult is it going to be
for somebody to launch a DoS attack with some other protocol?

There have been ICMP-based attacks, and there are, at least in theory if
not commonly in practice, others such as IGMP-based attacks. There have
been numerous DoS (single D) attacks on TCP-based services precisely
because of weaknesses or difficulties in managing unexpected TCP
session behavior. The potential sending capacity of even a small set
of hosts from around the globe, whether UDP, TCP, or some other protocol,
could easily overwhelm many points of aggregation. All it takes is for an
attacker to coerce a sufficient subset of hosts into sending the
packets.

John

<https://app.box.com/s/r7an1moswtc7ce58f8gg>

Hmmm. The WebRTC stack has a pretty explicit form of getting and then
maintaining consent; it also rides on top of UDP (SRTP/UDP for media and
SCTP/DTLS/UDP for data channels). Because both media and data channels go
from peer to peer, it has no preset group of server addresses to white list
(the only way I can see to do that would be to force the use of TURN and
white list the TURN server, but that would be problematic for
performance). How will you support it if the default is to throttle UDP?

Clue welcome,

Ted

We will install a middlebox to strip off the UDP and expose the SCTP
natively as the transport protocol!

Patent pending!

RTCweb made a series of trade-offs. Encapsulating SCTP in UDP is one of
them... the idea at the time was that this is only WebRTC 1.0, so we'll do a
few silly things to ship it early. As I am sure you know :)

To bring this discussion to specifics, we've been fighting an issue where
our customers are experiencing poor audio quality on SIP calls. The only
carrier between our customers and the hosted VoIP provider is Level3. From
multiple Wireshark captures, it appears that a certain percentage of UDP
packets - in this case RTP - are getting lost in the Level3 network
somewhere. We've got a ticket open with Level3, but haven't gotten far yet.
Has anyone else seen Level3 or other carriers rate-limiting UDP and breaking
these legitimate services?

No. But I've seen Level3 just have really bad packet loss.

In one case, when we were having an issue with a SIP trunk, we re-numbered
our end to another IP in the same subnet. Same path from A to Z, but the
packet loss mysteriously disappeared using the new IP. It sure seems like
they are throttling somewhere.

We have similar problems with UDP/500 and being able to keep IPSEC tunnels
up over Level3. It happens quite a bit when there are no signs of TCP or
ICMP packet loss.

Several months ago we had an issue with a customer whose IPSEC tunnels we
manage. One of the tunnels dropped, and after troubleshooting we were able
to prove that only udp/500 was being blocked in one direction for one
specific source and destination IP. Level3 resolved the issue, but claimed
it was due to a "mis-configured NNI" between themselves and Charter. Seems
odd that an NNI mis-config could cause something that specific, doesn't it?

NNI is a peering link.

Peering links blow up during DDoS attacks since they act as a narrow funnel
of traffic between networks.

So an NNI is exactly where UDP DDoS filters show up most, at least that is
my guess.

Oh, I'm aware of the function of an NNI. I even accept that a carrier might
feel the need to filter bad traffic. I've certainly done so for things like
the Moon exploit. What I don't like is arbitrary filtering of traffic and
the denial of such filtering by the carrier.

In one case, when we were having an issue with a SIP trunk, we re-numbered
our end to another IP in the same subnet. Same path from A to Z, but the
packet loss mysteriously disappeared using the new IP.

LAG hash put you on a congested fiber?

I don't know how you evaluated the two paths, but if MPLS was not
considered, it may have been due in part to ECMP behavior. While not
ruling out UDP limits, it is plausible that the changed source IP address
resulted in a less congested path being chosen.

John

Ding! This sounds like the most plausible answer... I wouldn't expect
L3 to limit udp/5060/6061/SIP traffic; as a common carrier that also
runs a SIP trunking service they:
  1) probably know what SIP traffic is
  2) don't want to get bitten for being seen as preferring their own
     network offerings over other external ones (or perhaps accidentally
     impacting actual customers).
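
For anyone less familiar with why renumbering a source IP can "fix" loss
like this, here is a toy Python sketch of 5-tuple ECMP/LAG hashing; the hash
function and path count are made up for illustration, since real routers use
vendor-specific hardware hashes and seeds:

import hashlib

# Toy illustration of ECMP/LAG hashing: each flow is pinned to one member
# link based on a hash of its 5-tuple, so changing only the source IP can
# land a flow on a different (possibly less congested) path.

NUM_PATHS = 4  # assumed number of equal-cost paths / LAG members

def ecmp_path(src_ip: str, dst_ip: str, proto: int, sport: int, dport: int) -> int:
    key = f"{src_ip},{dst_ip},{proto},{sport},{dport}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_PATHS

# Same destination and ports; only the source IP differs (RFC 5737 examples).
old = ecmp_path("192.0.2.10", "198.51.100.5", 17, 16384, 16384)
new = ecmp_path("192.0.2.11", "198.51.100.5", 17, 16384, 16384)
print(old, new)  # the two flows may well hash onto different member links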

https://github.com/jlmcgraw/networkUtilities/blob/master/parseMlsQosInterfaceStatistics.pl

I whipped up this Perl script as part of some troubleshooting I was doing
and thought I'd send it out to NANOG in case anyone else might find it
useful. (Note that you may need to install some additional Perl libraries;
see the setup.sh file in that repository for installing them.) I've only
tested this with Ubuntu flavors of Linux; your OS might not work out of the
box.

What it does is take the output of "show mls qos interface statistics", which lists COS/DSCP/Queue counters by individual interface, from a file and summarize all of those numbers into overall totals.

This will (theoretically) give you a more holistic view of the markings of traffic in/out of the switch along with which queues are dropping packets. This will hopefully assist in more intelligent allocation of queue buffers and thresholds etc.

If you have any thoughts on improvements or bug fixes, I'd love to hear them.

-Jesse

Output should look something like this:

           'cos:incoming ( Tag -> Packets )' => {
                                                  '0' => '13378773318',
                                                  '6' => 1192965355,
                                                  '5' => 241414642,
                                                  '7' => 93307502,
                                                  '3' => 32812572,
                                                  '1' => 705042,
                                                  '4' => 5812
                                                },
           'cos:outgoing ( Tag -> Packets )' => {
                                                  '0' => '18309565892',
                                                  '6' => 4725226136,
                                                  '7' => 2016871236,
                                                  '5' => 1937646890,
                                                  '3' => 423068898,
                                                  '2' => 41422754,
                                                  '1' => 11665393,
                                                  '4' => 567635
                                                },
           'dscp:incoming ( Tag -> Packets )' => {
                                                   '0' => '11778685571',
                                                   '46' => 2184094729,
                                                   '26' => 394305418,
                                                   '40' => 131328936,
                                                   '48' => 84660939,
                                                   '18' => 42501678,
                                                   '24' => 32812572,
                                                   '12' => 8553900,
                                                   '56' => 6240271,
                                                   '10' => 3500510,
                                                   '44' => 502413,
                                                   '34' => 463741,
                                                   '4' => 247044,
                                                   '52' => 113496,
                                                   '32' => 103896,
                                                   '28' => 29765,
                                                   '53' => 11408,
                                                   '20' => 243,
                                                   '49' => 152,
                                                   '54' => 7,
                                                   '2' => 6,
                                                   '50' => 3
                                                 },
           'dscp:outgoing ( Tag -> Packets )' => {
                                                   '0' => '16977173711',
                                                   '48' => 4360459601,
                                                   '46' => 1862171624,
                                                   '26' => 390223766,
                                                   '40' => 75123620,
                                                   '18' => 41422531,
                                                   '24' => 32812591,
                                                   '56' => 9411197,
                                                   '12' => 8258740,
                                                   '10' => 3406653,
                                                   '52' => 491103,
                                                   '34' => 463739,
                                                   '44' => 351642,
                                                   '4' => 228762,
                                                   '32' => 103896,
                                                   '53' => 47290,
                                                   '28' => 32541,
                                                   '20' => 223,
                                                   '49' => 152,
                                                   '54' => 7,
                                                   '2' => 6,
                                                   '50' => 3
                                                 },
           'output queues:dropped (queue - threshold)' => {
                                                            '3-3' => 23391191,
                                                            '2-2' => 8571412,
                                                            '2-1' => 1729737,
                                                            '4-3' => 103134,
                                                            '2-3' => 1948,
                                                            '3-1' => 45
                                                          },
           'output queues:enqueued (queue - threshold)' => {
                                                             '3-3' => '14238358295',
                                                             '2-3' => 7146794945,
                                                             '4-3' => 3298145116,
                                                             '1-3' => 1862349561,
                                                             '2-2' => 1078365695,
                                                             '2-1' => 230568942,
                                                             '1-1' => 75476596,
                                                             '4-1' => 541520,
                                                             '3-1' => 32559
                                                           }

Processed 260 interfaces

In one case, when we were having an issue with a SIP trunk, we re-numbered
our end to another IP in the same subnet. Same path from A to Z, but the
packet loss mysteriously disappeared using the new IP.

LAG hash put you on a congested fiber?

Or perhaps a switch fabric module geeked out and impacted a third or a
quarter of the flows through the box? We've seen that exact scenario more
times than I want to admit with our old equipment vendor.

Jason said it was the RTP traffic that was lossy over L3... so that's not
UDP/5060, but [at least commonly] UDP/10000:20000 - i.e., to a network
engineer trying to harden the network against DDoS attacks, just random UDP
traffic. Someone writing a stateless UDP filter/policer without thinking
about RTP might easily implement a filter that doesn't allow all RTP packets
to pass.
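
To make that failure mode concrete, here is a hypothetical sketch of the
kind of naive policy that bites RTP; the exempt ports, the port range, and
the actions are invented for illustration and are not a description of any
particular carrier's configuration:

# Hypothetical "harden the edge" UDP policy that forgets about RTP.
# Ports and actions are invented for illustration only.

EXEMPT_UDP_PORTS = {53, 123, 500, 4500, 5060}   # DNS, NTP, IKE, NAT-T, SIP signaling
RTP_RANGE = range(10000, 20001)                 # where the actual media lives

def policy(dst_port: int) -> str:
    if dst_port in EXEMPT_UDP_PORTS:
        return "forward"
    # Everything else is treated as "random UDP" and harshly policed,
    # so RTP media streams in 10000-20000 take collateral loss.
    return "police-hard"

print(policy(5060))    # forward      (the SIP signaling looks fine)
print(policy(16384))   # police-hard  (the RTP stream carrying the audio)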