Data on latency and loss rates during DDoS congestion attacks

Dear NANOG,

One of my ongoing research projects is a transport protocol that ensures (critical) communication in spite of a DDoS congestion attack which cannot be circumvented, by (careful) use of Forward Error Correction (FEC). Yes, obviously, this has to be done and used carefully, since FEC clearly increases traffic rather than reducing it as the typical congestion-control approach does; I'm well aware of that. But some applications are critical (and often low-bandwidth), so such a tool is important.
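To make the parameter question concrete, here is the kind of back-of-envelope calculation involved; a minimal sketch, assuming an idealized (n, k) erasure code and i.i.d. packet loss p (all values below are illustrative, not measurements):

```python
# Minimal sketch (illustrative, idealized model): an (n, k) erasure code
# decodes iff at least k of the n coded packets survive i.i.d. loss p.
from math import comb

def decode_probability(n: int, k: int, p: float) -> float:
    """P(at least k of n packets survive), via the complement (few terms)."""
    q = 1 - p
    fail = sum(comb(n, i) * q**i * p**(n - i) for i in range(k))
    return 1.0 - fail

def redundancy_needed(k: int, p: float, target: float = 0.999) -> int:
    """Smallest n such that an (n, k) code decodes with probability >= target."""
    n = k
    while decode_probability(n, k, p) < target:
        n += 1
    return n

for p in (0.05, 0.20, 0.50):  # an illustrative loss range
    n = redundancy_needed(k=32, p=p)
    print(f"loss {p:.0%}: n = {n}, overhead = {n / 32:.2f}x")
```

The point of asking for real-world numbers is exactly to know which rows of such a table matter in practice.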

I am looking for data on the loss rates and congestion caused by DDoS attacks, to make sure we use the right parameters. Any chance you have such data and can share it?

Many thanks!

I suggest testing with a broad variety of values: losses as low as 5% can be annoying, but losses of 50% or more are not uncommon.

Damian

Damian, thanks!

That's actually roughly the range of losses we focused on; but it was based on my rough feeling for reasonable loss rates (as well as on experiments where we caused losses in emulated environments), and a reviewer (justifiably) asked whether we can base our values on realistic measurements. So I would love to have real values; I'm sure some people have measured these (I'm actually quite sure I've seen such values, but the challenge is recalling where and finding them...).

Also, latency values (under congestion) would be appreciated. Here too we used a range of values; I think the highest was 1 sec, since we believe that under congestion delays go up considerably as many queues fill up [and again I seem to recall values around this range]. But here the reviewer even challenged us and said he/she doubts that delays increase significantly under network congestion, since he/she thinks the additional queuing happens mostly in small routers such as home routers (and maybe the routers used in our emulation testbed). So I'd love to have some real data to know for sure.
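To explain why 1 sec seemed plausible to us: the added delay of each full queue is its buffer depth divided by its drain rate, and end-to-end these add up across congested hops. A minimal sketch, where every buffer size and link speed is an assumed, illustrative value (not a measurement):

```python
# Illustrative sketch (assumed values, not measurements): one-way delay
# added when the queues along a path are full.  Each congested hop
# contributes roughly buffer_bytes * 8 / egress_rate seconds.
HOPS = [  # (buffer in bytes, egress rate in bits/s) -- hypothetical path
    (64 * 2**20, 1e9),     # home/CPE router: 64 MiB on a 1 Gb/s uplink
    (32 * 2**20, 10e9),    # access/aggregation: 32 MiB on 10 Gb/s
    (128 * 2**20, 100e9),  # deep-buffered backbone: 128 MiB on 100 Gb/s
]

total_ms = 0.0
for buf, rate in HOPS:
    delay_ms = buf * 8 / rate * 1e3
    total_ms += delay_ms
    print(f"{buf / 2**20:6.0f} MiB @ {rate / 1e9:5.0f} Gb/s -> {delay_ms:7.1f} ms")
print(f"end-to-end added queuing delay: {total_ms:.0f} ms")
```

Under these (made-up) numbers, a sub-second added delay comes mostly from the small-router hop, which is exactly the point the reviewer raised; hence the need for measured values.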

Apart from needing these numbers for this specific paper, I should know them in a well-founded way anyway, as I'm doing research on and teaching net-sec (incl. quite a lot on DoS) :)

> That's actually roughly the range of losses we focused on; but it was based on my rough feeling for reasonable loss rates (as well as on experiments where we caused losses in emulated environments), and a reviewer (justifiably) asked whether we can base our values on realistic measurements. So I would love to have real values; I'm sure some people have measured these (I'm actually quite sure I've seen such values, but the challenge is recalling where and finding them...).

DDoS is very, very cheap. If there is a single global egress for a given
interface, then the DDoS traffic can easily be 100 times the egress
capacity (1GE egress, 100GE of DDoS). I'm very skeptical that FEC will
help; I think this is a case of cat and mouse. Based on the data you see
now it may seem reasonable, but 'now' is only the result of the minimum
viable DDoS, which is trivial to increase should the need occur.
Similarly, DDoS attacks are often excessively dumb, like dumb UDP ports
which are easy to drop; but should we solve protection well for these,
it's trivial to make it proper HTTPS TCP SYN.
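To put numbers on it (a hypothetical single-bottleneck illustration,
assuming tail-drop and no prioritization, so attack and legitimate
traffic are dropped alike):

```python
# Hypothetical illustration: steady-state loss at one congested egress,
# tail-drop, no prioritization, so every flow sees the same drop rate.
def loss_rate(offered_bps: float, capacity_bps: float) -> float:
    """Fraction of packets dropped when offered load exceeds capacity."""
    return max(0.0, 1.0 - capacity_bps / offered_bps)

egress = 1e9    # 1GE egress
attack = 100e9  # 100GE of DDoS
print(f"loss: {loss_rate(attack + egress, egress):.1%}")  # ~99.0%
```

And if the attacker doubles the traffic, the survivor rate halves; the
defender's FEC budget is the attacker's knob to turn.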

> Also, latency values (under congestion) would be appreciated. Here too we used a range of values; I think the highest was 1 sec, since we believe that under congestion delays go up considerably as many queues fill up [and again I seem to recall values around this range]. But here the reviewer even challenged us and said he/she doubts that delays increase significantly under network congestion, since he/she thinks the additional queuing happens mostly in small routers such as home routers (and maybe the routers used in our emulation testbed). So I'd love to have some real data to know for sure.

A backbone device interface can add hundreds of milliseconds during
congestion, but more commonly we're talking about tens of milliseconds
during congestion and low microseconds to high nanoseconds outside
congestion.
Backbone device buffer space is highly googlable. BRCM Trident/Tomahawk
style boxes have very little, but they are more intended for DC/LAN
scenarios than WAN. Nokia FP, Huawei Solar, EZchip, Cisco nPower, Cisco
Silicon One, Juniper Trio, Juniper Paradise, Broadcom Jericho will all
buffer high tens of milliseconds to low hundreds.
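The millisecond figures convert directly to buffer bytes at line rate;
a sketch of the conversion only (purely arithmetic, no per-chip numbers
implied):

```python
# Sketch: converting a queuing delay budget into buffer depth at a given
# egress rate.  Purely arithmetic; no per-chip figures implied.
def buffer_bytes(delay_ms: float, rate_bps: float) -> float:
    """Buffer needed to sustain delay_ms of queuing at rate_bps."""
    return delay_ms / 1e3 * rate_bps / 8

for ms in (10, 50, 100):
    mb = buffer_bytes(ms, 100e9) / 1e6
    print(f"{ms:3d} ms @ 100 Gb/s = {mb:,.0f} MB of buffer")
```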

> DDoS is very, very cheap. If there is a single global egress for a given
> interface, then the DDoS traffic can easily be 100 times the egress
> capacity (1GE egress, 100GE of DDoS).

Thanks. However, my question is about statistics of attacks actually seen 'in the wild', and not just 'the worst' but also more common attacks. Furthermore, I'm asking about the outcome of the congestion, mainly loss rates and latency, and not about the amount of DDoS traffic. DDoS traffic often gets dropped itself at different intermediate routers, so its ultimate impact is not trivial to estimate.

> I'm very skeptical that FEC will help; I think this is a case of cat
> and mouse.

Hmm, I don't think so; it is more a matter of justification, and also, obviously, of the amount of over-capacity, which is still, obviously, a basic thing anybody concerned about congestion would worry about. Let me be extreme and simplify... Suppose indeed the attacker can send 100 times the capacity of a (say, single) router, resulting in a 99% loss rate. Then FEC should work, but of course with high overhead; let's even simplify and say it requires 100 times redundancy (although it's actually not as bad as that). Still, this can be OK if I have 100 times over-capacity, which for many critical applications is not even a big deal, as crazy as that sounds (and is) for general applications.
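Using the same idealized binomial model as in my first message (an (n, k) erasure code under i.i.d. loss; all values illustrative), one can check what a 99% loss rate demands:

```python
# Same idealized model as before: decode iff >= k of n packets survive
# i.i.d. loss p.  k and the redundancy factors below are illustrative.
from math import comb

def decode_probability(n: int, k: int, p: float) -> float:
    q = 1 - p
    fail = sum(comb(n, i) * q**i * p**(n - i) for i in range(k))
    return 1.0 - fail

k, p = 16, 0.99                  # 16 source packets, 99% loss
for factor in (100, 150, 200):   # redundancy as a multiple of k
    n = factor * k
    print(f"{factor}x: P(decode) = {decode_probability(n, k, p):.3f}")
```

In this toy model, 100x redundancy is roughly the break-even point and a margin above it buys reliability; the exact overhead depends on the code used and on how bursty the loss is.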

> Based on the data you see now it may seem reasonable, but 'now' is only
> the result of the minimum viable DDoS, which is trivial to increase
> should the need occur.

I still think the evaluation should preferably compare against attacks reported in reality, with additional analysis projecting potential larger attacks.

> Similarly, DDoS attacks are often excessively dumb, like dumb UDP ports
> which are easy to drop; but should we solve protection well for these,
> it's trivial to make it proper HTTPS TCP SYN.

Hmm, TCP SYN is already a different story (and we have pretty good defenses against it and many other attacks on the end host). I do work on some of these attacks (and defenses) too, but in this specific case I'm focusing on bandwidth-DoS attacks (network congestion). I'm further focusing, in this work, on a defense which may involve a transport (end-to-end) protocol; of course I'm aware of network-based defenses, it's just not the focus of this work (think of a customer with no ability to 'fix' the network service).

> A backbone device interface can add hundreds of milliseconds during
> congestion, but more commonly we're talking about tens of milliseconds
> during congestion and low microseconds to high nanoseconds outside
> congestion.
> Backbone device buffer space is highly googlable. BRCM Trident/Tomahawk
> style boxes have very little, but they are more intended for DC/LAN
> scenarios than WAN. Nokia FP, Huawei Solar, EZchip, Cisco nPower, Cisco
> Silicon One, Juniper Trio, Juniper Paradise, Broadcom Jericho will all
> buffer high tens of milliseconds to low hundreds.

Thanks again, but I’m not looking for data on particular devices; the latency during congestion attacks may be impacted by multiple devices along the path. So again my interest is mainly in measured values under real attacks.

tks! Amir

Getting (and releasing) numbers from DDoS attacks will be challenging for most, but I think your research could apply to more than just DDoS. There are often cases where one might want to work from an environment which has very poor networking. As an extreme example, in 2007 I got online from an internet cafe in Paramaribo. But, as I told a friend at the time, “latency is about 1s and packet loss around 10%”. It would be great if forward error correction could have improved that experience.

Damian

Hi Damian, thanks, that's right; actually at high latency and 10% loss, you get much better performance than with either TCP or QUIC. However, such scenarios are not as common as clogging due to DDoS... So we still want to find relevant data, to know which ranges of latency and loss make sense.

Guys: if you can share data but only privately, please do :) thanks!

Amir

" he/she doubts that delays increase significantly under network congestion since he/she thinks that the additional queuing is something mostly in small routers such as home routers (and maybe like the routers used in our emulation testbed) "

Wow, this is the first time I’ve found an academic challenging the increase of delay in routers under network congestion.

The doubt is childish. It’s like a question you’d expect to hear in a “networking 101” class.

I don't know if the context implies the reviewer was an academic. While
the common case remains that per-link latencies jump from low
microseconds to tens of milliseconds during congestion of a backbone
interface, there are also a lot of deployments using devices (Trident,
Tomahawk) with minimal buffering, not allowing even a millisecond of
buffering during congestion. The reviewer may have been thinking of
those devices when they answered, but I agree that the answer would be
generally wrong.

I have no idea who the reviewer was (academic, industry, or whatever). However, he didn't actually object to the assertion that latency increases with congestion; he only raised the question of which latency values would be typical/reasonable for a congestion DoS attack. Notice also that the relevant parameter is end-to-end latency (or RTT), not per-device latency. And surely there can be wide variety here (that's why we run experiments under different values and plot graphs...). The question is what is the most important range to focus on (when measuring and comparing different protocols).

Anyway, thanks for the comments; if anyone has such data they can share, that’ll be great and appreciated.