DDOS, IDS, RTBH, and Rate limiting

Pavel_Odintsov · November 20, 2014, 9:36pm

Hello, folks!

I'm the author of FastNetMon; thank you for the PR for my toolkit.

I use this tool for a similar type of attack, and we analyze all
traffic from the uplink ports using port mirroring. You can look at this
network diagram:
https://raw.githubusercontent.com/FastVPSEestiOu/fastnetmon/master/network_map.png

I tried to use NetFlow many years ago, but it's not accurate enough,
not fast enough, and produces big overhead on mid-range network
routers. That is why I wrote this tool, which analyzes every packet.
It can detect an attack in 2 seconds at most and trigger a BGP blackhole
as quick as thought.

It can detect three types of attacks:
1) Bandwidth attack on a given IP (we ban every IP which exceeds 1 Gbps)
2) Packets-per-second attack on a given IP (we ban every IP which
exceeds 100,000 pps)
3) Flow flood (a very useful mode in networks with high bandwidth/pps
per client)
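
To make the three checks above concrete, here is a minimal Python sketch of per-destination threshold detection over a one-second window. It is not FastNetMon's actual code: the class and function names, the FLOWS_LIMIT value, and the sample addresses are assumptions for illustration; only the 1 Gbps and 100,000 pps limits come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Limits from the list above; FLOWS_LIMIT is an assumed, hypothetical value.
BPS_LIMIT = 1_000_000_000   # 1 Gbps
PPS_LIMIT = 100_000         # 100,000 packets per second
FLOWS_LIMIT = 3_500         # distinct flows per second (assumption)


@dataclass
class HostCounters:
    byte_count: int = 0
    packets: int = 0
    flows: set = field(default_factory=set)


def account(counters, dst_ip, length, flow_key):
    """Add one mirrored packet to the counters of its destination IP."""
    c = counters[dst_ip]
    c.byte_count += length
    c.packets += 1
    c.flows.add(flow_key)


def find_attacked_hosts(counters, interval_s=1.0):
    """Return {ip: [reasons]} for hosts exceeding any limit in the interval."""
    result = {}
    for ip, c in counters.items():
        reasons = []
        if c.byte_count * 8 / interval_s > BPS_LIMIT:
            reasons.append("bandwidth")
        if c.packets / interval_s > PPS_LIMIT:
            reasons.append("pps")
        if len(c.flows) / interval_s > FLOWS_LIMIT:
            reasons.append("flows")
        if reasons:
            result[ip] = reasons
    return result


counters = defaultdict(HostCounters)
# Synthetic one-second window: 150,000 small packets to a single host.
flood_key = ("198.51.100.1", 40000, "192.0.2.10", 80, "udp")
for _ in range(150_000):
    account(counters, "192.0.2.10", 64, flood_key)
# One ordinary packet to a quiet host.
account(counters, "192.0.2.20", 1500,
        ("198.51.100.2", 40001, "192.0.2.20", 443, "tcp"))

attacked = find_attacked_hosts(counters)
print(attacked)
```

In this synthetic run only the pps limit trips, because 150,000 packets of 64 bytes amount to roughly 77 Mbps, well under the bandwidth limit.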

FastNetMon can handle 2-3 million packets per second and ~20 Gbps on
a standard i7 2600 Linux box with an Intel 82599 NIC.

If you need any help or suggestions you can email me directly or ask via GitHub.

Thank you!

Dobbins_Roland · November 20, 2014, 9:59pm

I tried to use NetFlow many years ago, but it's not accurate enough,
not fast enough, and produces big overhead on mid-range network
routers.

These statements are not supported by the facts. NetFlow (and other varieties of flow telemetry) has been used for many years for traffic engineering-related analysis, capacity planning, and security purposes. I've never seen the CPU utilization on even a modest mid-range router rise above single-digits, except once due to a bug (which was fixed quickly).

Flow telemetry scales and provides invaluable edge-to-edge traceback information. NetFlow telemetry is accurate enough to be used for all the purposes noted above by network operators across the world, from the smallest to the largest networks in the world.

There are several excellent open-source NetFlow analysis tools which allow folks to benefit from NetFlow analysis without spending a lot of money. Some of these projects have been maintained and enhanced for many years; their authors would not do that if NetFlow analytics weren't sufficient to needs.

Packet-based analysis is certainly useful, but does not scale and does not provide traceback information.

FastNetMon can handle 2-3 million packets per second and ~20 Gbps on a standard i7 2600 Linux box with an Intel 82599 NIC.

See the comments above with regards to scale. This is inadequate for a network of any size, it does not provide traceback information, and it does not lend itself to broad deployment across a network of any size.

I'm sure FastNetMon is a fine tool, and it's very good of you to spend the time and effort to develop it and to make it available. However, making demonstrably-inaccurate statements about other technologies which are in wide use by network operators and which have a proven track record in the field is probably not the best way to encourage folks to try FastNetMon.

Denys_Fedoryshchenko · November 20, 2014, 11:22pm

Netflow is stateful stuff, and just to run it on wirespeed, on hardware, you need to utilise significant part of TCAM,
i am not talking that on some hardware it is just impossible to run it.
So everything about NetFlow is built on the assumption that the hosting provider or ISP can run it. And based on some observations, the majority of small/mid-size hosting providers use a minimal, just BGP-capable L3 switch as the core, and the cheapest but reliable L2/L3 switches for aggregation, and both are capable, in the best case, of running sampled sFlow.
And last thing, from one of public papers, netflow delaying factors:
1. Flow record expiration
2. Exporting process
• Typical delay: 15-60 sec.
So for a small hosting(up to 10G), i believe, FastNetMon is best solution. Faster, and no significant investments to equipment. Bigger hosting providers might reuse their existing servers, segment the network, and implement inexpensive monitoring on aggregation switches without any additional cost again.
Ah, and there is one more huge problem with netflow vs FastNetMon - netflow just by design cannot be adapted to run pattern matching, while it is trivial to patch FastNetMon for that, turning it to mini-IDS for free.

Data_Zone · November 21, 2014, 1:07am

What happens when someone spoofs legitimate hosts that your customers use?

Dobbins_Roland · November 21, 2014, 1:12am

Netflow is stateful stuff,

This is factually incorrect; NetFlow flows are unidirectional in nature, and in any event have no effect on processing of data-plane traffic.

and just to run it on wirespeed, on hardware, you need to utilise significant part of TCAM,

Again, this is factually incorrect.

i am not talking that on some hardware it is just impossible to run it.

This is also factually incorrect. Some platforms/linecards do not in fact support NetFlow (or other varieties of flow telemetry) due to hardware limitations.

And last thing, from one of public papers, netflow delaying factors:
1. Flow record expiration

This is tunable.

• Typical delay: 15-60 sec.

This is an entirely subjective assessment, and does not reflect operational realities. These are typically *maximum values* - and they are well within operationally-useful timeframes. Also, the effect of NetFlow cache size and resultant FIFOing of flow records is not taken into account, nor is the effect on flow termination and flow-record export of TCP FIN or RST flags denoting TCP traffic taken into account.
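
The factors mentioned here (timers, cache size, TCP FIN/RST flags) can be sketched as a single decision function. This is a simplified model, not any vendor's implementation: the function name, the default timer values (picked from the 15-60 second range quoted above), and the cache limit are all assumptions.

```python
TCP_FIN = 0x01
TCP_RST = 0x04


def export_reason(now, first_seen, last_seen, tcp_flags,
                  active_timeout=60, inactive_timeout=15,
                  cache_entries=0, cache_limit=65536):
    """Return why a cached flow record would be exported at `now`, or None.

    All timer and cache values are illustrative defaults (assumptions).
    """
    if tcp_flags & (TCP_FIN | TCP_RST):
        return "tcp-end"            # the flow ended; no need to wait for a timer
    if cache_entries >= cache_limit:
        return "cache-full"         # simplification: a full cache forces records out
    if now - last_seen >= inactive_timeout:
        return "inactive-timeout"   # no packets seen for a while
    if now - first_seen >= active_timeout:
        return "active-timeout"     # long-lived flow, exported periodically
    return None


# A long-lived flow that is still sending packets (ACK flag only).
still_cached = export_reason(now=59, first_seen=0, last_seen=59, tcp_flags=0x10)
at_active_limit = export_reason(now=60, first_seen=0, last_seen=60, tcp_flags=0x10)
# A short TCP flow that has just seen FIN+ACK.
short_tcp = export_reason(now=2, first_seen=0, last_seen=2, tcp_flags=0x11)
# A flow that went quiet 15 seconds ago.
idle = export_reason(now=20, first_seen=0, last_seen=5, tcp_flags=0)
print(still_cached, at_active_limit, short_tcp, idle)
```

In this model the active timeout only bounds the worst case for a flow that never goes idle and never ends; the other three conditions can push a record out earlier.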

So for a small hosting(up to 10G), i believe, FastNetMon is best solution.

This is a gross over-generalization unsupported by facts. Many years of operational experience with NetFlow and other forms of flow telemetry by large numbers of network operators of all sizes and varieties contradict this over-generalization.

It is generally unwise to make sweeping statements regarding operational impact which are not borne out by significant operational experience in production networks.

Faster, and no significant investments to equipment.

This statement indicates a lack of understanding of opex costs, irrespective of capex costs.

Bigger hosting providers might reuse their existing servers, segment the network, and implement inexpensive monitoring on aggregation switches without any additional cost again.

This statement indicates a lack of operational experience in networks of even minimal scale.

Ah, and there is one more huge problem with netflow vs FastNetMon - netflow just by design cannot be adapted to run pattern matching, while it is trivial to patch FastNetMon for that, turning it to mini-IDS for free.

This statement betrays a lack of understanding of NetFlow-based (and other flow telemetry-based) detection and classification, as well as the undesirability and negative operational impact of stateful IDS/'IPS' deployments in production networks.

You should also note that FastNetMon is far from unique; there are multiple other open-source tools which provide the same type of functionality, and none of them have replaced flow telemetry, either.

Tools such as FastNetMon supplement flow telemetry, in situations in which such tools can be deployed. They do not begin to replace flow telemetry, and they are not inherently superior to flow telemetry.

Again, I'm sure FastNetMon is a useful tool in many circumstances. But it would be a much better idea to define FastNetMon positively in terms of its own inherent value, rather than attempting to define it based upon factually incorrect negative 'comparisons' to other well-established, widely-used technologies which have demonstrable track records within the global operational community.

Rob_Duffy · November 21, 2014, 2:19am

Roland, you seem to have a lot of experience with these kinds of tools.
What open-source NetFlow analysis tools would you recommend for quickly
detecting a DDoS attack?

Dobbins_Roland · November 21, 2014, 2:37am

I generally recommend that folks get started with something like nfdump/nfsen or ntop. There are other, more sophisticated tools out there, but these allow one to get up and running quickly, and to gain valuable operational experience with which to evaluate more sophisticated tools, if they're needed.

Tim_Jackson · November 21, 2014, 2:50am

I highly recommend pmacct and its in-memory tables. Lightweight, easy to
query and super fast.

You can also easily run multiple aggregates of traffic to find what you are
interested in, and tag common interface types to easily filter traffic.

Or you can use pmacct to insert this into whatever database you want, AMQP
or MongoDB.

My current favorite is using an IMT table for DoS detection and another for
aggregates for interesting traffic types and querying this every X minutes
and inserting it into ElasticSearch. Kibana makes the most powerful netflow
dashboard ever.

Rob_Duffy · November 21, 2014, 3:00am

I've been using NTOP for a couple of years. I'm mostly looking for something
that can quickly detect DDoS attacks in a datacenter environment. Thanks
for the suggestions. I'll check them out.

Paul_S · November 21, 2014, 5:08am

WANguard from andrisoft has worked well on this for us.

It supports both flow telemetry and mirrored ports (we use flows exclusively), and does what it says it does.

No complaints.

Dobbins_Roland · November 21, 2014, 6:40am

I believe the thread was focusing on open-source tools.

Denys_Fedoryshchenko · November 21, 2014, 8:17am

Netflow is stateful stuff,

This is factually incorrect; NetFlow flows are unidirectional in
nature, and in any event have no effect on processing of data-plane
traffic.

Word stateful has nothing common with stateful firewall. Stateful protocol. "a protocol which requires keeping of the internal state on the server is known as a stateful protocol." And sure unidirectional/bidirectional is totally unrelated.

and just to run it on wirespeed, on hardware, you need to utilise significant part of TCAM,

Again, this is factually incorrect.

Proof that the majority of solutions run *flow in hardware, not in software:

Cisco 65xx (yes, they are obsolete, but they run stuff wirespeed)
Aug 24 12:30:53: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [97%]
This is best example. Also on many Cisco's if you use UBRL, then you cannot use NetFlow, just because they use same part of TCAM resources. Others, for example Juniper, are using sampling (read - missing data), just to not overflow resources, and has various limitations, such as RE-DPC communication pps limit, licensing limit.
For example MS-DPC is pretty good one, few million flows in hardware, 7-8Gbps of traffic, and... cost $120000.

i am not talking that on some hardware it is just impossible to run it.

This is also factually incorrect. Some platforms/linecards do not in
fact support NetFlow (or other varieties of flow telemetry) due to
hardware limitations.

But still they can run fine mirroring, and fastnetmon will do its job.

And last thing, from one of public papers, netflow delaying factors:
1. Flow record expiration

This is tunable.

In certain limits. You can't set flow-active-timeout less than 60 seconds in Junos 14 for example.
On some platforms even if you can, you just run in the limits of platforms again (forwarding - management communications).

• Typical delay: 15-60 sec.

This is an entirely subjective assessment, and does not reflect
operational realities. These are typically *maximum values* - and
they are well within operationally-useful timeframes. Also, the
effect of NetFlow cache size and resultant FIFOing of flow records is
not taken into account, nor is the effect on flow termination and
flow-record export of TCP FIN or RST flags denoting TCP traffic taken
into account.

So for a small hosting(up to 10G), i believe, FastNetMon is best solution.

This is a gross over-generalization unsupported by facts. Many years
of operational experience with NetFlow and other forms of flow
telemetry by large numbers of network operators of all sizes and
varieties contradict this over-generalization.

Fastnetmon and similar tools popularity says for itself.

It is generally unwise to make sweeping statements regarding
operational impact which are not borne out by significant operational
experience in production networks.

"What can be asserted without evidence can be dismissed without evidence."

Faster, and no significant investments to equipment.

This statement indicates a lack of understanding of opex costs,
irrespective of capex costs.

Sweet marketing buzzwords, that is used together with some unclear calculations,
to sell suffering hosting providers various expensive tools, that is not necessary for them.
OPEX of fastnetmon is a small fee for qualified sysadmin, and often not required,
because already hosting operator should have him.

Bigger hosting providers might reuse their existing servers, segment the network, and implement inexpensive monitoring on aggregation switches without any additional cost again.

This statement indicates a lack of operational experience in networks
of even minimal scale.

Ah, and there is one more huge problem with netflow vs FastNetMon - netflow just by design cannot be adapted to run pattern matching, while it is trivial to patch FastNetMon for that, turning it to mini-IDS for free.

This statement betrays a lack of understanding of NetFlow-based (and
other flow telemetry-based) detection and classification, as well as
the undesirability and negative operational impact of stateful
IDS/'IPS' deployments in production networks.

You should also note that FastNetMon is far from unique; there are
multiple other open-source tools which provide the same type of
functionality, and none of them have replaced flow telemetry, either.

That's the power of open source. Since FastNetMon is not the only tool, it is worth mentioning the others;
people here will benefit from using them, for free. And I'm sure the author of FastNetMon will
not feel offended at all.

Tools such as FastNetMon supplement flow telemetry, in situations in
which such tools can be deployed. They do not begin to replace flow
telemetry, and they are not inherently superior to flow telemetry.

Again, I'm sure FastNetMon is a useful tool in many circumstances.
But it would be a much better idea to define FastNetMon positively in
terms of its own inherent value, rather than attempting to define it
based upon factually incorrect negative 'comparisons' to other
well-established, widely-used technologies which have demonstrable
track records within the global operational community.

I can agree only that arguing about this subject is waste of time.
FastNetMon has its narrow specific purpose - detecting very quickly DDoS attacks on <10G bandwidth,
where netflow just by design cannot outperform it. But FastNetMon cannot be used for telemetry,
and such stuff.

Dobbins_Roland · November 21, 2014, 12:50pm

Word stateful has nothing common with stateful firewall. Stateful protocol. "a protocol which requires keeping of the internal state on the server is known as a stateful protocol."

Correct - and NetFlow is not stateful, by this definition.

And sure unidirectional/bidirectional is totally unrelated.

On the contrary, it is quite relevant.

Cisco 65xx (yes, they are obsolete, but they run stuff wirespeed)

They are not obsolete - they perform very well with Sup2T and EARL8-based linecards.

Aug 24 12:30:53: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [97%]

This is from a 6500 with either an EARL6 or EARL7 ASIC, which had many caveats with regards to NetFlow, including a lack of packet-sampled control of flow creation - i.e., sampled NetFlow. As part of the extended team which defined requirements for the EARL8 ASIC, which is utilized in the Sup2T and DFC-4 enabled linecards, I can assure you that this is no longer an issue with 6500s running EARL8-based Sups and linecards.

Also on many Cisco's if you use UBRL, then you cannot use NetFlow, just because they use same part of TCAM resources.

This is where TCAM carving comes into play. Also, it is not so much an issue with newer hardware, per the above. Also, UBRL is not commonly used in ISP networks.

Others, for example Juniper, are using sampling (read - missing data),

The largest networks in the world use sampled NetFlow every hour of every day for many purposes, including DDoS detection/classification/traceback. It works quite well for all those purposes.
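
As a rough illustration of why sampling can still serve volumetric detection, the sketch below keeps every Nth packet and scales the counters back up by N. The function name, the 1-in-1000 rate, and the synthetic traffic are assumptions; real exporters may sample randomly rather than deterministically.

```python
def estimate_from_samples(packet_sizes, sampling_rate):
    """Keep every Nth packet and scale packet and byte counts back up by N."""
    sampled = packet_sizes[::sampling_rate]
    est_packets = len(sampled) * sampling_rate
    est_bytes = sum(sampled) * sampling_rate
    return est_packets, est_bytes


# Synthetic one-second flood: 200,000 packets of 64 bytes each.
flood = [64] * 200_000
est_packets, est_bytes = estimate_from_samples(flood, sampling_rate=1000)
print(est_packets, est_bytes)
```

Only 200 of the 200,000 packets are actually inspected here, yet the scaled-up totals match the real volume because the traffic is uniform; flows of just a few packets are estimated far more coarsely.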

just to not overflow resources, and has various limitations, such as RE-DPC communication pps limit, licensing limit.
For example MS-DPC is pretty good one, few million flows in hardware, 7-8Gbps of traffic, and... cost $120000.

You get what you pay for.

But still they can run fine mirroring, and fastnetmon will do its job.

On the contrary - SPAN nee port mirroring cuts into the frames-per-second budget of linecards, as the traffic is in essence being duplicated. It is not 'free', and it has a profound impact on the switch's data-plane traffic forwarding capacity.

Unlike NetFlow.

In certain limits. You can't set flow-active-timeout less than 60 seconds in Junos 14 for example.

Platforms vary, this is true. However, I have never run into an issue with an active flow timer of 60s, nor have I ever run into anyone who has done so.

On some platforms even if you can, you just run in the limits of platforms again (forwarding - management communications).

This is incorrect.

Fastnetmon and similar tools popularity says for itself.

Yes, it does - they are far less popular than NetFlow, because they do not scale on networks of any size, nor do they provide traceback (given your lack of comments on traceback elsewhere in this thread, it appears that you aren't familiar with this concept).

"What can be asserted without evidence can be dismissed without evidence."

You make my point very well, thank you. There is overwhelming evidence that NetFlow and similar forms of flow telemetry scale well and provide real, measurable, actionable operational value on networks of all types and sizes. The reason for the popularity of flow telemetry is that it is low-opex (no probes to deploy); low-capex (no probes to deploy); scales to tb/sec speeds; is practicable for large networks (no probes to deploy); provides instantaneous traceback (probes can't do this); and provides statistics on dropped traffic (probes can't do this, either).

Sweet marketing buzzwords,

It's pretty obvious which half of this 'conversation' is focused on marketing; and it isn't mine.

that is used together with some unclear calculations,

No calculations have been discussed during the course of this 'conversation'.

to sell suffering hosting providers various expensive tools,

I'm uninterested in selling anyone anything. What I'm interested in doing is correcting the misinformation you are promulgating regarding the utility of flow telemetry coupled with open-source flow analysis systems. There has been no mention of any commercial systems or products in my half of this 'conversation'.

that is not necessary for them.

Again, the benefits of flow telemetry are quite clear for networks of any size.

OPEX of fastnetmon is a small fee for qualified sysadmin, and often not required, because already hosting operator should have him.

You obviously do not know what the term opex actually means, nor what it encompasses.

I can agree only that arguing about this subject is waste of time.

Yes, it isn't a profitable use of time to argue with someone who does not have the degree of operational expertise nor experience to back his demonstrably incorrect assertions.

where netflow just by design cannot outperform it

Again, this is a completely unsupported statement with no basis in fact, and it totally ignores the inherent characteristics of flow telemetry (instantaneous traceback, statistics on dropped traffic, scalability, low opex) which make it eminently suitable for these various applications.

To be clear - the particular tool you are doing such a poor job of advocating is in no way unique, and is completely orthogonal to the utility, capabilities, and scalability of flow telemetry. If such tools were so superior to flow telemetry, they would've eclipsed flow telemetry as the preferred mechanism for achieving visibility into network traffic many years ago.

I am going to stop replying to your trolling, because you obviously do not have the requisite operational experience and depth/breadth of knowledge to even try to plausibly support your demonstrably-incorrect assertions. One can only hope that such a potentially useful tool as FastNetMon isn't tarnished in the view of those who have read this thread due to such uninformed, erroneous misadvocacy.

Denys_Fedoryshchenko · November 21, 2014, 2:42pm

Word stateful has nothing common with stateful firewall. Stateful protocol. "a protocol which requires keeping of the internal state on the server is known as a stateful protocol."

Correct - and NetFlow is not stateful, by this definition.

Not stateful only if you nitpick the word "server".
To be able to do bytes/packets accounting for a flow, you need to keep the previous state of that specific flow. To be able to differentiate between flows with the same src/dst IP+ports (if one has ended and the next one starts with the same data), you need to track its state, again. And just to keep track of _flows_ in a packet-switched network, you need state. Surprising lack of knowledge.
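
The per-flow state described above can be sketched as a small table keyed by the 5-tuple. This is an illustrative model only, not how any particular router implements its flow cache; the class names and the 15-second inactive timeout are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Toy flow table: one record of counters per 5-tuple key."""

    def __init__(self, inactive_timeout=15.0):
        self.inactive_timeout = inactive_timeout  # assumed value
        self.active = {}      # key -> FlowRecord currently being counted
        self.exported = []    # (key, FlowRecord) pairs already expired

    def observe(self, key, timestamp, length):
        rec = self.active.get(key)
        # Same 5-tuple after a long silence: treat it as a new flow.
        if rec is not None and timestamp - rec.last_seen >= self.inactive_timeout:
            self.exported.append((key, rec))
            rec = None
        if rec is None:
            rec = FlowRecord(first_seen=timestamp, last_seen=timestamp)
            self.active[key] = rec
        rec.last_seen = timestamp
        rec.packets += 1
        rec.octets += length


cache = FlowCache()
key = ("198.51.100.1", 40000, "192.0.2.10", 80, "tcp")
cache.observe(key, 0.0, 100)
cache.observe(key, 1.0, 200)
cache.observe(key, 30.0, 50)   # same 5-tuple, but 29 seconds later
print(cache.exported, cache.active)
```

The first two packets are accounted to one record; the third lands in a fresh record because the previous one had been idle longer than the timeout, which is exactly the "same src/dst IP+ports" case mentioned above.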

And sure unidirectional/bidirectional is totally unrelated.

On the contrary, it is quite relevant.

Cisco 65xx (yes, they are obsolete, but they run stuff wirespeed)

They are not obsolete - they perform very well with Sup2T and
EARL8-based linecards.

It seems I'm wrong on that point; I was not able to run NetFlow reliably, but that was before CSCul90377 and CSCui17732 were fixed.

Others, for example Juniper, are using sampling (read - missing data),

The largest networks in the world use sampled NetFlow every hour of
every day for many purposes, including DDoS
detection/classification/traceback. It works quite well for all those
purposes.

The use case for FastNetMon is not the largest networks. Sampled NetFlow is useless for per-traffic billing purposes, for example.

just to not overflow resources, and has various limitations, such as RE-DPC communication pps limit, licensing limit.
For example MS-DPC is pretty good one, few million flows in hardware, 7-8Gbps of traffic, and... cost $120000.

You get what you pay for.

While i can pay $1500 for a server, and get netflow and ~3second BGP blackholing with fastnetmon.

But still they can run fine mirroring, and fastnetmon will do its job.

On the contrary - SPAN nee port mirroring cuts into the
frames-per-second budget of linecards, as the traffic is in essence
being duplicated. It is not 'free', and it has a profound impact on
the switch's data-plane traffic forwarding capacity.

Unlike NetFlow.

In the hosting case, mirroring is usually done for the uplink port, but I have to agree, it might be a problem.

Yes, it does - they are far less popular than NetFlow, because they do
not scale on networks of any size, nor do they provide traceback
(given your lack of comments on traceback elsewhere in this thread, it
appears that you aren't familiar with this concept).

You make my point very well, thank you. There is overwhelming
evidence that NetFlow and similar forms of flow telemetry scale well
and provide real, measurable, actionable operational value on networks
of all types and sizes. The reason for the popularity of flow
telemetry is that it is low-opex (no probes to deploy); low-capex (no
probes to deploy); scales to tb/sec speeds; is practicable for large
networks (no probes to deploy); provides instantaneous traceback
(probes can't do this); and provides statistics on dropped traffic
(probes can't do this, either).

And again and again we come back to Tb/s. I don't need Tb/s and I don't need traceback; neither at a relatively small ISP nor at a VDS provider do I need any of the above. I just need an inexpensive way to block the attacked IP and/or announce it from a different location within a minimal timeframe, to minimize the impact on other customers.
You might be highly professional with large-scale operators, but the needs and capabilities of small guys are very different.
I developed a tool similar to FastNetMon for almost the same purpose: detecting attacks and switching the affected network via BGP to a "protected" backbone. After calculating "OPEX/CAPEX", a capable server turned out to be a much cheaper alternative, in the short and long term, than buying NetFlow-capable hardware (and support for it) just for NetFlow purposes, plus hardware for a NetFlow collector.
Let's talk numbers.
My case is a small hosting company: 4G, C4948-10G, one 10G uplink, one free 10G port. The switch is not capable of running sFlow or NetFlow.
A decent server is already available, since it is a hosting company, so the only expense is a 10G 82599 card, which is around $500. Even if a server is not available, based on data from the FastNetMon author the total cost is still within $1500. Deployment time: hours from installing the hardware, without disrupting existing traffic.
The "major" expenses are tuning the server according to the author's recommendations and writing a shell script that sends the 4948 a command to blackhole an IP. For a qualified sysadmin that is 2 hours of work, and $500 max as a "labor" cost. That's it. What can be cheaper than $2000 in this case? I guess I won't get an answer.
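
The arithmetic behind the figures above, as a small sketch. The dollar amounts are the ones quoted in this post; the variable names are only for illustration.

```python
NIC_COST = 500               # 10G 82599 card, when a server already exists
SERVER_WITH_NIC_COST = 1500  # upper bound when a server must be bought too
LABOR_COST = 500             # "labor" cost, maximum

with_existing_server = NIC_COST + LABOR_COST
with_new_server = SERVER_WITH_NIC_COST + LABOR_COST
print(with_existing_server, with_new_server)
```

The $2000 figure corresponds to the case where the server has to be purchased; with a spare server already in the rack the same sums give $1000.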

I'm uninterested in selling anyone anything. What I'm interested in
doing is correcting the misinformation you are promulgating regarding
the utility of flow telemetry coupled with open-source flow analysis
systems. There has been no mention of any commercial systems or
products in my half of this 'conversation'.

I didn't mean you at all; I meant that when I hear OPEX/CAPEX, often it is
not a real detailed calculation, but some very unrealistic, mangled numbers
that, surprisingly, look good for the marketed product and bad for competing products.

I am going to stop replying to your trolling, because you obviously do
not have the requisite operational experience and depth/breadth of
knowledge to even try to plausibly support your demonstrably-incorrect
assertions. One can only hope that such a potentially useful tool as
FastNetMon isn't tarnished in the view of those who have read this
thread due to such uninformed, erroneous misadvocacy.

So much arrogance. But on one thing I have to agree, again: it is a perfect idea to stop this useless flamewar.