Real world sflow vs netflow?

David_Hubbard · July 13, 2012, 5:30pm

Can anyone on or off list give me some real world
thoughts on sflow vs netflow for border
routers? (multi-homed, BGP, straight v4 & v6 only
for web hosting, no mpls, vpns, vlans, etc.)

Finding it hard to decipher the vendor version
of the answer to that question. We use
netflow v9 currently but are considering hardware
that would be sflow. We don't use it for
billing purposes, mostly for spotting malicious
remote hosts doing things like scans, spotting
traffic such as weird ports in use in either
direction that warrant further investigation,
watching for ddos/dos destinations to act on
mitigation, or investigating the nature of unusual
levels of traffic on switch ports that set off
alarms. I'm concerned things like port scans,
etc. won't be picked up by the NMS if fed by
sflow due to the sampling nature, or similar
concern if 500 ssh connections by the same remote
host are sampled as 1 connection, etc. Of course
these concerns were put in my head by someone
interested in me continuing to use equipment that
happens to output netflow data, hence me wanting some
real people answers.

Thanks!

Jeroen_Massar1 · July 13, 2012, 5:44pm

[..]

We don't use it for
billing purposes, mostly for spotting malicious
remote hosts doing things like scans, spotting
traffic such as weird ports in use in either
direction that warrant further investigation,

[..]

The primary difference between NetFlow/IPFIX and sFlow is that NetFlow
is unsampled while sFlow is sampled. As such, for these kinds of cases it
might be more worthwhile to have NetFlow than sFlow, as you get all the
source/dest ports. On the other hand, sFlow can give you packet headers,
and that might be useful if you get, say, the first 200 bytes of every flow.

Though depending on the hardware and traffic volume and traffic mix you
might have to sample anyway.

Oh and there is a small difference in the packet formats and the idea
behind why something exists, but that won't hurt you too much.

Greets,
Jeroen

Harry_Hoffman · July 13, 2012, 5:52pm

Hi David,

I'm not sure that sflow is going to get you the granularity that you
are looking for. It's usually better to start more granular and then
aggregate into larger flows when you graph or reference for historic values.

Have you looked at other options, such as argus [1] to collect flow data
outside of the networking gear?

This way the networking gear can do its primary job and flow
collection can happen elsewhere.

There's a whole argus community that discusses the information security
topics you're interested in and Carter, the guy who wrote all (?) of the
code is very responsive. Argus can also take in NetFlow flows from your
routers too.

There are obviously other tools available, that may work as well or
better, but argus is one I've been using with great success in a fairly
heavily trafficked environment.

Cheers,
Harry

[1] http://www.qosient.com/argus/

Peter_Phaal · July 13, 2012, 8:20pm

Hi David,

The main architectural difference between sFlow and Netflow is the
location of the flow cache:

1. NetFlow: Packets are decoded on the router, flow keys are extracted
and used to lookup/create an entry in a flow cache which is then
updated based on values in the packet. Records are exported from the
flow cache in the form of Netflow datagrams when the flow completes or
based on a timeout.
2. sFlow: Packets are randomly sampled in hardware and the packet
headers are immediately exported as sFlow datagrams - there is no flow
cache on the switch/router. In addition to exporting the packet
header, the sFlow agent captures the FIB state associated with
forwarding the sampled packet, exporting information such as next hop
router, AS-path, communities etc. An sFlow agent also periodically
sends all the MIB-II interface counters, eliminating the need for SNMP
polling - this isn't very important if you are only monitoring a few
links, but makes a big difference if you are monitoring large chassis
switches or tens or hundreds of thousands of ports in a data center or
campus environment.
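
The two models described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the packet tuple layout, the `FlowCache` class and the `sample_packets` helper are all hypothetical names, and real devices do this in hardware.

```python
import random
from collections import defaultdict

# Hypothetical packet layout: (src, dst, proto, sport, dport, length)


class FlowCache:
    """NetFlow-style: per-flow state is kept on the device."""

    def __init__(self):
        self.flows = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def update(self, pkt):
        key = pkt[:5]                 # 5-tuple used as the flow key
        entry = self.flows[key]       # lookup, or create on first packet
        entry["packets"] += 1
        entry["bytes"] += pkt[5]

    def export(self):
        """Flush every entry, as would happen on flow end or timeout."""
        records = [(k, e["packets"], e["bytes"]) for k, e in self.flows.items()]
        self.flows.clear()
        return records


def sample_packets(packets, rate, rng=None):
    """sFlow-style: pick on average 1 in `rate` packets and keep no state.

    Each sample is returned together with the sampling rate so that a
    collector can scale it later.
    """
    rng = rng or random.Random()
    return [(pkt, rate) for pkt in packets if rng.randrange(rate) == 0]
```

Feeding 1000 identical 1500-byte packets through `FlowCache.update()` and then calling `export()` yields a single record with 1000 packets and 1,500,000 bytes. `sample_packets(packets, 100)` returns roughly ten of those packets, each tagged with the rate, and leaves all aggregation to the collector.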

Moving the flow cache off the router has a number of benefits:
1. You are no longer limited by the hardware/firmware capabilities of
the router - your analysis software decides which fields to decode and
how to accumulate results. For example, if you are managing a mixed
IPv4/IPv6 environment you can decide to use sFlow to look into v6 over
v4 and v4 over v6 tunnels (to do the same thing with Netflow would
likely require a hardware upgrade). You can even feed sFlow into
Wireshark for detailed analysis of protocols and packet headers.
2. Operational complexity is greatly reduced since the configuration
options and resource management issues associated with the flow cache
are eliminated.
3. Low latency. Measurements aren't delayed by the flow cache - you
can detect DDoS attacks/large flows within seconds.
4. Scalability - you can turn on sFlow on every link (even 100G
links), on every device for a comprehensive view of traffic.
5. Multi-vendor interoperability. The sFlow measurements are
interoperable across vendors (since very little processing is
performed on the devices). With NetFlow, different vendors and devices
have different hardware limitations affecting the fields that they can
export.

Unsampled Netflow is only practical for moderate traffic levels. If
you carry significant traffic you would want to enable sampling
anyway, even with Netflow. However, there are a wide range of Netflow
sampling implementations, many of which yield questionable results. In
contrast, the sFlow standard specifies how sampling must be performed
and ensures that information is included that allows the sampled data
to be correctly scaled and produce unbiased measurements.
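
As a rough sketch of what "scaling" means on the collector side: each sampled packet can be treated as standing in for N packets, where N is the sampling rate carried with the sample. The function name and the `(flow_key, frame_length, sampling_rate)` input layout below are assumptions made for illustration; a real collector may use more refined estimators.

```python
from collections import defaultdict


def estimate_totals(samples):
    """Estimate per-flow packet and byte totals from sampled packets.

    samples: iterable of (flow_key, frame_length, sampling_rate).
    Each sample is counted as `sampling_rate` packets of its own size.
    """
    totals = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, length, rate in samples:
        totals[key]["packets"] += rate
        totals[key]["bytes"] += length * rate
    return dict(totals)
```

For example, two samples of 1500 and 500 bytes for the same flow at 1:1000 give an estimate of 2000 packets and 2,000,000 bytes for that flow. The estimate is only as good as the number of samples behind it, so small flows remain noisy.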

Cheers,
Peter

Joe_Loiacono · July 14, 2012, 1:30am

headers are immediately exported as sFlow datagrams - there is no flow
cache on the switch/router. In addition to exporting the packet
header, the sFlow agent captures the FIB state associated with
forwarding the sampled packet, exporting information such as next hop
router, AS-path, communities etc

What about byte counts? Just those in the sampled packet (i.e., no running
totals per flow)?

In contrast, the sFlow standard specifies how sampling must be performed
and ensures that information is included that allows the sampled data
to be correctly scaled and produce unbiased measurements.

Does sflow software typically recreate the total byte count per flow (e.g.,
TCP session) by scaling?

Thanks,

Joe

Lukasz_Bromirski · July 14, 2012, 8:30am

1. NetFlow: Packets are decoded on the router, flow keys are extracted
and used to lookup/create an entry in a flow cache which is then
updated based on values in the packet. Records are exported from the
flow cache in the form of Netflow datagrams when the flow completes or
based on a timeout.

This is because NetFlow is based on flows, whereas the sFlow name is
misleading - it's actually a PACKET monitoring technology, not a FLOW
monitoring one. So the difference in the way the two mechanisms work is
in line with their definitions.

2. sFlow: Packets are randomly sampled in hardware and the packet
headers are immediately exported as sFlow datagrams - there is no flow
cache on the switch/router.

And that's the biggest problem with sFlow. Packets are sampled, not
flows. You may miss the big or important flow, you don't have
visibility into every conversation going through the device.

sFlow and randomized sampling rely heavily on statistics, but as soon
as you agree on that, you'll lose accuracy right away.
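
One way to put numbers on the "you may miss a flow" concern is to assume each packet is sampled independently with probability 1/N, which is an idealisation of random 1-in-N sampling. Under that assumption the chance that a flow of n packets produces at least one sample is 1 - (1 - 1/N)^n. The helper name below is hypothetical.

```python
def prob_seen(packets_in_flow, rate):
    """Probability that at least one packet of the flow is sampled,
    assuming independent 1-in-`rate` sampling per packet."""
    return 1.0 - (1.0 - 1.0 / rate) ** packets_in_flow
```

At 1:1000 a single-packet flow is seen with probability of about 0.001, a 500-packet exchange with about 0.39, and a 10,000-packet flow with more than 0.9999. So large flows are very unlikely to be missed, while short conversations usually are.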

Moving the flow cache off the router has a number of benefits:
1. You are no longer limited by the hardware/firmware capabilities of
the router - your analysis software decides which fields to decode and
how to accumulate results. For example, if you are managing a mixed
IPv4/IPv6 environment you can decide to use sFlow to look into v6 over
v4 and v4 over v6 tunnels (to do the same thing with Netflow would
likely require a hardware upgrade). You can even feed sFlow into
Wireshark for detailed analysis of protocols and packet headers.

NetFlow supports IPv6. As well as L2 traffic (v9), MPLS, multicast and
so on.

2. Operational complexity is greatly reduced since the configuration
options and resource management issues associated with the flow cache
are eliminated.

That will depend on the device and the options. It takes around
3-4 commands to configure the export and then one to activate
it, without any templates, on an interface on a Cisco device.

What's more important, you can have multiple monitors on one
interface monitoring & exporting different sets of traffic to
different groups within company (Security, Network Monitoring,
Traffic Engineering). sFlow gives just sampled packets.

3. Low latency. Measurements aren't delayed by the flow cache - you
can detect DDoS attacks/large flows within seconds.

The same with NetFlow. Cache can be actively flushed.

4. Scalability - you can turn on sFlow on every link (even 100G
links), on every device for a comprehensive view of traffic.

Same with NetFlow & sampling turned on.

However, there are a wide range of Netflow
sampling implementations, many of which yield questionable results. In
contrast, the sFlow standard specifies how sampling must be performed
and ensures that information is included that allows the sampled data
to be correctly scaled and produce unbiased measurements.

The measurements provided by sFlow are only an approximation of the real
traffic and while it may be acceptable on LAN links where details don't
matter as much, it's hardly good enough to present a real view on the
WAN links.

sFlow was built to work on switches and provide "some" accuracy, it's
not good enough (unless you do sampling on a 1:5-1:10 basis) to
do billing or some detailed analysis of traffic:

You can use it to *estimate* the traffic, detect DDoS, sure. But the
data & scaling used by sFlow (and additionally tricks used by ASIC
vendors implementing it in the hardware) can't change the fundamental
difference - sFlow is really sPacket, as it doesn't deal with flows.

NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling
accuracy and things like that, but working with flows is more accurate.

Mikael_Abrahamsson · July 14, 2012, 9:15am

If you do 1:1000 sampling with both Netflow and sFlow, why would one of them be more accurate than the other? If you analyze the flow on the device or on the collector (as might be done with sFlow), I don't see why one would be better than the other.

Lukasz_Bromirski · July 14, 2012, 5:15pm

Sure, but with sampling you'll lose accuracy anyway. The difference is
subtle, and depends on the (Net|j)Flow implementation - on some devices
for sampled NetFlow you'll still get sampled FLOWS (1:x) not sampled
PACKETS (thus disregarding the flow advantage).

Paolo_Lucente · July 15, 2012, 12:16pm

Let's be real and speak implementations: where is L2 information in
NetFlow for routed traffic on the bigger platforms typically deployed for
peering at internet exchanges - ASR9K, C7600 (i.e. hopefully without
having to invest more money in such a platform to upgrade to Sup2T), MX,
CRS?

Cheers,
Paolo

PS: Let's not return on the point of availability of MAC accounting,
since that is not the solution.

Nick_Hilliard3 · July 15, 2012, 8:52pm

And that's the biggest problem with sFlow. Packets are sampled, not
flows. You may miss the big or important flow, you don't have
visibility into every conversation going through the device.

Unless you enable sampling, which is pretty much necessary for non-trivial
traffic volumes.

NetFlow supports IPv6. As well as L2 traffic (v9), MPLS, multicast and
so on.

It does, depending on hardware variety, but you need specific platform
support for each packet variety (v4 / v6 / mpls / etc), and platform
support for this can be very dodgy. You don't need this with sflow - it
just punts 1 in N raw packets out to your collector, and the statistical
assumptions which were made by the networking device are well documented.
I've never seen documentation on the sampling technique used for each
netflow implementation.

The measurements provided by sFlow are only an approximation of the real
traffic and while it may be acceptable on LAN links where details don't
matter as much, it's hardly good enough to present a real view on the
WAN links.

sFlow was built to work on switches and provide "some" accuracy, it's
not good enough (unless you do sampling on a 1:5-1:10 basis) to
do billing or some detailed analysis of traffic:

Depends on how detailed your requirements are. For billing, most people
don't classify by packet analysis, but rather by byte count which can be
handled by snmp port counters. If you need to do something fancier,
non-sampled netflow is indeed good enough for billing.

http://www.inmon.com/pdf/sFlowBilling.pdf

You can use it to *estimate* the traffic, detect DDoS, sure. But the
data & scaling used by sFlow (and additionally tricks used by ASIC
vendors implementing it in the hardware) can't change the fundamental
difference - sFlow is really sPacket, as it doesn't deal with flows.

agreed, the name is wrong.

NetFlow, jFlow, IPFIX deal with flows. You can discuss sampling
accuracy and things like that, but working with flows is more accurate.

Depends on your use case. For large traffic values, you run into the law
of large numbers and you can get accurate visibility into what's happening
on your network.
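
The law-of-large-numbers point can be made concrete with the approximation commonly quoted for packet sampling: at 95% confidence, the relative error of an estimate for a traffic class is at most about 196 * sqrt(1/c) percent, where c is the number of samples that fell into that class. It assumes random sampling and a class that is a small fraction of all traffic; the function name is hypothetical.

```python
import math


def percent_error_95(samples_in_class):
    """Approximate 95%-confidence relative error (in percent) for an
    estimate backed by `samples_in_class` sampled packets."""
    return 196.0 * math.sqrt(1.0 / samples_in_class)
```

With 100 samples in a class the bound is about 19.6%, and with 10,000 samples it drops to about 1.96%. The accuracy depends on how many samples were collected, not directly on the sampling rate, which is why busy links tolerate coarse sampling.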

Certainly, netflow _can_ offer amazingly precise visibility into your
network. But the trade-off is that you need specialised hardware to do
this on your line cards or your forwarding engine. This drives up both the
capex (extra hardware) and the opex (tcam is power hungry) of your network.
sflow is much cheaper to implement as you're not maintaining any state on
your chassis. You're just picking out a packet every so often.

The current generation of high end service provider hardware (juniper
mx-3d, cisco sup2t / n7k / asr9k) is pretty much the first generation of
hardware which doesn't have crippling netflow limitations, such as poor
support for v6 / mpls, too small cache sizes, etc. This fact alone should
provide a good indication of how difficult it is to implement it well on
fast boxes.

sflow is simpler, cheaper and in many cases is simply a better choice if
you don't need drill-down into every single flow going through your network.

Nick

James_Braunegg · July 16, 2012, 10:01pm

Dear All

Around a year ago I had the same debate: sflow vs netflow vs snmp port counters. I read lots of stories, lots of myths and lots of good information. My conclusion:

In the end I did real life testing comparing each platform.

We routed live traffic (about 250 Mbit/s) from our Cisco 7200 G2 routers through Brocade MLXe routers and exported netflow from the Cisco platform and sFlow from the Brocade platform.

Each router sent netflow/sflow traffic to two collectors on independent hardware (same specifications) running the same collection netflow analyzer software.

The end result, after hours, days and even weeks of testing, was that there was no significant difference between the traffic volumes netflow was showing vs sflow, i.e. less than 0.5% variance between the two environments.

That being said, both netflow and sflow under-read by about 3% when compared to snmp port counters, which we put down to broadcast traffic etc. which the routers didn't see / flow.

Regardless of whether you're going to bill from netflow or sflow, in our test environment we saw no significant difference between either platform.

Hope that helps
Kindest Regards

James Braunegg
W: 1300 769 972 | M: 0488 997 207 | D: (03) 9751 7616
E: james.braunegg@micron21.com | ABN: 12 109 977 666

This message is intended for the addressee named above. It may contain privileged or confidential information. If you are not the intended recipient of this message you must not use, copy, distribute or disclose it to anyone other than the addressee. If you have received this message in error please return the message to the sender by replying to it and then delete the message from your computer.

David_Hubbard · July 16, 2012, 10:25pm

Dear All

Around a year ago I had the same debate: sflow vs netflow vs
snmp port counters. I read lots of stories, lots of myths and
lots of good information. My conclusion:

In the end I did real life testing comparing each platform.

We routed live traffic (about 250 Mbit/s) from our Cisco 7200
G2 routers through Brocade MLXe routers and exported netflow
from the Cisco platform and sFlow from the Brocade platform.

Each router sent netflow/sflow traffic to two collectors on
independent hardware (same specifications) running the same
collection netflow analyzer software.

The end result, after hours, days and even weeks of testing,
was that there was no significant difference between the
traffic volumes netflow was showing vs sflow, i.e. less than
0.5% variance between the two environments.

That being said, both netflow and sflow under-read by
about 3% when compared to snmp port counters, which we put
down to broadcast traffic etc. which the routers
didn't see / flow.

Regardless of whether you're going to bill from netflow or
sflow, in our test environment we saw no significant
difference between either platform.

What are your thoughts on the non-billing aspects after your
comparison testing; if you are/were using it for those purposes?
We don't use our current netflow for billing, just for security
investigation and (ideally) early alerting of abnormal activity
like port scans, compromised apps on servers, etc.
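
For the port-scan use case, one possible approach with sampled data is to count distinct targets per source rather than connections, since a scan touches many different (host, port) pairs and each sampled packet is likely to reveal a new one. The sketch below is an assumption-laden illustration: the `(src_ip, dst_ip, dst_port)` input layout, the function name and the threshold of 20 are all hypothetical, and the threshold would need tuning against the sampling rate and normal traffic.

```python
from collections import defaultdict


def find_scanners(sampled_headers, min_distinct_targets=20):
    """Return {src_ip: distinct_target_count} for sources whose sampled
    packets hit at least `min_distinct_targets` different (dst, port) pairs.

    sampled_headers: iterable of (src_ip, dst_ip, dst_port) decoded from
    sampled packet headers within one time window.
    """
    targets = defaultdict(set)
    for src, dst, dport in sampled_headers:
        targets[src].add((dst, dport))
    return {
        src: len(seen)
        for src, seen in targets.items()
        if len(seen) >= min_distinct_targets
    }
```

A source seen in 50 samples against 50 different ports is flagged, while a client seen in 100 samples against one web server port is not. A slow scan that yields only a handful of samples per window would still slip under any such threshold.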

Thanks,

David

James_Braunegg · July 16, 2012, 10:54pm

Dear David

From a visibility point of view, we obtain as much information as we require to know exactly what's occurring on our network, where and when, in real time.

We know what's happening on any interface, on any network, at any time. That being said, for us the most important visibility is all about the flow of traffic and packet counts; the security side should be done at the firewall level!

If anyone wants a demo of our sFlow setup, I'm happy to show you via a TeamViewer session or something!

By the way, we are using sFlow now.

Kindest Regards

James Braunegg

Simon_Leinen · July 17, 2012, 3:32pm

James Braunegg writes:

In the end I did real life testing comparing each platform

Great, thanks for sharing your results!

(It would be nice if you could tell us a little bit about the
configuration, i.e. what kind of sampling you used.)

[...]

That being said, both netflow and sflow under-read by about 3%
when compared to snmp port counters, which we put down to
broadcast traffic etc. which the routers didn't see / flow.

That's one reason, but another reason would be that at least in Netflow
(but sFlow may be similar depending on how you use it), the reported
byte counts only include the sizes of the "L3" packets, i.e. starting at
the IP header, while the SNMP interface counters (ifInOctets etc.)
include L2 overhead such as Ethernet frame headers and such.
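
A quick back-of-the-envelope check of how much under-read L2 overhead alone could explain. The 18 bytes per packet used here is an assumption (untagged Ethernet: 14-byte header plus 4-byte FCS), and the average packet size is a made-up input, since the traffic mix in the test is not known.

```python
ETH_OVERHEAD = 18  # assumed: 14-byte Ethernet header + 4-byte FCS, no VLAN tag


def under_read_percent(avg_ip_packet_bytes, overhead=ETH_OVERHEAD):
    """Percentage by which L3 byte counts fall short of L2 interface
    counters, given an average IP packet size in bytes."""
    l2_bytes = avg_ip_packet_bytes + overhead
    return 100.0 * overhead / l2_bytes
```

With a hypothetical average IP packet size of 582 bytes, `under_read_percent(582)` gives 3.0, so a gap of a few percent is plausible from framing overhead alone for typical mixed traffic.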

Nick_Hilliard3 · July 17, 2012, 4:37pm

sflow includes both figures.

Nick

Peter_Phaal · July 17, 2012, 5:16pm

In the case of sFlow, the collector determines how to report bytes.
The sFlow agent reports the size of the sampled layer 2 frame (along
with the first 128 bytes of the frame) and the collector can choose
whether to report L2 bytes, L3 bytes, L4 bytes etc. by subtracting the
sizes of the headers. It seems likely that the sFlow collector used in
the tests was reporting L3 bytes since the numbers were in agreement
with the numbers reported by NetFlow.
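
The header subtraction described above might look roughly like this on the collector. It is a simplified sketch: the function name is hypothetical, only Ethernet with optional VLAN tags is handled, IPv6 extension headers are ignored, and any trailing FCS included in the reported frame length is not subtracted.

```python
import struct


def byte_counts(frame_length, header):
    """Derive L2/L3/L4 byte counts for one sampled packet.

    frame_length: size of the original layer 2 frame as reported by the agent.
    header: the first bytes of the frame (bytes object).
    """
    offset = 12                                   # skip dst + src MAC
    (ethertype,) = struct.unpack_from("!H", header, offset)
    offset += 2
    while ethertype in (0x8100, 0x88A8):          # skip VLAN tag(s)
        (ethertype,) = struct.unpack_from("!H", header, offset + 2)
        offset += 4
    l3 = frame_length - offset                    # bytes from the IP header on
    if ethertype == 0x0800:                       # IPv4: header length from IHL
        l4 = l3 - (header[offset] & 0x0F) * 4
    elif ethertype == 0x86DD:                     # IPv6: fixed 40-byte header only
        l4 = l3 - 40
    else:
        l4 = None                                 # not IP, no L4 figure
    return {"l2": frame_length, "l3": l3, "l4": l4}
```

For an untagged 1514-byte frame carrying IPv4 with a 20-byte header, this reports 1514 L2 bytes, 1500 L3 bytes and 1480 L4 bytes. Reporting the L3 figure is what would line up with typical NetFlow byte counts.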

Peter

Peter_Phaal · September 20, 2012, 4:59pm

I am puzzled by the orthodoxy that seems to prevail around the value of
"flows" as a measure of network traffic in packet switched networks.

The following article contains some thoughts on flow oriented and
packet oriented measurements. Apologies to NANOG readers for the
simplistic analogies used to describe packet switching, the article is
also intended for server administrators and application developers who
often don't really know what happens when they write some bytes to a
TCP socket.

http://blog.sflow.com/2012/09/packets-and-flows.html

The article positions flows as a useful abstraction for characterizing
host and application performance, but as a poor fit for understanding
packet traffic and measuring the performance of packet switches and
routers. This isn't really an issue of sFlow vs. NetFlow/IPFIX etc.
Either protocol can be used to export both types of measurements; the
question is what types of measurement should be exported.

What do people think?

Peter

Nick_Hilliard3 · September 20, 2012, 6:10pm

Flows are good for measuring some things; raw packet sampling is good for
measuring others.

Decide on what you're trying to measure, then pick the best tool for the job.

Nick

Mikael_Abrahamsson · September 20, 2012, 6:21pm

What platforms actually do real unsampled netflow today, and do it well for multi-10gigabit worth of typical Internet traffic?

Most of the platforms I know of do sampled netflow at 1:100-1:1000 or so, and then I don't really see the fundamental difference in doing the flow analysis on the router itself (classic netflow) or doing the same but at the sFlow collector.

Benoit_Claise · September 21, 2012, 12:48pm

http://www.plixer.com/blog/netflow/netflow-vs-sflow-for-network-monitoring-and-security-the-final-say/

Regards, Benoit.