Open source Netflow analysis for monitoring AS-to-AS traffic

Brian_Knight · March 27, 2024, 12:04am

What’s presently the most commonly used open source toolset for monitoring AS-to-AS traffic?

I want to see with which ASes I am exchanging the most traffic across my transits and IX links. I want to look for opportunities to peer so I can better sell expansion of peering to upper management.

Our routers are mostly $VENDOR_C_XR so Netflow support is key.

In the past, I’ve used AS-Stats for this purpose. However, it is particularly CPU and disk IO intensive. Also, it has not been actively maintained since 2017.

InfluxDB wants to sell me on Telegraf + InfluxDB + Chronograf + Kapacitor, but I can’t find any clear guide on what hardware I would need for that, never mind how to set up the software. It does appear to have an open source option, however.

pmacct seems to be good at gathering Netflow, but doesn’t seem to analyze data. I don’t see any concise howto guides for setting this up for my purpose, however.

I’m aware Kentik does this very well, but I have no budget at the moment, my testing window is longer than the 30 day trial, and we are not prepared to share our Netflow data with a third party.

Elastiflow appears to have been open source at one time in the past, but no longer. Since it too appears to be hosted, I have the same objections as I do with Kentik above.

On-list and off-list replies are welcome.

Thanks,

-Brian

Andrew_Hoyos1 · March 27, 2024, 12:55am

Brian,

Take a peek at Akvorado - https://github.com/akvorado/akvorado
We recently set up a lab instance, and it seems to check the boxes below.

Pascal_Masha · March 27, 2024, 4:54am

Interested in responses to this as well. Something informative that I can also adopt for zero $$ would be amazing. In case you do get pointers off-list, kindly share; we can walk the journey together and compare notes.

jstitt · March 27, 2024, 12:46am

I’m using Akvorado for netflow and I’m pretty happy with it. Seeing it recommended more frequently on Reddit and elsewhere lately too.

akvorado/akvorado: Flow collector, enricher and visualizer
https://github.com/akvorado/akvorado

John Stitt

Marinos_Dimolianis · March 27, 2024, 8:09am

Brian,

I have used Akvorado in an environment with ~80G of traffic and I was super happy.

It can be easily set up via a docker-compose file, and among its key benefits is the user-friendly UI that allows you to gain insight into your network traffic.

There is also a demo instance available to find out what to expect: https://demo.akvorado.net/

My only “concern” was that it did not provide an API for consuming data externally.

Marinos

Joe_Loiacono1 · March 27, 2024, 6:07pm

Try FlowViewer http://flowviewer.net

Free, complete, graphical netflow analysis tool.

Developed for NASA. Runs on top of SiLK, a powerful open-source netflow capture and analysis tool developed by Carnegie-Mellon for DoD. Supports IPFIX, netflow v5, sflow, IPv6. Text reports, graphing and long-term tracking via graphs. Automatic storage control capability.

In general, as you probably know, it’s amazing what you can get from netflow.

Best,

Joe

Peter_Phaal · March 27, 2024, 6:58pm

Brian, you may want to see if your routers support sFlow (vendors have added the feature over the last few years).

In particular, see if it includes support for the sFlow extended_gateway structure:

/* Extended Gateway Data */
/* opaque = flow_data; enterprise = 0; format = 1003 */

struct extended_gateway {
   next_hop nexthop;           /* Address of the border router that should
                                  be used for the destination network */
   unsigned int as;            /* Autonomous system number of router */
   unsigned int src_as;        /* Autonomous system number of source */
   unsigned int src_peer_as;   /* Autonomous system number of source peer */
   as_path_type dst_as_path<>; /* Autonomous system path to the destination */
   unsigned int communities<>; /* Communities associated with this route */
   unsigned int localpref;     /* LocalPref associated with this route */
}

The dst_as_path field is particularly valuable since it allows you to see who your customers are peering with.

While not a complete solution, you might want to take a look at sflowtool, https://github.com/sflow/sflowtool, to decode the sFlow records and convert them to JSON. It’s not hard to write a Python script to calculate BGP peering metrics and push the results into a time series database (Prometheus, InfluxDB, etc) and build dashboards in Grafana. The following article gives a few examples:

https://blog.sflow.com/2018/12/sflow-to-json.html

Saku_Ytti1 · March 28, 2024, 6:03am

Why is this a solution, what does it solve for OP? Why is it
meaningful what the wire-format of the records is? I read OP's
question at a much higher level, about how to interact and reason
about data, rather than how to emit it.

Ultimately sFlow is a perfect subset of IPFIX, when you run IPFIX
without caching you get the functional equivalent of sFlow (there is
an IPFIX entity for emitting n bytes from frame as well as data).

Tore_Anderson1 · March 28, 2024, 10:02am

> What's presently the most commonly used open source toolset for monitoring AS-to-AS traffic?
>
> I want to see with which ASes I am exchanging the most traffic across my transits and IX links. I want to look for opportunities to peer so I can better sell expansion of peering to upper management.
>
> …
>
> pmacct seems to be good at gathering Netflow, but doesn't seem to analyze data. I don't see any concise howto guides for setting this up for my purpose, however.

pmacct will do what you want and it's not particularly difficult to set it up.

For example, you can aggregate data into a database using:

aggregate[in]: src_as,src_net,src_mask
aggregate[out]: dst_as,dst_net,dst_mask

Now you can issue SQL queries that tell you which ASes or prefixes you send/receive the most bits or packets to/from.
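
To make the "issue SQL queries" step a little more concrete, here is a minimal sketch that loads a few made-up rows into an in-memory SQLite table and ranks source ASes by bytes. The table name (acct_in) and the column names (as_src, net_src, mask_src, bytes, packets) are assumptions loosely modeled on pmacct's SQL schemas; check them against the schema your pmacct version and database backend actually create.

```python
import sqlite3

# Hypothetical stand-in for the table a pmacct SQL plugin would write to.
# Table and column names are assumptions, and the rows are made-up data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acct_in (as_src INTEGER, net_src TEXT, mask_src INTEGER,"
    " bytes INTEGER, packets INTEGER)"
)
conn.executemany(
    "INSERT INTO acct_in VALUES (?, ?, ?, ?, ?)",
    [
        (64500, "192.0.2.0", 24, 9_000_000, 7000),
        (64501, "198.51.100.0", 25, 4_000_000, 3500),
        (64500, "203.0.113.0", 24, 2_500_000, 2100),
        (64502, "198.51.100.128", 25, 500_000, 400),
    ],
)

def top_source_ases(db, limit=10):
    """Return (as_src, total_bytes, total_packets) rows, largest first."""
    query = (
        "SELECT as_src, SUM(bytes) AS total_bytes, SUM(packets) AS total_packets"
        " FROM acct_in GROUP BY as_src ORDER BY total_bytes DESC LIMIT ?"
    )
    return db.execute(query, (limit,)).fetchall()

for asn, total_bytes, total_packets in top_source_ases(conn):
    print(f"AS{asn}: {total_bytes} bytes, {total_packets} packets")
```

The same GROUP BY pattern would apply to the outbound table with the destination AS column, and a time filter would normally be added to restrict the ranking to a reporting window.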

Tore

Nick_Plunkett · March 27, 2024, 11:38pm

In the same vein, if you can get your devices exporting sFlow, or for others reading that do have sFlow capable devices: the sFlow-RT team has built ready to deploy, all in one docker containers using Grafana and Prometheus that you can stand up within minutes to start visualizing and easily querying/processing sFlow data from your routers, with no prior experience with the underlying software needed.

https://blog.sflow.com/2023/07/deploy-real-time-network-dashboards.html
https://github.com/sflow-rt/prometheus-grafana

Peter_Phaal · March 28, 2024, 3:49pm

I hope my comments were useful. I was trying to raise awareness that bgp as-path information is an option and might be helpful in addressing Brian’s requirements, “I want to see with which ASes I am exchanging the most traffic across my transits and IX links. I want to look for opportunities to peer so I can better sell expansion of peering to upper management.”

Possible reports that could be of interest are:

destination AS numbers by traffic volume and as-path length
destination AS numbers by traffic volume and second to last AS in path (AS of peering with destination).
traffic volume by transit AS
traffic volume passing through AS allow / deny ASN list.
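
A rough sketch of how the first three reports could be computed once each flow sample carries a destination AS path. The input records and their field names (dst_as_path, bytes, sampling_rate) are invented for this illustration; they are not the keys emitted by sflowtool or any particular collector, and all ASNs are from the documentation range.

```python
from collections import Counter

# Hypothetical, already-decoded flow samples. Field names and values are
# invented for this sketch and do not match any specific tool's output.
samples = [
    {"dst_as_path": [64496, 64510, 64520], "bytes": 1500, "sampling_rate": 1000},
    {"dst_as_path": [64497, 64520], "bytes": 900, "sampling_rate": 1000},
    {"dst_as_path": [64496, 64530], "bytes": 400, "sampling_rate": 1000},
    {"dst_as_path": [64496, 64510, 64520], "bytes": 1200, "sampling_rate": 1000},
]

def as_path_reports(records):
    """Aggregate estimated bytes by destination AS, its upstream, and transit AS."""
    by_dst_and_len = Counter()   # (destination AS, as-path length) -> bytes
    by_dst_upstream = Counter()  # (destination AS, second-to-last AS) -> bytes
    by_transit = Counter()       # any AS ahead of the destination -> bytes
    for rec in records:
        path = rec["dst_as_path"]
        if not path:
            continue  # no BGP information attached to this sample
        est_bytes = rec["bytes"] * rec["sampling_rate"]  # scale the sample up
        dst = path[-1]
        by_dst_and_len[(dst, len(path))] += est_bytes
        if len(path) >= 2:
            by_dst_upstream[(dst, path[-2])] += est_bytes
        for transit_as in set(path[:-1]):
            by_transit[transit_as] += est_bytes
    return by_dst_and_len, by_dst_upstream, by_transit

by_len, by_upstream, by_transit = as_path_reports(samples)
for (dst, length), est in by_len.most_common():
    print(f"AS{dst} (path length {length}): ~{est} bytes")
```

AS-path prepending would inflate the path length and can make the second-to-last AS equal to the destination; the sketch does not try to collapse prepends.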

What other types of report might be interesting?

sFlow was mentioned because I believe Brian’s routers support the feature and may well export the as-path data directly via sFlow (I am not aware that it is a feature widely supported in vendor NetFlow/IPFIX implementations?). However, some of the tools mentioned (pmacct, Kentik, Akvorado) can enrich flow data downstream (through BGP / BMP peering session with router) if it isn’t present in the sFlow/Netflow/IPFIX records, although downstream enrichment does add a level of operational complexity.

Saku_Ytti1 · March 28, 2024, 5:48pm

Hey,

> sFlow was mentioned because I believe Brian's routers support the feature and may well export the as-path data directly via sFlow (I am not aware that it is a feature widely supported in vendor NetFlow/IPFIX implementations?).

Exporting AS information is a wire-format-agnostic feature. Where it is
supported, it can equally be injected into sFlow, NetFlow v5
(src and dst only), NetFlow v9 and IPFIX. The cost is that you need to
program the information into the FIB entries, so that it
becomes available at look-up time for record creation.

In OP's case (IOS-XR) this means enabling 'attribute-download' for
BGP, and I believe IOS-XR will never download any other asn but src
and dst, therefore full information cannot be injected into any
emitted wire-format.

Tom_Beecher · March 28, 2024, 6:35pm

Yeah, cost to implement dst_as_path lookups far outweighs the usefulness IMO. If you really want that it’s much better to get it via BMP. ( Same with communities and localpref in the extended gateway definition of sflow. )

Fundamentally I’ve always disagreed with how sFlow aggregates flow data with network state data. IMO you collect the two things separately, and join them off-device should you need to for analysis.

Peter_Phaal · March 28, 2024, 6:35pm

The documentation for IOS-XR suggests that enabling extended-router in the sFlow configuration should export “Autonomous system path to the destination”, at least on the 8000 series routers:

https://www.cisco.com/c/en/us/td/docs/iosxr/cisco8000/netflow/command/reference/b-netflow-cr-cisco8k/m-sflow-commands.html

I couldn’t find a similar option in the NetFlow/IPFIX configuration guide, but I might have missed it.

Brian_Knight · March 29, 2024, 12:00am

Thanks to all who took the time to comment and make suggestions.

To summarize the private messages, one respondent suggested Argus as a collector. Another mentioned that they are still using AS-Stats.

I’m drawn to Akvorado. I like the self-contained nature of the application. NF collector, database, and modern web GUI are all bundled in one docker container. The full-featured demo is fantastic. That the app can enrich the Netflow data with BMP is an added bonus.

The best part is, the GUI has the report viz I need, and it is actually the default visualization in the demo. It also has the graph types that I didn’t know I needed, like the Sankey graph.

FlowViewer looks interesting as well. I suspect getting the reports right may take some time, given the amount of GUI filtering options.

pmacct and Argus seem to be capable tools that have been around for a long time, but I haven’t seen a concise stack building guide to get Netflow data into a good GUI using these. Looks like there are some older Docker images available for both. I could write my own SQL or roll my own stack, but I’d much rather spend my time on other things.

I appreciate the conversation around sFlow. I actually wasn’t aware that XR supported it. AS path probably doesn’t add a whole lot of value given that I’m focused on flows across our IP transit circuits. I’m able to determine my next AS hop simply by looking at the flow’s associated tuple of (flow exporter, interface). I can use other tools like RouteViews or RIPE’s RIS to determine the destination AS’s upstreams if needed. The rest of the path is probably not too helpful for determining peering opportunities.
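
The (flow exporter, interface) approach described above could be sketched roughly as follows: classify each flow by the link it left on, then rank destination ASes by the traffic that went out over transit. The inventory, field names, addresses, and ASNs are all made up for illustration and do not reflect any particular collector's schema.

```python
from collections import defaultdict

# Hypothetical inventory: (exporter address, interface index) -> link role and
# neighbor AS. Every name, address, and ASN here is made up for illustration.
LINKS = {
    ("192.0.2.1", 10): ("transit", 64496),
    ("192.0.2.1", 11): ("ix", None),  # IX port: many possible neighbors
    ("192.0.2.2", 20): ("transit", 64497),
}

# Hypothetical flow records, assumed to be already scaled for sampling.
flows = [
    {"exporter": "192.0.2.1", "out_if": 10, "dst_as": 64520, "bytes": 8_000_000},
    {"exporter": "192.0.2.2", "out_if": 20, "dst_as": 64520, "bytes": 3_000_000},
    {"exporter": "192.0.2.1", "out_if": 11, "dst_as": 64530, "bytes": 5_000_000},
    {"exporter": "192.0.2.1", "out_if": 10, "dst_as": 64540, "bytes": 1_000_000},
]

def transit_bytes_by_dst_as(records, links):
    """Sum bytes per destination AS for flows leaving via a transit link."""
    totals = defaultdict(int)
    for rec in records:
        role, _neighbor_as = links.get((rec["exporter"], rec["out_if"]), (None, None))
        if role == "transit":
            totals[rec["dst_as"]] += rec["bytes"]
    # Largest transit-carried destinations first: the likeliest peering candidates.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

for dst_as, total in transit_bytes_by_dst_as(flows, LINKS):
    print(f"AS{dst_as}: {total} bytes via transit")
```

Flows on interfaces missing from the inventory are skipped rather than guessed at; inbound traffic would need the same treatment keyed on the input interface and source AS.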

I think I’m going to get Akvorado running in my environment. If that doesn’t pan out, I’ll likely go back to AS-Stats.

Can those running Akvorado comment on their system specs? The only spec I’ve seen is a mention in this blog post: “Akvorado is performant enough to handle 100 000 flows per second with 64 GB of RAM and 24 vCPU. With 2 TB of disk, you should expect to keep data for a few years.”

Thanks again all,

-Brian

Nick_Hilliard3 · March 29, 2024, 12:15am

"can aggregate" rather than "aggregates" - this is implementation dependent and most implementations don't bother with it.

Overall, sflow has one major advantage over netflow/ipfix, namely that it's a stateless sampling mechanism. Once you have hardware that can reliably pick out one in N frames, the rest of the protocol is straightforward enough, which means that it's cheap to implement in hardware. If you're ok with 1. sampling and 2. the set of data that sflow provides, then sflow is great.
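
On the collector side, "being ok with sampling" mostly means scaling the samples back up and knowing roughly how far to trust the result. A minimal sketch, assuming uniform random 1-in-N sampling and using the commonly quoted sFlow rule of thumb that the 95% confidence error on a packet-count estimate is at most about 196 * sqrt(1/c) percent for c samples; the input values are made up.

```python
import math

def estimate_from_samples(sampled_frame_lengths, sampling_rate):
    """Scale 1-in-N packet samples up to estimated totals.

    Returns (estimated_packets, estimated_bytes, percent_error), where
    percent_error is the approximate 95% confidence bound 196 * sqrt(1/c)
    on the packet estimate for c samples.
    """
    c = len(sampled_frame_lengths)
    if c == 0:
        return 0, 0, float("inf")
    est_packets = c * sampling_rate
    est_bytes = sum(sampled_frame_lengths) * sampling_rate
    percent_error = 196.0 * math.sqrt(1.0 / c)
    return est_packets, est_bytes, percent_error

# Made-up example: 400 sampled frames of 1000 bytes each at 1-in-5000 sampling.
packets, octets, err = estimate_from_samples([1000] * 400, 5000)
print(f"~{packets} packets, ~{octets} bytes, packet estimate within about {err:.1f}%")
```

The error bound depends on the number of samples in the class being measured, not on the sampling rate itself, which is why small traffic classes need longer measurement intervals at the same rate.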

Netflow / ipfix, on the other hand, assumes that it's learning about flow state. For this, you need both a flow lookup mechanism and flow storage memory. Usually the flow lookup mechanism is implemented using the same technology as the packet forwarding lookup mechanism due to performance requirements, i.e. expensive. Similarly, the storage mechanism needs to be fast, which often precludes being large. Often both the lookup and storage mechanism are linked, e.g. tcam.

Obviously, not all netflow/ipfix implementations implement flow state, but most do; some implement stateless sampling ala sflow. Also many netflow implementations don't export mac address information, which limits usefulness in certain situations. But this is an implementation gap rather than a protocol weakness.

Tools should be chosen to fit the job. There are plenty of situations where sflow is ideal. There are others where netflow is preferable.

Nick

Saku_Ytti1 · March 29, 2024, 6:09am

Hope this clarifies.

------- Netflow Configuration Guide for Cisco ASR 9000 Series Routers, IOS XR Release 7.9.x - Configuring NetFlow [Cisco ASR 9000 Series Aggregation Services Routers] - Cisco
Use the record ipv4 [peer-as] command to record peer AS. Here, you
collect and export the peer AS numbers.
Note
Ensure that the bgp attribute-download command is configured. Else, no
AS is collected when the record ipv4 or record ipv4 peer-as command is
configured.

Saku_Ytti1 · March 29, 2024, 6:17am

This seems like a long-winded way of saying, sFlow is a perfect subset of IPFIX.

We will increasingly see IPFIX implementations omit state, because
states don't do anything anymore in high-volume networks, you will
only ever create flow in cache, then delay exporting the information
for some seconds, but the flow is never hit twice, therefore paying
massive cost for caching, without getting anything out of it. Anyone
who actually needs caching, will have to buy specialised devices, as
it will no longer be economical for peering-routers to offer such
memory bandwidth and cache sizes that caches will actually do
something.
In a particular network we tried 1:5000 and 1:500 and in both cases
flow records were 1 packet long, at which point we hit record export
policer limit, and couldn't determine at which sampling rate we will
start to see cache being useful.

I've wondered for a long time what a graph would look like where you
plot sampling ratio against percentage of flows observed. It will be
linear up to very high sampling ratios, but eventually it will start to
taper off; I just don't have any intuitive idea when. And I don't
think anyone really knows what ratio of flows they are observing in
sFlow/IPFIX. If you keep the sampling ratio static over a period of
time, say a decade, you will continuously reduce your resolution, seeing
a smaller percentage of flows. This worries me a lot, because a
statistician would say that you need this share of volume or this
share of flows if you want to use the data like this with this
confidence. Therefore, if we think about the problem formally, we should
constantly adjust our sampling ratios to fit our statistical model and
keep the same promises about data quality.
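
One way to start putting numbers on this is a back-of-the-envelope model rather than a measurement. Assuming each packet is sampled independently with probability 1/N (real samplers may be deterministic 1-in-N), a flow of k packets is seen at least once with probability 1 - (1 - 1/N)^k, and a flow cache only helps if the flow is sampled at least twice. The flow-size mix below is invented purely to show the shape of the calculation.

```python
def p_sampled_at_least(k_packets, sampling_rate, min_hits=1):
    """Probability that a flow of k packets yields at least min_hits samples
    (1 or 2), assuming each packet is sampled independently with p = 1/N."""
    p = 1.0 / sampling_rate
    p_zero = (1.0 - p) ** k_packets
    if min_hits == 1:
        return 1.0 - p_zero
    if min_hits == 2:
        p_one = k_packets * p * (1.0 - p) ** (k_packets - 1)
        return 1.0 - p_zero - p_one
    raise ValueError("min_hits must be 1 or 2")

# Made-up flow-size mix: (packets per flow, share of flows). Not measured data.
FLOW_MIX = [(5, 0.70), (100, 0.25), (10_000, 0.05)]

def fraction_of_flows_observed(sampling_rate, flow_mix=FLOW_MIX):
    """Expected share of flows that produce at least one sample."""
    return sum(share * p_sampled_at_least(k, sampling_rate) for k, share in flow_mix)

for n in (10, 100, 500, 5000):
    seen = fraction_of_flows_observed(n)
    twice = p_sampled_at_least(100, n, min_hits=2)
    print(f"1:{n}  flows observed: {seen:.3%}  100-packet flow sampled twice: {twice:.3%}")
```

Plugging in a flow-size distribution taken from real traffic, instead of the invented mix, would give the curve of sampling ratio against share of flows observed.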

Steven_Bakker · March 29, 2024, 6:07pm

Precisely. From my corner of the industry, my use case for flow data is extremely limited: I need (sampled) frame information: src-mac, dst-mac, qtag, ethernet protocol, framesize, sample rate. sFlow provides that in every sample, in a straightforward manner. (Never mind that the vendor we use does interesting things with the way they sample.)

IPFIX, by comparison, is a nightmare: to understand the data records, you need to have seen (and stored) the corresponding data template first. Those records will contain most of the information I need, except the sampling rate, which comes from an options data record… which you first have to match to an options template. Then, the sampling rate may not be present, but the sampling probability can be. Slightly different semantics. So that’s four types of records your collector may receive. There is also at least one vendor that believes it’s perfectly fine to export those over different transport sessions (read: different UDP source ports), which makes it really hard to do load balancing on the receiving side.
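
To illustrate the bookkeeping this implies for a collector, here is a minimal sketch of a template cache that parks data records until their template has been seen, plus a helper that normalizes "sampling rate" versus "sampling probability". It does no wire-format parsing; the class, the field names, and the option keys are hypothetical, and keying on the exporter address (rather than on the transport session) is only one possible design choice.

```python
from collections import defaultdict

class TemplateCache:
    """Hold IPFIX-style templates and park data records that arrive early."""

    def __init__(self):
        self.templates = {}               # key -> ordered list of field names
        self.pending = defaultdict(list)  # key -> value tuples awaiting a template

    def add_template(self, key, field_names):
        """Store a template and return any parked records it can now decode."""
        self.templates[key] = list(field_names)
        parked = self.pending.pop(key, [])
        return [dict(zip(self.templates[key], values)) for values in parked]

    def decode(self, key, values):
        """Decode one data record, or park it and return None if no template yet."""
        fields = self.templates.get(key)
        if fields is None:
            self.pending[key].append(values)
            return None
        return dict(zip(fields, values))

def effective_sampling_rate(options):
    """Derive a 1-in-N rate from whichever sampling field the exporter sent."""
    if "sampling_rate" in options:
        return float(options["sampling_rate"])
    if options.get("sampling_probability"):
        return 1.0 / float(options["sampling_probability"])
    return None  # unknown: the caller must not silently assume 1

# Key: (exporter address, observation domain ID, template ID); values are made up.
cache = TemplateCache()
key = ("192.0.2.1", 0, 256)
early = cache.decode(key, ("00:00:5e:00:53:01", "00:00:5e:00:53:02", 1500))
released = cache.add_template(key, ["src_mac", "dst_mac", "frame_size"])
print(early, released)
print(effective_sampling_rate({"sampling_probability": 0.001}))
```

A real collector would also bound the pending buffer and expire templates; both are left out to keep the sketch short.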

To top it off, both the sFlow and IPFIX specs are sufficiently vague about the meaning of the “frame size”, so vendors can implement whatever they want (include/exclude padding, include/exclude FCS). This implies that you shouldn’t trust these fields.

Ah, well.

– Steven

Peter_Phaal · March 29, 2024, 8:08pm

The sFlow frame_length field isn’t intended to be vague. If you are seeing non-conforming sFlow implementations, please raise the issue with the vendor so they can fix the issue.

Verifying that the frame_length and stripped fields are correctly implemented is one of the tests performed by the sFlow Test tool and running the tool can be helpful in persuading a vendor that they are out of compliance:

https://blog.sflow.com/2015/11/sflow-test.html

The following language is included in the sFlow Version 5 spec, https://sflow.org/sflow_version_5.txt.

/* Raw Packet Header */
/* opaque = flow_data; enterprise = 0; format = 1 */

struct sampled_header {
   header_protocol protocol;   /* Format of sampled header */
   unsigned int frame_length;  /* Original length of packet before
                                  sampling.
                                  Note: For a layer 2 header_protocol,
                                        length is total number of octets
                                        of data received on the network
                                        (excluding framing bits but
                                        including FCS octets).
                                        Hardware limitations may
                                        prevent an exact reporting
                                        of the underlying frame length,
                                        but an agent should attempt to
                                        be as accurate as possible. Any
                                        octets added to the frame_length
                                        to compensate for encapsulations
                                        removed by the underlying hardware
                                        must also be added to the stripped
                                        count. */
   unsigned int stripped;      /* The number of octets removed from
                                  the packet before extracting the
                                  header<> octets. Trailing encapsulation
                                  data corresponding to any leading
                                  encapsulations that were stripped must
                                  also be stripped. Trailing encapsulation
                                  data for the outermost protocol layer
                                  included in the sampled header must be
                                  stripped.

                                  In the case of a non-encapsulated 802.3
                                  packet stripped >= 4 since VLAN tag
                                  information might have been stripped off
                                  in addition to the FCS.

                                  Outer encapsulations that are ambiguous,
                                  or not one of the standard header_protocol
                                  must be stripped. */
   opaque header<>;            /* Header bytes */
}
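
As a small illustration of what the quoted definitions imply for a collector, the sketch below applies two sanity checks to a decoded raw-header sample and shows how to remove the FCS when comparing against counters that exclude it. It is not a reimplementation of the sFlow Test tool; the function names are hypothetical, and the second check is my reading of the definitions above rather than a rule stated in the spec.

```python
ETHERNET = 1    # header_protocol value for ETHERNET-ISO88023 in sFlow version 5
FCS_OCTETS = 4  # length of the Ethernet frame check sequence

def check_sampled_header(header_protocol, frame_length, stripped, header):
    """Return a list of consistency problems found in one raw-header sample."""
    problems = []
    # The spec text says stripped >= 4 for a non-encapsulated 802.3 packet.
    if header_protocol == ETHERNET and stripped < FCS_OCTETS:
        problems.append("stripped < 4 for an Ethernet frame (FCS not accounted for)")
    # Assumption: header octets plus stripped octets cannot exceed the original frame.
    if len(header) + stripped > frame_length:
        problems.append("header bytes plus stripped octets exceed frame_length")
    return problems

def frame_length_without_fcs(header_protocol, frame_length):
    """Frame size excluding the FCS, for comparison with counters that omit it."""
    if header_protocol == ETHERNET:
        return frame_length - FCS_OCTETS
    return frame_length

# Made-up samples: one plausible, one that reports no stripped octets at all.
print(check_sampled_header(ETHERNET, 1518, 4, bytes(128)))
print(check_sampled_header(ETHERNET, 64, 0, bytes(64)))
print(frame_length_without_fcs(ETHERNET, 1518))
```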