MRTG in Fourier Space

Maybe a slightly off topic math-geek kind of question to
take time out from the ARIN/death-of-IPv4/IPv6-evangalist
thread of the week.

Has anyone found any value in examining network utilization
numbers with Fourier analyses? After staring at pretty
MRTG graphs for a bit too long today, I'm wondering if
there are some interesting periodic characteristics in the
data that could be easily teased out beyond, "Well, the
diurnal fluctuations are obvious, but looks like we may
have some hourly traffic spikes in there too. And maybe
some of those are bigger every fourth hour."

A quick Google search turned up nothing at all. With many
EE-types who find their way into network operations and
wannabe-EEs already there, I found that maybe a little
surprising. I know the EEs love Fourier transforms.

Hi Crist,

Has anyone found any value in examining network utilization
numbers with Fourier analyses? After staring at pretty
MRTG graphs for a bit too long today, I'm wondering if
there are some interesting periodic characteristics in the
data that could be easily teased out beyond, "Well, the
diurnal fluctuations are obvious, but looks like we may
have some hourly traffic spikes in there too. And maybe
some of those are bigger every fourth hour."

A quick Google search turned up nothing at all.

Such techniques are used in the are of network anomaly detection.
For instance, a search for "network anomaly detection" at
scholar.google.com will yield very many results.

Our 2002 paper, "A Signal Analysis of Network Traffic Anomalies"
[ACM SIGCOMM Internet Measurement Workshop 2002, Barford, et al.],
is one such work. We mention that we use wavelet analysis rather
than Fourier analysis because wavelet/framelet analysis is able
to localize events both in the frequency and time domains, whereas
Fourier analysis would localize the events only in frequency, so an
iterative approach (with varying intervals of time) would be necessary.
In general, this is the reason why Fourier analysis has not been a
common technique used in network anomaly detection.

That work used data stored in RRD files at five minute intervals.
Our subsequent work used data stored at one second intervals, again
in RRD files.

Dave

Gents,

Hi Crist,

Has anyone found any value in examining network utilization
numbers with Fourier analyses? After staring at pretty

In short, yup!

there are some interesting periodic characteristics in the
data that could be easily teased out beyond, "Well, the

Indeed, there are. Interesting things emerge in frequency (or phase)
space - bits/sec, packets/sec, and ave size, etc. - all have new
meaning, often revealing subtle details otherwise missed. The UW paper
[Barford/Plonka et. al] is one of my favories and often referenced in
other publications.

Along similar lines, I presented a lightning talk at nanog that
demonstrates using windowed Ft's (mostly Gaussian or Hamming) in
three-axis graphs (i.e. 'waterfalls') available in common tools
(buadline, sigview, labview, etc) for characterizing round trip times
through various network queues and queue states. Unexpectedly,
interesting details regarding host IP stacks and OS scheduler behavior
became visible.

Find the talk slides and video here (look for 'kapela'):

http://www.nanog.org/meetings/nanog37/agenda.php

A quick Google search turned up nothing at all.

Signal analysis, sadly, isn't as fun as going shopping or posting to
webhosting talk, etc. so you won't likely find much there.

Such techniques are used in the are of network anomaly detection.
For instance, a search for "network anomaly detection" at
scholar.google.com will yield very many results.

I would also mention citeseer (http://citeseer.ist.psu.edu/) and ieee
explore (http://ieeexplore.ieee.org) - there's lots of related
application of Ft's and wavelet/fir filters in various disciplines,
all of which can apply to the analysis of time-series data.

is one such work. We mention that we use wavelet analysis rather
than Fourier analysis because wavelet/framelet analysis is able
to localize events both in the frequency and time domains, whereas
Fourier analysis would localize the events only in frequency, so an
iterative approach (with varying intervals of time) would be necessary.
In general, this is the reason why Fourier analysis has not been a
common technique used in network anomaly detection.

I want to suggest that time windowed Ft might be a reasonable middle
ground, certainly for Crist's case. Naturally, the trade-offs will be
in frequency accuracy (ie. longer window) vs. temporal accuracy (ie.
short window). Another solution for your needs might be cascaded FIR
"bandpass" filters, but again, you're subject to time/frequency error
trade-offs as related a filter's bandwidth.

While you're at it, consider processing your time series data into
histogram stacks, or nested histograms. I haven't specifically seen a
paper covering this, but another UW gent (DW, are you reading this?)
used to process their 30 second ifmib data into a raw .ps file, and
printed this out weekly/daily. The trends visible here were quite
interesting, but I don't think much further work was done to see if
anything super-interesting was more/less visible in this form than
traditional ones.

-Tk

Forgot to mention one point - since packets/bits/etc data is more
monotonic than not (math wizards, please debate/chime in) and since
it's not a 'signal' in the continuous sense, you might find value in
differentially filtering the input data *before* FT or wavelet
processing. This would serve to remove the weird-looking "DC" offset
in the output simply by creating a semi-even distribution of both
positive and negative input sample values.

-Tk

As IP traffic is assumed to be self-similar, my EE origins tell me to
look for parameters that could measure it from stochastic process
theory. On a Google search this paper sounded interesting:
http://www.sparc.uni-mb.si/OPNET/PDF/IWSSIP2007Fras.pdf
(...) We estimated
the Hurst parameter (H) for the arrival process, and the
fitted distributions for the measured data (packet size and
inter-arrival processes). Using the autocorrelation function of
the process, we determined long-range or short-range
dependence.
distribution and its parameters. The Hurst parameter was
estimated using three graphical methods (variance, R/S,
and periodogram methods). Distribution and its parameters
were estimated using fitting tools. (...)

Doing it in RRD-time seems like a challenge, though.

It might be easier to plot fractals from the data source if your
target audience is made of humans, because they will spot patterns
real fast with much less number crunching.

Rubens