The other day we started using Cisco's NetFlow accounting software
together with the IP flow export feature of recent Cisco IOS versions.
What we found was that although we put a lot of protocols in the
nfknown.protocols file of the accounting software (everything we
could find in the /etc/services files of Solaris and Linux), there
is still a lot of traffic under TCP-Other and UDP-Other. This
indicates that traffic is going over our network using ports that
the software doesn't know about.
This could, for example, be RealAudio, CU-SeeMe, PointCast, BackWeb,
etc. traffic. Unfortunately, I don't have a list of these newer
protocols together with their port numbers. Has anyone compiled
such a list? There's the Assigned Numbers RFC, but the last version
of it is RFC 1700 of October 1994.
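To make the problem concrete, here is a rough Python sketch of
port-based classification. It is not the Cisco software's logic: it
assumes the port table is given as /etc/services-style lines, and the
function names, the sample table, and the "try the lower port first"
rule are all made up for illustration.

```python
from collections import Counter

# Illustrative sample in /etc/services syntax (not a complete list).
SERVICES_TEXT = """\
# name   port/proto   aliases
ftp-data 20/tcp
ftp      21/tcp
domain   53/udp
http     80/tcp       www
"""


def parse_services(text):
    """Build a {(proto, port): name} table from /etc/services-style lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            # Keep the first name seen for a given (proto, port).
            table.setdefault((proto.lower(), int(port)), fields[0])
    return table


def classify(table, proto, src_port, dst_port):
    """Return a service name, or e.g. 'TCP-Other' if neither port is known."""
    proto = proto.lower()
    for port in sorted((src_port, dst_port)):  # assumption: lower port first
        name = table.get((proto, port))
        if name:
            return name
    return proto.upper() + "-Other"


def bytes_by_class(table, flows):
    """Sum bytes per class for (proto, src_port, dst_port, nbytes) tuples."""
    totals = Counter()
    for proto, src_port, dst_port, nbytes in flows:
        totals[classify(table, proto, src_port, dst_port)] += nbytes
    return totals
```

Any flow whose two ports are both missing from the table ends up in
the "-Other" bucket, which is the effect described above.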
I'm using my own NetFlow collection program (Perl) and found that
there are a lot of random ports above 1023 that I don't think
correspond to IANA-assigned numbers.
When analyzing Netflow accounting data, we also found much traffic
with UDP and TCP port numbers that couldn't be attributed to specific
applications.
One important contributor to this is FTP transfers that don't use the
FTP-data port (TCP 20). I assume that this happens when a client uses
PASV (passive-mode FTP). This accounts for the majority of "unknown
TCP ports" traffic between SWITCH and the rest of the Internet.
Since we have our own software to process Netflow accounting packets,
I added the following heuristics to the program:
* When we see a TCP flow with unknown TCP port numbers, count it as
"unknown TCP" traffic for now, but make a note containing the IP
source and destination addresses, start/end time, and packet/byte
counts (well, we only count bytes, not packets).
* When we see a TCP flow where either port is 21 (FTP control), we
check whether we have notes about that particular pair of
source/destination IP addresses that correspond to the lifetime of the
FTP-control flow. If so, we assume that those "unknown TCP" flows
were actually FTP data transfers, and reclassify them as such.
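The two rules above can be sketched in Python roughly as follows.
This is an illustration only, not the actual program: the Flow
record, the Accounting class, the small table of known ports, the
"slack" tolerance, and the reading of "correspond to the lifetime" as
"start and end fall inside the control flow's lifetime" are all
assumptions.

```python
from collections import defaultdict, namedtuple

# Hypothetical flow record; times are in seconds.
Flow = namedtuple("Flow", "src dst sport dport start end nbytes")

FTP_CONTROL = 21
# Illustrative subset of known TCP ports.
KNOWN_TCP_PORTS = {20: "ftp-data", 25: "smtp", 80: "http"}


class Accounting:
    def __init__(self, slack=5):
        self.totals = defaultdict(int)   # class name -> bytes
        self.notes = defaultdict(list)   # address pair -> unknown flows
        self.slack = slack               # assumed timing tolerance, seconds

    def add_tcp_flow(self, flow):
        if FTP_CONTROL in (flow.sport, flow.dport):
            # Rule 2: an FTP control flow triggers a lookup of the notes.
            self.totals["ftp"] += flow.nbytes
            self._reclassify(flow)
            return
        name = KNOWN_TCP_PORTS.get(flow.sport) or KNOWN_TCP_PORTS.get(flow.dport)
        if name:
            self.totals[name] += flow.nbytes
            return
        # Rule 1: count as unknown for now, but remember the flow.
        self.totals["unknown TCP"] += flow.nbytes
        self.notes[frozenset((flow.src, flow.dst))].append(flow)

    def _reclassify(self, control):
        key = frozenset((control.src, control.dst))  # direction-independent
        remaining = []
        for note in self.notes.get(key, []):
            inside = (note.start >= control.start - self.slack
                      and note.end <= control.end + self.slack)
            if inside:
                # Move the bytes from "unknown TCP" to "ftp-data".
                self.totals["unknown TCP"] -= note.nbytes
                self.totals["ftp-data"] += note.nbytes
            else:
                remaining.append(note)
        if remaining:
            self.notes[key] = remaining
        else:
            self.notes.pop(key, None)
```

Keying the notes on the unordered address pair means the lookup works
whichever direction of the control connection is exported first.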
The cost of this consists of storing some data about the "TCP unknown"
flows for 30 minutes (somewhat more depending on the time slicing of
your traffic counting) and doing a table lookup whenever FTP control
flows are seen.
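The storage cost mentioned above can be bounded by expiring old notes.
A minimal Python sketch, assuming notes are kept per address pair and
dropped once their end time is more than 30 minutes in the past; the
NoteTable class and its method names are hypothetical.

```python
from collections import defaultdict

RETENTION_SECONDS = 30 * 60  # the 30 minutes mentioned in the text


class NoteTable:
    """Notes about "unknown TCP" flows, keyed by unordered address pair."""

    def __init__(self, retention=RETENTION_SECONDS):
        self.retention = retention
        self.by_pair = defaultdict(list)  # pair -> [(end_time, nbytes)]

    def remember(self, src, dst, end_time, nbytes):
        self.by_pair[frozenset((src, dst))].append((end_time, nbytes))

    def lookup(self, src, dst):
        """Table lookup done whenever an FTP control flow is seen."""
        return list(self.by_pair.get(frozenset((src, dst)), []))

    def expire(self, now):
        """Drop notes that ended before now - retention; return the count."""
        cutoff = now - self.retention
        dropped = 0
        for pair in list(self.by_pair):
            kept = [note for note in self.by_pair[pair] if note[0] >= cutoff]
            dropped += len(self.by_pair[pair]) - len(kept)
            if kept:
                self.by_pair[pair] = kept
            else:
                del self.by_pair[pair]
        return dropped

    def __len__(self):
        return sum(len(notes) for notes in self.by_pair.values())
```

Calling expire() once per accounting interval keeps the table limited
to roughly the last 30 minutes of unknown flows.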
We found that the number of such flows is sufficiently low, and the
amount of traffic they represent sufficiently high, for this to work
and be worth the effort.
An important win is that the remaining "unknown TCP" traffic can be
investigated much more efficiently once you get rid of this FTP
traffic.