Comparing an old flow snapshot with some packet size data

NANOG Folks;

[Cross posted to big-internet list in a separate message.]

I'm hoping to get some comment and perhaps some more cisco flow stats or
sniffer stats from participants on this list on the state of flows on the
WAN Internet.

I did a little traffic comparison to see what I could glean from comparing
Sean Doran's flow stats posted last January to big-internet with an
unpublished analysis of a snippet of FIX West data, collected by Kim Claffy
at NLANR and analyzed by Jerry Scharf of the CIX.

Back in January, Sean Doran and Dorian Kim posted some cisco IP flow stats
to the big-internet list. I haven't seen any since, but my big-internet mail
delivery seems spotty so I may have missed some messages. I'd be interested
in seeing some more flow stats, if anyone has been collecting more data.

Kim Claffy collected 15 minutes of traffic data from FIX West on 12 Feb 96
and Jerry Scharf analyzed the packet size distribution of that sample. I
used this data in a paper I recently finished on WAN protocol overhead.
Here's a portion of the packet size histogram from this data. Only packet
sizes that exceed 1% of the total traffic over this fifteen minute period
are listed, although Jerry's data contains counts of all the traffic that
Kim collected.

IP Payload (bytes)   Per cent of Packets
        40                 30.55%
        41                  1.51%
        44                  3.04%
        72                  4.10%
       185                  2.72%
       296                  1.48%
       552                 22.29%
       576                  3.59%
      1500                  1.51%

All other packet sizes are each less than 1% of the total, but as you can see
they add up to about 29% of the packets, scattered over the interval up to
1500 bytes. There were almost no packets larger than 1500 bytes. Jerry has a
perl script that does a "what if" calculation on what the WAN protocol
overhead would be if all this traffic were HDLC or FR or ATM, but so far he
hasn't published anything.
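
Jerry's script is not public, so what follows is only a rough sketch of what
such a "what if" calculation might look like for ATM, not his method. It
assumes AAL5 framing (an 8-byte trailer, padding out to whole 48-byte cell
payloads, 53-byte cells) with optional 8-byte LLC/SNAP encapsulation. The
function name and the packet sizes tried are illustrative.

```python
import math

CELL_PAYLOAD = 48   # bytes of AAL5 data carried per ATM cell
CELL_SIZE = 53      # bytes on the wire per ATM cell (5-byte cell header)
AAL5_TRAILER = 8    # bytes of AAL5 trailer appended to each packet
LLC_SNAP = 8        # bytes of LLC/SNAP header, if that encapsulation is used

def atm_wire_bytes(ip_bytes, llc_snap=False):
    """Bytes on an ATM link for one IP packet (hypothetical helper)."""
    pdu = ip_bytes + AAL5_TRAILER + (LLC_SNAP if llc_snap else 0)
    # The PDU is padded up to a whole number of cells.
    return math.ceil(pdu / CELL_PAYLOAD) * CELL_SIZE

for size in (40, 552, 576, 1500):
    wire = atm_wire_bytes(size)
    print(f"{size:5d} bytes -> {wire:5d} on the wire, "
          f"{100 * size / wire:.1f}% efficient")
```

Under these assumptions a 40-byte packet just fits in one cell without
LLC/SNAP and needs two cells with it, which is why the large share of 40-byte
packets matters so much for any ATM overhead estimate.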

The most interesting thing to me is that the most common traffic is probably
file transfer (whether HTTP or FTP), since the 552 bytes corresponds to a
TCP payload of 512 bytes, the largest power of two smaller than the IP
default MTU of 576. 30% of the packets carry a zero byte TCP payload (40
bytes is just the IP and TCP headers), corresponding to all the connection
setup and flow control traffic for all those file transfers going on.
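
The arithmetic behind that reading of the histogram can be written out as a
small sketch. It assumes 20-byte IP and TCP headers with no options, which is
an assumption and not something the sample itself records.

```python
IP_HEADER = 20     # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, assuming no TCP options
DEFAULT_MTU = 576  # IP default MTU cited above

def tcp_payload(ip_packet_bytes):
    """TCP payload left after stripping the assumed IP and TCP headers."""
    return ip_packet_bytes - IP_HEADER - TCP_HEADER

# Largest TCP payload that fits in a default-MTU packet.
max_payload = DEFAULT_MTU - IP_HEADER - TCP_HEADER
# Largest power of two that does not exceed max_payload.
largest_power_of_two = 1 << (max_payload.bit_length() - 1)

print(tcp_payload(40), tcp_payload(552), max_payload, largest_power_of_two)
```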

To recall what Sean originally posted in January:
This is from a fairly small-traffic router (,...

- --
IP Flow Switching Cache, 29999 active, 2769 inactive, 58411388 added
  1418487 lru, 22352334 timeout, 20923593 tcp fin, 2633568 invalidates
  5253815 dns, 5799592 resent syn, 0 counter wrap
  statistics cleared 141949 seconds ago

Protocol         Total    Flows  Packets  Bytes  Packets  Active(Sec)  Idle(Sec)
--------         Flows     /Sec    /Flow   /Pkt     /Sec        /Flow      /Flow
TCP-Telnet      267034      1.8      233     75    439.3        182.6       36.5
TCP-FTP        1030837      7.2       10     78     76.6         22.6       43.7
TCP-FTPD        554967      3.9      164    345    641.3         52.7       15.7
TCP-WWW       32107858    226.2       15    247   3610.6         13.5       28.1
TCP-SMTP       3526231     24.8       13    159    323.1         10.2       23.6
TCP-X             9600      0.0      121    129      8.2        148.2       55.1
TCP-BGP         111096      0.7       14     77     11.5        229.2       61.1
TCP-other      5729172     40.3       70    220   2858.1         71.0       41.3
UDP-TFTP          2398      0.0        3     62      0.0         13.4       69.5
UDP-DNS       12875077     90.7        2    110    195.4          5.4       43.6
UDP-other      1489072     10.4       30    293    321.8         28.5       68.7
ICMP            665771      4.6       13    259     62.8         75.5       66.8
IGMP              5144      0.0       18    278      0.6         82.4       64.3
IPINIP            4450      0.0      933    377     29.2        166.7       61.0
IP-other          2693      0.0       11    136      0.2         80.8       65.7
Total:        58381400    411.3       20    227   8579.4          0.0        0.0

I would say that these two different sets of statistics are roughly consistent.
(Note that neither one represents a lot of data. The FIX West data is only
over 15 minutes and Sean's covers about 39 hours, going by the 141949 seconds
since the statistics were cleared.)
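
As a quick sanity check on the flow cache output, the collection period and
the aggregate rates can be recomputed from the header and the Total row. This
is only a sketch using numbers copied from the output above; the variable
names are illustrative.

```python
seconds = 141949          # "statistics cleared 141949 seconds ago"
total_flows = 58381400    # Total row, Total Flows column
www_flows = 32107858      # TCP-WWW row, Total Flows column

hours = seconds / 3600
flows_per_sec = total_flows / seconds
www_share = 100 * www_flows / total_flows

print(f"{hours:.1f} hours, {flows_per_sec:.1f} flows/sec, "
      f"WWW is {www_share:.1f}% of flows")
```

The recomputed flow rate agrees with the 411.3 flows/sec in the Total row,
and WWW accounts for a little over half of all flows.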

Note the small number of packets per flow for WWW and FTP in Sean's data,
from 10 to 15 for each flow. I don't understand the 78 bytes/pkt for FTP,
[Robert Elz points out I'm looking at the FTP control channel. duh.] but the
WWW bytes/pkt of 247 [and the FTPD bytes/pkt of 345] are roughly consistent
with the packet distribution of 30% at 40 bytes and 22% at 552. If I average
40 and 552 I get 296, near to 247. It's rough, but sensible.
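
The plain average treats the two sizes as equally common. A sketch of the same
estimate weighted by the packet shares from the FIX West histogram follows. It
still ignores the other 47% of packets, so it is no more than a rough check.

```python
# Per cent of packets at each size, from the FIX West histogram.
share = {40: 30.55, 552: 22.29}

# Unweighted mean of the two sizes.
simple_mean = sum(share.keys()) / len(share)

# Mean weighted by how often each size appears.
weighted_mean = (sum(size * pct for size, pct in share.items())
                 / sum(share.values()))

print(f"simple {simple_mean:.0f}, weighted {weighted_mean:.0f}")
```

Weighting by share moves the estimate from 296 to about 256, closer to the
247 bytes/pkt reported for TCP-WWW.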

With all appropriate caveats about the limited sample size, the majority of
the TCP flows are WWW or FTP file transfers with a data payload of about 512
bytes (from the Claffy/Scharf data) and a total number of packets about 15
(from Sean's data). If I assume it takes 2 empty packets to open the
connection, 6 packets of data, 5 ACKs back, and 2 more empty packets to
close, that accounts for all 15 packets and gives a file size of 6*512 =
3072, or about 3100 bytes. [I could be off on those counts, but not by much.]
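
That packet budget can be written down as a short sketch. The split of the 15
packets is the assumption stated above, not something measured.

```python
# Assumed breakdown of one ~15 packet WWW/FTP flow.
packets = {"open": 2, "data": 6, "acks": 5, "close": 2}
PAYLOAD = 512   # bytes of TCP payload per data packet

total_packets = sum(packets.values())
file_bytes = packets["data"] * PAYLOAD

print(total_packets, "packets,", file_bytes, "bytes of file data")
```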

Therefore, the average or most common Web/FTP file size transferred is about
3000 bytes. Simon Spero's trace analysis of an HTTP page load (available at
the W3C web site) is remarkably similar.

All in all, these three data sources (Claffy/Scharf, Doran, Spero) seem
relatively consistent. The majority of the flows in the Internet seem to be
small file transfers. The TCP payload for this traffic is mostly <=512
bytes, when it could easily be <=1460 bytes. And slow start adds at least
one extra RTT to each transfer that might be avoided if the payloads were
1460 instead of 512.
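
To illustrate the slow start point, here is a deliberately idealized sketch.
It assumes the window starts at one segment and doubles every round trip, with
no loss, no delayed ACKs, and no receiver window limit, and it ignores
connection setup and teardown. Real stacks will differ.

```python
import math

def slow_start_round_trips(file_bytes, mss, initial_cwnd=1):
    """Round trips needed to send file_bytes under idealized slow start."""
    segments = math.ceil(file_bytes / mss)
    cwnd = initial_cwnd      # congestion window, in segments
    round_trips = 0
    while segments > 0:
        segments -= cwnd     # send a full window this round trip
        cwnd *= 2            # window doubles once the ACKs return
        round_trips += 1
    return round_trips

for mss in (512, 1460):
    print(mss, slow_start_round_trips(3072, mss))
```

For a 3072-byte file this model needs three round trips of data at 512 bytes
per segment and two at 1460, a saving of one RTT.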

Would there be any improvement if hosts used path MTU discovery, or would it
add up to about the same thing? I'm not sure whether you can do path MTU
discovery at the same time you are starting a TCP session or whether, as is
more likely, it is a separate process and uses an RTT or more before
starting the TCP session.

Now, is there more data to bolster or refute these conclusions? I've done
what I can with what I've found, but there just isn't much data to go on
anymore. But I think it is pretty consistent with the view that a lot of the
traffic is WWW TCP sessions of a few kilobytes. Would you agree? Would path
MTU discovery help, or could we all just informally set the Internet default
MTU up to 1500 bytes [as John Hawkinson suggested on big-internet] and
suffer a few fragmented slow speed links? Are most PPP MTUs set at the
default 1500 or not?


(Please note that as far as I know neither Kim nor Jerry has published
anything from this data, so don't bug them for information or hold them
responsible in any way for what I did with it.)