Tail Drops and TCP Slow Start

If I have a DS3 or OC3 handling mounds and mounds of FTP download traffic,
what is the easiest way to detect if the bandwidth in use is falling into
a classic Tail Drop pattern? According to a Cisco book I am reading, the
bandwidth utilization should graph in a "sawtooth" pattern of gradual
increases in accordance with multiple machines gradually increasing
via TCP slow start and then sharp drops. Will this only happen when
the utilization approaches 100%? (Maybe a dumb question.)

Should I be able to do a show buffers and see misses or is there some
better way to detect other than via graphing?

Also, suppose that in examining my ftp traffic patterns I notice it consistently
spikes at 15 minutes after the top of the hour.
Could I create a timed access list to only kick in at that time?
Anyone have experience with WRED to handle ftp congestion?

I usually take these types of questions to Cisco but I thought I'd post
it to this list to get any generic real world advice.

sh buff
Buffer elements:
     499 in free list (500 max allowed)
     5713661 hits, 0 misses, 0 created

Public buffer pools:
Small buffers, 104 bytes (total 600, permanent 600):
     580 in free list (20 min, 1250 max allowed)
     2225528470 hits, 6 misses, 18 trims, 18 created
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 450, permanent 450):
     448 in free list (10 min, 1000 max allowed)
     68259213 hits, 7 misses, 21 trims, 21 created
     0 failures (0 no memory)
Big buffers, 1524 bytes (total 450, permanent 450):
     449 in free list (5 min, 1500 max allowed)
     6807747 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 50, permanent 50):
     50 in free list (0 min, 1500 max allowed)
     46167681 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Large buffers, 5024 bytes (total 50, permanent 50):
     50 in free list (0 min, 150 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Huge buffers, 18024 bytes (total 5, permanent 5):
     5 in free list (0 min, 65 max allowed)
     34 hits, 6 misses, 12 trims, 12 created
     0 failures (0 no memory)

Interface buffer pools:
IPC buffers, 4096 bytes (total 768, permanent 768):
     768 in free list (256 min, 2560 max allowed)
     769236774 hits, 0 fallbacks, 0 trims, 0 created
     0 failures (0 no memory)

Header pools:

"Murphy, Brennan" wrote:

what is the easiest way to detect if the bandwidth in use is falling into
a classic Tail Drop pattern? According to a Cisco book I am reading, the
bandwidth utilization should graph in a "sawtooth" pattern of gradual
increases in accordance with multiple machines gradually increasing

It may be difficult to tell at any one router. Depending on where the
endpoints are, and I'm assuming they are scattered around the net, some
connections may lose packets at different times and places on the net.
If that one OC3 or DS3 link were the only thing that mattered, perhaps
it would be easier to tell.

via TCP slow start and then sharp drops. Will this only happen when
the utilization approaches 100%? (Maybe a dumb question.)

Again I think it depends, but I would venture to say: probably.

Should I be able to do a show buffers and see misses or is there some
better way to detect other than via graphing?

I haven't studied those statistics on a Cisco, but they may tell you
something, but I would suspect it would be difficult to discern what
you're looking for based on them alone. Another parameter to monitor is
packet drops.

Also, suppose that in examining my ftp traffic patterns I notice it consistently
spikes at 15 minutes after the top of the hour.
Could I create a timed access list to only kick in at that time?

I guess you could, but that seems like a very short-sighted and narrow-minded
approach to managing your capacity.
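
For what it's worth, if you did want to experiment with that, a time-based
ACL would look roughly like the sketch below. This is only an illustration:
the time-range name, ACL number, times and interface are made up, it needs
an IOS release with time-range support, and it only matches the FTP control
channel (tcp/21).

    time-range FTP-SPIKE
     periodic daily 10:15 to 10:30
    !
    access-list 150 deny   tcp any any eq ftp time-range FTP-SPIKE
    access-list 150 permit ip any any
    !
    interface Serial1/0
     ip access-group 150 in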

I usually take these types of questions to Cisco but I thought I'd post
it to this list to get any generic real world advice.

Based on your 'show buffers' output, Cisco may recommend some tuned
buffer settings for you.

John

If I have a DS3 or OC3 handling mounds and mounds of FTP download traffic,
what is the easiest way to detect if the bandwidth in use is falling into
a classic Tail Drop pattern? According to a Cisco book I am reading, the
bandwidth utilization should graph in a "sawtooth" pattern of gradual
increases in accordance with multiple machines gradually increasing
via TCP slow start and then sharp drops. Will this only happen when
the utilization approaches 100%? (Maybe a dumb question.)

It could be either/or. If the link is oversubscribed you may see what
you are describing via the 'bits/sec' counter in 'sh int'. Turn the timers
down via 'load-interval' to get a more granular timeframe. This link
could be at 50% utilization while the upstream link feeding it is running at
maximum capacity, so the 50% you see locally would still show the same behavior.
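
For example (the interface name is just a placeholder; 30 seconds is the
lowest load-interval IOS will accept):

    interface Serial1/0
     load-interval 30
    !
    ! then watch the "30 second input rate" / "30 second output rate"
    ! lines (bits/sec and packets/sec) in:
    show interfaces Serial1/0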

Should I be able to do a show buffers and see misses or is there some
better way to detect other than via graphing?

'sh buffers' really isn't what you want to look at. The 'bits/sec' counter
is more in line with the throughput on the interface. Turn the load interval
down for better granularity. If you are seeing buffer misses there are usually
other issues going on like very bursty traffic or other resource contention.
Typically buffer misses are seen more on LAN segments and I don't usually
recommend changing the defaults because most of the time there is some other
underlying issue that tuning the buffers is hacking around.

Also, suppose that in examining my ftp traffic patterns I notice it consistently
spikes at 15 minutes after the top of the hour.
Could I create a timed access list to only kick in at that time?
Anyone have experience with WRED to handle ftp congestion?

It's more of a dynamic thing than that. WRED will smooth out the curve for
you if the link you are working on is the source of the problem. What were
you suggesting to do with the ACL anyway if it did kick in?

Say, for example, you see the rate on a DS3 vary from 30M to 45M in a sawtooth
manner. After applying WRED, the highs and lows of that sawtooth should shrink,
and monitoring the throughput on the interface should show it staying
consistently closer to line rate for that circuit.
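
A minimal sketch, assuming a plain HSSI/serial interface for the DS3 (the
name is made up) and the default WRED thresholds:

    interface Hssi1/0
     random-detect
    !
    ! verify what got applied with:
    show queueing random-detect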

I usually take these types of questions to Cisco but I thought I'd post
it to this list to get any generic real world advice.

This comes from lab testing and real world experience.

hth,
rodney

"Murphy, Brennan" <Brennan_Murphy@NAI.com> writes:

   If I have a DS3 or OC3 handling mounds and mounds of FTP download
traffic, what is the easiest way to detect if the bandwidth in use is
falling into a classic Tail Drop pattern? According to a Cisco book
I am reading, the bandwidth utilization should graph in a "sawtooth"
pattern of gradual increases in accordance with multiple machines
gradually increasing via TCP slow start and then sharp drops. Will
this only happen when the utilization approaches 100%? (Maybe a dumb
question.)

    My rule #1 of troubleshooting performance problems is that most of
  the effect that you are seeing is due to a single problem somewhere
  along the path. Fixing that 1 item will yield a huge boost in
  performance. Fixing the rest of the problems will yield smaller,
  incremental boosts - although possibly still substantial.

    If the sawtooth in the graph represents a single session going into
  error recovery (i.e. slow-start), then you *might* see this without
  necessarily seeing overwhelming evidence of it on the Cisco. If the
  sawtooth represents multiple sessions entering slow start nearly
  simultaneously (aka global synchronization), then it would be much
  easier to capture evidence of this via "show interface" stats.

    The bursty nature of data traffic is such that you can experience
  temporary congestion events that are "smoothed over" by queueing on
  the outbound interface. If these events are brief in duration and
  separated sufficiently in time, you will not necessarily see any
  indication of it via the "show interfaces" output. The shorter you
  have set your load-interval, the more likely you are to see the
  bandwidth increase represented by the burst (although 30 seconds is
  long enough to hide bursts on lightly loaded links).

    Note: it is difficult to get instantaneous measures of queueing
  reliably from a Cisco. Best is to check the output of "show
  controller cbus" (on 75xx series) which will show you the hardware
  details: look for the txacc and txlimit values. txlimit is the
  maximum # of items allowed in the transmit ring for that interface,
  while txacc == (txlimit - # items in the queue). This command needs
  to be issued repeatedly (and quickly) to get some sense of the queue
  size. This is *slightly* better than looking at the output of show
  interface, but still leaves a lot to be desired. :-)
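
    As a made-up worked example: if txlimit on an interface is 40 and a
  given sample shows txacc of 12, then 40 - 12 = 28 packets are sitting in
  the transmit ring at that instant; a sample where txacc has dropped to 0
  means the ring is full and anything further queued to that interface is
  being tail dropped.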

    Once your tx queue fills up completely (txacc == 0), all packets
  scheduled for transmission on that interface are dropped. As your
  bandwidth utilization approaches 100%, you will see more and more
  queueing take place, meaning that the chances of the next incoming
  packet being dropped increase significantly as the load goes up. I
  doubt that you would see global synchronization until the load on your
  link was very near to 100%, but I haven't done the traffic studies to
  prove it. :-)

    RED attempts to prevent this situation by pro-actively dropping
  packets from the tx queue before it fills up. Using an
  exponentially-weighted average (to smooth out burstiness) of queue
  size to determine how likely it is that the current packet will be
  dropped, RED will tend to hit the "high bandwidth" users first,
  leaving the smaller users relatively unharmed. If RED does its job
  correctly, then you will not see global synchronization, although a
  graph of the throughput of a single FTP session that happened to be
  policed by RED would demonstrate the sawtooth.
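
    On the IOS side those knobs are exposed roughly as in the sketch below.
  The interface name is made up and the numbers are placeholders in the
  neighborhood of the defaults, not recommendations:

    interface Hssi1/0
     random-detect
     ! weight for the exponentially-weighted average queue size (2^-n)
     random-detect exponential-weighting-constant 9
     ! per precedence: min-threshold max-threshold mark-prob-denominator
     random-detect precedence 0 20 40 10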

   Should I be able to do a show buffers and see misses or is there
some better way to detect other than via graphing?

    You are very unlikely to see this via "show buffers", as this is not
  likely to be caused by your device running out of memory if your cards
  are sized correctly. The only way to tell is by looking at the
  instantaneous measures of queue size if you are looking for a single
  session performance drop (in the face of near constant high background
  utilization). If you are seeing global synchronization, then you
  should see a *big* dip in your usage via "show interface" when set to
  30 second load-interval.
  

   Anyone have experience with WRED to handle ftp congestion?

    RED is specifically designed to deal with this problem. WRED and
  dWRED do a decent job, but nothing can help you if you simply have
  more aggregate demand for bandwidth than your interface can support.
  And neither WRED nor dWRED will help you with UDP applications or DoS attacks.

-jon