Link capacity upgrade threshold

Hi All,

I just wanted to know: what is the link capacity upgrade threshold, in terms of %
of link utilization? Just to get an idea...

thanks,
Devang Patel

I consider a circuit to be nearing capacity at 80-85%. Depending on the circuit,
we start the process of increasing capacity around 70%. There are almost
always telco issues, in-building issues, not enough physical ports on the
provider end, and other such things that slow you down.

Justin

If your 95th percentile utilization is at 80% capacity, it's time to
start planning the upgrade. If your 95th percentile utilization is at
95% it's time to finish the upgrade.

If your average or median utilizations are at 80% of capacity, then as
often as not it's time for your boss to fire you and replace you with
someone who can do the job.

Slight variations apply depending on the resource. Use the absolute peak
instead of the 95th percentile for modem bank utilization -- under normal
circumstances a modem bank should never ring busy. And a gig-e can run
a little closer to the edge (percentage-wise) than a T1 can before folks
notice slowness.

Regards,
Bill Herrin
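
A minimal sketch of this rule of thumb in Python; the sample values, the 1 Gbit/s capacity, and the exact percentile convention are illustrative assumptions, not from the thread:

# Sketch: apply the 95th-percentile rule of thumb to a set of 5-minute
# utilization samples. All numbers below are made up.

def pct95(samples):
    """Return the sample below which ~95% of the data falls
    (the classic transit-billing convention: discard the top 5%)."""
    ordered = sorted(samples)
    idx = max(int(len(ordered) * 0.95) - 1, 0)
    return ordered[idx]

def upgrade_advice(samples_mbps, capacity_mbps):
    util = pct95(samples_mbps) / capacity_mbps
    if util >= 0.95:
        return util, "finish the upgrade"
    if util >= 0.80:
        return util, "start planning the upgrade"
    return util, "no action needed yet"

# Example: a fictitious 1 Gbit/s link.
samples = [450, 520, 610, 700, 760, 820, 830, 880, 910, 640]
util, advice = upgrade_advice(samples, 1000)
print(f"95th percentile is {util:.0%} of capacity: {advice}")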

I now see why people at the IETF spoke as though "core network congestion" were something natural.

If your MRTG graph is showing 95% load in a 5 minute average, you're most likely congesting/buffering at some point during that 5 minute interval. Whether this is acceptable in your network (it's not in mine) is up to you.

Also, a gig link on a Cisco carrying IMIX traffic will show approximately 93-94% of a gig in the values presented via SNMP (around 930-940 megabit/s as seen in "show int") before it's actually full, because of IFG, ethernet header overhead, etc.

So personally, I consider a gig link "in desperate need of upgrade" when it's showing around 850-880 megabits of traffic in MRTG.
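
A back-of-the-envelope check of that figure, assuming the interface counters include the Ethernet header and FCS but not the 8-byte preamble or the 12-byte inter-frame gap (the frame sizes are assumptions):

# Rough check of the "930-940 megabit on a full gig" figure.
# Assumption: counters include the Ethernet header and FCS but not the
# 8-byte preamble or the 12-byte inter-frame gap (20 bytes per frame).

LINE_RATE_BPS = 1_000_000_000      # 1 Gbit/s on the wire
UNCOUNTED_BYTES = 8 + 12           # preamble + inter-frame gap

def max_counted_rate(avg_frame_bytes):
    """Highest rate the counters can show for a given average frame size."""
    wire_bytes = avg_frame_bytes + UNCOUNTED_BYTES
    return LINE_RATE_BPS * avg_frame_bytes / wire_bytes

for frame in (64, 354, 1500):      # minimum, a simple-IMIX average, near-maximum
    print(f"{frame:>4}-byte frames: link full at ~{max_counted_rate(frame) / 1e6:.0f} Mbit/s in the counters")

With a roughly IMIX-sized average frame this lands in the same ballpark as the 93-94% figure above.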

If your 95th percentile utilization is at 80% capacity, it's time to
start planning the upgrade.

s/80/60/

the normal snmp and other averaging methods *really* miss the bursts.

randy

Definitely. For fun and giggles, I recently turned on 30 second polling on some kit and it turned up all sorts of interesting peculiarities that were completely blotted out in a 5 minute average.

In order to get a really good idea of what's going on at a microburst level, you would need to poll as often as it takes to fill the buffer of the port in question. This is not feasible in the general case, which is why we resort to hacks like QoS to make sure that when there is congestion, it is handled semi-sensibly.

There's a lot to the saying that QoS really means "Quantity of Service", because quality of service only ever becomes a problem if there is a shortfall in quantity.

Nick
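
A toy illustration of the point about averages, with made-up numbers: a near-line-rate microburst that is obvious at 30-second granularity almost disappears in the 5-minute average a typical MRTG graph would draw.

# Toy illustration: ten 30-second samples covering one 5-minute window.
# All numbers are made up.

samples_mbps = [100, 120, 110, 950, 940, 105, 115, 100, 110, 120]

five_min_avg = sum(samples_mbps) / len(samples_mbps)
print(f"30-second peak  : {max(samples_mbps)} Mbit/s")   # buffering territory on a gig
print(f"5-minute average: {five_min_avg:.0f} Mbit/s")    # looks comfortably underutilized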

Nick Hilliard wrote:

Definitely. For fun and giggles, I recently turned on 30 second polling on some kit and it turned up all sorts of interesting peculiarities that were completely blotted out in a 5 minute average.

Would RMON History and Alarms help? I've always considered rolling them out to some of my kit to catch microbursts.

Poggs
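
If you do go down the RMON alarm road, the thresholds are expressed as octet deltas per sampling interval rather than percentages. A small sketch of the conversion; the link speed, interval and percentages here are assumptions:

# Sketch: turn a utilization threshold into the delta-octet values an RMON
# alarm watching an interface octet counter would need.

LINK_BPS = 1_000_000_000   # 1 Gbit/s
INTERVAL_S = 30            # alarm sampling interval

def octets_per_interval(utilization):
    """Octets transferred in one sampling interval at the given utilization."""
    return int(LINK_BPS * utilization * INTERVAL_S / 8)

rising = octets_per_interval(0.80)    # alarm when a 30s window exceeds 80%
falling = octets_per_interval(0.60)   # re-arm once it drops back below 60%
print(f"rising-threshold : {rising} octets per {INTERVAL_S}s")
print(f"falling-threshold: {falling} octets per {INTERVAL_S}s")
# Caveat: 32-bit ifInOctets wraps in roughly 34 seconds at gig line rate,
# so the 64-bit ifHCInOctets counters (or shorter intervals) are needed here.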

If we're talking just about maximum capacity, I would agree with most of the statements that 80+% is the right range, though there's likely a very fine line before you actually start seeing a performance impact.

Operationally, at least in our network, I'd never run anything at that level. Provider links that are redundant for each other don't normally operate above 40-45%, in order to accommodate a failure. Other links that have a backup, but don't actively load share, normally run up to about 60-70% before being upgraded. By the time the upgrade is complete, they could be close to 80%.

What system were you using to monitor link usage?

Shane
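
A quick worked check of the 40-45% ceiling for a redundant pair, assuming two equal links sharing load:

# When one link of a redundant pair fails, the survivor carries both loads.

def survivor_utilization(per_link_util):
    """Utilization on the surviving link of a redundant pair after one fails."""
    return per_link_util * 2

for util in (0.40, 0.45, 0.50):
    print(f"{util:.0%} on each link -> {survivor_utilization(util):.0%} on the survivor after a failure")

At 45% each the surviving link lands at 90%; at 50% it is already full.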

yrtg

Nick

I've heard this said many times. I've also seen 'sho int' say 950,000,000 bits/sec without seeing packets get dropped. I was under the impression "show int" showed -every- byte leaving the interface. I could make an argument that IFG would not be included, but things like ethernet headers had better be.

Does this change between IOS revisions, or hardware, or is it old info, or ... what?

Actually, Cisco does count layer 2 header overhead in its SNMP and "show
int" results; it is Juniper who does not (for most platforms, at any rate)
due to their hardware architecture. I did some tests regarding this a while
back on j-nsp; you'll see different results for different platforms and
depending on whether you're looking at the tx or rx side. You'll also see
different results for VLAN overhead and the like, which can further
complicate things.

That said, "show int" is an epic disaster a significantly large
percentage of the time. I've seen more bugs and false readings on that
thing than I can possibly count, so you really shouldn't rely on it for
rate readings. The problem is especially bad on SVIs, where you might
see a reading that is 20% high or low of reality at any given second,
even on modern code. I'm not aware of any major issues detecting drops
though, so you should at least be able to detect them when they happen
(which isn't always at line rate). If you're on a 6500/7600 platform
running anything SXF+, try "show platform hardware capacity interface" to
look for interfaces with lots of drops globally.

Agreed. Internet traffic is very bursty. If you care about your customers' experience, upgrade at the 60-65% level, especially if an interface towards a customer is similar in bandwidth to your backbone links...

   Best Regards,
     Janos Mohacsi

Or some enterprising vendor could start recording utilisation stats?

regards,

do any router vendors provide something akin to hardware latches to keep
track of highest buffer fill levels? poll as frequently/infrequently as
you like...

Another approach to collecting buffer utilization is to infer such
utilization from other variables. Active measurement of round trip times
(RTT), packet loss, and jitter on a link-by-link basis is a reliable way
of inferring interface queuing which leads to packet loss. A link that
runs with good values on all 3 measures (low RTT, little or no packet
loss, low jitter with small inter-packet arrival variation) can be
deemed not a candidate for bandwidth upgrades. The key to active
measurement is random measurement of the links so as to catch the
bursts. The BRIX active measurement product (now owned by EXFO) is a
good active measurement tool which randomizes probe data so as to, over
time, collect a randomized sample of link behavior.
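
A minimal sketch of that idea in Python, independent of any particular product: probes are spaced with exponential (Poisson-style) gaps so periodic bursts aren't systematically missed, then RTT, loss and jitter are summarized. It shells out to the system "ping" (Linux-style flags); the target address and rates are placeholders.

import random
import re
import statistics
import subprocess
import time

TARGET = "192.0.2.1"    # placeholder far-end address
MEAN_GAP_S = 2.0        # average seconds between probes
PROBES = 30

rtts, lost = [], 0
for _ in range(PROBES):
    time.sleep(random.expovariate(1.0 / MEAN_GAP_S))   # randomized spacing
    out = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                         capture_output=True, text=True)
    match = re.search(r"time=([\d.]+)", out.stdout)
    if match:
        rtts.append(float(match.group(1)))
    else:
        lost += 1

if rtts:
    deltas = [abs(a - b) for a, b in zip(rtts, rtts[1:])]
    jitter = statistics.mean(deltas) if deltas else 0.0
    print(f"rtt avg {statistics.mean(rtts):.1f} ms, "
          f"jitter {jitter:.1f} ms, loss {lost}/{PROBES}")
else:
    print(f"all {PROBES} probes lost")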

do any router vendors provide something akin to hardware latches to keep
track of highest buffer fill levels? poll as frequently/infrequently as
you like...

Without getting into each permutation of a device's architecture, aren't buffer fills really just buffer drops? There are means to determine this. Lots of vendors also have configurable buffer pools for inter-device traffic that record high-water marks.

Deepak Jain
AiNET

Holmes,David A wrote:

runs with good values on all 3 measures (low RTT, little or no packet
loss, low jitter with small inter-packet arrival variation) can be
deemed not a candidate for bandwidth upgrades. The key to active

Sounds great, unless you don't own the router on the other side of the link, or it's subject to icmp filtering, has a loaded RE, etc. If you pass the traffic through the routers to a reliable server, you'll be monitoring multiple links/routers and not just a single one.

Jack