Cisco 6509 SUP32 SNMP Meltdown With CatOS

Does anyone have experience with a Cisco 6509E/SUP32 crashing under heavy
SNMP polling load? We see high CPU utilization followed by a lockup that
requires rebooting the 6509. CatOS is deployed. Is the behavior any
different with IOS on the 6509?

David

You're being very coy about details here. I've not managed to actually
crash a 6500 running IOS with excessive SNMP polling, but the more
interesting question is: how on earth are you running so many SNMP queries
that this is happening?

E.g. a fully loaded 6509 with 384 ports would take ~3000 queries every
several minutes to perform full port diagnostic polling, and you'd want to
be doing this every couple of seconds to cause serious CPU impact. Are you
doing something like full DFZ or MAC table polling? Or IP accounting over
SNMP? If you are, there are probably better ways of achieving what you're
trying to do.
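
To put rough numbers on that estimate, here is a back-of-the-envelope sketch in Python. The figure of 8 polled objects per port is only an assumption inferred from ~3000 queries over 384 ports, and the 300-second interval stands in for "several minutes"; neither number comes from a real poller configuration.

```python
def queries_per_second(ports, objects_per_port, interval_s):
    """Average SNMP query rate if every object on every port is polled once per interval."""
    return ports * objects_per_port / interval_s

PORTS = 384             # fully loaded 6509, per the estimate above
OBJECTS_PER_PORT = 8    # assumption: ~3000 queries / 384 ports, rounded
NORMAL_INTERVAL_S = 300     # assumption: "several minutes" taken as 5 minutes
AGGRESSIVE_INTERVAL_S = 2   # "every couple of seconds"

total_queries = PORTS * OBJECTS_PER_PORT
normal_rate = queries_per_second(PORTS, OBJECTS_PER_PORT, NORMAL_INTERVAL_S)
aggressive_rate = queries_per_second(PORTS, OBJECTS_PER_PORT, AGGRESSIVE_INTERVAL_S)

print(f"queries per polling cycle: {total_queries}")
print(f"rate at {NORMAL_INTERVAL_S}s interval: {normal_rate:.2f}/s")
print(f"rate at {AGGRESSIVE_INTERVAL_S}s interval: {aggressive_rate:.0f}/s")
```

Under those assumptions, ordinary port polling averages around ten queries per second, and it takes a 2-second cycle to push the rate into the thousands, which is why large-table walks are the more likely culprit.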

Also, you may want to consider moving away from CatOS, as it's now
basically abandonware (or at least will formally be in Jan 2013), and
hasn't even seen maintenance updates in the last 4 years.

Nick

By any chance were you querying a Sup32 that had BGP full routes? That and other large tables can easily swamp the CPU on the Sup32.

This technote is based on IOS, and I don't know if the same facilities exist in CatOS, but as Nick mentioned: run, don't walk, and convert to IOS. CatOS is dead.

http://www.cisco.com/en/US/tech/tk648/tk362/technologies_tech_note09186a00800948e6.shtml

> E.g. a fully loaded 6509 with 384 ports would take ~3000 queries every
> several minutes to perform full port diagnostic polling, and you'd want to
> be doing this every couple of seconds to cause serious CPU impact. Are you
> doing something like full DFZ or MAC table polling?

I bet you're close toward the end there. My guess is he's carrying a
large BGP feed and querying the ipRouteTable. The caveat below is for
IOS 12.4(20)T but equivalent issues surely exist for CatOS:

http://www.cisco.com/en/US/docs/ios/12_4t/release/notes/124TCAVS3.html#wp2057950

The killer in this case is not the SNMP traffic or anything resulting
directly from it, but the CPU overhead from constantly re-sorting the
ipRouteTable since that's generated from the FIB when CEF is enabled.
Workaround is to disable CEF (heh) or configure a MIB view that excludes
the ipRouteTable. This one bites an OpenNMS support customer a few
times a year -- happened again just today, in fact, at a shop that just
enabled topology discovery.
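
For anyone unfamiliar with how a MIB view hides a subtree, here is a minimal Python sketch of the matching logic, not device configuration. It assumes a simplified model in which the longest matching subtree decides whether an OID is visible, and it ignores view masks. The names `in_view` and `CUTDOWN` are made up for illustration; the OID 1.3.6.1.2.1.4.21 is ipRouteTable in MIB-II (RFC 1213).

```python
IP_ROUTE_TABLE = "1.3.6.1.2.1.4.21"  # ipRouteTable in MIB-II (RFC 1213)

def parse(oid):
    """Turn a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))

def in_view(oid, view):
    """Return True if oid is visible in view.

    view is a list of (subtree_oid, included) pairs. The longest subtree
    that is a prefix of oid wins; an OID matching no subtree is hidden.
    """
    target = parse(oid)
    best_len, allowed = -1, False
    for subtree, included in view:
        prefix = parse(subtree)
        if target[:len(prefix)] == prefix and len(prefix) > best_len:
            best_len, allowed = len(prefix), included
    return allowed

# Hypothetical view: everything under iso (1) is visible except ipRouteTable.
CUTDOWN = [("1", True), (IP_ROUTE_TABLE, False)]

print(in_view("1.3.6.1.2.1.2.2.1.10.1", CUTDOWN))         # ifInOctets.1
print(in_view("1.3.6.1.2.1.4.21.1.1.10.0.0.0", CUTDOWN))  # an ipRouteTable row
```

With a view like this bound to the community the NMS uses, a walk simply steps over the excluded subtree, so the agent never has to build and sort the table.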

> Also, you may want to consider moving away from CatOS, as it's now
> basically abandonware (or at least will formally be in Jan 2013), and
> hasn't even seen maintenance updates in the last 4 years.

What you said :)

-jeff