Much of the advice I'm about to offer depends heavily on scale. If you've
got a small network, you'd be fine with Cacti and similar OSS toolkits.
If you're running more than a couple of devices, MRTG will always look
attractive as long as you're not responsible for administering it.
Please, don't try to scale it. There are lots of decent OSS/OTS toolkits
in the wild; take the time to find one you're comfortable with and you'll
save yourself some pain.
If you don't have anyone in-house with the cluepower to install, maintain,
and understand one of these, consider buying, with support. I've been
spending a lot of time elbow-deep in the Monolith platform. Even though I
can build all my own tools, finding a well-organized, full-featured
platform that I can hack to hell and back has been a pretty big boon.
If you're going to roll your own:
Even if only on a contract basis, I'd recommend tapping the skillset of a
good DBA or a professional network toolsmith for advice on organizing the
sheer volume of data involved here. Getting off on a good footing is the
single most important part of a task like this; it minimizes how much time
you spend going back and redoing things because they didn't scale or
simply don't apply generally enough.
The hard part in building management tools from scratch is coming up with
a good schema for standardizing how you organize your data. A schema that
seems to make great sense will be completely obliterated when it first
encounters SNMP-based conventions, and heaven help you if you standardize
on the lingo of a single vendor. Take Cisco vs *, for example. The Cisco
standard way of describing things is decent enough, and that's fine until
you decide to bring another vendor into the relationship.
Another reason I support a good standard is having multiple hands in the
cookpot. If you have a team of people working on tools, versus one
dedicated SNMP ninja, you can't have multiple designers. It just doesn't
work. This is an area of networking that needs sunlight at all times, to
keep evil things from growing in the code. Ugly hacks and stupid shell
scripts are all well and good, if they're in your personal toolbox bin
directory. You don't want them in your enterprise/production grade tools.
There's no telling what will happen three years down the road.
Organization of network data is usually pretty simple: Devices contain
interfaces and sessions, interfaces contain counters, states, and
descriptions.
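To make that hierarchy concrete, here's a rough sketch in Python. Every class and field name below is made up for illustration, and "sessions" is assumed to mean things like routing-protocol peerings; adjust to whatever your own schema calls them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Interface:
    name: str                    # vendor-neutral label (hypothetical convention)
    description: str = ""        # changes rarely, so poll it rarely
    oper_state: str = "unknown"  # e.g. "up" or "down"
    counters: Dict[str, int] = field(default_factory=dict)  # in/out/error


@dataclass
class Session:
    peer: str                    # assumed: address of the remote end
    state: str = "unknown"


@dataclass
class Device:
    hostname: str
    interfaces: Dict[str, Interface] = field(default_factory=dict)
    sessions: Dict[str, Session] = field(default_factory=dict)

    def add_interface(self, iface: Interface) -> None:
        # Key interfaces by their normalized name so lookups stay uniform.
        self.interfaces[iface.name] = iface


# Example data only; the hostname and addresses are placeholders.
router = Device(hostname="edge1.example.net")
router.add_interface(Interface(
    name="eth0",
    description="uplink",
    oper_state="up",
    counters={"in_octets": 0, "out_octets": 0, "in_errors": 0},
))
router.sessions["192.0.2.1"] = Session(peer="192.0.2.1", state="established")
```

The point of keeping the names vendor-neutral is that the same three classes should hold data from any box you poll, with the vendor-specific translation living in the poller and not in the schema.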
Smart pollers won't bother with counters on down/shut interfaces, and
will only check descriptions/labels every so often. Your only frequent
polls should be in/out/error counters. Use 64-bit counters, and account
for rollover even though it's less of an issue at 64 bits.
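A minimal sketch of both ideas, in Python. The function names, the state strings, and the one-hour description interval are all assumptions for illustration. The delta math assumes the counter wrapped at most once between polls; a counter that was reset (say, by a reboot) will look the same as a wrap, so a real poller would want a sanity check on top of this.

```python
from typing import List

COUNTER64_MODULUS = 2 ** 64  # e.g. IF-MIB's ifHCInOctets is a 64-bit counter


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER64_MODULUS) -> int:
    """Return how far a wrapping counter advanced between two polls.

    Modular subtraction handles a single rollover: if current is smaller
    than previous, the result is the distance travelled through zero.
    """
    return (current - previous) % modulus


def plan_poll(oper_state: str, seconds_since_description_check: float,
              description_interval: float = 3600.0) -> List[str]:
    """Decide what to fetch for one interface on this polling cycle."""
    tasks = []
    if oper_state == "up":
        # Counters on a down/shut interface aren't worth the round trip.
        tasks.append("counters")
    if seconds_since_description_check >= description_interval:
        # Labels change rarely, so only refresh them occasionally.
        tasks.append("description")
    return tasks
```

For example, `counter_delta(2**64 - 10, 5)` gives 15, and passing `modulus=2**32` covers devices that only expose 32-bit counters.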
If you're a hosting company or have high traffic on servers, the ARP cache
and the bridge tables are your friends: between them they tell you which
MAC address and switch port an IP address lives on.
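One way to put those two tables to work, sketched in Python: join the ARP cache (IP to MAC) against a bridge table (MAC to port) to find where a host is plugged in. The function name and the dict layouts are assumptions; how you actually collect the tables is up to your poller, and both are assumed to use the same MAC formatting.

```python
from typing import Dict, Optional


def locate_host(ip: str,
                arp_cache: Dict[str, str],
                bridge_table: Dict[str, str]) -> Optional[str]:
    """Return the switch port an IP was last seen on, or None if unknown."""
    mac = arp_cache.get(ip)
    if mac is None:
        return None  # no ARP entry, so nothing to look up in the bridge table
    return bridge_table.get(mac)


# Placeholder data: documentation-range IPs and made-up MACs and ports.
arp_cache = {
    "192.0.2.10": "00:00:5e:00:53:01",
    "192.0.2.11": "00:00:5e:00:53:02",
}
bridge_table = {
    "00:00:5e:00:53:01": "sw1:port12",
}

port = locate_host("192.0.2.10", arp_cache, bridge_table)
```

A host with an ARP entry but no bridge entry comes back as None, which is itself a useful signal that the MAC has aged out or lives behind a different switch.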
The single most important piece of advice I can offer when building your
own tools: Never poll the routing table with SNMP. Ever. Any OTS tool that
says it can, as a feature, well, it's a witch, burn it. (If you need
routes, build something that can speak BGP. It's not that hard; the last
time I did it, it was maybe 50 lines of code plus Perl modules.)
- billn