monitoring tools

It has been a while since I have had to seriously think about network/system/application monitoring and now I have got to look at it. Can anyone point me towards:

1. Serious documents on monitoring (i.e. not vendor whitepapers)
2. Open Source Tools that you use or would recommend (I know the obvious smokeping, mrtg, nagios).

Thanks,

---> Phil

Nesser, Phil (nesser) writes:

> It has been a while since I have had to seriously think about network/system/application monitoring and now I have got to look at it. Can anyone point me towards:
>
> 1. Serious documents on monitoring (i.e. not vendor whitepapers)

  Hi Phil,

  There are lots of different papers out there -- define serious.
  Is an online column comparing monitoring systems serious enough?
  What focus? Best practices? Agent vs. SNMP based, etc. Topics
  are varied.

> 2. Open Source Tools that you use or would recommend (I know the obvious smokeping, mrtg, nagios).

  That can be a long thread as well...
  Nagios, OpenNMS, Zabbix, Hyperic, Zenoss - for the application/
  service/server/network monitoring, and Cacti, Smokeping, NfSen
  for capacity/availability monitoring.

  We used Nagios and co. until a few years ago, when we figured it
  wouldn't scale for large networks. Then we wrote our own :-)

  Cheers,
  Phil

Stephen Stuart and Joe Abley did a tutorial at NANOG26 called "Managing IP Networks with Free Software". It covers more than just monitoring, and is great if you aren't going to just roll your own...

The PDF is here: http://www.nanog.org/mtg-0210/ppt/stephen.pdf

I cannot remember if they mentioned it, but in case they didn't, you should also include NAV (Network Administration Visualized) -- http://metanav.uninett.no/

W

> It has been a while since I have had to seriously think about network/system/application monitoring and now I have got to look at it. Can anyone point me towards:
>
> 1. Serious documents on monitoring (i.e. not vendor whitepapers)

I think there have been several sets of slides presented at previous NANOG meetings that may be of interest, but I'll have to locate specific URLs.

> 2. Open Source Tools that you use or would recommend (I know the obvious smokeping, mrtg, nagios).

As much as I hate to give a wishy-washy answer like "it depends", in this case, that's a reasonable start. What tools you use would depend on many factors, such as:

* hardware and OS platforms that are realistic for your organization
   Put another way, if your IT or net mgmt organization standardizes
   on some flavor of Windows as part of a regular server build, it
   might not make sense to use tools that require Linux, *BSD, etc.,
   unless you have the people and processes to handle that. Since
   you mentioned tools like nagios and MRTG, I'm assuming you're
   working in the Unix/Linux/*BSD world, but you know what they say
   about assumptions :-)
* goals and metrics
   What information do you want to get out of your monitoring setup?
   Do you need to produce regular reports from your NM tools?
   Do they need to integrate with tools you already use?
   Do you want the tools to automatically trigger certain actions?
     if X consecutive pings to $router_ip fail, send out a
       page, email the NOC, etc...
   What data do you want to collect from your network devices?
     SNMP traps? Netflow records? Syslog messages? RMON?
   Do you need to visualize the data, i.e. generate usage graphs,
     top-talker scoreboards, etc?
   Do you need to store the output in a central SQL database so
     other apps can work with it, do reports, etc?

This is by no means an all-inclusive list, but I think it covers some of the important points.
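
To make the "X consecutive pings fail" idea concrete, here is a rough Python sketch of the counting logic only. It assumes some other piece of code does the actual probing and hands in a pass/fail result; the class name, the `notify` callback, and the 192.0.2.1 address (a documentation-range IP) are all made up for illustration and are not taken from any of the tools mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ConsecutiveFailureAlerter:
    """Fire one alert when a host fails `threshold` probes in a row."""
    threshold: int
    notify: Callable[[str], None]  # hypothetical hook: pager, email, etc.
    _failures: Dict[str, int] = field(default_factory=dict)
    _alerted: Set[str] = field(default_factory=set)

    def record(self, host: str, ok: bool) -> None:
        if ok:
            # A single success resets the streak and re-arms the alert.
            self._failures[host] = 0
            self._alerted.discard(host)
            return
        self._failures[host] = self._failures.get(host, 0) + 1
        # Alert once per outage, not on every failed probe after the threshold.
        if self._failures[host] >= self.threshold and host not in self._alerted:
            self._alerted.add(host)
            self.notify(f"{host}: {self._failures[host]} consecutive ping failures")


# Usage: feed in probe results as they arrive.
alerts: List[str] = []
alerter = ConsecutiveFailureAlerter(threshold=3, notify=alerts.append)
for ok in [False, False, True, False, False, False, False]:
    alerter.record("192.0.2.1", ok)
print(alerts)
```

Alerting once per outage rather than on every failed probe is a design choice; whether you want repeat pages while a device stays down depends on your NOC process.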

jms

Several NANOG presentations on this subject are available as VoD and slide files; check the archives at nanog.org. Besides the usual SNMP instrumentation, I would recommend taking a look at NetFlow and starting with an open-source tool like nfsen/nfdump.

hi phil. long time.

so, before opening my big mouth, the From: line makes me first ask
  o what is the scale of what you are trying to measure/monitor?
  o what kinds of parameters are you trying to measure/monitor?
  o what kinds of reporting/alerts are you seeking?
  o is this snmp kind of stuff, or far more?

randy

Randy Bush (randy) writes:

> hi phil. long time.
>
> so, before opening my big mouth, the From: line makes me first ask

  And, are you limited to monitoring, or are you actually thinking
  about network management as well? (Things like Rancid, RT, incident/
  event management, configuration management/change management come to mind.)

> 2. Open Source Tools that you use or would recommend (I know the
> obvious smokeping, mrtg, nagios).

As mentioned, you can get a lot of network information from netflow. There are several open-source options. One such for netflow collection/analysis is 'flow-tools' with 'FlowViewer'.

http://www.splintered.net/sw/flow-tools (original development)
http://code.google.com/p/flow-tools (active fork)

http://ensight.eos.nasa.gov/FlowViewer
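
For anyone wondering what a "top talkers" report boils down to, here is a small Python sketch of the aggregation step. It assumes the flow records have already been exported and parsed into simple (source, destination, bytes) tuples; that tuple layout and the function name are hypothetical and are not the flow-tools or FlowViewer data format.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical, simplified flow record: (src_ip, dst_ip, byte_count).
Flow = Tuple[str, str, int]


def top_talkers(flows: Iterable[Flow], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n source addresses that sent the most bytes."""
    totals: Counter = Counter()
    for src, _dst, nbytes in flows:
        totals[src] += nbytes
    return totals.most_common(n)


sample = [
    ("10.0.0.1", "10.0.0.9", 500),
    ("10.0.0.2", "10.0.0.9", 1200),
    ("10.0.0.1", "10.0.0.8", 900),
]
print(top_talkers(sample, n=2))
```

Real collectors also bucket by time window, port, and protocol; this only shows the per-source byte totals.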

Joe

I don't see netdisco mentioned in this space very much, but I
recommend it for the "what is plugged into what" question - both in an
enterprise environment ("where is this misbehaving MAC address?") and
a data center ("which port was that server plugged into on the
switch?").

  Bill

Bill Fenner (fenner) writes:

> 2. Open Source Tools that you use or would recommend (I know the obvious smokeping, mrtg, nagios).

> I don't see netdisco mentioned in this space very much, but I
> recommend it for the "what is plugged into what" question - both in an
> enterprise environment ("where is this misbehaving MAC address?") and
> a data center ("which port was that server plugged into on the
> switch?").

  Some of the Metanav features actually do this, but yes,
  NetDisco is quite useful (its Perl modules especially are invaluable
  when doing configuration management across the bogos^H^Hvariety
  of Cisco equipment out there).

Anecdotal evidence of the usefulness of such tools:

The environment was a pair of cat6509s running multiple gigabit
etherchannel crossconnects, with lots of gigabit and 100mbit servers on
either side, talking back and forth to each other, or up the stick to the
egress routers. I was building an inventory tool to help me track down
mislabelled or unlabelled ports, to clean up and audit the device
inventory. I noticed one lonely 100 meg port bridging a large number of MAC
addresses that were homed on the *other* 6509. I mentioned it as odd in
passing to the network engineer, and was advised that my tool was probably
broken. I took it under advisement and went on about my business.

A few hours later, I discovered that I could make the sysadmins and
network engineers run around asking each other what was broken by scp'ing
a huge file between two databases on opposite switches. When I stopped my
transfer, they stopped running. Start it again, panic at the disco. Very
refreshing.

I brought up the 100 meg port bridging all those addresses, and lo and
behold, a misconfigured load balancer had somehow suborned the multi-gig
etherchannel crossconnects and was bridging everything in the one big vlan
that all the servers sat in. (That's a different story.)
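
The kind of check that caught this can be sketched in a few lines of Python. This is not the inventory tool described above, just an illustration under assumptions: the forwarding-table entries are already collected as (switch, port, MAC) tuples, the set of legitimate trunk/uplink ports is known, and the switch and port names are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical, pre-collected forwarding-table entry: (switch, port, mac).
FdbEntry = Tuple[str, str, str]
Port = Tuple[str, str]


def suspicious_ports(entries: Iterable[FdbEntry],
                     trunks: Set[Port],
                     max_macs: int = 2) -> List[Tuple[str, str, int]]:
    """Flag non-trunk ports that have learned more than max_macs addresses."""
    macs: Dict[Port, Set[str]] = defaultdict(set)
    for switch, port, mac in entries:
        macs[(switch, port)].add(mac.lower())
    flagged = [(sw, port, len(seen))
               for (sw, port), seen in macs.items()
               if (sw, port) not in trunks and len(seen) > max_macs]
    # Worst offenders first; ties broken by name for stable output.
    return sorted(flagged, key=lambda r: (-r[2], r[0], r[1]))


entries = [
    ("sw1", "Po1", "00:00:5e:00:53:01"), ("sw1", "Po1", "00:00:5e:00:53:02"),
    ("sw1", "Po1", "00:00:5e:00:53:03"), ("sw1", "Po1", "00:00:5e:00:53:04"),
    ("sw1", "Fa3/12", "00:00:5e:00:53:02"), ("sw1", "Fa3/12", "00:00:5e:00:53:03"),
    ("sw1", "Fa3/12", "00:00:5E:00:53:04"),
    ("sw1", "Gi1/1", "00:00:5e:00:53:01"),
]
print(suspicious_ports(entries, trunks={("sw1", "Po1")}))
```

The threshold of two MACs per access port is arbitrary; virtualized hosts or downstream switches would need a higher limit or an explicit exception list.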

- billn