Backbone Monitoring Tools

Hi All,

I want a simple backbone monitor for my 5 datacenters. My "backbone"
consists of redundant IPSEC/GRE tunnels.

At the very least I want to ping, traceroute and transfer a small file
every few minutes over all IPSEC links. I am sure there are products
that do this already, but I am having a hard time finding any.

The display format should be noc-friendly. A basic grid with green/red
status indicators at the least. Geographical maps a plus.

Do most of you use a home-grown tool for this monitoring and alerting?

Regards,
Ashe

mrtg…

If you can't say something useful..

Assuming you're looking for basic latency and availability monitoring, with alerts:
http://www.smokeping.org

- billn

I want a simple backbone monitor for my 5 datacenters. My "backbone"
consists of redundant IPSEC/GRE tunnels.

At the very least I want to ping, traceroute and transfer a small file
every few minutes over all IPSEC links. I am sure there are products
that do this already, but I am having a hard time finding any.

autostatus, mrtg, cricket, hobbitmon, cacti, nagios and big brother are
all good options; find these and more on Freshmeat:
     http://freshmeat.net/browse/152/

The display format should be noc-friendly. A basic grid with green/red
status indicators at the least. Geographical maps a plus.

For noc-friendly latency reporting, look at SmokePing. For deeper
tests of HTTP page loads and file transfers, HobbitMon could be what
you're looking for.

I'm not aware of any freeware products which draw nice geographic
maps; we have OpenView for that. A few years ago I started work
towards generating dynamic network status graphics with Graphviz, but
management decided it would be easier and faster to buy OpenView
licenses.

Do most of you use a home-grown tool for this monitoring and alerting?

I've found that I always end up writing some custom code, but you
could do worse than to build on top of one of the open-source
monitoring tools.

For example, I use a highly customized version of AutoStatus for
up/down alerting, primarily because I like how it handles
dependencies.

Kevin

I have had a decent amount of success with Nagios. It is not trivial to
set up, but once it is up and running, it has always handled our
dependencies and such very well. Additionally, because it calls external
programs to do the checks, it is pretty simple to write a script that
measures whatever value you would like to monitor. As I said before, it
is a pain to set up initially, but after getting it set up, I couldn't
be happier with it.

Ashe Canvar wrote:

Take a look at Nagios (http://www.nagios.org/) for active monitoring and Cricket
(http://cricket.sourceforge.net/) which uses RRDtool to monitor throughput like MRTG, but is a little easier to configure.

Good Luck,

Thanks for the quick responses. Perhaps I should have been more explicit.

I already use "remstats"
(http://remstats.sourceforge.net/release/index.html) for interface b/w
monitoring. I have worked with nagios and openview in the past.

I have an ospf based network. The specific monitoring problem I am
trying to solve is :

1. actively test the currently active path for packet loss and transfer
     i.e. draw a latency grid between every datacenter and every other
datacenter

2. actively detect routing changes / failover to redundant paths
using traceroutes
     i.e. alert if SFO->CHG->NYC changes to SFO->LXE->HOU->NYC
     ( link state protocols suck as far as testing backup paths go)

3. actively transfer a fixed file
   i.e. draw a datarate grid between every datacenter and every other datacenter
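
A minimal Python sketch of the latency grid in item 1, assuming each site runs a probe and reports one number per remote site. The `results` dictionary, the threshold and both helper names are hypothetical; this only shows one way to lay out a full mesh, not an existing tool.

```python
from itertools import permutations

def mesh_pairs(sites):
    """Every ordered (source, destination) pair in a full mesh."""
    return list(permutations(sites, 2))

def render_grid(sites, results, threshold):
    """Return a plain-text grid; rows are sources, columns are destinations.

    results maps (src, dst) -> measured latency in ms, or None if the probe
    failed. A cell is "ok" when the value is at or below threshold and
    "FAIL" otherwise (including when no measurement exists).
    """
    width = max(max(len(s) for s in sites) + 2, 6)
    lines = ["".ljust(width) + "".join(s.ljust(width) for s in sites)]
    for src in sites:
        cells = []
        for dst in sites:
            if src == dst:
                cells.append("-")
            else:
                value = results.get((src, dst))
                ok = value is not None and value <= threshold
                cells.append("ok" if ok else "FAIL")
        lines.append(src.ljust(width) + "".join(c.ljust(width) for c in cells))
    return "\n".join(line.rstrip() for line in lines)
```

The same layout would serve the datarate grid in item 3 if the comparison were flipped so that higher values are better.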

So, I am not looking for a generic graphing/alerting NMS. Does anyone
use a specific tool that is capable of doing this ?

I am in a buy vs. build debate with my boss ;-)

Regards,
Ashe.

1. Cricket with Acktomic tools to monitor Cisco SLA/SAA/RTR values
2. ospf snmp traps to snmptrapd? I think somewhere in the archives someone
did some perl scripting to watch ospf stuff. OSPF has some mibs that can be
used for data gathering. Ed Ravin had an add-on for
http://linux.kernel.org/software/mon/. Check the archives around 2006/02/06.
John Kristoff has an integrity tool at http://ntgrd.depaul.edu/software/
(may not be what you look for). Check the archives around 2006/01/18. If
nothing else, they may show you how to get at the OSPF stuff you want.
3. is netmap what you are describing:
http://www.it.teithe.gr/~v13/netmap/img/netmap-1.3.0-1.png? Maybe use
Netmap to plot RTR values from 1) rather than the standard bandwidth values

Oh! Then take a look at the FCP product from Internap.

http://www.internap.com/solutions/routecontrol/page1980.html

The price alone will convince your PHB to let you build a box.

MRTG is not a monitoring system. It's a data collection system.
Toby should have never put in the alarm configuration. But he
did. Anyhow.

There are some tools listed in the NANOG faq, but two very easy
ones come to mind.

1. NAGIOS

2. NOCOL (I know about snips. I don't care).

-M<

A few more comments.

I found a link to snmp management for ospf in an archive message:

801177ff.shtml. That may yield you the info you need for monitoring links
and/or routes.

From my other message, if you collect 1) and 3) with cricket, you can
extract RTR and bandwidth data with perl from cricket's config file. It took
a bit of code reverse engineering, but I managed to get some mod_perl code
going to do such a thing, so it can be done. If you pull out the
appropriate interface stats, you'd be able to generate your grid for 1) and
3).

Do you need to generate alerts? Or provide trending information to measure performance?

I said mrtg or rrd because you can create graphs based on the ping response time & packet loss between the datacenters; you could also create a graph showing how long it takes to transfer a file to a remote site. All it takes is basic mrtg, a few simple scripts and a webserver.
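
As a rough illustration of one of those "simple scripts", here is a Python sketch that turns ping's summary into the four lines MRTG reads from an external script (first value, second value, uptime string, target name). It assumes the summary format printed by common Linux and BSD ping implementations; the function names and the `sfo-nyc` target name are made up for the example.

```python
import re

# "20% packet loss" (Linux) or "20.0% packet loss" (BSD)
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
# "rtt min/avg/max/mdev = ..." or "round-trip min/avg/max/stddev = ..."
RTT_RE = re.compile(r"min/avg/max/[a-z-]+ = ([\d.]+)/([\d.]+)/([\d.]+)/")

def parse_ping(output):
    """Extract (loss_percent, avg_rtt_ms) from ping's summary lines.

    avg_rtt_ms is None when no replies came back, because ping then
    prints no round-trip line at all.
    """
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no packet-loss summary found")
    rtt = RTT_RE.search(output)
    avg = float(rtt.group(2)) if rtt else None
    return float(loss.group(1)), avg

def mrtg_lines(loss_percent, avg_rtt_ms, target):
    """Format loss and average rtt as MRTG's four-line external input."""
    rtt = 0 if avg_rtt_ms is None else round(avg_rtt_ms)
    return "\n".join([str(round(loss_percent)), str(rtt), "unknown", target])
```

In practice the script would run ping itself (for example via `subprocess.run`) and print the result of `mrtg_lines`. Since these are instantaneous values rather than counters, the MRTG target would most likely need the gauge option.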

If you need something that alerts you with e-mail/pages, then nagios, but you’ll spend a lot of time on setup and on exporting the nagios checks into something that makes pretty graphs, if you need that.

I thought the Internap FCP is only for bgp setups; also, it doesn’t provide the information you’re gonna want, at least not that I can tell yet… :-)

I use snmpstatd - snmpstat.sf.net .

I use snmpstatd - snmpstat.sf.net .

Oooh, looks nice!

From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of
Ashe Canvar

2. actively detect routing changes / failover to redundant paths using
traceroutes
     i.e. alert if SFO->CHG->NYC changes to SFO->LXE->HOU->NYC
     ( link state protocols suck as far as testing backup paths go)

Ashe,

I've done this using "mon" (http://www.kernel.org/software/mon/). It comes with
two traceroute monitors which remember the past paths and alert when that path
changes. In fact, one of the monitors can even detect load-balanced alternate
paths, e.g. if there are multiple possible intermediate paths during normal
operation.
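
The general idea of remembering past paths can be sketched in a few lines of Python. This is not mon's actual code, only an illustration of the technique; the `PathMonitor` class and its methods are hypothetical, and the hop lists are assumed to come from parsed traceroute output.

```python
class PathMonitor:
    """Remember the paths seen per (src, dst) pair and flag new ones."""

    def __init__(self):
        self.known = {}  # (src, dst) -> set of hop tuples accepted as normal

    def learn(self, src, dst, hops):
        """Accept a path as normal; several may be learned per pair,
        which covers load-balanced alternates."""
        self.known.setdefault((src, dst), set()).add(tuple(hops))

    def observe(self, src, dst, hops):
        """Return an alert string for an unseen path, else None."""
        paths = self.known.get((src, dst))
        if not paths:
            self.learn(src, dst, hops)  # first sighting becomes the baseline
            return None
        if tuple(hops) in paths:
            return None
        return "%s->%s path changed: now %s" % (src, dst, "->".join(hops))
```

Feeding it SFO->CHG->NYC first and SFO->LXE->HOU->NYC later would produce an alert on the second path, unless that path had been learned as an acceptable alternate beforehand.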

You'll want to look at the latest 1.1 release from CVS:

     http://www.kernel.org/software/mon/development.html

3. actively transfer a fixed file
   i.e. draw a datarate grid between every datacenter and every other
datacenter

In fact, I believe people have done precisely this with mon before.
Try asking on the mailing list, I'm quite sure someone will respond.

I am in a buy vs. build debate with my boss ;-)

Build! I think mon gets you at least 90% to where you want to go.

D'oh!! At first I thought he was asking for backHOE monitoring
tools. Around here we simply bury a short length of fiber and wait a few
minutes until the backhoes sniff it out and start digging.... sorta like
the way they use pigs to search for truffles.

          David Leonard
          ShaysNet

Snmpstat was designed for ISPs in Russia, and is actively used by a few ISPs. I
modified it for enterprise use here in the USA and use it for enterprise
monitoring as well. It is a _fixed parameter system_, so it monitors just
routers/switches/firewalls for a limited set of parameters (interfaces and
ports), but it does that very well and has a very useful compact view, tickets,
sound alerts for operators, etc.

It uses a simple config file which can be easily generated or modified
via the web. I use it (the Poll.conf file) as primary documentation (saving it
into CVS on each change). We use snmpstat in combination with cricket
or mrtg (which monitor parameters not covered by snmpstat), and combine it
with CCR - cisco configuration repository (tracks cisco config changes),
ProBIND2 (controls all the DNS servers around), acid (snort viewer), an inventory
database (shows hardware in the racks), an alert aliasing system (just a set of
aliases + an archive for alerts, warnings and so on), osiris (tracks server
changes), and a few other tools (you can see a short description on the snmpstat
page).

It is not packaged as an 'rpm' or well auto-configured (yes, I have that on my
TODO list, but there was no demand so it was never completed); the only problem
we usually have is fixing a small inconsistency in the include files of the
embedded snmp package. It is, however, very fast (we monitor 1,000 - 2,000
interfaces without any visible impact on our FreeBSD servers) and relatively
simple.

Two words: "Asymmetric routes". Just be aware of the implications.

Well, true. But the idea is to have a full mesh of 'n' sensors each
doing 'tests' to the remaining n-1 sensors. Finding asymmetric routes
should be trivial as I plan to feed it my router configs from rancid,
for detecting interfaces that belong to the same router. ( Of course,
this can't be extended to the Internet in general. )
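
A minimal Python sketch of that asymmetry check, assuming each sensor's traceroute has already been parsed into a list of hop addresses ending with the destination sensor itself. The `ip_to_router` mapping stands in for whatever gets extracted from the router configs; it and both function names are hypothetical.

```python
def routers_on_path(hop_ips, ip_to_router):
    """Translate hop interface addresses to router names.

    Addresses missing from ip_to_router are kept as-is so they still
    take part in the comparison. Consecutive hops on the same router
    are collapsed into one entry.
    """
    names = []
    for ip in hop_ips:
        name = ip_to_router.get(ip, ip)
        if not names or names[-1] != name:
            names.append(name)
    return names

def is_asymmetric(forward_ips, reverse_ips, ip_to_router):
    """True when the return path is not the forward path reversed.

    The last hop of each list is the destination sensor, not a router,
    so it is dropped before comparing.
    """
    fwd = routers_on_path(forward_ips[:-1], ip_to_router)
    rev = routers_on_path(reverse_ips[:-1], ip_to_router)
    return fwd != list(reversed(rev))
```

The mapping matters because a traceroute typically reports the interface facing the sender, so the same router shows up under different addresses in each direction.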

From all the replies I have received, I don't think anything open
source fits the bill.

Going to the mines to write my own. Good bye cruel world...

Wouldn't you be better served just walking the netToMedia tables for your devices? Parsing configs sucks. Even caching the contents of a simple snmpwalk would save you some pain. Shovel 'em into a db and call it a day.

- billn
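
One way the "shovel 'em into a db" step might look, as a Python sketch. It assumes the snmpwalk output has already been captured as text in the usual `IP-MIB::ipNetToMediaPhysAddress.<ifIndex>.<ip> = TYPE: value` form (the exact formatting depends on snmpwalk's output options); the table name, the device label and the function names are invented for the example.

```python
import re
import sqlite3

# Matches e.g. "IP-MIB::ipNetToMediaPhysAddress.2.10.0.0.1 = STRING: 0:1:2:3:4:5"
ROW_RE = re.compile(
    r"ipNetToMediaPhysAddress\.(\d+)\.(\d+\.\d+\.\d+\.\d+) = [\w-]+: (.+)$"
)

def parse_walk(text):
    """Yield (if_index, ip, mac) tuples from snmpwalk output; other lines are skipped."""
    for line in text.splitlines():
        match = ROW_RE.search(line.strip())
        if match:
            yield int(match.group(1)), match.group(2), match.group(3).strip()

def store(conn, device, rows):
    """Cache one device's rows, replacing entries seen on an earlier walk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS net_to_media "
        "(device TEXT, if_index INTEGER, ip TEXT, mac TEXT, "
        "PRIMARY KEY (device, if_index, ip))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO net_to_media VALUES (?, ?, ?, ?)",
        [(device, if_index, ip, mac) for if_index, ip, mac in rows],
    )
    conn.commit()
```

Queries against the cached table can then replace repeated config parsing, and re-running the walk simply refreshes the rows.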