I want a simple backbone monitor for my 5 datacenters. My "backbone"
consists of redundant IPSEC/GRE tunnels.
At the very least I want to ping, traceroute and transfer a small file
every few minutes over all IPSEC links. I am sure there are products
that do this already, but I am having a hard time finding any.
The display format should be noc-friendly. A basic grid with green/red
status indicators at the least. Geographical maps a plus.
Do most of you use a home-grown tool for this monitoring and alerting?
I want a simple backbone monitor for my 5 datacenters. My "backbone"
consists of redundant IPSEC/GRE tunnels.
At the very least I want to ping, traceroute and transfer a small file
every few minutes over all IPSEC links. I am sure there are products
that do this already, but I am having a hard time finding any.
AutoStatus, MRTG, Cricket, HobbitMon, Cacti, Nagios and Big Brother are
all good options; you can find these and more on Freshmeat: http://freshmeat.net/browse/152/
The display format should be noc-friendly. A basic grid with green/red
status indicators at the least. Geographical maps a plus.
For noc-friendly latency reporting, look at SmokePing. For deeper
tests of HTTP page loads and file transfers, HobbitMon could be what
you're looking for.
I'm not aware of any freeware products that draw nice geographic
maps; we have OpenView for that. A few years ago I started work
towards generating dynamic network status graphics with Graphviz, but
management decided it would be easier and faster to buy OpenView
licenses.
Do most of you use a home-grown tool for this monitoring and alerting?
I've found that I always end up writing some custom code, but you
could do worse than to build on top of one of the open-source
monitoring tools.
For example, I use a highly customized version of AutoStatus for
up/down alerting, primarily because I like how it handles
dependencies.
I have had a decent amount of success with Nagios. It is not trivial to
set up, but once it is up and running, it has always handled our
dependencies and such very well. Additionally, because it calls external
programs to do the checks, it is pretty simple to write a script that
measures whatever value you would like to monitor. As I said before, it
is a pain to set up initially, but after getting it set up, I couldn't
be happier with it.
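To illustrate the external-program model, here is a minimal sketch of the decision logic a custom tunnel check might contain. It relies only on the standard Nagios plugin conventions (exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, performance data after a `|` on the output line); the function names, the `TUNNEL` label and the thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(loss_pct, rtt_ms, warn_rtt=100.0, crit_rtt=250.0, crit_loss=20.0):
    """Turn one tunnel measurement into (exit_code, output_line).

    Thresholds are illustrative placeholders, not recommendations.
    """
    if loss_pct is None or rtt_ms is None:
        return UNKNOWN, "TUNNEL UNKNOWN - no measurement"
    if loss_pct >= crit_loss or rtt_ms >= crit_rtt:
        code = CRITICAL
    elif loss_pct > 0 or rtt_ms >= warn_rtt:
        code = WARNING
    else:
        code = OK
    # Text before '|' is what operators see; after it is performance data
    # in the 'label'=value[UOM];warn;crit;min;max form that graphing add-ons can read.
    perf = f"rtt={rtt_ms}ms;{warn_rtt};{crit_rtt};0 loss={loss_pct}%;0;{crit_loss};0;100"
    return code, f"TUNNEL {NAMES[code]} - loss {loss_pct}%, rtt {rtt_ms} ms | {perf}"


def main(loss_pct, rtt_ms):
    """Print the status line and return the exit code for Nagios."""
    code, line = evaluate(loss_pct, rtt_ms)
    print(line)
    return code
```

A real check would obtain `loss_pct` and `rtt_ms` by pinging across the tunnel and end with `sys.exit(main(loss, rtt))`, since Nagios decides the state from the exit code.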
I have an OSPF-based network. The specific monitoring problems I am
trying to solve are:
1. actively test the currently active path for packet loss and transfer
i.e. draw a latency grid between every datacenter and every other
datacenter
2. actively detect routing changes / failover to redundant paths
using traceroutes
i.e. alert if SFO->CHG->NYC changes to SFO->LXE->HOU->NYC
(link-state protocols suck as far as testing backup paths goes)
3. actively transfer a fixed file
i.e. draw a datarate grid between every datacenter and every other datacenter
So, I am not looking for a generic graphing/alerting NMS. Does anyone
use a specific tool that is capable of doing this ?
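For items 1 and 3, the grid itself is straightforward once each sensor reports its measurements. Below is a minimal Python sketch, assuming results arrive as a dict keyed by `(source, destination)` with `(packet loss %, average RTT ms)` values; the site list (borrowed from the example path above), the thresholds and the function names are hypothetical.

```python
# Hypothetical site codes, borrowed from the example path in the post.
SITES = ["SFO", "CHG", "NYC", "LXE", "HOU"]


def status(result, max_loss=5.0, max_rtt=150.0):
    """Classify one src->dst measurement (loss_pct, rtt_ms) as green or red."""
    if result is None:
        return "n/a"  # no measurement collected for this pair yet
    loss_pct, rtt_ms = result
    return "red" if loss_pct > max_loss or rtt_ms > max_rtt else "green"


def build_grid(sites, results, **thresholds):
    """grid[i][j] is the status of the path from sites[i] to sites[j]."""
    return [
        ["-" if src == dst else status(results.get((src, dst)), **thresholds)
         for dst in sites]
        for src in sites
    ]


def render(sites, grid, width=7):
    """Plain-text grid; a NOC web page would colour the cells instead."""
    lines = [" " * width + "".join(s.ljust(width) for s in sites)]
    for src, row in zip(sites, grid):
        lines.append(src.ljust(width) + "".join(c.ljust(width) for c in row))
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    # Made-up sample measurements: (packet loss %, average RTT ms).
    results = {("SFO", "NYC"): (0.0, 72.0), ("NYC", "SFO"): (12.0, 75.0)}
    print(render(SITES, build_grid(SITES, results)))
```

The same layout would serve the file-transfer grid in item 3 if `status` were swapped for a classifier on measured throughput.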
1. Cricket with Acktomic tools to monitor Cisco SLA/SAA/RTR values
2. OSPF SNMP traps to snmptrapd? I think someone in the archives
did some Perl scripting to watch OSPF stuff. OSPF has some MIBs that can be
used for data gathering. Ed Ravin had an add-on for http://linux.kernel.org/software/mon/. Check the archives around 2006/02/06.
John Kristoff has an integrity tool at http://ntgrd.depaul.edu/software/
(may not be what you're looking for). Check the archives around 2006/01/18. If
nothing else, they may show you how to get at the OSPF stuff you want.
3. Is Netmap what you are describing (http://www.it.teithe.gr/~v13/netmap/img/netmap-1.3.0-1.png)? If so, maybe use
Netmap to plot the RTR values from 1) rather than the standard bandwidth values.
I found a link to SNMP management for OSPF in an archive message:
801177ff.shtml. That may yield the info you need for monitoring links
and/or routes.
From my other message: if you collect 1) and 3) with Cricket, you can
extract RTR and bandwidth data with Perl from Cricket's config file. It took
a bit of code reverse engineering, but I managed to get some mod_perl code
going to do such a thing, so it can be done. If you pull out the
appropriate interface stats, you'd be able to generate your grid for 1) and
3).
Do you need to generate alerts? Or to provide trending information to measure performance?
I said MRTG or RRD because you can create graphs of the ping response time and packet loss between the datacenters; you could also create a graph showing how long it takes to transfer a file to a remote site. All it takes is basic MRTG, a few simple scripts and a web server.
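As a sketch of one of those "simple scripts": MRTG can run an external command as a target, and that command must print four lines (first value, second value, an uptime string, the target name). The hypothetical helpers below parse the summary printed by Linux iputils or BSD `ping` and emit average RTT as the first value and packet loss as the second; the regexes are assumptions about those summary formats.

```python
import re

# "5 packets transmitted, 5 received, 0% packet loss" (Linux iputils) or
# "5 packets transmitted, 5 packets received, 0.0% packet loss" (BSD).
LOSS_RE = re.compile(r"([\d.]+)% packet loss")
# "rtt min/avg/max/mdev = a/b/c/d ms" or "round-trip min/avg/max/stddev = a/b/c/d ms";
# group 1 captures the average.
RTT_RE = re.compile(r"= [\d.]+/([\d.]+)/")


def parse_ping_summary(text):
    """Return (avg_rtt_ms, loss_pct); avg_rtt_ms is None when nothing answered."""
    loss = LOSS_RE.search(text)
    if loss is None:
        raise ValueError("no packet-loss line in ping output")
    rtt = RTT_RE.search(text)
    return (float(rtt.group(1)) if rtt else None), float(loss.group(1))


def mrtg_output(avg_rtt_ms, loss_pct, target_name):
    """The four lines MRTG reads from an external Target command:
    first value, second value, uptime string, target name."""
    first = 0 if avg_rtt_ms is None else round(avg_rtt_ms)
    return "\n".join([str(first), str(round(loss_pct)), "n/a", target_name])
```

A wrapper would feed it `subprocess.run(["ping", "-c", "5", host], capture_output=True, text=True).stdout` and print the result. Because these are current readings rather than counters, the MRTG target would normally also be marked as a gauge in its `Options`.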
If you need something that alerts you via e-mail/pages, then Nagios, but you’ll spend a lot of time on setup and on exporting the Nagios checks into something that makes pretty graphs, if you need that.
I thought the Internap FCP was only for BGP setups; also, it doesn’t provide the information you’re gonna want, at least not that I can tell yet…
From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of
Ashe Canvar
2. actively detect routing changes / failover to redundant paths using
traceroutes
i.e. alert if SFO->CHG->NYC changes to SFO->LXE->HOU->NYC
(link-state protocols suck as far as testing backup paths goes)
Ashe,
I've done this using "mon" (http://www.kernel.org/software/mon/). It comes with
two traceroute monitors which remember the past paths and alert when that path
changes. In fact, one of the monitors can even detect load-balanced alternate
paths, e.g. if there are multiple possible intermediate paths during normal
operation.
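This is not mon's code; it is a hypothetical Python sketch of the same idea: keep a list of known-good hop sequences per site pair (several, if the pair is load-balanced in normal operation) and alert when a fresh traceroute matches none of them. Treating `*` as a wildcard is an assumption, meant to avoid alarms from hops that simply don't answer.

```python
def hops_match(observed, expected):
    """True when observed equals expected, treating '*' (no reply) as a wildcard."""
    return len(observed) == len(expected) and all(
        hop == "*" or hop == want for hop, want in zip(observed, expected)
    )


def path_alert(observed, known_paths):
    """Return an alert string if observed matches none of the known-good paths.

    known_paths may hold several entries, so load-balanced alternates seen in
    normal operation do not raise alarms.
    """
    if any(hops_match(observed, path) for path in known_paths):
        return None
    return "path changed: " + "->".join(observed)


# Known-good paths per site pair; site names stand in for router hops here.
KNOWN = {("SFO", "NYC"): [("SFO", "CHG", "NYC")]}
```

With that table, `path_alert(["SFO", "LXE", "HOU", "NYC"], KNOWN[("SFO", "NYC")])` reports the failover from the original question, while the normal path returns `None`.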
You'll want to look at the latest 1.1 release from CVS:
D'oh!! At first I thought he was asking for backHOE monitoring
tools. Around here we simply bury a short length of fiber and wait a few
minutes until the backhoes sniff it out and start digging.... sorta like
the way they use pigs to search for truffles.
Snmpstat was designed for ISPs in Russia and is used actively by a few ISPs. I
modified it for the enterprise here in the USA and use it for enterprise monitoring as
well. It is a _fixed-parameter system_, so it monitors just
routers/switches/firewalls for a limited set of parameters (interfaces and
ports), but it does this very well and has a very useful compact view, tickets,
sound alerts for operators, etc.
It uses a simple config file which can be easily generated or modified
via the web. I use it (the Poll.conf file) as our primary documentation (saving it
into CVS on each change). We use snmpstat in combination with Cricket
or MRTG (which monitor parameters not covered by snmpstat), and combine it
with CCR - Cisco configuration repository (tracks Cisco config changes),
ProBIND2 (controls all our DNS servers), ACID (Snort viewer), an inventory
database (shows hardware in the racks), an alert aliasing system (just a set of
aliases plus an archive for alerts, warnings and so on), Osiris (tracks changes
on servers), and a few other tools (you can see short descriptions on the snmpstat
page).
It is not packaged as an RPM or well auto-configured (yes, it is on my TODO
list, but there was no demand, so it was never completed), and the only problem we
usually hit is _fixing a small inconsistency in the include files of the embedded SNMP
package_. But it is very fast (we monitor 1,000 - 2,000 interfaces without any
visible impact on our FreeBSD servers) and relatively simple.
Well, true. But the idea is to have a full mesh of 'n' sensors, each
doing 'tests' to the remaining n-1 sensors. Finding asymmetric routes
should be trivial, as I plan to feed it my router configs from RANCID
for detecting interfaces that belong to the same router. (Of course,
this can't be extended to the Internet in general.)
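A minimal sketch of that asymmetry check, assuming a hypothetical `iface_to_router` dict built from the RANCID-collected configs (interface address to router name) and assuming the last traceroute hop is the destination sensor itself: map both directions to router names, then compare the forward list with the reversed return list. The addresses and router names are made up.

```python
def to_routers(hop_ips, iface_to_router):
    """Map hop addresses to router names; unknown addresses are kept as-is."""
    return [iface_to_router.get(ip, ip) for ip in hop_ips]


def is_asymmetric(forward_hops, reverse_hops, iface_to_router):
    """Compare A->B routers with the reversed B->A routers.

    The final hop of each traceroute is the destination sensor, not a router,
    so it is dropped before comparing.
    """
    fwd = to_routers(forward_hops[:-1], iface_to_router)
    rev = to_routers(reverse_hops[:-1], iface_to_router)
    return fwd != rev[::-1]


# Hypothetical addressing: each router answers from a different interface
# in each direction, which is why the config-derived mapping is needed.
IFACE_TO_ROUTER = {
    "10.1.0.1": "sfo-gw", "10.12.0.1": "sfo-gw",
    "10.12.0.2": "chg-gw", "10.23.0.1": "chg-gw",
    "10.23.0.2": "nyc-gw", "10.2.0.1": "nyc-gw",
}
```

Comparing raw addresses would flag every path as asymmetric, because traceroute usually reports a different interface of the same router in each direction; collapsing addresses to routers first is what makes the comparison meaningful.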
From all the replies I have received, I don't think anything open
source fits the bill.
Going to the mines to write my own. Good bye cruel world...
Wouldn't you be better served just walking the netToMedia tables for your devices? Parsing configs sucks. Even caching the contents of a simple snmpwalk would save you some pain. Shovel 'em into a db and call it a day.
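As a hedged illustration of "shovel 'em into a db": the sketch below parses saved snmpwalk text (for example from `snmpwalk -v2c -c <community> <router> IP-MIB::ipNetToMediaTable`) into an SQLite table, so later lookups don't need to re-walk the devices. The line format it expects is an assumption about Net-SNMP's default output with MIBs loaded; the table layout and `cache_walk` are hypothetical.

```python
import re
import sqlite3

# Assumed Net-SNMP snmpwalk line shape (MIBs loaded), e.g.
#   IP-MIB::ipNetToMediaPhysAddress.2.10.0.0.1 = STRING: 0:1b:21:3a:4f:10
# i.e. MIB::object.index = TYPE: value
LINE_RE = re.compile(r"^[\w-]+::(\w+)\.(\S+) = (?:[\w-]+: )?(.*)$")


def cache_walk(db, device, walk_text):
    """Parse snmpwalk output for one device and append it to a walk table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS walk (device TEXT, object TEXT, idx TEXT, value TEXT)"
    )
    rows = [
        (device, m.group(1), m.group(2), m.group(3))
        for m in (LINE_RE.match(line.strip()) for line in walk_text.splitlines())
        if m
    ]
    db.executemany("INSERT INTO walk VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # a file path would persist the cache
    sample = "IP-MIB::ipNetToMediaPhysAddress.2.10.0.0.1 = STRING: 0:1b:21:3a:4f:10\n"
    cache_walk(db, "sfo-gw", sample)
    print(db.execute("SELECT device, idx, value FROM walk").fetchall())
```

Keeping the raw object/index/value rows, rather than a purpose-built schema, means the same cache works for other tables you decide to walk later.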