System and Network Monitoring

OK, I'm looking for "real world" data and not sales/marketing hype.

I'm looking for a package to do network and system monitoring. It's a
heterogeneous environment consisting of all manner of routers and at least 700
servers (mostly FreeBSD, but also Solaris and Network Appliance boxen).

Requirements:

1.) Must support FreeBSD (obviously, it's the platform of probably about
98% of the servers)

2.) Must scale well. Today we have (at best guess given by management)
"somewhere between 500 and 1000 servers", but that number grows daily.
Operations adds about 4-5 servers a week at the present time.

3.) Must support the ability to provide checks on custom situations (e.g.,
an external program which checks the output from an e-commerce server to
check that the pricing/qty/etc. is accurate and not stale or outdated)

4.) Must provide a flexible escalation method including e-mail and paging.

5.) Provide a nice interface to quickly find and troubleshoot problems.
Preferably this would be X based, so that the server itself could reside
in our server rooms, exporting the display to whoever has a need to watch
the network/servers, or troubleshoot their particular corner of the world.
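To make requirement 3 concrete, here is a minimal sketch of the kind of external check I have in mind. Everything in it is an assumption for illustration: the JSON payload shape, the `check_catalog` name, the 15 minute staleness limit and the status codes are all hypothetical, not taken from any existing product.

```python
import json
import time

# Hypothetical status codes for this sketch; not taken from any particular tool.
OK, WARNING, CRITICAL = 0, 1, 2


def check_catalog(payload, now=None, max_age_seconds=900):
    """Return (status, message) for a JSON catalog snapshot.

    Assumed (hypothetical) payload shape:
      {"updated_at": <epoch seconds>,
       "items": [{"sku": str, "price": number, "qty": int}, ...]}
    """
    now = time.time() if now is None else now
    try:
        data = json.loads(payload)
        updated_at = float(data["updated_at"])
        items = list(data["items"])
    except (ValueError, KeyError, TypeError) as exc:
        return CRITICAL, f"unparseable catalog: {exc!r}"

    # Stale data is treated as the most serious condition.
    age = now - updated_at
    if age > max_age_seconds:
        return CRITICAL, f"catalog is stale: {age:.0f}s old (limit {max_age_seconds}s)"

    # Collect SKUs whose price or quantity looks wrong.
    bad = []
    for item in items:
        if not isinstance(item, dict):
            bad.append("?")
            continue
        price, qty = item.get("price"), item.get("qty")
        price_ok = isinstance(price, (int, float)) and price > 0
        qty_ok = isinstance(qty, int) and qty >= 0
        if not (price_ok and qty_ok):
            bad.append(str(item.get("sku", "?")))
    if bad:
        return WARNING, "suspect price/qty for: " + ", ".join(bad)
    return OK, f"{len(items)} items, {age:.0f}s old"
```

A small wrapper would fetch the page from the e-commerce server, pass the body to `check_catalog`, and hand the status and message to whatever monitoring package ends up running it.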

I know this product has to exist, but so far I've only found one product
that claims to be able to do all of that (HP OpenView). I'm not against
HP, but I'd like to have some more options than what I currently have
available to me.

Replies probably should be made off-list (and if anyone wants me to, I'll
happily summarize to the list what I get).

Thanks in advance.

Take a look at sysmon that Jared Mauch wrote... it kicks ass. While
it cannot do all the checks you are looking for, you may be able
to have it suit your needs.

  ftp://puck.nether.net/pub/jared/sysmon-0.80.1.tar.gz

That does multiple system checks, blah blah blah.

MRTG is great for utilization and misc statistics like temp,
outgoing mail, portmaster users, bandwidth, etc. It uses
SNMP so anything that uses SNMP can be polled.

  http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html
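For anyone wondering how utilization falls out of SNMP polling: the usual approach is to sample an interface octet counter at a fixed interval and turn the difference into a rate. The sketch below is only an illustration of that arithmetic, not MRTG's own code; the function names are made up, and it assumes a 32-bit counter that wraps at most once between samples.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to zero here


def octet_rate(prev_count, curr_count, interval_seconds, modulus=COUNTER32_MODULUS):
    """Average bytes/second between two samples of a wrapping octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modular subtraction gives the right delta across a single counter wrap.
    delta = (curr_count - prev_count) % modulus
    return delta / interval_seconds


def utilization_percent(bytes_per_second, link_bits_per_second):
    """Express a byte rate as a percentage of the link speed."""
    return 100.0 * bytes_per_second * 8 / link_bits_per_second
```

For example, two samples taken 300 seconds apart with counter values 1000 and 4000 give `octet_rate(1000, 4000, 300)`, i.e. 10 bytes per second.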

Both work well on FreeBSD up to 3.0-current as well as Solaris 2.x.

-r

I didn't want to self promo, but since Ravi started it ... ;)

  http://puck.nether.net/sysmon/

  Documentation and usage is very raw, but I'm willing to answer
questions about it and help you get it going, because I believe
you'll be happy with it.

  I'm bug fixing as I just released the 0.80 (series) versions
recently, but will be putting a number of new features in shortly...

  Please feel free to contact me about what is supported,
feature requests, bug reports, etc. I'm responsive about fixing bugs
and (slightly less so, but I still get to them) feature requests.
Features will be added; I just don't have a lot of time to work on
them, as I do have a real job besides this :)

  Please ask me questions about it offline and respect my
reply-to:

  - jared

Any examples of this up and running? I can see the one reference to it
at the Sysmon Home Page, but would like something a little more detailed.
I would be curious to see how people have implemented it and exactly how
they have extracted information out of it.

Thanks.

-Ashley

What are you looking for? The best way to get an example is to
run it yourself. I use it to monitor our entire network, all
production machines and each of their processes. The system
emails out up/down pages which I send to our internal ticketing
system, to a pager gateway, and log to a local facility.

From there, you can just imagine what can be done.
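As a rough illustration of that fan-out (one up/down event going to a ticketing system, a pager gateway and a log), here is a small sketch. It is not how sysmon does it; the `Event` and `dispatch` names and the idea of sinks as plain callables are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    host: str
    service: str
    state: str  # "up" or "down"

    def summary(self) -> str:
        return f"{self.host}/{self.service} is {self.state.upper()}"


# A sink is any callable that takes an Event (ticketing, pager, log, ...).
Sink = Callable[[Event], None]


def dispatch(event: Event, sinks: List[Sink]) -> List[str]:
    """Deliver the event to every sink; return the names of sinks that failed."""
    failed = []
    for sink in sinks:
        try:
            sink(event)
        except Exception:
            # One broken destination must not stop delivery to the others.
            failed.append(getattr(sink, "__name__", repr(sink)))
    return failed
```

Each real destination (mail to the ticketing address, the pager gateway, a syslog call) would be wrapped in its own small function and passed in the `sinks` list, so a failing pager gateway still leaves the ticket and log entry in place.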

-r