I apologize for not starting a new thread earlier; I didn't realize that the NANOG mailing list builds a thread index rather than threading by subject.
Even though NANOG is primarily for network operators, I know that a number of members work in NOCs that also monitor servers and applications. I would appreciate any suggestions for monitoring systems that would suit our environment. We have a large number of custom applications on a large number of hosts, including Windows 2003/2008, Linux x86/x86_64 and Solaris Sparc/x86_64, and we are looking for a better way of monitoring them. We are looking for open-source or low-cost recommendations, and would prefer solutions where the basic monitoring is ready out of the box. Native agents with custom scripting would be highly desirable (rather than SNMP/DMI/WMI polling).
Some of our requirements:
. Native agents for Windows 2003/2008, Linux, Linux x86_64, Solaris Sparc and Solaris x86_64. Either binaries or source code.
. Ability to send alerts via email, pager and/or snmp
. Monitoring of OS properties like memory, disk, cpu, etc...
. Ability to extend agents with scripting to allow monitoring of custom services
. Plug-in architecture for third-party add-ons
. Reliable Architecture
. Reasonable user interface
. Non-blocking polling
. Active Project (New Releases on regular basis and have existed for a reasonable period)
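To make the "extend agents with scripting" requirement concrete, here is a minimal sketch of a custom check written to the Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus one line of status text). The function names, the 80/90 percent thresholds and the choice of a disk check are purely illustrative assumptions, not something taken from any of the products listed; other systems may expect a different script interface.

```python
import shutil

# Exit codes follow the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a status code (thresholds are illustrative)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status_code, one-line message) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything we cannot measure is reported as UNKNOWN, not as a failure.
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {percent:.1f}% used"


def main():
    """Print the status line and return the exit code for the scheduler."""
    status, message = check_disk("/")
    print(message)
    return status
```

To run it as a standalone check, the script would end with `import sys; sys.exit(main())` so that the monitoring system sees the status as the process exit code.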
Based on our research and feedback from NANOG, we have put together a preliminary list of products to evaluate:
Hyperic http://www.hyperic.com/
OpenNMS http://www.opennms.org/wiki/Main_Page
opsview http://www.opsview.org/
Osmius http://www.osmius.net/en/
PandoraFMS http://pandorafms.org/
Zabbix http://www.zabbix.com/
Groundwork http://www.groundworkopensource.com/
Nagios http://www.nagios.org
Zenoss http://zenoss.com
OpManager http://www.manageengine.com
Orion http://www.solarwinds.com/products/orion/
BigBrother http://bb4.com/
Argus http://argus.tcp4me.com/
Xymon http://www.xymon.com
Spiceworks http://www.spiceworks.com/
ICINGA http://www.icinga.org
It's not the nanog mailing list, it's your own email client (and ours)
that keeps the threads intact. The mailing list simply forwards the
headers you send it.
Matthew Huff wrote:
Some of our requirements:
. Native agents for Windows 2003/2008, Linux, Linux x86_64, Solaris Sparc and Solaris x86_64. Either binaries or source code.
. Ability to send alerts via email, pager and/or snmp
. Monitoring of OS properties like memory, disk, cpu, etc...
. Ability to extend agents with scripting to allow monitoring of custom services
. Plug-in architecture for third-party add-ons
. Reliable Architecture
. Reasonable user interface
. Non-blocking polling
. Active Project (New Releases on regular basis and have existed for a reasonable period)
You probably have the list of the most commonly used. Each has good and bad points. A few of them, I believe, are limited in their use of agents and their support for external scripts. Several are considered Nagios on steroids, using a Nagios core wrapped in a bunch of other OSS. Several, like Zenoss, are particular about the primary monitoring system (though agents might run on any OS).
Jack
It's neither open source, nor free, but I moved from Nagios/Groundwork
to Solarwinds ipMonitor 9.
Solarwinds recently cut the price down to under $1000 for unlimited
monitors. Up until about a year ago, the unlimited license ran about
$5K.
So for a large nationwide environment like ours, our ROI was pretty
decent, but if you are only watching a dozen or two systems with maybe
ten monitors each, Nagios would be the best bet.
We've been using Ipswitch WhatsUp Gold for many years. Their recent improvements to the product have been mainly system monitoring stuff.
The product has grown in capabilities hugely since version 4 when we started with them (they are on version 12 now), and with that improvement in capabilities, the price has gone up a bit. It's still a whole lot less than most other options, however.
There isn't too much in the way of agents, but we've integrated a ton of proprietary systems with WhatsUp Gold via its SQL database back-end.
They also have fully scriptable monitoring as a standard feature now.
Anyways, thought I'd put in my two cents...
- Erik
May I also mention InterMapper from Dartware. It's a very low-priced solution. It doesn't do well with trending and graphing out of the box (use RRD to get data out of it), but I like the live maps, platform independence and ease of creating new probes. I try to use SNMP wherever possible, but it can pass parameters to an external script and analyze its output as well.
I would disagree; nagios is not limited to small systems... We're currently monitoring about 8500 services on 2834 routers with nagios quite successfully and have been doing so for nearly a decade now -- we started with Netsaint. With custom scripts receiving data from our inventory management system, Nagios config generation for 99% of the hosts is completely automated with only a handful of special cases that are hand-modified as needed. Our investment, both in initial/ongoing man-hours, hardware, etc is minimal so our ROI is decent too 
--
Marc
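For anyone wondering what generating Nagios configs from an inventory system can look like, here is a minimal sketch. The inventory records, their field names (`name`, `address`) and the `generic-host` template are assumptions for illustration only; the actual scripts described above are not shown in this thread and will certainly differ.

```python
def render_host(record, template="generic-host"):
    """Render one Nagios host definition from an inventory record.

    The record layout and the template name are hypothetical.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {record['name']}",
        f"    address    {record['address']}",
        "}",
    ]
    return "\n".join(lines)


def render_config(records):
    """Render all hosts, sorted by name so regenerated files diff cleanly."""
    ordered = sorted(records, key=lambda r: r["name"])
    return "\n\n".join(render_host(r) for r in ordered) + "\n"


# Hypothetical inventory export; the addresses are documentation ranges.
inventory = [
    {"name": "edge-rtr-02", "address": "192.0.2.2"},
    {"name": "core-rtr-01", "address": "192.0.2.1"},
]

config_text = render_config(inventory)
print(config_text)
```

The handful of hand-modified special cases would live in separate files that the generator never overwrites, so a regeneration run cannot clobber them.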
+1 / what he said
I auto generate my nagios configs from an in house asset management system as well. It works great. Monitoring over 1k devices. We built a custom reporting system around nagios as well.
+1 / what he said
I used it at the DNC; it worked like a charm!
Winn Johnston
Linux Systems Administrator
703 380 8666
I use Zabbix and Cacti primarily, but in the same situation - there
has to be better stuff out there.
You listed some that I wasn't familiar with, so I'll try those out.
I did use Hyperic. While it has a lot of features, its bulky UI really
detracts from its abilities.
I wish Zabbix had a bigger community and more developers. It has many
promising features, but needs SNMP and IPMI more tightly integrated.