Incident notification

Nanog list members,

I was looking at some statistics and noticed that we are sending out a massive number of SMS messages from our monitoring systems.
This left me wondering whether there is a better (and cheaper) alternative, something just as reliable but IP-based. We all have smartphones these days anyway.

Hence my question: what are you using to notify admins of incidents?

Kind regards,

Thijs Stuurman


The advantage of SMS is that it is out of band. Any SMTP or other IP-based solution requires a stable and working network environment, which may be exactly what the alert is trying to tell you is down.

PagerDuty for phone calls. It can do SMS as well, I believe.

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

> The advantage of SMS is that it is out of band. Any SMTP or other IP-based solution requires a stable and working network environment, which may be exactly what the alert is trying to tell you is down.

I do not worry so much about that; part of the monitoring solution is out of band for that reason.

Kind regards,
Thijs Stuurman

We use OpsGenie for notifications (and on-call scheduling, etc.). There are other similar options as well, such as PagerDuty.

Notifications can be submitted to the service in a variety of ways (email, web API, etc.). It has integrations with a variety of other tools (Nagios, Pingdom, etc.) to aggregate all of your alerts, and there is a callback mechanism that lets the user trigger custom actions right from the app. For example, I wrote an interface for it so that when we get an alert, the on-call person can choose to restart the affected service -- or even reboot the entire VM hosting it -- right from within the OpsGenie app.

Each user can choose their method of contact (notification to the smartphone app, SMS, phone call, email, whatever), and on-call schedules (and exceptions) are easily managed.
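
To illustrate the per-user contact preferences described above, here is a minimal sketch of how such routing could work. It is not OpsGenie's API; the names `notify_user`, `preferences` and `senders` are hypothetical, and each sender is assumed to return True when delivery succeeded.

```python
from typing import Callable, Dict, List, Optional

# A sender takes (user, message) and reports whether delivery succeeded.
Sender = Callable[[str, str], bool]


def notify_user(
    user: str,
    message: str,
    preferences: Dict[str, List[str]],
    senders: Dict[str, Sender],
) -> Optional[str]:
    """Try the user's contact methods in their preferred order.

    Returns the name of the first method that delivered the message,
    or None if every method failed or the user has no preferences.
    """
    for method in preferences.get(user, []):
        sender = senders.get(method)
        if sender is not None and sender(user, message):
            return method
    return None
```

With preferences such as `{"peter": ["push", "sms", "email"]}`, a failed push notification would fall through to SMS and then to email.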

It works for us... YMMV. ;)

- Peter

I know of a friend who is using Growl / Prowl to push notifications to their phones, and even to their TVs at home.

Sk.

Which is why you locate a small NMS outside your network (on a VM
somewhere) whose only job is to start alerting when it can't reach the NMS
inside your network. That also helps when your interior NMS system gets
gummed up or when a general emergency in your locality damages your
infrastructure at the same time as the SMS provider's infrastructure.
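
A rough sketch of such an outside watchdog, under assumptions of my own: reachability is tested with a plain TCP connect, an alert fires only after several consecutive failures, and the `ExternalWatchdog` class, the threshold and the notify callback are hypothetical names used for illustration only.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ExternalWatchdog:
    """Alert once the interior NMS has failed `threshold` checks in a row."""

    def __init__(self, probe: Callable[[], bool],
                 notify: Callable[[str], None], threshold: int = 3) -> None:
        self.probe = probe
        self.notify = notify
        self.threshold = threshold
        self.failures = 0
        self.alerting = False

    def tick(self) -> None:
        """Run one check; call this periodically, e.g. once a minute."""
        if self.probe():
            if self.alerting:
                self.notify("RECOVERY: interior NMS reachable again")
            self.failures = 0
            self.alerting = False
            return
        self.failures += 1
        # Alert only once per outage, not on every failed check.
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True
            self.notify(
                f"ALERT: interior NMS unreachable for "
                f"{self.failures} consecutive checks"
            )
```

In practice the probe would be something like `lambda: tcp_probe("nms.example.net", 443)` (a placeholder hostname), and the notify callback should use a path that does not depend on the monitored network.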

If your monitoring system is structured well to begin with, email has
efficacy comparable to SMS. A smartphone app expecting heartbeats via your
in-band infrastructure has effectiveness superior to both.
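
The heartbeat idea can be sketched as a simple staleness check on the receiving side: the app raises a local alarm when no heartbeat has arrived within a chosen window. This is only an illustration; `HeartbeatMonitor` and `max_silence` are hypothetical names, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Track heartbeats and report when they have gone silent."""

    def __init__(self, max_silence: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence = max_silence  # seconds of silence tolerated
        self.clock = clock
        self.last_seen = clock()

    def beat(self) -> None:
        """Record that a heartbeat was just received."""
        self.last_seen = self.clock()

    def is_stale(self) -> bool:
        """True if no heartbeat arrived within the last max_silence seconds."""
        return self.clock() - self.last_seen > self.max_silence
```

The point of the design is that silence itself becomes the alert, so a dead network or a dead monitoring server is noticed even though neither can send anything.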

Regards,
Bill Herrin

Pushover and email-to-SMS from both an in-band and an off-site monitoring VM.

Multiple Nagios servers sending directly via Amazon Web Services SES to
PagerDuty.

It is unlikely that SES would go completely down. The Nagios boxes monitor
each other from different continents.