SLA Monitoring

What do you guys use for monitoring of SLAs, be it an upstream or a downstream SLA? I know of a couple services, just looking to see who's doing what and how they like it.

We use various tools for monitoring here. Pingdom for external monitoring, Observium for internal and SmokePing for internal/external.

As far as Pingdom goes, we ended up paying for 3 years to lock in pricing because it keeps going up and the service doesn't improve, but it is useful. I need to find an alternative prior to the next renewal.

Hi Mike,

We have customers that use Cisco IP SLA or Juniper RPM to do the actual SLA test, then use an NMS to collect and report on that data.

So I suppose, depending on what you mean by monitoring, there are a few options:
1. Real-time graphing of data collected via SNMP
2. Proactive alerting based on threshold configuration
3. Reporting on SLA based on contractual obligations
4. Central provisioning of the individual tests
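
To make options 2 and 3 a bit more concrete, here is a minimal sketch in Python of checking collected probe results against a contractual threshold. Everything in it is an assumption for illustration: the function name sla_report, the use of None for a lost probe, and the idea that the SLA is expressed as "at least this fraction of probes answered within this many milliseconds". A real NMS will have its own data model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SlaReport:
    total: int             # number of probes sent
    lost: int              # probes with no reply (None)
    within_threshold: int  # probes answered at or under the threshold
    compliance: float      # within_threshold / total
    met: bool              # True if compliance >= target


def sla_report(rtts_ms: Sequence[Optional[float]],
               threshold_ms: float,
               target: float) -> SlaReport:
    """Summarise round-trip times against a hypothetical SLA.

    rtts_ms      -- one entry per probe, None meaning the probe was lost
    threshold_ms -- contractual RTT limit (assumed form of the SLA)
    target       -- required fraction of compliant probes, e.g. 0.999
    """
    total = len(rtts_ms)
    if total == 0:
        raise ValueError("no samples to report on")
    lost = sum(1 for r in rtts_ms if r is None)
    ok = sum(1 for r in rtts_ms if r is not None and r <= threshold_ms)
    compliance = ok / total
    return SlaReport(total, lost, ok, compliance, compliance >= target)


if __name__ == "__main__":
    # Five probes, one lost and one over a 50 ms limit.
    report = sla_report([10.0, 12.0, None, 55.0, 11.0],
                        threshold_ms=50.0, target=0.99)
    print(report)
```

The same summary can drive both uses: alert when met flips to False over a short window, and report compliance over the contractual period.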

I’m a little biased as I work for a monitoring company, but if you let me know what you mean by monitoring I can try to help out.

regards

Alan

Hey,

It might be useful to understand what type of data you are expecting
to get out. There is a bunch of kit out there: Accedian, Creanord,
Netrounds, JDSU, Exfo, Polystar...

Personally, the important things for me are:
  a) full-mesh, any pop to any pop
  b) high resolution, I want to know at least down to 10ms (100pps *
cos * pops, which may not be a trivial amount of traffic)
  c) multiple CoS for all paths
  d) ability to discriminate measurement by SPORT (to troubleshoot ECMP issues)
  e) 1us or better precision for 2-way jitter and latency
  f) good API to configure, to get data out, to get alerts out
  g) verify that received data is the same as what was sent (to find
out if the network has mangled bits) - this is a very rare feature for
some reason
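
To put a rough number on the traffic in a) through c), here is a back-of-the-envelope sketch in Python. It assumes one probe stream per direction, per PoP pair, per CoS, at 100pps (one probe every 10ms). The 100-byte packet size and the function name full_mesh_load are my own assumptions for illustration, not something any of the vendors above specify.

```python
def full_mesh_load(pops: int, cos_classes: int,
                   pps_per_stream: int = 100,
                   packet_bytes: int = 100) -> dict:
    """Estimate probe load for a full-mesh measurement setup.

    pops           -- number of PoPs, each probing every other PoP
    cos_classes    -- number of CoS classes measured on every path
    pps_per_stream -- 100 pps gives one sample every 10 ms
    packet_bytes   -- assumed on-wire probe size (hypothetical value)
    """
    if pops < 2 or cos_classes < 1:
        raise ValueError("need at least 2 PoPs and 1 CoS")
    directed_paths = pops * (pops - 1)        # A->B and B->A counted separately
    streams = directed_paths * cos_classes
    total_pps = streams * pps_per_stream      # network-wide probe rate
    per_pop_pps = (pops - 1) * cos_classes * pps_per_stream  # sent by one PoP
    return {
        "directed_paths": directed_paths,
        "streams": streams,
        "total_pps": total_pps,
        "per_pop_pps": per_pop_pps,
        "total_mbps": total_pps * packet_bytes * 8 / 1e6,
    }


if __name__ == "__main__":
    # Example: 20 PoPs, 4 CoS classes.
    for key, value in full_mesh_load(20, 4).items():
        print(f"{key}: {value}")
```

With 20 PoPs and 4 classes this comes to 7600pps sent per PoP and 152000pps network-wide, which is why the per-stream rate is worth choosing deliberately.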

1us or better precision basically rules out all virtualised setups,
because SR-IOV does not provide access to HW timestamping today. So
you'll need dedicated HW for it, and the vast majority of these shops
only offer HW timestamping in the upper range of their products.

Personally I like Creanord. In a previous life I worked with them and
found them to be a knowledgeable and reactive partner. They've recently
released new small/affordable boxes with HW timestamping. But they are
lacking in some departments: there is no data validity checking today,
and GUI creation of a full-mesh measurement is quite a chore as you
need to individually pick interfaces. The latter isn't such a big deal
for me, as I'd do it programmatically anyhow, but it may be a big deal
to others. I know that both are on the table to be fixed.

If precision and resolution are not important and you're happy to
write your own tooling to present the data and alert, you can probably
get away with CSCO IP SLA and/or JNPR RPM. A coworker of mine has
written a very convenient and high-performance IP SLA responder, so
that you don't have to buy several expensive Cisco boxes just to
respond to the queries: cmouse/ip-sla-responder on GitHub (Cisco IP-SLA
/ Juniper RPM responder).

LogicMonitor is an excellent all-inclusive SaaS solution:
https://www.logicmonitor.com/

Thousandeyes works well, but it's also really expensive.