Could you recommend any outage management software or NOC log book (preferably open source)?
I would like to get fresh ideas about the software available in this area.
The scenario below may explain what I am looking for:
One of the sites goes down and the monitoring team logs it. The technical staff follow up and find that something is wrong
at the site. Someone goes to the site, finds that there is a power failure, and fixes it. The monitoring team then sees the
site UP again and updates the log book with the recovery time and the reason (e.g. power failure).
One of the simplest reports from this system would be downtime per site/per reason.
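
To make the report concrete, here is a minimal sketch of what I have in mind. It is not taken from any existing tool; the Outage record, its field names, the downtime_report function and the sample entries are all hypothetical, made up only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One log-book line (hypothetical structure, for illustration only)."""
    site: str        # site name as written by the monitoring team
    start: datetime  # when the site was seen down
    end: datetime    # when the site was seen UP again
    reason: str      # cause entered on recovery, e.g. "power failure"


def downtime_report(entries):
    """Sum the downtime for every (site, reason) pair."""
    totals = defaultdict(timedelta)  # a missing key starts at zero duration
    for e in entries:
        totals[(e.site, e.reason)] += e.end - e.start
    return dict(totals)


# Invented sample data, not real outages.
log_book = [
    Outage("site-a", datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 30), "power failure"),
    Outage("site-a", datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 8, 7), "upstream provider"),
    Outage("site-b", datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 45), "power failure"),
    Outage("site-a", datetime(2024, 1, 9, 22, 0), datetime(2024, 1, 9, 22, 30), "power failure"),
]

report = downtime_report(log_book)
for (site, reason), total in sorted(report.items()):
    print(f"{site:8} {reason:20} {total}")
```

With this sample data the two power failures at site-a add up to two hours. A real tool would also need to handle outages that are still open (no recovery time yet).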
The ability to record a group outage, either manually or automatically based on network topology (e.g. the failure of a
core router in a city, which would cause several sites to fail), would also be useful.
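
The automatic case could work roughly like the sketch below. This is only an assumption about how such a feature might be built: the TOPOLOGY map, the node names and the affected_nodes function are hypothetical, and the sketch assumes every node has a single upstream path (no redundant links).

```python
from collections import deque

# Hypothetical topology: each node maps to the nodes that depend on it.
TOPOLOGY = {
    "core-router-city1": ["site-a", "site-b", "dist-switch-1"],
    "dist-switch-1": ["site-c"],
    "core-router-city2": ["site-d"],
}


def affected_nodes(topology, failed):
    """Return every node downstream of `failed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(topology.get(failed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already counted, skip duplicates
        seen.add(node)
        order.append(node)
        # Anything that depends on this node is affected as well.
        queue.extend(topology.get(node, []))
    return order


group = affected_nodes(TOPOLOGY, "core-router-city1")
print(group)
```

One log-book entry for the core router could then be expanded into one outage record per affected site, so the per-site downtime report stays correct.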
Zabbix allows you to acknowledge events with a comment.
OTRS seems to be a ticketing system. We are using RT (Best Practical)
as our ticketing system and our monitoring guys use RT to issue a
trouble ticket to our maintenance team.
Sometimes a problem is caused by our upstream provider and is
resolved in, for example, less than 7 minutes.
In all cases the monitoring staff log the start time, the type of
failure and the resolution time in their log-book. Later they try to
enter this data, including the affected sites, so that it can be used
to create many reports regarding site uptime.
So I'm looking for an application with a very easy and handy interface
to replicate their log book for outages.
I can create some custom fields in our RT to maintain these data, but
there are some problems:
1- not all of the incidents are recorded in our ticketing system,
because some have to be followed up by someone outside our system
2- some problems may get resolved in a few minutes, e.g. by a phone
call, so creating a ticket may not make sense
3- the interface of RT is not good enough to be used as a fast
log-book system for our outages