outage/maintenance window opinion

Trying to get clarification on an issue.

The maintenance/outage window is 2:00AM to 5:00AM. During the window, the router we are working on fails and does not come back online until 8:00AM.

From an outage reporting/documentation standpoint, is the outage start time 2:00AM or 5:01AM, since 5:01AM is when the maintenance window and planned outage were over?

My take is that the outage starts when the planned maintenance/outage window is over at 5:01AM.

Luke

Luke Parrish
Centurytel Internet Operations
318-330-6661


It depends.

If your device(s) were part of the change management notification, then
that's correct.

regards,
//virendra//

My opinion:

For the customer, the outage starts when their service stops working* and
ends when their service starts working again. Your goal should be to make
that all happen during the maintenance window. If it doesn't, then the part
that was during the window is "planned outage" and the part that wasn't is
"unplanned outage".
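A minimal sketch of that split, assuming the only inputs are the outage and window start/end times. The function name `split_outage` and the arbitrary calendar date are illustrative assumptions, not part of anyone's actual reporting tooling:

```python
from datetime import datetime, timedelta

def split_outage(outage_start, outage_end, window_start, window_end):
    """Return (planned, unplanned) durations for a single outage.

    The planned part is the overlap between the outage and the announced
    maintenance window; everything outside the window is unplanned.
    """
    overlap_start = max(outage_start, window_start)
    overlap_end = min(outage_end, window_end)
    # No overlap at all yields a negative difference, so clamp to zero.
    planned = max(overlap_end - overlap_start, timedelta(0))
    unplanned = (outage_end - outage_start) - planned
    return planned, unplanned

# The scenario from the original question: window 2:00AM-5:00AM,
# router down from 2:00AM until 8:00AM (the date is arbitrary).
day = datetime(2000, 1, 1)
planned, unplanned = split_outage(
    day.replace(hour=2), day.replace(hour=8),
    day.replace(hour=2), day.replace(hour=5),
)
print(planned, unplanned)  # 3:00:00 3:00:00
```

Under this reading, the example gets three hours of planned outage and three hours of unplanned outage.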

Good ISPs have good explanations, and sometimes even monetary credit,
for "unplanned outages". "Planned outages" can simply be explained by
pointing at the announced maintenance interval policy.

Matthew Kaufman
matthew@eeph.com

*Note that this can be different times for different customers, and "stops
working" means different things to different people... Some customers are
unhappy if their traffic is taking the slightly longer alternate path,
others are happy as long as they can reach CNN, even if the rest of the net
disappears.

I suspect that this depends rather entirely on the person who is
*looking* at your outage reports.

That is: if you're compiling them only for internal purposes, use
whatever policy you like. If someone else, like say, NERC, is the
intended audience, then they probably already have an answer to that
question.

My *personal* approach would be to use the end of the window, yes, but
I am not the person you're reporting to.

Cheers,
-- jra

Also, the possibility of equipment failure should *always* be factored into backout/recovery plans. You can have all the faith in your hardware that you want, but Murphy has enable/root.

Whether the answer is something as simple as having redundant capacity to shift the load to, or as drastic as having a spare chassis sitting on hand, failure is always a possibility, however remote.

- billn

Heya,

I disagree, as this entire event wasn't a planned outage. The "planned" part
was what you intended to do and, if it's anything like the maintenance reports
that I send and receive, you typically state how long you expect the impact
will be and that it will take place within your maintenance window. I'd argue
that you should start the clock ticking when the outage first happened and
then take off from that whatever you announced as the impact duration.

For example, if you said that the impact would be a ten-minute outage sometime
during your window from 2am to 5am and your outage started at 2am, I'd count
this as an unplanned outage starting from 2:10am. That's just my $0.02...
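As a rough sketch of that rule, assuming the announced impact is a single duration and borrowing the 8:00AM restore time from the original question (the function name `unplanned_portion` and the arbitrary date are illustrative only):

```python
from datetime import datetime, timedelta

def unplanned_portion(outage_start, outage_end, announced_impact):
    """Return (unplanned_start, unplanned_duration).

    The clock starts when service first drops; the announced impact
    duration is then subtracted. An outage shorter than the announced
    impact has no unplanned portion.
    """
    unplanned_start = min(outage_start + announced_impact, outage_end)
    return unplanned_start, outage_end - unplanned_start

# Announced impact: ten minutes. Outage: 2:00AM to 8:00AM (date arbitrary).
day = datetime(2000, 1, 1)
start, duration = unplanned_portion(
    day.replace(hour=2), day.replace(hour=8), timedelta(minutes=10)
)
print(start.strftime("%H:%M"), duration)  # 02:10 5:50:00
```

Note that the maintenance window itself never enters this calculation; only the announced impact duration does.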

On another note, you had a 3 hour window and a 6 hour outage. It sounds like
someone didn't seriously consider the "back out" part of your change management
planning. You really should have that as part of your process and have a hard
deadline within the window after which you revert the network to its previous
state.

Eric :)

To a small degree, it depends on how long you anticipated the outage to be. Were you expecting a three-hour tour^h^h^h^houtage, or something shorter but opened a big window to give you flexibility on when to do it? I would say that a fifteen-minute expected impact means the outage started at 2:15AM (or fifteen minutes after your work interrupted services).

My $0.005,

pt

The event I stated in my first email was an example, not an actual incident.

Of the 30+ emails I have received, I think 2 responses said I should start my SLA credits and outage minutes from the beginning of the window; the rest feel the outage minutes start ticking when the planned outage is over...

Regarding Change Management procedures, we do have hard deadlines for backing out, verification, etc. But you are right...

luke

In this situation we were expecting to be done for the majority of the maintenance window, but yes, I see your point. However, I block out a 3 hour window for maintenance because the activities I am performing on the network could easily cause a longer service outage than planned, as we all know. So if I plan for a 4 hour window but only expect 20 minutes of downtime, and that actually turns into 3 hours, then as long as it is inside the specified maintenance window it should not count against outage minutes. It was done in the window for a reason...

??
Luke