Never push the Big Red Button (New York City subway failure)

NEW YORK CITY TRANSIT RAIL CONTROL CENTER POWER
OUTAGE ISSUE ON AUGUST 29, 2021
Key Findings
September 8, 2021

Key Findings
[...]

3. Based on the electrical equipment log readings and the manufacturer’s official assessment, it was determined that the most likely cause of RCC shutdown was the “Emergency Power Off” button being manually activated.

Secondary Findings

1. The “Emergency Power Off” button did not have a protective cover at the time of the shutdown or the following WSP investigation.

[...]
Mitigation Steps

1. Set up the electrical equipment Control and Communication systems properly to stay active so that personnel can monitor RCC electrical system operations.

[...]

Reminds me of something that happened about 25 years ago when an elementary school class visited the data center of the insurance company where I worked. One of our operators strategically positioned himself between the kids and the mainframe, leaned back and hit its EPO button.

Matthew Huff | Director of Technical Operations | OTA Management LLC

Office: 914-460-4039
mhuff@ox.com | www.ox.com

Or when your building engineering team cuts themselves a new key for the ‘main breaker’ for the facility… and tests it at 2pm on a Tuesday.
Or when that same team cuts a second key (gotta have 2 keys!) and tests that key on the same ‘main breaker’… at 2pm on the following Tuesday.

Not fake news, but a real story from a large building full of gov’t employees and computers and all manner of ‘critical infrastructure’ for the agency occupying said building.

True EPO story: a maintenance crew carrying new drywall into the data center backed into an EPO that didn’t have a cover on it. One of the most eerie sounds in networking… a completely silent data center.

-chris

Since we are telling power horror stories…

How about the call from the night operator that arrived at 10:00pm asking “Is there any reason there is no power in the data center?”

Turns out someone had plugged a new high-end workgroup laser printer into the outside wall of the datacenter. The power receptacle was wired into the data center’s UPS, and the printer completely smoked the UPS. Luckily the static transfer switch worked, but the three mainframes weren’t happy…

Or

Our building had a major ground fault issue that took years to find and resolve. We got hit with lightning that caused the mainframe to fault and recycle…and two minutes in, we got hit by lightning again. When the system failed to start, we called IBM support. When we explained what happened there was a very long pause…then some mumbling off phone, then the manager got on the line and said someone would be flying out and be onsite within 12 hours. We were down for 3 days, and got fined $250,000 by the insurance regulators since we couldn’t pay claims.

Matthew Huff | Director of Technical Operations | OTA Management LLC


Aka "molly-guard".

https://en.wiktionary.org/wiki/molly-guard

In the early 2000s a friend of mine worked for a company in NYC that provided stock feeds to large banks and brokerages and similar. They’d ship a (locked) cabinet full of stuff to their customers, complete with an Ethernet cable stickin’ out the back. Customer would plug this into their network and, um, do whatever it is stock people do. There was some horrendously expensive SLA attached, and so they outsourced support to one of the managed services companies so that they could provide 24x7x2hour response all over the country.

One day, one of their largest customers, a large bank, also in NYC, is down. This means that the brokerage arm is unable to do any trades, and so is, um, annoyed. Support rushes over to the customer and “fixes it”. My friend doesn’t really get a good explanation of how it got fixed, but, meh, it’s working, so all good. A few weeks later, same thing - customer devices disappear from monitoring, smart-hands/support rush over and fix it, no useful RFO. This happens a few more times, and everyone is getting increasingly annoyed.

Eventually my friend arranges it so that he gets paged at the same time as the support provider. Pager goes off, friend jumps in a cab to the customer. He arrives at the same time as the smart-hands person, who, oddly, is clutching 1: an Ethernet face-plate and 2: a punch-down tool. Somewhat mystified, my friend follows the support person to where the cabinet is located. Because it’s important and special, but not actually bank owned, it cannot live in their data-center… and so it is located in the corridor, just outside the server room.

Because of where the Ethernet cable comes out the back of the cabinet, and where the wall jack is, there is basically no slack. When someone goes in or out, especially if they are wheeling a cart or carrying a box of equipment, they bang into the cabinet, which slowly rolls away – ripping the wall jack off the wall, and the cable out the back of the jack. Support’s “solution” to this has been to punch down the cable onto a new wall jack, screw it back onto the wall, wheel the cabinet back into place, and call it fixed.

My friend screwed down the cabinet feet, so it wasn’t resting on the wheels any more, replaced the 6ft Ethernet cable with a 15ft one, and the issue never recurred :-P

W

A nearby datacenter once had a delayed power outage because someone hit the switch to transfer from city power to generator power and then failed to notice. The power went out the next day, when the generator ran out of fuel.

:-)
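
For what it's worth, the generator story comes down to two things nobody was watching: the fact that the load was on the generator at all, and how long the tank would last. Here is a rough Python sketch of that check. Everything in it is hypothetical (the `GeneratorStatus` fields, the 12-hour threshold, the alert wording); it assumes you already have some way to read the transfer switch position and the tank level, which the thread does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneratorStatus:
    # All three fields are assumed inputs from whatever monitoring exists.
    on_generator: bool           # True once the load has been moved to the generator
    fuel_liters: float           # fuel currently in the tank
    burn_liters_per_hour: float  # consumption at the present load


def hours_remaining(status: GeneratorStatus) -> Optional[float]:
    """Estimated runtime left, or None while the load is on utility power."""
    if not status.on_generator:
        return None
    if status.burn_liters_per_hour <= 0:
        raise ValueError("burn rate must be positive while on generator")
    return status.fuel_liters / status.burn_liters_per_hour


def check(status: GeneratorStatus, warn_below_hours: float = 12.0) -> List[str]:
    """Return alert messages; an empty list means nothing to report."""
    remaining = hours_remaining(status)
    if remaining is None:
        return []
    # Being on generator is itself worth an alert, even with a full tank.
    alerts = ["ALERT: load is on generator power"]
    if remaining < warn_below_hours:
        alerts.append(f"ALERT: about {remaining:.1f} h of fuel left at the current burn rate")
    return alerts
```

Calling `check(GeneratorStatus(True, 2000.0, 100.0), warn_below_hours=24.0)` yields both alerts, since 2000 / 100 is 20 hours. The point of alerting on the transfer itself, and not only on low fuel, is that the failure in the story started a day before the tank ran dry.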

A story, told to me by a friend…

The utility let them know that they were going to be doing some maintenance work in the area. No impact expected, but out of an abundance of caution, they transfer over to generators. After the utility lets them know that the maintenance work is all finished, they want to switch back. If the generators are “emergency power”, and you need to switch back to “utility power”, obviously the way to do this must be the big red button, clearly marked as “EMERGENCY POWER OFF”, no?!

I suspect it is apocryphal, but it’s still entertaining,
W

I don't even *do* datacenter for a living, and I know that when you hit the
Molly button,

1) A Klaxon goes off in the Data Center -- one that sounds *different* from
the Halon Klaxon, in both cadence and tone (just for a couple bursts), and

2) Yellow rotating beacons turn on, and stay on while you're on Emergency Power.

Yes, real honest-to-ghod *rotating mechanical beacons*, none of this flashing LED
crap.

Clearly, it's important that the use of Emergency Power be annoyingly noticeable.

Cheers,
-- jra

Now I’m curious… in all of the DCs and COs I’ve worked in - to the best of my knowledge, I haven’t personally tested this! - the EPO button does not switch to emergency power. It turns off ALL equipment power in the space - no lights, no klaxons, nothing. In simpler setups, the EPO is connected to the UPS, so anything plugged in to the UPS goes dark instantly. In one DC I’m familiar with, the EPO switch kills all the UPS output and uses several relays to kill commercial power at the same time.

In some, the room lights were not covered by the EPO switch, in some they were. Emergency exit lamps will continue to be lit, as they have internal batteries, and are required by building/fire code.

Is it (somewhat) common for an EPO switch to only disconnect commercial power and leave local redundant power live? What sort of facilities would have this?

-Adam

It was always my understanding EPO was to be used for “We have an electrical fire and need to remove the source RFN”, not “we need to be on the redundant power instead of city power and don’t want to wait for the automatic transfer”.

That's my understanding as well. Not necessarily the room lights depending on the facility, but all equipment power. To be used if the space is on fire or someone is in the process of being electrocuted.

I've never seen a klaxon or audible alarm connected with EPO. Things just get very quiet.

Hi Daniel,

That's correct. I'm not sure what Jay was on about, but the EPO button
kills power to everything that would otherwise be protected from a
building power failure. There's generally no warning; you know it
happened from the rapid silence.

I've also never seen warning lights that the facility is on emergency
power. It's probably a good idea but I've never seen it.

Regards,
Bill Herrin

Well, there is the EPO button, which generally does that, and the (variously labeled) HALON/FM-200/GAS FIRE SUPPRESSION/GAS DISCHARGE button, which does the flashy lights and klangly bell and similar. This is fairly much always required by code, to give people time to evacuate before the gas dumps and they suffocate. People often refer to both of these as EPOs (or “the buttons that must not be pressed unless you have a REALLY good reason.”).

When I grew up (in South Africa), Halon/BCF was still in active use. When there was a fire (or you pressed and held the big red HALON button) a siren would sound and lights would flash for a few seconds to allow everyone time to evacuate the machine room.
I’m assuming that things are now less stupid, but at the local University, the BCF was stored in large gas bottles, with a pyrotechnic valve to release it. The pyrotechnic charge was initiated with LA/LS (Lead Azide/Lead Styphnate) hot-wire initiators, which were supposed to be replaced every 2 years as part of some maintenance schedule - when LA/LS ages, especially in the presence of humidity, it apparently can form a much more sensitive crystal structure, which is very shock sensitive.

The system was installed in the 1960s, and the initiators were replaced once or twice. Eventually, however, with sanctions, especially on things that can be made to go boom, it became hard to get replacements, and so they stopped replacing them… and eventually forgot about them … right up until sometime in the early 1990s, when someone accidentally knocked into the bottles with a loaded equipment cart.
By this time the initiators had become sufficiently old and ornery that they decided that they’d had enough, and set off the pyro charges, which dumped all of the Halon into the room.

Luckily everyone survived, but IIRC, two people passed out before making it to the door, and someone had to rush in and pull them to fresh air. The added gas pressure also cracked the big glass window (what’s the point in having a big mainframe with flashy lights and spinning tapes if you cannot show it off?), and also caused a few head-crashes.

W

If the generators are “emergency power”, and you need to switch back to “utility power”, obviously the way to do this must be the big red button, clearly marked as “EMERGENCY POWER OFF”, no?!

The owner of my previous company did the same thing to us many years ago because there was a small smudge on the placard between POWER and OFF that he interpreted as a dash.

He was never happy with the custom sign I hung after that, REVENUE REDUCTION SWITCH. But he never tried to be helpful after that, so mission accomplished.

No... I just hadn't had my coffee yet that morning and I crossed the streams.

That should be the response to the *ATS cutover*, not the Molly switch.

If someone hits the Molly button, you don't *need* an alarm. :-}

Cheers,
-- jra
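
To pin down the two behaviours that got crossed in this subthread, here is a toy Python model. It is only a sketch of what people described above, not how any real ATS or EPO circuit is wired: the `Room`, `ats_cutover` and `epo` names are made up, and it assumes the beacon is fed from the same power the EPO cuts.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    UTILITY = "utility"
    GENERATOR = "generator"
    NONE = "none"


@dataclass
class Room:
    source: Source = Source.UTILITY
    equipment_powered: bool = True
    beacon_on: bool = False


def ats_cutover(room: Room) -> Room:
    """Move the load to the generator. Equipment stays up; the beacon makes it noticeable."""
    if room.source is Source.NONE:
        return room  # EPO already tripped, there is nothing to transfer
    room.source = Source.GENERATOR
    room.beacon_on = True
    return room


def epo(room: Room) -> Room:
    """Emergency Power Off: remove every source. No alarm, things just get quiet."""
    room.source = Source.NONE
    room.equipment_powered = False
    room.beacon_on = False  # assumption: the beacon loses power too
    return room
```

After `ats_cutover(Room())` the equipment is still powered and the beacon is on; after `epo(Room())` nothing is powered at all, which matches the "rapid silence" described earlier in the thread.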

One of the many stories that came out of 9/11 was a switching center in NY City that had a diesel generator as a power backup - which of course acted as primary while the city power was off. After a few days of operation, it needed to be refueled, so a truck was sent in carrying gasoline. The generator was refueled and restarted, and - oops - diesel != gasoline. So then they needed to bring in a new generator.

Yup, it happens, and it happened.

Oooof. I’ve seen someone at a gas station do something similar – I cannot remember if it was putting diesel in their gasoline car, or gas in their diesel pickup, but I do remember the sudden yelp and look of dismay when they suddenly realized what they were doing. It must be really easy to get wrong in a car (operating on autopilot), but that’s a much less bad failure than a generator…

Anyway, refueling generators reminds me of: https://www.mail-archive.com/nanog@nanog.org/msg111947.html

W

The diesel nozzle has a larger diameter than the gasoline one. It
doesn't fit in the filler neck of a normal gasoline-powered car.

Regards,
Bill Herrin