[fwd] Rats take down Stanford ...

A follow-up thought on redundancy issues.

- paul


Date: Mon, 21 Oct 1996 12:54:05 -0700 (PDT)
From: risks@csl.sri.com
Subject: RISKS DIGEST 18.54


Date: Fri, 18 Oct 96 11:03 EST
From: William Hugh Murray <0003158580@mcimail.com>
Subject: Re: Rats take down Stanford ... (RISKS-18.53)

PGN's request for redundancy brings to mind the story of the infrastructure
computer center in Trumbull, Connecticut. It is an old story but bears
repeating.

Seems that a squirrel got into a transformer and brought down the external
power supply. The UPS kicked in, engine generators came on line, and the
center operated in this mode for about an hour and a half. At the end of
that time the external power was restored. The external power, the UPS, and
the engine generators went into a deadly embrace. The whole thing came down
and would not come back up.

I take two lessons from this. First, redundancy adds some complexity and a
lot of redundancy adds a lot of complexity. At some point the redundancy
begins to introduce failure modes and failure events that would not have
existed in its absence. There is an upper bound to such redundancy.

Second, test redundant systems through to resumption of normal operations.
In this case, the operators had tested to ensure that the redundant systems
would come online in the event of a failure of the primary system. They had
not tested to see what would happen when the primary system was restored to
normal operation.
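
The gap between the two kinds of test can be shown with a toy model. This is
only a sketch under stated assumptions, not a description of the Trumbull
installation: the PowerSystem class, its failback_works flag, and both test
functions are hypothetical names invented for illustration. The flag stands in
for any defect that only appears when the primary source returns.

```python
from enum import Enum


class Source(Enum):
    UTILITY = "utility"
    GENERATOR = "generator"
    NONE = "none"  # load is dark


class PowerSystem:
    """Hypothetical toy model: utility power plus a backup generator."""

    def __init__(self, failback_works=True):
        # Assumption: failback_works models a defect in the transfer back
        # to utility power; it is not taken from the original story.
        self.failback_works = failback_works
        self.source = Source.UTILITY

    def utility_fails(self):
        # Failover path: the generator picks up the load.
        self.source = Source.GENERATOR

    def utility_restored(self):
        # Failback path: either the load returns to utility power,
        # or the sources contend and the load is dropped.
        if self.failback_works:
            self.source = Source.UTILITY
        else:
            self.source = Source.NONE


def failover_only_test(system):
    """Checks only that the backup comes online after a primary failure."""
    system.utility_fails()
    return system.source is Source.GENERATOR


def full_cycle_test(system):
    """Checks failover and then the return to normal operation."""
    system.utility_fails()
    if system.source is not Source.GENERATOR:
        return False
    system.utility_restored()
    return system.source is Source.UTILITY


if __name__ == "__main__":
    # A system with a broken failback passes the first test, fails the second.
    print(failover_only_test(PowerSystem(failback_works=False)))
    print(full_cycle_test(PowerSystem(failback_works=False)))
```

In this model a system with a failback defect passes the failover-only test
and is caught only by the test that runs through to resumption.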

Who would have even thought about it? I confess that I would not have.

William Hugh Murray, New Canaan, Connecticut



As a rule we exercise our generator under load every Thursday at 1000hrs.
I would like to do this monthly, but the automatic exercise mechanism
cannot deal with 30 day intervals. In addition, once every 6 months, we
simulate a power outage by switching off and on the main breaker at the
service entrance during a traffic lull period.


Patrick J. Chicas
Email: pjc@unix.off-road.com
URL: http://www.Off-Road.com