More on Moscow power failure (was RE: Moscow: global power outage)

More from MosNews:

UES Management Faces Criminal Investigation After Moscow Power Cut

Russian prosecutors on Wednesday opened a criminal case against the management of power monopoly Unified Energy System (UES) after a major power outage in Moscow, agencies reported Wednesday.

The case was opened to investigate possible negligence, the Interfax agency quoted the Prosecutor General's Office as saying.

Under Russian law, prosecutors must formally open a criminal case to allow police fully to investigate the incident. It does not necessarily lead to prosecution, Reuters reports.

President Vladimir Putin has already blamed UES Chief Executive Anatoly Chubais for the power cut which left much of the capital without power, saying management had neglected the company's problems to concentrate on a restructuring plan.

Chubais, a leading political liberal who is spearheading the reform of the electricity giant, is viewed with suspicion by Kremlin hardliners, Reuters adds.

http://mosnews.com/news/2005/05/25/chubaiscriminalcase.shtml

- ferg

More from MosNews:

This is a publication run by right-wing expatriate American
businessmen with an ax to grind.

President Vladimir Putin has already blamed UES Chief Executive
Anatoly Chubais for the power cut which left much of the capital
without power, saying management had neglected the company's
problems to concentrate on a restructuring plan.

And Anatoly Chubais said:
   Here it is not possible to be of two minds, RAO "UES Russia"
   and I, personally as management, are responsible for the
   accident. All of this is on our conscience, nobody is going
   to shift the responsibility, but the basic actions are now
   directed towards restoration of service and on blocking
   the development of more failures.

Would the CEO of your company be as forthright, even in today's
post-Enron, post-Worldcom world?

The technical roots of the failure have been blamed on equipment
dating from the 1958-1962 timeframe which wasn't kept repaired and
up to date. This is reminiscent of the Comair systems failure
http://www.cio.com/archive/050105/comair.html
as well as the 9-11 problems with emergency responder preparedness,
the Challenger O-ring failure and the Columbia foam incident.

How many old and forgotten devices and systems are doing mission
critical jobs in your network?

--Michael Dillon

The Russian media have lots of details about the power
outage, but the general media hardly mentions the fact
that there was a disruption of Internet service.

The MSK-IX web page still has no news about the incident
and no explanation as to why they shut down.

Here is one Russian article that covers the shutdown
http://www.webplanet.ru/news/internet/2005/5/25/shit_happens.html
but they are scratching their heads as well. They
say it is completely incomprehensible why MSK-IX was
shut down because there should have been a reserve
generator system in place. According to them, during
normal times 80% of Russian Internet traffic passes
through MSK-IX. Traffic did get rerouted to alternate
international routes; however, they became clogged up
because the major international routes all rely on
MSK-IX.

Major Russian websites maintained power at their own
data centres but that didn't help when most of their
traffic goes through MSK-IX.

The article summarizes by saying that the chief problem
today is that there is no alternative Internet exchange
in Moscow, which means that it is easy to cut off
Moscow from the Internet, even easier than one might
have thought it would be.

To that, I would add that Russia's entire telecoms
infrastructure is still too highly centralized in
Moscow. Even in a small country like England, we are
moving away from centralizing everything through the
capital city.

Yet another lesson in how a single-point-of-failure
is just plain bad design which *WILL* bite somebody
in the end.
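
A rough way to check a design for this kind of weakness is to remove
each node in turn and see whether the rest can still reach each other.
The sketch below does that by brute force. The topologies and names
(IX-1, IX-2, ISP-A and so on) are made up for illustration and do not
describe the real Moscow network.

```python
from collections import deque

def reachable(graph, start, removed):
    """Return the set of nodes reachable from start while `removed` is offline."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer != removed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

def single_points_of_failure(graph):
    """List nodes whose removal leaves the remaining nodes disconnected."""
    spofs = []
    for candidate in graph:
        remaining = [n for n in graph if n != candidate]
        if len(remaining) < 2:
            continue
        # If a search from any surviving node cannot reach all survivors,
        # the candidate was holding the network together.
        if len(reachable(graph, remaining[0], candidate)) < len(remaining):
            spofs.append(candidate)
    return spofs

# Hypothetical topology: every network peers at one exchange only.
hub_and_spoke = {
    "IX-1": ["ISP-A", "ISP-B", "ISP-C"],
    "ISP-A": ["IX-1"],
    "ISP-B": ["IX-1"],
    "ISP-C": ["IX-1"],
}

# Same hypothetical networks with a second, independent exchange.
two_exchanges = {
    "IX-1": ["ISP-A", "ISP-B", "ISP-C"],
    "IX-2": ["ISP-A", "ISP-B", "ISP-C"],
    "ISP-A": ["IX-1", "IX-2"],
    "ISP-B": ["IX-1", "IX-2"],
    "ISP-C": ["IX-1", "IX-2"],
}

print(single_points_of_failure(hub_and_spoke))
print(single_points_of_failure(two_exchanges))
```

This only models connectivity. It says nothing about capacity, so a
surviving path can still clog up under rerouted traffic.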

--Michael Dillon

Finally, a bit more info found in this Russian
article http://www.comnews.ru/index.cfm?id=15645

According to the director of a well-known but
unnamed Russian telecoms company, there were no
diesel generators at MSK-IX. They had 3 external
power feeds which all failed at once due to the
cascading failure. UPS systems lasted from
one-half to two hours. He says that they learned
the lesson that they need to build a few
distributed and technically independent exchanges
even in the capital, Moscow.

Some background on the power failure.
It started with a fire in old equipment which caused
a major power station to shed load and shut down
in the middle of the night. As the sun rose and
Moscow's power demands grew, this initiated a
cascading failure which spread 200 km south.
However, it did not affect most of the northern
half of the city. It did not affect the military,
which switched to its own generators. This is rather
important considering that this "military" is responsible
for roughly half of the serious nuclear weapons arsenal
in the world. The military brought out its portable
generators to support hospitals, so it would appear that
not all hospitals had independent backup power.
In Southern Moscow, much of the cellular telephone
service also failed. In one of the regional towns
a chemical factory released a cloud of nitrogen oxides
which caused the population to panic and begin evacuation
because in that town even the landlines had failed.

After a lot of work, most power stations were back online
this morning. There were only 400 apartment buildings
with no power compared to thousands yesterday. The damaged
station where the fire occurred is still not functioning
and some backup power generation is still in place. The
metro is running but some suburban electrical train lines
are still shut down.

All in all, this was a remarkable event. The causes were
identified so quickly. They recovered from the outage so
quickly. The country's major Internet exchange was shown
to be remarkably short-sighted.

--Michael Dillon

It's not clear to me that MSK-IX shut down entirely, although it does look like it took a major hit. While I see that most of our MSK-IX sessions came up around 2 days 3 hours ago, we have at least one that has been up for 4 weeks, suggesting that at least part of one of the switch fabrics stayed up throughout.
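
The reasoning above is just subtraction: a session whose uptime reaches back past the start of the outage cannot have been reset by it. A minimal sketch of that check follows. The peer names, observation time and outage start time are placeholders chosen for illustration, not measured values.

```python
from datetime import datetime, timedelta, timezone

def sessions_surviving(uptimes, observed_at, outage_start):
    """Return peers whose session was established before the outage began."""
    return sorted(
        peer
        for peer, uptime in uptimes.items()
        if observed_at - uptime < outage_start
    )

# Placeholder timestamps, not taken from any real log.
observed_at = datetime(2005, 5, 27, 12, 0, tzinfo=timezone.utc)
outage_start = datetime(2005, 5, 25, 6, 0, tzinfo=timezone.utc)

# Hypothetical session uptimes as a router might report them.
uptimes = {
    "peer-a": timedelta(days=2, hours=3),  # came up after the outage began
    "peer-b": timedelta(weeks=4),          # predates the outage
}

print(sessions_surviving(uptimes, observed_at, outage_start))
```

A session that survived only shows that the path to that one peer stayed up. It does not show that the whole exchange did.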

The F root nameserver in Moscow is colocated with RIPN. Neither of the nameservers in the F-root cluster there shows signs of power failure, in case it helps anybody else here to know of a site in Moscow that has functional power supply protection.

F-root traffic graphs in Moscow suggest local impact was limited to a 5-6 hour window ending around midnight Tuesday UTC.

Joe

RIPN and Relcom were not affected, except at their M9 colocations. They had, in
theory, backup connectivity through another node, but I am not sure whether it
really worked.