United Airlines is Down (!) due to network connectivity problems

http://www.reuters.com/article/2015/07/08/us-ual-flights-idUSKCN0PI1IX20150708

At least, that's what I just heard on the radio. I know no other details.

Regards
Marshall Eubanks

Lifted as of 0920 EDT.

<http://www.foxnews.com/us/2015/07/08/united-airlines-flights-in-us-grounded-due-to-computer-issues/?intcmp=latestnews>

Hmmm,

Wall Street Journal and NYSE both down….

WSJ has a static page up…

DDOS ???

All completely coincidental networking issues, not related to anything
malicious.

- - ferg

- --

TTFN, patrick

- --
Paul Ferguson
PGP Public Key ID: 0x54DC85B2
Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2

And now trading has been halted at the NYSE.

http://www.npr.org/sections/thetwo-way/2015/07/08/421153353/trading-halted-on-new-york-stock-exchange

Again undisclosed technical issue

Once is happenstance
Twice is coincidence
Three times is enemy action…

Seriously, this could all be just everyone having a bad day. On the other hand, the WSJ has to deal with DOS/DDOS all the time, and when the NYSE has issues, it’s normally on a Monday.

NYSE: "The issue we are experiencing is an internal technical issue
and is not the result of a cyber breach."

https://twitter.com/NYSE/status/618818929906085888

United Air statement CNBC: “An issue with a router degraded network
connectivity for various applications. We fixed the router."

https://twitter.com/barronstechblog/status/618816643821633536

- - ferg

It's important not to form an opinion too early, especially for anyone involved with forensic analysis of these systems. This is a classic fault in amateur investigation: an early opinion will lead you into confirmation bias, irrationally accepting data that agrees with your opinion and rejecting data that contradicts it.

-mel beckman

Given that the Internet is held together with paper clips, baling
twine, and bubblegum, I'd prefer to take these organizations' initial
word for the fact that there is nothing obviously malicious in these
outages.

The mainstream press, on the other hand, seems to want it to be a hack
or data breach or... something other than a "glitch". :)

- - ferg

Given that the technical resources at the NYSE are significant and the lengthy duration of the outage, I believe this is more serious than is being reported. OTOH, since the market is now mostly decentralized and instruments are multiply listed, the impact of an NYSE outage is much less serious than it used to be.

I think you are overestimating the technical resources at NYSE.

My personal, totally zero-info suspicion:

Some chuckleheaded NOC banana-eater made a typo, and discovered an entirely new
class of wondrous BGP-wedgie style "We know how we got here, but how do we get
back?" network misbehaviors....

(Such things have happened before - like the med school a few years ago that
extended their ethernet spanning tree one hop too far, and discovered that
merely removing the one hop too far wasn't sufficient to let it come back up...)

I did say significant…not brilliant :)

Still, it’s possible that Valdis is correct, something got changed that wasn’t easy to undo. Might be a combination of network/software changes that will require significant overnight downtime.

> Given that the technical resources at the NYSE are significant and
> the lengthy duration of the outage, I believe this is more serious
> than is being reported.

We don't know how long the underlying problem lasted, and how much of
the continued outage time is dealing with the logistics of restarting
trading mid-day. Completely stopping and then restarting trading
mid-day is likely not a quick process even if the underlying technical
issue is immediately resolved.

> (Such things have happened before - like the med school a few years ago that
> extended their ethernet spanning tree one hop too far, and discovered that
> merely removing the one hop too far wasn't sufficient to let it come back up...)

No, but picking a bridge in the center, giving it priority sufficient
for it to become root, and then configuring timers[1] that would
support a much larger than default diameter, possibly followed by some
reboots, probably would have.

From what has been publicly stated, they likely took a much longer and
more complicated path to service restoration than was strictly
necessary. (I have no non-public information on that event. There may
be good reasons, technical or otherwise, why that wasn't the chosen
solution.)

     -- Brett

[1] You only have to configure them on the root; non-root bridges use
what root sends out, not what they have configured.
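As a rough illustration of "timers that would support a much larger than default diameter": the sketch below uses the rule-of-thumb formulas commonly given in vendor STP tuning guidance, plus the permitted ranges and consistency rules from IEEE 802.1D. None of this comes from the thread itself, the function name stp_timers is made up for the example, and real gear may round or clamp differently, so treat it as a sanity check rather than a recipe.

```python
import math


def stp_timers(diameter: int, hello: int = 2) -> dict:
    """Suggest root-bridge STP timers (seconds) for a given network diameter.

    Assumed rule-of-thumb formulas (not taken from the thread):
        max_age       = 4*hello + 2*diameter - 2
        forward_delay = (4*hello + 3*diameter) / 2, rounded up
    """
    if diameter < 2:
        raise ValueError("diameter must be at least 2 bridges")

    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)

    # 802.1D permitted ranges: max age 6-40 s, forward delay 4-30 s.
    if not (6 <= max_age <= 40 and 4 <= forward_delay <= 30):
        raise ValueError(f"diameter {diameter} needs timers outside 802.1D limits")

    # 802.1D consistency rules between the three timers.
    if not (2 * (forward_delay - 1) >= max_age >= 2 * (hello + 1)):
        raise ValueError("timer combination violates 802.1D relationships")

    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


# Diameter 7 reproduces the familiar defaults; larger diameters need more.
for d in (7, 10, 17):
    print(d, stp_timers(d))
```

Per the footnote above, values like these would only need to be set on the root bridge, since the non-root bridges adopt whatever the root advertises.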

I've noticed there are days when different networks that have no links
with each other all become faulty. It magically happens. We usually stop
all our planned work on those days.

UA, WSJ /and/ NYSE all in the same day?

Once is an accident; twice is a coincidence...

Three times is enemy action.

Traders on the floor are being told that it’s a software glitch from new software that was rolled out Tuesday night. Nothing official has been said. The only thing I know for sure is that if the NYSE was hacked, they wouldn’t tell anyone the details for a long time, if ever.

The impact of the NYSE being down is much less significant than it used to be since most stocks are multiple-listed on other exchanges.

The lack of information through official channels is unusual, though. In previous situations, there has been at least a little hand-holding. So far, nada. In fact, other than financial service providers’ emails, there have been no emails so far today from the NYSE, including the announcement of resumption of service. According to the NYSE web page, trading will resume at 3:05pm EST today with primary specialist, and 3:10 for everyone.

I’m with Ferg-dog.

I can’t tell you the number of times someone (yes, including me) has designed, purchased, and installed a system with multiple backups, failovers, redundancies, etc., and some vital piece fails in a weird way which sends the whole thing into a tailspin.

Take UA as an example, since that is where we have the most information (FSVO “most”): it was a “bad router”. Let’s assume they had multiple routers configured with VRRP, BGP, OSPF, and an alphabet soup of other ways to detect and route around failures. Now further assume one of those routers has a software or hardware bug which doesn’t take the router out of service, but leaves it up, replying to pings, answering SNMP polls, speaking BGP or OSPF, sending VRRP hellos, etc., etc. - but also eats half of all packets going _through_ the router. That can happen; I’ve seen it first hand.

All those redundant systems do nothing, since the “bad router” is doing everything a good router would do. The systems designed to catch such problems all think things are fine, but they are not. Is it an attack? No, it’s bad luck.

Now some will claim - and perhaps rightfully - that UA should have systems which monitor for exactly this type of failure as well. Perhaps they should have, or perhaps the problem was nothing like what I explained. Either way, the point still stands that a company can have multiple redundancies in place and still experience a failure mode which causes exactly the problem described.
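To make the “bad router” scenario concrete, here is a toy sketch in Python. Everything in it is hypothetical (the Router class, the drop_every knob, the transit_loss helper); it is not how UA's network or any real monitoring system works. It only shows why checks that talk _to_ a router can all pass while probes sent _through_ it reveal the loss.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Toy router model; all names and behaviour here are hypothetical."""
    name: str
    drop_every: int = 0  # silently drop every Nth transit packet; 0 = drop none
    _seen: int = 0       # count of transit packets handled so far

    def control_plane_ok(self) -> bool:
        # Pings, SNMP polls, routing-protocol hellos: the faulty router
        # still answers all of them, so this check always passes.
        return True

    def forward(self, packet: str) -> bool:
        """Return True if the transit packet is forwarded, False if dropped."""
        self._seen += 1
        if self.drop_every and self._seen % self.drop_every == 0:
            return False
        return True


def transit_loss(router: Router, probes: int = 100) -> float:
    """Send probes *through* the router and return the fraction lost."""
    lost = sum(1 for i in range(probes) if not router.forward(f"probe-{i}"))
    return lost / probes


good = Router("r1")
bad = Router("r2", drop_every=2)  # eats half of all transit packets

for r in (good, bad):
    print(r.name, "control plane ok:", r.control_plane_ok(),
          "transit loss:", transit_loss(r))
```

Both routers look healthy to anything that only checks the control plane; only the end-to-end probe tells them apart, which is the kind of monitoring the paragraph above is talking about.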

At this point, we move on to: “All three simultaneously?!? NO WAY!!” To which I would point out they were not simultaneous. UA was back up before NYSE went down. But even if they were simultaneous, sometimes stuff happens. The human mind is very good at seeing connections, even when there are none. Absent other evidence, I’m going to believe the companies’ public statements that this was not a hack. Perhaps I am being naive, but as I said, absent other evidence, it is a perfectly plausible explanation.

Jay Ashworth <jra@baylink.com> writes:

> UA, WSJ /and/ NYSE all in the same day?
>
> Once is an accident; twice is a coincidence...
>
> Three times is enemy action.

Or common factors.

In this case, I think it's probably enough to point out it's the first
Tuesday of the fiscal year. For a 24x7 organization, early Tuesday
morning is a good time to do updates; you have support staff available
for the rest of the week if anything goes wrong, you can do final
planning, checks, and preparation during the day Monday, and it's
usually one of the lowest usage times.