RE: AT&T NYC

Since when is BGP a bug-free protocol? Let's not forget the BGP best
path selection algorithm itself is broken (there are circumstances under
which it will NEVER converge on a best path; see the IETF draft on IDR
route oscillation). Not to mention the various malformed AS-Path bugs
which have shown up over the years. I took a vendor class once where they
made us do a lab where we had to run BGP w/o an IGP. In a later revision
of the class they removed that lab because they decided it was too much
of a nightmare even for a lab environment.

BGP is not a bug-free protocol.

BGP is the easiest protocol to *debug* when the problem shows up.

BGP does not tend to accidentally take down *unaffected* paths when a
problem shows up.

It looks like everyone forgot the reason for this discussion to begin with.
It is the outage caused by a mistake on a single router that affected parts
of the network that were *NOT* affected by the original mess.

Please note that this discussion tends to get restarted whenever we have a
real OSPF (or ISIS) caused mess.

Alex

Actually, the RFC says the route selection algorithm is a local matter, so
if it's broken on your network, then strictly speaking, it's your own
fault.

You keep referring to the problem of OSPF causing the outage
for AT&T and unaffected customers. The AT&T released RFO simply states
that OSPF network statements were removed. That can happen just as easily
with static routes and BGP network/neighbor statements.

OSPF did what it was instructed to do, just as BGP would have done if it
were told to drop neighbors, or networks.

-jf

According to the RFO, which I have received, OSPF network statements were
removed on one router. Can you please explain to me why customers in other
*cities*, who clearly were terminated into different routers, were affected?

Since we know from empirical observation that it did happen, it can be
concluded that AT&T has a bad network design. It does not matter *why*
customers who were not terminated into the affected routers could not use
AT&T's network. What matters is that they *could* not use AT&T's network
because AT&T's engineering made a choice of using a broken design. This
broken design is going to cost AT&T a couple of million. Hopefully, at some
point a VP of Engineering for AT&T is going to realize that his job is going
to be on the line if stuff like this keeps happening, at which point certain
engineers within AT&T are going to get their heads handed back to them on a
platter. Again, hopefully at that point, those who remain at AT&T will
realize that their existing design is broken, that another outage is going
to cost them their jobs, and redo it. In the end we are going to have a lot
more stability on the Internet.

As for the claim that BGP would have done the same thing: would you mind
describing a configuration of BGP where deletion of a network statement on
one router would cause unreachability across paths that do not *rely* on
that network statement?

Alex