seeing the trees in the forest of confusion

  I suppose it is more fun to criticize policy and NSPs, but it
  may well be a hole in the BGP protocol, or more likely
  implementations in vendor's code [or user's implementation
  of twiddleable holddown timers].

My (possibly misinformed) understanding was that certain NSPs running
Cisco backbones had holddown timers configured to delay withdrawals. Even
after 7007 was disconnected, there were 7007 routes still being advertised
well over an hour later. I do not believe these NSPs are going to have
timers configured for >1hr.

We've seen a problem before where a transit provider (Cisco based) was
causing us problems, and we decided to turn them off. They were still
advertising our routes an hour later. (That provider is unconnected with
any of those involved in this case.) Pulling the session back up and
clearing it did not help things.

I'd therefore suggest that your analysis is correct. >80% of the
downtime is due either to a protocol bug or a s/w bug somewhere, not
NOC failure.

Alex Bligh
Xara Networks

I agree that there appears to be some underlying problem with the BGP code
on the backbone that is delaying route withdrawals beyond a reasonable
time. We ran into a similar problem Wednesday night where one of our
customers started advertising more specifics for our network blocks to
another transit provider (who does not filter customer routes). After
shutting down the customer's BGP peering, the bogus routes were still in
the table an hour later at which time we started advertising our own more
specifics to restore service to our other customers -- this led to our
unfortunate position in Thursday's CIDR report.

On a possibly related note, when we stopped advertising the more specifics
4 hours later, one of our transit providers (call them X) continued to
hold some of the more specific routes in a _portion_ of their BGP tables
with a next hop pointing to another of our transit providers (call them Y)
despite the fact that Y no longer had the more-specific routes
anywhere in their tables. This continued to cause a routing loop in X's
network (due to the inconsistent routes within their IBGP mesh) for 5
hours as X attempted to isolate the problem. After that point, X's
solution was for us to announce more specifics for the affected networks
until they could schedule some core router reloads.

These cases seem to point to a problem with BGP route withdrawals that will
continue to increase the time it takes to recover from network problems.
Perhaps the router vendors would like to comment.

- Doug

/ Douglas A. Junkins | Network Engineering \
/ Network Engineer | NorthWestNet \
\ junkins@nwnet.net | Bellevue, Washington, USA /
\ +1-206-649-7419 | /

junkins@nwnet.net (Doug Junkins) writes:

  These cases seem to point to a problem with BGP route withdrawals that
  will continue to increase the time it takes to recover from network
  problems. Perhaps the router vendors would like to comment.

I'm not a router vendor, but I used to play one on TV. Is that close
enough? ;-)

Let me comment about BGP, the protocol, as opposed to BGP, the
implementation.

The protocol dictates that a BGP speaker that receives a withdrawal for a
prefix _MUST_ promptly distribute that withdrawal. The reason for this is
obvious: a router which has no route to a prefix is blackholing traffic or,
if it has selected a different path, is possibly contributing to a
forwarding loop. We can argue about the definition of 'promptly', but I
hope it's clear that taking hours to withdraw the route is out of the
question.
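
To make the rule concrete, here is a rough Python sketch of a toy speaker
that forwards a withdrawal as soon as it loses its last path. It is an
illustration under stated assumptions, not any vendor's code: the Speaker
class, its method names, and the use of AS-path length as the only
tie-breaker are all invented for this example.

```python
class Speaker:
    """Toy BGP speaker (hypothetical): tracks learned paths, queues updates."""

    def __init__(self, peers):
        self.peers = list(peers)
        self.learned = {}   # prefix -> {peer: as_path_length}
        self.outbox = []    # (to_peer, message, prefix), in send order

    def _best(self, prefix):
        # Assumption: shortest AS path wins; real BGP has many more steps.
        paths = self.learned.get(prefix, {})
        return min(paths, key=paths.get) if paths else None

    def _send(self, message, prefix, skip):
        for peer in self.peers:
            if peer != skip:
                self.outbox.append((peer, message, prefix))

    def receive_announce(self, from_peer, prefix, path_len):
        old = self._best(prefix)
        self.learned.setdefault(prefix, {})[from_peer] = path_len
        new = self._best(prefix)
        if new != old:
            self._send("announce", prefix, skip=new)

    def receive_withdraw(self, from_peer, prefix):
        old = self._best(prefix)
        self.learned.get(prefix, {}).pop(from_peer, None)
        new = self._best(prefix)
        if new is None and old is not None:
            # No path left: the withdrawal goes out now, with no timer.
            self._send("withdraw", prefix, skip=from_peer)
        elif new != old:
            # A different path took over: tell peers about the new one.
            self._send("announce", prefix, skip=new)


s = Speaker(["A", "B", "C"])
s.receive_announce("A", "192.0.2.0/24", 3)
s.outbox.clear()
s.receive_withdraw("A", "192.0.2.0/24")
print(s.outbox)
```

The point of the sketch is only that nothing sits between receiving the
withdrawal and queuing it for the other peers.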

Now, please note that a BGP speaker that receives a reachability
announcement for a prefix MAY decide to not advertise it for an indefinite
period of time, for whatever reason. However, this is subject to some
restrictions. If the newly reachable prefix is installed in the router's
forwarding table and it chooses not to advertise this fact, the router MUST
NOT advertise a shorter overlapping prefix. Again, this would be lying
about the forwarding path that packets might take, so there's possibly a
forwarding loop.
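
As a sketch of that restriction, the check below flags any advertised
prefix that covers a more specific prefix which is installed but not
advertised. The function name and its two arguments are hypothetical; it
relies only on Python's standard ipaddress module (subnet_of needs
Python 3.7 or later).

```python
import ipaddress


def unsafe_covering_advertisements(installed_unadvertised, advertised):
    """Return (covering, specific) pairs that would violate the rule.

    installed_unadvertised: prefixes in the forwarding table that the
        router has chosen not to advertise.
    advertised: prefixes the router is currently advertising.
    """
    hidden = [ipaddress.ip_network(p) for p in installed_unadvertised]
    violations = []
    for a in advertised:
        cover = ipaddress.ip_network(a)
        for h in hidden:
            # subnet_of raises TypeError across IP versions, so check first.
            if h.version == cover.version and h != cover and h.subnet_of(cover):
                violations.append((str(cover), str(h)))
    return violations


print(unsafe_covering_advertisements(
    ["192.0.2.0/25"],
    ["192.0.2.0/24", "198.51.100.0/24"],
))
```

Here the /24 is flagged because packets for the hidden /25 would follow a
path that the advertisement does not describe.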

What does this mean for an implementation? In particular, how MUST flap
damping work? Flap damping MUST NOT damp out withdrawals. Note that a
_history_ of withdrawals may well be data used by subsequent flap damping
computations, but the withdrawal itself should propagate. Flap damping
SHOULD happen on reachability advertisement. To simplify the
implementation, most folks are likely to choose to suppress newly
advertised routes for a time. While the path is suppressed, the
implementation probably does NOT want to install the path in its forwarding
table. That would be painful. Only after the path finishes its
suppression period should it be installed and then promptly advertised.
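
A minimal sketch of damping along these lines might look like the
following. Everything here is an assumption made for illustration: the
DampedPrefix class, the exponentially decaying penalty, and the threshold
and half-life values are not taken from any specification or vendor
implementation.

```python
class DampedPrefix:
    """Per-prefix damping state (hypothetical names and parameters)."""

    def __init__(self, half_life=900.0, suppress=1500.0, reuse=750.0,
                 penalty_per_flap=1000.0):
        self.half_life = half_life            # seconds for penalty to halve
        self.suppress = suppress              # start suppressing at this penalty
        self.reuse = reuse                    # stop suppressing below this
        self.penalty_per_flap = penalty_per_flap
        self.penalty = 0.0
        self.last = 0.0
        self.suppressed = False

    def _decay(self, now):
        elapsed = now - self.last
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last = now
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def on_withdraw(self, now):
        # The withdrawal feeds the history but is never held back.
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress:
            self.suppressed = True
        return "propagate-withdraw"

    def on_announce(self, now):
        # Damping acts here: a suppressed path is neither installed
        # nor advertised until the penalty has decayed.
        self._decay(now)
        return "hold" if self.suppressed else "install-and-advertise"


p = DampedPrefix()
print(p.on_withdraw(0), p.on_announce(1), p.on_withdraw(2), p.on_announce(3))
```

Note that on_withdraw has only one possible return value. That is the
whole point: the history can grow, but the withdrawal always goes out.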

If your router's implementation is significantly different from this, you
might wanna have a talk with your vendor.

Sooner would be better than later. ;-)

Please note that I'm not throwing stones or pointing fingers. I have no
knowledge of the internals of what happened other than what's appeared on
this list. However, the reports are disturbing and there seems to be some
considerable confusion about the internals of BGP, so I thought some
education was in order.

Tony