router startup behavior

at university of washington, we are doing a measurement study of bgp
misconfiguration
(http://www.cs.washington.edu/homes/ratul/bgp/index.html).

one of the things we found is that there are a lot of announcements of
more-specifics that come and go within a matter of 2-5 minutes.

by talking to the operators involved in these incidents, we found that
most of these are caused when the router is rebooted (intentionally or
not). while some operators were aware of this side effect, most were not,
and were taken by surprise that they just injected anywhere from 1-1000
routes into BGP only to withdraw them a couple of minutes later.

i would like to understand this behavior better. is this behavior
vendor-specific (cisco?) or pervasive? is there a configuration style that
causes or avoids this "spill-over"?

my understanding is limited to this: it happens when the bgp session comes
up too soon, before the filters have taken effect. could someone familiar
with router internals shed some light on it?

is the problem limited to route origination only, or does it also affect propagation?
in other words, can a router propagate a route it should not while
starting up because export filters are not yet in place?

i have never gotten my hands dirty with router configuration; your input
would be invaluable.

thanks,
  -- ratul

  It appears that routes are leaking out past a route-map
based community based on a route that you e-mailed me about (as267 /30)
that went to route-views.

  - jared

Here is my best guess as to what you are seeing. Most likely a large CIDR
block is announced by service provider A. A small CIDR block is given to a
customer who is connected to multiple service providers and is thus running
BGP. The more specific route is then announced by service provider B, who
does not own the block but is announcing it on behalf of service provider
A's customer. What is happening is that the customer has a line or router
failure, which withdraws their more specific announcement from service
provider B. Since service provider A is announcing a supernet route, it
now becomes the only route for that block.
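
A minimal sketch of that guess, in Python, assuming plain longest-prefix matching and made-up prefixes (the 10.0.0.0/16 and 10.0.5.0/24 blocks and the provider labels are hypothetical, not taken from any of the incidents). It only shows how the supernet takes over once the more-specific is withdrawn:

```python
import ipaddress

def best_route(table, destination):
    """Return the (prefix, origin) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in table.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical routing table: provider A's supernet plus the customer's
# more-specific announced through provider B.
table = {
    ipaddress.ip_network("10.0.0.0/16"): "provider A (supernet)",
    ipaddress.ip_network("10.0.5.0/24"): "provider B (customer more-specific)",
}

before = best_route(table, "10.0.5.9")

# The customer's line or router fails, so the more-specific is withdrawn.
del table[ipaddress.ip_network("10.0.5.0/24")]

after = best_route(table, "10.0.5.9")
```

Here `before` resolves through provider B and `after` through provider A. From the outside this looks like a more-specific that came and went, even though no filter misbehaved.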

If that's the problem, a fix might be to not advertise any routes to a BGP
peer until you receive all the routes that peer has to send you. I think it's
elegant that when two routers connect, neither sends any routes to the other
until each has received all the routes the other has to send. Very Zen, don't
you think?

  DS

> a fix might be to not advertise any routes to a BGP peer until you
> receive all the routes that peer has to send you.

that will *greatly* reduce the garbage in the global routing table. to
zero, in fact.

and, of course, we can not know when a peer has sent all the routes it
has to send to you.

randy

how long would you wait for?

Well, wait till it sends you the FIRST route, and then wait a fixed amount
of time for it to finish babbling. Figuring out how long the OTHER end will
wait for you to finish babbling before sending the first route is left as
a trivial exercise... ;)
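
To make the "wait for the first route, then a fixed quiet time" idea concrete, here is a rough sketch in Python. The function name `advertise_time`, the timestamps, and the 5-second quiet period are all made up for illustration; no router is claimed to work this way.

```python
def advertise_time(update_times, quiet_period):
    """Earliest time at which the peer has been silent for quiet_period
    seconds after sending at least one update; None if it never sent any."""
    if not update_times:
        # The peer never sent a first route, so we never start advertising.
        return None
    times = sorted(update_times)
    for current, following in zip(times, times[1:]):
        if following - current >= quiet_period:
            # A long enough gap appeared before the next update arrived.
            return current + quiet_period
    return times[-1] + quiet_period

# The peer babbles at t=0,1,2 and then sends one late update at t=10.
start = advertise_time([0, 1, 2, 10], quiet_period=5)

# Both ends apply the same rule, so neither ever sends a first route.
deadlock = advertise_time([], quiet_period=5)
```

With these inputs `start` is 7, which is before the late update at t=10, so the quiet period can guess wrong. `deadlock` is `None`: if both ends wait for the other's first route, nobody ever advertises.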

And how long do you think convergence would take in this case?

to the best of my knowledge, here is what is happening.

1. router starts rebooting
2. there are routes in the routing table, some of which are not to
be announced according to the filters
3. bgp sessions come up; the filters have not yet taken effect
4. start announcing routes
5. filters come up
6. the router realizes that it made a mistake and withdraws the routes not
meant to be announced.
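
The sequence above can be sketched as a toy simulation in Python. This is only a model of the suspected ordering problem, not of any vendor's implementation; the event names, the `simulate_startup` helper, and the prefixes are all hypothetical.

```python
def simulate_startup(rib, export_filter, events):
    """Replay startup events in order; return (announced, withdrawn) lists."""
    session_up = False
    filter_active = False
    advertised = set()
    announced, withdrawn = [], []
    for event in events:
        if event == "session_up":
            session_up = True
        elif event == "filters_loaded":
            filter_active = True
        if not session_up:
            continue  # nothing can be sent before the session exists
        # Until the filter is active, every route in the table is exportable.
        allowed = {p for p in rib if not filter_active or export_filter(p)}
        announced.extend(sorted(allowed - advertised))
        withdrawn.extend(sorted(advertised - allowed))
        advertised = allowed
    return announced, withdrawn

rib = ["10.0.0.0/16", "10.0.1.0/24", "10.0.2.0/24"]

def only_aggregate(prefix):
    # Hypothetical export policy: announce only the /16 aggregate.
    return prefix.endswith("/16")

# Session comes up before the filters (steps 3 to 6 above).
leaky = simulate_startup(rib, only_aggregate, ["session_up", "filters_loaded"])

# Filters are in place before the session comes up.
clean = simulate_startup(rib, only_aggregate, ["filters_loaded", "session_up"])
```

In the `leaky` ordering both /24s are announced and then withdrawn; in the `clean` ordering only the /16 is ever announced and nothing is withdrawn.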

i should also point out that not all such incidents involve 1000 routes. most
of them are 20-50 prefixes, but i have seen a non-trivial number of ~100
prefixes, and a couple with more than that.

  -- ratul

Consider it very extreme flap dampening. ;)

:-> "Ratul" == Ratul Mahajan <ratul@cs.washington.edu> writes:

    > i should also point out that not all such incidents involve 1000 routes. most
    > of them are 20-50 prefixes, but i have seen a non-trivial number of ~100
    > prefixes, and a couple with more than that.

Do you have data about brand/model of the routers involved? I'd expect
distributed equipment not to show such a behaviour, but who can tell...

Pf

unlikely.

"route-filtering", "BGP" and "route announcement" all go hand-in-hand.
all are control-plane functions.

for router-vendors that matter, i doubt that the behavior you describe occurs.

the most likely cause would be one of:
  (a) a bug. (but if it is there, i'm surprised it isn't causing more stress/oscillation)
  (b) people changing route-policy. ie. it isn't "router starting up" but more likely
      someone going "route-map FOO deny 20"; "no match ...", "match ...".
  (c) script used to configure router(s) adds a 'network' statement prior to trimming
      route-filters
  (d) too many people experimenting with route-injectors on that live-production-
      network-known-as-the-internet.

cheers,

lincoln.