seeing the trees in the forest of confusion

> These cases seem to point to a problem with BGP route withdrawals that will
> continue to increase the time it takes to recover from network problems.
> Perhaps the router vendors would like to comment.

This seems inappropriate to me.

You have just said: "I sat and watched a provider keep routes around
long past their being withdrawn, and they didn't know what to do so
suggested two kludges: 1) advertising more-specifics and 2) rebooting
routers. Could some vendor comment on this problem?".

This is every vendor's worst nightmare.

Every vendor necessarily (and rightly so!) provides all users enough
rope to hang themselves with. It seems inappropriate for someone who
doesn't know what the full story is to call vendors to account.

If the provider in question adjusted some knobs and settings so as to
cause such a problem, what is the vendor to do?

How could the vendor even come close to trying to explain the problem
without detailed information about the problems and configurations?

Pessimistically speaking, it seems that there are two ways that this
thread could come to a close:

  1) People will keep badgering the vendor and the vendor
    will come out looking ugly if they cannot account for
    the problem based on insufficient data.

  2) People will all be quiet and stop complaining until
    the operator(s) in question and vendor(s) have information
    and communicate it.

2) seems obviously preferable, but I suspect that the people on this
list will go for 1) since it will allow everyone to flame and chatter
incessantly, increasing NANOG mail volume and everyone's productivity.

If anyone who has seen this problem first hand has detailed technical
information to provide, that is of course useful and welcome in this
forum. But complaining without having any of the data? What's the point?

--jhawk

> If anyone who has seen this problem first hand has detailed technical
> information to provide, that is of course useful and welcome in this
> forum.

Thanks, anyway. Why would that help? I already have enough
clueless rantings in my mailbox. We sent it to the operator(s) and the
vendor(s).

randy

>> These cases seem to point to a problem with BGP route withdrawals that will
>> continue to increase the time it takes to recover from network problems.
>> Perhaps the router vendors would like to comment.

> This seems inappropriate to me.

> You have just said: "I sat and watched a provider keep routes around
> long past their being withdrawn, and they didn't know what to do so
> suggested two kludges: 1) advertising more-specifics and 2) rebooting
> routers. Could some vendor comment on this problem?".

Perhaps I should have been clearer about what the provider did during
the 5 hours that the routing loop continued in their backbone. It didn't
take 5 hours for the provider to identify that there was a problem with
the routes in their tables (i.e. a few of the routers in their IBGP mesh
had more-specifics from Provider Y while most did not). Instead, it took
the provider 5 hours to troubleshoot the problem with the router vendor
before both agreed that it was a software bug and identified the need to
reload some of the routers. The hack of advertising more-specifics was
used to buy time and minimize the impact before the routers were reloaded.
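
For anyone trying to picture the failure, here is a minimal Python sketch
of the two ideas in the paragraph above. It is not the provider's or the
vendor's tooling. The router names, prefixes (taken from a documentation
range), next-hop labels, and both function names are hypothetical, and the
"fewer than half the mesh" threshold is only an assumed heuristic. The
first function flags prefixes that only a minority of routers in the mesh
still hold, and the second does a plain longest-prefix match to show why
announcing even longer prefixes can pull traffic away from a stale route.

```python
import ipaddress
from collections import Counter


def minority_prefixes(tables):
    """Map each router to the prefixes it holds that fewer than half
    of the routers in the mesh hold (assumed heuristic for 'stale')."""
    counts = Counter(p for prefixes in tables.values() for p in set(prefixes))
    half = len(tables) / 2
    suspects = {}
    for router, prefixes in tables.items():
        odd = sorted(p for p in set(prefixes) if counts[p] < half)
        if odd:
            suspects[router] = odd
    return suspects


def best_route(routes, destination):
    """Longest-prefix match over a {prefix: next_hop} table."""
    addr = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matching, key=lambda pair: pair[0].prefixlen)[1]


# Hypothetical mesh: only r3 still holds the withdrawn more-specific.
tables = {
    "r1": ["198.51.100.0/24"],
    "r2": ["198.51.100.0/24"],
    "r3": ["198.51.100.0/24", "198.51.100.0/25"],
    "r4": ["198.51.100.0/24"],
}
suspects = minority_prefixes(tables)

# r3's view before and after the workaround announcements.
stale_rib = {
    "198.51.100.0/24": "origin",
    "198.51.100.0/25": "provider-Y (stale)",
}
patched_rib = dict(stale_rib)
patched_rib["198.51.100.0/26"] = "origin"
patched_rib["198.51.100.64/26"] = "origin"

before = best_route(stale_rib, "198.51.100.10")
after = best_route(patched_rib, "198.51.100.10")
```

With this made-up data, r3 is the only router flagged, and the lookup for
198.51.100.10 moves from the stale entry back to the origin once the /26s
are present. The stale /25 is still in the table, which is why this only
buys time until the affected routers are reloaded.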

> This is every vendor's worst nightmare.

> Every vendor necessarily (and rightly so!) provides all users enough
> rope to hang themselves with. It seems inappropriate for someone who
> doesn't know what the full story is to call vendors to account.

> If the provider in question adjusted some knobs and settings so as to
> cause such a problem, what is the vendor to do?

> How could the vendor even come close to trying to explain the problem
> without detailed information about the problems and configurations?

> Pessimistically speaking, it seems that there are two ways that this
> thread could come to a close:

>   1) People will keep badgering the vendor and the vendor
>     will come out looking ugly if they cannot account for
>     the problem based on insufficient data.

>   2) People will all be quiet and stop complaining until
>     the operator(s) in question and vendor(s) have information
>     and communicate it.

> 2) seems obviously preferable, but I suspect that the people on this
> list will go for 1) since it will allow everyone to flame and chatter
> incessantly, increasing NANOG mail volume and everyone's productivity.

If I'm the only person that's seen this type of problem, I'll shut up
about it. But if this type of problem has impacted more providers, I
think it's appropriate in this forum to ask the router vendors to comment
on any known problems with BGP route withdrawals. If they don't have
enough information to account for the problem, then they should tell us
that so we can get the data to them the next time something like this
happens.

- Doug

jhawk@bbnplanet.com (John Hawkinson) writes:

> Every vendor necessarily (and rightly so!) provides all users enough
> rope to hang themselves with.

I disagree stridently.

While it's certainly in the vendor's best interests to cater to the whims of
their customers, the ethical, responsible and far-sighted vendor also has
other responsibilities. In the case of BGP, the most common usage of the
product is to interconnect within the wider community. A community that
has a diversity of talent and experience. And a community that the vendor
would like to retain as customers.

Violating the basics of the protocol is not a reasonable, ethical,
responsible, or intelligent way of accommodating the customer. And
sometimes, just sometimes, when the needs of the customer conflict with the
needs of the community, the customer loses. Better that than the entire
community.

As an example, a certain vendor's BGP implementation does not (did not?)
support the oft-requested 'sed on the AS path' for precisely this reason.

That said, there is also a question of intent. We should be careful not to
confuse accidents with incompetence or maliciousness. Vendors make
mistakes ;-), some of which can be misconstrued in a rush to place blame.

Tony