Announcing BGP troubleshooting work for ISPs

Dear all,

We (jointly with AT&T Research Lab) have recently developed a real-time
troubleshooting tool that distills the millions of BGP updates seen at the
border routers of a given ISP network into a small number (on the order of
a few dozen) of significant, actionable BGP routing events. The goal is to
help an ISP's network operators identify the locally observed BGP events
that matter (e.g., those that affect a large number of prefixes or shift a
lot of traffic).

Our paper is published at NSDI:

The talk slides are at:

We have several interesting findings from applying our tool to the AT&T
network:

-We found that more than 15% of the updates are due to persistently
flapping prefixes, even with flap damping enabled. The reason is that flap
damping is session-based: when a session is reset, the damping history is
not retained. Moreover, damping is not implemented for iBGP sessions.

There are three main causes for persistent flapping:
(1) Conservative damping parameters
(2) Protocol oscillations due to MED
(3) Unstable interfaces or BGP sessions
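
As a rough illustration of why session-based damping misses persistent
flapping, here is a minimal Python sketch. It is not code from our tool.
The class name, the penalty of 1000 per flap, the suppress threshold of
2000, and the 15-minute half-life are all assumptions chosen for the
example, and the reuse threshold used by real implementations is omitted.

```python
class SessionDamping:
    """Simplified per-session flap damping state (hypothetical example)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_at=2000.0,
                 half_life_s=900.0):
        # All three parameters are illustrative assumptions.
        self.penalty_per_flap = penalty_per_flap
        self.suppress_at = suppress_at
        self.half_life_s = half_life_s
        self.penalty = {}  # prefix -> (penalty value, time it was recorded)

    def _decayed(self, prefix, now):
        # The penalty halves every half_life_s seconds.
        value, last = self.penalty.get(prefix, (0.0, now))
        return value * 0.5 ** ((now - last) / self.half_life_s)

    def record_flap(self, prefix, now):
        # Each flap adds a fixed penalty on top of the decayed history.
        self.penalty[prefix] = (self._decayed(prefix, now)
                                + self.penalty_per_flap, now)

    def is_suppressed(self, prefix, now):
        return self._decayed(prefix, now) >= self.suppress_at

    def reset_session(self):
        # The history lives with the session, so a reset discards it.
        self.penalty.clear()


damping = SessionDamping()
for t in (0, 10, 20):
    damping.record_flap("192.0.2.0/24", t)
print(damping.is_suppressed("192.0.2.0/24", 21))  # suppressed after 3 flaps

damping.reset_session()
damping.record_flap("192.0.2.0/24", 30)
print(damping.is_suppressed("192.0.2.0/24", 31))  # history lost on reset
```

In this sketch a prefix that flaps across repeated session resets never
accumulates enough penalty to be suppressed, which matches cause (3) above.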

-We found that eBGP session resets and hot potato changes contribute to
many routing disturbances, and most of the routing events with a major
impact on traffic shifts are also due to session resets and hot potato
changes.
Please let us know if you have any comments or feedback.
Unfortunately, the tool is not yet available, but the paper describes in
detail how it works.


-Z. Morley Mao, Jian Wu, Jennifer Rexford, Jia Wang