Internet routing table "completeness" monitoring?

ML11 · October 3, 2012, 4:43am

Has anyone put in place a method to identify when one of their BGP peers suddenly withdraws X% of its prefixes?

E.g. I should expect ~420k prefixes in a "complete"[1] routing table from a transit peer today. If I suddenly get only 390k prefixes, I'd guess a major network was depeered or something similar.

If so how are people doing this? SNMP MIB, screen scrape?

[1] Varying levels of completeness apply.
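A minimal sketch of the "withdraws X%" check from the question, assuming you already have a baseline count for the peer (e.g. ~420k) and a fresh reading; the function names and the 5% threshold are illustrative choices, not anything from the thread.

```python
def withdrawn_fraction(baseline: int, current: int) -> float:
    """Fraction of the baseline prefix count that has gone missing (0.0 if the count grew)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return max(0, baseline - current) / baseline


def too_many_withdrawn(baseline: int, current: int, max_fraction: float) -> bool:
    """True when more than max_fraction of the baseline prefixes disappeared."""
    return withdrawn_fraction(baseline, current) > max_fraction


# Figures from the question: ~420k expected, 390k received.
print(f"{withdrawn_fraction(420_000, 390_000):.1%}")  # 7.1%
print(too_many_withdrawn(420_000, 390_000, 0.05))     # True (hypothetical 5% limit)
```

A 30k drop from 420k is only about 7%, so the threshold has to be fairly tight to catch a single large network disappearing.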

Saku_Ytti1 · October 3, 2012, 6:52am

I've had monitoring for this for many years, over SNMP. Right now my limits are:

a) the prefix count went to 0 or came up from 0
or
b) the relative difference is at least 1.5x and the absolute difference is at least 1000
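The two limits above can be sketched as a single predicate. This is an interpretation, not the poster's code: "relative difference" is read here as the ratio of the larger count to the smaller one, and the function name and defaults are assumptions.

```python
def should_alert(old: int, new: int,
                 min_ratio: float = 1.5, min_abs: int = 1000) -> bool:
    """Sketch of limits a) and b) for one peer's prefix count."""
    # a) the count went to zero or came up from zero
    if (old == 0) != (new == 0):
        return True
    if old == new:
        return False
    # b) relative change of at least min_ratio AND absolute change of at least min_abs
    lo, hi = sorted((old, new))
    return hi / lo >= min_ratio and hi - lo >= min_abs


print(should_alert(0, 34))             # True, rule a)
print(should_alert(37548, 4710))       # True, rule b)
print(should_alert(420_000, 390_000))  # False, ratio is only ~1.08
```

The first two calls reproduce the `0 => 34` and `37548 => 4710` alert lines in this message. Note that the 420k to 390k drop from the original question would not trip rule b) on its own.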

The output I get as emails looks like this:
rtr1: AS702 2001:600:202::15 ge-1-0-4.BR2.LND18.ALTER.NET 0 => 34
rtr2: AS2119 148.122.8.213 ti3001b300-ge3-1-0.ti.telenor.net 688 => 0 (1/3)
rtr2: AS2119 2001:4600:10::4d ti3001b300-ge3-1-0.ti.telenor.net 13 => 0 (2/3)
rtr3: AS3491 80.81.192.50 br02.frf02.pccwbtn.net 37548 => 4710

And there are about 10-20 emails per day, even when looking only at rather 'coarse' changes.

But to be honest, I almost never peek at the folder where I get these. I'm probably moving the output to an IRC channel, as I've found that a superior way to keep track of alarms compared to email, at least for my workflow.

Jay_Ashworth · October 3, 2012, 1:47pm

Well, if I had to do it, I think I'd just point munin at the router (yes, using SNMP) and put the prefix count graph up on the Big Wall as a filled curve. If that thing jumps around, someone will likely notice.
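A rough sketch of what such a munin plugin could look like. Munin runs a plugin with the argument `config` to get the graph definition and with no argument to get the current value; `AREA` gives the filled curve. `fetch_prefix_count()` is a hypothetical placeholder for the actual SNMP query (which OID to poll depends on the router vendor), and the warning/critical floors are assumed values.

```python
#!/usr/bin/env python3
import sys


def fetch_prefix_count() -> int:
    """Hypothetical placeholder: replace with an SNMP query against your router."""
    return 420_000


def config_output() -> str:
    """Graph definition printed when munin calls the plugin with 'config'."""
    return "\n".join([
        "graph_title BGP prefixes received from transit",
        "graph_vlabel prefixes",
        "graph_category network",
        "prefixes.label accepted prefixes",
        "prefixes.draw AREA",          # filled curve
        "prefixes.warning 400000:",    # hypothetical floor: warn below 400k
        "prefixes.critical 380000:",   # hypothetical floor: critical below 380k
    ])


def value_output(count: int) -> str:
    """Current reading in munin's 'field.value N' format."""
    return f"prefixes.value {count}"


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print(config_output())
    else:
        print(value_output(fetch_prefix_count()))
```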

This assumes a staffed NOC, of course, but it still gives you history to look at if you notice a problem in an unstaffed situation.

Cheers,
-- jra

Joseph_Jackson1 · October 3, 2012, 1:50pm

I have cacti graph the amount of prefixes announced and withdrawn from a BGP peer on each BGP router.

Eric_Tykwinski · October 3, 2012, 1:55pm

I agree; just use the Threshold plugin so that it notifies you when the count drops below or goes above a certain number:
http://docs.cacti.net/plugin:thold

Joseph_Jackson1 · October 3, 2012, 2:01pm

Not sure; I don't have any non-Cisco BGP routers. Sorry!

William_F_Maton_Soto · October 3, 2012, 2:16pm

> I have cacti graph the amount of prefixes announced and withdrawn from a BGP peer on each BGP router.

+1

Note that not all router OSs support fetching data like that via SNMP.

We use a custom-built tool internally that does this too, and we attach an alert threshold to it. So if a downstream peer sends us fewer prefixes than the threshold, we get an alert. Handy for those times when they call and ask us what we did to their network.

Prior to that, we had a script which would log in, munge the 'show ip bgp summary' table output, figure out the deltas, and graph or report on a particularly troublesome peer as needed.
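The parse-and-diff step of such a script might look roughly like this. It is a sketch, not the poster's script: it assumes the text has already been collected from the router, that neighbor rows follow the Cisco IOS `show ip bgp summary` column layout, and that `SAMPLE` (made-up data in documentation address space) is representative.

```python
import ipaddress
from typing import Dict, Optional


def parse_bgp_summary(text: str) -> Dict[str, Optional[int]]:
    """Map each neighbor address to its received-prefix count.

    The last column of a neighbor row is State/PfxRcd: a number once the
    session is Established, otherwise a state name such as Active or Idle,
    which is recorded here as None.
    """
    counts: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # header, router-id line, or a wrapped continuation line
        last = fields[-1]
        counts[fields[0]] = int(last) if last.isdigit() else None
    return counts


def prefix_deltas(before, after):
    """Per-neighbor change in received prefixes; a down session counts as 0."""
    return {n: (after.get(n) or 0) - (before.get(n) or 0)
            for n in sorted(set(before) | set(after))}


# Made-up sample in the IOS 'show ip bgp summary' column layout.
SAMPLE = """\
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.0.2.1       4        64500 1234567   23456   987654    0    0 3w2d       420113
198.51.100.7    4        64501     120     110   987654    0    0 00:15:02   Active
"""

before = parse_bgp_summary(SAMPLE)
after = {"192.0.2.1": 390113, "198.51.100.7": None}
print(prefix_deltas(before, after))  # {'192.0.2.1': -30000, '198.51.100.7': 0}
```

One known gap: when IOS wraps a long IPv6 neighbor address onto its own line, this parser records that neighbor as None rather than reading the count from the following line.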

wfms

Christopher_Morrow · October 3, 2012, 3:02pm

Is a threshold helpful here? (Well, it's helpful to a point, at least.)
What if your neighbour starts deaggregating (or sending you their
internal deaggregates) in place of 50k real routes? No alarm, no
'change' from a numbers perspective, but certainly a traffic shift and
a reachability change.
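One way to catch the case described above, where the count stays flat while real routes are replaced by deaggregates, is to also track how much address space the received prefixes cover and which prefixes vanished. This is a swapped-in technique, not something proposed in the thread; the function names are hypothetical, the sample prefixes come from documentation ranges, and it handles one address family at a time (`ipaddress.collapse_addresses` rejects mixed IPv4/IPv6 input).

```python
import ipaddress


def covered_addresses(prefixes):
    """Number of addresses covered by same-family prefixes, overlaps counted once."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))


def compare_views(before, after):
    """Summarize how a peer's announcements changed between two snapshots."""
    return {
        "count": (len(before), len(after)),
        "addresses": (covered_addresses(before), covered_addresses(after)),
        "gone": sorted(set(before) - set(after)),
    }


# Two /24s replaced by two /25s that cover only one of them:
# same prefix count, half the address space.
before = ["198.51.100.0/24", "203.0.113.0/24"]
after = ["198.51.100.0/25", "198.51.100.128/25"]
print(compare_views(before, after))
```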

Isn't a speed-of-change threshold also interesting here?
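A speed-of-change threshold can be sketched as a sliding window over recent polls: alert when the count has moved by more than some amount within the last N seconds, regardless of its absolute level. The class name, window length, and limit below are assumptions for illustration.

```python
from collections import deque


class RateMonitor:
    """Alert when the prefix count swings by more than max_change within window_s seconds."""

    def __init__(self, window_s: float, max_change: int):
        self.window_s = window_s
        self.max_change = max_change
        self.samples = deque()  # (timestamp, count), oldest first

    def add(self, ts: float, count: int) -> bool:
        """Record one poll; return True if the swing inside the window is too large."""
        self.samples.append((ts, count))
        # Drop samples that have fallen out of the window (the newest one always stays).
        while ts - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        counts = [c for _, c in self.samples]
        return max(counts) - min(counts) > self.max_change


m = RateMonitor(window_s=300, max_change=5000)  # hypothetical: 5k prefixes in 5 minutes
print(m.add(0, 420_000), m.add(60, 419_500), m.add(120, 400_000))  # False False True
```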

Andrew_Gallo · October 3, 2012, 3:34pm

So, there is something called the BGP Monitoring Protocol (BMP):

http://www.nanog.org/meetings/nanog45/abstracts.php?pt=MTM2NiZuYW5vZzQ1&nm=nanog45

It looks like it is supported in JunOS.

Has anyone used it? If so, what monitoring software are you using?
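Nobody in the thread posted BMP tooling, but as a rough illustration of why BMP fits this use case: a BMP Statistics Report can carry a "number of routes in Adj-RIBs-In" gauge per peer, so a collector gets the prefix count pushed to it instead of polling. The sketch below follows the message layout of RFC 7854 (BMP version 3), which was standardized after this thread; the JunOS implementation mentioned here predates it and may use an earlier draft encoding. It assumes `msg` holds exactly one complete message and does minimal validation; the function name is hypothetical.

```python
import ipaddress
import struct

# Values from RFC 7854 (BMP version 3).
BMP_VERSION = 3
MSG_STATISTICS_REPORT = 1
STAT_ADJ_RIB_IN_ROUTES = 7           # 64-bit gauge: routes in Adj-RIBs-In
COMMON_HDR = struct.Struct("!BIB")   # version, message length, message type
PER_PEER_HDR_LEN = 42
PEER_FLAG_V = 0x80                   # peer address is IPv6


def adj_rib_in_routes(msg: bytes):
    """Return (peer address, peer AS, Adj-RIB-In route count), or None if absent."""
    version, length, msg_type = COMMON_HDR.unpack_from(msg, 0)
    if version != BMP_VERSION or msg_type != MSG_STATISTICS_REPORT:
        return None
    msg = msg[:length]
    peer = COMMON_HDR.size                   # per-peer header starts after 6 bytes
    flags = msg[peer + 1]
    raw_addr = msg[peer + 10:peer + 26]      # 16-byte peer address field
    addr = ipaddress.ip_address(raw_addr if flags & PEER_FLAG_V else raw_addr[12:])
    (peer_as,) = struct.unpack_from("!I", msg, peer + 26)
    off = peer + PER_PEER_HDR_LEN
    (stats_count,) = struct.unpack_from("!I", msg, off)
    off += 4
    for _ in range(stats_count):             # TLVs: type (2), length (2), value
        stat_type, stat_len = struct.unpack_from("!HH", msg, off)
        off += 4
        if stat_type == STAT_ADJ_RIB_IN_ROUTES and stat_len == 8:
            (routes,) = struct.unpack_from("!Q", msg, off)
            return str(addr), peer_as, routes
        off += stat_len
    return None
```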

Justin_M_Streiner · October 3, 2012, 6:21pm

> Is a threshold helpful here? (Well, it's helpful to a point, at least.)
> What if your neighbour starts deaggregating (or sending you their
> internal deaggregates) in place of 50k real routes? No alarm, no
> 'change' from a numbers perspective, but certainly a traffic shift and
> a reachability change.

It is, as long as you have some control over the number of polling intervals between detecting a noteworthy change and sending an alarm. Otherwise, there is a real danger of your NOC having to investigate a lot of noisy alerts. If that persists for too long, the NOC will grow tired of responding to these alerts and will either send them all to the bit bucket or implement their own polling thresholds that meet their needs more effectively.
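A minimal sketch of that control: the alarm fires only after a configurable number of consecutive polls breach the threshold, fires once, and re-arms when the count recovers. The class name, floor, and poll count are hypothetical.

```python
class DebouncedAlarm:
    """Raise an alarm only after `required` consecutive polls fall below `minimum`."""

    def __init__(self, minimum: int, required: int = 3):
        self.minimum = minimum
        self.required = required
        self.streak = 0       # consecutive polls below the floor
        self.active = False   # alarm already raised for the current streak

    def poll(self, count: int) -> bool:
        """Feed one poll result; return True only on the poll that raises the alarm."""
        if count < self.minimum:
            self.streak += 1
        else:
            self.streak = 0
            self.active = False  # re-arm once the count recovers
        if self.streak >= self.required and not self.active:
            self.active = True
            return True
        return False


alarm = DebouncedAlarm(minimum=400_000, required=3)
print([alarm.poll(c) for c in [420_000, 390_000, 390_000, 390_000, 390_000]])
# [False, False, False, True, False]
```

A single bad poll never pages anyone, and a sustained drop pages once instead of on every interval.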

If a network that you have no business relationship with, and that is several AS hops away from you, goes away, how much effort do you want to expend investigating it? That probably depends on your customers. If you see a few hundred routes disappear and determine that they belong to an ISP on the other side of the planet, that's one thing. If your view of something like Google or Facebook suddenly disappears, that could be another thing entirely.
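One way to act on that distinction is a short watch list of destinations you care about, alerting only when one of them is no longer covered by any received prefix. This is a sketch under assumptions: the watch-list names and addresses are placeholders from documentation ranges (not real Google or Facebook addresses), and the linear scan is fine for a few watched addresses but a longest-prefix-match structure would be the better choice against a full table.

```python
import ipaddress


def missing_destinations(watch, received_prefixes):
    """Return the names of watched addresses not covered by any received prefix."""
    nets = [ipaddress.ip_network(p) for p in received_prefixes]
    missing = []
    for name, addr in watch.items():
        ip = ipaddress.ip_address(addr)
        # `in` is simply False when the address and network families differ.
        if not any(ip in net for net in nets):
            missing.append(name)
    return missing


# Hypothetical watch list and received routes.
watch = {"important-site-a": "192.0.2.10", "important-site-b": "203.0.113.5"}
received = ["192.0.2.0/24", "198.51.100.0/24"]
print(missing_destinations(watch, received))  # ['important-site-b']
```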

> Isn't a speed-of-change threshold also interesting here?

+1 on that

jms