Reliability of looking glass sites / rviews

This weekend our uninterruptible power supply became interruptible and we lost all circuits. While I was doing initial debugging and waiting on site power verification, I noticed that paths were still being shown in rviews for the circuits that were down. This was over an hour after we went hard down, and it took hours before we were back up.

I worked with our providers last night to verify there weren't any hanging static routes, etc. We shut the upstream circuit down, watched the convergence, and saw that eventually all the paths disappeared. Given what we saw on Saturday, what would cause route-views to cache the paths that long? Some looking glass sites only show what they are peered with, or at most what their peers are peered with, which is why I've always used route-views.

What looking glass sites other than route-views would people recommend?

explicit vs implicit withdrawals causing different handling of the problem
routes?

What looking glass sites other than route-views would people recommend?

RIPE RIS.

Both should have been similar.

In the first case, we lost power to all of our BGP border routers that are peered with the upstream providers.
In the second case, I did an explicit “shut” on the interface connected to the upstream provider whose paths had appeared “stuck” more than an hour after the outage.

oh, I had thought when you broke the second time intentionally you might
have shut the bgp session (and then maybe the interface too) ... causing a
different semantic for the withdrawals in the isp network.

it's possible that if the load of updates was large enough for the ISP(s)
in question, things simply took a long while to process. In a recent
similar situation we'd observed updates/convergence taking upwards of 30
minutes to trickle through the global system :(

You didn't mention details about which ASN or prefixes you were
checking. Are you referring to ASN 14607 that only advertises two
prefixes 129.77.0.0/16 and 2620:0:2810::/48?

Based on what we saw over the weekend (using routeviews data):

Event Start Time: 2017-09-09 11:29:23 UTC (2017-09-09 07:29:23 EDT)
Event End Time: 2017-09-09 13:31:30 UTC (2017-09-09 09:31:30 EDT)

Are the above times correct?
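
As a quick sanity check on the window above, the following minimal sketch computes the event duration and re-derives the EDT timestamps from the UTC ones. It assumes a fixed UTC-4 offset for EDT; the variable names are illustrative only and not part of any routeviews tooling.

```python
from datetime import datetime, timedelta, timezone

# Timestamps as quoted in the message above (UTC).
start_utc = datetime(2017, 9, 9, 11, 29, 23, tzinfo=timezone.utc)
end_utc = datetime(2017, 9, 9, 13, 31, 30, tzinfo=timezone.utc)

# Assumption: EDT is modelled as a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), "EDT")

# Length of the window during which the prefix was withdrawn.
duration = end_utc - start_utc

fmt = "%Y-%m-%d %H:%M:%S %Z"
print("duration:", duration)
print("start:", start_utc.astimezone(EDT).strftime(fmt))
print("end:  ", end_utc.astimezone(EDT).strftime(fmt))
```

The window works out to 2:02:07, so a lookup made a little over an hour after the power loss would fall inside it.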

We see the routes withdraw and then come back. For example:
http://demo-rv.snas.io:3000/dashboard/db/prefix-history?orgId=2&var-prefix=129.77.0.0&var-prefix_len=16&var-asn_num=All&var-router_name=All&var-peer_name=All&from=1504908000000&to=1505203200000

When you checked routeviews, which router and peer were you looking at?
When you did a "show ip bgp ..." did you include the prefix length? If
not, it would have then shown you 0/0 or 128/5, depending on which
router you were on.
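
To illustrate why the prefix length matters, here is a minimal sketch of the difference between a longest-match lookup on an address and an exact lookup on a prefix. The two-entry table is hypothetical and only mimics a collector that still holds a default route after the /16 has been withdrawn; the function names are made up for this example and do not model any router's actual implementation.

```python
import ipaddress

def longest_match(table, address):
    """Return the most specific prefix in `table` that covers `address`."""
    addr = ipaddress.ip_address(address)
    candidates = [
        net for net in table
        if net.version == addr.version and addr in net
    ]
    return max(candidates, key=lambda net: net.prefixlen, default=None)

def exact_match(table, prefix):
    """Return the prefix only if that exact prefix/length is in `table`."""
    net = ipaddress.ip_network(prefix)
    return net if net in table else None

# Hypothetical tables: before and after the /16 is withdrawn.
before = {ipaddress.ip_network("0.0.0.0/0"),
          ipaddress.ip_network("129.77.0.0/16")}
after = {ipaddress.ip_network("0.0.0.0/0")}

print(longest_match(before, "129.77.0.0"))   # the /16 itself
print(longest_match(after, "129.77.0.0"))    # falls back to the covering route
print(exact_match(after, "129.77.0.0/16"))   # nothing once the /16 is gone
```

With the length omitted, a lookup after the withdrawal still returns a route (the covering one), which is easy to misread as the original prefix still being present.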

--Tim

On 9/13/17, 8:43 AM, "NANOG on behalf of Matthew Huff"

ASN 14607, and 129.77.0.0/16

Slightly over an hour after our power event, during which 100% of our equipment was down, this is what I saw at routeviews:

BGP routing table entry for 129.77.0.0/16, version 24978989
Paths: (7 available, best #7, table default)
  Not advertised to any peer
  Refresh Epoch 1
  134708 3491 6939 46887 14607
    103.197.104.1 from 103.197.104.1 (123.108.254.70)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  3333 1273 6939 46887 14607
    193.0.0.56 from 193.0.0.56 (193.0.0.56)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  8283 57866 6762 6939 46887 14607
    94.142.247.3 from 94.142.247.3 (94.142.247.3)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 6762:33 6762:16500 8283:15 57866:105
      unknown transitive attribute: flag 0xE0 type 0x20 length 0xC
        value 0000 205B 0000 0006 0000 000F
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  24441 3491 3491 6939 46887 14607
    202.93.8.242 from 202.93.8.242 (202.93.8.242)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  20912 1267 1273 6939 46887 14607
    212.66.96.126 from 212.66.96.126 (212.66.96.126)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000 9035:50 9035:100 20912:65001
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  1221 4637 6939 46887 14607
    203.62.252.83 from 203.62.252.83 (203.62.252.83)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  2497 6939 46887 14607
    202.232.0.2 from 202.232.0.2 (202.232.0.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
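
For anyone who wants to compare the lingering paths programmatically, the following is a rough sketch that pulls the AS-path lines out of output like the above and finds the hops they all share toward the origin. The regular expression and helper names are assumptions made for this example; it only handles plain AS sequences (no AS sets or confederations), and the sample text is an abbreviated copy of the output above.

```python
import re

# Abbreviated copy of the output above: AS path, neighbor line, one noise line.
SAMPLE = """\
134708 3491 6939 46887 14607
103.197.104.1 from 103.197.104.1 (123.108.254.70)
3333 1273 6939 46887 14607
193.0.0.56 from 193.0.0.56 (193.0.0.56)
8283 57866 6762 6939 46887 14607
94.142.247.3 from 94.142.247.3 (94.142.247.3)
value 0000 205B 0000 0006 0000 000F
24441 3491 3491 6939 46887 14607
202.93.8.242 from 202.93.8.242 (202.93.8.242)
20912 1267 1273 6939 46887 14607
212.66.96.126 from 212.66.96.126 (212.66.96.126)
1221 4637 6939 46887 14607
203.62.252.83 from 203.62.252.83 (203.62.252.83)
2497 6939 46887 14607
202.232.0.2 from 202.232.0.2 (202.232.0.2)
"""

# Assumption: an AS-path line is two or more whitespace-separated integers.
AS_PATH_RE = re.compile(r"^\s*\d+(?:\s+\d+)+\s*$")

def extract_as_paths(text):
    """Return each AS path as a list of ints, in order of appearance."""
    return [
        [int(asn) for asn in line.split()]
        for line in text.splitlines()
        if AS_PATH_RE.match(line)
    ]

def common_suffix(paths):
    """Return the hops shared by every path, counted from the origin AS."""
    suffix = []
    for hops in zip(*(reversed(path) for path in paths)):
        if len(set(hops)) != 1:
            break
        suffix.append(hops[0])
    return suffix[::-1]

paths = extract_as_paths(SAMPLE)
print(len(paths), "paths; shared tail:", common_suffix(paths))
```

On this output, all seven paths end in 6939 46887 14607, so every lingering path was learned via the same two upstream ASNs.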

Hello Matthew,
    I think you may be interested in Isolario (www.isolario.it). It's a
route collector which offers real-time analyses in exchange for full routing
tables. Let me know if you want more details about that!

Best regards,
Alessandro