RE: BGP and The zero window edge

Ben's blog details an experiment in which he advertises routes and then
withdraws them, but some of them remain stuck for days.

I'd like to get to the bottom of this problem.

Has anyone else seen this before or can provide data to analyze?
On or off list.

Regards,
Jakob.

Dear Jakob, group,

Ben's blog details an experiment in which he advertises routes and then
withdraws them, but some of them remain stuck for days.

I'd like to get to the bottom of this problem.

I think there are *two* problems:

1) some BGP implementations (or multi-node BGP configurations) sometimes
   end up getting stuck in one way or another.

2) other BGP nodes are not able to disconnect/reconnect to systems
   suffering from instantiations of problem #1.

While on the one hand it is important to follow up on each and every
instantiation of problem #1, I personally think it also is worthwhile
exploring whether the BGP FSM itself can be redefined in a way that
encourages BGP protocol implementations to be more robust and rely less
on the remote peer behaving correctly.

Once Problem #2 is addressed, finding and isolating instances of Problem
#1 will become much easier.

Has anyone else seen this before or can provide data to analyze?
On or off list.

From the BGP Default-Free Zone perspective it is hard to differentiate
between an entire (multi-vendor) Autonomous System being stuck, or just
one router.

To test individual router implementations this tool is useful:
benjojo/bgp-zerowindow-test on GitHub, a malicious BGP daemon that
forces a TCP zero window edge case. Please keep in mind, though, that
the "TCP Recv Wind == 0" trick is just one way to easily get a BGP peer
to manifest the problematic behavior.

From a BGP protocol perspective BGP nodes shouldn't inspect the TCP
receive window, but rather focus on whether all locally available
signals indicate that the remote peer is still progressing data.
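
One way to picture this is a per-session watchdog that looks only at
local state: is there data queued for the peer, and has any of it been
accepted recently? The sketch below is an illustration under
assumptions, not code from any BGP implementation. The class name
SendProgressWatchdog, its methods, and the idea of passing in a single
stall limit (for example the session hold time) are all hypothetical.

```python
import time


class SendProgressWatchdog:
    """Tracks whether queued BGP data for one peer is still draining.

    Hypothetical helper: names and the stall policy are illustrative only.
    """

    def __init__(self, stall_limit_seconds, clock=time.monotonic):
        self.stall_limit = stall_limit_seconds
        self.clock = clock
        self.pending_bytes = 0
        self.last_progress = clock()

    def queued(self, nbytes):
        """Call when nbytes are added to the peer's output queue."""
        if self.pending_bytes == 0:
            # Queue was idle: start timing from now, not from the last send.
            self.last_progress = self.clock()
        self.pending_bytes += nbytes

    def sent(self, nbytes):
        """Call when the local socket accepted nbytes from the queue."""
        if nbytes > 0:
            self.pending_bytes = max(0, self.pending_bytes - nbytes)
            self.last_progress = self.clock()

    def is_stalled(self):
        """True if queued data has made no progress for too long."""
        if self.pending_bytes == 0:
            return False
        return self.clock() - self.last_progress > self.stall_limit
```

A speaker using something like this would tear down and re-establish
the session once is_stalled() returns True, whatever the reason the
peer stopped reading.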

Kind regards,

Job

I'd like to get some data on what actually happened
in the real cases and analyze it.

If it's a Cisco router at fault, then we have a bug to fix.
Even if it's not a Cisco, there may be ways we can help
to avoid the situation.
However, before we start on solutions, I'd like to get
a good understanding of what actually happened.

TCP zero window is possible, but many other things could
cause it too.

Anyone?

Regards,
Jakob.

- BGP Zombies | RIPE Labs
- BGP zombie routes

kind regards,

Indeed. There could be a number of reasons that caused it.

Switching away from TCP win=0 towards "Zombie Routes":

*RIGHT NOW* (at the moment of writing), there are a number of zombie
routes visible in the IPv6 Default-Free Zone:

One example is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d15::/48

    2a0b:6b86:d15::/48 via:
        BGP.as_path: 204092 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
    (first announced April 15th, last withdrawn April 15th, 2021)

Another one is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d24::/48

    2a0b:6b86:d24::/48 via:
        BGP.as_path: 201701 9002 6939 42615 212232
        BGP.as_path: 34927 9002 6939 42615 212232
        BGP.as_path: 207960 34927 9002 6939 42615 212232
        BGP.as_path: 44103 50673 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
        BGP.as_path: 3280 34927 9002 6939 42615 212232
        BGP.as_path: 206628 34927 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
    (first announced March 24th, last withdrawn March 24th, 2021)

Just now, I literally rebooted the BGP speaker behind lg.ring.nlnog.net
to make sure that those routes are not stuck in the BGP looking glass
itself.

2a0b:6b86:d24::/48 was first announced on March 24th, 2021, and
withdrawn at the end of March 24th, 2021 by the originator, and now
almost a month later, this prefix still is visible in the default-free
zone despite WITHDRAW messages having been sent and the AS 212232
operator confirming they are not announcing that IP prefix anywhere.

I checked the AS 6939 looking glass, but the d24::/48 route is not
visible in the http://lg.he.net/ web interface. This leads me to believe
the route got stuck somewhere along the way in one or more of 201701,
204092, 206628, 207910, 207960, 208627, 3280, 34927, 35280, 44103,
50673, 57199, and/or 9002.
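
The list of suspects can be derived mechanically: it is every AS that
appears to the left of AS 6939 in any of the observed paths. A minimal
sketch, assuming the paths are given as the space-separated strings
shown in the looking glass output; the function name candidate_ases is
made up for illustration.

```python
def candidate_ases(as_paths, clean_as):
    """Return every AS seen to the left of clean_as in any path."""
    candidates = set()
    for path in as_paths:
        hops = path.split()
        if clean_as in hops:
            # Everything before the first occurrence of clean_as.
            candidates.update(hops[:hops.index(clean_as)])
    return candidates


observed_paths = [
    # 2a0b:6b86:d15::/48
    "204092 57199 35280 6939 42615 42615 212232",
    "208627 207910 57199 35280 6939 42615 42615 212232",
    # 2a0b:6b86:d24::/48
    "201701 9002 6939 42615 212232",
    "34927 9002 6939 42615 212232",
    "207960 34927 9002 6939 42615 212232",
    "44103 50673 9002 6939 42615 212232",
    "208627 207910 34927 9002 6939 42615 212232",
    "3280 34927 9002 6939 42615 212232",
    "206628 34927 9002 6939 42615 212232",
]

# AS numbers are kept as strings, so sorted() orders them lexicographically.
print(sorted(candidate_ases(observed_paths, "6939")))
```

Fed with the paths from both prefixes, this yields the same thirteen
ASes as listed above.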

This implies there indeed might be multiple reasons a BGP route gets
stuck ('stuck' as in: a WITHDRAW was not generated, or was ignored).
Perhaps on any one of these edges there is a very high Out Queue for one
reason or another:

    34927 9002
    206628 34927
    44103 50673
    207960 34927
    3280 34927
    9002 6939
    201701 9002
    208627 207910

I'm not sure all of these sightings of stuck routes can be pinpointed
to one specific BGP vendor (or one bug).

Kind regards,

Job

I'm not sure if this is helpful to this discussion or not, but I recently became aware of a bug in a virtual router using DPDK+VPP which sounds like it could possibly produce a similar issue to what is being described, without the TCP window being a factor.

The system used the same process to read and process the messages coming in to the netlink socket. When a large BGP update was being processed it was possible that the netlink buffer was being filled while previous updates were being processed. This caused some route updates to not be processed, not applied to the VPP FIB, and so they became stuck. The particular vendor I spoke to about this issue resolved this by giving priority to reading and storing the messages for processing, and asynchronously processing those messages in batches.
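
To illustrate the shape of that fix (not the vendor's actual code, which I have not seen), here is a minimal Python sketch. One thread only reads and stores messages, and a second one applies them in batches. The netlink socket is replaced by a plain callable, and the names reader, processor, run and apply_batch are all hypothetical.

```python
import queue
import threading


def reader(recv_message, inbox):
    """Drain the source as fast as possible; do no processing here."""
    while True:
        msg = recv_message()
        inbox.put(msg)
        if msg is None:  # None marks end of input in this sketch
            return


def processor(inbox, apply_batch, max_batch=64):
    """Apply stored messages in batches, independently of the reader."""
    done = False
    while not done:
        batch = [inbox.get()]  # block until at least one message arrives
        while len(batch) < max_batch:
            try:
                batch.append(inbox.get_nowait())
            except queue.Empty:
                break
        if batch[-1] is None:  # the end marker is always the last item
            batch.pop()
            done = True
        if batch:
            apply_batch(batch)


def run(messages, apply_batch):
    """Wire a reader thread and a processor together for a finite input."""
    source = iter(list(messages) + [None])
    inbox = queue.Queue()  # unbounded, so the reader never waits on the FIB
    thread = threading.Thread(target=reader, args=(lambda: next(source), inbox))
    thread.start()
    processor(inbox, apply_batch)
    thread.join()
```

The point of the split is that a slow apply_batch (think: FIB programming) no longer stops the reader from emptying the kernel buffer, so updates are delayed rather than dropped.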

I can share additional details off-list if anyone thinks this could be related to the problem.

I'm not sure all of these sightings of stuck routes can be pinpointed
to one specific BGP vendor (or one bug).

I would guess that all the stuck route sightings stem from one undiscovered bug in a TCP library that several BGP vendors have in common.

-Hank

Thank you for the details and clearing the issue.

Kind regards,

Job

[...]

    
Another one is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d24::/48

    2a0b:6b86:d24::/48 via:
        BGP.as_path: 201701 9002 6939 42615 212232
        BGP.as_path: 34927 9002 6939 42615 212232
        BGP.as_path: 207960 34927 9002 6939 42615 212232
        BGP.as_path: 44103 50673 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
        BGP.as_path: 3280 34927 9002 6939 42615 212232
        BGP.as_path: 206628 34927 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
    (first announced March 24th, last withdrawn March 24th, 2021)

[...]

I checked the AS 6939 looking glass, but the d24::/48 route is not
visible in the http://lg.he.net/ web interface. This leads me to believe
the route got stuck somewhere along the way in one or more of 201701,
204092, 206628, 207910, 207960, 208627, 3280, 34927, 35280, 44103,
50673, 57199, and/or 9002.

9002. Hit by Juniper PR1562090, route stuck in DeletePending.
Workaround applied, sessions with 6939 restarted, route is gone.

Job Snijders via NANOG writes:

*RIGHT NOW* (at the moment of writing), there are a number of zombie
routes visible in the IPv6 Default-Free Zone:

[Reversing the order of your two examples]

Another one is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d24::/48

    2a0b:6b86:d24::/48 via:
        BGP.as_path: 201701 9002 6939 42615 212232
        BGP.as_path: 34927 9002 6939 42615 212232
        BGP.as_path: 207960 34927 9002 6939 42615 212232
        BGP.as_path: 44103 50673 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
        BGP.as_path: 3280 34927 9002 6939 42615 212232
        BGP.as_path: 206628 34927 9002 6939 42615 212232
        BGP.as_path: 208627 207910 34927 9002 6939 42615 212232
    (first announced March 24th, last withdrawn March 24th, 2021)

So that one was resolved at AS9002, see Alexandre's followup (thanks!)

AS9002 had also been my guess when I read this, because it's the
leftmost common AS in the paths observed.

One example is http://lg.ring.nlnog.net/prefix_detail/lg01/ipv6?q=2a0b:6b86:d15::/48

    2a0b:6b86:d15::/48 via:
        BGP.as_path: 204092 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
        BGP.as_path: 208627 207910 57199 35280 6939 42615 42615 212232
    (first announced April 15th, last withdrawn April 15th, 2021)

Applying the same logic, I'd suspect that the withdrawal is stuck in
AS57199 in this case. I'll try to contact them.
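
For anyone who wants to repeat this on other beacons, the "leftmost
common AS" heuristic is easy to script. A small sketch, assuming the
paths come as space-separated strings like in the looking glass output;
leftmost_common_as is just a name I made up.

```python
def leftmost_common_as(as_paths):
    """Return the first AS of the first path that occurs in every path."""
    if not as_paths:
        return None
    split_paths = [path.split() for path in as_paths]
    common = set(split_paths[0]).intersection(*split_paths[1:])
    for asn in split_paths[0]:
        if asn in common:
            return asn
    return None


d24_paths = [
    "201701 9002 6939 42615 212232",
    "34927 9002 6939 42615 212232",
    "207960 34927 9002 6939 42615 212232",
    "44103 50673 9002 6939 42615 212232",
    "208627 207910 34927 9002 6939 42615 212232",
    "3280 34927 9002 6939 42615 212232",
    "206628 34927 9002 6939 42615 212232",
]
d15_paths = [
    "204092 57199 35280 6939 42615 42615 212232",
    "208627 207910 57199 35280 6939 42615 42615 212232",
]

print(leftmost_common_as(d24_paths))  # 9002
print(leftmost_common_as(d15_paths))  # 57199
```

It is only a heuristic: it points at the AS closest to the observers
that all stuck paths share, which is a good first place to ask, not
proof of where the withdrawal got lost.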

Here's a (partial) RIPE RIS BGPlay view of the last lifecycle of the
2a0b:6b86:d15::/48 beacon:

Cheers,

On the AS204092 side, the route is one week and two days old (so
2021-04-16). So we never received the withdrawal.

asbr01#sh bgp ipv6 uni 2a0b:6b86:d15::/48
BGP routing table entry for 2A0B:6B86:D15::/48, version 88407242
BGP Bestpath: deterministic-med: med
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     129 130 145 167
  Refresh Epoch 1
  57199 35280 6939 42615 42615 212232
    2A0B:CBC0:1::BD (FE80::66D1:54FF:FEEF:9893) from 2A0B:CBC0:1::BD (80.67.167.5)
      Origin IGP, metric 10, localpref 100, valid, external, best
      Community: 24115:6939 35280:10 35280:1040 35280:2080 35280:3120 35280:20000 35280:21000 35280:21150 57199:35280 57199:65535 64496:100 64496:57199 64999:24115
      unknown transitive attribute: flag 0xE0 type 0x20 length 0x30
        value 0000 5E33 0000 03E9 0000 0001 0000 5E33
              0000 03EA 0000 0002 0000 5E33 0000 03EB
              0000 0005 0000 5E33 0000 03EC 0000 1B1B

      path 7F1E8D0F3B58 RPKI State valid
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  57199 35280 6939 42615 42615 212232, (received-only)
    2A0B:CBC0:1::BD (FE80::66D1:54FF:FEEF:9893) from 2A0B:CBC0:1::BD (80.67.167.5)
      Origin IGP, metric 4294967295, localpref 100, valid, external
      Community: 24115:6939 35280:10 35280:1040 35280:2080 35280:3120 35280:20000 35280:21000 35280:21150 57199:35280 57199:65535 64999:24115
      unknown transitive attribute: flag 0xE0 type 0x20 length 0x30
        value 0000 5E33 0000 03E9 0000 0001 0000 5E33
              0000 03EA 0000 0002 0000 5E33 0000 03EB
              0000 0005 0000 5E33 0000 03EC 0000 1B1B

      path 7F1E8D0EF088 RPKI State valid
      rx pathid: 0, tx pathid: 0
asbr01#sh ipv6 route 2a0b:6b86:d15::/48
Routing entry for 2A0B:6B86:D15::/48
  Known via "bgp 204092", distance 20, metric 10, type external
  Route count is 1/1, share count 0
  Routing paths:
    FE80::66D1:54FF:FEEF:9893, GigabitEthernet0/0/0.24
      MPLS label: nolabel
      Last updated 1w2d ago

asbr01#