BGP AFI or SAFI for advertising BFD status

I’m looking for a way to propagate the status of BFD sessions running on one router to another via BGP.

Considering the versatility of BGP, I’m sure this is possible.

But when researching BFD and BGP, my biggest difficulty was avoiding documents that talk about how to use BFD as a trigger for detecting the status of a BGP peer.

I imagine that EVPN may have some space for this status propagation, but I couldn’t find any information about it.

Does any colleague have any clue about BFD session status propagation over BGP?

This Internet-Draft discusses the problem space:

https://datatracker.ietf.org/doc/draft-ietf-idr-rs-bfd/

Unfortunately there are no known implementations of the protocol.

Nick

Ooh, that sounds great!
Exactly what I was wondering.

Create a VRF called BFD-RouteServer on each PE. In that VRF, configure the route server's IP addresses and bring up BFD sessions with the participants connected to that PE.
Then add a PBR rule that diverts BFD packets coming from participants to those local replicas of the route server IPs.

With that weird BFD anycast deployed, the next part is making the route servers aware of the status of those distributed BFD sessions.

It will definitely be interesting to read the list discussion about this. My first reaction was "why would you even need this?", so I'm definitely curious.

I’m not sure it’s an all-round good solution to either of these problems, in the “be careful what you wish for, because you might get it” sense. There are going to be router platforms out there which won’t handle hundreds of BFD sessions reliably. So if the protocol were widely supported, it’s not clear whether it would help or harm interdomain routing stability, particularly in situations where all the sessions could be triggered simultaneously.
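To put rough numbers on the "hundreds of BFD sessions" concern, here is a minimal sketch. It assumes the worst case in which every participant runs BFD to every other participant on the fabric; the participant count and the transmit interval are illustrative values, not figures from the draft or from any real IXP.

```python
def bfd_load(participants: int, tx_interval_ms: int) -> dict:
    """Estimate worst-case BFD load on an IXP fabric (hypothetical helper).

    Assumes a full mesh of BFD sessions between participants, which is an
    upper bound: in practice a router would only need sessions towards the
    next hops it actually learns via the route server.
    """
    sessions_per_router = participants - 1
    fabric_sessions = participants * (participants - 1) // 2
    # Each session sends one control packet per transmit interval,
    # and receives roughly the same rate from the far end.
    pps_each_way = sessions_per_router * 1000 / tx_interval_ms
    return {
        "sessions_per_router": sessions_per_router,
        "fabric_sessions": fabric_sessions,
        "pps_each_way": pps_each_way,
    }


# Assumed example values: 500 participants, 300 ms transmit interval.
load = bfd_load(participants=500, tx_interval_ms=300)
print(load)
```

Whether a given platform copes with that depends on whether BFD runs in hardware or on the control-plane CPU, which this sketch does not model.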

As a separate issue, hold timers should generally be of a comparable order of magnitude to the non-availability effect they’re attempting to mitigate. Inter-domain routing convergence is often measured in minutes rather than seconds. So even if the protocol layer worked at IXPs without causing control plane meltdown, it’s still a mechanism which has a trigger timer two orders of magnitude faster than the general case of DFZ reconvergence. I can’t see that this would help overall inter-domain routing stability.
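A back-of-the-envelope sketch of that "two orders of magnitude" gap. In BFD asynchronous mode the detection time is approximately the negotiated interval multiplied by the detect multiplier; the 300 ms x 3 timers and the 120-second convergence figure below are assumptions chosen only to stand in for "sub-second" and "minutes".

```python
import math


def bfd_detection_time_s(interval_ms: int, detect_mult: int) -> float:
    """Approximate BFD detection time: negotiated interval x detect multiplier."""
    return interval_ms * detect_mult / 1000.0


def orders_of_magnitude(slow_s: float, fast_s: float) -> float:
    """How many powers of ten separate the slow timer from the fast one."""
    return math.log10(slow_s / fast_s)


# Assumed values, for illustration only.
detect_s = bfd_detection_time_s(interval_ms=300, detect_mult=3)
dfz_convergence_s = 120.0  # stand-in for "measured in minutes"

gap = orders_of_magnitude(dfz_convergence_s, detect_s)
print(f"BFD detects in ~{detect_s:.1f}s, gap is ~{gap:.1f} orders of magnitude")
```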

Nick

I think this is the fundamental question.

BGP is stable and scales well given its global scope, not only because it turns like a tanker, but because we accept that it turns like a tanker.

Now, in a world of TikTok Brain and Uber Eats where we are used to getting what we want instantly, imposing that on to BGP, even if sneakily, is probably not something we want. At least not at a global scale.

Mark.

There are two main target situations: firstly, when a router unexpectedly
drops off an IXP platform, this won't be explicitly signaled to the other routers
on the fabric, which can mean that packets to that device will be black-holed
until all the others' BGP hold timers kick in.

Hi Nick,

I'm missing something.

Wouldn't the route server send withdrawals and updates to the rest of
the participants as soon as its hold timer with the lost router
expires?

Could this not be accelerated by the IXP asking the participants to
keep low keepalive and hold timers with the route server?
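As a rough illustration of how far the hold-timer knob can be turned: BGP peers each propose a hold time in the OPEN message and use the smaller of the two, a non-zero hold time must be at least 3 seconds, and keepalives are conventionally sent at one third of the hold time. The function names below are hypothetical, and the 90-second and 9-second proposals are assumed example values.

```python
def negotiated_hold_time(local_s: int, peer_s: int) -> int:
    """Return the hold time a BGP session would use: the smaller proposal.

    A hold time of 0 disables keepalives; any non-zero value below
    3 seconds is not permitted.
    """
    for proposal in (local_s, peer_s):
        if proposal != 0 and proposal < 3:
            raise ValueError(f"hold time {proposal}s is below the 3s minimum")
    return min(local_s, peer_s)


def keepalive_interval(hold_s: int) -> float:
    """Conventional keepalive interval: one third of the hold time."""
    return hold_s / 3


# Assumed example: route server proposes 90s, participant proposes 9s.
hold = negotiated_hold_time(90, 9)
print(f"hold time {hold}s, keepalive every {keepalive_interval(hold):.0f}s")
# Worst case, traffic is black-holed for about `hold` seconds before the
# route server declares the peer down and sends withdrawals.
```

Even at the 3-second floor this stays slower than sub-second BFD detection, so it narrows the black-hole window without closing it.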

How would your solution help when two participants at an IXP have
chosen -not- to bilaterally peer, thus needing the route server to
intermediate? They're going to agree to build BFD sessions even though
they don't want BGP sessions? That... doesn't make sense.

The second situation would be to deal with forwarding plane incongruence
on IXPs, i.e. where router A can reach RS, router B can reach RS, router A
cannot reach router B due to a problem on the IXP fabric itself. Thankfully
this style of problem has become quite unusual over the last several years.

Doesn't seem like it would solve the bouncy link problem.

Absent bouncy links, simply having a reasonable time out for arp and
ND will assure the router quickly finds its neighbor unreachable,
which is applied as backpressure into BGP.

Regards,
Bill Herrin

Is this a BGP-LS solution to an SNMP trap problem? Why can't the OSS do this notification? Are we turning BGP into an NMS?