Thanks to everyone who has responded so far. Enlightening!
My understanding of the origins of BFD is that it was developed in part to try to bring SONET-like switchover times to an Ethernet world. What I'm reading is that, of those who do run BFD, no one seems to be dialing it down to try to achieve those times. Some folks explained why they chose the values they did, but others didn't. So my follow-up question is: "Why don't you dial them down?" Is achieving those switchover times not important for your use case? Do you not trust that it will be reliable, based on the gear you're using or the quality/reliability of the underlying circuit you're trying to protect? Something else?
Also, interesting to read about why some folks don’t care much about BFD at all.
Silly question perhaps, but why would you do BFD on dark fiber?
Because Ethernet lacks the PRDI that real WAN protocols have.
Indeed, RFI on Ethernet is a rather modern addition, turning 20 this year.
(You just reminded me I've been doing some sort of WAN network ops for
about 20 years.)
That does indeed solve the problem for dark fibre, and for those lucky WDM
systems that actually reflect input status to output. Not always true, I'm
afraid (just look at the Ethernet switch mid-span that Thomas Bellman wrote
about; a fitting metaphor for all "ethernet-over-other.." models..).
Ethernet still regards "no frames seen on the yellow coax" as an
opportunity to send traffic rather than an error, if we're talking old
things ;-). BFD solves that, and it is worthwhile to have one set up
regardless of technology, if possible.
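(For anyone following along who hasn't deployed it yet: a minimal sketch of
what "having one set up" can look like on IOS-XR-style gear, tying BFD to
the IGP adjacency over the link. The process name, interface and timers
below are placeholder values, not anyone's production config.)

router ospf CORE
 area 0
  interface HundredGigE0/0/0/1
   bfd fast-detect
   bfd minimum-interval 300
   bfd multiplier 3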
Not directly related, but I wonder: how common is micro-BFD for detecting
bundle member failures?
I'm not sure it's used by a large proportion of operators, but it is
deployed in some volume in a number of networks that I'm aware of. During
the development of implementations, we hosted inter-op testing/fixing at a
previous employer. Rolling it out had started when I moved on, but I expect
it is now in place across their global deployments. I haven't heard
anything to say that it's causing any issues.
[I still find it somewhat hard to reconcile myself with the use of BFD in
this deployment; some Ethernet OAM seemed a reasonable per-member solution
to me, but folks have a preference for a single protocol here.]
r.
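(For anyone curious what that looks like in practice: a rough sketch of
per-member micro-BFD (RFC 7130) on an IOS-XR-style bundle. The interface,
neighbour address and timers are invented placeholders, and other vendors
spell this differently.)

interface Bundle-Ether10
 bfd mode ietf
 bfd address-family ipv4 fast-detect
 bfd address-family ipv4 destination 192.0.2.2
 bfd address-family ipv4 minimum-interval 300
 bfd address-family ipv4 multiplier 3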
You also have the issues of:
* Deciding whether you want a uniform standard when deploying BFD. If
  your standard is not to run it on dark links but to run it on lit
  links, you can quickly run into an administrative mess as your
  network grows, and keeping track of which link does or doesn't have
  BFD can become someone's untangling project 10 years later.
* Circuit providers delivering hybrid links and not telling you
because they are either afraid to or don't fully understand the
scope of their (very large) network. In this case, you're told the
link is dark, but somewhere along the path is their active gear.
(not so) Strange, but true.
Mark.
In all recent versions of IOS, this command is now standard and is
elided from the running configuration.
Mark.
Here is what we do...
router isis xxxx
 interface TenGigabitEthernet0/0/0/0
  circuit-type level-2-only
  bfd minimum-interval 50
  bfd multiplier 5
  bfd fast-detect ipv4
We keep the same config for local and long-haul core links. Works like a champ every time.
Also, as an FYI: if you are running ASR9K, you are able to offload the BFD process from the line card CPU to the NPU. This allows BFD timers down to 3.3 milliseconds. https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-1/routing/configuration/guide/b_routing_cg51xasr9k/b_routing_cg51xasr9k_chapter_011.html
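(For readers doing the maths on the config above: with standard BFD
detection-time semantics, 50 ms * 5 works out to roughly 250 ms before the
IGP is told the neighbour is gone; with the NPU offload and timers around
3.3 ms, a multiplier of 3 would bring that down to roughly 10 ms.)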
We use the same timers for all links, but different multipliers
depending on the link length.
We have links as short as 5km, all the way to 14,500km.
Mark.
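(As an illustration of the same-interval / different-multiplier approach
described above, not the poster's actual config: a short IS-IS stanza
where a metro span keeps a small multiplier and a long subsea span gets a
larger one. Interface names and values are invented.)

router isis CORE
 ! ~5 km metro link
 interface TenGigabitEthernet0/0/0/1
  bfd minimum-interval 300
  bfd multiplier 3
  bfd fast-detect ipv4
 ! ~14,500 km subsea link
 interface TenGigabitEthernet0/0/0/2
  bfd minimum-interval 300
  bfd multiplier 5
  bfd fast-detect ipv4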
Any words of wisdom / battle scars regarding running links that
are in the 10,000 km+ range?
Keep repair ships nearby :-).
From a submarine perspective, there are things that are out of scope here.
From an IP perspective, we've had good experience with 250 ms * 5 for
BFD. Actual RTT latency is 140 ms, so there is enough headroom to guard
against false positives.
Mark.
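(Again, just working the numbers: 250 ms * 5 gives a detection time of
roughly 1.25 seconds before the session is declared down, and the 250 ms
interval itself sits comfortably above the 140 ms path RTT.)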