Are there any transit providers out there that accept using BFD (or any similar mechanism) for eBGP peerings?
If not, how do you solve the issue with physical interface state when LAN PHY connections are used?
Anyone messing with the BGP timers? If so, what about multiple LAN connections with a single BGP peering?
Well first off LAN PHY has a perfectly useful link state. That's pretty
much the ONLY thing it has in the way of native OAM, but it does have
that, and that's normally good enough to bring down your EBGP session
quickly. Personally I find the risk of false positives when speaking to
other people's random bad BGP implementations to be too great if you go
much below 30 sec hold timers (and sadly, even 30 secs is too low for
some people). We (nLayer) are still waiting for our first customer to
request BFD; we'd be happy to offer it (with reasonable timer values, of
course).
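For reference, something in that spirit, as a Junos-style sketch (group name
and value purely illustrative; Junos derives the keepalive interval as one
third of the hold time):

    set protocols bgp group ebgp-transit hold-time 30

versus the protocol default of 90 seconds.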
Link state is good for the local connection. If there are multiple intermediate optical points (not managed by either party), or a LAN switch (an IX environment), you won't get any link notification for anything not connected locally to your interface, unless there is a mechanism to signal that to you.
We are going to turn up BFD with Level3 this Saturday. They require that you run a Juniper (per our SE). It sounds like it is fairly new, as there was no paperwork to request the service; we had to put it in the notes.
We have many switches between us and Level3, so we don't get an "interface down" to drop the session in the event of a failure.
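For anyone curious, the configuration side of the turn-up is small. A
Junos-style sketch (group name, neighbor address, and timer values are
placeholders, not Level3's actual provisioning):

    # 300 ms x 3 multiplier = roughly 900 ms detection, deliberately conservative
    set protocols bgp group level3-transit neighbor 192.0.2.1 bfd-liveness-detection minimum-interval 300
    set protocols bgp group level3-transit neighbor 192.0.2.1 bfd-liveness-detection multiplier 3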
This is often my topology as well. I am satisfied with BGP's
mechanism and default timers, and have been for many years. The
reason for this is quite simple: failures are relatively rare, my
convergence time to a good state is largely bounded by CPU, and I do
not consider a slightly improved convergence time to be worth an
atypical configuration. Case in point, Richard says that none of his
customers have requested such configuration to date; and you indicate
that Level3 will provision BFD only if you use a certain vendor and
this is handled outside of their normal provisioning process.
For an IXP LAN interface and associated BGP neighbors, I see much more
advantage. I imagine this will become common practice for IXP peering
sessions long before it is typical to use BFD on
customer/transit-provider BGP sessions.
There are still a LOT of platforms where BFD doesn't work reliably
(without false positives), doesn't work as advertised, doesn't work
under every configuration (e.g. on SVIs), or doesn't scale very well
(i.e. it would fall over if you had more than a few neighbors
configured). The list of caveats is huge, the list of vendors which
support it well is small, and there should be giant YMMV stickers
everywhere. But Juniper (M/T/MX series at any rate) is definitely one of
the better options (though not without its flaws: the inability to configure
BFD at the group level and selectively disable it per-peer, and the lack of
support at the group level where any IPv6 neighbor is configured, come to mind).
Running BFD with a transit provider is USUALLY the least interesting use
case, since you're typically connected either directly, or via a metro
transport service which is capable of passing link state. One possible
exception to this is when you need to bundle multiple links together,
but link-agg isn't a good solution, and you need to limit the number of
EBGP paths to reduce load on the routers. The typical solution for this
is loopback peering, but this kills your link state detection mechanism
for killing BGP during a failure, which is where BFD starts to make
sense.
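A rough Junos-style sketch of that loopback-peering case (addresses and group
name are invented, and BFD support for multihop sessions on your particular
platform/code is an assumption worth verifying):

    set protocols bgp group transit-lo0 neighbor 198.51.100.1 local-address 198.51.100.2
    set protocols bgp group transit-lo0 neighbor 198.51.100.1 multihop ttl 2
    # loopback reachability rides over the parallel links via static/IGP routes,
    # so BFD, not interface state, is what tears the session down on a failure
    set protocols bgp group transit-lo0 neighbor 198.51.100.1 bfd-liveness-detection minimum-interval 300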
For IXs, where you have an active L2 switch in the middle and no link
state, BFD makes the most sense. Unfortunately it's the area where we've
seen the least traction among peers, with "zomg why are you sending me
these udp packets" complaints outnumbering people interested in
configuring BFD 10:1.
Correct me if I am wrong, but to detect a failure, by default BGP would wait out the hold timer, then declare the peer dead and converge.
So you would be looking at 90 seconds (Juniper default?) + CPU-bound convergence time to recover? Am I thinking about this right?
This is correct. Note that 90 seconds isn't just a "Juniper default."
This suggested value appeared in RFC 1267 §5.4 (BGP-3) all the way
back in 1991.
In my view, configuring BFD for eBGP sessions is risking a reduced
MTBF for rare reductions in MTTR.
This is a risk / reward decision that IMO is still leaning towards
"lots of risk" for "little reward." I'll change my mind about this
when BFD works on most boxes and is part of the standard provisioning
procedure for more networks. It has already been pointed out that
this is not true today.
If your eBGP sessions are failing so frequently that you are very
concerned about this 90 seconds, I suggest you won't reduce your
operational headaches or customer grief by configuring BFD. This is
probably an indication that you need to:
1) straighten out the problems with your switching network or transport vendor
2) get better transit
3) depeer some peers who can't maintain a stable connection to you; or
4) sacrifice something to the backhoe deity
Again, in the case of an IXP interface, I believe BFD has much more
potential benefit.
> Correct me if I am wrong, but to detect a failure, by default BGP would wait out the hold timer, then declare the peer dead and converge.
Hence the case for BFD.
There's a difference of several orders of magnitude between BFD keepalive intervals (measured in milliseconds) and BGP keepalives (measured in seconds), with a generally configurable multiplier on one side vs. the hold timer on the other.
With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases.
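To put rough, purely illustrative numbers on that: a 300 ms BFD transmit interval with a multiplier of 3 detects a dead neighbor in about 0.9 seconds, versus up to 90 seconds with the default BGP hold timer - roughly two orders of magnitude - and 50 ms intervals push it close to three.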
For a provider to require a vendor instead of RFC compliance is sinful.
Sudeep
> There's a difference of several orders of magnitude between BFD keepalive intervals (measured in milliseconds) and BGP keepalives (measured in seconds), with a generally configurable multiplier on one side vs. the hold timer on the other.
> With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases.
For eBGP peerings, your router must re-converge to a good state in < 9
seconds to see an order of magnitude improvement in time-to-repair.
This is typically not the case for transit/customer sessions.
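(Rough arithmetic, with illustrative numbers: time-to-repair is roughly detection time plus convergence time C. With the default 90 second hold time that is about 90 + C; with sub-second BFD detection it is about C. For (90 + C) / C to reach 10x, C has to be around 10 seconds or less - hence the ~9 second figure.)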
To make a risk/reward choice that is actually based in reality, you
need to understand your total time to re-converge to a good state, and
how much of that is BGP hold-time. You should then consider whether
changing BGP timers (with its own set of disadvantages) is more or
less practical than using BFD.
Let's put it another way: if CPU/FIB convergence time were not a
significant issue, do you think vendors would be working to optimize
this process, that we would have concepts like MPLS FRR and PIC, and
that each new router product line upgrade comes with a yet-faster CPU?
Of course not. Vendors would just have said, "hey, let's get
together on a lower hold time for BGP."
As I stated, I'll change my opinion of BFD when implementations
improve. I understand the risk/reward situation. You don't seem to
get this, and as a result, your overly-simplistic view is that "BGP
takes seconds" and "BFD takes milliseconds."
> For a provider to require a vendor instead of RFC compliance is sinful.
Many sins are more practical than the alternatives.
>> There's a difference of several orders of magnitude between BFD keepalive intervals (measured in milliseconds) and BGP keepalives (measured in seconds), with a generally configurable multiplier on one side vs. the hold timer on the other.
>> With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not inappropriate, in some cases.
> For eBGP peerings, your router must re-converge to a good state in < 9
> seconds to see an order of magnitude improvement in time-to-repair.
> This is typically not the case for transit/customer sessions.
Not so, if your goal is peer deactivation and failover. Also, you miss the point: once the event is detected, the rest of the process starts. I am talking about event detection.
One may want a hold timer longer than 30 seconds, but the peer state deactivated instantly on link failure. If that's the design goal AND link state is not passed through, then BFD-triggered BGP deactivation is a good choice.
> To make a risk/reward choice that is actually based in reality, you
> need to understand your total time to re-converge to a good state, and
> how much of that is BGP hold-time. You should then consider whether
> changing BGP timers (with its own set of disadvantages) is more or
> less practical than using BFD.
Yes, I see that, and I mentioned "in some cases," not all or most cases.
> Let's put it another way: if CPU/FIB convergence time were not a
> significant issue, do you think vendors would be working to optimize
This is orthogonal to my point. The table-size taxes, the best-path algorithms, and the speed with which you can re-FIB and rewrite the ASICs are constant in both cases. But that's post-event.
> this process, that we would have concepts like MPLS FRR and PIC, and
Those are out of scope in the context of this thread and have completely different roles.
> that each new router product line upgrade comes with a yet-faster CPU?
For things they can sell more licenses for: 3DES, keying algorithms, virtual instances, other BGP add-ons, features that let service providers charge a lot more money while running on common infrastructure such as MPLS and FRR, and a zillion other things like stateful redundancy, higher housekeeping needs, in-service upgrades, and anything else with a list price. And it's cheaper than the old CPU.
> Of course not. Vendors would just have said, "hey, let's get
> together on a lower hold time for BGP."
Because it would be horrible code design. Link detection is a common service. Besides, BGP process threads can run longer than the minimum intervals you'd want for link detection, so vendors would have to write checkpoints within the BGP code to come up and service the link state machine; and wait, it's a user-configurable checkpoint! So along came BFD: write a simple state machine and make it available to all protocols.
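To illustrate the "common service" point, the same knob shows up under multiple protocols on Junos, for example (interface, neighbor, and values invented for the sketch):

    set protocols bgp group peers neighbor 192.0.2.1 bfd-liveness-detection minimum-interval 300
    set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 300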
> As I stated, I'll change my opinion of BFD when implementations
> improve. I understand the risk/reward situation. You don't seem to
> get this, and as a result, your overly-simplistic view is that "BGP
> takes seconds" and "BFD takes milliseconds."
I have no doubt that you understand your own risk/reward, but you don't for every other environment.
For event detection leading to a state change leading to peer deactivation, "my overly-simplistic view" is the fact (not as you put it, but as it was written, unedited). How you want to act in response depends on the design.
is that "BGP
takes seconds" and "BFD takes milliseconds."
That's what you read, not what I wrote. I was comparing the speed of event detection.
Now, as I said, for speed of deactivation "BGP hold timer may find itself inadequate, if not inappropriate, in some cases" in this same context. But as I mentioned, we don't know the pain being solved for, or the requirements that drove this thread in the first place. So I simply put out the facts and a business driver.
BFD is no different than deactivating a peer based on link failure. Your view is that there is no case for it. My point is: it arrived yesterday; it's just a damn hard thing to monetize upstream in transit.
>> For a provider to require a vendor instead of RFC compliance is sinful.
> Many sins are more practical than the alternatives.
Few maybe.
Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gb/s. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
I've used WANem (http://wanem.sourceforge.net/) for the last 2 years.
Simple web interface, wide range of settings - it is enough for network
engineers.
> Need the ability to test Network Management and Provisioning applications over a variety of WAN link speeds, from T1-equivalent up to 1 Gb/s. There seem to be quite a few offerings, but I am looking for recommendations from actual users. Thanks in advance.
Network Nightmare
http://gigenn.net/
I used this device in the past to test an HP RGS deployment. You can
simulate different connection rates and induce latency. Documentation is
weak but it does the job.
I've used both Mini Maxwell and Maxwell Pro from InterWorking Labs with great
success:
http://minimaxwell.iwl.com/
http://maxwell.iwl.com/
The Maxwell Pro has a bunch of interesting capabilities - it can be used to
actually modify packets as they pass through (like changing IP addresses and
port numbers, or messing with other parts of selected packets.) Obviously that
goes beyond simple WAN emulation.
Jim
On Fri, Mar 18, 2011 at 10:27:18AM -0700, Jim Logajan wrote:
We've used FreeBSD + dummynet on a multi-NIC box in bridging mode
to do 'bump on the wire' WAN simulations involving packet loss, latency,
and unidirectional packet flow variances. Works wonderfully, and the price
is right.
Matt
Linux tc netem:
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
Has worked well for us.
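A minimal sketch along those lines (interface and numbers illustrative), using the common pattern of netem for delay/jitter/loss with tbf underneath for rate limiting:

    tc qdisc add dev eth0 root handle 1:0 netem delay 40ms 5ms loss 0.1%
    tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 1544kbit buffer 1600 limit 3000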