I'm running two Juniper MX104s. Each MX has one ISP connected, running
BGP (full routes). iBGP is running between the routers over a two-port 20G
LAG. When one of the ISPs fails, it can take upwards of 2 minutes for
traffic to start flowing correctly. The router has the correct route in the
routing table, but it doesn't install it in the forwarding table for the
full two minutes.
I have a few questions, if anyone could answer them:
- What would a usual convergence time be for this setup?
- Is there anything I could do to speed this process up? (I tried multipath)
- Any tips and tricks would be much appreciated
- What would a usual convergence time be for this setup?
With an MX104, anywhere between 50 seconds and many minutes. Unfortunately,
its RE (Routing Engine) is not really dimensioned for full tables.
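A back-of-the-envelope way to see why table size matters: if every affected prefix has to be reinstalled in the forwarding table one after another, the time scales with the number of prefixes divided by the install rate. This is a minimal sketch assuming a constant install rate; the 600,000-prefix table and the rates are hypothetical placeholders, not MX104 measurements.

```python
def estimated_fib_update_seconds(prefixes: int, prefixes_per_second: float) -> float:
    """Rough time to reinstall `prefixes` routes at a constant install rate.

    Assumption (illustrative only): prefixes are installed one after another
    at a steady rate. Real routers batch updates and share the RE CPU with
    other work, so treat the result as an order-of-magnitude estimate.
    """
    if prefixes_per_second <= 0:
        raise ValueError("install rate must be positive")
    return prefixes / prefixes_per_second


# Hypothetical numbers for illustration, not measurements from an MX104.
FULL_TABLE_PREFIXES = 600_000
for rate in (5_000, 10_000, 20_000):
    seconds = estimated_fib_update_seconds(FULL_TABLE_PREFIXES, rate)
    print(f"{rate:>6} prefixes/s -> {seconds:.0f} s")
```

Under this simple model, halving the install rate doubles the failover time, which is one way to read "not dimensioned for full tables".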
- Is there anything I could do to speed this process up? (I tried multipath)
Are you sure it isn't the BGP hold time: 60 seconds x 3 = 180 seconds before the router decides the neighbor is dead and removes the routes learned from it?
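A rough way to quantify the hold-timer theory: if the peer dies silently (no interface-down event, no NOTIFICATION), the session is only torn down when the hold timer expires, which lands somewhere between (hold time minus keepalive interval) and the full hold time after the failure. This sketch assumes the peer sends nothing but KEEPALIVEs at a fixed interval; `detection_window` is a hypothetical helper name.

```python
from typing import Tuple


def detection_window(keepalive_s: float, hold_s: float) -> Tuple[float, float]:
    """Range of delays between a silent peer failure and hold-timer expiry.

    Assumes the peer sends only KEEPALIVEs, exactly every `keepalive_s`
    seconds, so the failure happens somewhere within one keepalive interval
    after the last message we received. The hold timer then fires `hold_s`
    seconds after that last message.
    """
    if not 0 < keepalive_s <= hold_s:
        raise ValueError("expected 0 < keepalive <= hold time")
    return hold_s - keepalive_s, hold_s


print(detection_window(60, 180))  # (120, 180)
print(detection_window(30, 90))   # (60, 90)
```

With 60/180 timers that is 120 to 180 seconds of blackholing before BGP even reacts; with 30/90 it would be 60 to 90 seconds.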
I could use static routes, but I've noticed that since I moved to full routes I
have had a lot fewer customer complaints about latency (especially for voice
and VPN traffic).
I wasn't using per-packet load balancing. I believe the Juniper default is
per-IP.
My timers are as follows:
Active Holdtime: 90
Keepalive Interval: 30
Would I be correct in thinking I need to contact my ISP to lower these
values?
An interesting note is when I had both ISPs connected into a single MX104
the failover was just a few seconds.
It could still be worth taking a last-resort (default) route from your
ISP(s) even if you also take full routes. While the withdrawal is still
propagating on the internet side, you should at least have an inbound path
through the other provider, and the default route will still send outbound
traffic somewhere if there is no more specific route locally. Just an idea.
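The default-route idea relies on longest-prefix match: the most specific matching route wins, and 0.0.0.0/0 only catches destinations that have no more specific route left. A minimal sketch using Python's `ipaddress` module, with a toy dict standing in for the forwarding table; the next-hop labels `isp-a` and `isp-b` are hypothetical.

```python
import ipaddress


def lookup(fib: dict, destination: str):
    """Longest-prefix match: return the next hop of the most specific route."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None  # no route at all, traffic is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


fib = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-b (default)",
    ipaddress.ip_network("198.51.100.0/24"): "isp-a",
}
print(lookup(fib, "198.51.100.7"))  # isp-a
del fib[ipaddress.ip_network("198.51.100.0/24")]  # ISP A's route is withdrawn
print(lookup(fib, "198.51.100.7"))  # isp-b (default)
```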
Good idea. I can't believe I didn't think of that earlier. Simple and
effective. I will go ahead and request the defaults from my ISP and update
the thread with my findings.
Ask if they will configure BFD for you. I’ve not found many transit providers that will, but it’s worth a shot and it will lower failure detection to circa 1 second.
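For reference, in BFD's asynchronous mode (RFC 5880) the detection time is the remote system's detect multiplier times the agreed remote transmit interval, i.e. the larger of the local required minimum RX interval and the remote desired minimum TX interval. The 300 ms x 3 values below are only an example of how you get to "circa 1 second"; the real values are whatever you and the provider agree on.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Asynchronous-mode BFD detection time as described in RFC 5880.

    The remote system transmits at the slower of what it wants to send and
    what we are willing to receive; the session is declared down after
    `remote_detect_mult` of those intervals pass with no packet received.
    """
    agreed_remote_tx_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_remote_tx_ms


# Example values only: 300 ms intervals with a multiplier of 3.
print(bfd_detection_time_ms(3, 300, 300))  # 900
```

Note that the slower side wins: if the provider only offers 1 s intervals, the same multiplier gives 3 s.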
The Juniper default is to not do ECMP at all. Only a single route is
programmed into the FIB for each prefix in your RIB. If, for example, you have
routes to 198.51.100.0/24 pointing to ten different ports, all traffic
to that entire /24 will go out over a single port, unless you have
explicitly enabled ECMP.
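To illustrate the difference, here is a toy model of the forwarding decision for one prefix with ten equal-cost paths. With ECMP off only one path is used; with it on, a hash of the flow's 5-tuple spreads flows across all paths while keeping each flow on a single path. `zlib.crc32` is a stand-in: the real hash fields and function are hardware-specific, and the port names are hypothetical. (On Junos, ECMP is typically enabled with a `load-balance per-packet` policy exported under `routing-options forwarding-table`, which, despite the name, hashes per flow on MX hardware.)

```python
import zlib
from typing import List, Tuple

Flow = Tuple[str, str, int, int, int]  # (src IP, dst IP, protocol, src port, dst port)


def pick_next_hop(next_hops: List[str], flow: Flow, ecmp_enabled: bool) -> str:
    """Choose the egress port for one flow toward a prefix with several paths."""
    if not next_hops:
        raise ValueError("no route to destination")
    if not ecmp_enabled:
        # Without ECMP only one path is installed in the FIB; this sketch
        # simply takes the first one.
        return next_hops[0]
    # With ECMP, hash the flow's 5-tuple so every packet of a flow takes the
    # same path while different flows spread across all paths.
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


ports = [f"port-{i}" for i in range(10)]  # ten paths to 198.51.100.0/24
flows = [("192.0.2.1", f"198.51.100.{h}", 6, 40000 + h, 443) for h in range(1, 255)]
for ecmp in (False, True):
    used = {pick_next_hop(ports, flow, ecmp) for flow in flows}
    print(f"ECMP {'on' if ecmp else 'off'}: {len(used)} of {len(ports)} ports used")
```

Hashing per flow rather than per packet keeps packets of one TCP or voice flow in order, which is why the "per-packet" knob behaves per flow in practice.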
In L3VPN, when I've learned, say, three different routes, each using a different MPLS label toward one of three remote PEs, is there a way to ECMP-hash across all of the paths to load balance?
You shouldn't need to contact your ISP about lowering the BGP timers, as BGP
should establish the session using the lower of the two values. That said,
they may have a minimum value, and anything lower than that is at your own risk.
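Per RFC 4271, each side uses the smaller of its own configured hold time and the one in the peer's OPEN message, and the hold time must be either 0 or at least 3 seconds. Keepalives are then conventionally sent at one third of the negotiated hold time. A minimal sketch of that negotiation; the function name is hypothetical.

```python
from typing import Tuple


def negotiate_bgp_timers(local_hold_s: int, peer_hold_s: int) -> Tuple[int, int]:
    """Return (hold_time, keepalive) for a session, following RFC 4271.

    Hold time: the smaller of the two advertised values; 0 disables both
    the hold timer and keepalives. Keepalive: one third of the hold time
    (the interval RFC 4271 suggests; implementations may let you configure it).
    """
    for value in (local_hold_s, peer_hold_s):
        if value != 0 and value < 3:
            # A real speaker would send a NOTIFICATION (Unacceptable Hold Time).
            raise ValueError(f"hold time {value} must be 0 or at least 3 seconds")
    hold = min(local_hold_s, peer_hold_s)
    return hold, hold // 3


print(negotiate_bgp_timers(30, 90))   # (30, 10)
print(negotiate_bgp_timers(90, 180))  # (90, 30)
```

So lowering the hold time on your side alone is enough to shorten it for both ends, within whatever minimum the provider accepts.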
You can also look at running BFD over the BGP session. Technically it
has nothing to do with convergence, but it can detect a failure quickly
and drop the BGP session right away.
I've tried using the default route, adjusting the BGP timers, and multipath.
Unfortunately, these changes haven't helped much, and Juniper support hasn't
been very helpful either. That said, I think I might have found the solution.