Re: A survey on BGP MRAI timer values in practice

In Cisco, MRAI is "advertisement-interval".
MRAI helps to reduce route update multiplication in highly redundant
networks. OTOH, it can increase the time it takes to re-advertise
a complete internet table in some router implementations.
Update multiplication due to redundant network connections causes
some receivers of the multiple updates to become slow peers.
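
To make the coalescing effect concrete, here is a toy sketch (not any vendor's implementation) that counts how many updates one peer would receive for a single prefix. The function name, the single-prefix model and the timestamps are all assumptions made for illustration; real implementations apply the timer per peer or per update group and differ in detail.

```python
def advertisements_sent(change_times, mrai):
    """Count updates sent to one peer for one prefix in a toy MRAI model.

    change_times: sorted times (seconds) at which the best path changed.
    mrai: minimum seconds between two advertisements; 0 disables the timer.
    """
    sent = 0
    next_allowed = float("-inf")  # earliest time the next update may go out
    pending = False               # a change is being held back by the timer

    for t in change_times:
        if pending and t >= next_allowed:
            # The timer expired before this change arrived: the held-back
            # state went out as one update and the timer restarted.
            sent += 1
            next_allowed += mrai
            pending = False
        if t >= next_allowed:
            # Timer not running: advertise immediately and start it.
            sent += 1
            next_allowed = t + mrai
        else:
            # Timer running: only the latest state is kept for later.
            pending = True

    if pending:
        sent += 1  # final held-back state goes out when the timer expires
    return sent


if __name__ == "__main__":
    flaps = [0, 1, 2, 3, 40]  # hypothetical path-change times in seconds
    print(advertisements_sent(flaps, mrai=0))
    print(advertisements_sent(flaps, mrai=30))
```

With MRAI=0 every change is passed on; with a non-zero MRAI, changes that arrive while the timer is running collapse into one later update, which is the multiplication-damping effect described above.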

Here's an experiment: Do something to cause a BGP route refresh, like
the equivalent of "clear bgp soft out". It will not change any routes.
It just resends everything that was already sent. See how long it takes
with MRAI=0. Then set MRAI to about half of that value and do the
refresh again. If it takes substantially longer to complete the refresh,
stick with MRAI=0.
If there is no significant difference, use MRAI of 1 or 2 seconds.
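
The decision rule in that experiment can be written down as a small sketch. The 1.5x threshold for "substantially longer" and the choice of 1 second (rather than 2) are my assumptions for illustration, not measured values; the two durations are whatever you timed by hand on your own router.

```python
def choose_mrai(refresh_secs_mrai0, refresh_secs_with_mrai, slowdown_threshold=1.5):
    """Pick an MRAI value (seconds) from two timed route refreshes.

    refresh_secs_mrai0: refresh duration measured with MRAI=0.
    refresh_secs_with_mrai: refresh duration with MRAI set to about half
        of the first measurement.
    slowdown_threshold: hypothetical cut-off for "substantially longer".
    """
    if refresh_secs_mrai0 <= 0:
        raise ValueError("baseline refresh time must be positive")

    slowdown = refresh_secs_with_mrai / refresh_secs_mrai0
    if slowdown >= slowdown_threshold:
        return 0  # MRAI slows a full re-advertisement: stay at 0
    return 1      # no significant difference: 1 (or 2) seconds is fine


if __name__ == "__main__":
    # Hypothetical measurements, in seconds.
    print(choose_mrai(40.0, 41.0))
    print(choose_mrai(40.0, 90.0))
```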


If your work results in actionable recommendations such as "don't use BGP
out-delay timers to mitigate XYZ in circumstance LMNO, do ABC instead",
that's fantastic. Please keep us advised, and do post aggregated survey
results here once you close the survey.

What is actionable? What is the goal? The question as the OP presented it
contains some assumptions:

a) better convergence is needed
b) MRAI is an important part of the solution space

Neither is provable. We already know how to make DFZ convergence really
fast (or at least orders of magnitude faster than it is). That information
exists, but it isn't deployed because customers are not asking for it, so
providers are not aware that there is room for improvement.

Things don't optimise to be as good as they can be, things optimise to be
as bad as the market allows them to be. And the market accepts the DFZ

If you do decide to optimise for DFZ convergence without commercial
pressure, you risk lower availability, because you'll be using a
configuration less tested by other customers, and everyone knows how
terrible the quality of every NOS is. Everyone finds novel bugs in the same
damn protocols we've run for 20+ years. It's like running Windows and Linux
and regularly finding out that listing files in a directory breaks your
service, year after year after year.

For those who are interested in better convergence:
   - set your interface-down reporting delay to 0 (there may be a delay
before interface down is reported to the system, so that optical protection
works without causing an outage)
   - use 'add-path' or at least 'best-external' in iBGP, so that you always
have a backup eBGP route immediately available once the best is invalidated
(normally there is a lot of delay in finding the next best once you lose
your best)
   - tie your route validity to IGP, so you can invalidate your BGP the
moment IGP disappears
   - ensure IGP converges fast (another topic)
   - set MRAI to 0
   - use PIC edge
   - ensure your BGP NLRI can be as large as MTU allows
   - ensure your convergence isn't bottlenecked by a slow peer in the group
   - ensure you are not dropping received TCP packets on the punt path
   - ensure your fast external fallover works (eBGP down on interface down);
this is quite easy to break
   - then ensure everyone else in the DFZ does the same thing

But from a business POV, don't do any of this: you will have more bugs and
lower availability, and your customers will be less happy.