Partial vs Full tables

jakobheitz · June 8, 2020, 5:56am

FIB compression comes with some risks.
When routes churn, there are cases where you have to decompress the FIB.
The FIB must then have room for the decompressed routes, or else it overflows.
If the compressed set has to be rearranged, decompressing some routes and
compressing a different set to improve overall compression, that takes a lot
of FIB programming. This can cause very long convergence times.
Because a FIB memory cell cannot forward and be programmed at the same time,
forwarding takes precedence and programming speed suffers.
FIB programming is the slowest part of convergence, the bottleneck.
If routes also have a backup path loaded, then the backup nexthops must
match as well for the routes to compress. During convergence, not all
routes change at the same time, so there can be some very incompressible
transient route sets.
Some possible sequences of compress/decompress during convergence could
cause a lot of churn in FIB programming.
This presents lots of opportunities for optimization and thus bugs.
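
To make the backup-nexthop point concrete, here is a minimal Python sketch. It assumes a toy compression rule (drop a prefix when its nearest covering prefix forwards identically), which is not any vendor's actual algorithm; `compress` and the nexthop names "A", "B", "C" are hypothetical.

```python
import ipaddress

def compress(rib):
    """Toy FIB compression: drop a prefix when its nearest covering
    prefix in the RIB has the same (primary, backup) nexthop pair."""
    nets = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    fib = {}
    for net, nh in nets.items():
        parent = None
        # Walk up from the next-shorter prefix length to /0.
        for plen in range(net.prefixlen - 1, -1, -1):
            candidate = net.supernet(new_prefix=plen)
            if candidate in nets:
                parent = candidate
                break
        if parent is not None and nets[parent] == nh:
            continue  # covered by an identical less-specific route
        fib[str(net)] = nh
    return fib

# Steady state: all three routes share primary "A" and backup "B".
stable = {
    "10.0.0.0/8": ("A", "B"),
    "10.1.0.0/16": ("A", "B"),
    "10.2.0.0/16": ("A", "B"),
}

# Transient state: one route has already moved to backup "C".
transient = dict(stable)
transient["10.2.0.0/16"] = ("A", "C")

stable_fib = compress(stable)
transient_fib = compress(transient)
```

In this sketch the steady state compresses to a single /8 entry, while the transient state needs two entries even though every primary nexthop is still the same.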

Regards,
Jakob.

Baldur_Norddahl · June 8, 2020, 8:14am

The easy solution is to introduce some delay before programming the FIB. Or even process RIB updates in a separate thread, so that the FIB update thread does not try to program every intermediate step the RIB might go through. Instead, the FIB update thread would take a snapshot of where we are now and where we want to be, and only program the diff.

Given that the whole point is a smaller FIB, this might actually end up requiring less FIB programming.
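
A rough Python sketch of the snapshot-and-diff idea, assuming the FIB can be modelled as a dict from prefix to nexthop. The names `fib_diff` and `op_count` are made up for illustration, and real hardware programming is of course more involved.

```python
def fib_diff(programmed, desired):
    """Return the operations needed to turn `programmed` into `desired`."""
    adds = {p: nh for p, nh in desired.items() if p not in programmed}
    changes = {p: nh for p, nh in desired.items()
               if p in programmed and programmed[p] != nh}
    deletes = [p for p in programmed if p not in desired]
    return adds, changes, deletes

def op_count(diff):
    """Total number of FIB writes implied by a diff."""
    adds, changes, deletes = diff
    return len(adds) + len(changes) + len(deletes)

# What is currently in hardware.
programmed = {"10.0.0.0/8": "A", "10.1.0.0/16": "B"}

# Intermediate states the RIB passes through while a route flaps.
rib_steps = [
    {"10.0.0.0/8": "A", "10.1.0.0/16": "C"},  # nexthop changes
    {"10.0.0.0/8": "A"},                      # route withdrawn
    {"10.0.0.0/8": "A", "10.1.0.0/16": "B"},  # route returns as before
]

# Programming every step: diff each state against the previous one.
stepwise_ops = 0
current = programmed
for step in rib_steps:
    stepwise_ops += op_count(fib_diff(current, step))
    current = step

# Programming only the snapshot: diff hardware against the final state.
snapshot_ops = op_count(fib_diff(programmed, rib_steps[-1]))
```

Here following every RIB step costs three FIB writes, while the snapshot diff costs none, because the route ends up exactly where it started.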

Regards,

Baldur