Resilience - How many BGP providers


After recent discussions on the list, I've been thinking about the effect
of multiple BGP feeds on the overall resilience of Internet connectivity
for my organisation. When I originally looked at the design
proposals, there was a provision in there for four connections with the
same Internet provider. Thinking about it, and with the valuable input of
members on this list, it became obvious that multiple connections from the
same provider defeated the aim of providing resilience.

Having come to the decision to use two providers and BGP peer with
both, I'm wondering how much more resilience I would get by peering
with more than two providers. Would peering with three providers, for
example, significantly increase my resilience, given that both of the
upstreams I choose will be multihomed to other providers? Especially as
I am only looking at peering out of the UK.

Hope the above makes sense.


Your question has many caveats. Just having two providers does not necessarily get you more resiliency. If you have two providers and they terminate on the same router, then you still have a single point of failure (SPOF). You also need to look at physical paths. If you have two (or three) providers and they use a common carrier, then you have a problem as well. For example, GLBX has a small presence in the Minneapolis metro. If I were to use them as a provider, they would use Qwest as a last mile. If my other provider is Qwest (which it is), I may not have path divergence.
Facilities are important too. We have three upstreams: Qwest, MCI and ATT. The facility only has two entrances, so that means two of these are in the same conduit. If you only have one entrance, all your connections are going to run through that conduit, and that makes you susceptible to a rouge backhoe.

You are on the right track to question your resiliency. Some upstreams can offer good resiliency with multiple feeds. Others cannot. I would start with your provider and see what you are getting. Maybe you already have path divergence, separate last miles, and multiple paths in the ISP core. If you go with multiple providers, you want to make sure you don't risk losing something you already have.
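
One way to make the checks above concrete is to write down, per circuit, which router, last-mile carrier and building entrance it uses, and then look for pairs that share any of them. The sketch below is only an illustration: the provider names come from the post, but the entrance, router and last-mile assignments are invented, and `shared_risks` is a hypothetical helper, not part of any tool.

```python
from itertools import combinations

# Hypothetical inventory. Provider names are from the post; every
# attribute value below is made up for illustration.
circuits = {
    "Qwest": {"last_mile": "Qwest", "entrance": "north", "router": "edge1"},
    "GLBX":  {"last_mile": "Qwest", "entrance": "south", "router": "edge2"},
    "MCI":   {"last_mile": "MCI",   "entrance": "north", "router": "edge2"},
}

def shared_risks(circuits):
    """Map each provider pair to the attributes on which they overlap."""
    found = {}
    for a, b in combinations(sorted(circuits), 2):
        common = [key for key in sorted(circuits[a])
                  if circuits[a][key] == circuits[b].get(key)]
        if common:
            found[(a, b)] = common
    return found

for pair, attrs in shared_risks(circuits).items():
    print(pair, "share", attrs)
```

With this made-up inventory every pair shares something, which is the point: three providers on paper can still hide a common router, carrier or conduit.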

Dylan Ebner wrote:

If you only have one entrance, all your connections are going to run through that conduit, and that makes you susceptible to a rouge backhoe.

Not just the rouge ones. The big yellow ones are far more common and can do just as much damage.

The thing to remember about redundancy is that it's a statistical game rather than a magic formula.

You can be reasonably sure that any single component will go down at some point. Nothing works perfectly. Few things last forever.

If you have two fairly reliable components, and if they're sufficiently isolated from each other that they won't be broken by the same event, it's much less likely that they'll both break at the same time. That means that if one breaks, and you're not unlucky, you'll have time to fix it before the other breaks.

If you have three components, the chances of all three being broken at once are even less than the chances of two of them being broken at once. With four, you're even safer, and so on and so forth. But once you get beyond two, you hit a point of diminishing returns pretty quickly.
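
A rough numerical sketch of that argument, under assumptions that are mine and not from the thread: each link is down 1% of the time, failures are independent, and there is a small chance (0.1%) of one shared event, such as a cut conduit, taking everything out at once. The function name and the numbers are illustrative only.

```python
def all_down_probability(p_down, n, p_common=0.0):
    """Chance that all n links are down at once.

    p_down:   assumed probability that any one link is down
    p_common: assumed probability of a shared event that kills every link
    Independent failures are assumed apart from that shared event.
    """
    return p_common + (1.0 - p_common) * p_down ** n

for n in range(1, 5):
    isolated = all_down_probability(0.01, n)
    shared = all_down_probability(0.01, n, p_common=0.001)
    print(f"{n} link(s): isolated={isolated:.2e}  with shared risk={shared:.2e}")
```

In this toy model the second link removes almost all of the risk, and each further link removes a much smaller absolute amount. Once a shared failure mode exists, it sets a floor that extra links do not lower.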

That doesn't mean you should always do two of any given component. Some things may be so important that you're not willing to take that level of risk and are willing to spend significantly more money to get a small amount more protection. Some things may be sufficiently unimportant that you're willing to deal with occasional outages, and you can get by without a spare (few people -- with obvious exceptions who we don't need to hear about right now -- have fully redundant home connectivity, for instance). It's just a matter of understanding the risks, and doing the cost-benefit analysis to determine how much protection you need and how much you're willing to pay for it.


Not only that, but you have to ask yourself what are the chances that
all these extra components will become extra points of failure and
actually increase the likelihood of something going wrong. I know a lot
of folks who have gotten themselves into a lot of trouble buying transit
from everyone they can possibly buy from, thinking it will make their
network more reliable. In most cases all it does is make their network
more unstable. The more transit paths you have out there, the more
likely you are to have something flap and wipe you out w/flap dampening,
and the more likely you are to see any single event cause a massive
amount of churn. I've seen people with 8 transit providers appear to
others on the Internet as though they flapped 100+ times over a single
session flap, because of all the churn as the network reconverges. More
transit providers also means more 95th percentile calculations, and thus a higher
bill, but that is another story for another day. :)
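
To show why a burst of churn is enough to get a prefix dampened, here is a minimal sketch loosely modelled on RFC 2439 style route flap damping: each observed flap adds a fixed penalty, the penalty decays exponentially, and the route is suppressed once the penalty passes a threshold. The figures used (1000 per flap, suppress above 2000, 15 minute half-life) are commonly quoted vendor defaults and are assumptions here, not values from the post. Real implementations differ in detail.

```python
def first_suppressed_at(flap_times, half_life=900.0, per_flap=1000.0,
                        suppress=2000.0):
    """Return the time (seconds) of the flap that triggers suppression, or None.

    flap_times: times in seconds at which a remote router sees the route flap.
    All parameter defaults are assumed, illustrative values.
    """
    penalty = 0.0
    last = None
    for t in sorted(flap_times):
        if last is not None:
            # Exponential decay: penalty halves every half_life seconds.
            penalty *= 0.5 ** ((t - last) / half_life)
        penalty += per_flap
        last = t
        if penalty > suppress:
            return t
    return None

# Three updates seen within 20 seconds, e.g. churn during reconvergence.
print(first_suppressed_at([0, 10, 20]))
# One flap an hour decays away long before the next one arrives.
print(first_suppressed_at([0, 3600, 7200]))
```

Under these assumed values, three changes seen in quick succession are enough to suppress the route, while the same number spread over hours is not. That is how a single session flap, multiplied by reconvergence across many transit paths, can look like heavy flapping from the outside.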