Some doubts on large scale BGP/AS design and black hole routing risk

Hi everybody!

As part of laboratory work at the university, I'm working on a BGP design study, and I'd like to ask some questions about IP address space allocation and its impact on BGP that have been puzzling me :)

Let's suppose we have an ISP/AS with two POPs: PARIS and LONDON. The two POPs are connected by redundant leased lines. Each POP has a BGP router speaking eBGP to different upstream providers, and each POP runs its own OSPF (or IS-IS) area. Something like this:

  <INET> ---eBGP---<LONDON POP-ospf area1>===redundant leased lines (ospf area0)===<PARIS POP- ospf area2>---eBGP---<INET>

Now, this AS/ISP gets one /22 prefix from its RIR (RIPE in this case) and starts announcing it to its upstreams in PARIS and LONDON at the same time.

My questions are:

1. What could happen in the case of total failure in the redundant leased lines? Black hole routing between POPs?

2. What are the best design methods to avoid this scenario?

    2.1: adding a third POP to create a triangle? What if a POP loses connection with the other two POPs at the same time? Another black hole?

    2.2: requesting another prefix and allocating prefixes 1:1 to POPs, so that in this scenario each POP would announce only its own prefix to the upstreams?

    2.3: other?

Thanks in advance!
J.

> My questions are:
>
> 1. What could happen in the case of total failure in the redundant
> leased lines? Black hole routing between POPs?

If you have redundant backhaul that completely fails, you've got real
problems.

However, if that does happen, any traffic coming into each individual
PoP destined for users in the other PoP will fail. Only traffic
terminating for customers at that PoP will succeed.
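
The failure mode described above can be sketched as a small model. This is only an illustration under stated assumptions: the prefix 10.0.0.0/22 is a placeholder (the thread does not name the real /22), and the even split of customers between the two PoPs, as well as the names `hosting_pop` and `delivered`, are hypothetical.

```python
import ipaddress

# Hypothetical placeholder prefix; the thread does not name the real /22.
AGGREGATE = ipaddress.ip_network("10.0.0.0/22")

# Hypothetical customer placement: which PoP hosts each half of the /22.
CUSTOMER_POP = {
    ipaddress.ip_network("10.0.0.0/23"): "LONDON",
    ipaddress.ip_network("10.0.2.0/23"): "PARIS",
}


def hosting_pop(dst):
    """Return the PoP where the customer owning dst is attached, or None."""
    addr = ipaddress.ip_address(dst)
    if addr not in AGGREGATE:
        return None
    for block, pop in CUSTOMER_POP.items():
        if addr in block:
            return pop
    return None


def delivered(ingress_pop, dst, backbone_up):
    """True if a packet that entered the AS at ingress_pop reaches dst.

    Both PoPs announce only the /22, so an upstream may hand the packet
    to either PoP regardless of where the customer really is.
    """
    target = hosting_pop(dst)
    if target is None:
        return False        # not our address space at all
    if target == ingress_pop:
        return True         # customer is local to the ingress PoP
    return backbone_up      # otherwise the inter-PoP links must be up


if __name__ == "__main__":
    # A Paris customer's traffic arriving in London during a backbone outage.
    print(delivered("LONDON", "10.0.2.10", backbone_up=False))
    # A London customer's traffic arriving in London during the same outage.
    print(delivered("LONDON", "10.0.0.10", backbone_up=False))
```

In this model only the second case is delivered while the backbone is down, which is the black hole the question asks about.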

> 2. What are the best design methods to avoid this scenario?

Work on your backhaul.

Originate specific routes that cover customers present in each PoP, with
the aggregate as a backup route.
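
A minimal sketch of why this works, assuming the rest of the Internet simply prefers the longest matching prefix. The prefix 10.0.0.0/22 is a placeholder, the even /23 split per PoP is an assumption, and `announcements` and `best_route` are hypothetical helper names. Real BGP path selection between two equally specific routes is not modelled here.

```python
import ipaddress

# Hypothetical placeholder prefix; the thread does not name the real /22.
AGGREGATE = ipaddress.ip_network("10.0.0.0/22")

# Assumed split: the first /23 serves London customers, the second Paris.
LONDON_23, PARIS_23 = AGGREGATE.subnets(new_prefix=23)


def announcements(london_up=True, paris_up=True):
    """Routes visible to the Internet: each PoP originates its /23 and the /22."""
    routes = []
    if london_up:
        routes += [(LONDON_23, "LONDON"), (AGGREGATE, "LONDON")]
    if paris_up:
        routes += [(PARIS_23, "PARIS"), (AGGREGATE, "PARIS")]
    return routes


def best_route(dst, routes):
    """Longest-prefix match; ties between equal lengths are left to list order."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, pop) for net, pop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


if __name__ == "__main__":
    # Normal case: the /23 steers Paris traffic to Paris.
    print(best_route("10.0.2.10", announcements()))
    # Paris upstream down: only the /22 via London still matches.
    print(best_route("10.0.2.10", announcements(paris_up=False)))
```

With the specifics in place, traffic arrives at the PoP that hosts the customer even when the backbone is down. The aggregate only helps when a PoP's upstream fails while the backbone is still up, since the traffic then has to cross it.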

You can run a tunnel across the Internet to simulate a backbone between
both PoPs, using your side of your upstream's IP addresses as the
tunnel end-point. Not elegant, but keeps you up.
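
One cost of such a tunnel is encapsulation overhead. The sketch below assumes IPv4 on both the inner and outer headers, a 1500-byte path MTU between the tunnel end-points, a basic 4-byte GRE header with no key, checksum or sequence options, and TCP without options. The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options
GRE_HEADER = 4    # bytes, basic GRE header without optional fields

# Extra bytes added in front of the original packet by each encapsulation.
ENCAP_OVERHEAD = {
    "gre": IPV4_HEADER + GRE_HEADER,  # outer IPv4 + GRE
    "ipip": IPV4_HEADER,              # outer IPv4 only
}


def tunnel_mtu(path_mtu, encap):
    """Largest inner packet that fits in one outer packet without fragmenting."""
    return path_mtu - ENCAP_OVERHEAD[encap]


def tcp_mss(mtu):
    """Largest TCP payload that fits in an IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for encap in ("gre", "ipip"):
        mtu = tunnel_mtu(1500, encap)
        print(encap, "inner MTU:", mtu, "TCP MSS:", tcp_mss(mtu))
```

Customer packets larger than the inner MTU have to be fragmented or dropped, so the tunnel interface MTU (and possibly TCP MSS clamping) needs attention if this fallback is ever used.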

> 2.1: adding a third POP creating a triangle? What if a POP loses
> connection with the other two POPs at the same time? Another black hole?

Your fixation on a complete backhaul outage is interesting.

Purchase backhaul from different service providers to increase your
chances of uptime.
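
As a rough illustration of the gain, here is the usual parallel-availability arithmetic. It assumes the links fail independently of each other, which shared conduits or shared equipment would violate, and the 99% figures are made-up example values, not numbers from any provider.

```python
def combined_availability(availabilities):
    """Availability of parallel links, assuming independent failures.

    The backhaul is down only when every link is down at the same time,
    so the combined unavailability is the product of the individual ones.
    """
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - a
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical example: two links, each available 99% of the time.
    print(round(combined_availability([0.99, 0.99]), 6))
```

If both circuits end up in the same trench, the independence assumption fails and the real figure is closer to that of a single link.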

> 2.2: requesting another prefix and allocating 1:1 prefix:POP, so in
> the scenario each POP only would announce its prefix to the upstreams?

See above re: originating more specific routes based on the customers
you have at each PoP.

> 2.3: other?

Work harder on your backhaul.

Yes, bad things can happen, and they do happen. But more than likely, if
the PoPs of a 3-PoP network all lose connectivity to each other, I think
routing will be a much smaller problem to solve in the grand scheme of things.

Mark.

Are respondents to suppose that the customer base and address space are evenly divided between the two cities, and that the ISP is too clueless to originate each /23 from the city that uses it, in iBGP?

                -Bill

> > My questions are:
> >
> > 1. What could happen in the case of total failure in the redundant
> > leased lines? Black hole routing between POPs?
>
> If you have redundant backhaul that completely fails, you've got real
> problems.
>
> However, if that does happen, any traffic coming into each individual
> PoP destined for users in the other PoP will fail. Only traffic
> terminating for customers at that PoP will succeed.

So (as Bill points out) plan to localize subnets to each PoP: do not
number customers in PoP 1 out of the same /24 as customers in PoP 2.

> > 2. What are the best design methods to avoid this scenario?
>
> Work on your backhaul.
>
> Originate specific routes that cover customers present in each PoP, with
> the aggregate as a backup route.
>
> You can run a tunnel across the Internet to simulate a backbone between
> both PoPs, using your side of your upstream's IP addresses as the
> tunnel end-point. Not elegant, but keeps you up.

Be aware of GRE / IP-in-IP forwarding limitations.

> > 2.1: adding a third POP creating a triangle? What if a POP loses
> > connection with the other two POPs at the same time? Another black hole?
>
> Your fixation on a complete backhaul outage is interesting.
>
> Purchase backhaul from different service providers to increase your
> chances of uptime.

Different providers, different entrance facilities in the building(s),
different conduits out of the area... and hope that somewhere along the
path provider A and provider B didn't share conduit or capacity-swap you
onto a single path :)

> So (as Bill points out) plan to localize subnets to each PoP: do not
> number customers in PoP 1 out of the same /24 as customers in PoP 2.

Yes.

May lead to some global de-aggregation, but can't really avoid that.

> Be aware of GRE / IP-in-IP forwarding limitations.

I wouldn't touch it, myself.

I'd rather devote the sleepless nights to fixing the backhaul.

> Different providers, different entrance facilities in the
> building(s), different conduits out of the area... and hope that
> somewhere along the path provider A and provider B didn't share
> conduit or capacity-swap you onto a single path :)

+1.

Mark.

Hi guys

thanks everyone for your replies.

I'd like to highlight this point that Christopher made earlier:

"different providers, different entrance facilities in the building(s), different conduits out of the area... "

How can we achieve this in a world where everyone is moving to big data centers / colo hosters? With this kind of colo provider, you usually have a Meet-Me-Room or similar (which is a single point of failure) and no control over how you're actually connected to your peers.

Cheers.

You mean: "My PoP 1 is Equinix Ashburn DC2, my PoP 2 is Equinix SVL1"?

I believe there are separate entrance facilities available at this sort of
facility (or that'd certainly be on my RFP if I were looking), and you can
sort out local fiber routing issues with the providers by asking for their
maps.
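
Once you have those maps, checking two circuits for shared risk comes down to comparing the segments they traverse. A minimal sketch, assuming each provider's route has been transcribed by hand into a list of segment identifiers; the identifiers and the function name `shared_segments` below are entirely hypothetical.

```python
def shared_segments(path_a, path_b):
    """Return the conduit or segment IDs that appear in both circuit paths."""
    return sorted(set(path_a) & set(path_b))


if __name__ == "__main__":
    # Hypothetical segment IDs transcribed from two providers' fiber maps.
    provider_a = ["entrance-north", "duct-12", "bridge-3", "mmr-1"]
    provider_b = ["entrance-south", "duct-40", "bridge-3", "mmr-2"]

    common = shared_segments(provider_a, provider_b)
    if common:
        print("Shared risk on:", ", ".join(common))
    else:
        print("No shared segments found on the supplied maps")
```

An empty result only says the supplied maps show no overlap. It cannot catch a later re-route or a capacity swap onto a common path.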

Within the same data centre, the data centre operator would provide
two or more fibre entry points.

Otherwise, you may look for providers that have presence in different
facilities, if the city you are living in happens to have this luxury.

Mark.

I would also suggest different provider equipment. If one provider uses J, find a second provider that uses C.

Also, don't be seduced by a provider that offers two disparate paths using two totally different systems. I remember years ago AT&T's ATM and FR systems both died nationwide due to some equipment bug.

Also, providers lie, either intentionally or by mistake. If they state a circuit is protected, it might be this month, but next month it may not be. You may only discover this three years from now when the circuit dies, and the provider is happy to pay the SLA penalty, which is far less than the three-year cost difference between a protected and a non-protected circuit.

-Hank

> How can we achieve this in a world where everyone is moving to big data
> centers / colo hosters? With this kind of colo provider, you usually
> have a Meet-Me-Room or similar (which is a single point of failure) and
> no control over how you're actually connected to your peers.

Sometimes you have two or more MMRs; sometimes providers are only in one
or another.

The actual discipline here is delivering reliable cost-effective service
with unreliable components...