BGP Multihoming 2 providers full or partial?

Hi,

We are an enterprise that is eBGP multihoming to two ISPs. We wish to load balance inbound and outbound traffic, thereby using our capacity as efficiently as possible. My current feeling is that it would be crazy for us to take a full Internet routing table from either ISP. I have read this document from the NANOG presentations:

https://www.nanog.org/meetings/nanog41/presentations/BGPMultihoming.pdf

The above document reinforces my opinion that we do not need full routing tables. However, I was seeking some clarity, as there are other documents which suggest that taking a full routing table would be optimal. I "guess" it depends on our criteria and requirements for load balancing:

- Just care about roughly balancing link utilisation

- Be nice to make some cost savings

We have PI space and two Internet routers one for each ISP. Either of our links is sufficient to carry all our traffic, but we want to try and balance utilisation to remain within our commits if possible. I am thinking a "rough" approach for us would be:

- Take partial (customer) routes from both providers

- Take defaults from both and prefer one

Maybe we can refine the above a bit more, any suggestions would be most welcome!

Many Thanks

Can your devices support a full table?

You can load balance outbound traffic easily without taking a full table, though traffic won't always follow the shortest AS path. In regards to cost savings, how were you thinking of achieving them? Does one provider charge more? If so, just use the cheaper provider.

If you wish to do outbound traffic engineering, and want to take advantage of best paths to different networks (outbound), then you have to take full routes.

Or putting it another way.... Taking full routes offers the most flexibility, anything else would be a compromise (an acceptable compromise) to overcome some existing resource limitations...

Regards.

Faisal Imtiaz
Snappy Internet & Telecom
7266 SW 48 Street
Miami, FL 33155
Tel: 305 663 5518 x 232

Help-desk: (305)663-5518 Option 2 or Email: Support@Snappytelecom.net

Hi,

No, the current devices can't support a full table (well, not from both providers); we would need to upgrade. In terms of cost saving, we really just want to make sure we don't get charged overages because we utilise too much of one link and not enough of the other. I don't think the shortest AS path will be of much concern, or even noticeable, for most destinations.

We do however have a set of remote sites which communicate over the Internet to our central sites where the transit providers are. Just general Internet at the remote sites- but traffic from remote sites to central sites would be the most important.

I am just not sure of exactly how to define the "partial" routing table criteria to our two providers. Should we just take routes for each provider and their peers and a default from both?

The main reason for not taking a full routing table is the cost/inconvenience of upgrading existing hardware.

Thanks

Thanks,

So we just need to take a decision on whether we want to pay the price for a full routing table, whether it gives us enough value for the expenditure.

Interesting... is the cost associated with full tables just for the hardware, or is the service provider charging extra for the full table?

Faisal Imtiaz

Just for the hardware and the planning required for migrating to new hardware (human resources, etc.).

BGP traffic engineering is kind of like the soda preferences that folks have: some like Pepsi, some like Coke, and some don't care as long as it is cold and fizzy.

Depending on who your two providers are, you may be happy with just taking full routes, and doing some creative routing (i.e. setting up static routes for outbound for specific prefixes, not the most elegant solution).

Remember, BGP allows for Asymmetric routing, as such with default routes, you will have traffic coming in from both providers (by default) and traffic going out via one of them (by default).

At the end of the day, you are most likely to make a decision based on what a more powerful router will cost you and how much 'creative routing' you want or need to do.
(My personal opinion is that it is a 50/50 decision to upgrade hardware just to take full routing tables. However, if there are other reasons or needs, they can sway the decision in one direction or the other.)

:)

Faisal Imtiaz

Since you can't take a full feed from either upstream, partial routes
will mean taking your upstream's own routes + their directly-connected
customers + default.

You may make it more flexible by asking for their peering routes also,
but if these are large global transit providers, that could be the full
BGP table anyway (or 90% of it).
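As a rough illustration of the "own routes + customers + default" idea, here is a minimal Python sketch. It assumes the provider tags routes with BGP communities that mark where each route was learned, which many providers do. The community values and the `Route` type below are hypothetical and only for illustration; real values come from each provider's documentation.

```python
from dataclasses import dataclass

# Hypothetical community values (assumption, not any real provider's scheme).
ORIGIN_INTERNAL = "64500:100"   # provider's own routes
ORIGIN_CUSTOMER = "64500:200"   # provider's directly connected customers
ORIGIN_PEER = "64500:300"       # provider's settlement-free peers
ORIGIN_TRANSIT = "64500:400"    # learned from the provider's own upstreams


@dataclass(frozen=True)
class Route:
    prefix: str
    communities: frozenset


def select_partial(routes, include_peers=False):
    """Keep the default route plus routes tagged as internal or customer.

    With include_peers=True, peer-learned routes are kept as well.
    """
    wanted = {ORIGIN_INTERNAL, ORIGIN_CUSTOMER}
    if include_peers:
        wanted.add(ORIGIN_PEER)
    return [
        r for r in routes
        if r.prefix == "0.0.0.0/0" or r.communities & wanted
    ]


feed = [
    Route("0.0.0.0/0", frozenset()),
    Route("192.0.2.0/24", frozenset({ORIGIN_INTERNAL})),
    Route("198.51.100.0/24", frozenset({ORIGIN_CUSTOMER})),
    Route("203.0.113.0/24", frozenset({ORIGIN_PEER})),
    Route("198.18.0.0/15", frozenset({ORIGIN_TRANSIT})),
]

for route in select_partial(feed):
    print(route.prefix)
```

On a real router this selection would be an inbound route policy matching communities, or simply a request to the provider to send a partial feed; the sketch only shows which routes survive.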

Mark.

Hello,

Without a full table you are not protected from partitions. Partitions
are when a particular destination is reachable via one of your ISPs
but not via the other. Without receiving the route, you have no idea
which ISP can reach it.

Partitions happen fairly often but rarely last long (on the order of
minutes). The worst cases tend to be when two backbones get into a
peering dispute. Those have been known to last a week or more. See:
Cogent v. everybody else.

Think of it this way: a partial table is like an unsigned SSL
certificate. Better than static routes but not fully protected.

Regards,
Bill Herrin

Well, we've been using 2x Cisco 3560X switches for simple inbound/outbound load sharing with our provider for years (http://wiki.nil.com/EBGP_load_sharing). There's no need for us to go to full routes...
Regards,
Michael

Remember this:

1) for inbound traffic there will be no difference at all.

2) routers will ignore a static route if the link is down. If you can get
BFD from the providers then even better.

So you can emulate 99% of what you get with full routes by loading in
static routes. A simple example would be adding a 0.0.0.0/1 route to one
provider and 128.0.0.0/1 route to the other and get approximately 50% load
sharing.

You will still get redundancy as the route will be ignored if the link is down
and traffic will follow the default route to the other transit provider.
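To make the /1 trick concrete, here is a small Python sketch of the longest-prefix-match behaviour described above. The names `ISP-A` and `ISP-B` and the helper functions are made up for illustration; a real router does this lookup in its forwarding table, and the model assumes routes via a down link are withdrawn.

```python
import ipaddress


def build_table(link_a_up=True, link_b_up=True):
    """Return (network, next_hop) pairs; routes via a down link are omitted."""
    table = []
    if link_a_up:
        table.append((ipaddress.ip_network("0.0.0.0/1"), "ISP-A"))
        table.append((ipaddress.ip_network("0.0.0.0/0"), "ISP-A"))
    if link_b_up:
        table.append((ipaddress.ip_network("128.0.0.0/1"), "ISP-B"))
        table.append((ipaddress.ip_network("0.0.0.0/0"), "ISP-B"))
    return table


def lookup(table, destination):
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


both_up = build_table()
print(lookup(both_up, "20.0.0.1"))      # lower half of the address space
print(lookup(both_up, "203.0.113.1"))   # upper half of the address space

a_down = build_table(link_a_up=False)
print(lookup(a_down, "20.0.0.1"))       # falls back to the remaining default
```

With both links up, each half of the IPv4 space follows its own /1; when one link fails, its /1 disappears and the surviving default route catches the traffic. Splitting the address space in half does not split traffic volume exactly in half, hence "approximately 50%".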

If you find an offline source for IP ranges originated by each provider and
their peers, you can add routes for that to improve routing. Taking in
partial routes is also good if this provides you with a route count that
your routers can handle.

BGP shortest AS length routing is really not very good to begin with. If
you want the best routes, you need to analyse your traffic, sort by volume
or other metric and figure out which way is best for your top x AS
destinations. It may be more work, but you will get better routing
compared to investing in expensive routers to take in full routes and then
hope BGP magic takes care of the rest automatically.
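A minimal Python sketch of the "sort by volume, pick the top x AS destinations" step might look like the following. It assumes you can already export flow records as (destination ASN, byte count) pairs, for example from NetFlow or sFlow data; the record format and the sample numbers are invented for illustration.

```python
from collections import Counter


def top_destinations(flows, n):
    """Sum bytes per destination ASN and return the n largest, biggest first."""
    volume = Counter()
    for dest_asn, byte_count in flows:
        volume[dest_asn] += byte_count
    return volume.most_common(n)


# Invented sample data using private-use ASNs.
flows = [
    (64501, 500),
    (64502, 1200),
    (64501, 900),
    (64503, 100),
]

for asn, total in top_destinations(flows, 2):
    print(f"AS{asn}: {total} bytes")
```

The resulting short list is where manual path selection (per-AS route preferences or static routes) is likely to pay off; everything else can follow the default.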

Regards,

Baldur

If your traffic is small, you could set up a VyOS box. You can still get redundancy by having two switches, each one connected to an upstream provider and receiving a default route. Then hook up your VyOS router to each switch and receive full routes on that. You will need a /29 subnet from your providers to pull this off. If your VyOS box goes down for whatever reason, you will fail over to using one or the other switch. Announce your prefixes using the BGP session on each switch so that your inbound traffic doesn't hit the VyOS box.

Hi,

We have recently published a new version of our BGP routing
optimization platform, which includes a new feature called vRouter.
That might be interesting for you.

The vRouter provides route summarization support to BGP routers whose
TCAM is unable to hold the entire actual feed of Internet prefixes. In
this case BGP edge routers work in 'Selective Route Download' mode and
transmit the full routing table to the vRouter. The NSI vRouter selects
a number of prefixes to which the most data is routed and advertises the
necessary routes back to the edge router. Thereby the edge router does
not have to support the entire routing table, but only the routes it
needs to reach.

Please contact me off-list if you need more details.

Regards,
Pawel Rybczyk
Product Evangelist
Border 6

Something to point out: Sometimes the device you connect to is up, but has no reachability to the rest of the world. Using static routes is.. well.. static. There are a few cases (such as the one mentioned) where a static route can be somewhat dynamic. Another case is when the static route next hop does not respond to ARP requests or some machines have the ability to perform triggered actions on some sort of event/test. But why bother with BGP if you're just going to override its decisions by using static routes?

As another commenter mentioned, using anything less than a full table is a compromise. If one wants redundancy in the case of an upstream ISP outage, take full routes. If one wants traffic engineering flexibility, take full routes and use a BGP knob like route maps to modify existing prefixes rather than make up your own. A default route of last resort is fine; overriding BGP through static routes degrades the utility of BGP.

Thanks for pointing this out. However I would like to argue whether this is
a big drawback or not.

If the original poster had infinite money and infinite resources there
would be no question to ask. Just get the most expensive router out there
and get full tables.

So given that the money could be spent on other things, that might be more
helpful for his company, is it good value to invest in new routers? I
believe every company and NOC team needs to decide this for themselves. I
do however feel this is often a rushed decision because people have an idea
that anything less than full tables is not good enough and that you are not
a real ISP if you do not have full tables etc.

It is true that your static routes could end up pointing at a half dead
router, that still keeps the link up. But it is also perfectly possible for
a router to keep advertising routes, that it really can't forward traffic
to or where there are service problems so severe that it amounts to the
same (excessive packet loss etc). This is supposed to be rare for a good
quality transit provider and the remedy is the same (manually take the link
down).

We got our big routers and full tables early on. With perfect 20/20
hindsight I am not sure I would spend the money that way if I had to do it
over.

All I am saying is that you can get most of the value with partial tables.
You get 100% of it with ingress traffic and you can move a very large
fraction of your egress exactly the same. Your redundancy might not be
equal, but it will not be entirely bad.

Regards,

Baldur

First off thanks to everyone that responded to my original post, very instructive and informational replies along with a good view of different perspectives.

Baldur, you pointed out that for ingress it's exactly the same to take partials; we are only affected on outbound, and we can achieve a large part of the redundancy for outbound as well. Someone else pointed out that partitions of the Internet view from our two providers often last minutes rather than hours. Given this input, I really lean towards Baldur's statement that we can probably spend the money better elsewhere.

One point I will try to make internally is "Do we care about all of the Internet all of the time?" Note that we are not an ISP. Basically, if some part of the Internet is unreachable for a "short" period, will we even notice it? We always will if it is one of our remote sites, but of course we can mitigate that by making those part of the partials that we take from both of our providers.

By taking full routes, I can only see us protecting the view of the whole Internet for our internal web browsing clients. After all, if a partition to a "busy" part of the Internet happens, we will notice it straight away (Google etc.), but if it is someone's iTunes server on the end of some small DSL provider, do we care?

One thing I would rather not do is manage static routes on the BGP routers; it seems counterintuitive on the face of it.

A gateway of last resort, also called a backup default route, will take care of partitions and is, in my opinion, a good idea if you are not providing transit to others. It's a requirement if you're not taking full routes, but even if you do take full routes the management cost is practically nil.

The practical problem with using static routes (or a locally generated, default-route-only BGP feed) for egress route selection arises when your upstream providers perform maintenance or have an outage. When this occurs, you'll likely be impacted for the duration of the event. This may be 5 minutes, it may be hours. What are the track records for your upstream ISPs? Is having two ISPs doubling your downtime, and is this the desired outcome? If you can't send traffic out to half of the Internet for an hour, is that OK? At midnight? At noon?

--Blake

You could have your transit providers send you a default route in the BGP session instead of nailing it up using a static. That way if the interface does not physically go down but the BGP session does, the default route will be pulled when the BGP session dies.

Also, you could go with a less expensive router that will handle full routes, such as the Mikrotik CCRs ( http://routerboard.com/CCR1036-8G-2SplusEM ). Get one for each of your transit providers. People have varying experiences with Mikrotik; however, for basic use they seem to work well.

Jeremy Malli
jeremy@vcn.com

No, Blake, it won't. A partition means one of your ISPs has no route
to the destination. Route the packet to that ISP via a default route
and it gets sent to /dev/null. More, during a partition you don't get
to pick which of your ISPs lacks the route.
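The difference can be sketched in a few lines of Python. This is a toy model, not real router behaviour: `isp_tables` stands in for the set of prefixes each ISP can actually reach during a partition, and all names and prefixes are made up for illustration.

```python
import ipaddress


def isp_can_reach(isp, destination, isp_tables):
    """True if the given ISP holds a route covering the destination."""
    addr = ipaddress.ip_address(destination)
    return any(addr in ipaddress.ip_network(p) for p in isp_tables[isp])


def choose_with_full_tables(destination, isp_tables):
    """With full tables you see who has a route, so you can pick that ISP."""
    for isp in isp_tables:
        if isp_can_reach(isp, destination, isp_tables):
            return isp
    return None


# During the partition, ISP-A has lost its route to 203.0.113.0/24.
isp_tables = {
    "ISP-A": ["198.51.100.0/24"],
    "ISP-B": ["198.51.100.0/24", "203.0.113.0/24"],
}

destination = "203.0.113.10"
preferred_default = "ISP-A"  # default-only: traffic goes here regardless

print(isp_can_reach(preferred_default, destination, isp_tables))  # dropped
print(choose_with_full_tables(destination, isp_tables))           # ISP-B
```

With only a default route, the router has no information telling it to avoid ISP-A for this destination, so the packets are sent there and discarded.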

Regards,
Bill Herrin