BGP peering strategies for smaller routers

Hello,

I have an ASR1000 router with 4 GB of RAM. The specs say I can get '1 million routes' on it, but as far as I have been advised, a full table of internet routes numbers more than 530k by itself, so taking 2 full tables seems to be out of the question (?).

I am looking to connect to a second IP transit provider, and I'm looking for any advice or strategies that would let me take advantage of it and make good forwarding decisions without breaking the bank on BGP memory consumption. I simply don't understand how this would likely play out and what memory consumption mitigation steps may be necessary here. I'm open to ideas... a pair of route reflectors? Selective BGP download? Static route filter maps?

Thank you.

Mike-

You have to keep in mind there are two pools of memory on the router. The
RIB and the FIB. The RIB contains all the routes that the router could
possibly load. This includes all your BGP routes, even the ones not
selected as best, and all your IGP routes. Whereas the FIB would only have
the best routes that the router is actively using to forward traffic.

FIB = 'sh ip route 4.2.2.2'
RIB = 'sh ip bgp 4.2.2.2' (or OSPF, or IS-IS, or RIP, etc, etc)

Generally FIB capacity is given as a number of routes and RIB capacity is
given as a memory amount. Since the router manufacturer doesn't know what
protocols you'll be running or how many attributes you'll be storing they
don't really have a good idea of how many routes you'll get in the memory
the router has.

Now that you have a full table on the router, you won't see much growth on
the FIB. For the most part every route you'll learn is already learned.
You'll have some growth because not every transit provider will advertise
the exact same table. My router with 3 transits has 580k routes for
instance.
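
To make the RIB/FIB distinction concrete, here is a minimal Python sketch. It is a toy model, not how IOS stores anything: the feed contents, the peer names, and the "shortest AS path wins" rule are all assumptions for illustration. It shows that a second feed adds paths to the BGP table while the forwarding table only grows by the prefixes the first feed did not already carry.

```python
from collections import defaultdict

def build_tables(feeds):
    """feeds: {peer_name: {prefix: as_path_length}} (toy data, not real BGP)."""
    rib = defaultdict(list)  # every path learned from every peer
    for peer, routes in feeds.items():
        for prefix, path_len in routes.items():
            rib[prefix].append((path_len, peer))
    # The FIB keeps one best entry per prefix. Shortest AS path is used here
    # as a stand-in; real BGP best-path selection has many more steps.
    fib = {prefix: min(paths)[1] for prefix, paths in rib.items()}
    return rib, fib

# Hypothetical feeds using documentation prefixes.
feeds = {
    "transit_a": {"10.0.0.0/8": 3, "192.0.2.0/24": 2},
    "transit_b": {"10.0.0.0/8": 2, "192.0.2.0/24": 4, "198.51.100.0/24": 1},
}
rib, fib = build_tables(feeds)
path_count = sum(len(paths) for paths in rib.values())
print(path_count, len(fib))  # 5 paths stored, but only 3 forwarding entries
```

The same effect is visible in real output later in this thread: far more path entries than network entries, with the CEF prefix count tracking the network count.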

In short, you're fine. Read up on RIB and FIB though so you have a good
handle on when you're about to start running into problems.

JM

Sounds like you have enough router resources to do your peering and take
2 full feeds.

Mark.

You have to keep in mind there are two pools of memory on the router.

There's actually three.

1. Prefix (path) via BGP: "show ip bgp <prefix>". BGP will select the
'best' BGP path (can be multiple if ECMP) and send that through to the RIB.
2. RIB. "show ip route <prefix>". routing table will show the path chosen
- and if there are backup paths etc, but may be recursive, e.g. prefix
a.b.c.d points at e.f.g.h which in turn points at i.j.k.l etc.
3. FIB. basically fully resolved prefixes.

What you otherwise say is correct - you could have N transit providers at
(1) providing lotsOfPaths x N providers which ultimately resolve to
lotsOfRoutes with up to N next-hops.
Much design effort goes into the routing stack to efficiently store
lotsOfPaths.
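
The step from (2) to (3) can be sketched roughly as follows. This is an illustrative Python model only: the dictionary layout, the interface name, and the `resolve` helper are assumptions, not anything an actual routing stack exposes. It follows recursive next hops in a toy RIB until it reaches a connected interface, which is the "fully resolved" form a FIB entry would hold.

```python
import ipaddress

def longest_match(rib, address):
    """Return the most specific prefix in rib that covers address, or None."""
    addr = ipaddress.ip_address(address)
    covering = [p for p in rib if addr in ipaddress.ip_network(p)]
    if not covering:
        return None
    return max(covering, key=lambda p: ipaddress.ip_network(p).prefixlen)

def resolve(rib, prefix, max_depth=8):
    """Follow recursive next hops until an entry names an outgoing interface."""
    entry = rib[prefix]
    next_hop = None
    for _ in range(max_depth):
        if "interface" in entry:
            return (entry["interface"], next_hop)
        next_hop = entry["next_hop"]
        covering = longest_match(rib, next_hop)
        if covering is None:
            return None  # unresolvable next hop
        entry = rib[covering]
    return None  # gave up: recursion too deep

# Hypothetical RIB: a BGP route points at a loopback, which an IGP route
# points at a directly connected neighbor.
rib = {
    "203.0.113.0/24": {"next_hop": "192.0.2.1"},   # learned via BGP
    "192.0.2.1/32": {"next_hop": "10.0.0.2"},      # learned via IGP
    "10.0.0.0/30": {"interface": "Gi0/0/0"},       # connected
}
fib = {prefix: resolve(rib, prefix) for prefix in rib}
print(fib["203.0.113.0/24"])  # ('Gi0/0/0', '10.0.0.2')
```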

Can't speak for what an ASR1K does but suggest the OP talk to Cisco.

cheers,

lincoln.

Hello.

When we were in a similar situation, we opted for one transit provider to provide a default to us, then we filtered on AS hops so that prefixes more than 3 hops away were denied.
This way we got the ones that were closest to us and that were more likely to matter. Prefixes that are more than 3 hops away on both links could probably just as well go via a default instead.
However, it's a rather crude way of fixing the issue. We just did it to keep the router up while we got extra memory for it. (We had a memory shortage after an update that we needed to apply to correct some bug, I think. We couldn't just roll back the update, if my memory serves me correctly.)
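
The AS-hop filter described above can be sketched in Python as below. This only models the policy decision; on the router it would be an inbound AS-path filter, and the route list, AS numbers (from the documentation range), and function name here are hypothetical. Note that counting raw path length also counts prepends, so a heavily prepended nearby network would be dropped too.

```python
def filter_by_as_hops(routes, max_hops=3):
    """Keep only routes whose AS path is at most max_hops long.

    routes: list of (prefix, as_path) tuples, as_path being a list of ASNs.
    Anything dropped here is expected to follow the default route instead.
    """
    return [(prefix, path) for prefix, path in routes if len(path) <= max_hops]

# Hypothetical routes received from one transit.
routes = [
    ("198.51.100.0/24", [64496, 64497]),
    ("203.0.113.0/24", [64496, 64498, 64499, 64500]),  # 4 hops: denied
    ("192.0.2.0/24", [64501, 64502, 64503]),
]
kept = filter_by_as_hops(routes)
print([prefix for prefix, _ in kept])
```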

//Gustav

Careful with the ASR1000 and full tables at 4GB.

http://www.gossamer-threads.com/lists/cisco/nsp/180710

I recommend adding some third party RAM to get 16GB.

Mike, the ASR1k series has several ESP options (ESP5, 10, 20, 40, 100, 200). Each ESP comes with a fixed amount of forwarding TCAM, which holds the forwarding information base (FIB). The ESP5 has 5MB of TCAM and can hold ~500k routes. The ESP10 has 10MB of TCAM, so theoretically it should hold roughly double (1 million routes). The ESP20 and ESP40 have 40MB of TCAM; Cisco quotes these as 4 million routes.
http://www.cisco.com/c/en/us/products/collateral/routers/asr-1000-series-aggregation-services-routers/datasheet-c78-731640.html

There are two route processor (RP) options for the ASR1k series: the RP1 and the RP2. If you have 4GB, then you have an upgraded RP1, which Cisco quotes at 1,000,000 IPv4 routes *OR* 500k IPv6 routes. Not to get too nitty-gritty, but I would simplify this to say that there are 1,000,000 route slots; each IPv4 route takes 1 slot and each IPv6 route takes 2.
http://www.cisco.com/c/en/us/products/collateral/routers/asr-1000-series-aggregation-services-routers/data_sheet_c78-441072.html
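
The slot simplification above lends itself to a quick back-of-the-envelope check. The sketch below is only arithmetic on that simplified model; the IPv6 route count in the example is a made-up figure for illustration, not a number from this thread.

```python
def slots_used(ipv4_routes, ipv6_routes, ipv6_cost=2):
    """Simplified model: 1 slot per IPv4 route, 2 slots per IPv6 route."""
    return ipv4_routes + ipv6_cost * ipv6_routes

def headroom(total_slots, ipv4_routes, ipv6_routes):
    """Slots left over; a negative value means the table does not fit."""
    return total_slots - slots_used(ipv4_routes, ipv6_routes)

# 1,000,000 slots as quoted for the RP1; 600k IPv4 routes as mentioned in
# the thread; 25,000 IPv6 routes is a hypothetical placeholder.
print(slots_used(600_000, 25_000))           # 650000
print(headroom(1_000_000, 600_000, 25_000))  # 350000
```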

As others have noted, the best routes from the BGP table and other routing tables are condensed into the overall routing information base (RIB) and stored in DRAM. The RIB is then condensed into the FIB and stored in TCAM. Adding additional BGP peers will take minimal amounts of FIB if you're receiving the same full feed from both. The effect on the RIB and overall router memory is usually not that great either (I think on the ASR1k platform it's maybe ~100MB per peer). If you have an ESP10+, you're fine for two or more full feeds. If you have an ESP5, you really don't have the hardware to hold a single full feed.

--Blake

I have used variations of Gustav's solution to good effect as well; this also works with two smaller routers providing basic failover and load balancing. I found it's best to take full + default from one provider and just default from the other. Set a higher local-pref on the default-only provider than on the full+default one, then filter the full+default routes by AS path as desired. Incoming control is via the normal prepending of outgoing advertisements.
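
A rough Python sketch of how forwarding plays out under that setup is below. It is a simplification with assumed names and values: in a real router, BGP first picks one best path per prefix (local-pref being an early tie-breaker) and forwarding then uses the longest matching prefix. The combined sort key here gives the same outcome for this toy case.

```python
import ipaddress

def best_route(routes, destination):
    """Pick the route a packet to destination would follow, or None.

    routes: list of dicts with 'prefix', 'provider' and 'local_pref' keys.
    Longest prefix wins; local-pref only decides between equal prefixes.
    """
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r["prefix"]).prefixlen, r["local_pref"]),
    )

# Hypothetical table: provider A sends (filtered) specifics plus a default,
# provider B sends only a default, which is given the higher local-pref.
routes = [
    {"prefix": "0.0.0.0/0", "provider": "A", "local_pref": 100},
    {"prefix": "0.0.0.0/0", "provider": "B", "local_pref": 200},
    {"prefix": "198.51.100.0/24", "provider": "A", "local_pref": 100},
]
print(best_route(routes, "198.51.100.7")["provider"])  # A: specific route
print(best_route(routes, "203.0.113.9")["provider"])   # B: preferred default
```

Traffic to nearby prefixes follows A's specifics, and everything else falls through to B's default, which is what spreads the outbound load across the two links.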

RIB or FIB for the million - that's the question - but in any event the
following will most likely work for you. BTW, a full table is now over 600K
in size.

1) Choose one transit and take their full table. (Pick for whatever reason:
cost savings, bigger pipe, coin flip, etc.)
2) With the second transit, use a filter to drop everything /22 or
smaller. Now check your tables and see if you have enough room.
3) Next add your peers - no filtering - and lpref those routes above the
transits.
4) Ask both transits to send you a default route.
4) Ask both transits to send you a default route.

If this doesn't fit, use some more policy filtering and while this is up
and running begin the search for a router with larger tables to replace
it...as the tables will soon grow larger.
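
Step 2 above can be modelled in a few lines of Python. This is a sketch of the policy only; on the router it would be an inbound prefix-list. It assumes "/22 or smaller" means smaller address blocks, i.e. prefix lengths of 22 and longer (/22, /23, /24, ...). That reading is an assumption, and the sample prefixes are hypothetical.

```python
import ipaddress

def drop_small_prefixes(prefixes, drop_from_len=22):
    """Keep only prefixes shorter than drop_from_len (i.e. larger blocks)."""
    return [
        p for p in prefixes
        if ipaddress.ip_network(p).prefixlen < drop_from_len
    ]

# Hypothetical routes offered by the second transit.
offered = [
    "10.0.0.0/8",
    "100.64.0.0/10",
    "172.16.0.0/22",    # dropped
    "198.51.100.0/24",  # dropped
]
accepted = drop_small_prefixes(offered)
print(accepted)  # ['10.0.0.0/8', '100.64.0.0/10']
```

Traffic for the dropped prefixes still has a path: it follows the full table from the first transit or, failing that, a default route from step 4.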

Thank You
Bob Evans
CTO

Mike,

I just did this with an ASR1001. I had to upgrade it to 8GB of RAM (I got the real Cisco stuff for ~$500). Before that, the router would crash when loading the tables.

Right now, I have full tables from two providers:

router1#show ip bgp summary
BGP router identifier 192.55.82.2, local AS number 4505
BGP table version is 11150622, main routing table version 11150622
582461 network entries using 144450328 bytes of memory
911730 path entries using 109407600 bytes of memory
148924/93298 BGP path/bestpath attribute entries using 36933152 bytes of memory
132977 BGP AS-PATH entries using 6043938 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 296835018 total bytes of memory
BGP activity 962568/380103 prefixes, 5155645/4243915 paths, scan interval 60 secs

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.55.82.3 4 4505 2532914 1634867 11150622 0 0 3w0d 330377
192.55.82.4 4 4505 672950 1634865 11150622 0 0 3w0d 1
209.117.103.33 4 2828 1837130 48052 11150557 0 0 2w1d 581351

router1#show ip cef summary
IPv4 CEF is enabled for distributed and running
VRF Default
582527 prefixes (582527/0 fwd/non-fwd)
Table id 0x0
Database epoch: 2 (582527 entries at this epoch)

-Eric

But if I'm reading the above right, it looks like bgp is eating ~300mb on your box.

BGP using 296835018 total bytes of memory

You would seem to have plenty of free ram. In my case, the ASR1002 doesn't have upgradable memory anyways so I'm stuck.
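
For anyone who wants to pull those numbers out of the summary mechanically, here is a small Python sketch. The three input lines are copied from Eric's output above; the regular expressions and function name are my own and only handle these line shapes, so treat it as a starting point rather than a general parser.

```python
import re

# Lines copied from the "show ip bgp summary" output earlier in the thread.
SUMMARY = """\
582461 network entries using 144450328 bytes of memory
911730 path entries using 109407600 bytes of memory
BGP using 296835018 total bytes of memory
"""

def parse_bgp_memory(text):
    """Return entry counts, byte totals and the overall BGP memory figure."""
    stats = {}
    pattern = r"^(\d+) (network|path) entries using (\d+) bytes"
    for count, kind, nbytes in re.findall(pattern, text, re.M):
        stats[kind] = (int(count), int(nbytes))
    total = re.search(r"BGP using (\d+) total bytes", text)
    stats["total_bytes"] = int(total.group(1))
    return stats

stats = parse_bgp_memory(SUMMARY)
for kind in ("network", "path"):
    count, nbytes = stats[kind]
    print(kind, nbytes // count, "bytes per entry")
print(round(stats["total_bytes"] / 1e6), "MB total")
```

On this output it works out to 248 bytes per network entry and 120 bytes per path entry, for roughly 297 MB in total.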

Mike-

It will be fine with 4GB of RAM provided the OP does not enable software
redundancy.

Mark.

Mike, I have a customer that has not had any operational issues taking two full IPv4 feeds on an ASR1002 with RP1 @ 4GB of RAM. Again, I'd recommend an ESP10 to ensure you have enough TCAM to hold the FIB and to track netflow or other data that relies on the ESP. I can't comment on the RP2's memory usage or stability as I don't think I've run into them.

Here's the output to show you what to expect:

#sh processes memory | inc BGP|Total:|PID
Processor Pool Total: 1725514176 Used: 918478524 Free: 807035652
  lsmpi_io Pool Total: 6295088 Used: 6294116 Free: 972
  PID TTY Allocated Freed Holding Getbufs Retbufs Process
  114 0 252 0 23264 4 4 BGP Scheduler
  282 0 165948 327892 189212 0 0 BGP Task
  294 0 0 0 17264 0 0 BGP HA SSO
  317 0 610260780 0 220080 3387715 3387715 BGP I/O
  355 0 0 17841472 23264 0 0 BGP Scanner
  405 0 0 0 23316 0 0 BGP Consistency
  422 0 0 0 23264 0 0 BGP Event
  528 0 560439808 646907016 498109200 51 51 BGP Router
  533 0 0 0 17264 0 0 BGP VA

#sh bgp su
BGP router identifier 1.2.3.4, local AS number ccccc
BGP table version is 66249122, main routing table version 66249122
598594 network entries using 86197536 bytes of memory
1177526 path entries using 94202080 bytes of memory
208207/100401 BGP path/bestpath attribute entries using 28316152 bytes of memory
179429 BGP AS-PATH entries using 7090834 bytes of memory
4342 BGP community entries using 230370 bytes of memory
1 BGP extended community entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 216036996 total bytes of memory
BGP activity 2943907/2345309 prefixes, 10963622/9786096 paths, scan interval 60 secs

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
a.a.a.a 4 aaa 5523968 51273 66249070 0 0 4w4d 578933
b.b.b.b 4 bbbbb 3731440 101400 66249070 0 0 4w4d 598586

#sh cef fib
599851 allocated IPv4 entries, 0 failed allocations
1 allocated IPv6 entry, 0 failed allocations

#sh ip route summary
IP routing table name is default (0x0)
IP routing table maximum-paths is 32
Route Source Networks Subnets Replicates Overhead Memory (bytes)
connected 0 60 0 3600 10800
static 20 79 0 6060 17820
bgp ccccc 180614 417599 0 35892780 107678340
   External: 598213 Internal: 0 Local: 0
internal 6552 23993840
Total 187186 417738 0 35902440 131700800

Hi Mike,

There is no "routing table." Instead:

Routing Information Bases (RIB)
Forwarding Information Base (FIB)

RIB = the routing data from your neighbors. Each protocol (e.g. BGP,
OSPF, static, connected) has a RIB. If you have virtual routers (VRF)
then each VRF has its own RIBs. For the BGP RIB, each prefix (route)
from each neighbor will need its own slot in the RIB.

FIB = the next hop table. Only the best next hop for each prefix is
stored in the FIB. One FIB per protocol (IPv4, IPv6), unless you run
VRFs, then one FIB each per VRF.

All routers have both a FIB and RIBs. All routers keep the RIBs in
DRAM. Big Iron like yours typically store the FIB in special hardware
called Ternary Content Addressable Memory (TCAM). You're looking at
two physically different kinds of memory in your router to store the
RIB and the FIB. No matter how much DRAM you have, you only have so
much TCAM in that routing engine. The TCAM is not upgradable. In fact,
it's buried under a big heat sink.

All the RIBs are processed down to a single FIB for each protocol/VRF.
The best next hop is selected from all the possibilities in the RIBs
and only that one gets stored in the FIB. The FIB in the TCAM is
consulted every single time a packet is routed. The RIBs are only
consulted when new information is received from a neighbor or when the
FIB table needs to be rebuilt. The RIB is not consulted to route
individual packets.

Your "million routes" is a reflection of the TCAM on your routing
engine. The FIB table size. 4G DRAM is enough for 10-15 million routes
in the various RIBs. If you're running a single VRF, you have enough
FIB headroom for the next two or three years, until the v4 and v6 BGP
tables add up to around 900k prefixes. You already have too little FIB
headroom to run two BGP-speaking VRFs.

My piddly little 2811 carries a full IPv4 table, 580k routes. Its BGP
RIB consumes all of 154 megabytes of DRAM. Unlike your router, the
2811 does not have a "hardware fast path." No TCAM. It stores the FIB
in a radix tree in DRAM instead. As a result, it can only handle low
data rates, in the 10s of megabits per second.

I'm leaving out lots of details, but these are the most important with
respect to your question.

Regards,
Bill Herrin

I just did this with a ASR1001. I had to upgrade it to 8gb of ram
(I got the real Cisco stuff for ~ $500). Before the router would
crash when loading the tables.

Hi Eric,

Something very fishy there because:

router1#show ip bgp summary
BGP using 296,835,018 total bytes of memory

Commas added for clarity.

Regards,
Bill Herrin

Hello all.

Yes, I can confirm that we also had the issue with the ASR1001s.
For us the router was fine until we upgraded it. When we rebooted it after the upgrade, it ran out of memory when populating 2 full feeds.
When we contacted TAC, they confirmed that it was indeed a memory problem and that we would need to add more memory to the box.
Perhaps the 1002 isn't as thirsty?

//Gustav


Hi Gustav,

IMO, you should not accept that answer from the TAC. An IOS release
that crashes with two 600k BGP feeds in 4 gigs of RAM is badly
defective.

Regards,
Bill Herrin

William Herrin wrote:

IMO, you should not accept that answer from the TAC. An IOS release
that crashes with two 600k BGP feeds in 4 gigs of RAM is badly
defective.

I suspect the time the OP would spend raging down the phone would be
better spent sourcing a third party memory upgrade to 8G or 16G. The
upgrade would certainly be the cheaper option of the two, in addition to
being the only option with a useful outcome.

Nick

Not necessarily.

In essence, your physical memory gets halved after the
router boots up, then it may be further halved if you’re
using features like SSO. So, with a 4GB RAM config and with
SSO running, you may be left with around 600-650MB free after
boot with IOS-XE loaded, and then all the features kick
in, including your BGP feeds, which need around 300MB of memory
just to store the tables; then there’s the CEF RAM representation,
and so on.
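
The arithmetic in that paragraph can be written out as a short Python sketch. It only restates the figures given above (halving, halving again with SSO, 600-650MB free after boot, about 300MB for BGP tables); the function names are mine, and the real split depends on platform and release.

```python
def control_plane_share_mb(physical_mb, sso=False):
    """Memory left for the routing process after the halving described above."""
    share = physical_mb / 2
    if sso:
        share /= 2  # software redundancy splits it again
    return share

def left_after_bgp_mb(free_after_boot_mb, bgp_tables_mb=300):
    """What remains for CEF and everything else once BGP tables are loaded."""
    return free_after_boot_mb - bgp_tables_mb

print(control_plane_share_mb(4096, sso=True))      # 1024.0
print([left_after_bgp_mb(f) for f in (600, 650)])  # [300, 350]
```

So on a 4GB box with SSO, only about 300-350MB would remain after the BGP tables alone, before the CEF representation and other features take their share.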

Here’s a good WP w/r to memory usage & architecture on ASR 1k:
http://www.cisco.com/c/en/us/support/docs/routers/asr-1000-series-aggregation-services-routers/116777-technote-product-00.html

It actually contains the same recommendation given by TAC -
with recent/current code if you want to run full tables with
BGP, get 8GB of RAM on ASR 1k. In the 3.10-3.12S era I believe
it was still possible to fit (without the SSO) full tables
in RAM and be fine.

As Nick just responded, it’s faster to source the RAM or modify
the config to cut down on number of BGP prefixes rather than
ping back and forth here discussing all the possibilities.

Hi Łukasz,

You make some great points and that's an excellent document.

As Nick just responded, it’s faster to source the RAM or modify
the config to cut down on number of BGP prefixes rather than
ping back and forth here discussing all the possibilities.

I respectfully disagree. Sourcing more ram won't fix the next bit of
sloppiness with the software. Or the one after that. Once the manager
of that team starts to accept poor code quality, the only thing with a
chance of fixing it is strong customer push-back.

And it is poor code quality. Even slicing and dicing the ram in odd
ways, there's just no excuse for an order-of-magnitude increase in ram
required to run the same algorithms on the same data.

Regards,
Bill Herrin