BGP convergence problem

Hi,

This morning there was an ethernet loop problem on DECIX, causing many
BGP sessions to flap throughout the entire platform.
While this can happen, I am myself facing with BGP convergence
problems on our DECIX router (SUP720-3BXL with IOS SXI3).

De DECIX loop has been solved two hours ago, but my BGP sessions are
still flapping and not converging at all. This has been flooding our
logs, and is still going on:

Jun 8 11:47:03 x.x.x.131 239447: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.32 Up
Jun 8 11:47:03 x.x.x.131 239448: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.231 Up
Jun 8 11:47:03 x.x.x.131 239449: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.109 Up
Jun 8 11:47:03 x.x.x.131 239450: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.50 Up
Jun 8 11:47:03 x.x.x.131 239451: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.81 Up
Jun 8 11:47:03 x.x.x.131 239452: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.28 Up
Jun 8 11:47:03 x.x.x.131 239453: Jun 8 11:48:38.364 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.212 Up
Jun 8 11:47:03 x.x.x.131 239454: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.147 Up
Jun 8 11:47:03 x.x.x.131 239455: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.74 Up
Jun 8 11:47:03 x.x.x.131 239456: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.241 Up
Jun 8 11:47:03 x.x.x.131 239457: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.5 Up
Jun 8 11:47:03 x.x.x.131 239458: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.40 Up
Jun 8 11:47:03 x.x.x.131 239459: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::1A44:0:1 Up
Jun 8 11:47:03 x.x.x.131 239460: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::8605:0:1 Up
Jun 8 11:47:03 x.x.x.131 239461: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::1A0B:0:1 Up
Jun 8 11:47:03 x.x.x.131 239462: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::3029:0:1 Up
Jun 8 11:47:03 x.x.x.131 239463: Jun 8 11:48:38.368 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::6E4:0:1 Up
Jun 8 11:47:03 x.x.x.131 239464: Jun 8 11:48:38.372 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::CB0:0:1 Up
Jun 8 11:47:03 x.x.x.131 239465: Jun 8 11:48:38.372 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::21C8:0:1 Up
Jun 8 11:47:03 x.x.x.131 239466: Jun 8 11:48:38.372 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::8463:0:2 Up
Jun 8 11:47:04 x.x.x.131 239467: Jun 8 11:48:38.372 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::31AA:0:1 Up
Jun 8 11:47:04 x.x.x.131 239468: Jun 8 11:48:38.372 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.29 Up
Jun 8 11:47:04 x.x.x.131 239469: Jun 8 11:48:38.372 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::62BF:0:1 Up
Jun 8 11:47:04 x.x.x.131 239470: Jun 8 11:48:39.656 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.101 Down BGP Notification sent
Jun 8 11:47:04 x.x.x.131 239471: Jun 8 11:48:39.656 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.101 4/0 (hold time
expired) 0 bytes
Jun 8 11:47:07 x.x.x.131 239472: Jun 8 11:48:41.696 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.104 Up
Jun 8 11:47:10 x.x.x.131 239473: Jun 8 11:48:44.488 CEST:
%BGP-3-BGP_NO_REMOTE_READ: 80.81.193.187 connection timed out - has
not accepted a message from us for 20000ms (hold time), 1 messages
pending transmition.
Jun 8 11:47:10 x.x.x.131 239474: Jun 8 11:48:44.488 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.187 Down BGP Notification sent
Jun 8 11:47:10 x.x.x.131 239475: Jun 8 11:48:44.488 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.193.187 4/0 (hold time
expired) 0 bytes
Jun 8 11:47:10 x.x.x.131 239476: Jun 8 11:48:44.900 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.61 Up
Jun 8 11:47:10 x.x.x.131 239477: Jun 8 11:48:44.900 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.149 Up
Jun 8 11:47:10 x.x.x.131 239478: Jun 8 11:48:44.900 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.136 Up
Jun 8 11:47:10 x.x.x.131 239479: Jun 8 11:48:44.904 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::8463:0:1 Up
Jun 8 11:47:10 x.x.x.131 239480: Jun 8 11:48:46.352 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::6268:0:1 Up
Jun 8 11:47:14 x.x.x.131 239481: Jun 8 11:48:48.084 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.78 Up
Jun 8 11:47:14 x.x.x.131 239482: Jun 8 11:48:49.172 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.239 Up
Jun 8 11:47:14 x.x.x.131 239483: Jun 8 11:48:49.172 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.24 Up
Jun 8 11:47:17 x.x.x.131 239484: Jun 8 11:48:52.160 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.45 Up
Jun 8 11:47:17 x.x.x.131 239485: Jun 8 11:48:52.160 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.108 Up
Jun 8 11:47:17 x.x.x.131 239486: Jun 8 11:48:52.160 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.164 Up
Jun 8 11:47:17 x.x.x.131 239487: Jun 8 11:48:52.164 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.49 Up
Jun 8 11:47:17 x.x.x.131 239488: Jun 8 11:48:52.164 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.139 Up
Jun 8 11:47:17 x.x.x.131 239489: Jun 8 11:48:52.164 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::1536:0:1 Up
Jun 8 11:47:17 x.x.x.131 239490: Jun 8 11:48:52.164 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::8601:0:1 Up
Jun 8 11:47:17 x.x.x.131 239491: Jun 8 11:48:53.788 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.45 Up
Jun 8 11:47:17 x.x.x.131 239492: Jun 8 11:48:53.788 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::A2DC:0:1 Up
Jun 8 11:47:21 x.x.x.131 239493: Jun 8 11:48:55.056 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.91 Down BGP Notification sent

Jun 8 11:49:04 x.x.x.131 239583: Jun 8 11:50:37.684 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.14 Down Peer closed the session
Jun 8 11:49:04 x.x.x.131 239584: Jun 8 11:50:38.656 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.120 Down BGP Notification sent
Jun 8 11:49:04 x.x.x.131 239585: Jun 8 11:50:38.656 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.120 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:04 x.x.x.131 239586: Jun 8 11:50:38.656 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.229 Down BGP Notification sent
Jun 8 11:49:04 x.x.x.131 239587: Jun 8 11:50:38.656 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.229 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:04 x.x.x.131 239588: Jun 8 11:50:38.656 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.108 Down BGP Notification sent
Jun 8 11:49:04 x.x.x.131 239589: Jun 8 11:50:38.656 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.108 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:07 x.x.x.131 239590: Jun 8 11:50:41.944 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.73 Down BGP Notification sent
Jun 8 11:49:07 x.x.x.131 239591: Jun 8 11:50:41.944 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.194.73 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:07 x.x.x.131 239592: Jun 8 11:50:41.944 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::20AD:0:2 Down BGP Notification
sent
Jun 8 11:49:07 x.x.x.131 239593: Jun 8 11:50:41.944 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 2001:7F8::20AD:0:2 4/0 (hold
time expired) 0 bytes
Jun 8 11:49:07 x.x.x.131 239594: Jun 8 11:50:41.944 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.115 Down BGP Notification sent
Jun 8 11:49:07 x.x.x.131 239595: Jun 8 11:50:41.944 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.193.115 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:07 x.x.x.131 239596: Jun 8 11:50:44.124 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.3 Down BGP Notification sent
Jun 8 11:49:11 x.x.x.131 239597: Jun 8 11:50:44.124 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.194.3 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:11 x.x.x.131 239598: Jun 8 11:50:45.200 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.215 Down BGP Notification sent
Jun 8 11:49:11 x.x.x.131 239599: Jun 8 11:50:45.200 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.215 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:11 x.x.x.131 239600: Jun 8 11:50:47.336 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.141 Down BGP Notification sent
Jun 8 11:49:11 x.x.x.131 239601: Jun 8 11:50:47.336 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.141 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:14 x.x.x.131 239602: Jun 8 11:50:48.432 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::3B41:0:1 Down BGP Notification
sent
Jun 8 11:49:14 x.x.x.131 239603: Jun 8 11:50:48.432 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 2001:7F8::3B41:0:1 4/0 (hold
time expired) 0 bytes
Jun 8 11:49:14 x.x.x.131 239604: Jun 8 11:50:49.720 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.239 Down BGP Notification sent
Jun 8 11:49:14 x.x.x.131 239605: Jun 8 11:50:49.720 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.239 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:17 x.x.x.131 239606: Jun 8 11:50:50.976 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:2000:3080:B4::1 Down Peer closed the
session
Jun 8 11:49:17 x.x.x.131 239607: Jun 8 11:50:52.976 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.20 Down BGP Notification sent
Jun 8 11:49:17 x.x.x.131 239608: Jun 8 11:50:52.976 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.194.20 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:17 x.x.x.131 239609: Jun 8 11:50:54.044 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.193.21 Down BGP Notification sent
Jun 8 11:49:17 x.x.x.131 239610: Jun 8 11:50:54.044 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.193.21 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:20 x.x.x.131 239611: Jun 8 11:50:56.204 CEST:
%BGP-5-ADJCHANGE: neighbor 2001:7F8::1A0B:0:1 Down BGP Notification
sent
Jun 8 11:49:20 x.x.x.131 239612: Jun 8 11:50:56.204 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 2001:7F8::1A0B:0:1 4/0 (hold
time expired) 0 bytes
Jun 8 11:49:23 x.x.x.131 239613: Jun 8 11:50:58.400 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.63 Down BGP Notification sent
Jun 8 11:49:23 x.x.x.131 239614: Jun 8 11:50:58.400 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.194.63 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:23 x.x.x.131 239615: Jun 8 11:50:59.448 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.97 Down BGP Notification sent
Jun 8 11:49:23 x.x.x.131 239616: Jun 8 11:50:59.448 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.97 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:27 x.x.x.131 239617: Jun 8 11:51:01.664 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.131 Down BGP Notification sent
Jun 8 11:49:27 x.x.x.131 239618: Jun 8 11:51:01.664 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.131 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:27 x.x.x.131 239619: Jun 8 11:51:03.872 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.83 Down BGP Notification sent
Jun 8 11:49:27 x.x.x.131 239620: Jun 8 11:51:03.872 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.194.83 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:27 x.x.x.131 239621: Jun 8 11:51:03.872 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.192.156 Down BGP Notification sent
Jun 8 11:49:30 x.x.x.131 239622: Jun 8 11:51:03.872 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.192.156 4/0 (hold time
expired) 0 bytes
Jun 8 11:49:30 x.x.x.131 239623: Jun 8 11:51:06.056 CEST:
%BGP-5-ADJCHANGE: neighbor 80.81.194.50 Down BGP Notification sent
Jun 8 11:49:30 x.x.x.131 239624: Jun 8 11:51:06.056 CEST:
%BGP-3-NOTIFICATION: sent to neighbor 80.81.194.50 4/0 (hold time
expired) 0 bytes

CPU load is constantly at 100% doing BGP and more BGP.

We have around 200 BGP sessions on DECIX and I would not want to shut
them down and bring them up individually.

How can I get out of this deadlock?

Andy

Dear Andy

This morning there was an ethernet loop problem on DECIX, causing many
BGP sessions to flap throughout the entire platform.
While this can happen, I am myself facing with BGP convergence
problems on our DECIX router (SUP720-3BXL with IOS SXI3).

De DECIX loop has been solved two hours ago, but my BGP sessions are
still flapping and not converging at all. This has been flooding our
logs, and is still going on:

route half or more of the peering-network to Null -> lowering bgp session up's.
(at the other side, your bgp-router seems to be overloaded).

Kind regards,
   Ingo Flaschberger

I finally decided to shut down all peerings and brought them back one by one.

Everything is stable again, but I don't like the way I had to deal
with it since it will most likely happen again when DECIX or an other
IX we're at is having issues.

I've seen a few BGP convergence discussions on NANOG, but none about
deadlock situations and what could be done to avoid them. Setting
higher MTU or bigger hold queues did not help.

- Andy

The Cisco 7600 and 6500 platforms are getting fairly old and have underpowered cpus these days.

Starting in SXH the control plane did not scale quite as well as in SXF. This got better in SXI, but is not back on par with SXF performance yet.

I mostly attribute this to a combination of bloat in software and routing tables. I would start to look for a replacement sooner rather than later.

- Jared

Some people have found that upgrading to an alternate router vendor
helps. ^_^;

Fundamentally, the CPU on your router is underpowered for the amount
of state information that needs to be updated in the time window of the
hold timers. If you can't move to a faster/more efficient platform, then
you may need to negotiate raising the keepalive interval and corresponding
hold timers with your neighbors, to give your router time to finish processing
updates.

Alternately, if you aren't in a position to be able to upgrade platforms, but
have spare routers around, connecting a second router up to the exchange
and splitting your neighbors up among two links into the exchange would
reduce the load on each router during reconvergence, and buy you time
until you can move to a more capable platform.

Matt

Hi Andy,

  We have had similar problems with s720/3bxl on exchanges with large
numbers of peers. Exact same symptoms, can be triggered by any
significant UPDATE flux, even iBGP originated path-hunts. This problem
is compounded if you are taking full tables on the same device, to the
extent that the bgp scanner and bgp IO processes grind the
control-plane to halt causing ISIS/OSPF adjacencies to drop, SNMP and
SSH unresponsive, etc. Same behavior is seen regardless of IOS train.
As others have pointed out, the sad fact of the matter is that the
s720/3BXL simply does not have the CPU power to cope with hundreds of
neighbor sessions and the growing numbers of paths. Here are some
things that we tried with varied success to remedy bgp deadlock on
this platform:

* lower process-max-time to prevent bgp scanner/bgp io processes from
completely consuming the control-plane
* Take soft-reconfiguration off of neighbors/peer-groups where you
can, this will help tremendously
* Split the load of neighbor sessions between multiple devices, move
full table feeds to other devices

The 'final solution' is to simply replace this platform with a newer
more powerful alternative, and there are numerous candidates :slight_smile:

Best Regards,
Kevin Hodle

Place blame where blame is due, the cpu may be slow, but the crappy ios
scheduler is the real problem here. We saw a huge reduction in the
number of self-sustaining protocols timeouts cycles on these boxes
(where the process of trying to bring up a new neighbor and converge
routing uses so much cpu that it causes other neighbors to time out,
resulting in a never-ending cycle of fail until you shut down everything
and bring them up one neighbor at a time) with the move from SXF to the
SR branches. We never really went down the SXH/SXI road, but I'd have
assumed they would have introduced the same improvements there too. I
guess you know what they say about assuming. :slight_smile:

Try the usual suspects:

* Configure "process-max-time 20" at the top level, this improves
interactivity by making the scheduler switch processes more often.

* Make sure you don't have an overly aggressive control-plane policer.
In my experience the COPP rate-limits are quite harsh, and if you end up
bumping against them you don't get a graceful slowing of the exchange of
routes, you get protocol timeouts.

* Make sure you don't have any stupid mls rate-limits, such as cef
receive. I don't know why anyone would ever want to configure this, all
it does is make your box fall over faster (as if these things need any
help) by rate-limiting all traffic to the msfc.

* You might want to try something like "scheduler allocate 400 4000",
which gives the vast majority of the cpu time to the control plane
rather than process switching on the data plane (which in theory
shouldn't happen on an entirely hw forwarded box like 6500/7600, though
of course we all know that isn't true :P).

Oh and also the OP should take this to the cisco-nsp mailing list, where
all the good bitching about broken Crisco routers takes place. :slight_smile:

The Cisco 7600 and 6500 platforms are getting fairly old and have
underpowered cpus these days.

the hamsters in them were never well fed, ever. though i have never run
one, too yucchhy, i have measured receiving a research feed from one.
over ten minutes for a full table while a router takes two.

some researcher into archeology might try to measure if is just a sick
tcp or if it is closer to rib-out.

randy

* globichen@gmail.com (Andy B.) [Tue 08 Jun 2010, 16:28 CEST]:

I finally decided to shut down all peerings and brought them back one by one.

Sadly that's often the way it has to be done, modulo mild tweaks.

Everything is stable again, but I don't like the way I had to deal with it since it will most likely happen again when DECIX or an other IX we're at is having issues.

As others have said upthread in more polite wordings, get a better router if yours can't handle the load. (Or use the route servers more - it's what they're there for.)

I've seen a few BGP convergence discussions on NANOG, but none about deadlock situations and what could be done to avoid them. Setting higher MTU or bigger hold queues did not help.

I hope you didn't change the MTU to anything different from what everybody else on the DE-CIX Peering LAN uses - that only leads to suffering.

  -- Niels.