Global Crossing Contact / BGP and SONET interaction question

Two somewhat intertwined questions. I'll ask the second part first.

I buy transit from Global Crossing and another carrier on HDLC-encapsulated DS3s.

Recently my BGP session has started flapping on the GX circuit... It looks something like this:

Jul 21 21:17:43.731 UTC: %BGP-3-NOTIFICATION: received from neighbor 67.17.168.73 6/6 (cease) 0 bytes
Jul 21 21:17:43.731 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down BGP Notification received
Jul 21 21:18:25.439 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
Jul 21 21:29:52.315 UTC: %BGP-3-NOTIFICATION: received from neighbor 67.17.168.73 6/6 (cease) 0 bytes
Jul 21 21:29:52.315 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down BGP Notification received
Jul 21 21:30:38.511 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
Jul 21 21:31:34.411 UTC: %BGP-3-NOTIFICATION: received from neighbor 67.17.168.73 6/6 (cease) 0 bytes
Jul 21 21:31:34.411 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down BGP Notification received
Jul 21 21:32:20.535 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
Jul 21 21:32:52.547 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down Peer closed the session
Jul 21 21:33:32.703 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
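To put numbers on the pattern in the log above, here is a minimal Python sketch that measures how long the session stayed down or up between consecutive ADJCHANGE lines. The helper names (`parse_events`, `intervals`) are made up for illustration, and the year is an assumption, since the log lines carry none.

```python
import re
from datetime import datetime

# Log excerpt copied from the message above.
LOG = """\
Jul 21 21:17:43.731 UTC: %BGP-3-NOTIFICATION: received from neighbor 67.17.168.73 6/6 (cease) 0 bytes
Jul 21 21:17:43.731 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down BGP Notification received
Jul 21 21:18:25.439 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
Jul 21 21:29:52.315 UTC: %BGP-3-NOTIFICATION: received from neighbor 67.17.168.73 6/6 (cease) 0 bytes
Jul 21 21:29:52.315 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down BGP Notification received
Jul 21 21:30:38.511 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
Jul 21 21:31:34.411 UTC: %BGP-3-NOTIFICATION: received from neighbor 67.17.168.73 6/6 (cease) 0 bytes
Jul 21 21:31:34.411 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down BGP Notification received
Jul 21 21:32:20.535 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
Jul 21 21:32:52.547 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Down Peer closed the session
Jul 21 21:33:32.703 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up
"""

# Only adjacency changes are matched; NOTIFICATION lines are skipped.
PATTERN = re.compile(
    r"^(\w{3} +\d+ [\d:.]+) UTC: %BGP-5-ADJCHANGE: neighbor (\S+) (Up|Down)"
)


def parse_events(text, year=2006):
    """Return (timestamp, neighbor, state) tuples. The year is an assumption."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S.%f")
            events.append((ts, m.group(2), m.group(3)))
    return events


def intervals(events):
    """Return (state, seconds) for the time spent in each state before the next change."""
    out = []
    for (t0, _, s0), (t1, _, _) in zip(events, events[1:]):
        state = "down" if s0 == "Down" else "up"
        out.append((state, round((t1 - t0).total_seconds(), 3)))
    return out


for state, secs in intervals(parse_events(LOG)):
    print(f"{state:>4} for {secs:8.3f} s")
```

On this excerpt the down periods come out at roughly 40 to 46 seconds each, and the up periods shrink from about 11 minutes to under a minute.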

There are no other log entries during the periods when this occurs. Unfortunately, this causes enough prefix flaps that any prefixes which are preferred through GX are damped for about half an hour by certain providers as my BGP routes get added and withdrawn through the GX link.

GX claims (although I'm not sure they really know) that these are caused by SONET ring switches. I can believe this, since I haven't seen any real circuit flaps, and my understanding is that a SONET switch should generally be fast enough that you normally won't see the transition other than perhaps an error counter or two cranking up. However, it seems strange that I'm getting a 6/6 (cease) notification which I read as "configuration change" from their router. GX also seems to be at a loss to explain why my BGP is flapping - other than to point at the SONET switches.
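For reference, the "6/6" in the log is the NOTIFICATION error code and subcode. A small Python lookup of the Cease subcodes as defined in RFC 4486 is sketched below; the function name `describe_notification` is hypothetical, and the table says nothing about which internal event makes a particular router send subcode 6.

```python
# Cease is NOTIFICATION error code 6; subcodes as listed in RFC 4486.
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}


def describe_notification(code, subcode):
    """Return a readable label for a code/subcode pair (Cease only)."""
    if code != 6:
        return f"error code {code}, subcode {subcode} (not a Cease)"
    name = CEASE_SUBCODES.get(subcode, "unassigned or unknown subcode")
    return f"Cease: {name}"


print(describe_notification(6, 6))  # prints "Cease: Other Configuration Change"
```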

I guess I'm trying to find out if someone on the list recognizes what this might be so I can perhaps help GX find and fix this. I'm also kinda curious as to whether a SONET ring switch event would typically propagate into a router in such a way that BGP would try to shut down the BGP sessions. I'm just having a hard time visualizing how a supposedly below-layer-two switch would cause BGP to reset in this manner. Not being a SONET expert by any stretch of the imagination leaves me with some holes here, but I thought the whole goal of SONET when used to provide DS3 circuits was to hide the ring switches as much as possible from the DS3 circuits. I realize that framing may be hard to preserve on a ring switch, which would cause a momentary loss of sync or similar, but that usually shows up as an error instead of an interface flap.

And finally, does anyone have a contact within GX with a clue? So far I'm not sure I've talked to anyone who knows anything but how to spell BGP. I'd really like to talk to someone about the real cause of these flaps and try to resolve them so they don't reoccur.

-forrest

Forrest:

<snip>

Recently my BGP session has started flapping on the GX circuit... It
looks something like this:

Jul 21 21:33:32.703 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up

There are no other log entries during the periods when this occurs.
Unfortunately this causes enough prefix flaps that any prefixes which
are preferred through GX are damped for like a half hour by certain
providers as my BGP routes get added/withdrawn through the GX link.

<snip>

I don't have an answer to the root cause of your problem, and I'm not
looking for a discussion on route dampening (there are enough debates on
this issue to make your head spin), but may I suggest you raise your hold
timers to prevent your BGP sessions from going down on short disturbances as
these?

Randy

Randy Epstein wrote:

I don't have an answer to the root cause of your problem, and I'm not looking for a discussion on route dampening (there are enough debates on this issue to make your head spin), but may I suggest you raise your hold timers to prevent your BGP sessions from going down on short disturbances as these?

From what I can tell the disturbances are less than a second in duration. It doesn't appear that this is a hold-timer issue, although I would like GX to set it at something higher than 90 seconds (mine is already at a higher value, but the lower value wins during negotiation). I really suspect that either a) GX has some semi-weird configuration where the SONET ring switching from the normal to the protect path and back causes BGP to reset on the border router I'm attached to, or b) there is a separate issue which is causing BGP to flap. Or, of course, something else completely different.
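The "lower value wins" rule can be sketched in a few lines of Python. Per RFC 4271 each side proposes a hold time in its OPEN message and the session uses the smaller of the two; a value of 0 disables the timers and anything else must be at least 3 seconds. The 180-second local value below is purely an assumption for illustration (the message only says it is higher than 90), and the one-third keepalive ratio is the value RFC 4271 suggests as reasonable, not a guarantee of what any given router does.

```python
def negotiated_hold_time(local, peer):
    """Hold time in seconds that both sides end up using (RFC 4271)."""
    for value in (local, peer):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local, peer)


def keepalive_interval(hold_time):
    """Suggested keepalive spacing: one third of the hold time (0 means disabled)."""
    return hold_time / 3


# Hypothetical local setting of 180 s against the 90 s mentioned above.
hold = negotiated_hold_time(local=180, peer=90)
print(hold, keepalive_interval(hold))  # prints "90 30.0"
```

With these numbers a disturbance of under a second is far too short to expire the hold timer, which fits the suspicion that something else is tearing the session down.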

Unfortunately, I haven't been able to figure out how to talk to anyone at GX who actually has access to the routers and knows anything about BGP.

-forrest

* repstein@chello.at (Randy Epstein) [Wed 26 Jul 2006, 07:44 CEST]:

Recently my BGP session has started flapping on the GX circuit... It looks something like this:

Jul 21 21:33:32.703 UTC: %BGP-5-ADJCHANGE: neighbor 67.17.168.73 Up

There are no other log entries during the periods when this occurs. Unfortunately this causes enough prefix flaps that any prefixes which are preferred through GX are damped for like a half hour by certain providers as my BGP routes get added/withdrawn through the GX link.

I don't have an answer to the root cause of your problem, and I'm not looking for a discussion on route dampening (there are enough debates on this issue to make your head spin), but may I suggest you raise your hold timers to prevent your BGP sessions from going down on short disturbances as these?

Wrong error condition - hold timer isn't triggered when the interface for a directly connected neighbor goes down.

You'll want Global Crossing to configure an interface hold-time on their Juniper or a carrier-delay on their Cisco router. Or configure "no bgp fast-external-fallover", but that has more side effects.

  -- Niels.

The timing of protection switching on a SONET ring is of completely the wrong order to upset a BGP session. From memory there's a designed-in upper bound of 200 ms from fault to fully restored, with typical values being more like 50 ms.

One possibility that occurs to me is that the A end here might be using a router with a SONET card, and the router software is propagating a SONET event through the stack causing BGP to react to an event it wouldn't even see on a physically separate SONET ADM. That is pure speculation though.

You should be able to tell from (Cisco speak)

  show controller pos a/b

The counters should be increasing if there are any issues with the line or the path.

Make sure both ends sync from the line (as it's a synchronous link).

A path switch event won't impact BGP; if it did, the entire Internet would be a constant flap. Typically path switches are 50 ms in the metro and not much longer on long distance (or are done at the optical layer). It also could be that the protection may need to be uni-directional.

Make sure you set scramble on the link, as some multiplexers still have inband traffic signal issues that IP traffic can trigger.

Regards,
Neil.