GBLX router upgrade breaks bgp sessions

Subject says it all. GBLX upgraded some edge routers to a new JunOS
release (possibly 5.3 rev 24), and now our BGP sessions continually
reset with:

Jul 10 06:58:24 MST: %BGP-3-NOTIFICATION: sent to neighbor X.X.X.X 3/3 (update missing required attributes) 0 bytes
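
For anyone reading along, the "3/3" in that line is the NOTIFICATION
error code and subcode. Below is a minimal sketch of decoding it,
assuming the log format matches the single line quoted above. The
regex, the "received from" alternative, and the function name
decode_notification are illustrative assumptions, not anything taken
from IOS. The code tables follow the BGP-4 spec (RFC 1771).

```python
import re

# NOTIFICATION error codes from the BGP-4 spec (RFC 1771, section 4.5).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Subcodes that apply when the error code is 3 (UPDATE Message Error).
UPDATE_SUBCODES = {
    1: "Malformed Attribute List",
    2: "Unrecognized Well-known Attribute",
    3: "Missing Well-known Attribute",
    4: "Attribute Flags Error",
    5: "Attribute Length Error",
    6: "Invalid ORIGIN Attribute",
    7: "AS Routing Loop",
    8: "Invalid NEXT_HOP Attribute",
    9: "Optional Attribute Error",
    10: "Invalid Network Field",
    11: "Malformed AS_PATH",
}

# Assumption: pattern modeled only on the log line quoted in this thread.
NOTIFICATION_RE = re.compile(
    r"%BGP-3-NOTIFICATION: (sent to|received from) neighbor (\S+) (\d+)/(\d+)"
)


def decode_notification(line):
    """Return (direction, neighbor, error, suberror), or None if no match."""
    match = NOTIFICATION_RE.search(line)
    if match is None:
        return None
    direction, neighbor, code, subcode = match.groups()
    code, subcode = int(code), int(subcode)
    error = ERROR_CODES.get(code, "unknown error code %d" % code)
    # Subcode names are only tabulated here for UPDATE Message Error.
    if code == 3:
        suberror = UPDATE_SUBCODES.get(subcode, "unknown subcode %d" % subcode)
    else:
        suberror = "subcode %d" % subcode
    return direction, neighbor, error, suberror
```

Fed the line above, this reports an UPDATE Message Error with subcode
Missing Well-known Attribute, sent by the local router. In other words,
the local side is the one rejecting the peer's update.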

Anyone clueful at GBLX listening? We've been down for about 4 hours,
and the NOC (call center) people are less than helpful.

bill

I don't know about GBLX, but I saw a problem like this at our border.
After JunOS was upgraded to 5.3r2.4 (other side IOS), the session was
continually being reset. The BGP session between these two peers
was set up with family inet any (for multicast peering), and when that
was removed, the problem went away. I also heard that I2 was having a
problem with their Juniper code that sounded related, but I haven't
investigated the details yet.

John

This sounds an awful lot like a problem we saw a while back when
upgrading from JUNOS 4.x to 5.x. At some point (I don't remember
exactly when, but the details should be in the case notes of
PR.19592) Juniper implemented a change that makes their box
compliant with RFC 2858. However, when speaking BGP with a
non-RFC-compliant box (such as a Cisco running something like
12.0(15)S), the session flaps continuously in the manner you
describe, because the other box expects the NEXT_HOP attribute to
be present in every update message.

Quoting from an email exchange I had with our Juniper rep:

"The result of the change is that JUNOS no longer sends the NEXT_HOP
attribute in an UPDATE message if only the MP_REACH_NLRI attribute is
present. A workaround is to only use family inet unicast instead of
multicast or any on all BGP sessions to those cisco routers or upgrade
all of the cisco routers."
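
To make the mismatch concrete, here is a rough sketch of the two
receiver behaviors described above. It is a simplified model only, not
either vendor's actual code. The function name missing_attributes and
the strict_next_hop flag are hypothetical, and withdraw-only updates
are reduced to "nothing is required". Attribute type codes 1 to 3 come
from the BGP-4 spec, and 14 (MP_REACH_NLRI) comes from RFC 2858.

```python
# Path attribute type codes: 1-3 from BGP-4, 14 from RFC 2858.
ORIGIN, AS_PATH, NEXT_HOP, MP_REACH_NLRI = 1, 2, 3, 14


def missing_attributes(attr_types, has_ipv4_nlri, strict_next_hop):
    """Return the sorted mandatory attribute codes absent from an UPDATE.

    attr_types      -- attribute type codes present in the UPDATE
    has_ipv4_nlri   -- True if the conventional NLRI field carries prefixes
    strict_next_hop -- hypothetical flag: True models a receiver that wants
                       NEXT_HOP in every UPDATE that announces routes; False
                       models one that wants it only alongside the
                       conventional NLRI field
    """
    attr_types = set(attr_types)
    announces = has_ipv4_nlri or MP_REACH_NLRI in attr_types
    if not announces:
        # Simplification: an UPDATE that announces nothing needs nothing.
        return []
    required = {ORIGIN, AS_PATH}
    if has_ipv4_nlri or strict_next_hop:
        required.add(NEXT_HOP)
    return sorted(required - attr_types)


# An UPDATE that carries routes only in MP_REACH_NLRI and omits NEXT_HOP.
mp_only_update = {ORIGIN, AS_PATH, MP_REACH_NLRI}

print(missing_attributes(mp_only_update, False, strict_next_hop=True))   # [3]
print(missing_attributes(mp_only_update, False, strict_next_hop=False))  # []
```

Under the strict model the receiver finds attribute 3 (NEXT_HOP)
missing, which is the condition reported as a 3/3 NOTIFICATION. Under
the other model the same update passes. Restricting the session to
unicast avoids the MP_REACH_NLRI-only case altogether.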

You might try forcing 'nlri uni' on your side to see if that
does anything.

Interested parties may wish to have a look at PR.22527, which was
opened at our request and adds a knob to revert back to the
non-RFC-compliant behavior, which is useful during a transition
period.

--Jeff

That was it. A quick TAC case later (about 10 minutes turnaround from
problem submission to resolution: upgrade IOS or remove multicast from
the BGP peer) and the problem is fixed. I removed multicast, since it
was not required on this peer, and will schedule the IOS upgrade during
a more friendly maintenance window.

GBLX, however, has not returned my call since I opened a high-priority,
customer-down ticket about 1.5 hours ago. Like all other support calls
to their NOC, this seems to have disappeared into never-never land.
I love the GBLX network when it works, but god help you if you ever
need to talk to a clueful NOC person to fix a problem (especially after
hours).

bill

Can you provide any details as to why you had to "remove multicast"?
Do you mean remove MBGP, or is there more?

nanog wrote:

Yes, removing MBGP from the neighbor statement. Sorry for the ambiguity.

bill