BGP next-hop

Hi all,

Is there an easy way to see which iBGP routes are not being selected
due to next-hop not being in IGP?

Before and after IGP route added shown below, note both are marked as valid..

-- BEFORE IGP--
AS5000_LA#show ip bgp
BGP table version is 5, local router ID is 10.0.0.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
             r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i100.10.0.0/16    10.0.0.10                0    100      0 2000 3000 ?
*>                  10.0.0.6                               0 1000 3000 3000 ?

-- AFTER IGP--
AS5000_LA#show ip bgp
BGP table version is 6, local router ID is 10.0.0.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
             r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i100.10.0.0/16    10.0.0.10                0    100      0 2000 3000 ?
*                   10.0.0.6                               0 1000 3000 3000 ?

Cheers
Heath

ps. I've posted this to cisco-nsp also (a day ago) - so apologies in
advance if you are on both and seeing it twice.

Yes, I believe the command is "show ip bgp rib-failure". This shows routes that are in the BGP table, theoretically eligible to be used as actual traffic-forwarding routes, but are failing to be inserted into the Routing Information Base (RIB) for one reason or another. I don't have a lab router handy to lab it up, and of course on my normal production router it comes up empty (lists column headers, but no routes) because I don't have any edge cases on there right now. But I think this is what you want.

-- Jeff Saxe
Network Engineer, Blue Ridge InternetWorks
Charlottesville, VA

Cheers Jeff.

I thought I'd give that a go, but it doesn't seem to be working for some reason!

(This is without next-hop in IGP)

AS5000_LA#show ip bgp
BGP table version is 3, local router ID is 10.0.0.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.10.0.0/16    10.0.0.6                               0 1000 3000 3000 ?
* i                 10.0.0.10                0    100      0 2000 3000 ?

AS5000_LA#show ip bgp rib-failure
Network Next Hop RIB-failure RIB-NH Matches
AS5000_LA#

AS5000_LA#show ip bgp 100.10.0.0
BGP routing table entry for 100.10.0.0/16, version 3
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
  Advertised to update-groups:
     2
  1000 3000 3000
    10.0.0.6 from 10.0.0.6 (10.0.0.13)
      Origin incomplete, localpref 100, valid, external, best
  2000 3000
    10.0.0.10 (inaccessible) from 10.0.0.2 (10.0.0.9)
      Origin incomplete, metric 0, localpref 100, valid, internal

From the detail view, the route is marked as inaccessible. Perhaps
this is the only way to get to it..

Heath
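
If the per-prefix detail view really is the only place the flag shows up, one rough workaround is to save that output and scan it. The sketch below is only an illustration: it assumes the text looks like the "show ip bgp 100.10.0.0" output above, and the function name inaccessible_paths is made up for this example. Other IOS versions may format these lines differently.

```python
import re

# Sample text copied from the "show ip bgp 100.10.0.0" output in this thread.
SAMPLE = """\
BGP routing table entry for 100.10.0.0/16, version 3
Paths: (2 available, best #1, table Default-IP-Routing-Table)
  1000 3000 3000
    10.0.0.6 from 10.0.0.6 (10.0.0.13)
      Origin incomplete, localpref 100, valid, external, best
  2000 3000
    10.0.0.10 (inaccessible) from 10.0.0.2 (10.0.0.9)
      Origin incomplete, metric 0, localpref 100, valid, internal
"""

# Header line that names the prefix the following paths belong to.
ENTRY_RE = re.compile(r"^BGP routing table entry for (\S+),")
# Next-hop line of a path: "<next-hop> [(note)] from <neighbor> (<router-id>)".
PATH_RE = re.compile(r"^\s+(\d+\.\d+\.\d+\.\d+)(?: \((.*?)\))? from (\S+) \((\S+)\)")


def inaccessible_paths(text):
    """Return (prefix, next_hop, neighbor) for each path flagged inaccessible."""
    found = []
    prefix = None
    for line in text.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            prefix = entry.group(1)
            continue
        path = PATH_RE.match(line)
        if path and path.group(2) and "inaccessible" in path.group(2):
            found.append((prefix, path.group(1), path.group(3)))
    return found


for hit in inaccessible_paths(SAMPLE):
    print(hit)
```

Feeding it the concatenated detail output for several prefixes should work the same way, since each "BGP routing table entry for" line resets the current prefix.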

In a message written on Thu, Sep 30, 2010 at 10:49:17AM +0100, Heath Jones wrote:

Is there an easy way to see which iBGP routes are not being selected
due to next-hop not being in IGP?

I have suggested more than a few times to vendors that the command:

show bgp ipv4 unicast 100.10.0.0/16 why-chosen

Would be insanely useful. Yes, you can recreate the 7+ BGP decision
steps in your mind by running a pile of show commands, but when
you're trying to figure out several odd routes at the same time
this is very hard to keep in your head and very labor intensive.
The box knows why it chose the route, and should be able to show
that to you.
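
To make the idea concrete, here is a toy sketch of what a "why-chosen" answer could look like. It is not any vendor's implementation: the Path fields and the why_chosen function are invented for illustration, and only a shortened, Cisco-style subset of the decision steps is modelled (next-hop reachability, weight, local preference, AS path length, eBGP over iBGP).

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    as_path: tuple
    local_pref: int = 100
    weight: int = 0
    external: bool = False
    next_hop_reachable: bool = True


def why_chosen(a, b):
    """Compare two paths and return (winner, reason) for the deciding step."""
    # A path whose next hop cannot be resolved is not eligible at all.
    if a.next_hop_reachable != b.next_hop_reachable:
        winner = a if a.next_hop_reachable else b
        return winner, "other path's next hop is inaccessible"
    if not a.next_hop_reachable:
        return None, "neither next hop is accessible"
    # Each key returns a value where smaller means preferred.
    steps = [
        ("higher weight", lambda p: -p.weight),
        ("higher local preference", lambda p: -p.local_pref),
        ("shorter AS path", lambda p: len(p.as_path)),
        ("eBGP preferred over iBGP", lambda p: 0 if p.external else 1),
    ]
    for reason, key in steps:
        if key(a) != key(b):
            return min((a, b), key=key), reason
    return a, "tie on all modelled steps"


# The two paths for 100.10.0.0/16 from the start of the thread.
ebgp = Path("10.0.0.6", (1000, 3000, 3000), external=True)
ibgp = Path("10.0.0.10", (2000, 3000), next_hop_reachable=False)
print(why_chosen(ebgp, ibgp)[1])
```

With the iBGP next hop marked reachable instead, the same comparison picks the iBGP path on AS path length, which matches the before and after tables in the original question.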

Of course most boxes can't show you outgoing BGP communities either.
*sigh* Is it really too hard to ask to get a show bgp neighbor ...
advertised that shows ALL of the attributes?

+1 for that, in a similar manner to packet-tracer on ASAs.

Peter

i was recently bitten by a cousin of this

research router getting an ebgp multi-hop full feed from 147.28.0.1
(address is relevant)

it is on a lan with a default gateway 42.666.77.11 (address not
relevant), so it has

    ip route 0.0.0.0 0.0.0.0 42.666.77.11

massive flapping results.

it seems it gets the bgp route for 147.28.0.0/16 and then can not
resolve the next hop. it would not recurse to the default exit.

of course it was solved by

    ip route 147.28.0.0 255.255.0.0 42.666.77.11

but i do not really understand in my heart why i needed to do this.

randy

Because the path broke every time the BGP session was established and rewrote the routing table with more specific routes?

last time, several years ago on cisco, I used a route-map to rewrite the next-hop.
route-map xx-in permit 10
  set ip next-hop 42.666.77.11
route-map xx-out permit 10
  set ip next-hop x.x.x.x

  neighbor 147.28.0.1 remote-as yyy
  neighbor 147.28.0.1 ebgp-multihop 8
  neighbor 147.28.0.1 route-map xx-in in
  neighbor 147.28.0.1 route-map xx-out out

something like this.

as i showed, i knew how to hack out of the problem. and yes, there are
many ways.

the question is why i am in the corner in the first place.

randy

show bgp ipv4 unicast 100.10.0.0/16 why-chosen
Would be insanely useful.

Been in JUNOS "show route" since day one, and IMHO is easily in the top
10 list of why I still buy Juniper instead of Cisco despite all the
$%^&*ing bugs these days.

It's interesting, I was heavy into cisco years back and then juniper
for a while. Going back to cisco now is great (always good for me to
keep my exposure up), but there is just so much unclear in its CLI.
It wasn't until going back that I realised.

I guess they would have to balance keeping the old timers & scripts
etc happy VS bringing in new features that make the output look
different.. Do you keep something that isn't perfect but people know
how to use, or change it and cause more issues than good?

ps. Juniper has really gone to $h!t lately. There's a website called
glassdoor.com that I found - go look up what employees have to say
about it.. reflects exactly the support we were getting, even as an
'elite' partner..

Personally I still can't believe that it's the year 2010, and IOS still
shows routes in classful notation (i.e. if it's in 192.0.0.0/3 and is a
/24, the /24 part isn't displayed because it's assumed to be "Class C").
Of course I say that every year, and so far the only thing that has
changed is the year I say it about.
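
For anyone who hasn't run into it, the classful display rule described above can be sketched like this. The function names are made up for the example, and this only mimics the rule as described in this message (mask omitted when it equals the class default), not the actual IOS display code.

```python
import ipaddress


def classful_prefixlen(address):
    """Return the prefix length implied by the historical class of an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return 8     # Class A
    if first_octet < 192:
        return 16    # Class B
    if first_octet < 224:
        return 24    # Class C, i.e. anything inside 192.0.0.0/3
    return None      # Class D/E: no classful network mask


def classful_display(prefix):
    """Render a prefix classfully: drop the mask when it matches the class default."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == classful_prefixlen(str(net.network_address)):
        return str(net.network_address)
    return str(net)


print(classful_display("192.0.2.0/24"))
print(classful_display("100.10.0.0/16"))
```

The second call keeps its mask because 100.10.0.0 falls in the old Class A range, where only a /8 would be implied. That is why the outputs earlier in the thread still show "/16".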

Don't get me started, I could complain for days and still not run out of
material, but alas it doesn't accomplish anything. Sadly, many of the
best Juniper people I know are incredibly disaffected, and are leaving
(or have already left) in droves. I think the way I heard it put best
was, "I'm convinced that $somenewexecfromcisco is actually on a secret 5
year mission to come over to Juniper, completely $%^* the company, and
then go back to Cisco and get a big bonus for it". :)

it seems it gets the bgp route for 147.28.0.0/16 and then can not
resolve the next hop. it would not recurse to the default exit.

of course it was solved by
ip route 147.28.0.0 255.255.0.0 42.666.77.11
but i do not really understand in my heart why i needed to do this.

Neither do I, Randy.
I have seen recursive routing done - perhaps on a juniper - i really
cannot remember.
Given that the packet would be originating from the device itself (not
hardware forwarded), it would make sense that it should be able to
perform a recursive lookup. I'd put it down to an implementation
thing..

Unrelated, I was doing some thinking about a multihomed site and using
BGP advertisements sent out one link (provider 1) to influence the
sending of the advertisements out of the other link (provider 2). Long
story short I needed to know how long bgp nlri's take to traverse the
net, and subsequently have a paper that you co-authored open in
another tab - well done! :)

a good friend at cisco says he will take the time to write up why in the
next day or two.

I needed to know how long bgp nlri's take to traverse the net

high variance

randy

Only thing I can guess from the Cisco doc that says:

"To prevent the creation of loops through oscillating routes, the multihop will not be established if the only route to the multihop peer is the default route (0.0.0.0)."

Is that they think they're saving you from shooting yourself in the foot, if you learn the route to 147.28.0.0/16 via BGP (which your multihop peer address falls in), yet you have a default route of 0.0.0.0? But then you'd simply recursively look up the FIB route to the next hop in BGP... so I still don't get it.
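
Taken literally, the quoted sentence amounts to a check like the one below. This is only a model of that one sentence, with invented function names and the RIB reduced to a list of prefixes. It says nothing about what IOS actually does internally.

```python
import ipaddress


def best_match(rib, address):
    """Longest-prefix match of address against a list of prefix strings."""
    addr = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in rib if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda n: n.prefixlen, default=None)


def multihop_allowed(rib, peer):
    """Refuse the session if the peer is reachable only via 0.0.0.0/0."""
    match = best_match(rib, peer)
    return match is not None and match.prefixlen > 0


print(multihop_allowed(["0.0.0.0/0"], "147.28.0.1"))
print(multihop_allowed(["0.0.0.0/0", "147.28.0.0/16"], "147.28.0.1"))
```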

-b

Looks like a classic race condition, in that 147.28/16, upon arrival, becomes a better route for the recursed next-hop (which really is a recursed lookup on your default). So you get

147.28/16 -> 147.28.0.1, and then 147.28.0.1 looks best through the learned route.

Of course, this would appear to be a matter of how it is implemented. Because in fact, the 147 route isn't yet in the routing table, so your default should apply. The static seems to force a recursion to the 666 nh.

I'll wait for your friend to send the implementation details, but from a glance, it looks like a defensive (lazy?) attempt to avoid a recursion loop during the update receive process.

Btw, this will happen on a Juniper (or at least it used to). I'll have to check to confirm.

Chris

Section 9.1.2.1 of RFC 4271 seems to address this.

A few points from that section:
- The BGP NEXT_HOP can not recursively resolve (directly or indirectly) through the BGP route.
- Only the longest matching route should be considered when resolving the BGP NEXT_HOP.
- Do not consider feasible routes that would become unresolvable if they were installed.
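
A small sketch of how those three points could play out in the 147.28.0.0/16 case discussed earlier. The table model, the helper names, and the rule that an existing static for the same prefix keeps its place over BGP are all simplifying assumptions for illustration. Non-BGP routes are treated as directly usable, so the static gateway string is never parsed.

```python
import ipaddress


def route(prefix, source, next_hop):
    return {"prefix": ipaddress.ip_network(prefix), "source": source, "next_hop": next_hop}


def longest_match(table, address):
    """Second point: only the longest matching route is considered."""
    addr = ipaddress.ip_address(address)
    hits = [r for r in table if addr in r["prefix"]]
    return max(hits, key=lambda r: r["prefix"].prefixlen, default=None)


def install(table, candidate):
    """Return the table as it would look with candidate installed.
    Assumption: an existing non-BGP route for the same prefix wins over BGP."""
    for r in table:
        if r["prefix"] == candidate["prefix"] and r["source"] != "bgp":
            return list(table)
    return [r for r in table if r["prefix"] != candidate["prefix"]] + [candidate]


def resolvable_if_installed(table, candidate):
    """Third point: judge the next hop against the table after installation."""
    trial = install(table, candidate)
    seen = [candidate]
    hop = candidate["next_hop"]
    while True:
        match = longest_match(trial, hop)
        if match is None:
            return False
        if match["source"] != "bgp":
            return True           # assumed directly usable (static or connected)
        if match in seen:
            return False          # first point: resolves through a BGP route already in the chain
        seen.append(match)
        hop = match["next_hop"]


default = route("0.0.0.0/0", "static", "42.666.77.11")
static16 = route("147.28.0.0/16", "static", "42.666.77.11")
learned = route("147.28.0.0/16", "bgp", "147.28.0.1")

print(resolvable_if_installed([default], learned))
print(resolvable_if_installed([default, static16], learned))
```

Under these assumptions the learned route resolves through the default before it is installed, but would resolve through itself afterwards, so it fails the check. Adding the static /16 gives the next hop a non-BGP longest match and the check passes.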

--Stacy

There are 2 ways of reading that.. Perhaps I'll go and look at it
in more detail.
I'm trying to think of a scenario where following this or something
similar would break it:
- Don't use BGP prefixes to resolve next-hop.
- You can use 0/0 or any route with a lower administrative distance to
resolve the next-hop.

With that in mind, I wonder if it works with Juniper (ad = 170 vs 20
from memory)..