BGP - weight

Hello

Dumb question:
If I apply an equal weight to all our transit/peers, will
that affect route announcements to iBGP or eBGP peers in any way?

We have a very simple setup:
- 2 routers on the border to transit/peers (that's where I want
  to set the weight)
- 1 edge router to customers
- core router running BGP as well

What I want to achieve is that traffic leaves through
the border router it arrived at, rather than being bounced around.

We had some recent issues where it looks like the core got
"out of sync" with the border (looks more like a software issue than
just convergence delay) and packets bounced back and
forth between them.
I know this doesn't solve the cause, but before digging
for the root cause I want a quick workaround.

Cheers
Sven

> Dumb question:
> If I apply an equal weight to all our transit/peers, will
> that affect route announcements to iBGP or eBGP peers in any way?

Yes. Weight is a local parameter (i.e., not BGP per se), but it
does affect what's installed in the BGP RIB and therefore what's
subsequently advertised to your peers. You'll likely begin
preferring more eBGP-learned routes, because subsequent
attributes in the best path selection algorithm (e.g., MED,
AS_PATH, even LOCAL_PREF) won't be considered.
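
To make that ordering concrete, here is a minimal Python sketch of a Cisco-style best path comparison. It is a simplification under stated assumptions: only weight, LOCAL_PREF, AS_PATH length, MED and eBGP-over-iBGP are modelled, MED is compared unconditionally, and the Path class, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    via: str                # hypothetical label: where this router learned the route
    weight: int = 0         # local to this router; never sent to any peer
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0
    ebgp: bool = True


def preference_key(p: Path):
    # Tuples compare left to right, so weight is decisive whenever it differs.
    # Higher weight and LOCAL_PREF win; shorter AS_PATH and lower MED win;
    # eBGP beats iBGP. Simplification: MED is compared unconditionally here.
    return (-p.weight, -p.local_pref, p.as_path_len, p.med, 0 if p.ebgp else 1)


def best_path(paths):
    return min(paths, key=preference_key)


# Hypothetical view from R1: one path learned directly, one via iBGP from R2.
direct = Path(via="eBGP from Transit1", as_path_len=4)
via_r2 = Path(via="iBGP from R2", local_pref=200, as_path_len=2, ebgp=False)

without_weight = best_path([direct, via_r2])
with_weight = best_path([replace(direct, weight=200), via_r2])

print(without_weight.via)  # iBGP from R2
print(with_weight.via)     # eBGP from Transit1
```

In this sketch the iBGP path wins on LOCAL_PREF until the eBGP path is given a weight, after which nothing further down the list is looked at.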

> We have a very simple setup:
> - 2 routers on the border to transit/peers (that's where I want
>   to set the weight)
> - 1 edge router to customers
> - core router running BGP as well
>
> What I want to achieve is that traffic leaves through
> the border router it arrived at, rather than being bounced
> around.

Perhaps you should first look at other reasons why this may
be occurring (e.g., accepting MEDs from one peer and not the
other, accepting MEDs from both but with different policies
used to derive their values, AS_PATH "suggesting" a better
path, etc.). Only after those comes preference for eBGP over iBGP.

> We had some recent issues where it looks like the core got
> "out of sync" with the border (looks more like a software issue than
> just convergence delay) and packets bounced back and forth
> between them.

This could be any of a number of things. Without more
information I'd be hesitant to start tweaking knobs.

> I know this doesn't solve the cause, but before digging
> for the root cause I want a quick workaround.

"Weight" is a very influential parameter. I'm not a big fan
of configuring routing policies that are entirely local to a
system, for obvious reasons. But I do suspect it would
accomplish what you're trying to achieve.

-danny

> Date: Sat, 14 Feb 2004 12:23:06 +0000
> From: Sven Huster

> We had some recent issues where it looks like the core got
> "out of sync" with the border (looks more like a software issue
> than just convergence delay) and packets bounced back and
> forth between them.

Yikes. I'd try to see what was wrong with your IGP before using
a big (albeit highly localized) hammer.

> I know this doesn't solve the cause, but before digging
> for the root cause I want a quick workaround.

I'm not so sure about having traffic leave through the router via
which it arrived. You have no way of knowing that unless you
were to tag packets or track flows, keeping state...

If you wish to force packets through a router, you might consider
tagging learned routes with a community corresponding to the
router that learned the route. On the edge router you could
force next hop according to community.

Again, though, these are gross and messy. I'm not sure if hacks
will make your situation better or worse.

Good luck,
Eddy

> Dumb question:
> If I apply an equal weight to all our transit/peers, will
> that affect route announcements to iBGP or eBGP peers in any way?

No, it won't affect announcements; weight is local to the router you apply it on.

> We have a very simple setup:
> - 2 routers on the border to transit/peers (that's where I want
>   to set the weight)
> - 1 edge router to customers
> - core router running BGP as well
>
> What I want to achieve is that traffic leaves through
> the border router it arrived at, rather than being bounced around.

eBGP should be preferred over iBGP anyhow, assuming all other things are equal.
If they're not equal, then either make them equal or you probably want to choose a
different path anyhow (e.g. shorter AS path).

If you don't want any traffic to go across your network, why bother meshing the
iBGP in the first place?

> We had some recent issues where it looks like the core got "out of sync" with
> the border (looks more like a software issue than just convergence delay) and
> packets bounced back and forth between them. I know this doesn't solve the
> cause, but before digging for the root cause I want a quick workaround.

Hmm, I'd suggest emergency maintenance before doing some weird screwy stuff like
that :)

Steve

> > Dumb question:
> > If I apply an equal weight to all our transit/peers, will
> > that affect route announcements to iBGP or eBGP peers in any way?
>
> No, it won't affect announcements; weight is local to the router you apply it on.
>
> > What I want to achieve is that traffic leaves through
> > the border router it arrived at, rather than being bounced around.
>
> eBGP should be preferred over iBGP anyhow, assuming all other things are equal.
> If they're not equal, then either make them equal or you probably want to choose a
> different path anyhow (e.g. shorter AS path).
>
> If you don't want any traffic to go across your network, why bother meshing the
> iBGP in the first place?

Just to make it a bit more clear:

Transit1  Peers   Transit2  Peers   Customers via BGP
   |        |        |        |             |
   ----R1----        ----R2----            R3
        |                 |                 |
        |                 |                 |
        |                 |                 |
       ---------------------Core---------------------
                             |
                             |
                         Data Center

Full-mesh between R1,R2,R3 and Core

We carry traffic from the DC as well as the customers in the core to transit and peers.
We normally want to advertise full routes to customers, which are multi-homed.

> > We had some recent issues where it looks like the core got "out of sync" with
> > the border (looks more like a software issue than just convergence delay) and
> > packets bounced back and forth between them. I know this doesn't solve the
> > cause, but before digging for the root cause I want a quick workaround.
>
> Hmm, I'd suggest emergency maintenance before doing some weird screwy stuff like
> that :)

The thing that happened was that the core believed the best path out was via
R1, while R1 thought it was via R2. So a little loop there.

We haven't been able to reproduce the problem or find its source yet.

So the plan right now is: if the core decides that traffic should go out via
R1, R1 should just send it out via the best path it got from eBGP.
That gives us some more time for debugging what's going on there.

Sven

> > > Dumb question:
> > > If I apply an equal weight to all our transit/peers, will
> > > that affect route announcements to iBGP or eBGP peers in any way?
> >
> > No, it won't affect announcements; weight is local to the router you apply it on.
> >
> > > What I want to achieve is that traffic leaves through
> > > the border router it arrived at, rather than being bounced around.
> >
> > eBGP should be preferred over iBGP anyhow, assuming all other things are equal.
> > If they're not equal, then either make them equal or you probably want to choose a
> > different path anyhow (e.g. shorter AS path).
> >
> > If you don't want any traffic to go across your network, why bother meshing the
> > iBGP in the first place?
>
> Just to make it a bit more clear:
>
> Transit1  Peers   Transit2  Peers   Customers via BGP
>    |        |        |        |             |
>    ----R1----        ----R2----            R3
>         |                 |                 |
>         |                 |                 |
>         |                 |                 |
>        ---------------------Core---------------------
>                              |
>                              |
>                          Data Center
>
> Full-mesh between R1,R2,R3 and Core
>
> We carry traffic from the DC as well as the customers in the core to transit and peers.
> We normally want to advertise full routes to customers, which are multi-homed.
>
> > > We had some recent issues where it looks like the core got "out of sync" with
> > > the border (looks more like a software issue than just convergence delay) and
> > > packets bounced back and forth between them. I know this doesn't solve the
> > > cause, but before digging for the root cause I want a quick workaround.
> >
> > Hmm, I'd suggest emergency maintenance before doing some weird screwy stuff like
> > that :)
>
> The thing that happened was that the core believed the best path out was via
> R1, while R1 thought it was via R2. So a little loop there.
>
> We haven't been able to reproduce the problem or find its source yet.

Is this all the same vendor hardware?

Check that the BGP configs are identical, e.g. that deterministic-med, dampening,
always-compare-med, etc. are all configured the same.

Steve

> > > > Dumb question:
> > > > If I apply an equal weight to all our transit/peers, will
> > > > that affect route announcements to iBGP or eBGP peers in any way?
> > >
> > > No, it won't affect announcements; weight is local to the router you apply it on.
> > >
> > > > What I want to achieve is that traffic leaves through
> > > > the border router it arrived at, rather than being bounced around.
> > >
> > > eBGP should be preferred over iBGP anyhow, assuming all other things are equal.
> > > If they're not equal, then either make them equal or you probably want to choose a
> > > different path anyhow (e.g. shorter AS path).
> > >
> > > If you don't want any traffic to go across your network, why bother meshing the
> > > iBGP in the first place?
> >
> > Just to make it a bit more clear:
> >
> > Transit1  Peers   Transit2  Peers   Customers via BGP
> >    |        |        |        |             |
> >    ----R1----        ----R2----            R3
> >         |                 |                 |
> >         |                 |                 |
> >         |                 |                 |
> >        ---------------------Core---------------------
> >                              |
> >                              |
> >                          Data Center
> >
> > Full-mesh between R1,R2,R3 and Core
> >
> > We carry traffic from the DC as well as the customers in the core to transit and peers.
> > We normally want to advertise full routes to customers, which are multi-homed.
> >
> > > > We had some recent issues where it looks like the core got "out of sync" with
> > > > the border (looks more like a software issue than just convergence delay) and
> > > > packets bounced back and forth between them. I know this doesn't solve the
> > > > cause, but before digging for the root cause I want a quick workaround.
> > >
> > > Hmm, I'd suggest emergency maintenance before doing some weird screwy stuff like
> > > that :)
> >
> > The thing that happened was that the core believed the best path out was via
> > R1, while R1 thought it was via R2. So a little loop there.
> >
> > We haven't been able to reproduce the problem or find its source yet.
>
> Is this all the same vendor hardware?

Nope.

R1-3 - Cisco
Core - Extreme Alpine 3808

> Check that the BGP configs are identical, e.g. that deterministic-med, dampening,
> always-compare-med, etc. are all configured the same.

I'll have a look there.
Thanks

Sven

> Date: Sat, 14 Feb 2004 18:00:51 +0000
> From: Sven Huster

> The thing that happened was that the core believed that the
> best path out is via R1, which R1 thought it was via R2. So a
> little loop there.

So core sends to R1, which sends to R2... where does R2 send the
packets? Back to R1?

What are you doing in your IGP? Are you using { iBGP | OSPF |
IS-IS | ... }? How does R1 learn routes from Transit2?

What about confederations? Used correctly, they're helpful.
Used incorrectly in similar scenarios, an iBGP mesh becomes a
constantly-oscillating iBGP mess.

Are you using either

  router bgp xxxx
   bgp bestpath compare-routerid

or

  router bgp xxxx
   no bgp bestpath compare-routerid

on all routers? I'm wondering if R1 prefers Transit2 and R2
prefers Transit1 due to different path selection algorithms...

Can you "sh route" or "sh ip bgp" for a route that loops?

Eddy

> > Date: Sat, 14 Feb 2004 18:00:51 +0000
> > From: Sven Huster
>
> > The thing that happened was that the core believed that the
> > best path out is via R1, which R1 thought it was via R2. So a
> > little loop there.
>
> So core sends to R1, which sends to R2... where does R2 send the
> packets? Back to R1?

The core sends to R1, which believes the best path is via R2 and
sends it back to the core as that's the only way to reach R2.
Then the core again sends it to R1 and all the same again.
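
The loop described above can be reproduced on paper with a tiny forwarding model. This is only a sketch under assumptions: the router names come from the diagram, "EXIT" is a made-up marker for traffic leaving via a transit or peer, and the two next-hop tables are hypothetical reconstructions, not captured state.

```python
def trace(next_hop, start, max_hops=8):
    """Follow each router's next hop until the packet exits or revisits a router."""
    path = [start]
    seen = {start}
    node = start
    for _ in range(max_hops):
        node = next_hop[node]
        path.append(node)
        if node == "EXIT":
            return path, "delivered"
        if node in seen:
            return path, "loop"
        seen.add(node)
    return path, "hop limit reached"


# Hypothetical forwarding state during the incident. R1's BGP best path points
# at R2, but R1 can only reach R2 through the core, so its next hop is Core.
during_incident = {"Core": "R1", "R1": "Core", "R2": "EXIT"}

# Same state, except R1 now prefers its own eBGP-learned path (the intended
# effect of the weight workaround).
with_workaround = {"Core": "R1", "R1": "EXIT", "R2": "EXIT"}

print(trace(during_incident, "Core"))  # (['Core', 'R1', 'Core'], 'loop')
print(trace(with_workaround, "Core"))  # (['Core', 'R1', 'EXIT'], 'delivered')
```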

> What are you doing in your IGP? Are you using { iBGP | OSPF |
> IS-IS | ... }? How does R1 learn routes from Transit2?

As this is a small network, internally everything is routed via
static routes.
R1 and R2 have full BGP views from the transit providers as well
as partial views from the peers. They run iBGP with R3 and the core.

> What about confederations? Used correctly, they're helpful.
> Used incorrectly in similar scenarios, an iBGP mesh becomes a
> constantly-oscillating iBGP mess.
>
> Are you using either
>
>   router bgp xxxx
>    bgp bestpath compare-routerid
>
> or
>
>   router bgp xxxx
>    no bgp bestpath compare-routerid
>
> on all routers? I'm wondering if R1 prefers Transit2 and R2
> prefers Transit1 due to different path selection algorithms...

All devices use the default settings in this respect.
R1-3 are Cisco routers, the core Extreme Alpine.

> Can you "sh route" or "sh ip bgp" for a route that loops?

It seems to be a temporary problem, which we only figured out after
it went away, based on NetFlow data and traffic dumps. So there
is no data available for this right now.

Sven

> Date: Sun, 15 Feb 2004 16:50:02 +0000
> From: Sven Huster

[ edited and reformatted for clarity ]

> The core sends to R1, which believes the best path is via R2
> and sends it back to the core as that's the only way to reach
> R2. Then the core again sends it to R1 and all the same
> again.

Yuck.

> As this is a small network, internally everything is routed
> via static routes.

Except for the smallest of networks, I try to avoid static
routes. It's additional work and opportunity for error. Using
BGP + TCP MD5 auth, OSPF auth, hardcoded ARP entries, per-port
MAC address restrictions, prefix lists, route maps, etc., one can
run a dynamic network and still keep security under control.

> R1 and R2 have full BGP views from the transit providers as
> well as partial view from the peers.

Why not arrange the routers and switch in a single VLAN? (Or did
I misunderstand your earlier ASCII-art diagram?) I usually use
something like:

  10.0.0.1/32 local sinkhole
  10.0.0.2/28 virtual router (HSRP/VRRP; maybe XRRP now)
  10.0.0.3/28 physical router #1
  10.0.0.4/28 physical router #2
  : : : : : : :
  10.0.0.13/28 [routing] switch #2
  10.0.0.14/28 [routing] switch #1

Let R1, R2, and R3 speak directly over ethernet without routing
through core. If they already do, verify that you're setting
nexthop correctly.
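
As a quick sanity check of the addressing plan above, the following Python sketch uses the standard ipaddress module to confirm that the listed addresses all fall inside one /28. The enclosing network 10.0.0.0/28 is inferred from the listed addresses, and the dictionary labels are shortened stand-ins for the entries in the plan.

```python
import ipaddress

lan = ipaddress.ip_network("10.0.0.0/28")

# Addresses from the plan above; the labels are shortened for this sketch.
plan = {
    "sinkhole": "10.0.0.1",
    "virtual router": "10.0.0.2",
    "router 1": "10.0.0.3",
    "router 2": "10.0.0.4",
    "switch 2": "10.0.0.13",
    "switch 1": "10.0.0.14",
}

hosts = list(lan.hosts())
all_inside = all(ipaddress.ip_address(a) in hosts for a in plan.values())

print(hosts[0], hosts[-1], len(hosts))     # 10.0.0.1 10.0.0.14 14
print(lan.broadcast_address, all_inside)   # 10.0.0.15 True
```

With every router and switch in the same /28, each box's BGP next hop is directly reachable over the shared segment rather than through a static route via the core.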

Multihop routing sessions often can be made to work, but they're
a tricky "house of cards". Remember, classic IP routing forwards
to a { MAC addr | PVC | endpoint } based on destination IP addr.
You can't do fancy rewriting at each hop; that's part of why PBR
and label switching were invented. ;)

Note: I am _not_ suggesting PBR for this situation.

> They [R1 and R2] run iBGP with R3 and the core.

You have a partial mesh in which R1 and R2 do not exchange routes
with each other?

> router bgp xxxx
> [no] bgp bestpath compare-routerid
>
> All devices use the default settings in this respect.
> R1-3 are Cisco routers, the core Extreme Alpine.

Somewhere along the line Cisco changed the default from "bgp
bestpath compare-routerid" to the converse. I forget when,
although a quick Google search leads me to believe it was around
12.0/12.0S/12.0ST. I can't comment on Extreme.

Again, though, I'm going out on a limb with this one. I'd bet on
static routes, topology, and [lack of] IGP before BGP path
selection algorithm.

> It seems to be a temp problem, which we just figured out once

Odd.

> it went away based on netflow data and traffic dumps. So there
> is no data available for this right now.

If you catch any non-traceroute packets with expiring TTL, see if
you can grab routing info from all the boxes involved. I'm
confused how these devices are building their RIBs...

Eddy

Thanks to everyone who answered.
I guess we've sorted it out now.

Sven
