route for linx.net in Level3?

Leo Bicknell wrote:

Even if the exchange does not advertise the exchange LAN, it's
probably the case that it is in the IGP (or at least IBGP) of
everyone connected to it, and by extension all of their customers
with a default route pointed at them.

Actually, that may not be the case, and probably *should* not be the case.

Here's why, in a nutshell:

If two regional ISPs on opposite sides of the planet both point default
at the same Global ISP, even though they do not peer with that ISP, by
using its next-hop address at IX A (for ISP A) and at IX B (for ISP B),
then the Global ISP is now giving free on-net transit to A and B.
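
For illustration only (IOS-style syntax; the address is made up), ISP A's
side of this amounts to nothing more than a static default aimed at the
Global ISP's interface on the IX A peering LAN:

  ! ISP A's router at IX A: default route pointed at the Global ISP's
  ! address on the exchange LAN (198.51.100.10 is hypothetical), with
  ! no peering or transit session to back it up
  ip route 0.0.0.0 0.0.0.0 198.51.100.10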

So, it turns out that pretty much the only way to prevent this at a
routing level is to not carry the IXP networks (in IGP or IBGP), but
rather to do next-hop-self.

The other way is to filter at a packet level on ingress, based on
Layer 2 information, which on many kinds of IX-capable hardware is
actually impossible.

So, when it comes to IXPs: Next-Hop-Self.

(BCP 38 actually doesn't even enter into it, oddly enough.)
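
A minimal sketch of that knob (IOS-style; the addresses and AS number are
made up): the border router at the IX rewrites the next hop toward its
iBGP peers to its own loopback, so the IX /24 never needs to be carried
internally at all.

  router bgp 65000
   ! iBGP session to an internal peer, sourced from the loopback
   neighbor 192.0.2.1 remote-as 65000
   neighbor 192.0.2.1 update-source Loopback0
   ! rewrite the next hop of IX-learned routes to this router's loopback
   neighbor 192.0.2.1 next-hop-self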

Brian

I have experience of several networks where that is not the case. IGP carries routes for loopback and internal-facing interfaces; external-facing interface routes are only known to the local router; pervasive next-hop-self for IBGP.
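
As a rough sketch of that arrangement (IOS-style; the addressing is made
up): only the loopback and core-facing subnets go into the IGP, while the
external-facing /30s stay local to the router, and next-hop-self is set
on the iBGP sessions as in the sketch above.

  router ospf 1
   ! cover only the loopback and internal (core-facing) subnets;
   ! external-facing interface routes are deliberately left out and
   ! so are known only to this router
   network 192.0.2.1 0.0.0.0 area 0
   network 10.10.0.0 0.0.255.255 area 0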

So, no great survey, but don't assume that everybody does things the same way.

Joe

Even if the exchange does not advertise the exchange LAN, it's
probably the case that it is in the IGP (or at least IBGP) of
everyone connected to it,

yikes! this is quite ill-advised and i don't know anyone who does
this, but i think all my competitors should.

I have experience of several networks where that is not the case. IGP
carries routes for loopback and internal-facing interfaces;

i have seen some carry external because, for some reason, they do not
want to re-write next-hop at the border.

randy

Even if the exchange does not advertise the exchange LAN, it's
probably the case that it is in the IGP (or at least IBGP) of
everyone connected to it,

yikes! this is quite ill-advised and i don't know anyone who does
this, but i think all my competitors should.

It's more common than uncommon.

At WIX (Wellington), 64 out of 93 members will carry packets destined
to APE (the Auckland exchange) (source:
http://conference.apnic.net/__data/assets/pdf_file/0018/50706/apnic34-mike-jager-securing-ixp-connectivity_1346119861.pdf).
And this is just New Zealand!

Just checked a few exchanges: not only are the IXP ranges being
carried, they're being leaked:

Equinix SG:

$ bgpctl show rib 202.79.197.0/24
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination gateway lpref med aspath origin
      202.79.197.0/24 100 0 13335 23947 23947 ?
      202.79.197.0/24 100 0 13335 10026 i

Any2 LA:

$ bgpctl show rib 206.223.143.0/24
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination gateway lpref med aspath origin
      206.223.143.0/24 100 0 13335 9304 i
      206.223.143.0/24 100 0 13335 9304 i
      206.223.143.0/24 100 0 13335 4635 9304 i
      206.223.143.0/24 100 0 13335 9304 i

i am not unhappy about the exchange mesh being carried within a member and
being propagated to their customer cone; see my nanog preso of feb 1997
and leo's recent post.

it's putting such things in one's igp that disgusts me. as joe said,
igp is just for the loopbacks and other interfaces it takes to make your
ibgp work.

randy

In a message written on Fri, Apr 05, 2013 at 10:01:34AM +0900, Randy Bush wrote:

it's putting such things in one's igp that disgusts me. as joe said,
igp is just for the loopbacks and other interfaces it takes to make your
ibgp work.

While your method is correct for probably 80-90% of the ISP networks,
the _why_ people do that has almost been lost to the mists of time.
I'm sure Randy knows what I'm about to type, but for the rest of
the list...

The older school of thought was to put all of the edge interfaces
into the IGP, and then carry all of the external routes in BGP.
This caused a one level recursion in the routers:
  eBGP Route->IXP w/IGP Next Hop->Output Interface

The Internet then became a thing, and there started to be a lot of
BGP speaking customers (woohoo! T1's for everyone!), and thus lots
of edge /30's in the IGP. The IGP convergence time quickly got
very, very bad. I think a network or two may have even broken an
IGP.

The "solution" was to take edge interfaces (really "redistribute
connected" for most people) and move it from the IGP to BGP, and
to make that work BGP had to set "next-hop-self" on the routes.
The exchange /24 would now appear in BGP with a next hop of the
router loopback, the router itself knew it was directly connected.
A side effect is that this caused a two-step lookup in BGP:
  eBGP-Route->IXP w/Router Loopback Next Hop->Loopback w/IGP Next Hop->Output Interface
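
Very roughly, the before-and-after looked something like this (IOS-style;
the AS number and loopback address are invented), with the resulting
lookups traced in the comments:

  ! Old way: "redistribute connected" pushes the edge /30s and the
  ! exchange /24 into the IGP, so resolution is a single step:
  !   eBGP route -> IXP next hop (in IGP) -> output interface
  router ospf 1
   redistribute connected subnets
  !
  ! New way: stop redistributing connected routes and rewrite the BGP
  ! next hop toward iBGP peers instead; resolution gains a step:
  !   eBGP route -> border loopback (BGP) -> loopback (IGP) -> interface
  router ospf 1
   no redistribute connected subnets
  router bgp 65000
   neighbor 10.255.0.1 next-hop-self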

IGPs went from O(bgp_customers) routes to O(routers) routes, and
stopped falling over and converged much faster. On the flip side,
every RIB->FIB operation now has to go through an extra step of
recursion for every route, taking BGP resolution from O(routes) to
O(routes * 1.1ish).

Since all this happened, CPUs have gotten much faster and RAM has
gotten much larger. Most people have never revisited the problem,
the scaling of IGPs, or what hardware can do today.

There are plenty of scenarios where the "old way" works just spiffy,
and can have some advantages. For a network with a very low number of
BGP speakers the faster convergence of the IGP may be desirable.

Not every network is built the same, or has the same scaling
properties. What's good for a CDN may not be good for an access
ISP, and vice versa, for example.

The older school of thought was to put all of the edge interfaces into the
IGP, and then carry all of the external routes in BGP.

I thought people were doing it because the IGP converged faster than iBGP,
and in case of an external link failure the ingress PE was informed via the
IGP that it has to find an alternate next hop.
Though now, with the advent of BGP PIC, this is not an argument anymore.

adam

In a message written on Fri, Apr 05, 2013 at 09:32:52AM +0200, Adam Vitkovsky wrote:

I thought people were doing it because the IGP converged faster than iBGP,
and in case of an external link failure the ingress PE was informed via the
IGP that it has to find an alternate next hop.
Though now, with the advent of BGP PIC, this is not an argument anymore.

You're talking about stuff that's all 7-10 years after the decisions
I described in my previous e-mail were made. Tag switching (now MPLS)
had not yet been invented/deployed when the first "next-hop-self"
wave occurred; it was all about scaling both the IGP and BGP.

In some MPLS topologies it may speed re-routing to have edge interfaces
in the IGP, due to the faster convergence of IGPs. YMMV, Batteries not
Included, Some Assembly Required.