shim6 @ NANOG (forwarded note from John Payne)

Kevin_Day · March 1, 2006, 7:56am

o a small to medium multi-homed tier-n isp

A small-to-medium, multi-homed, tier-n ISP can get PI space from their RIR, and don't need to worry about shim6 at all. Ditto larger ISPs, up to and including the largest.

If you include "Web hosting company" in your definition of ISP, that's not true. Unless you're providing connectivity to 200 or more networks, you can't get a /32. If all of your use is internal (fully managed hosting), or you aren't selling leased lines or anything, you are not considered an LIR by the current IPv6 policies.

Even the proposed ARIN 2006-4 assignment policy for "end sites" doesn't help a lot of small to mid-sized hosting companies. Under that policy, just to get a /48, you need to already have a /19 or larger, and be using 80% of it. That's 6553 IPs being utilized. If you're running a managed hosting company (name-based vhosts) and deploying 1 IP per web server, you're pretty huge before you've hit 6553 devices. Even assuming 20% of that is wasted, you're still talking about more than 5000 servers. At 40 1U servers per rack, you need 125 racks of servers packed to the gills before you'd qualify for PI space. That excludes every definition I have of "small-to-medium" in the hosting arena.
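The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch that uses only the figures in the post (a /19, 80% utilization, 20% of the used addresses not on servers, 40 1U servers per rack); the function names are illustrative.

```python
import math


def addresses_in_ipv4_prefix(prefix_len: int) -> int:
    """Number of addresses in an IPv4 prefix of the given length."""
    return 2 ** (32 - prefix_len)


def racks_needed(prefix_len: int = 19, utilization: float = 0.80,
                 waste: float = 0.20, servers_per_rack: int = 40):
    total = addresses_in_ipv4_prefix(prefix_len)     # 8192 for a /19
    utilized = int(total * utilization)              # 6553 addresses in use
    servers = int(utilized * (1 - waste))            # servers at 1 IP each
    racks = math.ceil(servers / servers_per_rack)    # whole racks, rounded up
    return total, utilized, servers, racks


print(racks_needed())  # (8192, 6553, 5242, 132)
```

The post rounds down to 5000 servers (125 racks); without that rounding the same assumptions give 5242 servers, or 132 racks.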

You don't get PI space, and Shim6 is looking like your only alternative for multihoming.

Content providers have a different set of problems, since a server with N simultaneously-active clients, each with an average of M available locators needs to deal with N*M worth of state, which is presumably M times worse than the situation today.

For very large content providers, aggregating very large numbers of simultaneous clients through load balancers or other middleboxes, this is quite possibly not something that is going to be a simple matter of upgrading to a shim6-capable firmware release.

Yes, and content providers have other issues as well when it comes to IPv6 policy... I'm betting only the top 1 or 2 CDN/content providers out there qualify for a /32. Many content providers set up multiple non-interconnected POPs in different geographical locations. The only way this can be accomplished is by making separate announcements in each POP for each space. This means either being able to deaggregate, or to get a block for each POP. I don't know of *ANY* that are deploying 5000+ servers per POP.

Actually, I think the problem with shim6 is that there are far too few operators involved in designing it. This has evidently led to a widespread perception of an ivory tower with a moat around it.

I think the issue was... When I first heard of shim6, I thought "Oooh, that's really clever. A lot of small businesses/enterprises will use that, they don't need to deal with BGP, adding a new provider is just a drop in." Then, when we got to deploying IPv6, the discovery of "Oh, wait, they expect EVERYONE who uses PA space to do this? That's not cool." produced a negative reaction.

To gain real relevance it needs to be deployed; to be deployed, it needs to be embraced by enterprise operators and content providers.

If these operators dismiss it out of hand on principle, and refuse to actually find out whether the general approach is able to solve problems or not, then irrelevance does indeed seem inevitable. However, the only alternative on the table is a v6 swamp.

How about some actual technical complaints about shim6?

I'm just one guy, one ASN, and one content/hosting network. But I can tell you that to switch to using shim6 instead of BGP speaking would be a complete overhaul of how we do things.

Putting routing decisions in the control of servers we don't operate scares me. I wouldn't rely on 90% of our customers to get this right unless it was completely idiot proof. Even if it was, I don't see how we can trust that users aren't messing with things to "game the system" somehow.

We deal with long lived TCP sessions (hours/days). I don't see how routing updates can happen that won't result in a disconnect/reconnect, which isn't acceptable. With current BGP technologies, if I need to move traffic off a transit port, I can do so without relying on all of our servers to know anything about it, the move is instant, and non-disruptive. Shim6 requires a keepalive to expire for the end nodes to realize something is broken, then re-negotiate the remaining routing decisions. With BGP, I can see if one of my transit links goes down directly, and compensate before users start getting impatient.

We have peering arrangements with about 120 ASNs. How do we mix BGP IPv6 peering and Shim6 for transit?

So far it looks like Shim6 is going to rely on DNS. The DNS caching issue is a real problem. We need changes to happen faster than DNS caching will allow.

Our network is complicated. We have a /21 that's split into 4 /23s. One for each non-interconnected POP. We only advertise the /23 for each POP out to transit, but we give peers access to our entire network wherever they peer with us and we pay to haul/tunnel it around. How do we even do this without PI space, let alone through shim6?
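A minimal sketch of the announcement policy described above, using Python's standard `ipaddress` module. The aggregate `10.20.0.0/21` is a hypothetical stand-in (the post does not give the real prefix), and the sketch assumes peers receive the four /23s rather than the covering /21; the post doesn't say which.

```python
import ipaddress

# Hypothetical aggregate standing in for the real /21.
AGGREGATE = ipaddress.ip_network("10.20.0.0/21")

# Split the /21 into four /23s, one per non-interconnected POP.
POP_PREFIXES = list(AGGREGATE.subnets(new_prefix=23))


def announcements(pop: int, neighbor: str) -> list:
    """Prefixes a given POP announces to a given kind of neighbor."""
    if neighbor == "transit":
        # Transit only hears about the /23 that actually lives at this POP.
        return [POP_PREFIXES[pop]]
    if neighbor == "peer":
        # Peers get the whole network; traffic for other POPs is
        # hauled/tunneled internally at our expense.
        return list(POP_PREFIXES)
    raise ValueError(f"unknown neighbor type: {neighbor}")


for i, prefix in enumerate(POP_PREFIXES):
    print(f"POP {i}: {prefix}")
```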

For quite the foreseeable future, we'd be running IPv4 and IPv6 at the same time, over the same transit connections. We'd have to TE our IPv6 bits completely differently than our IPv4 bits, even though we'd be billed for the aggregate usage of both. Automated tweaking of total usage per transit port is hard enough in BGP. Having to tweak both BGP and some external shim6 method of TE when the goal is a common aggregate number is going to be a very difficult issue.

Some of our applications are extremely sensitive to jitter/latency. We've spent ages tweaking route-maps manually (and through automated continual tweaking) to make sure we avoid any congested links. We also rely on BGP communities by our providers to give us some more information when it comes to route decisions. (If NSP A tells me through communities that they peer directly with someone, where NSP B is crossing the country, then hitting another NSP before the Origin ASN, we prefer NSP A). I don't see how information like this, or tweaking to that level is even possible with Shim6. BGP works well for applications like this because each network the traffic passes through can add its own hints (Communities, prepending, etc) to the route, that lots of us use.
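One way to picture the community-driven preference described above is as a sort key over candidate routes. This is a simplified sketch, not a router configuration: the community value `65000:100` ("this NSP peers directly with the origin") and the ASNs are hypothetical, since each provider publishes its own community scheme, and a real route-map would typically translate such tags into local-preference.

```python
from dataclasses import dataclass, field

# Hypothetical community meaning "this NSP peers directly with the origin".
DIRECT_PEER = "65000:100"


@dataclass
class Route:
    via: str                                   # NSP we would send through
    as_path: list                              # ASNs toward the origin
    communities: set = field(default_factory=set)


def preference(route: Route) -> tuple:
    # Higher tuples win: direct peering first, then a shorter AS path.
    return (DIRECT_PEER in route.communities, -len(route.as_path))


def best_route(routes: list) -> Route:
    return max(routes, key=preference)


nsp_a = Route("NSP A", [64500, 64510], {DIRECT_PEER})
nsp_b = Route("NSP B", [64501, 64502, 64510])
print(best_route([nsp_a, nsp_b]).via)  # NSP A
```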

We'd still be relying on PA space. No matter how great dhcp6 is, there will be significant renumbering pain when providers are changed. Static ACLs, firewall rules, etc. If you're including customer machines in the renumbering, many simply won't do it.

Putting the logic behind traffic engineering and routing decisions into thousands of boxes seems a step backwards from putting the decision on our border/edges. Many more places where things can break. If we want to do things in a non-standard way, every box has to support it. If there are refinements to Shim6 later, we're forced to choose between not using them and forcing our customers to upgrade their OS.

How do we deal with "backup connections"? I.e. connections that are only used if all others are down. Right now we advertise only a supernet out to our "backup transit" provider, and the more specifics to our main providers. (Yes, I realize this isn't perfect, but it works fine for us.)
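The supernet trick works because of longest-prefix matching: as long as the more-specifics are reachable via the main providers, they win; if they are withdrawn, the supernet heard via the backup provider still covers the space. A minimal sketch, assuming a hypothetical `10.20.0.0/21` split into two /22s:

```python
import ipaddress

# What a remote network might hear about us (hypothetical prefixes).
ROUTES = {
    ipaddress.ip_network("10.20.0.0/21"): "backup transit",    # supernet
    ipaddress.ip_network("10.20.0.0/22"): "main provider 1",   # more-specific
    ipaddress.ip_network("10.20.4.0/22"): "main provider 2",   # more-specific
}


def lookup(routes: dict, dst: str):
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup(ROUTES, "10.20.5.9"))          # main provider 2

# If both main providers go away, only the supernet is left.
backup_only = {n: via for n, via in ROUTES.items() if via == "backup transit"}
print(lookup(backup_only, "10.20.5.9"))     # backup transit
```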

Please don't get me wrong, I think Shim6 is great for a lot of people. Being able to let ANYONE multihome with no impact on the world is great. BUT, there needs to be a fallback to the BGP/IPv4-ish way for people who need the "power user" set of tools, or there is going to be a huge pushback from a lot of groups when asked to switch to ipv6. This fallback has to be available to anyone who can justify the need, not just "anyone bigger than X size".

-- Kevin

Joe_Abley3 · March 1, 2006, 3:07pm

o a small to medium multi-homed tier-n isp

A small-to-medium, multi-homed, tier-n ISP can get PI space from their RIR, and don't need to worry about shim6 at all. Ditto larger ISPs, up to and including the largest.

If you include "Web hosting company" in your definition of ISP, that's not true.

Right. I wasn't; I listed them separately.

It's important to note that even if you are a hosting company who *does* qualify for PI v6 space, you still need shim6-capable servers, if you want to make them optimally available to multi-homed, shim6-capable hosts. The difference PI makes is in the distribution of addresses to servers (the servers only need a single set).

You don't get PI space, and Shim6 is looking like your only alternative for multihoming.

Right. For a hosting company with multiple PA netblocks, shim6 is the option on the table.

Many content providers set up multiple non-interconnected POPs in different geographical locations. The only way this can be accomplished is by making separate announcements in each POP for each space. This means either being able to deaggregate, or to get a block for each POP. I don't know of *ANY* that are deploying 5000+ servers per POP.

Right. With shim6, getting a block per POP is trivial, since they are all PA assignments from transit providers.

I'm just one guy, one ASN, and one content/hosting network. But I can tell you that to switch to using shim6 instead of BGP speaking would be a complete overhaul of how we do things.

You are not alone in fearing change.

Putting routing decisions in the control of servers we don't operate scares me. I wouldn't rely on 90% of our customers to get this right unless it was completely idiot proof. Even if it was, I don't see how we can trust that users aren't messing with things to "game the system" somehow.

This is the kind of feedback that the shim6 architects need. There is talk at present of whether the protocol needs to be able to accommodate a site-policy middlebox function to enforce site policy in the event that host behaviour needs to be controlled. The scope of that policy mediation function depends strongly on people like you saying "at a high level, this is the kind of decision I am not happy with the hosts making".

We deal with long lived TCP sessions (hours/days). I don't see how routing updates can happen that won't result in a disconnect/reconnect, which isn't acceptable.

One of the primary objectives of shim6 is to provide session survivability over re-homing events. Since routing protocols are not used to manage re-homing, the speed at which a session can recover from a topological event depends on the operation of the shim6 protocol between client and server.

It seems reasonable to say that in some cases shim6 re-homing transitions will be faster than the equivalent routing transition in v4; in other cases it will be slower. Depends on the network, and how enthusiastically you flap, perhaps.

The experience of people who provide services involving long-held TCP sessions is exactly the kind of thing that the shim6 architects need to hear about.

We have peering arrangements with about 120 ASNs. How do we mix BGP IPv6 peering and Shim6 for transit?

You advertise all your PA netblocks to all your peers.

So far it looks like Shim6 is going to rely on DNS. The DNS caching issue is a real problem. We need changes to happen faster than DNS caching will allow.

Well, not quite.

If you change a transit provider, then you need to remove a set of AAAA records from the servers you operate, and substitute a new set. The time taken for this change to propagate in the DNS is non-zero, assuming you use reasonable TTLs. This is your point above, I think.

With shim6-capable clients and servers, the dark period during which the changes propagate is handled by an address selection/retry algorithm in the client (for new sessions) and by the shim6 protocol doing failure detection and selecting a new locator (for established sessions).
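The client-side half of this (retrying across addresses for new sessions) can be sketched as a loop over the candidate addresses returned by DNS. This is a hypothetical sketch: `try_connect` stands in for a real connection attempt with a short timeout, and the addresses come from the 2001:db8::/32 documentation prefix. Python's own `socket.create_connection` already walks the `getaddrinfo` results in a similar way.

```python
from typing import Callable, Iterable, Optional


def connect_with_fallback(candidates: Iterable[str],
                          try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate address that accepts a connection.

    During the DNS "dark period" the candidate list may still contain
    addresses from a withdrawn provider's prefix; they simply fail and the
    loop moves on to the next one.
    """
    for addr in candidates:
        if try_connect(addr):
            return addr
    return None


stale = "2001:db8:aaaa::80"   # old provider's prefix, no longer reachable
fresh = "2001:db8:bbbb::80"   # new provider's prefix
reachable = {fresh}
print(connect_with_fallback([stale, fresh], lambda a: a in reachable))
```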

Once the DNS change has propagated, the address selection and shim6 band-aids are no longer required, and clients have an accurate set of information.

Renumbering for hosting providers can be a monstrous pain in the neck, especially for hosting providers who rely on third parties (or, horrors, their customers) to maintain the zone files within which services are named.

Some hosting providers of my acquaintance insist on customer zones being redelegated to the hosting providers' nameservers, so that any renumbering that needs to happen can be coordinated by the hosting provider directly. Hosting providers who don't do this, and who use PA addresses with shim6 to multi-home, are definitely going to face some challenges.

Our network is complicated. We have a /21 that's split into 4 /23s. One for each non-interconnected POP. We only advertise the /23 for each POP out to transit, but we give peers access to our entire network wherever they peer with us and we pay to haul/tunnel it around. How do we even do this without PI space, let alone through shim6?

You avoid it completely, and use PA space in every POP. You can still announce PA space from other POPs to peers, if you want to retain your tunnels.

For quite the foreseeable future, we'd be running IPv4 and IPv6 at the same time, over the same transit connections. We'd have to TE our IPv6 bits completely differently than our IPv4 bits, even though we'd be billed for the aggregate usage of both. Automated tweaking of total usage per transit port is hard enough in BGP. Having to tweak both BGP and some external shim6 method of TE when the goal is a common aggregate number is going to be a very difficult issue.

Yep. Difficult and expensive.

Some of our applications are extremely sensitive to jitter/latency. We've spent ages tweaking route-maps manually (and through automated continual tweaking) to make sure we avoid any congested links. [...]

The site-policy middleware that I alluded to earlier seems like the analogous place to specify this policy. Such a facility might actually give you more control than you have now -- tweaking BGP attributes to accomplish this kind of thing is often like a game of whack-a-mole; if you were able to control the route taken by traffic in both directions by influencing the locator selection for each and every session, you'd have far greater, and more fine-grained, control over your external traffic than BGP/swamp-abuse gives you currently.

Your specific requirements in this regard (the high-level objectives that you currently meet using BGP) would no doubt be gratefully received on the shim6 list.

We'd still be relying on PA space. No matter how great dhcp6 is, there will be significant renumbering pain when providers are changed. Static ACLs, firewall rules, etc. If you're including customer machines in the renumbering, many simply won't do it.

Agreed, renumbering is a pain. Dhcp6 sounds like a scary thing to use with servers. Customers suck. Change in operational practices will be required.

Lest I sound too much like a foam-at-the-mouth shim6 advocate, I think it would be perfectly fine if, in the final analysis, the conclusion was that shim6 and PA/renumbering was not an option for hosting providers. A reasoned technical argument which came to that conclusion would provide a solid basis for the RIRs to modify their allocation policies such that hosting providers could use PI space instead. As perhaps the recent attempt to change the v6 PI policy indicates, the chances of making changes without such a reasoned argument are slim.

However, I think it's possible that shim6, incorporating some facility for a site to manage the locator selection of the hosts, could actually make some things easier for hosting providers. There might even be reasons to like it.

Joe

jpayne · March 1, 2006, 3:33pm

Only if *everybody* has a shim6 capable stack...

Lucy_E_Lynch · March 1, 2006, 3:35pm

How about some actual technical complaints about shim6?

good question. to give such discussion a base, could you
point us to the documents which describe how to deploy it in
the two most common situation operators see
o a large multi-homed enterprise customer
o a small to medium multi-homed tier-n isp

never under-estimate the range and productivity of Pekka!

Joe_Abley3 · March 1, 2006, 3:45pm

Not quite -- the practical usefulness of the multi-homing increases with the deployment of shim6-capable stacks. You could imagine a threshold of server and host upgrades which would provide useful multi-homing a good proportion of the time without universal deployment.

If Linux and the currently-supported variants of Windows were to be updated to support shim6, and we waited through three or four widely-publicised security vulnerabilities which required OS/kernel upgrades, perhaps that would be sufficient deployment for the benefits of shim6 to be felt, most of the time. My hands are waving again, of course.

I feel fairly certain I have exceeded some kind of unenforced posting threshold to this list in the last twelve hours. I will try hard to be quiet for a while, now.

Joe

bill3 · March 1, 2006, 4:08pm

and nobody does! extrapolations and visions of
  a brave new world are just that. kind of like
  the Boeing/Airbus mockups that have lounges, gyms & showers
  and restaurants onboard their 747 and A380 aircraft.
  and attractive flight attendants talking about shoes...
  yeah it -might- happen, but...

-- bill (particularly grumpy & cynical tonight)

David_Barak · March 1, 2006, 4:22pm

Okay, if I'm an enterprise with 6 ISPs but don't
qualify for PI space, I'll need to get PA space from
all of them, for Shim6 to work, right? Then each
server on my network is going to need to maintain
state for 6 different contexts for each of the various
external customers who attempt to reach them.
Assuming that I have busy servers, that's a whole lot
of state.
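To put numbers on the state question, here is a deliberately simplified model of per-peer shim6 state. The `ShimContext` fields are a hypothetical simplification (a real context carries more, such as identifiers and timers); the point is that a server holds one context per active remote peer, each referencing the server's own locators plus the peer's.

```python
from dataclasses import dataclass


@dataclass
class ShimContext:
    peer: str               # the remote host this context is for
    local_locators: list    # our addresses: one per PA prefix
    peer_locators: list     # addresses the remote host can be reached at


def locator_entries(contexts: list, share_local_set: bool = True) -> int:
    """Count locator entries a host must keep for its active contexts."""
    peer = sum(len(c.peer_locators) for c in contexts)
    if share_local_set:
        # The host's own locator set is identical for every context,
        # so an implementation could store it once.
        local = len(contexts[0].local_locators) if contexts else 0
    else:
        local = sum(len(c.local_locators) for c in contexts)
    return peer + local


ours = [f"2001:db8:{i}::10" for i in range(1, 7)]      # 6 PA prefixes
theirs = ["2001:db8:f1::1", "2001:db8:f2::1"]          # 2 per remote peer
contexts = [ShimContext(f"client-{n}", ours, theirs) for n in range(1000)]
print(locator_entries(contexts), locator_entries(contexts, share_local_set=False))
```

Whether the local locator set is stored per context or once per host is an implementation choice; the sketch reports both counts.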

It's cheaper and easier to upgrade or modify N routers
than the M servers behind them, given that M is
certainly greater than N, and in many cases in
multiple orders of magnitude greater.

Also, the current drafts don't support middleboxes,
which a huge number of enterprises use - in fact the
drafts specifically preclude their existence, which
renders this a complete non-starter for most of my
clients.

My single biggest issue here however is the
complexity: given that today's architecture can
deliver relatively simple and robust multihoming to
enterprises, and rerouting DOES work today for
persistent sessions (albeit imperfectly), what is the
benefit to be gained from doing something this hard?

As far as I can tell, the whole reason for these
discussions is the insistence on the strict
PA-addressing model, with no ability to advertise PA
space to other providers. I think that we could spend
our time better in coming up with a different approach
to addressing hierarchy instead. Besides, /48s are
cheap now, but if every enterprise gets multiple /48s
from multiple providers, they might become dear more
quickly than is desired.

-David

David Barak
Need Geek Rock? Try The Franchise:
http://www.listentothefranchise.com

Joe_Abley3 · March 1, 2006, 4:46pm

Also, the current drafts don't support middleboxes,
which a huge number of enterprises use - in fact the
drafts specifically preclude their existence, which
renders this a complete non-starter for most of my
clients.

I have not yet reviewed the latest shim6 protocol draft, but I've seen discussion around it in which people have talked about middlebox support (in the context of "do we want to leave the door open to middleboxes, or should we insist that this is all done on the host stack?").

My single biggest issue here however is the
complexity: given that today's architecture can
deliver relatively simple and robust multihoming to
enterprises, and rerouting DOES work today for
persistent sessions (albeit imperfectly), what is the
benefit to be gained from doing something this hard?

The current system is complex too, and it will get more complex as the amount of state in the routing system increases. Contrary to what some might think, reading this thread, inter-domain traffic engineering is only achievable using BGP in fairly coarse terms, and the success or failure of the TE tweaks in terms of the desired outcome is often non-deterministic, depending as it does on the routing policies of others.

The current system has the advantage, of course, that its strengths and weaknesses are somewhat well-known.

As far as I can tell, the whole reason for these
discussions is the insistence on the strict
PA-addressing model, with no ability to advertise PA
space to other providers.

The whole reason for the strict PA-addressing model is concern over whether open-slather on PI address space will result in an Internet that will scale.

Joe

(Failing miserably to keep quiet. Must try harder.)

David_Barak · March 1, 2006, 4:55pm

> I'm just one guy, one ASN, and one content/hosting network. But I can tell you that to switch to using shim6 instead of BGP speaking would be a complete overhaul of how we do things.

You are not alone in fearing change.

It isn't fearing change to ask the question "it's not
broken today, why should I fix it?"

This is the kind of feedback that the shim6 architects need. There is talk at present of whether the protocol needs to be able to accommodate a site-policy middlebox function to enforce site policy in the event that host behaviour needs to be controlled. The scope of that policy mediation function depends strongly on people like you saying "at a high level, this is the kind of decision I am not happy with the hosts making".

Resounding YES - I specifically DON'T want end-hosts
to be able to make these decisions, but need to be
able to multihome.

> We deal with long lived TCP sessions (hours/days). I don't see how routing updates can happen that won't result in a disconnect/reconnect, which isn't acceptable.

One of the primary objectives of shim6 is to provide session survivability over re-homing events. Since routing protocols are not used to manage re-homing, the speed at which a session can recover from a topological event depends on the operation of the shim6 protocol between client and server.

It seems reasonable to say that in some cases shim6 re-homing transitions will be faster than the equivalent routing transition in v4; in other cases it will be slower. Depends on the network, and how enthusiastically you flap, perhaps.

A - X - Y - B
\ | \ | /
W - Z

A and B are hosts, W-Z are ISPs

On what basis would you say that in the event of a
network outage in Y, communication between A and B
will be faster than the routing transition?

The experience of people who provide services involving long-held TCP sessions is exactly the kind of thing that the shim6 architects need to hear about.

> We have peering arrangements with about 120 ASNs. How do we mix BGP IPv6 peering and Shim6 for transit?

You advertise all your PA netblocks to all your peers.

And maintain 120 different context tables on each
host? ouch. I'm guessing that server vendors are
going to be quite happy with this.

You avoid it completely, and use PA space in every POP. You can still announce PA space from other POPs to peers, if you want to retain your tunnels.

Wait a second - doesn't that deaggregation bring back
the "lots of small routes" business which the whole v6
hierarchical addressing model was supposed to fix? If
we're in the world of deaggregates anyway, why not
just ditch the addressing model instead of accepting
its limitations in this way?

-David


David_Barak · March 1, 2006, 5:05pm

> As far as I can tell, the whole reason for these discussions is the insistence on the strict PA-addressing model, with no ability to advertise PA space to other providers.

The whole reason for the strict PA-addressing model is concern over whether open-slather on PI address space will result in an Internet that will scale.

Is it easier to scale N routers, or scale 10000*N
hosts? If we simply moved to an "everyone with an ASN
gets a /32" model, we'd have about 30,000 /32s. It
would be a really long time before we had as many
routes in the table as we do today, let alone the
umpteen-bazillion routes which scare everyone so
badly.

Joe

(Failing miserably to keep quiet. Must try harder.)

(don't worry - you have content in these posts.
content is always welcome...)


Jeroen_Massar1 · March 1, 2006, 5:18pm

[..]

Is it easier to scale N routers, or scale 10000*N
hosts? If we simply moved to an "everyone with an ASN
gets a /32" model, we'd have about 30,000 /32s. It
would be a really long time before we had as many
routes in the table as we do today, let alone the
umpteen-bazillion routes which scare everyone so
badly.

Today indeed, but you are missing the point that IPv6 should last for
the next couple of decades. In IPv4 the early starters also got a nice /8
as a bonus, and the result: new small entities complaining that the first
ones got the cool stuff and they can't have any.

You might have noticed the 32-bit ASN talk. This is there for a reason:
ASNs will go to 32-bit mode. Can you say 4 million routes?
Simple isn't always good. The KISS principle doesn't always work...

The current 30k in-use ASNs (afaik there are even fewer) will and can
explode when that means you can get easy address space.

Btw, this is policy talk, you might want to bring that to ARIN PPML or
the various other lists. If you want to propose a PI policy, then please
make a decent proposal and send it to the relevant RIR group.

That endsites require "PI" is inevitable, but the way those routes end
up in the routing tables and the amount of address space each endsite is
getting should be based on need, not on the fact that you got an ASN.
(Which would mean I would qualify for 2x /32's... which is very silly as
the couple of /48's I use is way more than enough.)

Please don't mix up addressing and routing. "PI addressing" as you
mention is addressing. SHIM6 will become a routing trick.

Greets,
Jeroen

(who simply would like a policy where endsites that want it could
request a /48 or /40 depending on requirements from a dedicated block
which one day might be used for identity purposes and not pop up in the
bgp tables or whatever we have then anymore....)

Jared_Mauch · March 1, 2006, 5:29pm

I think you're missing that some people do odd
things with their IPs as well, like have one ASN and 35
different sites where they connect to their upstream Tier69.net
all with the same ASN. This means that their 35 offices/sites
will each need a /32, not one per the entire asn in the table.

And they may use different carriers in different
cities. Obviously this doesn't fit the definition that some have
of "autonomous system", as these are 35 different discrete networks
that share a globally unique identifier of sorts.

- jared

Joe_Abley3 · March 1, 2006, 5:40pm

I'm just one guy, one ASN, and one content/hosting network. But I can tell you that to switch to using shim6 instead of BGP speaking would be a complete overhaul of how we do things.

You are not alone in fearing change.

It isn't fearing change to ask the question "it's not
broken today, why should I fix it?"

What's broken today is that there's no mechanism available for people who don't qualify for v6 PI space to multi-home. That's what shim6 is trying to fix.

It seems reasonable to say that in some cases shim6 re-homing transitions will be faster than the equivalent routing transition in v4; in other cases it will be slower. Depends on the network, and how enthusiastically you flap, perhaps.

A - X - Y - B
\ | \ | /
W - Z

A and B are hosts, W-Z are ISPs

On what basis would you say that in the event of a
network outage in Y, communication between A and B
will be faster than the routing transition?

That's an example of a simple topology where the routing system might be expected to reconverge rapidly.

However, it's not hard to find examples in today's v4 Internet where reconvergence following a re-homing event can take 30 to 60 seconds to occur. In the case where such an event includes some interface flapping, it's not that uncommon to see paths suppressed due to dampening for 20-30 minutes.

I would expect (in some future, hypothetical implementation of shim6) the default failure detection timers to start rotating through the locator set far sooner than 30-60 seconds.
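A rough illustration of timer-driven locator rotation for an established session. Everything here is hypothetical: the 10-second timeout is an arbitrary example value rather than a number from any shim6 draft, and a real failure-detection protocol would probe candidate pairs before switching instead of blindly stepping to the next one.

```python
from itertools import product

# Arbitrary example value, not taken from any shim6 specification.
TIMEOUT_S = 10.0


def locator_pairs(local: list, peer: list) -> list:
    """All (local, peer) locator pairs, in preference order."""
    return list(product(local, peer))


def next_pair(pairs: list, current: tuple, last_heard: float,
              now: float, timeout: float = TIMEOUT_S) -> tuple:
    """Keep the current pair while the peer is heard; otherwise rotate."""
    if now - last_heard < timeout:
        return current
    i = pairs.index(current)
    return pairs[(i + 1) % len(pairs)]


pairs = locator_pairs(["A1", "A2"], ["B1", "B2"])
print(next_pair(pairs, pairs[0], last_heard=0.0, now=5.0))   # ('A1', 'B1')
print(next_pair(pairs, pairs[0], last_heard=0.0, now=12.0))  # ('A1', 'B2')
```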

We have peering arrangements with about 120 ASNs. How do we mix BGP IPv6 peering and Shim6 for transit?

You advertise all your PA netblocks to all your peers.

And maintain 120 different context tables on each
host?

No; maintain one address per PA netblock on each host.
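Concretely, "one address per PA netblock" could look like the sketch below: the same subnet ID and interface identifier reused under each provider's /48. The three prefixes are hypothetical documentation-space stand-ins, and reusing the same low-order bits across providers is an assumption made for tidiness, not a shim6 requirement.

```python
import ipaddress

# Hypothetical PA /48s, one from each transit provider.
PA_PREFIXES = ["2001:db8:1::/48", "2001:db8:2::/48", "2001:db8:3::/48"]


def host_addresses(pa_prefixes: list, subnet_id: int, interface_id: int) -> list:
    """One address per PA prefix, with identical subnet and interface bits."""
    if not 0 <= subnet_id < 2 ** 16 or not 0 <= interface_id < 2 ** 64:
        raise ValueError("subnet_id must fit in 16 bits, interface_id in 64")
    addrs = []
    for prefix in pa_prefixes:
        net = ipaddress.IPv6Network(prefix)
        if net.prefixlen != 48:
            raise ValueError(f"expected a /48, got {net}")
        # /48 prefix + 16-bit subnet ID = /64; the low 64 bits are the host.
        addrs.append(net.network_address + (subnet_id << 64) + interface_id)
    return addrs


for addr in host_addresses(PA_PREFIXES, subnet_id=0x10, interface_id=0x25):
    print(addr)
```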

You avoid it completely, and use PA space in every POP. You can still announce PA space from other POPs to peers, if you want to retain your tunnels.

Wait a second - doesn't that deaggregation bring back
the "lots of small routes" business which the whole v6
hierarchical addressing model was supposed to fix? If
we're in the world of deaggregates anyway, why not
just ditch the addressing model instead of accepting
its limitations in this way?

There's a vast difference in impact on the state held in the core between deaggregating towards direct peers, and deaggregating towards transit providers and having the deaggregated swamp propagated globally.

Joe

Iljitsch_van_Beijnum · March 1, 2006, 6:08pm

Is it easier to scale N routers, or scale 10000*N hosts?

Is it easier for the government to make a 5 year plan or for everyone to spend time and energy finding the best deal for everything?

Every router has to search through its FIB tables for every packet it forwards. That's something like 10 FIB lookups for every packet flowing between two hosts. The hosts only have to search through their TCBs for every packet. The number of TCBs in nearly all hosts is smaller than the average FIB size (even if you consider that many routers don't have a full table). 2 x relatively small is a lot less than 10 x relatively large. Or, in other words: on the host you only pay if you actually communicate. In routers, you pay more as there is more routing information, whether the extra information is used or not.

If we simply moved to an "everyone with an ASN
gets a /32" model, we'd have about 30,000 /32s. It
would be a really long time before we had as many
routes in the table as we do today, let alone the
umpteen-bazillion routes which scare everyone so
badly.

1. We've already walked the edge of the cliff several times (CIDR had to be implemented in a big hurry, later flap dampening and prefix length filtering were needed)
2. We'll have to live with IPv6 a long time
3. Route processing and FIB lookups scale worse than linear
4. If the global routing table meltdown happens, it will be extremely costly in a short time
5. Even if the meltdown doesn't happen, a smaller routing table makes everything cheaper and gives us more implementation options (a 5000-entry TCAM is nice, 500000 entries not so much as it basically uses 100 times as much power)
6. Moore can't go on forever, there are physical limitations

But the most important thing we should remember is that currently, routing table growth is artificially limited by relatively strict requirements for getting a /24 or larger. With IPv6 this goes away, and we don't know how many people will want to multihome then.

Iljitsch_van_Beijnum · March 1, 2006, 6:23pm

I agree.

The address space is one dimensional. This means you can encode a single thing in it in a hierarchical manner "for free". With PA, that's the ISP: for any address, it's very easy to determine which ISP it belongs to and thus route the packet to that ISP. (We're so used to this that we don't even notice anymore.)

However, this doesn't work for multihoming because rather than a linear space starting with ISP A and ending with ISP Z we now have a matrix: A-A, A-B, A-C ... Z-X, Z-Y, Z-Z. (Worse with more than two ISPs.) You can't do a longest match first lookup on a multidimensional space, so in routing, every end-user becomes his own ISP and occupies a slot at the top of the hierarchy.

The thing is, it's not even hard to aggregate differently: just have router A hold the first quarter of the global routing table (0/2 with v4), router B the second quarter (64/2), router C the third quarter (128/2) and router D the fourth quarter (192/2), for example.
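The quarter-per-router split described above can be sketched in a few lines. This is only an illustration of the partitioning idea, assuming IPv4, the hypothetical router names A to D, and a table whose prefixes are all /2 or longer (so each lies wholly inside one quarter):

```python
import ipaddress

ROUTERS = "ABCD"  # hypothetical: A holds 0/2, B 64/2, C 128/2, D 192/2


def router_for(addr: str) -> str:
    """The top two bits of an IPv4 address select the quarter, hence the router."""
    return ROUTERS[int(ipaddress.IPv4Address(addr)) >> 30]


def split_table(prefixes: list[str]) -> dict[str, list[ipaddress.IPv4Network]]:
    """Give each router only the prefixes that fall inside its quarter.

    Assumes every prefix is /2 or longer, so it cannot span two quarters.
    """
    tables: dict[str, list[ipaddress.IPv4Network]] = {r: [] for r in ROUTERS}
    for p in prefixes:
        net = ipaddress.IPv4Network(p)
        if net.prefixlen < 2:
            raise ValueError(f"{net} spans more than one quarter")
        tables[router_for(str(net.network_address))].append(net)
    return tables


if __name__ == "__main__":
    table = ["10.0.0.0/8", "100.0.0.0/8", "150.0.0.0/8", "203.0.113.0/24"]
    for router, nets in split_table(table).items():
        print(router, [str(n) for n in nets])
```

Each router ends up with roughly a quarter of the entries, at the cost of every packet having to reach the router that owns its quarter.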

There is one snag, though: either you need four routers in each location, or you have to bring the traffic to the place where the router handling that part of the table is located.

Now I happen to think that we can massage this such that it's not necessary to add extra routers to speak of or backhaul traffic through places where it doesn't belong so basically all of this is free (no new protocols!), but unfortunately, I'm having a hard time convincing others that this is a workable approach.

Kevin_Day · March 1, 2006, 6:32pm

If you include "Web hosting company" in your definition of ISP, that's not true.

Right. I wasn't; I listed them separately.

It's important to note that even if you are a hosting company who *does* qualify for PI v6 space, you still need shim6-capable servers, if you want to make them optimally available to multi-homed, shim6-capable hosts. The difference PI makes is in the distribution of addresses to servers (the servers only need a single set).

Which isn't a point to be glossed over. PA vs. PI is a big deal to people for many reasons. (Portability between providers being the biggest.)

I'm just one guy, one ASN, and one content/hosting network. But I can tell you that switching to shim6 instead of speaking BGP would be a complete overhaul of how we do things.

You are not alone in fearing change.

It's not so much fearing change, as the bar of effort/difficulty in transitioning from 4 to 6 being really high.

If the transition to IPv6 is made as painless as possible NOW, people will make it. If you can tell every ISP, NSP, hosting company and end site that they can continue doing what they're doing now in 4, but with vastly more address space, you'd have a lot of converts. Everything that you make people do differently (even if it is for a greater good, and even if it benefits them directly) is one more reason people will give NOT to do it until they have to.

Telling a hosting company "Here, you can get a /32 or /44 of your own which is virtually unlimited space for your needs, continue using BGP and TE as you do now, just deploy IPv6 on top of IPv4 and you're live" is an easy sell.

Telling a hosting company "You have to lose the independence of PI space. You need to completely start over with your traffic engineering and do it in a way totally orthogonal to how you have to continue doing it in IPv4 space (adding workload instead of replacing what you're doing in IPv4). You now need to trust your customers/end users to do the right thing with routing. Routing now involves routers, servers and DNS; instead of a handful of devices making routing decisions, ALL of your devices now need to make routing decisions. Even if you do get PI space somehow, you can't deaggregate it even if you truly run multiple isolated networks. (Don't say 'Get additional allocations then!' because that still results in the same number of routes added to the global table, while wasting even more space)" is a really hard sell. The only carrot you can offer someone is "You can have lots more space", which I personally don't think is worth even half of those negatives.

If significant percentages of networks are going to heavily push back on or delay deployment of IPv6, IPv4 will be exhausted before a critical mass has switched to IPv6, making the whole "how do we protect the long-term future of IPv6" rationale for these policies a little less important.

It reminds me of a story from an engineer who worked for AT&T/Bell when touch tone dialing was first being tested in a few markets. AT&T wanted to offer touch tone dialing as a convenience for users, but they also had a desire to standardize the dialing procedure everywhere. In some markets you could make local calls by just dialing the last 4 or 5 digits of the number; others required the full 7. Some required 1+XXX-XXXX to make calls outside the city, some only required the 1 if it was a toll call. AT&T really wanted to change its network so that everyone in every city used the same procedure to make outbound calls. They decided to make the new dialing plan mandatory with touch tone service. The carrot was "Look how much easier it is to dial compared to a rotary phone!", and they got the benefits of forcing a standardized system on everyone everywhere at the same time. The customer gets something that makes their life easier, and the operator of the network gets to make their job easier through standardization and the reduced overhead of supporting hundreds of incompatible dialing plans that people in each exchange were used to.

They tested it in a few cities, with a few customers (business and residential). A large number of people perceived touch tone dialing negatively, so much so that they asked for their rotary phones back. It had nothing to do with the push-button interface; it was asking people to take a perceived negative along with a positive they weren't sure they needed in the first place. Asking people to make too many changes at once outweighed the convenience of faster dialing. Users found it so confusing to have the new system (touch tone dialing, a.k.a. IPv6) work so differently from what they were used to (rotary dialing, a.k.a. IPv4) that they couldn't see past the change to the advantage. In the end, they gave the customers touch tone dialing, then gradually deployed the new dialing plan, using permissive grace periods where both dialing plans worked for as long as possible.

I'm worried the same will happen with IPv4/IPv6. The temptation of virtually unlimited addressing is really nice. But I think the negatives of the allocation policy and the direction of how multihoming is going will scare off the willing participants, and we'll be stuck with only getting people to switch when they're forced to due to IPv4 exhaustion.

My advice: Get people to switch now.

If you have IPv4 PI space, you can get IPv6 PI space. If you have a /21 now, you get a /48. If you have a /20 now, you get a /47. If you have a /19 now you get a /46. Etc.
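The mapping proposed above is a fixed offset: a /21 maps to a /48, and each bit shorter on the IPv4 side is one bit shorter on the IPv6 side, so the IPv6 length is the IPv4 length plus 27. A minimal sketch, with the added assumption (not spelled out in the proposal) that anything longer than a /21 still gets the minimum /48:

```python
def v6_pi_length(v4_prefixlen: int) -> int:
    """IPv6 PI prefix length for an IPv4 PI holder, per the proposal above.

    /21 -> /48, /20 -> /47, /19 -> /46, ...: v6 length = v4 length + 27.
    Assumption: holders of anything longer than a /21 get a /48.
    """
    if not 0 <= v4_prefixlen <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return min(v4_prefixlen + 27, 48)


if __name__ == "__main__":
    for v4 in range(16, 25):
        print(f"IPv4 /{v4} -> IPv6 /{v6_pi_length(v4)}")
```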

If you have anything bigger than a /48, you can rely on people accepting deaggregated prefixes of /48 or shorter.
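To make the deaggregation point concrete, here is a sketch that splits a hypothetical /46 (taken from the 2001:db8::/32 documentation prefix) into the /48s a network might announce, one per POP, plus a check mirroring a "/48 or shorter" prefix-length filter. The function names are illustrative only.

```python
import ipaddress


def deaggregate(block: str, max_len: int = 48) -> list[str]:
    """Split a PI block into more-specifics of length max_len (e.g. one per POP)."""
    net = ipaddress.IPv6Network(block)
    if net.prefixlen > max_len:
        raise ValueError(f"{net} is already longer than /{max_len}")
    return [str(s) for s in net.subnets(new_prefix=max_len)]


def accepted_by_filter(prefix: str, max_len: int = 48) -> bool:
    """Would a filter that accepts /48 or shorter let this announcement through?"""
    return ipaddress.IPv6Network(prefix).prefixlen <= max_len


if __name__ == "__main__":
    for p in deaggregate("2001:db8::/46"):
        print(p, accepted_by_filter(p))
```

A /46 yields four /48s, each of which still passes a /48 filter; going any further (say, to /64s per rack) would not.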

Push shim6 for people who don't need fancy traffic engineering, as a tool for small/medium business. Heck, even residential if you want to go that far.

Get a critical mass of people using IPv6 as soon as possible. Once it's there, once people are comfortable with it, and once IPv6-acting-as-much-like-IPv4-as-possible has proven itself, let people WILLINGLY deploy shim6 if it truly would be advantageous to them. I'd have no problem with raising the initial/yearly fee per ASN if it made everyone comfortable that they were only being used where it was truly needed.

On top of all of that, I'm still not convinced that IPv6-acting-as-much-like-IPv4-as-possible isn't going to have significantly fewer routes than IPv4, even if everyone moved today. Aren't a large number of the routes being advertised due to people having to go back and ask for more/bigger allocations again and again? If everyone out there right now had to announce only 1 route per POP, and 1 route per multihomed customer with PI space, wouldn't your outbound routes shrink pretty significantly? That would be a huge step in the right direction, and would buy loads of time to allow for a much more gradual transition.

Putting routing decisions in the control of servers we don't operate scares me. I wouldn't rely on 90% of our customers to get this right unless it was completely idiot proof. Even if it was, I don't see how we can trust that users aren't messing with things to "game the system" somehow.

This is the kind of feedback that the shim6 architects need. There is talk at present of whether the protocol needs to be able to accommodate a site-policy middlebox function to enforce site policy in the event that host behaviour needs to be controlled. The scope of that policy mediation function depends strongly on people like you saying "at a high level, this is the kind of decision I am not happy with the hosts making".

While I'm happy to give that feedback anywhere it's needed/welcome, I'm kind of surprised alarm bells didn't go off already about this.

We deal with long lived TCP sessions (hours/days). I don't see how routing updates can happen that won't result in a disconnect/reconnect, which isn't acceptable.

One of the primary objectives of shim6 is to provide session survivability over re-homing events. Since routing protocols are not used to manage re-homing, the speed at which a session can recover from a topological event depends on the operation of the shim6 protocol between client and server.
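A toy model of the session-survivability idea described above, not the shim6 wire protocol: the transport is keyed on stable upper-layer identifiers (ULIDs, which in shim6 are themselves the initial IPv6 addresses), while a shim layer underneath maps them to one of several locator pairs and switches pairs when a path fails. The class and method names are hypothetical, and nothing here models the context-establishment or reachability-probing exchanges the real protocol needs.

```python
from dataclasses import dataclass, field


@dataclass
class ShimContext:
    """Hypothetical shim6-style context: stable ULIDs over changeable locators."""
    local_ulid: str
    peer_ulid: str
    local_locators: list[str]
    peer_locators: list[str]
    current: tuple[str, str] = field(init=False)

    def __post_init__(self) -> None:
        # Start on the first locator pair.
        self.current = (self.local_locators[0], self.peer_locators[0])

    def tcp_key(self) -> tuple[str, str]:
        # What the transport sees; it does not change across re-homing.
        return (self.local_ulid, self.peer_ulid)

    def rehome(self, failed: set[str]) -> tuple[str, str]:
        """Switch to the first locator pair that avoids every failed locator."""
        for loc in self.local_locators:
            for peer in self.peer_locators:
                if loc not in failed and peer not in failed:
                    self.current = (loc, peer)
                    return self.current
        raise ConnectionError("no working locator pair left")


if __name__ == "__main__":
    ctx = ShimContext("2001:db8:a::1", "2001:db8:c::2",
                      ["2001:db8:a::1", "2001:db8:b::1"], ["2001:db8:c::2"])
    print(ctx.tcp_key(), ctx.current)
    ctx.rehome({"2001:db8:a::1"})   # e.g. the uplink behind 2001:db8:a::/48 went down
    print(ctx.tcp_key(), ctx.current)
```

The TCP key stays the same while the addresses on the wire change, which is also why a stateful firewall that tracks flows by address would need to understand the shim to follow the session.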

I'd... be really curious to see how that works, without having to add intelligence to the application layer and stateful firewall layer to handle this.

I don't mean to take an "I'll believe it when I see it" stance, but I think a lot of layers (that may exist on other hosts) would have to be changed to support doing that.

We have peering arrangements with about 120 ASNs. How do we mix BGP IPv6 peering and Shim6 for transit?

You advertise all your PA netblocks to all your peers.

Ok, I was a bit too vague there...

How do we ensure that peering connections are always used instead of transit connections?

Currently for outbound, we can localpref peer-learned routes over everything else. How do all of our servers on our end know that routes learned from a BGP session on our own routers are desirable?
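For reference, the router-side mechanism being described: BGP best-path selection compares LOCAL_PREF before AS_PATH length, so setting a higher local-pref on peer-learned routes makes them win even when a transit route has a shorter path. A minimal sketch of just those first two steps, with hypothetical local-pref values and documentation ASNs; real implementations continue with origin, MED, eBGP over iBGP, IGP cost and further tie-breakers.

```python
from dataclasses import dataclass

# Hypothetical values; each operator picks its own numbers.
LOCAL_PREF = {"peer": 200, "transit": 100}


@dataclass
class Route:
    prefix: str
    learned_from: str          # "peer" or "transit"
    as_path: tuple[int, ...]

    @property
    def local_pref(self) -> int:
        return LOCAL_PREF[self.learned_from]


def best_path(routes: list[Route]) -> Route:
    """Highest LOCAL_PREF wins; among equals, the shortest AS_PATH wins."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


if __name__ == "__main__":
    via_peer = Route("2001:db8::/32", "peer", (64496, 64497))
    via_transit = Route("2001:db8::/32", "transit", (64498,))
    print(best_path([via_transit, via_peer]))  # the peer route, despite the longer path
```

This decision lives in a handful of routers; the question being raised is where the equivalent knowledge would live if every server had to choose its own locators.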

For inbound, we can either rely on our peers to do the same, prepend what we send out to transit to make peer routes look even better to our peers, or if we want to force the issue we can send peers more specific routes than we send to transit (operating on the principle of "most specific wins", no matter what else is done).
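The "most specific wins" technique works because forwarding uses longest-prefix match, which is applied before any BGP attribute comparison between routes of different lengths. A sketch of a peer's forwarding decision, assuming the hypothetical case where the aggregate 2001:db8::/32 is announced to transit and two covering /33s are announced only over peering:

```python
import ipaddress


def lookup(fib: list[tuple[str, str]], dst: str) -> str:
    """Longest-prefix match: the most specific covering prefix decides the next hop."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best[1]


# What a peer's FIB might hold for us in this hypothetical setup.
peer_fib = [
    ("2001:db8::/32", "transit"),        # aggregate, heard via their transit
    ("2001:db8::/33", "peering"),        # more-specifics, heard only over peering
    ("2001:db8:8000::/33", "peering"),
]

if __name__ == "__main__":
    print(lookup(peer_fib, "2001:db8:1234::1"))
    print(lookup(peer_fib, "2001:db8:9000::1"))
```

If the peering session drops, the /33s are withdrawn and the same lookup falls back to the /32 via transit without any other change.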

I'm not seeing how BGP routing information can enter into Shim6's decision making in any scalable fashion, and, again, as something that updates near-instantly.

So far it looks like Shim6 is going to rely on DNS. The DNS caching issue is a real problem. We need changes to happen faster than DNS caching will allow.

Well, not quite.

If you change a transit provider, then you need to remove a set of AAAA records from the servers you operate, and substitute a new set. The time taken for this change to propagate in the DNS is non-zero, assuming you use reasonable TTLs. This is your point above, I think.

Reasonable TTLs on our end or not, lots and lots of people don't respect TTLs.

Seriously, try this sometime. Set the TTL for a very busy site to 5 minutes. Wait 2 weeks, to make SURE everyone is seeing the new TTL. Change the IP in the A record. Watch how long it takes for traffic to stop coming in the old IP. If you see 90% of your traffic moved in less than 4 hours, I'd be surprised. If you saw 99% of your traffic moved in less than a day, I'd be even more surprised.

I don't know if this is intentional misbehavior of DNS caches, broken software, broken OSes, or where the issue is. But, I can attest that having to make sudden changes from one provider's PA space to another with no grace period will result in support issues for at least a day of "Why can't I reach your site?".

Renumbering for hosting providers can be a monstrous pain in the neck, especially for hosting providers who rely on third parties (or, horrors, their customers) to maintain the zone files within which services are named.

Yep. We don't control the DNS for quite a number of the sites we're hosting. Making our customer get involved every time we add/remove a transit connection isn't going to be fun. Especially when "big" providers who can qualify for PI space of their own don't have to do this.

Some hosting providers of my acquaintance insist on customer zones being redelegated to the hosting providers' nameservers, so that any renumbering that needs to happen can be coordinated by the hosting provider directly. Hosting providers who don't do this, and who use PA addresses with shim6 to multi-home, are definitely going to face some challenges.

That's possible for some hosting providers, not for others. It's not uncommon for a customer to use one provider for some services (web hosting, for example) and another provider for others (secure/e-commerce, for example). Trying to make two competing providers play together nicely when managing DNS for one domain is a recipe for disaster, even with sub-delegation.

Some of our applications are extremely sensitive to jitter/latency. We've spent ages tweaking route-maps manually (and through automated continual tweaking) to make sure we avoid any congested links. [...]

The site-policy middleware that I alluded to earlier seems like the analogous place to specify this policy. Such a facility might actually give you more control than you have now -- tweaking BGP attributes to accomplish this kind of thing is often like a game of whack-a-mole; if you were able to control the route taken by traffic in both directions by influencing the locator selection for each and every session, you'd have far greater, and more fine-grained, control over your external traffic than BGP/swamp-abuse gives you currently.

While I don't doubt that there are advantages and disadvantages to each way of doing things, I'd much prefer to be given the choice of selecting the one that works best for us, possibly a mix of both.

We'd still be relying on PA space. No matter how great dhcp6 is, there will be significant renumbering pain when providers are changed. Static ACLs, firewall rules, etc. If you're including customer machines in the renumbering, many simply won't do it.

Agreed, renumbering is a pain. Dhcp6 sounds like a scary thing to use with servers. Customers suck. Change in operational practices will be required.

Lest I sound too much like a foam-at-the-mouth shim6 advocate, I think it would be perfectly fine if, in the final analysis, the conclusion was that shim6 and PA/renumbering was not an option for hosting providers. A reasoned technical argument which came to that conclusion would provide a solid basis for the RIRs to modify their allocation policies such that hosting providers could use PI space instead. As perhaps the recent attempt to change the v6 PI policy indicates, the chances of making changes without such a reasoned argument are slim.

And that's kind of my overall point I've been not-so-successfully trying to make.

IPv4 is running out. We need people to switch to IPv6, sooner rather than later. Instead of trying to make the process as painless as possible using tools that are available now, a large swath of what would probably be the FIRST people (content/hosting companies) who would want to move to IPv6 are being told to "hang tight until we figure out this multihoming thing, just don't expect it to work at all the same as how you do it now." The additional bite of "You can't do what you used to do, because if everyone did it the internet would break. Oh, and your bigger competitors can still do things like that, just not you." isn't helping matters.

Stopping the routing table from exploding in the future is obviously a goal everyone needs to have. Everyone needs to think about this NOW rather than when it's too late, I agree.

But, policies shouldn't be written depending on tools that don't exist yet, which is basically what's happening now. If IPv6 were permitted to be used in the same manner that IPv4 is now (PI space is accessible to just about everyone, everyone with an ASN can run BGP, you're trusted that if you deaggregate your space it's for a good reason, etc), people could actually begin moving things now. Taking an existing IPv4 network and overlaying an IPv6 network over the top of it is relatively easy, we went from planning to full deployment in a week.

Even if 100% of the IPv4 networks moved to IPv6 today, we'd still have a smaller table size in 6 than 4. Growth would be slower (ISPs and NSPs wouldn't continually be adding more networks as they grew, initial allocations should be enough for just about everyone).

Then, once Shim6 is developed, the current release of every major OS supports it, router/middleware/etc. vendors support it, and all the kinks are worked out, you can let the people who find Shim6 appropriate for their needs use it. Make one of the requirements to get an ASN a justification for why non-ASN/non-public-routing doesn't work for your organization. Then let each network operator choose the right tools for the job.

Without totally upending my network, I need:

1) PI space
2) The ability to deaggregate that PI space where truly needed. (or the ability to request multiple PI blocks, but I don't see how that helps matters)
3) BGP announcements to the world

If IPv6 can't give me those, and the only thing it's offering is more space... That's just not worth the serious amount of labor and reengineering of our network. If that's the value proposition, we'd hold out on IPv4 as long as possible... And I'm *FOR* IPv6.

-- Kevin
(wondering how many people are muttering 'kook' right now)

Joe_Abley3 · March 1, 2006, 6:55pm

We have peering arrangements with about 120 ASNs. How do we mix BGP IPv6 peering and Shim6 for transit?

You advertise all your PA netblocks to all your peers.

Ok, I was a bit too vague there...

How do we ensure that peering connections are always used instead of transit connections?

Currently for outbound, we can localpref peer-learned routes over everything else. How do all of our servers on our end know that routes learned from a BGP session on our own routers are desirable?

If a client has a set of locators, some of which are reachable via peering connections and some of which are reachable via transit connections, you want to be able to bias the locator selection such that (perhaps, e.g.) peering locators are always preferred to those which involve transit providers.

Although simple performance benefits might cause hosts to make that decision on their own, you don't want to leave the decision up to the hosts, since if they get it wrong it will cost you money.

This seems inordinately reasonable. Did I summarise correctly?
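A sketch of the kind of site-policy locator selection summarised above: the policy ranks locators by how the site reaches them (peering before transit), and hosts may only use measured performance to break ties within a rank, never to cross ranks. Everything here is hypothetical, including how a host would learn the "peer"/"transit" tag for each locator; that mapping would have to be derived from the routers' BGP state somehow, which is exactly the open question in this exchange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    address: str
    reached_via: str   # "peer" or "transit"; hypothetical tag supplied by site policy
    rtt_ms: float      # measured round-trip time to this locator


# Hypothetical site policy: lower rank is preferred.
POLICY_RANK = {"peer": 0, "transit": 1}


def choose_locator(candidates: list[Locator]) -> Locator:
    """Policy rank first, then lowest RTT within the same rank."""
    return min(candidates, key=lambda loc: (POLICY_RANK[loc.reached_via], loc.rtt_ms))


if __name__ == "__main__":
    fast_transit = Locator("2001:db8:1::10", "transit", 20.0)
    slower_peer = Locator("2001:db8:2::10", "peer", 35.0)
    print(choose_locator([fast_transit, slower_peer]))  # peer wins despite higher RTT
```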

[...]

But, policies shouldn't be written depending on tools that don't exist yet, which is basically what's happening now.

Actually, I think policies are conservative because there's a lack of solid, technical argument for loosening them. If (for example) there was consensus between operators and shim6 architects that the requirements of hosting providers are definitively not met by shim6, for technical/architectural reasons, then it'd be far easier to convince RIR memberships and boards that policy modifications should be made to accommodate operators like you.

[...]

Even if 100% of the IPv4 networks moved to IPv6 today, we'd still have a smaller table size in 6 than 4. Growth would be slower (ISPs and NSPs wouldn't continually be adding more networks as they grew, initial allocations should be enough for just about everyone).

The trouble with this is that for every argument that says "PI for all will be fine" there's a corresponding argument that says "PI for all will not scale".

Without totally upending my network, I need:

1) PI space
2) The ability to deaggregate that PI space where truly needed. (or the ability to request multiple PI blocks, but I don't see how that helps matters)
3) BGP announcements to the world

For sure, the simplest and cheapest thing for you would be to obtain PI space and continue as you have been doing with v4. The implications of that are not necessarily simple nor cheap for others, however, if you're one of $BIGNUM people doing it.


Joe

Kevin_Loch1 · March 1, 2006, 7:49pm

Kevin Day wrote:

If you include "Web hosting company" in your definition of ISP, that's not true. Unless you're providing connectivity to 200 or more networks, you can't get a /32. If all of your use is internal(fully managed hosting) or aren't selling leased lines or anything, you are not considered an LIR by the current IPv6 policies.

Leased lines are not required. You can assign a /48 to any
separate organization you provide connectivity to even if they are
colocated. A business model where you don't assign /48s to any
customers does seem to preclude being an LIR. Web hosting companies
that do assign /48s to some customers would qualify.

Even the proposed ARIN 2006-4 assignment policy for "end sites" doesn't help a lot of small to mid sized hosting companies. For that, to just get a /48, you need to already have a /19 or larger, and be using 80% of that. That's 6553 IPs being utilized. If you're running a managed hosting company (name based vhosts) and deploying 1 IP per web server, you're pretty huge before you've hit 6553 devices. Even assuming 20% of that is wasted, you're still talking about more than 5000 servers. 40 1U servers per rack, you need to have 125 racks of packed to the gills servers before you'd qualify for PI space. That excludes every definition I have of "small-to-medium" in the hosting arena.

The latest revision of 2005-1 is also on the table. It would allow
for a /48 assignment for any organization that qualifies for IPv4 space
(even a /22). Name-based virtual hosting is not required either.

You don't get PI space, and Shim6 is looking like your only alternative for multihoming.

We are only limited by our own imaginations and by what actually
works. This is a hard problem to solve and the solution doesn't have
to come from the IETF.

- Kevin

Owen_DeLong · March 1, 2006, 10:58pm

Please don't mix up addressing and routing. "PI addressing" as you
mention is addressing. SHIM6 will become a routing trick.

I think that is overly pessimistic. I would say that SHIM6 _MAY_
become a routing trick, but, so far, SHIM6 is a still-born piece
of overly complicated vaporware of minimal operational value, if any.

Personally, I think a better solution is to stop overloading IDR
meaning onto IP addresses and use ASNs for IDR and prefixes for
intradomain routing only.

Greets,
Jeroen

(who simply would like a policy where endsites that want it could
request a /48 or /40 depending on requirements from a dedicated block
which one day might be used for identity purposes and not pop up in the
bgp tables or whatever we have then anymore....)

I would, for one. Policy proposal 2005-1 (I am the author) comes reasonably
close to that. It will be discussed at the ARIN policy meeting in
Montreal in April.

Owen

Owen_DeLong · March 1, 2006, 11:01pm

I think you're missing that some people do odd
things with their IPs as well, like have one ASN and 35
different sites where they connect to their upstream Tier69.net
all with the same ASN. This means that their 35 offices/sites
will each need a /32, not one for the entire ASN in the table.

People who are doing that have not read the definition of the
term ASN and there is no reason that the community or public
policy should concern itself with supporting such violations
of the RFCs. An AS is a collection of prefixes with a consistent
and common routing policy. By definition, an AS must be a
contiguous collection of prefixes or it is not properly a
single AS. Using the same ASN to represent multiple AS is
a clear violation.

And they may use different carriers in different
cities. Obviously this doesn't fit the definition that some have
of "autonomous system", as these are 35 different discrete networks
that share a globally unique identifier of sorts.

It doesn't fit the RFC definition of AS. Therefore, there is no
reason to support such usage on a continuing basis. You violate
the RFCs, you take your chances.

Owen