Geographic v. topological address allocation

Not that I want to advocate uniform retail pricing structures;
my point is only that without such uniformity of retail structures
such 'follow the money' arguments as Sean presents here
quickly break down into a maze of twisty little passages.

I agree that following the money doesn't always work. Personality
seems to play a key part. I don't know how else to explain some of
the decision-making that went into some of today's network operations.
People tend to view the problem in different ways depending on their
background.

What I was trying to postulate, unsuccessfully, is that there is no such
thing as a universal, optimal hierarchical addressing scheme. I
thought I had chosen examples from opposite ends of the spectrum.
I guess I wasn't extreme enough in my examples. Perhaps I should
have used the ISBN hierarchy, a combination of language group, country
and publisher prefix. I'm going to publish a million books, so I
should get a 'big' publisher's prefix.
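The arithmetic behind a 'big' publisher prefix can be sketched as follows, assuming the ten-digit ISBN layout in which the group, publisher and title numbers share nine digits and the tenth is a check digit. The function names are hypothetical, and the sketch ignores the actual prefix range tables that ISBN agencies publish.

```python
# Sketch: in a ten-digit ISBN, group + publisher + title digits share 9 positions
# (the 10th is a check digit), so a shorter publisher prefix leaves more titles.
ISBN10_PAYLOAD_DIGITS = 9


def titles_available(group_digits: int, publisher_digits: int) -> int:
    """Number of distinct title numbers left under a given prefix split."""
    title_digits = ISBN10_PAYLOAD_DIGITS - group_digits - publisher_digits
    if title_digits < 1:
        raise ValueError("prefix leaves no room for title numbers")
    return 10 ** title_digits


def longest_publisher_prefix(group_digits: int, titles_needed: int) -> int:
    """Longest publisher prefix that still leaves room for titles_needed titles."""
    title_digits = len(str(titles_needed - 1))  # titles numbered 0 .. titles_needed-1
    publisher_digits = ISBN10_PAYLOAD_DIGITS - group_digits - title_digits
    if publisher_digits < 1:
        raise ValueError("too many titles for this group identifier")
    return publisher_digits


# A publisher planning a million books under a one-digit group identifier
print(longest_publisher_prefix(1, 1_000_000))  # 2
print(titles_available(1, 2))                  # 1000000
```

The 'big' prefix is the short one: every digit given to the publisher identifier is a digit taken away from its title space.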

What I was trying to postulate, unsuccessfully, is that there is no such
thing as a universal, optimal hierarchical addressing scheme.

Sean,

It's pretty clear that the obvious definition of 'optimality' can
only be achieved if the topology is strictly hierarchical.

Further, there is probably a 'maximal' addressing scheme for a given
topology. But it's only useful for the 10 seconds or so that the topology
continues to exist. ;-)

The other point that I think you're trying to make is that the maximal
hierarchy must follow the topology, and in some cases, this may actually
cause addressing divisions not to follow organizational boundaries.
Further, the organizations involved must be the ones to recognize this and
follow through appropriately.

An interesting metric might be if the organization's min-cut-set bandwidth
is exceeded by its regional access bandwidth.
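One hypothetical way to evaluate this metric: treat the organization's internal links as an undirected graph, compute the minimum cut between two regional sites with a max-flow algorithm (Edmonds-Karp here), and compare it against the bandwidth a region has to its local provider. The site names and circuit speeds (T1 = 1544 kbit/s, T3 = 44736 kbit/s) are illustrative assumptions, as is reading "exceeded" as a strict comparison.

```python
from collections import deque


def undirected(links):
    """Turn {(a, b): kbps} into a symmetric capacity map {a: {b: kbps}}."""
    cap = {}
    for (a, b), bw in links.items():
        for u, v in ((a, b), (b, a)):
            nbrs = cap.setdefault(u, {})
            nbrs[v] = nbrs.get(v, 0) + bw
    return cap


def min_cut_bandwidth(cap, source, sink):
    """Edmonds-Karp max flow; by max-flow/min-cut this equals the min cut.
    Assumes the symmetric map built by undirected()."""
    residual = {u: dict(nbrs) for u, nbrs in cap.items()}
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        path_flow, v = float("inf"), sink
        while parent[v] is not None:
            path_flow = min(path_flow, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= path_flow
            residual[v][u] += path_flow
            v = u
        flow += path_flow


def addresses_should_follow_region(links, site_a, site_b, regional_access_kbps):
    """Hypothetical reading of the metric: if a region's access bandwidth exceeds
    the organization's internal min cut, the organization is topologically split."""
    return min_cut_bandwidth(undirected(links), site_a, site_b) < regional_access_kbps


T1, T3 = 1544, 44736
links = {("nyc", "chi"): T1, ("chi", "sf"): T1, ("nyc", "sf"): T1}
print(min_cut_bandwidth(undirected(links), "nyc", "sf"))      # 3088
print(addresses_should_follow_region(links, "nyc", "sf", T3))  # True
```

With two T1 paths between the coasts but a T3 into each region, the organization's parts are far better connected to their local providers than to each other, which is the case where addressing would sensibly follow the regional topology rather than the org chart.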

Tony

Sean Donelan <SEAN@SDG.DRA.COM> writes:

What I was trying to postulate, unsuccessfully, is that there is no such
thing as a universal, optimal hierarchical addressing scheme. I
thought I had chosen examples from opposite ends of the spectrum.
I guess I wasn't extreme enough in my examples. Perhaps I should
have used the ISBN hierarchy, a combination of language group, country
and publisher prefix. I'm going to publish a million books, so I
should get a 'big' publisher's prefix.

You are confusing transport address (describing a location
in a topology) with an object name (a book, a computer, a
process running on a computer).

An object name like an ISBN does not need to be
hierarchical, because such names do not describe discrete
locations. Introducing hierarchy improves manageability
and the efficiency of the database. Many hierarchies can be
imposed on object names.

IPv4 addresses and anything like them need to be
hierarchical in sufficiently large networks because they
do describe the location of one or more objects, and
because flat addresses are known not to scale in large
networks. Hierarchical routing is the only known means of
scaling IP addresses as they exist now, and therefore the
only hierarchy that can be imposed on IP addresses is
strictly topological.

The canonical object-describing database that is
roughly the analogue of the ISBN database is the DNS.
(I hate analogies I hate analogies I hate analogies please
don't use them). The DNS is also manifestly hierarchical,
and that hierarchy introduced efficiency compared to the
former flat ARPA namespace.

Note that the DNS works with suffixes rather than
prefixes, which is a cosmetic difference unless one is
interested in doing binary sorts or tree-based searches,
and that the DNS is variable-length, which is not a
cosmetic difference from the ISBN.
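To see why the suffix/prefix difference is cosmetic, reversing a name's labels turns the DNS suffix hierarchy into a prefix, after which ordinary sorting and prefix tests work. A minimal sketch; the helper names are hypothetical, and it ignores escaped dots and internationalized names.

```python
def as_prefix_key(name: str) -> tuple:
    """Reverse the labels so the most significant label comes first."""
    return tuple(reversed(name.lower().rstrip(".").split(".")))


def in_subtree(name: str, zone: str) -> bool:
    """True if name is zone itself or lies below it in the DNS tree."""
    key, zkey = as_prefix_key(name), as_prefix_key(zone)
    return key[: len(zkey)] == zkey


names = ["solar.clock.org", "www.example.com", "cesium.clock.org", "clock.org"]
print(sorted(names, key=as_prefix_key))
# ['www.example.com', 'clock.org', 'cesium.clock.org', 'solar.clock.org']
print(in_subtree("cesium.clock.org", "clock.org"))  # True
print(in_subtree("notclock.org", "clock.org"))      # False, unlike a naive endswith()
```

After the reversal, siblings sort next to each other with their parent first, exactly as prefix-ordered addresses would.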

If you are a big organization and plan to have lots of
objects you need a sufficient swathe of DNS names to
describe them all. These in turn should resolve into
LOCATORS which describe where in the Internet topology (as
opposed to the corporate topology or the geographical
topology) the objects can be reached.

For example, one hierarchy of Internet object names is
clock.org. Although cesium.clock.org and solar.clock.org
are siblings in that hierarchy, they have very different
IP addresses because they are located in different parts
of the Internet topology. They should NOT have the same
IP prefix as any attempt at that would introduce
unnecessary inefficiency into the routing system.
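The cost of forcing DNS siblings into one IP prefix can be made concrete: the smallest prefix covering two addresses is set by the first bit on which they differ. The addresses below come from the RFC 5737 documentation ranges and are hypothetical stand-ins, not the real addresses of cesium.clock.org or solar.clock.org.

```python
import ipaddress


def covering_prefix(a: str, b: str) -> ipaddress.IPv4Network:
    """Smallest single IPv4 prefix that contains both addresses."""
    x, y = int(ipaddress.IPv4Address(a)), int(ipaddress.IPv4Address(b))
    # Bits above the highest differing bit are shared; that count is the prefix length.
    plen = 32 - (x ^ y).bit_length()
    return ipaddress.IPv4Network((x, plen), strict=False)


net = covering_prefix("192.0.2.1", "198.51.100.1")  # hypothetical cesium / solar
print(net, net.num_addresses)  # 192.0.0.0/5 134217728
```

A single route covering both hosts would have to claim a /5, some 134 million addresses, most of which belong to other parts of the topology.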

Remember: at each level of naming there can be different
and completely disjoint hierarchies. There are
scalability implications in all of them, most notably when
the names used have size limits or are distributed
non-hierarchically (like Ethernet addresses or the COM
domain). However, the important thing is that when a name
is used as a LOCATOR in a topology, it must be related to
that topology in order to be scalable, and in large
networks it must lend itself to aggregation, so as to
reduce the amount of information needed for that locator
to be usable throughout the network.
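A small illustration of what "lend itself to aggregation" buys, using Python's standard ipaddress module. The prefixes are hypothetical: 256 /24s handed out contiguously from one provider block collapse into a single routing entry, while the same number of /24s scattered across the space (as an allocation scheme that ignores topology might leave them) cannot be merged at all.

```python
import ipaddress


def routing_entries(prefixes):
    """Routes that remain after merging adjacent or overlapping prefixes."""
    return list(ipaddress.collapse_addresses(prefixes))


# Hypothetical: 256 customer /24s carved contiguously out of one provider's /16 ...
topological = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(256)]
# ... versus 256 /24s with no topological relationship to one another.
scattered = [ipaddress.ip_network(f"10.{i}.0.0/24") for i in range(256)]

print(routing_entries(topological))     # [IPv4Network('10.1.0.0/16')]
print(len(routing_entries(scattered)))  # 256
```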

  Sean.

Tony Li <tli@juniper.net> writes:

An interesting metric might be if the organization's min-cut-set bandwidth
is exceeded by its regional access bandwidth.

This is a brilliant idea, but how do you propose to
measure the regional access bandwidth, or the even
more fun task of determining the total bandwidth so that
you can determine the pro rata share due any arbitrary
subtree?

  Sean.

An interesting metric might be if the organization's min-cut-set bandwidth
is exceeded by its regional access bandwidth.

This is a brilliant idea, but how do you propose to
measure the regional access bandwidth, or the even
more fun task of determining the total bandwidth so that
you can determine the pro rata share due any arbitrary
subtree?

I don't propose to measure it. I assume that the organization has the
wherewithal to track its own circuits. I know, I know, I'm naive for
living in an altruistic Internet ....

Tony

This is true, but the definition of the top of the hierarchy is arbitrary
and is the nexus of the debate about "topological" versus geographical
addressing, which I interpret as "ISP at top" versus "exchange point at
top" hierarchies. Both are valid topological hierarchies.

--Kent

This is true, but the definition of the top of the hierarchy is arbitrary

Not at all true. The top of the hierarchy must be default free.

and is the nexus of the debate about "topological" versus geographical
addressing, which I interpret as "ISP at top" versus "exchange point at
top" hierarchies. Both are valid topological hierarchies.

True, however, geographic addressing has some rather severe practical
problems. The exchange point at the top becomes a single point of
failure. So it needs replication. But then, there needs to be
interconnect between the exchange points. Who provides it?

All this and more has been beaten to death. If you start with the premise
of geographic addressing and try to beat it into working, you end up with
an ISPAC. See ftp://ftp.juniper.net/pub/users/tli/ispac.txt.

Tony

"Kent W. England" <kwe@geo.net> writes:

This is true, but the definition of the top of the hierarchy is arbitrary
and is the nexus of the debate about "topological" versus geographical
addressing, which I interpret as "ISP at top" versus "exchange point at
top" hierarchies. Both are valid topological hierarchies.

As tli pointed out the top of the hierarchy is not
arbitrary, it must be default free.

In a hierarchical routing system there are three
forwarding directions to consider: intra-area ("lateral"),
default ("upwards") and sub-area ("aggregate" or "downwards").

At the top of a hierarchy you cannot have an upwards
forwarding direction, therefore the entire address space
must be intra-area or presented as an aggregate.

Consider an addressing structure that looks like
this:

      level-3-area-id:level-2-area-id:level-1-area-id:final-flat-id

In an internetwork with three levels of hierarchy, this
pattern is easy to reason about.
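The three forwarding directions can be sketched against that address format. This is a simplified model under stated assumptions: addresses are tuples of integer area ids, a router is identified by the area it sits in (an empty tuple for the top), only level-1 areas hold end systems, and the function names are hypothetical.

```python
def parse_address(text: str) -> tuple:
    """'7:3:1:42' -> (7, 3, 1, 42): level-3, level-2, level-1 area ids, then flat id."""
    return tuple(int(part) for part in text.split(":"))


def forwarding_direction(router_area: tuple, dest: tuple) -> str:
    """Classify a destination relative to the area a router sits in."""
    depth = len(router_area)
    if dest[:depth] != router_area:
        return "up"       # outside our area: follow the default route upwards
    if depth == len(dest) - 1:
        return "lateral"  # same level-1 area: flat, intra-area routing on the final id
    return "down"         # inside one of our sub-areas: follow its aggregate route


dst = parse_address("7:3:2:5")
print(forwarding_direction((7, 3, 1), dst))  # up       (level-1 router in another area)
print(forwarding_direction((7, 3, 2), dst))  # lateral  (level-1 router in the same area)
print(forwarding_direction((7, 3), dst))     # down     (level-2 router above area 2)
print(forwarding_direction((), dst))         # down     (the top never forwards upwards)
```

Lateral routes between peer routers at the higher levels are not modeled here; the point is only that an empty area prefix, the top, can never produce "up".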

A level-2 router may have some things directly attached to
its level-2 area, including its peer routers, and it would
carry routes towards them, which probably would be in a
flat routing table. Among the reasons it needs these
routes is that it has to know where to send traffic
towards each of the level-1 areas that are attached to its
own area, and it has to know where to send "default"
traffic towards one or more of its in-area peers that have
level-3 connections.

Each such level-3 router would have to know how to forward
to any given level-2 area, and therefore would need to
carry routes for each level-3/level-2 gateway.

Each level-1 router, by contrast, only needs to know how
to route towards all the things in its area, and how to
reach at least one level-2 router.

(One can be a little tricky and have a single level-1 area
connected to multiple level-2 routers in different level-2
areas, in which case better routing optimality may be
obtained by the level-1 router carrying some level-2
routing information. This would be analogous to Yakov
Rekhter's "route pull".)

However, the minimum set of routes to carry is that which
can cause traffic to be forwarded along a strict
single-path tree-like hierarchy. This requires that each
area be fully contiguous at all times. Other routes may
be introduced in various places to alter this behaviour if
that is desirable, or to effect IS-IS style partition repair.

In order for the hierarchical routing system to scale, the
number of entities known in any given area must be small
enough to route on in what is conceptually a flat manner.
That means that there are bounds on the number of level-n
to level-n-minus-one areas, and this in turn requires that
the addressing scheme allow for a deep enough hierarchy.
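A back-of-the-envelope sketch of "deep enough": if no area may contain more than max_entries members (so no router needs much more than that many intra-area routes, plus a default), the number of levels needed grows with the logarithm of the number of end systems. The uniform fan-out is an idealizing assumption; real hierarchies are lopsided.

```python
def min_levels(total_endpoints: int, max_entries: int) -> int:
    """Fewest levels such that a uniform tree with fan-out max_entries per area
    can address total_endpoints leaves."""
    if max_entries < 2:
        raise ValueError("each area must hold at least two members")
    levels, reachable = 1, max_entries
    while reachable < total_endpoints:
        levels += 1
        reachable *= max_entries
    return levels


# A flat scheme would instead need one route per endpoint in every router.
print(min_levels(10**9, 1000))  # 3
print(min_levels(10**9, 100))   # 5
```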

Consequently, the number of things in the top,
default-free hierarchy is always going to be limited, no
matter what "type" of hierarchical allocation scheme is
proposed.

The further requirement that any given area be fully
contiguous means that the "top" of the hierarchy must be
self-repairing.

In other words, as tli pointed out, if you have a switch
in some convenient geographical location, like the
Greenwich Observatory or the UN building or MAE-EAST, your
entire routing system fails if that switch or that
location fails.

Consequently, to avoid the single point of failure, there
would be a desire to have several diverse geographical
locations to act as the top of the hierarchy.

The problem, again, is that any given area must be fully
contiguous, and this implies that any level-n router
connecting to one of these diverse locations would need
connectivity to every other level-n router, so that this
top level-n area would be contiguous.
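The single-point-of-failure argument can be checked mechanically: the top-level area is only usable if its graph stays connected when any one node fails. In this hypothetical sketch, a single exchange point ("xp") fails the test, while two exchange points pass only because every ISP connects to both, that is, the ISPs themselves supply the interconnect between the exchange points.

```python
from collections import deque


def connected_without(links, failed):
    """Is the graph still one contiguous piece after removing node `failed`?"""
    nodes = {n for link in links for n in link} - {failed}
    adj = {n: set() for n in nodes}
    for a, b in links:
        if failed not in (a, b):
            adj[a].add(b)
            adj[b].add(a)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def survives_any_single_failure(links):
    nodes = {n for link in links for n in link}
    return all(connected_without(links, n) for n in nodes)


isps = ["ispA", "ispB", "ispC"]
one_xp = [(isp, "xp") for isp in isps]
two_xps = [(isp, xp) for isp in isps for xp in ("xp1", "xp2")]
print(survives_any_single_failure(one_xp))   # False
print(survives_any_single_failure(two_xps))  # True
```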

One could propose to implement this as a big bridged
network. The original DGIX proposal was along these
lines. Operational experience with much smaller but still
big bridged exchange points has demonstrated pretty much
conclusively that this is a Really Really Bad Idea.

One could propose to implement this using the native
protocol, effectively connecting all of these exchange
points into a level-n-plus-one area of its own. As long
as the level-n-plus-one area could route to all the
level-n areas at all times, this would work just nicely,
on a technical basis.

Therefore, in a hierarchical addressing system, which is
the only way we currently know how to scale a global
internetwork, the difference between your "ISP at top"
option and your "exchange point at top" option is merely
in the choice of words. Whatever is at the top has to
connect all the things that are one step down from the
top reliably and continuously, and simple
belt-and-suspenders implies geographical diversity.

Thus, the top of the hierarchy may be expressed as a big,
geographically diverse bridged network connecting all the
"next-level-down" routers, a single big geographically
diverse routed network comprising a single area, or a
meshed concatenation of the "next-level-down" routers in
such a way that robust interconnectivity among them is
maintained at all times.

The choice is probably best made on the basis of
reliability and cost, but experience shows that it is
likelier made on the basis of politics, autonomy/mistrust
of other operators, marketing goals, and possibly cost.

If it were possible for two routing areas to cleanly
synthesize a next-level-up area, which probably implies
the use of variable length addresses, then intuitively it
strikes me that one could enjoy a better routing hierarchy
than is likely to be cobbled together through the
deployment of physical infrastructure, keeping the number
of routing entries needed by any given router anywhere in
such an Internet to a minimum.

With a variable length addressing scheme, in other words,
one can consider a set of operations which can be
summarized as "make-hierarchical" or "make-lateral".

The obvious and increasingly important first baby step in
an evolutionary path towards a scalable Internet is, to
quote Noel Chiappa, "to make the world safe for NAT, by
making all end-to-end functions use the DNS name; e.g. for
authentication, pseudo-headers for checksums, etc, etc."
As he continued, this can be justified solely on the basis
of working better with NAT, which solves some real-world
problems now, and which is being used now.

There is lots more to discuss. Is big-internet still in
post-Bass trauma? If not, let's discuss it there, or
privately.

However, to tie in some tiny degree of NANOG relevance,
and to emphasise through repetition: the idea of using a
large bridged network has been discredited by the history
of exchange points, particularly since the lovely days
when people didn't learn from Milo's FIX upgrade path and
began doing multimedia bridging. Single exchange points
fail, so avoiding the large bridged network by making a
single exchange point the "top" of a hierarchy won't
work. Therefore, the current hierarchy implemented in
provider-based addressing, with some coordination to
preserve some degree of geographic alignment of addresses
(through ARIN, RIPE and APNIC, and large-ISP allocation
strategies), is almost certainly the most appropriate one.

That is to say, we got CIDR pretty much right.

  Sean.

Tony Li wrote:

True, however, geographic addressing has some rather severe practical
problems. The exchange point at the top becomes a single point of
failure. So it needs replication. But then, there needs to be
interconnect between the exchange points. Who provides it?

Another way to say it is that monopoly is necessary to take
advantage of geographic addressing. The Baby Bells do that now
(with area codes). I wonder what their plan is for
migrating to a competitive market (a centralized database?)

--vadim

Another way to say it is that monopoly is necessary to take
advantage of geographic addressing. The Baby Bells do that now
(with area codes). I wonder what their plan is for
migrating to a competitive market (a centralized database?)

My understanding is that they used mandated exchange points to deal with
the deregulation of IXCs, but this only solves one half of the problem.

Tony

Sean Doran wrote:

As tli pointed out the top of the hierarchy is not
arbitrary, it must be default free.

Sorry for the nitpicking, but this definition has at least two
flaws:

a) there's a bi-partite backbone configuration, where each half
   has default pointing to the other half. Both do not have to
   carry full routes. (Of course, this scheme has problems with
   packets destined to the blue, or can be extended to more than
   two partitions).

   Actually, there's a very simple way to fix the problem with
   packets to nowhere: simply have routers at exchange points
   drop packets that would be routed back out the interface
   they came in on.

b) multihomed non-transit networks may want to be default-free
   and carry full routing to improve load sharing of outgoing
   traffic. Since they are non-transit, they cannot be considered
   "top of hierarchy".

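Vadim's fix in (a) amounts to a check at forwarding time: if the best route for a packet points back out the interface it arrived on, drop it rather than let it bounce between the two halves until its TTL expires. A minimal sketch with a hypothetical forwarding table and interface names, doing longest-prefix match with Python's ipaddress module; it assumes the link toward the other half is effectively point-to-point.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match; returns the outgoing interface or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


def forward(fib, dst, in_iface):
    """Outgoing interface, or None to drop."""
    out = lookup(fib, dst)
    if out is None or out == in_iface:
        return None  # no route, or it would go straight back where it came from
    return out


# Half "A" of a bi-partite backbone: its own customers plus a default to half "B".
fib_a = {
    ipaddress.ip_network("192.0.2.0/24"): "cust1",
    ipaddress.ip_network("0.0.0.0/0"): "to_B",
}
print(forward(fib_a, "203.0.113.9", "cust1"))  # to_B  (unknown here: hand it to B)
print(forward(fib_a, "203.0.113.9", "to_B"))   # None  (B defaulted it back: drop)
print(forward(fib_a, "192.0.2.7", "to_B"))     # cust1
```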
Consequently, the number of things in the top,
default-free hierarchy is always going to be limited, no
matter what "type" of hierarchical allocation scheme is
proposed.

Bingo. Faster boxes, anyone? :-)

The further requirement that any given area be fully
contiguous means that the "top" of the hierarchy must be
self-repairing.

Note that IXPs are not "top" of the hierarchy, but just
aggregators for essentially point-to-point links between
tier-1 backbones.

One could propose to implement this as a big bridged
network. The original DGIX proposal was along these
lines. Operational experience with much smaller but still
big bridged exchange points has demonstrated pretty much
conclusively that this is a Really Really Bad Idea.

Not only technically: politically it was suicide, as it
assumed a single operator (a consortium, or one funded
with pork money).

--vadim

Vadim Antonov <avg@pluris.com> writes:

Sean Doran wrote:

> As tli pointed out the top of the hierarchy is not
> arbitrary, it must be default free.

Sorry for the nitpicking, but this definition has at least two
flaws:

a) there's a bi-partite backbone configuration, where each half
   has default pointing to the other half. Both do not have to
   carry full routes. (Of course, this scheme has problems with
   packets destined to the blue, or can be extended to more than
   two partitions).

In effect, this entails the synthesis by equivalent areas
of a superior level into which each party can default.
I touched on this briefly in my previous message.

With variable length addressing this kind of joint
level-n-plus-one synthesis is easy; the new area simply
encompasses sufficient bits to distinguish each
level-n/level-n-plus-one IS participating in it.
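The "sufficient bits" arithmetic is the ceiling of log2 of the number of participants. A hypothetical sketch, representing variable-length addresses as bit strings: each participating IS gets a fixed-width id in the synthesized level-n-plus-one area, which is prepended to its existing address.

```python
def synthesize_parent(participants):
    """Give each participant an id in a new parent area, using just enough bits."""
    width = max(1, (len(participants) - 1).bit_length())  # ceil(log2(n)), at least 1
    return {p: format(i, f"0{width}b") for i, p in enumerate(sorted(participants))}


def lift(parent_ids, participant, address_bits):
    """Prefix an existing variable-length address with its new parent-area id."""
    return parent_ids[participant] + address_bits


ids = synthesize_parent(["backboneA", "backboneB"])
print(ids)                                 # {'backboneA': '0', 'backboneB': '1'}
print(lift(ids, "backboneB", "1011"))      # 11011
print(synthesize_parent(["a", "b", "c"]))  # {'a': '00', 'b': '01', 'c': '10'}
```

Using integer bit_length rather than floating-point log2 keeps the width exact for any number of participants.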

b) multihomed non-transit networks may want to be default-free
   and carry full routing to improve load sharing of outgoing
   traffic. Since they are non-transit, they cannot be considered
   "top of hierarchy".

Yes, and I believe I also mentioned that Yakov's "pull" can
be used to optimize routing when strict hierarchical
routing is inefficient. (Did you see the slides he used to
present his push/pull definitions at the IETF, and perhaps
also at NANOG?)

Bingo.

We are in sync, Vadim. Surprise surprise.

  Sean.

P.S.:

Note that IXPs are not "top" of the hierarchy, but just
aggregators for essentially point-to-point links between
tier-1 backbones.

This is a good way of putting it. I will steal it and use
it myself from time to time.