multi-homing fixes

> I'm pretty sure I need further explanation to "get it".

I probably still don't get it, but let me see if I understand
the mechanism.

First, assign a prefix to a particular non-topological "locus",
such as a metropolitan area, or a continent. Second, networks
inside that locus will announce only the prefix, but with these
exception bits. [Implied, but not stated: third, all these
networks will exchange full information so as to be able to
generate these exception bits]. Fourth, receivers of these
prefixes, with the exception bits, will expand the longest-match trie
(a Patricia tree is a compact representation of a trie, in common
use when you have data with many nodes with just one child) so
that lookups will only match in the case where there is no exception.
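As a rough illustration of the lookup behaviour just described, here is a sketch in Python. All the specifics (the 10.0.0.0/8 "locus" aggregate, the /16 exception granularity) are invented for the example; the point is only that the aggregate matches unless the destination falls under an excepted more-specific.

```python
# Hypothetical sketch of longest-match with "exception bits": an
# aggregate prefix carries a bitmap of excepted more-specifics, and a
# lookup on the aggregate succeeds only when the destination does not
# fall under one of the exceptions. Prefix values are made up.

AGG_PREFIX = 0x0A000000      # 10.0.0.0/8, an invented "locus" aggregate
AGG_LEN = 8
SUB_LEN = 16                 # exceptions tracked at /16 granularity

def sub_index(addr: int) -> int:
    """Index of the /16 more-specific that addr falls under."""
    return (addr >> (32 - SUB_LEN)) & ((1 << (SUB_LEN - AGG_LEN)) - 1)

def lookup(addr: int, exceptions: int) -> bool:
    """True if addr matches the aggregate and is NOT excepted."""
    if addr >> (32 - AGG_LEN) != AGG_PREFIX >> (32 - AGG_LEN):
        return False          # outside the aggregate entirely
    return not (exceptions >> sub_index(addr)) & 1

# Mark 10.5.0.0/16 as unreachable via this announcement:
exc = 1 << 5
assert lookup(0x0A060001, exc) is True    # 10.6.0.1 still matches
assert lookup(0x0A050001, exc) is False   # 10.5.0.1 is excepted
```

In a real router this test would be realized by expanding the trie, as described above, rather than by consulting the bitmap per packet.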

If I understand you, what you are trying to do is to reduce the
requirement for EVERY network operating within the aggregate to
carry traffic to the ENTIRE aggregate at all times. This ordinarily
would require announcing more specifics. So you propose a scheme
where you use an attribute instead of the more specifics. Unfortunately,
your attribute will cause the same behaviour in a receiver as
would the list of more specifics, and therefore is merely a compression
of the representation on the line that is somewhat better than, say, gzip.

IOW, I think you are solving the wrong problem.

We really have nearly zero experience with aggregates containing
disjoint topology (i.e., non-provider-based aggregation), largely
because there is no obvious way to contain an explosion of more
specifics when complete internal connectivity and complete transit
break down. Steve Deering does propose a (partial) solution for
this, but (in my opinion) it involves a complete reversal of current
financial arrangements to work, in that a sender would have to compensate
a transit network for carrying its traffic to anything within that aggregate,
rather than the transit network collecting from the other (or both) parties.
This is only a partial solution, since even where there is an incentive
to maintain complete interconnectivity and carry traffic to all the
constituent subnets of the aggregate, failures will still cause
black holes to arise even though other valid paths exist.

Your scheme does let one warn of black holes in this eventuality,
takes a bit less bandwidth on the line, probably allows for the
"slosh" to happen all at once rather than in dribs and drabs,
and so forth, but it represents the same amount of work for the
routers processing the attribute. That is, those routers are
effectively brought inside the abstraction boundary of the "locus",
and as a result the goal of hiding information from those routers
is not met.

My gut feeling is that for any sizable "locus", almost all of
what we consider the core of the global routing system would be contained
within the new abstraction boundary, so we're no better off than
not aggregating in the first place.

That is, we are MUCH better off with PA addressing.

  Sean.

> I probably still don't get it, but let me see if I understand
> the mechanism.
>
> First, assign a prefix to a particular non-topological "locus",
> such as a metropolitan area, or a continent.

How this is done is important, because it influences the number of
customers an ISP will have per bitmap. Assigning a prefix to a continent
wouldn't be a good idea, because that way every regional ISP has to
announce the very large bitmap for the entire continent, while most of it
contains just zeros. Per metro area would be better. But two ISPs that
have many multi-homing customers in common could use a prefix for just the
two of them, regardless of geography.

> Second, networks
> inside that locus will announce only the prefix, but with these
> exception bits. [Implied, but not stated: third, all these
> networks will exchange full information so as to be able to
> generate these exception bits].

The bitmaps are generated inside the source AS (presumably, iBGP will
still carry regular routes) and the bitmaps are transmitted from one
network to another, so there is no requirement for full interconnection at
the routing level.

> Fourth, receivers of these
> prefixes, with the exception bits, will expand the longest-match trie
> (a Patricia tree is a compact representation of a trie, in common
> use when you have data with many nodes with just one child) so
> that lookups will only match in the case where there is no exception.

Yes.

> If I understand you, what you are trying to do is to reduce the
> requirement for EVERY network operating within the aggregate to
> carry traffic to the ENTIRE aggregate at all times.

Yes.

> This ordinarily
> would require announcing more specifics. So you propose a scheme
> where you use an attribute instead of the more specifics. Unfortunately,
> your attribute will cause the same behaviour in a receiver as
> would the list of more specifics, and therefore is merely a compression
> of the representation on the line that is somewhat better than, say, gzip.
>
> IOW, I think you are solving the wrong problem.

I'm mostly trying to solve the memory problem, but it should also help
with (but certainly not completely solve) the processing problem.

Since an updated bitmap is always the same size and it updates many routes
at a time, it should take less CPU power to process the updates. Also, you
could make a certain group of routers responsible for the more specifics
(this would work well if the prefixes are assigned geographically) and let
the others delay processing of the bitmaps or even drop the bitmaps
completely.
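A toy sketch of the processing argument above, with invented names and table layout: one fixed-size bitmap update changes the state of many more-specifics in a single pass, where plain BGP would need a separate UPDATE per withdrawn or re-announced route.

```python
# Toy illustration (all naming invented) of applying one bitmap update
# to a route table: only the bits that changed are touched, so many
# routes can flip state in one cheap, fixed-size operation.

def apply_bitmap(table: dict, locus_prefix: str,
                 old: int, new: int, nbits: int) -> dict:
    """Install/remove the more-specific 'holes' whose bits changed."""
    changed = old ^ new
    for i in range(nbits):
        if not (changed >> i) & 1:
            continue                               # bit unchanged, skip
        more_specific = f"{locus_prefix}.{i}"      # placeholder naming
        if (new >> i) & 1:
            table[more_specific] = "unreachable-via-aggregate"
        else:
            table.pop(more_specific, None)
    return table

table = {}
apply_bitmap(table, "10.0", old=0b0000, new=0b0110, nbits=4)
assert set(table) == {"10.0.1", "10.0.2"}          # two holes punched
apply_bitmap(table, "10.0", old=0b0110, new=0b0100, nbits=4)
assert set(table) == {"10.0.2"}                    # one hole healed
```

Whether this actually saves CPU in a real implementation is, as noted, something only an implementation or detailed simulation could show.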

> Your scheme does let one warn of black holes in this eventuality,
> takes a bit less bandwidth on the line, probably allows for the
> "slosh" to happen all at once rather than in dribs and drabs,
> and so forth, but it represents the same amount of work for the
> routers processing the attribute. That is, those routers are
> effectively brought inside the abstraction boundary of the "locus",
> and as a result the goal of hiding information from those routers
> is not met.

I think the only way to really know what the processing benefits of all of
this are is to implement it, or to run detailed simulations, but those
require pretty much a full implementation as well.

Note that bandwidth on the line is not an issue; BGP already encodes the
routing information efficiently enough.

> My gut feeling is that for any sizable "locus", almost all of
> what we consider the core of the global routing system would be contained
> within the new abstraction boundary, so we're no better off than
> not aggregating in the first place.
>
> That is, we are MUCH better off with PA addressing.

Suppose that every "P" would only announce a single "A". (I know, the
other 300 are important too, but just for the sake of argument.) Would
that solve the problem? Only if there is a limit on the number of ISPs. I
don't think there is such a limit. I have my own web and mail servers at
home, along with a router that can do BGP and handle incoming modem
connections. So basically, I'm my own ISP. I have recently helped a medium
sized business with their BGP and they became an "ISP" so they could get a
/20.

The only way we're ever going back to an 8k routing table in IPv6 is if
multihoming at the host level becomes a decent alternative. There is SCTP,
a transport protocol that will handle multiple source and destination IP
addresses, so when one path goes down, it will use another. (SCTP is
useless as a TCP replacement, though.) And there have been successful
experiments with adding this kind of functionality to TCP.
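The host-level failover idea can be sketched roughly like this. Everything here is a stand-in: `connect_fn` represents a real transport connect, and the addresses are from documentation ranges. SCTP does this natively; the TCP experiments mentioned retrofit the same behaviour.

```python
# Minimal sketch of host-level multihoming: a session knows several
# remote addresses and fails over when the current path goes down.

def connect_multihomed(addresses, connect_fn):
    """Try each candidate address until one path works."""
    last_error = None
    for addr in addresses:
        try:
            return connect_fn(addr)
        except ConnectionError as e:
            last_error = e            # path down, try the next address
    raise last_error or ConnectionError("no candidate addresses")

def fake_connect(addr):
    """Stand-in transport connect; pretend the primary path is down."""
    if addr == "192.0.2.1":
        raise ConnectionError("no route")
    return f"connected via {addr}"

assert connect_multihomed(["192.0.2.1", "198.51.100.1"], fake_connect) \
    == "connected via 198.51.100.1"
```

The hard part a real protocol must solve, which this sketch ignores, is doing the switch mid-session without dropping established state.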

But the problem is that you can't just update a billion or so running TCP
stacks overnight. Multihoming will be here for a while. Filtering is
coming back in style now, but it will go away when customers start to
notice they can't reach certain destinations through certain networks:
that's bad business. (It will also make multihoming even more attractive.)
So we either start to build better EGPs now, even if we don't have a new
algorithm that will magically make everything right, or start buying Cisco
and Juniper stock while it's low.

[snip]

> The only way we're ever going back to an 8k routing table in IPv6 is if
> multihoming at the host level becomes a decent alternative. There is SCTP,
> a transport protocol that will handle multiple source and destination IP
> addresses, so when one path goes down, it will use another. (SCTP is
> useless as a TCP replacement, though.) And there have been successful
> experiments with adding this kind of functionality to TCP.

[snip]

I've been being good about keeping my multi6 advocacy off of nanog, but I
have to correct here: SCTP can be used as a full replacement of TCP as it
is a strict superset, it also can replace UDP for many applications.

As soon as the SCTP TCP-like API is finished in the Linux kernel SCTP
implementation I'll be making the minor changes to a few apps (lynx,
openssh, and apache for starters) to demonstrate how easily TCP
applications can be transitioned to SCTP for multihoming support (SCTP
has a number of additional advantages that would be useful, such as
multiple streams, which would require more than a simple search and
replace).

> The bitmaps are generated inside the source AS (presumably, iBGP will
> still carry regular routes) and the bitmaps are transmitted from one
> network to another, so there is no requirement for full interconnection at
> the routing level.

The trouble with using 1 bit to represent 1 prefix is that there is
a need to move more than 1 bit of information per route between
AS's (think AS paths for loop detection, communities etc.).

In iBGP the situation is worse as you have more information
you want to carry (next hop, localpref), but you seem to
envisage this only to replace eBGP. So all you are doing
is compressing the data stream (after making some simplifying
assumptions some of which I don't believe hold up). As you have to
translate your bitmap back to/from iBGP in order to propagate
announcements across the AS, you might as well consider the
simpler alternative of just compressing the eBGP stream. However,
you're increasing the processing required here, rather than
decreasing it.

It would probably be possible to compress information on
stub nodes, or nearly-stub nodes, much further (but you
can do that effectively with outbound route filters),
and, to a limited extent, reduce their visibility in
the middle of the network (think proxy aggregation),
but we have existing tools to do this.

The 'real' solution is to hierarchicalize or indirect
the routing tree such that reachability information
for common multihomed configurations does not in
general reach the cores of most people's networks [*].
I suspect technologies similar to mobile IP may
have application here.

[*] or reduce the routes held in most of the routers
in most people's networks, which, without wishing
to start another flame war, is a claim occasionally
made for MPLS networks in that non-edge LSR routers
need not carry any BGP table, 'merely' an LIB. However,
you still need to carry the prefixes in edge LSRs
so, stepping neatly around the flame-fest, this
seems to me an incomplete solution.

That is like replacing passenger trains by freight trains. After all,
aren't passengers just one type of freight?

SCTP has a whole bunch of features that are of no use to our current
applications, that all expect TCP. It would be very unwise to switch to a
new transport protocol just because it has one desirable feature that can
very easily be built into TCP.

Two modules that do 99% the same thing but with different code is bad
software design. And SCTP is not backwards compatible with older TCP
implementations or access filters or firewalls or anything.

> The trouble with using 1 bit to represent 1 prefix is that there is
> a need to move more than 1 bit of information per route between
> AS's (think AS paths for loop detection, communities etc.).

I think it is possible to aggregate this information for a relatively
large number of destinations. That means multihomers wouldn't be able to
set communities for their routes, but at least they'd be reachable and
that has to count for something.

> In iBGP the situation is worse as you have more information
> you want to carry (next hop, localpref), but you seem to
> envisage this only to replace eBGP.

I answered a bit too soon. I meant that the full information should be
carried in iBGP on the originating network (and not in transit networks),
but this is not really necessary either, if you use an IGP. (But some
networks use iBGP rather than an IGP to carry customer routes
internally.)

s/TCP/IPv4/
s/SCTP/IPv6/

Interesting to read that way... and it explains why SCTP isn't even known
to most of the folks I deal with on a daily basis, much less in any sort of
wide deployment.

Me, I prefer to build a new car that's fully up to new design specs, rather
than try to retrofit rocket boosters onto the old Studebaker. This isn't to
claim, in any way, that "TCP is dead", mind you; but SCTP answers a fairly
fundamental set of problems, with a different set of design goals than TCP
and UDP were written for. Trying to mangle TCP to accommodate those goals
seems likely to produce more confusion than viable code.

BTW, SCTP is just as compatible with filters and firewalls as any other IP
based protocol. It has a protocol number and a public design spec. That few
of these implement the more advanced matching sets that can be used for TCP
is largely due to the catch-22 of router vendors not wanting to waste time
on writing code for it until people demand it, and people not demanding it
because said vendors don't support it, so how big can it really be? (Oh,
and on a sidenote: my Linux firewall will filter it just fine, without even
knowing what it is).

In any case. I agree with your assertion that TCP could be rewritten to do
the same thing as SCTP. I assert, in turn, that you would end up re-writing
most of the SCTP spec in the process, and have an equal amount of new (read
'buggy') code. As for the 'SCTP isn't backwards compatible with older TCP'
claim... uhm, TCP isn't backwards compatible with UDP, either. Your point?

[SCTP]

> Me, I prefer to build a new car that's fully up to new design specs, rather
> than try to retrofit rocket boosters onto the old Studebaker. This isn't to
> claim, in any way, that "TCP is dead", mind you; but SCTP answers a fairly
> fundamental set of problems, with a different set of design goals than TCP
> and UDP were written for. Trying to mangle TCP to accommodate those goals
> seems likely to produce more confusion than viable code.

SCTP is a protocol designed to carry telephony signalling.

Being able to use multiple IP addresses per session is not something that
is inherently more appropriate for telephony signalling than for network
applications that use stream-based communication. It is a nice option to
have for any transport protocol.

So unless there is _another_ reason why SCTP is appropriate for a certain
application, it seems pretty clear to me that using TCP, which was
designed to work with the protocols we use on the Net, and is the
transport protocol applications expect, is much more appropriate.
Extending TCP to use multiple IP addresses is not a problem. TCP has been
extended in many ways in the past. And an experimental implementation has
been available for four years.

> As for the 'SCTP isn't backwards compatible with older TCP'
> claim... uhm, TCP isn't backwards compatible with UDP, either. Your point?

But nobody is proposing to have applications built for UDP run over TCP.

I'm not against implementing new protocols that aren't backward
compatible, but I'm merely saying that in this case the benefits are too
small. And comparing this to IPv6: how many people are using IPv6 today?
Sometimes it is necessary to forgo backward compatibility, but that
decision should never be taken lightly.

[SCTP]

> > Me, I prefer to build a new car that's fully up to new design specs, rather
> > than try to retrofit rocket boosters onto the old Studebaker. This isn't to
> > claim, in any way, that "TCP is dead", mind you; but SCTP answers a fairly
> > fundamental set of problems, with a different set of design goals than TCP
> > and UDP were written for. Trying to mangle TCP to accommodate those goals
> > seems likely to produce more confusion than viable code.

> SCTP is a protocol designed to carry telephony signalling.

And it bears about as much resemblance to this origin as TCP does to the
military's original purposes for having a network.

> Being able to use multiple IP addresses per session is not something that
> is inherently more appropriate for telephony signalling than for network
> applications that use stream-based communication. It is a nice option to
> have for any transport protocol.

Agreed.

> So unless there is _another_ reason why SCTP is appropriate for a certain
> application, it seems pretty clear to me that using TCP, which was
> designed to work with the protocols we use on the Net, and is the
> transport protocol applications expect, is much more appropriate.
> Extending TCP to use multiple IP addresses is not a problem. TCP has been
> extended in many ways in the past. And an experimental implementation has
> been available for four years.

RFC/Draft/URL/code? I have yet to see anything which allows the sort of
clean and direct setup which SCTP does, but I certainly haven't made an
exhaustive search of the field. Certainly, if it addresses all of the same
issues while being more compatible and requiring fewer changes, I would be
all for it.

> > As for the 'SCTP isn't backwards compatible with older TCP'
> > claim... uhm, TCP isn't backwards compatible with UDP, either. Your point?

> But nobody is proposing to have applications built for UDP run over TCP.

I might argue that, but it would degenerate into nitpicking. However, I
will grant that a conversion to SCTP would affect a significantly larger
portion of the network than any example I could present as a counter.

> I'm not against implementing new protocols that aren't backward
> compatible, but I'm merely saying that in this case the benefits are too
> small. And comparing this to IPv6: how many people are using IPv6 today?
> Sometimes it is necessary to forgo backward compatibility, but that
> decision should never be taken lightly.

I believe that was part of my point, in starting. Both SCTP and IPv6
provide benefits. However, neither appears to be making much headway in the
direction of being adopted by the majority of the Internet. I really wonder
whether any such major change will, since it is no longer practical for a
central agency to say "support for <X> protocol will cease as of <date>".