sub-basement multihoming (Re: Verio Peering Question)

The "BGP uninformed" ask, "Why can't traffic just choose one of
two paths?

The "BGP informed" ask that too. However, they know the technology
isn't quite up to this worthy trick:

> magic behind the scenes ... "just works", and all traffic should
> be able to use all of their connections.

... except where that is not desired for policy reasons (e.g.,
don't use the volume-charged connection when the flat-rate
connection isn't full).

These are *hard problems*, unfortunately, and are still
in the land of blue-sky research.

Meanwhile, the problem is that the demand to do fancy routing
things outstrips the Internet's current collective ability
to supply it. As a result, we have to say "no" (or more $ than
you can afford) to a lot of things that seem worthwhile. One of
those things is "low-value prefixes", independent of who announces
them to the world.

> I think that the demand is there -- current products just don't allow it.

That's the crux of the problem, independent of whose "fault" it
is that current products are not up to the task.

  Sean.

Date: Wed, 3 Oct 2001 07:45:00 -0700 (PDT)
From: Sean M. Doran <smd@clock.org>

> The "BGP uninformed" ask, "Why can't traffic just choose one of
> two paths?

The "BGP informed" ask that too. However, they know the technology
isn't quite up to this worthy trick:

Which is the point that I thought I made. Thanks for clarifying.

> > magic behind the scenes ... "just works", and all traffic should
> > be able to use all of their connections.

> ... except where that is not desired for policy reasons (e.g.,
> don't use the volume-charged connection when the flat-rate
> connection isn't full).
>
> These are *hard problems*, unfortunately, and are still
> in the land of blue-sky research.

Agreed.

In your particular example, one has the additional problem of being
a closed-loop system with state feedback. Let's add latency,
CoS, and packet length. It gets messy quickly.

Large, public interconnects could help address portability... but
those have problems of their own. Note recent concerns about all
eggs in one basket.

Is IPv8 ready yet? ;-)

> Meanwhile, the problem is that the demand to do fancy routing
> things outstrips the Internet's current collective ability
> to supply it. As a result, we have to say "no" (or more $ than
> you can afford) to a lot of things that seem worthwhile. One of

Yes. Put bluntly, technology is not serving its users. It's the
oil-burning '73 Nova that won't die: far from ideal, but it
still runs, so we may as well use it instead of buying a new
car...

> those things is "low-value prefixes", independent of who announces
> them to the world.

> > I think that the demand is there -- current products just don't allow it.

> That's the crux of the problem, independent of whose "fault" it
> is that current products are not up to the task.

I'd also argue that RIR policies need a little new life breathed
into them. IMHO, we're asymptotically approaching pre-CIDR days.

>   Sean.

Eddy

I would really love to hear if anyone has invented a way to do global
routing with anything better than a combination of flooding of aggregated
reachability information and default routes.

It is not the technology per se; it's the underlying concept that is
barely adequate.

--vadim

PS I too have a pair of diverse DSLs and use a combination of DNS
   (for ingress) and hash-based load sharing (for egress) packet
   routing. The resulting paths are sometimes hugely far from
   optimal.
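
A rough sketch of the egress half of that under Linux iproute2
(addresses and interface names invented for illustration; the route
cache makes the split roughly per-destination rather than per-packet):

  # one multipath default route, one nexthop per DSL line
  ip route add default scope global \
      nexthop via 10.0.0.1 dev dsl0 weight 1 \
      nexthop via 10.0.1.1 dev dsl1 weight 1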

> PS I too have a pair of diverse DSLs and use a combination of DNS
>    (for ingress) and hash-based load sharing (for egress) packet
>    routing. The resulting paths are sometimes hugely far from
>    optimal.

Semi-OT: how do you get truly diverse DSLs? Don't they both go to the
same CO?

Grant

Maybe for anybody *ELSE* they'd terminate at the same CO, but I suspect
the usual rules don't apply to Vadim. ;-)

/Valdis

Anyone ever try using the RADWARE LinkProof ?
(or similar - are there any others ? )

<http://www.radware.com/content/products/link.htm>

It looks like a combination of link monitoring & NAT'ing internal
addresses to the "best" ISP's netblock.

Thanks
- Rafi

I have not in fact used the product, but I was invited to a
presentation with lots of technical details. I then went for beers with
a couple of the techies, which was quite educational too :-)

The way it works is as follows:
- you put all your servers that you want redundant (it is largely
  protocol-agnostic, which is good) in RFC1918 space.
- you hook up to a couple of ISPs, and get from each a block the same
  size as your RFC1918 block.
- you delegate DNS for any service you want redundant to the linkproof
  box/boxes (they can failover amongst themselves), one NS+A record
  for each ISP you have space from. The inherent failover in DNS
  caches/resolvers makes sure clients will always at least get a reply
  (this is the neat bit - the real failover is in DNS resolvers everywhere,
  not in the box itself).
- the box, continually monitoring RTTs and reachability of networks,
  returns the A record pointing to the most 'optimal' ISP for that
  client. When the request then comes in, the box NATs it to the
  RFC1918 space and handles it.

The neat thing is that it does not need a netblock big enough to get
through BGP filters - you just get a /24 or whatever from *each* ISP,
out of their larger netblocks.
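
For illustration, the delegation might look something like this (names
and addresses are invented; the point is one NS+A pair per upstream,
both pointing at the LinkProof unit):

  ; in the example.com zone -- purely hypothetical numbers
  www      IN NS  lp-a.example.com.
  www      IN NS  lp-b.example.com.
  lp-a     IN A   192.0.2.53       ; address out of the /24 from ISP A
  lp-b     IN A   198.51.100.53    ; address out of the /24 from ISP B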

The concept is nice, it sounds like it will work. I have, however,
never tried it so I can't vouch for the implementation.

Greetz, Peter [not affiliated with RadWare or anything]

The really neat thing is that you can do this with any nameserver. Install
N nameservers and connect each of them to one of your ISPs. These
nameservers are all masters, and all contain different data.

Each one responds with data relevant for the IP addresses of that ISP. If
all your links are up, people will get mixed responses. If one ISP is down,
that nameserver will stop answering, and hence after your TTL expires, no
requests will be made for those IP addresses.
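
A concrete (and entirely hypothetical) sketch: both masters carry the
same zone with the same serial, but each answers with the address that
rides its own ISP's block:

  ; db.example.com as loaded on ns1 (reached via ISP A)
  www    600  IN A  192.0.2.80

  ; db.example.com as loaded on ns2 (reached via ISP B)
  www    600  IN A  198.51.100.80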

It gets even better - recursing nameservers have the habit of locking on to
nameservers that respond quickest. So you even get some load-balancing
awareness.

We operate nameservers in the US and in Europe, and we definitely see this
effect.

Regards,

bert

Date: Sat, 6 Oct 2001 19:17:39 +0200
From: bert hubert <ahu@ds9a.nl>

(top-posting due to length of original post)

Alas, the "after your TTL expires" is a killer. I don't want to
resurrect a thread that has been covered in the past couple of
months, but DNS just doesn't cut it for failover. Furthermore,
fast DNS response != fast HTTP response.

{Swamp space|non-Verio filtering policies} and BGP are the way to
approach this. For redundant DNS at a single site, IP and MAC
takeover are what one wants.
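
For the single-site case, the takeover can be as crude as the surviving
box claiming the service address and re-announcing it. A hypothetical
sketch with invented addresses, using iproute2 and iputils:

  # claim the dead nameserver's service IP...
  ip addr add 192.0.2.53/24 dev eth0
  # ...and send gratuitous ARP so neighbors update their caches
  arping -U -I eth0 -c 3 192.0.2.53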

All IMHO.

Eddy

> The really neat thing is that you can do this with any nameserver. Install
> N nameservers and connect each of them to one of your ISPs. These
> nameservers are all masters, and all contain different data.

If you have several nameservers all pretending to be masters for some zone
but offering different responses based on IP locality, predictive performance,
or any other criteria, then the name for this is: "broken."

Hrmm, no, that is called "Akamai", isn't it? :-)

... still collecting Heirloom-Quality Specimens of Pre-Owned Mirror-Image
Cache Routers for our Museum of Internet Treasures, drop us a line if you
still have any available....

Mary Grace
mary@ms.edu

> > > The really neat thing is that you can do this with any nameserver. Install
> > > N nameservers and connect each of them to one of your ISPs. These
> > > nameservers are all masters, and all contain different data.
> >
> > If you have several nameservers all pretending to be masters for some zone
> > but offering different responses based on IP locality, predictive performance,
>
> Hrmm, no, that is called "Akamai", isn't it? :-)

I wouldn't presume to speak about Akamai's methods. Perhaps Avi's listening.

> ... still collecting Heirloom-Quality Specimens of Pre-Owned Mirror-Image
> Cache Routers for our Museum of Internet Treasures, drop us a line if you
> still have any available....

How many have you located so far?

Better tell that to those that have built successful businesses on what
you call "broken."

Be aware that this is not my business model, by the way. Most people wanting
'global server load balancing' appear to buy Alteon Acedirectors.

Is 'behaviour not intended by the original rfc' your definition of 'broken'?
Many important features of today's internet violate at least the spirit of
previous standards, but are able to ride piggyback.

But the main question is, if this is "broken.", please elaborate on what
exactly "breaks."

Regards,

bert

I suppose you don't do split-horizon DNS, then?

Greetz, Peter

> > Hrmm, no, that is called "Akamai", isn't it? :-)

> I wouldn't presume to speak about Akamai's methods. Perhaps Avi's listening.

That's OK, he is one of the champions who is trying to fix the problems the
MIT guys created.

> > ... still collecting Heirloom-Quality Specimens of Pre-Owned Mirror-Image
> > Cache Routers for our Museum of Internet Treasures, drop us a line if you
> > still have any available....

> How many have you located so far?

If you are saying that you would like to contribute a specimen, especially
a particularly collectible early 1997 version with BSDlite 4.4 source prior
to the fixes for transparency of ecommerce traffic, please write me at
mary@ms.edu and our non-profit will be glad to respond!

We even have an original mid-70s Imsai/Altair 8800, as well as a PDP8a with
paper punch tape circa 1967, and even one of the first personal computers,
a circa 1970 TI with 16 mini toggle switches for clocking in programs and a
whopping 500 bytes of memory. In addition to the standard pre-LED grain of
wheat bulbs output panel, it had an option for a Hollerith card reader
input. Not many people then who had any preference at all about getting a
life were punching out decks of cards for personal use. Good old Dancing
Man Jim Treybig gave us a circa 1984 Tandem NonStop, to go with our "Joe
Boyd" Harris 1000 with VOS from the same failed Tandem satellite program
"Infosat" that Tandem and Harris partnered on in 1985 in Santa Clara.

Your cache router was one of the first in its field, too. The history of
the internet is indeed richer for it.

Paul Vixie <vixie@vix.com> writes:

> > Hrmm, no, that is called "Akamai", isn't it? :-)

> I wouldn't presume to speak about Akamai's methods. Perhaps Avi's listening.

It's reasonable to assume that Akamai is deterministic, and is not
dependent on the "Jeopardy Method" (who's-quicker-on-the-buzzer) that
Bert Hubert described in an earlier message. Stating more than that is
above my pay grade.

If you insist on deploying the Jeopardy Method, please remember to
phrase your reply to a DNS query in the form of a question:

rs@bifrost [2] % dig @192.148.252.10 19.252.148.192.in-addr.arpa. any
...
;; ANSWER SECTION:
19.252.148.192.in-addr.arpa. 5M IN PTR what.is.dhcp-19.seastrom.com.
...

                                        ---Rob

There obviously is a need for an 'official' method to do global load
balancing using DNS. Let's face it, people are doing it now on a not
so large scale, but that is rapidly changing because of the introduction
of both hardware and software solutions that (mis)use DNS to overcome
its current limitations.

I'm not very interested in the discussion of why this behaviour would be
broken. It's far more interesting to talk about improving DNS so that
there will be room for things like load balancing or dynamic DNS. In
such a way that people will not start screaming when they see TTLs of
30 seconds or non-linear behaviour of load balancers.

Regards,

Stefan

Date: Sun, 7 Oct 2001 02:14:27 +0200
From: Stefan Arentz <stefan.arentz@soze.com>

[ snip ]

> I'm not very interested in the discussion of why this behaviour
> would be broken. It's far more interesting to talk about
> improving DNS so that there will be room for things like load
> balancing or dynamic DNS. In such a way that people will not
> start screaming when they see TTLs of 30 seconds or non-linear
> behaviour of load balancers.

Note: "Context-defined new terms" in double quotes

How probable would a different A RR be on the "next" query?
Perhaps we should look at BGP or other routing protocols as a
starting point... a failover-ready NS could negotiate "I tell you
when the A RR changes iff it happens within TTL[1]" behavior with
the far end -- useful for failover, but not load-balancing. Of
course, because DNS traffic is multihop, endpoint authentication
is more of an issue than with BGP.

[1] Not necessarily the same TTL as current DNS uses

Of course, the drawback with this approach is deployment: Look
at the reluctance of MCSE monkeys and |<0b4lt |<1dd13z^W^W^W^W^W
some net admins to patch critical bugs. Do you think that
they'll upgrade things at the edge to support a non-critical cool
new feature? Not likely. The onus for correct operation is on
the hosting provider.

Are we talking about dynamic balancing within a single site, or
across multiple locations? If the former, why not use gear a la
Extreme, thus 1) conserving IP space and 2) remaining transparent
to the outside world?

If distributing across multiple sites, one can use BGP to advertise
the same subnet from different points... let routing protocols
route, and DNS give the same answer all the time. (Damn those
filters!) Ideally, the routing protocols could shunt "excess
traffic" from a "heavily loaded" site to a "lightly loaded" one.

Load balancing across multiple sites gets uglier. Either we have
incredibly short TTLs (sorry, AOL users[2]) or we need something
else. Perhaps storing multiple routes (whoops, more route memory)
and using some sort of ECN?

[2] I personally find it tempting to say, 'screw anyone who uses
looong TTLs with flagrant disregard to the authoritative host's
wishes'... allowing bad behavior to become a de facto standard by
virtue of customer base is _not_ sound engineering.

All-pairs-shortest-paths gives nice info... until you look at
scalability, or the lack thereof. O(N^4) cpu time[3] and a few
times as much RAM? Ughh.

[3] IIRC, O(N^3 * log(N)) is do-able. However, standard APSP
does not record paths... only path lengths. Minus two points for
chewing up even more CPU.

I guess that the big questions would be:

1. How often do changes occur?
2. How sparse are "rapidly changing" values wrt the entire graph?
3. Distribution across multiple sites?
4. What do we leave up to DNS?
5. What do we leave up to routing?

If heavily enough distributed, congestion should be highly
localized... if present at all. Let's assume that a "basic
server" can service 10 Mbps of traffic. Install servers in
various points, a la Akamai... do any traffic sinks here ever
manage to overload your Akamai boxen? If so, how often compared
to a random traffic source?

Whatever we do, we must keep state as far from the core as
possible. State in core baaad.

I've rambled enough. CTRL+X with no further edits.

Eddy

> Is 'behaviour not intended by the original rfc' your definition of 'broken'?

Nope.

> But the main question is, if this is "broken.", please elaborate on what
> exactly "breaks."

I take it that unless I can point to some specific situation in which some
specific application or user community is negatively impacted by this,
you'll go on assuming that this deviant behaviour is merely an exercise in
creativity.

In that case, discussion would be pointless.

If that's not the case, though, consider that a correct implementation of
DNS would be within its rights to take note of the "same serial number but
incoherent answers" condition and declare the zone unreachable. I'm not
saying that BIND will ever do this (nor that it won't!), but I will say
that I wish it had done this all along, since this is far from the only
erroneous configuration that such logic would have detected. It's possible
that DNSSEC, if deployed, will cause these erroneous configurations to be
detected and properly dealt with.

If you're doing something that a correct implementation (which merely happens
not to exist yet) could correctly treat as an error, then what you're doing
fits my definition of "broken."

If you think that the protocol specification is simply out of date and needs
to take account of this kind of intentional incoherence, then you are welcome
to try updating it.

DNS is a distributed, autonomous, reliable, coherent database -- not a
mapping service.

DNS is about fact, not value -- it's about mechanism, not policy.

No matter how you slice it, intentionally incoherent DNS zones are "broken."