Traffic Engineering (fwd)

See, the really neat thing about the 'net is that it *removes* geographical
locality as a barrier.

People have interests, very specific interests. The people interested in
following alt.barney.die.die.die are geographically dispersed, but the
Internet brings them together in a virtual community.

Search engines, as primitive as they are now, make it much easier to find
whatever specific item you're looking for, and odds are overwhelming that
it's not on your neighbor's server.

So perhaps what we need is a way for search engines to determine what's
"close" - geographically, politically, or speed-wise. This isn't particularly
easy to do, but if it were implemented and worked only, say, 15% of the time,
it'd still make things look that much faster.

Idea: what about a search engine that understands a BGP table? I'm thinking
that something like Hotbot, which returns search results with several places
to find the same page, goes through a process like this:

1) perform the query.
2) if your query returns multiple places to get the same page
    a) look at the AS_PATH for the querying IP address
    b) look at the AS_PATHs for the found pages
    c) Determine and return the "closest" one - perhaps the one
        whose AS_PATH is most like that of the querying host.
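The closeness heuristic in step (c) could be sketched roughly like this. This is a toy, not a workable implementation: it assumes you can already map the querying client and each mirror to an AS_PATH from the search engine's own BGP view, and all AS numbers and URLs in it are made up.

```python
# Toy sketch of step (c): rank mirrors of the same page by how much
# of their AS_PATH (as seen from the search engine's BGP table)
# overlaps the path toward the querying client. The longer the
# shared leading run of ASes, the more the mirror lies "in the same
# direction" as the client. All AS numbers here are made up.

def shared_prefix_len(path_a, path_b):
    """Number of leading ASes the two AS_PATHs have in common."""
    n = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        n += 1
    return n

def closest_mirror(client_path, mirror_paths):
    """Pick the mirror whose AS_PATH best matches the client's.

    mirror_paths: dict mapping mirror URL -> AS_PATH tuple.
    """
    return max(mirror_paths,
               key=lambda url: shared_prefix_len(client_path,
                                                 mirror_paths[url]))
```

Note that this is exactly the step Sean's reply below pokes holes in: aggregation and single-view propagation mean the AS_PATHs you see may not reflect the topology the client actually traverses.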

This is a bit rough (off the top of my head, first thing in the morning), but
you could do a bunch with it. Search engines that trade off search speed
against retrieval speed, for example, come to mind.

Anybody out there have any spare venture capital? :-)

eric

osborne@terra.net writes:

> So perhaps what we need is a way for search engines to determine what's
> "close" - geographically, politically, or speed-wise. This isn't particularly
> easy to do, but if it were implemented and worked only, say, 15% of the time,
> it'd still make things look that much faster.

How do you plan to accumulate a priori knowledge of
distant topology and connectivity using current routing
protocols and the current transport addressing scheme?

> Idea: what about a search engine that understands a BGP
> table?

Whose BGP table? Remember that you want to determine what
is most local to the client or its proxies.

> 1) perform the query.
> 2) if your query returns multiple places to get the same page
>     a) look at the AS_PATH for the querying IP address
>     b) look at the AS_PATHs for the found pages
>     c) Determine and return the "closest" one - perhaps the one
>         whose AS_PATH is most like that of the querying host.

(c) is full of landmines thanks to such nifty things as
aggregation, the single-view propagation feature,
deliberately non-unique addresses, and moment-to-moment
change and instability of intermediate topology.

> Anybody out there have any spare venture capital? :-)

Since you are trying to get it to work correctly with an
addressing scheme which only very weakly encodes
topological information, the lossy DV approach to
propagating routing information (as opposed to a
map-exchanging scheme), three huge churny databases
(the mapping of information to URL, the mapping of
hostname to IP address and the mapping of IP addresses to
paths), and attempting to come up with a nonexistent
database or workable heuristics (the mapping of n observed
paths to a graph of connectivity among m endpoints), I
would say that you need the level of funding you could
only raise from such a lucrative business as the Psychic
Friends Network.

Meanwhile, I suggest you look at Dave Clark's distributed
database work (I think I remember Van Jacobson commenting
in more detail than his "How to Kill the Internet"
viewgraphs on how to apply this to the WWW). Rather than
a database which centralizes searches over a weak data
architecture, a better architecture, one which treats
every reference into it as a search for the most local
copy, would be a more promising development direction.

Note that since this seems to be possible through feature
accretion upon the current practice of aggressive
interception of WWW queries, you probably want to think
about whether time-to-market issues lead you into
developing on that type of platform. (Several people
reading this message are heavily into researching that
sort of thing already, btw.)

  Sean.

> Since you are trying to get it to work correctly with an
> addressing scheme which only very weakly encodes
> topological information, the lossy DV approach to
> propagating routing information (as opposed to a
> map-exchanging scheme),

DV = Distance Vector; BGP is a path-vector variant of this, carrying full AS paths

For more info on maps check out the big-internet archives for April
ftp://munnari.oz.au/big-internet/list-archive/1997-04-Apr

> Note that since this seems to be possible through feature
> accretion upon the current practice of aggressive
> interception of WWW queries,

This is essentially what a Squid http proxy cache does. And now companies
like Mirror Image and Cisco are coming out with transparent proxy caches
that intercept port 80 traffic, so this technology is likely to become more
widespread even in North America. If only those vendors would make their
software compatible with Squid's parent/sibling protocol (ICP) for sharing
cache contents, then it would be even easier to offload a significant amount
of web traffic onto caching proxies.
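For reference, the parent/sibling arrangement is declared in squid.conf with `cache_peer` lines (directive names vary across Squid versions; the hostnames below are placeholders). A minimal sketch:

```
# Hypothetical squid.conf fragment (hostnames are placeholders).
# Declares one parent cache and one sibling. On a local miss, Squid
# queries both peers over ICP (port 3130) and fetches from whichever
# already holds the object; objects nobody holds are fetched through
# the parent.
cache_peer parent-cache.example.net  parent  3128 3130
cache_peer sibling-cache.example.net sibling 3128 3130
```

The sibling relationship only shares hits; it is the parent that forwards misses onward, which is what makes the hierarchy offload upstream links.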