A call for the future. Was: Re: Verio Decides what parts of the internet to drop

Around 07:22 AM 12/8/1999 -0800, rumor has it that Randy Bush said:

>> The phone system doesn't require anything close to millions of routes for
>> LNP. Instead, at the time of call setup, there is a lookup that performs
>> the translation between the portable number (which is the logical address)
>> and the physical address (which to date is still mostly statically
>> routed using a well-defined hierarchy based upon physical location).

> and here is where the analogy breaks down. a second or two of call setup
> may be acceptable for establishing a phone call. it would be a disaster
> on a per-packet basis.
>
> ip is a connectionless protocol. before hitting the reply key, think about
> that.

I thought about it. It seems to me that a router is not presented with a stream of randomly addressed packets. In any time frame, there are going to be from 1 to many hundreds of packets between the same set of addresses.

Cisco takes advantage of this with a route cache. When I was a junior monkey, one of my first tickets involved the route cache ACL bug in IOS 8.0, where routes added to the route cache were no longer checked against the appropriate ACLs. Of course, I didn't know immediately that this was the real problem. It just seemed to work after I pinged (something permitted by the ACL), and traffic that should have been denied by the ACL would continue to work for a while afterward.

So, while the first packet may result in a longer lookup, successive packets hit the cached route. The problem then becomes building an adequately sized route cache for the number of simultaneous 'connections' one might expect to process in a given time period. Which brings one back to terms and design rules that are not unlike those that apply to phone switches. The thing that is tough to grow is the access list size.
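To make the idea concrete, here is a minimal sketch of that kind of
destination-keyed cache sitting in front of a full-table lookup. This is
pure illustration, nothing like IOS internals; the prefixes and next-hop
names are made up:

import ipaddress

# Hypothetical full table: a tiny FIB; longest-prefix match is the slow path.
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "next-hop-A",
    ipaddress.ip_network("10.1.0.0/16"): "next-hop-B",
    ipaddress.ip_network("0.0.0.0/0"): "next-hop-default",
}

route_cache = {}  # destination address -> next hop (the fast path)

def full_lookup(dst):
    # Slow path: longest-prefix match across the whole table.
    addr = ipaddress.ip_address(dst)
    best = max((net for net in FIB if addr in net), key=lambda n: n.prefixlen)
    return FIB[best]

def forward(dst):
    # Fast path: per-destination cache, populated on first miss.
    hop = route_cache.get(dst)
    if hop is None:
        hop = full_lookup(dst)   # first packet pays for the long lookup
        route_cache[dst] = hop   # later packets to the same dst hit the cache
    return hop

print(forward("10.1.2.3"))  # miss: full lookup, result cached
print(forward("10.1.2.3"))  # hit: cached route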

The connectionless part of IP is just a matter of whether state is maintained in the protocol or at the endpoints. An ISDN connection has a state at the endpoints and all the processing points. All the switches along the path must maintain that state. An ISDN packet in the middle of a stream does not carry enough information to recreate the state. If you miss the setup message, you can't figure out the state from the rest of the packets. By contrast, an IP packet traversing an IP network carries all the state it needs with it. However, that doesn't mean that you need to start from scratch each time you see the same src/dest pair. Methods which hold some additional state in the router for faster processing can be used to speed things up.

The full route table could be very large. Much larger than 256Meg or even 4Gig. And even moving to disk backed storage (many gigs) most likely means that access would still be in tens or hundreds of milliseconds, not seconds. However, any given router doesn't really need to use very much of it at any given time.

The problem, as I see it, is that Cisco sees everything as a router, and a typically low-powered router at that, compared to mid- and high-range Unix servers. Consider the Cisco H.323 (VoIP) product line. Clearly, the 5300 makes sense as a router: it processes 4 PRI or T1 CAS voice lines into H.323 on IP, with the consequent DSP chips. The 2600 & 3600 make sense as lower-density DSP platforms. Clearly, translating G.711 to G.729 or other codecs requires specialized hardware.

However, the gatekeeper (call routing) software runs on a 3600. This is a grossly underpowered platform for gatekeeper functions, and underfeatured. Gatekeeper functions belong on a general-purpose machine with modular (user-replaceable) software and access to a database which can handle complex and quickly changing route policies (e.g., time-of-day routing, follow-me routing). One desires a machine which can scale to handle hundreds or thousands of transactions per second (between a Sun Ultra 2 and an SGI Challenge with 64 processors). None of these functions are particularly suited to specialized routing hardware. Yet Cisco insists that (nearly) everything must run on IOS. The best alternative right now runs on NT, with its lack of scalability and other PC problems. But the NT platform runs circles, triangles, squares, and other complex polyhedrons around the 3600.

Policy-based routing in either a voice or an IP network requires flexibility which can't be found in a specialized hardware platform. The problem is the wrong software & hardware combination to do the job, not that the route table has grown too large, or that the job just can't be done.

    --Dean

Dean, this is where you go wrong.

  You seem to be rambling across a variety of topics far removed from the
problem.

  Problem: people trying to announce improper classful (or classless)
address blocks out of the "classical classful space".

  The people to whom a large number of the initial /16s were assigned
were primarily universities and the larger businesses.

  These days, those who obtained large quantities of this space and
have no use for it see a chance to make a profit off of it by "leasing"
it to another party, or through the outright sale of the company, or of some
semi-existent division of the company, with the address space going along
with the sale.

  All I have to say is "Let the buyer beware".

  Not everyone uses the same filtering policies, but the majority of
the tier-1 providers do. You are not in the position of supporting this, nor
of running these types of environments in your day-to-day operations.

  Granted, adding memory and CPU, along with a non-caching routing
solution (such as CEF, or any type of '100% cache hit' routing system),
will make packet forwarding faster.

  The problem is that we do not have direct control over what the
vendors hand us in the backbone environment.

  If I wish to take an OC12 and do policy-based routing on EVERY
packet that goes across it, it is not possible with the CPU power that
is put on these boxes, and these are what the majority of people are
using these days.

  On the smaller DS(n)-sized networks it is more feasible, because
of the current ability to distribute the load and the CPU power available
per packet, but once you get into a realistic backbone environment,
you really cannot perform the type of routing you are suggesting.

  Even with this state information, if I were a router and spent
all my time comparing both the src and dst within every packet, I would more
than double my forwarding time and cause undue added latency.

  Now, given that there are ~4.3B IP addresses, and given memory
consumption of, let's say, 512 bytes per IP, you're talking about ~2.2TB
of memory if you were to allow every 32-bit IP to be routed
separately.
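  For reference, the back-of-the-envelope arithmetic behind that figure
(the 512 bytes per address is an assumption, not a measured number):

# Back-of-the-envelope check of the per-IP memory figure above.
addresses = 2 ** 32              # every 32-bit IPv4 address, ~4.3 billion
bytes_per_entry = 512            # assumed per-address state, as in this post
total_bytes = addresses * bytes_per_entry
print(total_bytes)               # 2,199,023,255,552 bytes
print(total_bytes / 10 ** 12)    # ~2.2 TB (exactly 2 TiB)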

  This is not possible. This is why aggregation of routes, and
tactics like it, are useful: looking at it from
a common-sense point of view, we are saving ourselves a fair amount
of money instead of asking for a 7200/GSR/M20/M40 that can route
each IP separately and on a totally different path.

  The cost of such a box would be truly prohibitive.

  - jared

>ip is a connectionless protocol. before hitting the reply key, think about
>that.

> I thought about it. It seems to me that a router is not presented with a
> stream of randomly addressed packets. In any time frame, there are going
> to be from 1 to many hundreds of packets between the same set of
> addresses.

This is a major fallacy. Many promising local ISPs learned this
when they were smaller and more local. I am sure dkatz will tell you
of his experiences with trying to get caching algorithms to work well with
core network flows. In short, core network flows cause so much churn
in cache memory that the working set of the cache has to approach the size of
the entire FIB before you get adequate performance.

Caching does NOT work in the context of TCP flows at the core level. Period.
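If you want a toy demonstration of the churn argument: an LRU route cache
does fine on an edge-like workload with a few repeated destinations and
collapses on a core-like workload with many distinct destinations. The cache
size and traffic mix below are invented purely for illustration:

import random
from collections import OrderedDict

def hit_rate(destinations, cache_size):
    # Simple LRU cache: OrderedDict keyed by destination.
    cache, hits = OrderedDict(), 0
    for dst in destinations:
        if dst in cache:
            hits += 1
            cache.move_to_end(dst)          # refresh LRU position
        else:
            cache[dst] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(destinations)

random.seed(1)
edge_like = [random.randrange(500) for _ in range(100_000)]      # few hosts, many repeats
core_like = [random.randrange(500_000) for _ in range(100_000)]  # many distinct flows

print("edge-like hit rate:", hit_rate(edge_like, cache_size=1000))   # close to 1.0
print("core-like hit rate:", hit_rate(core_like, cache_size=1000))   # close to 0.0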

> the packets. By contrast, an IP packet traversing an IP network carries
> all the state it needs with it. However, that doesn't mean that you
> need to start from scratch each time you see the same src/dest pair.
> Methods which hold some additional state in the router for faster
> processing can be used to speed things up.

Once again, at the core of promising local ISPs, this does NOT work.

> The full route table could be very large. Much larger than 256Meg or
> even 4Gig. And even moving to disk backed storage (many gigs) most
> likely means that access would still be in tens or hundreds of
> milliseconds, not seconds. However, any given router doesn't really
> need to use very much of it at any given time.

Full route table size is not a problem. You can burn a hard disk as you
mentioned to store it. The issue is getting data in and out of the
processor, i.e. the number of pins. Core flows are not amenable to caching.
This approach will fail the first time you see a new packet and need to
swap from hard disk.

/vijay

To: Dean Anderson <dean@av8.com>

why bother? procmail is your friend.

randy

Not that it would be very economical, but what are the technical
implications of using a solid-state device (such as Quantum's RUSHMORE NTE
series) instead of a normal hard drive?

-- Tim

> Full route table size is not a problem. You can burn a hard disk as you
> mentioned to store it. The issue is getting data in and out of the
> processor, i.e. the number of pins. Core flows are not amenable to caching.
> This approach will fail the first time you see a new packet and need to
> swap from hard disk.

> Not that it would be very economical, but what are the technical
> implications of using a solid-state device (such as Quantum's
> RUSHMORE NTE series) instead of a normal hard drive?

Interesting question... even though it's significantly faster than a hard
drive, it does have some inherent bottlenecks, such as a maximum number of
operations per second, which might be a little stifling on a backbone core
router :-) Still, I've never actually tried *that*, so I don't know for sure.

There's also latency in other areas - that leads me to think that regular
memory is still faster. Finally, it would be attached to the host system
through a bus (SCSI, whatever) that's a lot slower than the internal
memory bus.

These kinds of devices tend to be a better fit for systems that don't
have extreme time limitations on processing data, such as mail spool
files, etc.

reality check:

if you had 100TB of on-ASIC SRAM you would still be screwed. you can't
afford the PER-PACKET LATENCY of a telco number-portability-style REFERRAL.

once again: ip is a connectionless protocol. each packet is potentially a
new route.

telcos don't mind a second or two in call setup, because it is CALL SETUP,
not 42 times a second.

[ credit scott bradner for making this quite clear even to me in some ietf
bar ]

randy

> Now, given that there are ~4.3B IP addresses, and given memory
> consumption of, let's say, 512 bytes per IP, you're talking about ~2.2TB
> of memory if you were to allow every 32-bit IP to be routed
> separately.

2.2 terabytes...wow. but think about it this way: if you did manage
to build such a beast, a route cache would not be needed, as route
lookups would all be O(1). :-)

actually...if you *did* manage such a thing, you could get the memory
usage down a lot, since you'd just need to store the next hop for each ip
address, and not all the linked-list overhead and such.

Tim Wolfe <tim@clipper.net> writes:

> Cisco takes advantage of this with a route cache.

And, after much machination, Cisco replaced the route cache with a full forwarding table.

Caching doesn't work in the core. Really.

Tony