too many routes

The AGS+ could only handle 16 MB of memory. The CPU in an AGS+ is the same as
in a 7000 series (a Motorola 68040). As of a year ago, I believe I heard that
Sprint still had AGS+'s in their backbone and was upgrading them to 7000
series equipment.

-- Jason
Jason Vanick ------------------------------------------ jvanick@megsinet.net
Network Operations Manager V: 312-245-9015
MegsInet, Inc. 225 West Ohio St. Suite #400 Chicago, IL 60610

Having hopelessly screwed up my facts ... I was trying to make a point here.
So the router was worse than I thought. Retaining policies that exclude
new players because of the AGS+'s inability to handle large routing flaps
just does not cut it.

Sprint imposed this at a time when 7000s with 64M of memory were available.
Will /19 remain policy when the majors are running with Cisco 12000 and GRFs?

The CPU issue has more to do with changes in the routing table than with its
size. Aggregation is good because, if properly implemented, it reduces
route flaps. If aggregation is the goal, then mechanisms should be developed
for exchanging CIDR blocks so the address space can be re-packed.

The /19 policy is archaic. It creates obstacles and only partly resolves
the problem. Fixing holes in CIDR blocks, exchanging fragmented blocks for contiguous
blocks, and cleaning up "The Swamp" can do more for the stability and size of
the routing table.
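
As a rough illustration of the kind of re-packing being suggested (not a
description of any registry's actual mechanism), Python's standard ipaddress
module shows how fragmented announcements collapse once the fragments are
contiguous, and why scattered ones cannot:

import ipaddress

# Four contiguous /19s (hypothetical prefixes) can be announced as one /17.
fragments = [
    ipaddress.ip_network("10.0.0.0/19"),
    ipaddress.ip_network("10.0.32.0/19"),
    ipaddress.ip_network("10.0.64.0/19"),
    ipaddress.ip_network("10.0.96.0/19"),
]
print(list(ipaddress.collapse_addresses(fragments)))
# -> [IPv4Network('10.0.0.0/17')]

# Scattered fragments cannot be merged, which is why exchanging them for
# contiguous blocks matters.
scattered = [
    ipaddress.ip_network("10.0.0.0/19"),
    ipaddress.ip_network("10.8.0.0/19"),
]
print(list(ipaddress.collapse_addresses(scattered)))
# -> two separate prefixes; no aggregation possible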

BTW - If you use a route server to do the dampening and calculation of peer
routes, you can even make a wimpy-CPUed 7000 handle backbone traffic.

Having hopelessly screwed up my facts ... I was trying to make a point here.
So the router was worse than I thought. Retaining policies that exclude
new players because of the AGS+'s inability to handle large routing flaps
just does not cut it.

Sprint imposed this at a time when 7000s with 64M of memory were available.
Will /19 remain policy when the majors are running with Cisco 12000 and GRFs?

Lots of stuff cut out.

BTW - If you use a route server to do the dampening and calculation of peer
routes, you can even make a wimpy-CPUed 7000 handle backbone traffic.

This is what I believe Sprint is doing. They are using the new Cisco 12000
GSR with external route servers. It is a smart way of patching the
problem. If you need more CPU or memory, you can just add a bigger box and
more RAM.

Nathan Stratton President, CTO, NetRail, Inc.

When the class A's are chopped up into CIDR blocks, the number of routes
in the backbone routing tables will dramatically increase, even if all of
them are /19 or larger. Saying that instead of filtering routes Sprint
should have just waited for everyone to clean up the routes as you suggest
seems silly -- they did what they had to do to keep their network
operational. They can't afford to rely on other people cleaning up their
mess.
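
To put a rough number on that claim (the arithmetic is mine, purely
illustrative): even if a former class A is carved up into nothing smaller
than /19s, a single /8 can become thousands of table entries.

# Back-of-the-envelope: how many /19 announcements can one former
# class A (/8) generate if it is fully carved up?
old_prefix, new_prefix = 8, 19
print(2 ** (new_prefix - old_prefix))  # 2048 -- one aggregate, up to 2048 routes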

I am not sure what policies you think should be there that aren't -- we are
renumbering existing blocks into our own /18, and will be returning those
blocks to our upstream providers. They get the benefit of having their
address space back, our aggregates drop by half, and we get the benefit of
the full use of our multiple connections. I don't see a problem with that,
and there was no difficulty getting the address space once we told them what
we were going to do.

John Tamplin Traveller Information Services
jat@Traveller.COM 2104 West Ferry Way
205/883-4233x7007 Huntsville, AL 35801

"Joseph T. Klein" <jtk@titania.net> writes:

Having hopelessly screwed up my facts ... I was trying to make a point here.
So the router was worse than I thought. Retaining policies that exclude
new players because of the AGS+'s inability to handle large routing flaps
just does not cut it.

Sprint imposed this at a time when 7000s with 64M of memory were available.
Will /19 remain policy when the majors are running with Cisco 12000 and GRFs?

I am not sure what point you are trying to make here.

If you feel like grinding this axe again, I am more than
willing to play in my fleeting spare moments.

An AGS+ with a CSC/4 has exactly the same CPU as a 7000
with an RP. In fact, AGS+ performance is slightly higher
in some cases because of some interesting design features
of the 7000 and the other AGS+ downgrade path routers
(notably the 7500 series).

I did not put in a filter on /18s (yes it was /18s
initially, and got changed to /19s after much discussion
with the registries, especially Daniel Karrenberg at RIPE,
in an attempt to harmonize Sprint's filters with
slow-start allocation policies) because of the AGS+
difficulties; all the routers that were carrying full
routing at the time had 64Mbytes of RAM, and the two
remaining AGS+es were there to implement historical things
done principally for ICM (like a STUN connection and the
PANAMSAT router).

What triggered the filter was the observation that the
blocks freshly allocated by all three registries were
very poorly aggregated. More annoyingly, those allocated
to Sprint's principal peers (most notably Internet MCI)
demonstrated the worst aggregation; in one case a /14 was
announced almost exclusively as prefixes no shorter than
19 bits.

After spending some time trying to chase this down -- with
some success, as in the case of PSI's then newest blocks,
but not in the case of Internet MCI, who did nothing -- I
decided to issue a warning that once the /8 that the
InterNIC was using had filled up (and after some
discussion, once RIPE and APNIC proceeded to allocate from
new /8s), I would begin filtering all new unicast
addresses to ignore Sprintward announcements of any prefix
longer than 18 bits. Moreover, I also announced that I
would filter out any subnets of historically classful As
and Bs.
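
For readers unfamiliar with what such a filter amounts to, here is a
minimal sketch of the acceptance test being described. It is a
reconstruction for illustration only, not Sprint's actual configuration;
in particular, the real filter keyed on specific newly allocated /8s and
left the pre-CIDR swamp alone, which the sketch glosses over.

import ipaddress

MAX_NEW_SPACE_PREFIXLEN = 18  # later relaxed to 19 bits

def accept(prefix: str) -> bool:
    """Accept or reject an inbound announcement, roughly as described:
    long prefixes out of new space are dropped, as are subnets of
    historically classful A and B space."""
    net = ipaddress.ip_network(prefix)
    first_octet = int(net.network_address) >> 24

    if first_octet < 128:            # historically class A space
        return net.prefixlen <= 8    # whole As (or shorter) only
    if first_octet < 192:            # historically class B space
        return net.prefixlen <= 16   # whole Bs (or shorter) only

    # Treat everything else as "new" space for the purpose of the sketch.
    return net.prefixlen <= MAX_NEW_SPACE_PREFIXLEN

print(accept("10.1.0.0/16"))    # False: a subnet of a classful A
print(accept("172.16.0.0/16"))  # True: a whole classful B
print(accept("207.8.64.0/20"))  # False: longer than /18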

The warning was several months old when people started
noticing that they couldn't reach things behind
Sprintlink, and a lot of time was spent explaining to
people that this shouldn't have surprised them at all.

Some changes happened, notably I dropped down to 19 bits,
the registries began to explain to people that anything
longer than that almost certainly would not be routable,
and that allocation != routing.

This measurably flattened the growth curve of the number
of prefixes seen by default-free routers, changing it from
a nearly exponential function to a linear one, with the
slope below that of Moore's law.

In other words, it probably did as much as the initial
introduction of supernetting as a concept to keep
the Internet scalable while it continued to use the
current set of routing protocols.

If aggregation is the goal then mechanisms should be developed
for exchanging CIDR blocks so the address space can be
re-packed.

It is time for everyone to learn a term that unfortunately
I did not invent: IPv4ever.

NAT and other clever gatewaying effectively provides a
mechanism to extend the address lifetime expectancy not
only of the IPv4 unicast address space in general, but of
any given host in particular.
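
A toy sketch of the kind of translation involved (illustrative only, not
any particular NAT implementation): many inside hosts share one outside
address, so inside numbering can change, or stay private, without the rest
of the Internet ever seeing it.

nat_table = {}       # (inside_ip, inside_port) -> outside source port
next_port = 50000

def translate_outbound(inside_ip, inside_port, outside_ip="192.0.2.1"):
    """Rewrite an inside source address/port to the shared outside one."""
    global next_port
    key = (inside_ip, inside_port)
    if key not in nat_table:
        nat_table[key] = next_port
        next_port += 1
    return outside_ip, nat_table[key]

print(translate_outbound("10.0.0.5", 1025))  # ('192.0.2.1', 50000)
print(translate_outbound("10.0.0.9", 1025))  # ('192.0.2.1', 50001)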

That is, there are now mechanisms which can hide address
changes from hosts that deal with address changes badly,
while at the same time there is increasingly good software
to assist with renumbering hosts.

There are mechanisms evolving which ultimately should lead
to nearly any given unicast subnet of 0/0 being perceived by
everything else as having a different number than things
within that subnet believe. Moreover, there are also
mechanisms evolving which will cause nearly any given unicast
subnet of 0/0 to renumber itself so that all the numbered
entities under that subnet renumber into a different
unicast subnet of 0/0.

This alone should give rise to maximal aggregation, and if
combined with schemes which overload some addresses or
which simply compress sparsely populated large subnets
into densely populated smaller ones, should eliminate a
large percentage of address waste.
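
As a toy illustration of the "compress a sparse block into a dense one"
idea (the example is hypothetical, not a deployed mechanism), the smallest
prefix a renumbered population actually needs is easy to compute:

import math

def smallest_prefix_for(host_count):
    """Shortest IPv4 prefix length whose subnet holds host_count hosts
    plus the network and broadcast addresses."""
    return 32 - math.ceil(math.log2(host_count + 2))

# A sparsely used /16 (65534 usable addresses) holding only 900 live
# hosts could, in principle, be renumbered into a /22.
print(smallest_prefix_for(900))  # -> 22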

In other words, the mechanism(s) you allude to are being
worked on. I would like to see them applied to the swamp
within the next year or two.

The "IP addresses never change within the lifetime of a
session" and "IP addresses are end-to-end" crowds who have
misthought a number of protocols will probably fight tooth
and nail to see this never happen. Mind you, they are
mostly the same people who fought tooth and nail against
the idea of renumbering in the first place, so one can
expect roughly the same type of "discussions".

The /19 policy is archaic. It creates obstacles and
only partly resolves the problem. Fixing holes in CIDR
blocks, exchanging fragmented blocks for contiguous
blocks, and cleaning up "The Swamp" can do more for the
stability and size of the routing table.

If you have an implementation of something that Sprint and
its competitors (who now do precisely the same filtering)
can buy, and that can cause the swamp to be aggregated into
a small handful of prefixes from their point of view, then
I can point you at people who would be happy to sign a
cheque.

The only real problem I saw in the implementation of the
/19 filter was the bad press generated by people who
refused to listen to registries' warnings that long
prefixes probably would not be globally routable, and
possibly the lack of a tariff which would have allowed
people with money to purchase exceptions in the Sprint
filters.

BTW - If you use a route server to do the dampening and calculation of peer
routes, you can even make a wimpy-CPUed 7000 handle backbone traffic.

The wimpy 7000 still has to receive at least one copy of
the NLRI, and process changes into the forwarding
table(s).

As the number of prefixes increases, even if the level of
"background" noise (the rate at which a large set of
prefixes demonstrates instability too mild to be suppressed
even by very aggressive route dampening) were to remain
constant, you require more CPU even in the simple case of
receiving and installing modified forwarding tables. In
the absence of any feedback mechanism that holds down the
total number of globally visible prefixes, the increase in
CPU requirements could easily outstrip Moore's law;
overwhelming even state-of-the-art processors is then
simply a matter of time.
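
A toy model makes the scaling point concrete (the numbers are made up,
purely for illustration): even if the per-prefix "background" flap rate
stays constant, the aggregate update load grows linearly with the size of
the table.

# Assumed per-prefix background rate: one flap per prefix per hour.
background_flap_rate = 1 / 3600.0

for table_size in (45_000, 90_000, 180_000):
    updates_per_second = table_size * background_flap_rate
    print(f"{table_size:>7} prefixes -> {updates_per_second:5.1f} updates/sec")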

Note that the likelihood of keeping up with the economics
of dealing with things which are CPU-bound, ill-suited to
parallel processing, and growing along the same slope as
Moore's law or a slightly greater one is small.

This describes the amount of BGP processing required prior
to the installation of the first prefix-length filters at
Sprint's border routers.

I was always open to suggestions that would accomplish the
same result, and helped push Cisco to develop two of them
(a large cleanup of some of their BGP implementation's
processing and an implementation of something very close
to Curtis Villamizar's route flap dampening algorithm);
however, I have yet to see a suggestion that eliminates
the need for such filters, is readily deployable, and will
keep the slope of processing requirements below that of
processing capability.
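
For concreteness, here is a minimal sketch of the penalty/half-life style
of route flap dampening referred to above. The structure (penalty per
flap, exponential decay, suppress and reuse thresholds) is the standard
one; the constants are illustrative, not the values any particular vendor
ships.

PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_SECS = 900.0   # penalty halves every 15 minutes

class DampenedRoute:
    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_SECS)
        self.last_update = now

    def flap(self, now):
        """Record one withdrawal/re-announcement cycle."""
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty >= SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now):
        """True if the route may currently be advertised/installed."""
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed

route = DampenedRoute()
for t in (0, 60, 120):                # three flaps in two minutes
    route.flap(t)
print(route.usable(130))              # False: suppressed
print(route.usable(130 + 3 * 900))    # True: penalty has decayed below reuse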

I still am, and I believe my successors at Sprint and
like-minded people at other ISPs who implement
prefix-length filtering are too.

Until such a thing emerges, however, I continue to believe
that inbound prefix-length filtering is a good policy that should
be implemented universally.

  Sean.

Nathan Stratton <nathan@netrail.net> writes:

This is what I believe Sprint is doing. They are using the new Cisco 12000
GSR with external route servers. It is a smart way of patching the
problem. If you need more CPU or memory, you can just add a bigger box and
more RAM.

The GSRs are to move bits fast. They are not to act as
route servers.

Sprint recently announced that they are deploying 622Mbps
POS cross-country links, and the GSRs are the only things
you can put on the ends of such beasts that are available
today.

(Someone may point out that it may not be the only choice
for very long, however, it's the only one whose design I
know and have had any influence upon, and is therefore
frankly the only one I trust, although I don't expect the
people building the competing box will ship anything but a
good product even if they are now rather deeply in bed
with the cell-heads and about as transparent to most
observers as the Kremlin during the cold war... The
box being built by someone who actually worked very close
to the Kremlin during the cold war is also something I
don't know enough about, however that is much more due to my
neglect than due to organizational attitude.)

Remember that a current snapshot is not a good indicator
of future health. Sure, the processing demands of routing
are such that the GSR's main CPU is largely idle after
converging with its neighbours; however, there is still a
CPU spike during convergence and the processing load is
non-zero. If you increase the growth curve of the number
of prefixes, you increase the CPU demands during
large-scale convergence. You also statistically increase the
CPU demand after convergence unless you reduce the
probability of any given prefix transitioning from up to
down or vice versa over time.

If you have a growth curve where the processing load
during large-scale convergence and the processing load to
handle background noise increases beyond processing
capacity faster than processing capacity can be increased,
you lose, no matter how crunchy your box is today relative
to today's processing demands.

In other words, removing the feedback mechanisms on the
growth of the number of globally-visible prefixes and on
prefix instability is probably a really bad idea.

  Sean.

... I decided to issue a warning that once the /8 that the
InterNIC was using had filled up (and after some
discussion, once RIPE and APNIC proceeded to allocate from
new /8s), I would begin filtering all new unicast
addresses to ignore Sprintward announcements of any prefix
longer than 18 bits. Moreover, I also announced that I
would filter out any subnets of historically classful As
and Bs.

But you didn't actually do it then.

The warning was several months old when people started
noticing that they couldn't reach things behind
Sprintlink, and a lot of time was spent explaining to
people that this shouldn't have surprised them at all.

And the reason they started noticing it later was because
only then did you actually implement the filters, without
further notice, instead of much earlier. The surprise was
due to the fact that you did it without further notice
and not the fact that you did it.

--Kent

"Kent W. England" <kwe@geo.net> writes:

But you didn't actually do it then.

Ok, true.

And the reason they started noticing it later was because
only then did you actually implement the filters, without
further notice, instead of much earlier. The surprise was
due to the fact that you did it without further notice
and not the fact that you did it.

Hey, people should take notes when I rant.

However you're right and as I said a long time ago it
probably would have been helpful to try to coordinate
things a little better.

On the other hand, the unilateral action completely
eliminated any argument that there was collusion between
Sprint, MCI and the rest, which maybe was a good thing at
the time.

That people still yell about this is sort-of funny.
Do you think people would be screaming less about
allocation policies if I had been doing a count-down
between the final warning and the actual implementation?
(That's an actual question with obvious future operational
impact rather than a rhetorical point. Really.)

  Sean.