I note that the recommended command in that article, "mls cef
maximum-routes ip 1000", will throw most of your IPv6 routes out of
the TCAM instead. If you carry any IPv6 traffic of substance, that just
kills you in the other direction. You might want to try something more
like "mls cef maximum-routes ip 900".
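For a rough feel of that trade-off, here is a minimal Python sketch. It assumes (none of this is stated in the thread) a shared FIB TCAM of 1024K entries in which an IPv4 route costs one entry and an IPv6 route costs two, and it ignores MPLS and multicast, so the results are upper bounds only. The function name and constants are hypothetical and only there for illustration.

```python
# Hypothetical model of a shared FIB TCAM. All sizes are assumptions,
# not values taken from the thread or from any datasheet.
TCAM_ENTRIES_K = 1024        # assumed total capacity, in units of 1K entries
ENTRIES_PER_IPV4_ROUTE = 1   # assumed cost of one IPv4 route
ENTRIES_PER_IPV6_ROUTE = 2   # assumed cost of one IPv6 route


def ipv6_headroom_k(ipv4_alloc_k: int) -> float:
    """Upper bound on IPv6 routes (in K) left after carving out IPv4 space."""
    if not 0 <= ipv4_alloc_k <= TCAM_ENTRIES_K:
        raise ValueError("IPv4 allocation must fit inside the TCAM")
    remaining_k = TCAM_ENTRIES_K - ipv4_alloc_k * ENTRIES_PER_IPV4_ROUTE
    return remaining_k / ENTRIES_PER_IPV6_ROUTE


if __name__ == "__main__":
    # Compare a few candidate values for "mls cef maximum-routes ip <N>".
    for alloc_k in (512, 768, 900, 1000):
        print(f"ip {alloc_k}K -> at most {ipv6_headroom_k(alloc_k):.0f}K IPv6 routes")
```

Under these assumptions, the closer the IPv4 carve-out gets to the full TCAM, the less room is left for everything else, which is the point made above about "ip 1000" versus "ip 900".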
"According to NANOG, and complaints tracker DownDetector, many Internet
providers — including Comcast, Level3, AT&T, Cogent, Sprint, Verizon, and
others — have suffered from serious performance problems at various times
on Tuesday."
While a few of our multi-homed customers had problems with their routers,
we did not see anything in the core. Is this just a ZDNet reporting error?
Unless you guys are miraculously managing to terminate
Nx100G bundles into 6509s with Sup2 or sup3s, I would
be really, really surprised if this even made it on your
radar. Chalk it up to poorly-researched reporting.
And if you *are* handling Nx100G bundles on 6509s,
please contact me off-list, I need to get the details on
your source for magic router pixie dust.
There are/have been multiple fiber provider outages the past two days, but I suspect there's always a fiber cut / outage somewhere.
It made the radar because of the consumer impact. We quickly traced the
issue to customer datacenter routers hitting the 512K limit and worked with
them to correct it. We were surprised (or not really) to see this called a
widespread provider issue. Just checking whether others really had an issue
or whether this was isolated to a few data centers.
We went with 768 - enough time to replace the routers with ASR9010s. It is merely a stop-gap measure to give everyone time to replace their routers in an orderly fashion.
The same people who, knowing the 6509 had this default config issue,
neither replaced the gear nor did the reconfig to buy time *before* the
wall got hit, are going to replace said 6509 in an orderly fashion?
Hank, you gotta learn to wear respiratory apparatus when working near
open containers of magic router pixie dust - that stuff can screw you up
if you inhale it.
512K routes, here we come. Lots of TCAM-based routers suddenly become
really expensive doorstops.
We had a planned outage yesterday at 2300 UTC to perform the operation Hank
mentions. Alas, around 0850 UTC the table went "critical" and we had to
do an emergency reboot. Well, the good part is that all 10G line cards
survived, and we're back in operation. The new routers are bought or
in the investment plan for this year. We just need to wait until it's time
for our vendor's fiscal-year-end race...
Sadly enough:
A: not everyone knew about the issue - there are a large number of
folk running BGP on 65xx and taking full tables who are not plugged
into NANOG / the community. In many cases they are single-homed
enterprise folk, but run BGP anyway (because some consultant set it up,
some employee with clue did it years ago and then left, etc).
B: they *did* know about the issue, but convincing management to spend
the cash to buy hardware that doesn't suck was hard, because
"everything is working fine at the moment" -- some folk needed things
to fail spectacularly to be able to justify shelling out the $$$
(yes, they could recarve the TCAM, but they are using this as an excuse
to get some real gear)...
Am I overly cynical, or does this all work out perfectly for some
vendors? I'm guessing that a certain vendor is going to see a huge
number of orders for new equipment, for an event that could have been
(and was) easily predicted... "Here, buy my widget... and then you'll
come back in a few years and buy another one.. <mwahahahah>".
Yup, folk purchasing these *should* have known (it's not as if there were
no discussions of this), but, well, not everyone spends all day reading
NANOG / RIPE / the CIDR report...
I am not an operator, but I used to be a *really* active routing
engineer once upon a time in the stone age and what really bothers
me is the serious lack of general awareness on the issue of routing
table size, aggregation, and stability, and what effect it has on the
global Internet.
If anyone *seriously* believes that IPv6 will have any positive effect
on this particular issue, you are sorely misinformed. If anything, it
will make the problem worse, since the ability to "get aggregation
wrong" will be much easier.
I'm not being cynical, I'm being a realist. :-/
- ferg
p.s. I recall some IPv6 prefix growth routing projections by Vince
Fuller and Tony Li from several years ago which illustrated this, but
cannot find a reference at the moment....
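To make the aggregation point concrete, here is a small Python sketch using the standard-library ipaddress module. It is only an illustration: the prefixes come from the 2001:db8::/32 documentation range and the helper name is hypothetical. Four contiguous /48s can be announced as a single /46, whereas leaving a gap (or simply announcing the more-specifics) keeps extra entries in everyone's table.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefix strings into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses merges adjacent and overlapping networks of one IP version.
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Four contiguous /48s out of the documentation range (illustrative only).
contiguous = [f"2001:db8:{i:x}::/48" for i in range(4)]

# The same set with one /48 missing, so it cannot collapse to one prefix.
with_gap = ["2001:db8:0::/48", "2001:db8:1::/48", "2001:db8:3::/48"]

if __name__ == "__main__":
    print(len(contiguous), "routes ->", aggregate(contiguous))
    print(len(with_gap), "routes ->", aggregate(with_gap))
```

The sketch says nothing about real-world prefix growth; it only shows how much the table size depends on whether announcements line up on aggregatable boundaries.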
Apologies for replying to my own post, but... below:
I found it:
"Scaling issues with ipv6 routing+multihoming"
Vince Fuller, Cisco Systems
I think the slides [above] were done for an IAB routing workshop in ~2006.
The RAWS workshop report makes for interesting reading, especially with
respect to how things actually turned out now that we're a decade on.
Sadly enough:
A: not everyone knew about the issue - there are a large number of
folk running BGP on 65xx and taking full tables who are not plugged
into NANOG / the community. In many cases they are single-homed
enterprise folk, but run BGP anyway (because some consultant set it up,
some employee with clue did it years ago and then left, etc).
I suspect this is true to some extent. The last NANOG had record attendance
and, if I remember correctly, 300(!!!!) NEW attendees.
Also, Philip Smith is STILL doing the BGP fundamentals tutorials with a full
house every time. Granted, this is mostly around the rest of the world, but
there are new folks coming along all the time, and while many old-timers are
aware of all the historical info on route aggregation, this should be brought
up ad nauseam for new folks. Do enterprise-type educational folks who include
routing tutorials do anything with route aggregation? Just wondering out loud.
B: they *did* know about the issue, but convincing management to spend
the cash to buy hardware that doesn't suck was hard, because
"everything is working fine at the moment" -- some folk needed things
to fail spectacularly to be able to justify shelling out the $$$
(yes, they could recarve the TCAM, but they are using this as an excuse
to get some real gear)...
Oh yeah, I'd bet this is also the case. Just like in 'security' related issues...