So Philip Smith / Geoff Huston's CIDR report becomes worth a good hard look today

Suresh_Ramasubramani · August 12, 2014, 4:10pm

512K routes, here we come. Lots of TCAM based routers suddenly become
really expensive doorstops.

Maybe time to revisit this old 2007 nanog thread?

http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=99870;page=1;sb=post_latest_reply;so=ASC;mh=25;list=nanog

FYI nanog - https://puck.nether.net/pipermail/outages/2014-August/007091.html

[outages] Major outages today, not much info at this time

Teun Vink teun at teun.tv
Tue Aug 12 11:42:05 EDT 2014

Hank_Nussbacher1 · August 12, 2014, 6:02pm

Many don't need to buy anything new. Just follow the instructions here:
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switche$
We did this in the 1st week of June. Problem solved.

-Hank

Hank_Nussbacher1 · August 12, 2014, 6:42pm

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/117712-problemsolution-cat6500-00.html

-Hank

Leo_Bicknell1 · August 12, 2014, 9:55pm

s/Problem solved/Critical limit pushed out long enough to give us a few more years/

William_Herrin · August 12, 2014, 10:10pm

I note that the recommended command in that article, "mls cef
maximum-routes ip 1000", will throw most of your IPv6 routes out of
the TCAM instead. Which if you have any IPv6 traffic of substance just
kills you in the other direction. Might want to try something more
like "mls cef maximum-routes ip 900".

Regards,
Bill Herrin
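
A back-of-the-envelope sketch of the trade-off Bill describes, in Python. The figures are assumptions rather than anything stated in the thread: a 1024K-entry FIB TCAM (as on an XL-class forwarding engine), "K" meaning 1024 entries, one entry per IPv4 route and two entries per IPv6 route. The helper name ipv6_ceiling is made up for illustration, and the result is only an upper bound, since MPLS labels and multicast compete for the same leftover space.

```python
# Rough FIB TCAM budget after raising the IPv4 allocation.
# ASSUMPTIONS (not from the thread): 1024K total entries, "K" = 1024 entries,
# one entry per IPv4 route, two entries per IPv6 route.
TOTAL_ENTRIES_K = 1024
ENTRIES_PER_IPV6_ROUTE = 2


def ipv6_ceiling(ipv4_k: int) -> int:
    """Upper bound on IPv6 routes once ipv4_k * 1024 entries go to IPv4.

    Ignores MPLS labels and multicast, which would shrink the number further.
    """
    if not 0 <= ipv4_k <= TOTAL_ENTRIES_K:
        raise ValueError("IPv4 allocation must fit inside the TCAM")
    leftover_entries = (TOTAL_ENTRIES_K - ipv4_k) * 1024
    return leftover_entries // ENTRIES_PER_IPV6_ROUTE


# Compare the article's suggested value with the more conservative one.
for ipv4_k in (1000, 900):
    print(f"ip {ipv4_k}: at most {ipv6_ceiling(ipv4_k)} IPv6 routes")
```

Under these assumptions the gap between the two settings is large, which is why the exact number deserves a look at your own IPv6 and MPLS usage before you reload.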

Tom_Hill · August 12, 2014, 10:15pm

And if you want any MPLS labels (especially if running 6PE) you might
want to claw that back a bit further.

tl;dr buy new routers next year.

Tom

McElearney_Kevin · August 13, 2014, 12:06am

http://www.zdnet.com/internet-hiccups-today-youre-not-alone-heres-why-7000032566/

"According to NANOG, and complaints tracker DownDetector, many Internet
providers — including Comcast, Level3, AT&T, Cogent, Sprint, Verizon, and
others — have suffered from serious performance problems at various times
on Tuesday."

While we had a few multi-homed customers have problems with their routers,
we did not see anything in the core. Is this just a ZDNET reporting error?

- Kevin

Matthew_Petach2 · August 13, 2014, 12:49am

Unless you guys are miraculously managing to terminate
Nx100G bundles into 6509s with Sup2 or sup3s, I would
be really, really surprised if this even made it on your
radar. Chalk it up to poorly-researched reporting.

And if you *are* handling Nx100G bundles on 6509s,
please contact me off-list, I need to get the details on
your source for magic router pixie dust.

Matt

Jon_Lewis1 · August 13, 2014, 1:34am

http://www.zdnet.com/internet-hiccups-today-youre-not-alone-heres-why-7000032566/

"According to NANOG, and complaints tracker DownDetector, many Internet
providers -- including Comcast, Level3, AT&T, Cogent, Sprint, Verizon, and
others -- have suffered from serious performance problems at various times
on Tuesday."

While we had a few multi-homed customers have problems with their routers,
we did not see anything in the core. Is this just a ZDNET reporting error?

Unless you guys are miraculously managing to terminate
Nx100G bundles into 6509s with Sup2 or sup3s, I would
be really, really surprised if this even made it on your
radar. Chalk it up to poorly-researched reporting.

There are/have been multiple fiber provider outages the past two days, but I suspect there's always a fiber cut / outage somewhere.

And if you *are* handling Nx100G bundles on 6509s,
please contact me off-list, I need to get the details on
your source for magic router pixie dust.

Cisco white papers. Where else?

McElearney_Kevin · August 13, 2014, 1:54am

Unless you guys are miraculously managing to terminate
Nx100G bundles into 6509s with Sup2 or sup3s, I would
be really, really surprised if this even made it on your
radar. Chalk it up to poorly-researched reporting.

And if you *are* handling Nx100G bundles on 6509s,
please contact me off-list, I need to get the details on
your source for magic router pixie dust.

It made the radar with the consumer impact. We traced the issue quickly
to customer datacenter routers/512K and worked with them to correct it. We
were surprised (or not really) to see this called a widespread
provider issue. Just checking whether others really had an issue or whether
this was isolated to a few data centers.

No pixie dust

- Kevin

Hank_Nussbacher1 · August 13, 2014, 5:08am

We went with 768 - enough time to replace the routers with ASR9010s. It is merely a stop-gap measure to give everyone time to replace their routers in an orderly fashion.

-Hank

Valdis_Kletnieks · August 13, 2014, 5:40am

The same people who, knowing the 6509 had this default config issue, and
neither replaced the gear nor did the reconfig to buy time *before* the
wall got hit, are going to replace said 6509 in orderly fashion?

Hank, you gotta learn to wear respiratory apparatus when working near
open containers of magic router pixie dust - that stuff can screw you up
if you inhale it.

Mans_Nilsson · August 13, 2014, 7:10am

512K routes, here we come. Lots of TCAM based routers suddenly become
really expensive doorstops.

We had a planned outage yesterday at 2300 UTC to perform the operation Hank
mentions. Alas, around 0850 UTC the table went "critical" and we had to
do an emergency reboot. Well, the good part is that all 10G line cards
survived, and we're back in operation. The new routers are bought or
in the investment plan for this year. Just need to wait until it's time
for our vendors' fiscal year-end race...

Warren_Kumari · August 13, 2014, 1:52pm

We went with 768 - enough time to replace the routers with ASR9010s. It is
merely a stop-gap measure to give everyone time to replace their routers in
an orderly fashion.

The same people who, knowing the 6509 had this default config issue, and
neither replaced the gear nor did the reconfig to buy time *before* the
wall got hit, are going to replace said 6509 in orderly fashion?

Sadly enough:
A: not everyone knew about the issue - there are a large number of
folk running BGP on 65xx and taking full tables who are not plugged
into NANOG / the community. In many cases they are single homed
enterprise folk, but run BGP anyway (because some consultant set it up,
some employee with clue did it years ago and then left, etc).

B: they *did* know about the issue, but convincing management to spend
the cash to buy hardware that doesn't suck was hard, because
"everything is working fine at the moment" -- some folk needed things
to fail spectacularly to be able to justify shelling out the $$$
(yes, they could recard the TCAM, but they are using this as an excuse
to get some real gear)...

Am I overly cynical, or does this all work out perfectly for some
vendors? I'm guessing that a certain vendor is going to see a huge
number of orders for new equipment, for an event that could have been
(and was) easily predicted... "Here, buy my widget... and then you'll
come back in a few years and buy another one.. <mwahahahah>".
Yup, folk purchasing these *should* have known (it's not like there were no
discussions of this), but, well, not everyone spends all day reading
NANOG / RIPE / CIDR report...

W

Paul_Ferguson1 · August 13, 2014, 2:05pm

I am not an operator, but I used to be a *really* active routing
engineer once upon a time in the stone age and what really bothers
me is the serious lack of general awareness on the issue of routing
table size, aggregation, and stability, and what effect it has on the
global Internet.

Especially questions like this:

"Is it time to switch to all IPv6 yet?"

http://tech-beta.slashdot.org/story/14/08/13/0048244/the-ipv4-internet-hiccups

If anyone *seriously* believes that IPv6 will have any positive effect
on this particular issue, you are sorely misinformed. If anything, it
will make the problem worse, since the ability to "get aggregation
wrong" will be much easier.

I'm not being cynical, I'm being a realist. :-/

- ferg

p.s. I recall some IPv6 prefix growth routing projections by Vince
Fuller and Tony Li from several years ago which illustrated this, but
cannot find a reference at the moment....

--
Paul Ferguson
VP Threat Intelligence, IID
PGP Public Key ID: 0x54DC85B2
Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2
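
A small illustration of why aggregation matters for table size, using Python's standard ipaddress module. The four /34s are hypothetical announcements carved out of the IPv6 documentation prefix 2001:db8::/32; nothing here comes from real routing data.

```python
import ipaddress

# Hypothetical deaggregated announcements (documentation prefix, not real routes).
deaggregated = [
    ipaddress.ip_network(p)
    for p in (
        "2001:db8::/34",
        "2001:db8:4000::/34",
        "2001:db8:8000::/34",
        "2001:db8:c000::/34",
    )
]

# collapse_addresses merges adjacent and overlapping networks into the
# smallest equivalent set of prefixes.
aggregated = list(ipaddress.collapse_addresses(deaggregated))

print(len(deaggregated), "routes when announced separately")
print(len(aggregated), "route when aggregated:", aggregated[0])
```

Every router carrying a full table pays for each of the separate announcements, while the aggregate costs one slot. Getting this wrong is just as possible in IPv6 as in IPv4.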

Paul_Ferguson1 · August 13, 2014, 3:55pm

Apologies for replying to my own post, but... below:

Am I overly cynical, or does this all work out perfectly for
some vendors? I'm guessing that a certain vendor is going to see
a huge number of orders for new equipment, for an event that
could have been (and was) easily predicted... "Here, buy my
widget... and then you'll come back in a few years and buy
another one.. <mwahahahah>". Yup, folk purchasing these *should*
have known (not like there was no discussions of this), but,
well, not everyone spends all day reading NANOG / RIPE / CIDR
report...

I am not an operator, but I used to be a *really* active routing
engineer once upon a time in the stone age and what really
bothers me is the serious lack of general awareness on the issue of
routing table size, aggregation, and stability, and what effect it
has on the global Internet.

Especially questions like this:

"Is it time to switch to all IPv6 yet?"

The IPv4 Internet Hiccups - Slashdot

If anyone *seriously* believes that IPv6 will have any positive
effect on this particular issue, you are sorely misinformed. If
anything, it will make the problem worse, since the ability to "get
aggregation wrong" will be much easier.

I'm not being cynical, I'm being a realist. :-/

- ferg

p.s. I recall some IPv6 prefix growth routing projections by Vince
Fuller and Tony Li from several years ago which illustrated this,
but cannot find a reference at the moment....

I found it:

"Scaling issues with ipv6 routing+multihoming"
Vince Fuller, Cisco Systems
http://iab.org/wp-content/IAB-uploads/2011/03/vaf-iab-raws.pdf

I think the slides [above] were done for an IAB routing workshop in ~2006.

Also:

"Scaling of Internet Routing and Addressing: past view, present
reality,and possible futures"
Vince Fuller, Cisco Systems
http://www.vaf.net/~vaf/apricotworkshop.pdf

FYI,

- ferg

--
Paul Ferguson
VP Threat Intelligence, IID
PGP Public Key ID: 0x54DC85B2
Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2

Joel_Jaeggli · August 13, 2014, 6:09pm

Apologies for replying to my own post, but... below:

p.s. I recall some IPv6 prefix growth routing projections by Vince
Fuller and Tony Li from several years ago which illustrated this,
but cannot find a reference at the moment....

The raws workshop report makes for interesting reading, especially with
respect to how things actually turned out now that we're a decade on.

Paul_Ferguson1 · August 13, 2014, 6:14pm

Apologies for replying to my own post, but... below:

p.s. I recall some IPv6 prefix growth routing projections by
Vince Fuller and Tony Li from several years ago which
illustrated this, but cannot find a reference at the
moment....

The raws workshop report makes for interesting reading, especially
with respect to how things actually turned out now that we're a
decade on.

RFC 4984 - Report from the IAB Workshop on Routing and Addressing

Thanks for that -- I had completely forgotten about it.

- ferg

I found it:

"Scaling issues with ipv6 routing+multihoming" Vince Fuller,
Cisco Systems
http://iab.org/wp-content/IAB-uploads/2011/03/vaf-iab-raws.pdf

I think the slides [above] were done for an IAB routing workshop
in ~2006.

Also:

"Scaling of Internet Routing and Addressing: past view, present
reality,and possible futures" Vince Fuller, Cisco Systems
http://www.vaf.net/~vaf/apricotworkshop.pdf

FYI,

- ferg

--
Paul Ferguson
VP Threat Intelligence, IID
PGP Public Key ID: 0x54DC85B2
Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2

Merike_Kaeo1 · August 13, 2014, 6:27pm

We went with 768 - enough time to replace the routers with ASR9010s. It is
merely a stop-gap measure to give everyone time to replace their routers in
an orderly fashion.

The same people who, knowing the 6509 had this default config issue, and
neither replaced the gear nor did the reconfig to buy time *before* the
wall got hit, are going to replace said 6509 in orderly fashion?

Sadly enough:
A: not everyone knew about the issue - there are a large number of
folk running BGP on 65xx and taking full tables who are not plugged
into NANOG / the community. In many cases they are single homed
enterprise folk, but run BGP anyway (because some consultant set it up,
some employee with clue did it years ago and then left, etc).

I suspect this is true to some extent. The last NANOG had record attendance and, if I remember
correctly, 300(!!!!) NEW attendees.

Also, Philip Smith is STILL doing the BGP fundamentals tutorials with a full house every time. Granted,
this is mostly around the rest of the world, but there are new folks coming along all the time, and while many
old timers are aware of all the historical info on route aggregation, this should be brought up ad nauseam
for new folks. Do enterprise-type educational folks who include routing tutorials do anything with route
aggregation? Just wondering out loud.

B: they *did* know about the issue, but convincing management to spend
the cash to buy hardware that doesn't suck was hard, because
"everything is working fine at the moment" -- some folk needed things
to fail spectacularly to be able to justify shelling out the $$$ (
yes, they could recard the TCAM, but they are using this as an excuse
to get some real gear)…

Oh yeah, I'd bet this is also the case. Just like in 'security' related issues….

- merike

Bandy_Rush1 · August 13, 2014, 8:42pm

half the routing table is deagg crap. filter it.

you mean your vendor won't give you the knobs to do it smartly ([j]tac
tickets open for five years)? wonder why.

randy
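
One way to picture "filter it", as an offline Python sketch rather than router policy. It treats a route as deaggregated when a less-specific prefix from the same origin AS already covers it; that definition is an assumption made for the example, not one given in the thread. The function name, the documentation prefixes and the documentation-range AS numbers are all hypothetical.

```python
import ipaddress


def drop_covered_more_specifics(routes):
    """Drop routes covered by a less-specific prefix from the same origin AS.

    routes: iterable of (prefix_string, origin_asn) pairs.
    Returns the surviving routes as (prefix_string, origin_asn) pairs.
    """
    parsed = [(ipaddress.ip_network(prefix), asn) for prefix, asn in routes]
    kept = []
    # Shortest prefixes first, so a covering route is seen before its more-specifics.
    for net, asn in sorted(parsed, key=lambda r: (r[0].version, r[0].prefixlen, r[0])):
        covered = any(
            kept_asn == asn
            and kept_net.version == net.version
            and net.subnet_of(kept_net)
            for kept_net, kept_asn in kept
        )
        if not covered:
            kept.append((net, asn))
    return [(str(net), asn) for net, asn in kept]


# Hypothetical table: two /25s duplicate their covering /24, one /25 has a different origin.
table = [
    ("198.51.100.0/24", 64500),
    ("198.51.100.0/25", 64500),
    ("198.51.100.128/25", 64500),
    ("203.0.113.0/24", 64501),
    ("203.0.113.0/25", 64502),
]
filtered = drop_covered_more_specifics(table)
print(len(table), "routes before,", len(filtered), "after")
```

The more-specific with a different origin is kept on purpose, because dropping it would change where traffic goes. Even same-origin more-specifics are sometimes announced for traffic engineering, so a real filter needs more care than this sketch shows.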