subnet prefix length > 64 breaks IPv6?

Most vendors have a TCAM that by default does IPv6 routing for netmasks <=64.

They have a separate TCAM (which is usually limited in size) that does
routing for masks >64 and <=128.

TCAMs are expensive and increase the BOM cost of routers. Storing
routes with masks > 64 takes up twice the number of TCAM entries as
the routes with masks <= 64. Since IPv6 is *supposed* to work with /64
masks, most vendors (usually the not-so-expensive-routers) provide a
smaller TCAM for > /64 masks.
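
A back-of-the-envelope sketch of that entry cost (Python; the
one-entry-vs-two-entry split is exactly the assumption stated above,
not a claim about any particular vendor's TCAM):

    # Hypothetical cost model: masks <= 64 consume one TCAM entry,
    # longer masks consume two (per the description above).
    def tcam_entries(prefix_lengths):
        return sum(1 if plen <= 64 else 2 for plen in prefix_lengths)

    print(tcam_entries([48] * 1000))   # 1000 entries for 1000 /48s
    print(tcam_entries([126] * 1000))  # 2000 entries for 1000 /126s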

Glen

Well, I do know that if you look at the specs for most newer L3
switches, they will often say something like "max IPv4 routes 8192,
max IPv6 routes 4096". This leads one to believe that the TCAMs/hash
tables are only using 64 bits for IPv6 forwarding, and therefore
longer prefixes must be handled in software.
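
The arithmetic behind that inference, for what it's worth (a sketch;
the 32-bit and 64-bit entry widths are the guess above, not anything
a vendor has published):

    # If the same table holds 8192 entries at 32 bits (IPv4), the same
    # bit budget holds exactly half as many 64-bit entries.
    table_bits = 8192 * 32
    print(table_bits // 64)  # 4096 -- matching the published IPv6 number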

This may very well not be true "under the hood" at all, but the fact
that vendors publish so little IPv6 specification and benchmarking
information doesn't help matters.

> Most vendors have a TCAM that by default does IPv6 routing for netmasks <=64.
>
> They have a separate TCAM (which is usually limited in size) that does
> routing for masks >64 and <=128.

Please provide references. I haven't seen any documentation of such an
architecture myself.

> TCAMs are expensive and increase the BOM cost of routers. Storing
> routes with masks > 64 takes up twice the number of TCAM entries as
> the routes with masks <= 64. Since IPv6 is *supposed* to work with /64
> masks, most vendors (usually the not-so-expensive-routers) provide a
> smaller TCAM for > /64 masks.

Ah, but do the "not-so-expensive-routers" use TCAM at all?

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

> Can you please name names for the "somewhat less efficient" part? I've
> seen this and similar claims several times, but the lack of specific
> information is rather astounding.

> Well, I do know that if you look at the specs for most newer L3
> switches, they will often say something like "max IPv4 routes 8192,
> max IPv6 routes 4096". This leads one to believe that the TCAMs/hash
> tables are only using 64 bits for IPv6 forwarding, and therefore
> longer prefixes must be handled in software.

It might lead you to believe so - however, I believe this would be
commercial suicide for hardware forwarding boxes because they would no
longer be able to handle IPv6 at line rate for prefixes needing more
than 64 bit lookups. It would also be an easy way to DoS such boxes...

> This may very well not be true "under the hood" at all, but the fact
> that vendors publish so little IPv6 specification and benchmarking
> information doesn't help matters.

Cisco actually has published quite a bit of info, e.g.

"Delivering scalable forwarding Performance: up to 400 Mpps IPv4 and
200 Mpps IPv6 with dCEF"

They have also published EANTC tests which include IPv6 forwarding rates.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

It's fairly common knowledge that most of our systems work on 64-bit
at best (and more commonly 32-bit still).

If every route is nicely split at the 64-bit boundary, then it saves a
step in matching the prefix. Admittedly a very inexpensive step.

I expect that most hardware and software implementations store IPv6 as
either a group of 4 32-bit integers or a pair of 64-bit integers, and
a [ 7 or ] 8-bit prefix length field. I haven't read anything about a
new 128-bit ASIC for IPv6, at least.
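
As a sketch of the layout being described (Python stand-in for a
struct; purely illustrative, not any actual ASIC or stack):

    from dataclasses import dataclass

    @dataclass
    class V6Prefix:
        hi: int    # bits 0..63 as an unsigned 64-bit integer
        lo: int    # bits 64..127 as an unsigned 64-bit integer
        plen: int  # prefix length; 0..128 needs 8 bits (7 only reach 127)

    def matches(p, addr):
        hi, lo = addr >> 64, addr & (2**64 - 1)
        if p.plen <= 64:
            # masks <= 64 compare only the high half -- the saved step
            return hi >> (64 - p.plen) == p.hi >> (64 - p.plen)
        # longer masks must also compare (part of) the low half
        return hi == p.hi and lo >> (128 - p.plen) == p.lo >> (128 - p.plen)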

In this context, it is perfectly reasonable, and expected, that the
use of longer prefixes will have a higher cost.

However, I think the number of routes and your network architecture
play a significant role.

It is a fairly standard practice to have different routes for your WAN
connections (e.g. the routers you use BGP on and need to support
thousands of routes) than the routers you use internally, where the
routing table can be considerably smaller (and for which you can
summarize). For these routers, the cost of routing is generally a
non-factor as the tables are much smaller.

I think a greater concern than simple routing and forwarding would be
additional services, such as queuing or filtering. These may be
implemented in hardware when a 64-bit boundary is used, but punted to
CPU otherwise. Though this would be implementation specific and is
something you would want to research for whatever hardware you're
running.

So far, the biggest performance problem I've encountered is related to
neighbor discovery. It seems that in most implementations the
neighbor discovery process is implemented in software. It doesn't
have much to do with the boundary, but rather just that the process
(e.g. solicitation for unknown entries) is expensive enough that
sweeping through available address space can easily use all available
CPU capacity.

One [somewhat effective] solution to this is to attempt to use longer
prefixes so there is much less address space where such an attack
would be valid. It is much less costly for a router to discard a
packet that it has no route for than it is to issue thousands of
neighbor discovery solicitations per second.
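
The difference is easy to put numbers on (a sketch; the solicit rate
is an arbitrary illustration):

    # Addresses an attacker can sweep, by prefix length of the subnet.
    for plen in (64, 112, 120, 126):
        print(f"/{plen}: {2 ** (128 - plen):,} addresses")
    # /64:  18,446,744,073,709,551,616
    # /112: 65,536
    # /120: 256
    # /126: 4
    # Even at 10,000 solicits/second, a /64 sweep never completes; with
    # a /126 almost every probe falls outside the subnet and becomes a
    # cheap "no route" discard instead of an ND solicitation.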

There are a few solutions that vendors will hopefully look into. One
is to implement neighbor discovery in hardware (at which point table
exhaustion also becomes a legitimate concern, so the logic should be
such that known associations are not discarded in favor of unknown
associations).
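
A sketch of that eviction logic (Python; hypothetical -- nothing in
this thread shows any vendor implementing exactly this):

    class NeighborCache:
        """Never evicts a confirmed entry for an unresolved one."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.confirmed = {}   # addr -> MAC, learned from real answers
            self.pending = set()  # outstanding solicits, attacker-fillable

        def solicit(self, addr):
            if addr in self.confirmed:
                return self.confirmed[addr]
            if len(self.confirmed) + len(self.pending) >= self.capacity:
                if not self.pending:
                    return None      # full of known hosts: refuse new work
                self.pending.pop()   # drop an *unknown*, never a known one
            self.pending.add(addr)
            return None              # solicit sent, awaiting an answer

        def confirm(self, addr, mac):
            self.pending.discard(addr)
            self.confirmed[addr] = mac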

I do think, despite these limitations, that hardware is quickly
catching up to IPv6, though. I don't think it will be long before we
see the major vendors have solid implementations. Some of them
already may; I haven't had a chance to play with the newest stuff out
there.

> It might lead you to believe so - however, I believe this would be
> commercial suicide for hardware forwarding boxes because they would no
> longer be able to handle IPv6 at line rate for prefixes needing more
> than 64 bit lookups. It would also be an easy way to DoS such boxes...

That's just what I'm arguing here: no vendor info I've seen positively
says they *can* handle line-rate with longer IPv6 prefixes. The other
information available leads one to believe that all the published
specs are based on /64 prefixes.

Even third-party test reports don't mention IPv6 or prefix length at
all:

> Cisco actually has published quite a bit of info, e.g.
>
> http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/prod
>
> "Delivering scalable forwarding Performance: up to 400 Mpps IPv4 and
> 200 Mpps IPv6 with dCEF"
>
> They have also published EANTC tests which include IPv6 forwarding rates.

Except nowhere in there is the prefix length for the test indicated,
and the exact halving of forwarding rate for IPv6 leads one to believe
that there are two TCAM lookups for IPv6 (hence 64-bit prefix lookups)
versus one for IPv4.

For example, what is the forwarding rate for IPv6 when the tables are
filled with /124 IPv6 routes that differ only in the last 60 bits?
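
Producing that worst case is trivial, which makes the silence in the
reports more frustrating (a sketch of the proposed test input; the
2001:db8:: addresses are illustrative):

    import ipaddress

    # /124 routes under one /64, differing only in bits a pure 64-bit
    # lookup never examines.
    base = int(ipaddress.IPv6Address("2001:db8::"))
    routes = [ipaddress.IPv6Network((base + (i << 4), 124))
              for i in range(4096)]
    print(routes[0], routes[-1])  # 2001:db8::/124 2001:db8::fff0/124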

Even the EANTC test results you reference make no mention of the
prefix length for IPv4 or IPv6, or even the number of routes in the
lookup table during the testing.

For what it's worth, I haven't stress tested it or anything, but I
haven't seen any evidence on any of our RSP/SUP 720 boxes that would
have caused me to think that routing and forwarding isn't being done
in hardware, and we make liberal use of prefixes longer than 64
(including 126 for every link network). They might just be under
capacity to the point that I haven't noticed, though. I have no
problem getting multi-gigabit IPv6 throughput.

> If every route is nicely split at the 64-bit boundary, then it saves a
> step in matching the prefix. Admittedly a very inexpensive step.

My point here is that IPv6 is still defined as "longest prefix match",
so unless you *know* that all prefixes are <= 64 bits, you still need
the longer match.
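
A minimal longest-prefix-match sketch of the point (Python; a real
data plane would use a trie or TCAM, not a linear scan):

    import ipaddress

    # Longest match wins: a /126 must beat the covering /64, so a
    # lookup that stops examining bits at 64 gets this wrong.
    table = {
        ipaddress.ip_network("2001:db8::/64"): "nexthop-A",
        ipaddress.ip_network("2001:db8::24:0/126"): "nexthop-B",
    }

    def lookup(dst):
        addr = ipaddress.ip_address(dst)
        best = max((n for n in table if addr in n),
                   key=lambda n: n.prefixlen, default=None)
        return table.get(best)

    print(lookup("2001:db8::24:1"))  # nexthop-B, not nexthop-A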

> In this context, it is perfectly reasonable, and expected, that the
> use of longer prefixes will have a higher cost.

In a way I agree with you. However, if I put my purchasing hat on, I
would refuse to buy equipment that could only forward on the first 64
bits, *or* where the forwarding decision was much slower (hardware vs
software) for prefixes longer than 64 bits. I would not be surprised
if vendors decide that it is a *commercial* necessity to support full
128 bit matches.

> However, I think the number of routes and your network architecture
> play a significant role.

Absolutely. In our network by far the largest number of IPv6 prefixes
are EBGP prefixes in the 32 to 48 bit range. However, we also have for
instance our own 128 bit loopbacks - they are obviously only in our IGP.

> I think a greater concern than simple routing and forwarding would be
> additional services, such as queuing or filtering. These may be
> implemented in hardware when a 64-bit boundary is used, but punted to
> CPU otherwise. Though this would be implementation specific and is
> something you would want to research for whatever hardware you're
> running.

Again, that would be an excellent reason *not* to buy such equipment.

And yes, we know equipment that cannot *filter* on full IPv6 + port
number headers exists (e.g. Cisco 6500/7600 with 144 bit TCAMs) - my
original point was that I still haven't seen equipment with forwarding
problems for prefixes > 64 bits.

> There are a few solutions that vendors will hopefully look into. One
> is to implement neighbor discovery in hardware (at which point table
> exhaustion also becomes a legitimate concern, so the logic should be
> such that known associations are not discarded in favor of unknown
> associations).

I'm afraid I don't believe this is going to happen unless neighbor
discovery based attacks become a serious problem. And even then it would
take a long time.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

In a message written on Wed, Dec 28, 2011 at 10:19:54AM -0500, Ray Soucy wrote:

> If every route is nicely split at the 64-bit boundary, then it saves a
> step in matching the prefix. Admittedly a very inexpensive step.
>
> I expect that most hardware and software implementations store IPv6 as
> either a group of 4 32-bit integers or a pair of 64-bit integers, and
> a [ 7 or ] 8-bit prefix length field. I haven't read anything about a
> new 128-bit ASIC for IPv6, at least.
>
> In this context, it is perfectly reasonable, and expected, that the
> use of longer prefixes will have a higher cost.

The routers are already having to do a 128-bit lookup under the
hood. Consider you have a /48 routed in your IGP (to keep things
simple). When you look up the /48 in a router you will see it has
a next hop. A 128 bit next hop. This may be a link local, it may
be a global unicast (depending on your implementation). This next
hop has to be resolved, in the case of Ethernet as an example to a
48 bit MAC address.

So a typical forwarding step is already a two step process:

  Look up variable length prefix to get next hop.
  Look up 128 bit next hop to get forwarding information.

Once the vendor has built a 128-bit TCAM for step #2, there's no
reason not to use it for step #1 as well. AFAIK, in all recent products
this is how all vendors handle the problem (at a high level).
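
A sketch of the two-step model described above (Python; the table
contents are illustrative, and note that Glen disputes the second
lookup below):

    # Step 1: longest-prefix match on the destination -> 128-bit next hop.
    # Step 2: exact match on the next hop -> egress port and 48-bit MAC.
    fib = {("2001:db8:1::", 48): "fe80::1"}
    adjacency = {"fe80::1": ("eth0", "00:00:5e:00:53:01")}

    def forward(route):
        next_hop = fib[route]       # step 1 (variable-length match, simplified)
        return adjacency[next_hop]  # step 2 (128-bit exact match)

    print(forward(("2001:db8:1::", 48)))  # ('eth0', '00:00:5e:00:53:01')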

Sadly, this is all a case where mind share is hobbled by a few early
adopter problems. If you look at the first IPv6 images for platforms
like the Cisco 7500 (in the VIP-2 days) that hardware was built to
IPv4 criteria, and had 32-bit TCAMs. To make IPv6 work they did
multiple TCAM lookups, some the simple 32 bits x 4, others fancy
things trying to guess prefix lengths that might likely be used.
All took a substantial line rate hit moving IPv6 as a result.

Those problems simply don't exist in modern gear. Once products
were designed to support native IPv6 rational design decisions were
made.

I don't know of any _current generation_ core router that has any
performance difference based on prefix length. That's why prefix length
isn't in the test criteria, it simply doesn't matter.

I say this as a proud user of /128's, /126's, and /112's in a
multi-vendor network, as well.

> So a typical forwarding step is already a two step process:
>
>   Look up variable length prefix to get next hop.
>   Look up 128 bit next hop to get forwarding information.

Wrong.

You only do a lookup once.

You look up a TCAM or a hash table that gives you the next hop for a route.

You DON'T need to do another TCAM lookup to get the egress
encapsulation information.

You get the egress encapsulation after your TCAM lookup. It typically
gives you an index that stores this information. All routes pointing
to one nexthop will typically point to the same index.
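
The same sketch under Glen's description (one lookup whose result
carries an adjacency index; again purely illustrative):

    # The single TCAM/hash lookup returns an index; egress encapsulation
    # (port, MAC rewrite) hangs off that index, and many routes sharing
    # a next hop share one index.
    adjacency = [("eth0", "00:00:5e:00:53:01")]  # index 0
    fib = {
        ("2001:db8:1::", 48): 0,  # both routes point at the same
        ("2001:db8:2::", 48): 0,  # adjacency: no second lookup needed
    }

    def forward(route):
        return adjacency[fib[route]]  # one lookup, then an indexed read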

> Once the vendor has built a 128-bit TCAM for step #2, there's no
> reason not to use it for step #1 as well. AFAIK, in all recent products
> this is how all vendors handle the problem (at a high level).

You only use the TCAM for #1, not for #2.

Glen

You can get >10GbE *throughput* from a Linux box doing all forwarding
in software as well. That's easy when the packets are big and the
routing tables are small, and the hash tables all fit in high-speed
processor cache.

The general lack of deep information about how the switching and
routing hardware really works for IPv6 is my main problem. It's not
enough to make informed buying or design decisions. Unfortunately, I
have over the course of my career learned that a "trust but verify"
policy is required when managing vendors. Especially vendors that have
a near-monopoly market position.

The problem, of course, is that verifying this sort of thing with
realistic worst-case benchmarks requires some very expensive equipment
and a lot of time, which is why the lack of solid information from
vendors and 3rd-party testing labs is worrying.

Surely some engineers from the major switch/router vendors read the
NANOG list. Anybody care to chime in with "we forward all IPv6 prefix
lengths in hardware for these product families"?

I did look into this a bit before.

To be more specific:

IPv6 CEF appears to be functioning normally for prefixes longer than
64-bit on my 720(s).

I'm not seeing evidence of unexpected punting.

The CPU utilization of the software process that would handle IPv6
being punted to software, "IPv6 Input", is at a steady 0.00% average
(with spikes up to 0.02%).

So there would seem to be at least one major platform that is OK.

> IPv6 CEF appears to be functioning normally for prefixes longer than
> 64-bit on my 720(s).
>
> I'm not seeing evidence of unexpected punting.
>
> The CPU utilization of the software process that would handle IPv6
> being punted to software, "IPv6 Input", is at a steady 0.00% average
> (with spikes up to 0.02%).
>
> So there would seem to be at least one major platform that is OK.

And there are other platforms, e.g. Juniper M/MX/T, where there is no
concept of "punt a packet to software to perform a forwarding decision".
The packet is either forwarded in hardware, or dropped. IPv6 prefixes >
64 bit are handled like any other IPv6 prefixes, i.e. they are forwarded
in hardware.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

> There are a few solutions that vendors will hopefully look into. One
> is to implement neighbor discovery in hardware (at which point table
> exhaustion also becomes a legitimate concern, so the logic should be
> such that known associations are not discarded in favor of unknown
> associations).

Even if that is done you are still exposed to attacks -- imagine if a
downstream machine that is under customer control (not yours) has a
whole /64 nailed up on its Ethernet interface, and happily responds to
ND solicits for every address. Your hardware table will fill up and
then your network has failed -- which way it fails depends on the
table eviction behavior.

Perhaps this is not covered very well in my slides. There are design
limits here that cannot be overcome by any current or foreseen
technology. This is not only about what is broken about current
routers but what will always be broken about them, in the absence of
clever work-arounds like limits on the number of ND entries allowed
per physical customer port, etc.

We really need DHCPv6 snooping and ND disabled for campus access
networks, for example. Otherwise you could give out addresses from a
limited range in each subnet and use an ACL (like Owen DeLong suggests
for hosting environments -- effectively turning the /64 into a /120
anyway) but this is IMO much worse than just not configuring a /64.
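
A sketch of that ACL-style carve-out (Python stand-in for a filter
rule; the specific /120 inside the /64 is illustrative):

    import ipaddress

    # Configure the /64 on the wire, but assign and permit only the
    # first 256 addresses -- ND can be provoked for 2**8 targets
    # instead of 2**64.
    subnet  = ipaddress.ip_network("2001:db8:0:1::/64")
    allowed = ipaddress.ip_network("2001:db8:0:1::/120")
    assert allowed.subnet_of(subnet)

    def permit(dst):
        return ipaddress.ip_address(dst) in allowed

    print(permit("2001:db8:0:1::42"))   # True  (inside the /120)
    print(permit("2001:db8:0:1::1:0"))  # False (rest of the /64 is filtered)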

> I'm afraid I don't believe this is going to happen unless neighbor
> discovery based attacks become a serious problem. And even then it would
> take a long time.

The vendors seem to range from "huh?" to "what is everyone else
doing?" to Cisco (the only vendor to make any forward progress at all
on this issue.) I think that will change as this topic is discussed
more and more on public mailing lists, and as things like DHCPv6
snooping, and good behavior when ND is disabled on a subnet/interface,
begin to make their way into RFPs.

As it stands right now, if you want to disable the IPv6 functionality
(and maybe IPv4 too if dual-stacked) of almost any datacenter /
hosting company offering v6, it is trivial to do that. The same is
true of every IXP with a v6 subnet. I think once some bad guys figure
this out, they will do us a favor and DoS some important things like
IXPs, or a highly-visible ISP, and give the vendors a kick in the
pants -- along with operators who still have the "/64 or bust"
mentality, since they will then see things busting due to trivial
attacks.

You will always be exposed to attacks if you're connected to the Internet.

(Not really sure what you were trying to say there.)
My primary concerns are attacks originated from external networks.
Internal network attacks are a different issue altogether (similar to
ARP attacks or MAC spoofing), which require different solutions.

As previously discussed in a recent thread, the attack vector you
describe (in a service provider environment) can be mitigated through
architecture simply by effective CPE design (isolated link network to
CPE, L3 hand-off at CPE, with stateful packet inspection; and small or
link-local prefixes for link networks). Thankfully, this isn't a
model that is anything new; many networks are already built in this
way.

The only point contested is the validity of longer-than-64-bit
prefixes, which I think I've spoken enough on.

Enterprise and Data Center environments have a different set of
[similar] concerns, which is where most of the concern about the
exploitation of ND and large prefixes comes into play. I think most
of us have
been at this for long enough to have given up on the
one-configuration-fits-all school of network design. A stateful
firewall internally can be a strong tool to mitigate this attack
vector in such environments, depending on their size. For networks
where a stateful firewall isn't practical, though, that is where
stronger router implementation comes into play.

The suggestion of disabling ND outright is a bit extreme. We don't
need to disable ARP outright to have functional networks with a
reasonable level of stability and security. The important thing is
that we work with vendors to get a set of tools (not just one) to
address these concerns. As you pointed out Cisco has already been
doing quite a bit of work in this area, and once we start seeing the
implementations become more common, other vendors will more than
likely follow (at least if they want our business).

Maybe I'm just a glass-half-full kind of guy. ;-)

I think being able to use longer prefixes than 64-bit helps
considerably. I think that seeing routers that can implement ND in
hardware (or at least limit its CPU usage), and not bump known
associations for unknown ones will help considerably. Stateful
firewalls (where appropriate) will help considerably. And L2 security
features (ND inspection with rate-limiting, RA guard, DHCPv6 snooping)
will all help -- considerably. Combined, they make for an acceptable
solution by current standards.

As was also pointed out, though, many networks don't even implement
this level of security for IP internally; the difference is that many
of them haven't needed to for external attacks because of the
widespread use of NAT, stateful firewalls, and much smaller address
space. That is a little different in the IPv6 world, and why there is
concern being expressed on this list.

The most important thing is that network operators are aware of these
issues, have a basic understanding of the implications, and are
provided with the knowledge and tools to address them.

This really isn't much different than IPv4.

> The suggestion of disabling ND outright is a bit extreme. We don't
> need to disable ARP outright to have functional networks with a
> reasonable level of stability and security. The important thing is

I don't think it's at all extreme. If you are dealing with an access
network where DHCPv6 is the only legitimate way to get an address on a
given LAN segment, there is probably no reason for the router to use
ND to learn about neighbor L3<>L2 associations. With DHCPv6 snooping
the router can simply not use ND on that segment, which eliminates
this problem. However, this feature is not yet available.

It would also be difficult to convince hosting customers to use a
DHCPv6 client to populate their gateway's neighbor table. However, if
this feature comes along before other fixes, it will be a good option
for safely deploying /64s without ND vulnerabilities.
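
A sketch of that mode of operation (hypothetical; this describes the
wished-for feature, not anything shipping today):

    # With ND learning disabled on the segment, the only way into the
    # gateway's neighbor table is a snooped DHCPv6 lease.
    neighbors = {}  # ipv6 -> mac, populated only by the snooper

    def snoop_dhcpv6_reply(leased_ip, client_mac):
        neighbors[leased_ip] = client_mac

    def resolve(dst_ip):
        # No lease means no entry and no solicit: a scan just gets drops.
        return neighbors.get(dst_ip)

    snoop_dhcpv6_reply("2001:db8::100", "00:00:5e:00:53:02")
    print(resolve("2001:db8::100"))   # 00:00:5e:00:53:02
    print(resolve("2001:db8::dead"))  # None -- dropped, no ND generated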

> that we work with vendors to get a set of tools (not just one) to
> address these concerns. As you pointed out Cisco has already been
> doing quite a bit of work in this area, and once we start seeing the
> implementations become more common, other vendors will more than
> likely follow (at least if they want our business).

> Maybe I'm just a glass-half-full kind of guy. ;-)

I think your view of the Cisco work is a little optimistic. :-) What
they have done so far is simply acknowledge that, yes, ND exhaustion
is a problem, and give the customer the option to mitigate damage to
individual interfaces / VLANs, on the very few platforms that support
the feature.

Cisco has also given the SUP-2T independent policers for ARP and ND,
so if you have a SUP-2T instead of a SUP720 / RSP720, your IPv4 won't
break when you get an IPv6 ND attack. Unfortunately, there are plenty
of people out there who are running IPv6 /64s on SUP720s, most of whom
do not know that an attacker can break all their IPv4 services with an
IPv6 ND attack.

> The most important thing is that network operators are aware of these
> issues, have a basic understanding of the implications, and are
> provided with the knowledge and tools to address them.

We certainly agree here. I am glad the mailing list has finally moved
from listening to Owen DeLong babble about this being a non-problem,
to discussing what work-arounds are possible, disadvantages of them,
and what vendors can do better in the future.

My personal belief is that DHCPv6 snooping, with ND disabled, will be
the first widely-available method of deploying /64s "safely" to
customer LAN segments. I'm not saying this is good but it is a
legitimate solution.

As much as I argue with Owen on-list, I still enjoy reading his input.

It's a little uncalled for to be so harsh about his posts. A lot of
us are strong-willed here, and many of us read things we've posted in
the past and ask "what was I thinking, that's ridiculous"; and perhaps
I'm just saying that because I do so more than most.

But really, let's stay civil.

I don't disagree with your other comments much, but I do think (hope
actually) that DHCPv6 snooping will not filter link-local traffic.
That would be a job for an ND inspection kind of technology, and one I
would hope was configurable. There is no DHCPv6 for link-local so it
would be kind of silly to have DHCPv6 snooping restrict that traffic
completely. It will be a little less straightforward than DHCP
snooping is, no doubt.

And I will admit I can be a Cisco fanboy at times, but only because
they've consistently been able to deliver on IPv6 more than other
vendors I've worked with. Like any vendor it can be hard to get
through to the people who matter, but Cisco has been pretty good at
responding to us when we poke them on these matters. Surprisingly,
most of the time the delay is waiting on a standard to be established
so they can implement that rather than their own thing.

IOS XR-based systems operate the same way.

Mark.

Of course this isn't strictly true; transit might be punted on either
platform for various reasons. IP(v6) options come to mind, possibly too
many IPv6 extension headers (Cisco.com claims to punt in such instances;
JNPR/Trio (imho incorrectly) just drops the packet in hardware), glean/ARP
resolve, multicast learning, and probably many other reasons I can't think of.

Of course, not "strictly".

What I meant was the CRS and ASR9000 don't operate like the
6500/7600 and other Cisco switches that punted packets to
CPU if, for one reason or another, a bug or misconfiguration
caused said packets to be sent to the CPU for forwarding.

Mark.