It's fairly common knowledge that most of our systems work on 64-bit
at best (and more commonly 32-bit still).
If every route is nicely split at the 64-bit boundary, then it saves a
step in matching the prefix. Admittedly a very inexpensive step.
I expect that most hardware and software implementations store IPv6 as
either a group of 4 32-bit integers or a pair of 64-bit integers, and
a [ 7 or ] 8-bit prefix length field. I haven't read anything about a
new 128-bit ASIC for IPv6, at least.
In this context, it is perfectly reasonable, and expected, that the
use of longer prefixes will have a higher cost.
However, I think the number of routes, and your network architecture
play a significant factor.
It is a fairly standard practice to have different routes for your WAN
connections (e.g. the routers you use BGP on and need to support
thousands of routes) than the routers you use internally, where the
routing table can be considerably smaller (and for which you can
summarize). For these routers, the cost of routing is generally a
non-factor as the tables are much smaller.
I think a greater concern than simple routing and forwarding, would be
additional services, such as queuing, or filtering. These may be
implemented in hardware when a 64-bit boundary is used, but punted to
CPU otherwise. Though this would be implementation specific and is
something you would want to research for whatever hardware you're
running.
So far, the biggest performance problem I've encountered is related to
neighbor discovery. It seems that in most implementations the
neighbor discovery process is implemented in software. It doesn't
have much to do with the boundary, but rather just that the process
(e.g. solicitation for unknown entries) is expensive enough that
sweeping through available address space can easily use all available
CPU capacity.
One [somewhat effective] solution to this is to attempt to use longer
prefixes so there is much less address space where such an attack
would be valid. It is much less costly for a router to discard a
packet that it has no route for than it is to issue thousands of
neighbor discovery solicitations per second.
There are a few solutions that vendors will hopefully look into. One
being to implement neighbor discovery in hardware (at which point
table exhaustion also becomes a legitimate concern, so the logic
should be such that known associations are not discarded in favor of
unknown associations).
I do think, despite these limitations, that hardware is quickly
catching up to IPv6, though. I don't think it will be long before we
see the major vendors have solid implementations. Some of them
already may; I haven't had a chance to play with the newest stuff out
there.