Cisco GRE/IPSec performance, 3845 ISR/3945 ISR G2

We're running GRE/IPSec transport over a point-to-point DS3.
We're also doing some QoS. The traffic mix is voice; our
average packet size can be as low as 250 bytes at times.

We are seeing incredibly high CPU, at times over 95%, when traffic
levels approach 30 Mb/s and around 11 kpps in each direction. We've
also seen packet loss in the priority queue.
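As a quick sanity check on these figures, the sketch below derives the average packet size implied by 30 Mb/s at about 11 kpps. It also computes the packet rate a full DS3 would need at the quoted 250-byte minimum. The 44.736 Mb/s constant is the nominal DS3 line rate and is an assumption of the sketch. It ignores framing overhead, so the usable payload rate is somewhat lower.

```python
# Back-of-the-envelope conversion between bit rate, packet rate and packet size.

DS3_LINE_RATE_BPS = 44_736_000  # nominal DS3 line rate; usable payload is a bit lower


def avg_packet_bytes(bits_per_sec: float, packets_per_sec: float) -> float:
    """Average packet size implied by a bit rate and a packet rate."""
    return bits_per_sec / 8 / packets_per_sec


def packets_per_sec(bits_per_sec: float, packet_bytes: float) -> float:
    """Packet rate needed to carry a bit rate at a given packet size."""
    return bits_per_sec / (packet_bytes * 8)


if __name__ == "__main__":
    observed = avg_packet_bytes(30_000_000, 11_000)
    print(f"Observed average packet size: {observed:.0f} bytes")
    full_ds3 = packets_per_sec(DS3_LINE_RATE_BPS, 250)
    print(f"Full DS3 at 250-byte packets: {full_ds3:.0f} pps per direction")
```

On these assumptions the observed mix averages roughly 341 bytes per packet. A DS3 full of 250-byte packets would need about 22 kpps per direction, roughly double the rate that is already saturating the CPU.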

We recently forklifted the routers on a point-to-point DS3 from
3845s to 3945s, thinking we'd see an improvement in performance.
We saw no such improvement, and some on our team argue it's
worse.

I'm assuming here that the packet rate coupled with the QoS and
GRE is just killing the router's CPU. That said, are there any
optimizations that we could consider before ripping this thing
out completely?

Yes, CEF is enabled.

I'm considering changing from GRE/IPSec to VTI, but I suspect
this will still have the same actual switching behavior through
the router, and may not change anything.

One other possibility was to run IPSec tunnel mode and just
exclude EIGRP from the tunnel, but that may be risky (if IPSec
fails, traffic is black-holed).

Our fallback plan is to swap out the DS3 for an Ethernet circuit and
get an L2 Ethernet encryptor, but that can't happen until 2011-01-01.

Any suggestions, wisdom?

Thanks,
-cjp

This is probably more appropriate for the cisco-nsp list, but which
process is taking up the CPU, or is it due to interrupts?
To the best of my knowledge the crypto should be hardware accelerated,
while everything else is going to be done in software on the 3800.

-Pete
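One way to answer Pete's question is the header line of IOS `show processes cpu`. It reports five-second utilization as two figures, total and interrupt level (for example `CPU utilization for five seconds: 96%/80%; one minute: 94%; five minutes: 90%`). On software-forwarding platforms, CEF-switched packets are generally handled at interrupt level. A large second figure therefore points at the switching path rather than at a particular process. The helper below is a hypothetical sketch that splits that line apart. The names `parse_cpu_line` and `CpuSample` are illustrative, and the sample line is made up, not captured from these routers.

```python
import re
from typing import NamedTuple

# Matches the header line of IOS "show processes cpu", e.g.
# "CPU utilization for five seconds: 96%/80%; one minute: 94%; five minutes: 90%"
CPU_LINE = re.compile(
    r"five seconds:\s*(\d+)%/(\d+)%;\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


class CpuSample(NamedTuple):
    total_5s: int      # total CPU over the last five seconds
    interrupt_5s: int  # portion of that spent at interrupt level
    one_min: int
    five_min: int

    @property
    def process_5s(self) -> int:
        """CPU spent in scheduled processes (total minus interrupt level)."""
        return self.total_5s - self.interrupt_5s


def parse_cpu_line(line: str) -> CpuSample:
    """Parse the utilization header; raise ValueError if it doesn't match."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError(f"not a 'show processes cpu' header: {line!r}")
    return CpuSample(*(int(g) for g in m.groups()))


if __name__ == "__main__":
    # Illustrative sample only, not output from the routers in this thread.
    sample = parse_cpu_line(
        "CPU utilization for five seconds: 96%/80%; one minute: 94%; five minutes: 90%"
    )
    if sample.interrupt_5s > sample.process_5s:
        where = "interrupt level (switching path)"
    else:
        where = "processes"
    print(f"{sample.total_5s}% total, mostly {where}")
```

If the interrupt figure dominates, `show processes cpu sorted` will not show any single process as the culprit. The load is in per-packet switching.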

The ISR series do have onboard hardware crypto, but I don't know offhand
if it can handle a full DS3's worth of traffic.

My first guess is that fragment reassembly would kill it fast.

~Seth
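Whether fragmentation comes into play depends on how large each packet is after GRE and ESP are added. The sketch below estimates that size under stated assumptions, none of which come from the thread:

- basic GRE with no key, checksum or sequence fields (4-byte header);
- IPSec in transport mode protecting the GRE packet;
- AES-CBC (16-byte IV and block);
- HMAC-SHA1-96 (12-byte ICV).

Other transforms change the numbers, and so does tunnel mode, which adds another 20-byte IP header.

```python
import math

# Assumed encapsulation (illustrative, not taken from the thread's configs):
IP_HEADER = 20    # outer IPv4 header, no options
GRE_HEADER = 4    # basic GRE, no key/checksum/sequence fields
ESP_HEADER = 8    # SPI + sequence number
ESP_IV = 16       # AES-CBC initialization vector
ESP_BLOCK = 16    # AES block size; ESP pads the plaintext to a multiple of this
ESP_TRAILER = 2   # pad-length and next-header bytes
ESP_ICV = 12      # HMAC-SHA1-96 integrity check value


def gre_ipsec_transport_size(inner_bytes: int) -> int:
    """On-wire size of an inner IP packet after GRE + ESP in transport mode."""
    plaintext = GRE_HEADER + inner_bytes + ESP_TRAILER
    padded = math.ceil(plaintext / ESP_BLOCK) * ESP_BLOCK
    return IP_HEADER + ESP_HEADER + ESP_IV + padded + ESP_ICV


def needs_fragmenting(inner_bytes: int, mtu: int) -> bool:
    """True if the encapsulated packet would exceed the egress MTU."""
    return gre_ipsec_transport_size(inner_bytes) > mtu


if __name__ == "__main__":
    for inner in (250, 1400, 1500):
        outer = gre_ipsec_transport_size(inner)
        frag = needs_fragmenting(inner, 1500)
        print(f"{inner:5d} -> {outer:5d} bytes, fragments at 1500 MTU: {frag}")
```

On these assumptions, a full 1500-byte inner packet grows to 1576 bytes and would be fragmented on a 1500-byte link. A 250-byte voice-sized packet grows to 312 bytes and would not.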

Do you have the VPN/SSL AIM module? That would offload the crypto work. It's supposedly capable of full 100 Mb/s line rate; I have them in 2811s.

Sincerely,

Brian A . Rettke
RHCT, CCDP, CCNP, CCIP
Network Engineer, CableONE Internet Services

There are a couple of potential issues that, taken as a whole, add
up to a significant performance impact.

1) IPSec + GRE involves two forwarding operations, one to send it to the
tunnel interface, and another to send the now-encapsulated packet out
the WAN interface. This effectively halves the total forwarding rate
before any other considerations.

2) While the IPSec portion is hardware accelerated, the GRE
encapsulation is not, unless this is a Cat6500/CISCO7600 router, or a
7200VXR with a C7200-VSA card. Because of this, the GRE process itself
will consume a fairly large amount of CPU, as this is also a per-packet
process. Its cost is similar to that of a forwarding decision, so
throughput is halved again.

3) Other factors like quantity of tunnels, any routing protocols
running, NAT, or other such control protocols all have their own CPU
demands too, and can, in aggregate, be a small but significant burden
when the router also has to handle the demands of IPSec + GRE.
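Points 1 and 2 can be read as a simple multiplicative cost model. The plain CEF packet rate is halved once because each packet is forwarded twice (into the tunnel, then out the WAN interface). It is halved again for software GRE encapsulation, leaving about a quarter. The sketch below applies that reading. `PLAIN_PPS` is a hypothetical input, not a published figure for either router. Point 3's control-plane load is not modeled; it would push the result lower still.

```python
def effective_pps(plain_pps: float, halvings: int) -> float:
    """Packet rate left after each per-packet penalty halves throughput."""
    return plain_pps / 2 ** halvings


def pps_to_mbps(pps: float, packet_bytes: int) -> float:
    """Bit rate in Mb/s carried by a packet rate at a given packet size."""
    return pps * packet_bytes * 8 / 1_000_000


if __name__ == "__main__":
    PLAIN_PPS = 100_000  # hypothetical plain CEF forwarding rate, illustration only
    # Point 1: two forwarding operations (tunnel, then WAN) -> one halving.
    # Point 2: software GRE encapsulation -> a second halving.
    tunnel_pps = effective_pps(PLAIN_PPS, halvings=2)
    print(f"{tunnel_pps:.0f} pps, about {pps_to_mbps(tunnel_pps, 250):.0f} Mb/s at 250 bytes")
```

The point of the model is the ratio rather than the absolute number. Whatever the plain rate, this reading predicts GRE/IPSec traffic gets about a quarter of it. Small packets turn that quarter into a modest bit rate.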

For reference, here is a guide to VPN performance:
http://www.cisco.com/web/partners/downloads/765/tools/quickreference/vpn_performance_eng.pdf
It's slightly old, as it does not have the 39xx routers, but is still
useful for raw 3DES/AES performance for the 1800/2800/3800. See Table
5.

Sam Chesluk | Team Lead - Key Accounts | Network Hardware Resale |

Regarding point 2 (GRE encapsulation is not hardware accelerated on the
ISRs, so throughput is halved again):

I think this is where we're having the issue. It's just
shocking that this is happening at such a relatively low
packet rate.

Regarding point 3 (tunnels, routing protocols, NAT and other control
protocols add their own CPU load):

The figure we were given for the 3945's raw IPSec performance at
IMIX 1400 was 840 Mb/s. However, all this extra crypto power is
completely useless if the GRE processing is hitting the same limits
as its predecessor, the 3845.
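The 840 Mb/s rating is easier to compare with the observed load once it is converted to packets per second. The sketch below assumes "IMIX 1400" means the rating was measured with roughly 1400-byte packets; the vendor's test mix is not given here. On that assumption, it shows the implied packet rate and what the same packet rate would carry at 250 bytes.

```python
def rated_pps(rated_bps: float, packet_bytes: int) -> float:
    """Packet rate implied by a throughput rating at a given packet size."""
    return rated_bps / (packet_bytes * 8)


def mbps_at(pps: float, packet_bytes: int) -> float:
    """Bit rate in Mb/s for a packet rate at a different packet size."""
    return pps * packet_bytes * 8 / 1_000_000


if __name__ == "__main__":
    crypto_pps = rated_pps(840_000_000, 1400)  # assumes ~1400-byte test packets
    print(f"{crypto_pps:.0f} pps; at 250 bytes that is only {mbps_at(crypto_pps, 250):.0f} Mb/s")
```

Even on this reading, about 75 kpps of crypto capacity sits well above the roughly 11 kpps being observed. That is consistent with the suspicion that something other than the crypto engine is the limit.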

We're going to give straight IPSec a go to see if that solves
things.

-cjp

We're not seeing fragmentation. We deliberately set the MTU of the
physical DS3 very high (over 9000 bytes) to avoid it.

-cjp

I would like to question this one.
I always thought that the GRE header is precomputed and kept in the CEF adjacency table,
so GRE encapsulation involves no more processing overhead than regular Ethernet encapsulation.
The only difference from the 6500/7600 is that here the encapsulation is done by the CPU, not the PFC.
I'm in no way an expert in this, but I'd imagine the whole process looks like this:
1. a single CEF lookup/encapsulation produces a GRE packet
2. packet encryption/ESP encapsulation
3. another CEF lookup/encapsulation to get the encrypted packet out
So the forwarding rate is halved, but only once.
Am I wrong?

Michael

The GRE encap on a software-based router like an ISR should be
resolved in CEF from the start, so it shouldn't take two CEF lookups.
However, on the software-based platforms every feature you turn on
takes a little more CPU, so even with a single lookup I wouldn't expect
the same performance from GRE traffic that I would from non-GRE traffic.

6k/7600 requires recirculation in hardware and the story is completely
different as you are basically running the packet through the hardware
twice.

-Pete