Hello again,
I've heard a lot of encouraging things on this list in response to my previous inquiries about VoIP - hoping you can help me out again.
In order to cut costs in our telecom budget I'm toying with the idea of replacing a lot of our inter-office leased lines with VPN connections over the public Internet. (I've got a lot of experience doing this in major production environments so I'm aware of the gotchas in this scenario.) My general method is to create IPsec-encrypted GRE tunnels between sites and then treat them as true virtual circuits; i.e., I run OSPF over the tunnel and exchange dynamic routing information, blah-de-blah.
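(If it helps to picture it, a stripped-down per-site config looks something like the following; addresses, keys, and names are placeholders, of course:)

crypto isakmp policy 10
 authentication pre-share
crypto isakmp key PLACEHOLDER address 192.0.2.2
!
crypto ipsec transform-set T3DES esp-3des esp-sha-hmac
!
crypto map VPNMAP 10 ipsec-isakmp
 set peer 192.0.2.2
 set transform-set T3DES
 match address GRE-TRAFFIC
!
ip access-list extended GRE-TRAFFIC
 permit gre host 192.0.2.1 host 192.0.2.2
!
interface Tunnel0
 ip address 10.255.0.1 255.255.255.252
 tunnel source 192.0.2.1
 tunnel destination 192.0.2.2
!
interface Serial0/0
 ip address 192.0.2.1 255.255.255.252
 crypto map VPNMAP
!
router ospf 1
 network 10.255.0.0 0.0.0.3 area 0
 ! (plus the usual LAN networks, and the mirror config at the far end)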
Assume for the moment that latency and bandwidth are not an issue; e.g., any two points that will be exchanging voice data will both have transit from the same provider with an aggressive SLA.
Does anyone have any experience running VoIP over such tunnels? Is there a technical reason why this solution is not feasible? Are Cisco routers not happy doing VoIP/IPsec/GRE in concert?
Thanks as always,
C.
Are Cisco routers not happy doing VoIP/IPsec/GRE in concert?
Cisco routers (and some others) are somewhat jittery doing IPsec, but if you
keep your CPU utilization low enough it shouldn't pose a problem.
I would keep an eye on performance as traffic levels increase.
Pete
Thus spake "Charles Youse" <cyouse@register.com>
In order to cut costs in our telecom budget I'm toying with the idea
of replacing a lot of our inter-office leased lines with VPN
connections over the public Internet. [...]
Assume for the moment that latency and bandwidth are not an issue;
e.g., any two points that will be exchanging voice data will both have
transit from the same provider with an aggressive SLA.
Latency, bandwidth, and packet loss are moot. Jitter is VoIP's enemy.
Does anyone have any experience running VoIP over such tunnels?
Is there a technical reason why this solution is not feasible? Are
Cisco routers not happy doing VoIP/IPsec/GRE in concert?
IPsec itself will not cause you problems; there's no theoretical conflict.
Unfortunately, IOS can introduce jitter when encrypting packets. To
mitigate this, you can apply QOS, with a strict priority queue for the VoIP
packets and the "qos pre-classify" feature. Your mileage will vary
depending on the CPU power of the router, the traffic levels, and whether
you're using hardware encryption.
S
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
I've done VoIP over a PPTP tunnel several different times
with no real problems. This includes at hotels as well as at the
last NANOG. Obviously there was no encryption... but I'm not that exciting
to listen to anyway.
- jared
Unfortunately, IOS can introduce jitter when encrypting packets. To
mitigate this, you can apply QOS, with a strict priority queue for the VoIP
packets and the "qos pre-classify" feature. Your mileage will vary
depending on the CPU power of the router, the traffic levels, and whether
you're using hardware encryption.
Stephen, I know this is outside of Charles' original inquiry, but I'm not
familiar with this "qos pre-classify" feature. Since we would be encrypting
voice traffic ... at what point would you classify it? If I classify it
before it goes into the tunnel and gets encrypted, would that
classification last once it's encrypted? If we try to classify after it's
been encrypted, how can we tell it's voice traffic? It seems to me that
jitter from both the actual encryption process as well as that associated
with basic serialization would be the potential death of VoIP in this
scenario, but I'm not sure what mechanisms are available to help mitigate that risk.
Thus spake "Charlie Clemmer" <cclemmer@nexgennetworks.com>
Stephen, I know this is outside of Charles' original inquiry, but I'm not
familiar with this "qos pre-classify" feature. Since we would be encrypting
voice traffic ... at what point would you classify it? If I classify it
before it goes into the tunnel and gets encrypted, would that
classification last once it's encrypted? If we try to classify after it's
been encrypted, how can we tell it's voice traffic? It seems to me that
jitter from both the actual encryption process as well as that associated
with basic serialization would be the potential death of VoIP in this
scenario, but I'm not sure what mechanisms are available to help mitigate
that risk.
In the default IOS code path, encryption happens before QOS (and after GRE).
Modern IOS versions copy the DSCP when encapsulating/encrypting packets, so
DSCP-based QOS will still work, but IP- and port-based QOS will not.
More importantly, encryption is slow; even hardware encryption is
significantly slower than the rest of the forwarding process. It's also
FIFO by default, meaning that large data packets can get stuck ahead of your
VoIP packets, causing jitter.
'qos pre-classify' adds a second QOS stage before encryption, which allows
you to classify packets in their unencrypted state and, more importantly,
adds PQ capability to the encryption stage.
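Something like this, as a rough sketch (class names and the 256 kbps priority
figure are illustrative, and it assumes your voice packets are already marked
EF):

class-map match-all VOICE
 match ip dscp ef
!
policy-map WAN-EDGE
 class VOICE
  priority 256
 class class-default
  fair-queue
!
interface Tunnel0
 qos pre-classify
!
crypto map VPNMAP 10 ipsec-isakmp
 qos pre-classify
!
interface Serial0/0
 service-policy output WAN-EDGE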
For more information:
http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122cgcr/fqos_c/fqcprt1/qcfvpn.htm
S
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
Does anyone have any experience running VoIP over such tunnels?
Is there a technical reason why this solution is not feasible? Are
Cisco routers not happy doing VoIP/IPsec/GRE in concert?
The company I'm working for uses Shoreline VoIP PBX gear spread out
over maybe a dozen offices of varying sizes. All are interconnected
through the corporate enterprise net, Cisco routers with IPSEC/GRE tunnels
over the public Internet. Each office has at least a T1, and we use a
variety of providers. We have a typical mix of enterprise interoffice
traffic: email, web, file sharing, etc. There's no QoS configured in
the routers at present.
It all seems to just work fine.
Steve
Maybe a stupid question... why would you need GRE tunneling while IPsec
has a tunnel mode of its own?
Probably because a major router vendor, despite repeated customer requests,
declined to implement dynamic routing across such tunnel-mode interfaces.
Pete
through the corporate enterprise net, Cisco routers with IPSEC/GRE tunnels
over the public Internet.
Maybe a stupid question... why would you need GRE tunneling while IPsec
has a tunnel mode of its own?
For running routing over the tunnels for example...
- kurtis -
So if the router uses tunnel mode (as per the RFC) despite the GRE
tunnel, the packet has three IP headers... So that's 160 bits (20 bytes)
of ethernet layer 1 overhead plus 18 bytes of ethernet layer 2 overhead,
24 bytes for the GRE tunnel, 20 bytes for the IPsec tunnel-mode IP
header, 10-12 bytes for the ESP header and trailer, 16 bytes for the
initialization vector, 20 bytes for the original IP header, and finally
20 bytes for the UDP and RTP headers. With a 40-byte payload that adds
up to 188 bytes on the wire, of which 78% is overhead...
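If you want to check my arithmetic, here's a quick Python tally of those
numbers (the ESP figure is approximate, as noted):

# Per-packet overhead from the description above, all sizes in bytes.
overhead = {
    "ethernet layer 1 (preamble + interframe gap)": 20,   # 160 bits
    "ethernet layer 2 (header + FCS)":              18,
    "GRE tunnel (delivery IP + GRE)":               24,
    "IPsec tunnel-mode IP header":                  20,
    "ESP header and trailer":                       10,   # 10-12 with padding
    "initialization vector":                        16,
    "original IP header":                           20,
    "UDP + RTP headers":                            20,
}
payload = 40                                               # voice payload
wire = payload + sum(overhead.values())
print(wire, "bytes on the wire,",
      round(100 * (wire - payload) / wire, 1), "% overhead")
# -> 188 bytes on the wire, 78.7 % overhead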
Iljitsch van Beijnum wrote:
So if the router uses tunnel mode (as per the RFC) despite the GRE
tunnel, the packet has three IP headers... [...] With a 40-byte payload
that adds up to 188 bytes on the wire, of which 78% is overhead...
...leaving a dream of RTP as a true and presumably light-weight
protocol, as per rfc753, 759, 760, 761, 793, etc. Was this RTP
the protocol under NVP (as per rfc741)? It was mentioned in
documents before UDP (first mentioned in rfc755 and defined in
rfc768), but I don't see any RFC ever defining it, and it doesn't
have a protocol number assigned in the early assigned number RFCs
(eg. rfc755, which is after UDP was conceived but before anything
was removed or re-used from the early allocations).
Of course that won't help the other overheads. And there's still
a lot of the internet where you'd want to add cell tax then block
up to the next 53 bytes... do we have 90% overhead yet? 
It's interesting that the original 'ST' and 'RTP' were thought of
in 1979 and 1981, but it was 1990 before 'ST-II' (rfc1190) and
1996 by the time the actual RTP was formalized (rfc1889, where it
is mentioned as being "typically [..] on top of UDP", but the option
is left open that it could be used directly as a protocol on top
of IP). I'm sure I was using (commonly available) voice over
the 'net before 1996, but I think it was a horrible application
which sent duplicate UDP packets in the expectation of dropped
packets... probably still with less overhead than today's VoIP
over GRE over IPsec over EoMPLS over ATM type designs, despite the
packet duplication...
David.
Well, sloppy thinking breeds complexity -- what I dislike about standards
committees (IETF/IESG included) is that they always sink to the lowest
common denominator of the design talent or competence of their participants.
In fact, a method to encrypt small parcels of data efficiently has been
well known for decades. It is called a "stream cypher" (surprise). Besides
LFSR-based and other stream cyphers, any block cypher can be used in this
mode. Its application to RTP is trivial and straightforward. Just leave
the sequence number in clear text, so that the position in the stream is
recoverable in case of packet loss. It also allows precomputation of the
key stream, adding nearly zero latency/jitter to the actual packet
processing.
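A toy sketch of the idea in Python; the SHA-256-in-counter-mode keystream
here is only a stand-in for a real stream cypher (e.g. AES in CTR mode),
and all names and sizes are illustrative:

# Precompute a per-call keystream, keep the sequence number in clear
# text, and encryption becomes a single XOR at the packet's offset.
import hashlib

PAYLOAD_LEN = 40                       # fixed-size voice payload, in bytes

def keystream(key, length):
    """Generate `length` bytes of keystream from a per-call key."""
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_packet(ks, seq, payload):
    """XOR the payload at its keystream offset; seq stays in the clear."""
    off = seq * PAYLOAD_LEN
    ct = bytes(p ^ k for p, k in zip(payload, ks[off:off + PAYLOAD_LEN]))
    return seq, ct

key = b"per-call session key (placeholder)"
ks = keystream(key, 1000 * PAYLOAD_LEN)    # precomputed ahead of time

seq, ct = encrypt_packet(ks, 7, b"\x00" * PAYLOAD_LEN)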
--vadim
Thus spake "Vadim Antonov" <avg@kotovnik.com>
In fact, a method to encrypt small parcels of data efficiently has been
well known for decades. It is called a "stream cypher" (surprise).
Besides LFSR-based and other stream cyphers, any block cypher
can be used in this mode. Its application to RTP is trivial and
straightforward. Just leave the sequence number in clear text, so that
the position in the stream is recoverable in case of packet loss.
Most stream modes are chained in some way to intentionally disrupt
decryption if part of the ciphertext is missing; that is why IPsec resets
the stream for each packet (currently).
When NIST was standardizing AES, they added CTR mode specifically to address
IPsec implementations. I think there's already been a draft out of the IRTF
on how to modify IPsec for this, but it's not something I've followed
closely.
It also allows precomputation of the key stream, adding nearly zero
latency/jitter to the actual packet processing.
You fail to note that this requires precomputing and storing a keystream for
every SA on the encrypting device, which often number in the thousands.
This isn't feasible in a software implementation, and it's unnecessary in
hardware.
S
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
That would be CBC mode (where the output of one block becomes part of
the input for the next) and I don't think this effect is a feature. At
least, certainly not a desirable one because now we need a relatively
large initialization vector in each encrypted packet. (It would of
course be possible to negotiate some random data in advance from which
the IVs can be taken in a way that is linked to the counter so the IV
doesn't have to be included in the packet.)
A stream cipher generates a random-looking data stream against which the
payload is XORed. If you miss some payload you can still generate the
data stream for the missing part and start XORing again for the data you
have, as long as you know exactly how much is missing. This would be
trivial to implement in IPsec with a fixed packet length, because the
anti-replay counter (which is transmitted in the clear) tells you how
many packets came before.
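A toy illustration in Python (the hash-based keystream is again just a
stand-in for a real cipher): with a fixed payload length and the sequence
number in the clear, a later packet still decrypts even when earlier ones
never arrived.

import hashlib

PLEN = 40                                     # fixed payload length, bytes

def keystream(key, length):
    out, n = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + n.to_bytes(8, "big")).digest()
        n += 1
    return bytes(out[:length])

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = b"per-call session key (placeholder)"
ks = keystream(key, 10 * PLEN)

# Sender: encrypt ten packets, each tagged with its cleartext sequence number.
packets = [(i, xor(b"%02d" % i + b"." * (PLEN - 2), ks[i*PLEN:(i+1)*PLEN]))
           for i in range(10)]

# Packets 3 and 4 are lost in transit; the rest still decrypt cleanly,
# because the sequence number gives the exact keystream offset.
for seq, ct in (p for p in packets if p[0] not in (3, 4)):
    plaintext = xor(ct, ks[seq*PLEN:(seq+1)*PLEN])
    assert plaintext.startswith(b"%02d" % seq)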
You don't have to store the entire keystream, just enough to allow
on-the-fly packet processing. Besides, memory is cheap. 100 msec buffers
for 100,000 simultaneous voice connections is an astonishing 80 MB.
More realistically, it's 10k calls and 30 msec of buffering.
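(Assuming 64 kbps of keystream per call, the arithmetic is simple:)

# Keystream buffer sizing, assuming 64 kbps (G.711-style) per call.
def buffer_bytes(calls, buffer_ms, rate_bps=64_000):
    return calls * (rate_bps // 8) * buffer_ms // 1000

print(buffer_bytes(100_000, 100) / 1e6, "MB")   # -> 80.0 MB
print(buffer_bytes(10_000, 30) / 1e6, "MB")     # -> 2.4 MB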
--vadim