More MD5 fun: Cisco uses wrong MD5 key for old session after key change

The other day I was trying to configure MD5 authentication on a BGP
peering with MCI. Juniper at our end, Cisco at their end. We couldn't
get the session up the way we wanted - or rather, we got the session up,
and everything *seemed* to work fine, but the Cisco router was logging
lots of "Invalid MD5 digest" - this did not seem to stop.

Today I saw the same thing on a session where I controlled both ends,
and was later able to reproduce it in the lab. The routers are:

Cisco 7206, IOS 12.0(23)S6, IP 7.0.0.9
Juniper M7i, JunOS 6.2R2.4, IP 7.0.0.18

The session was sniffed with the latest (-current) version of tcpdump
from tcpdump.org - which supports verification of TCP MD5 digests
("chel3ixy" is the MD5 key):

# tcpdump -ni xl1 -s 1500 -M chel3ixy tcp port 179

Here we see normal BGP keepalives, all MD5 digests valid:

21:19:10.183900 IP 7.0.0.18.3961 > 7.0.0.9.179: P 76:95(19) ack 77 win 17102 <nop,nop,md5:valid>: BGP, length: 19
21:19:10.184306 IP 7.0.0.9.179 > 7.0.0.18.3961: P 77:96(19) ack 95 win 16206 <md5:valid,eol>: BGP, length: 19
21:19:10.298365 IP 7.0.0.18.3961 > 7.0.0.9.179: . ack 96 win 17083 <nop,nop,md5:valid>
21:19:40.181172 IP 7.0.0.18.3961 > 7.0.0.9.179: P 95:114(19) ack 96 win 17083 <nop,nop,md5:valid>: BGP, length: 19
21:19:40.181690 IP 7.0.0.9.179 > 7.0.0.18.3961: P 96:115(19) ack 114 win 16187 <md5:valid,eol>: BGP, length: 19
21:19:40.280368 IP 7.0.0.18.3961 > 7.0.0.9.179: . ack 115 win 17064 <nop,nop,md5:valid>
21:20:10.202449 IP 7.0.0.18.3961 > 7.0.0.9.179: P 114:133(19) ack 115 win 17064 <nop,nop,md5:valid>: BGP, length: 19
21:20:10.202831 IP 7.0.0.9.179 > 7.0.0.18.3961: P 115:134(19) ack 133 win 16168 <md5:valid,eol>: BGP, length: 19
21:20:10.302389 IP 7.0.0.18.3961 > 7.0.0.9.179: . ack 134 win 17045 <nop,nop,md5:valid>
21:20:40.214582 IP 7.0.0.18.3961 > 7.0.0.9.179: P 133:152(19) ack 134 win 17045 <nop,nop,md5:valid>: BGP, length: 19
21:20:40.214960 IP 7.0.0.9.179 > 7.0.0.18.3961: P 134:153(19) ack 152 win 16149 <md5:valid,eol>: BGP, length: 19
21:20:40.314417 IP 7.0.0.18.3961 > 7.0.0.9.179: . ack 153 win 17026 <nop,nop,md5:valid>

After a while I decided to change the MD5 key. The session with the new
key came up and looked fine, but the old session didn't close properly.
Notice the close is initiated from the Juniper side, and the first
packet from the Cisco side is now sent with an invalid MD5 digest - it
turns out that the packet is actually sent with an MD5 digest based on
the *new* key:

21:20:56.850905 IP 7.0.0.18.3961 > 7.0.0.9.179: F 173:173(0) ack 153 win 17026 <nop,nop,md5:valid>
21:20:57.845617 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:20:59.845711 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:21:03.846005 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:21:10.551830 IP 7.0.0.9.179 > 7.0.0.18.3961: P 153:172(19) ack 152 win 16149 <md5:invalid,eol>: BGP, length: 19
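
For anyone who wants to repeat the "which key was this signed with?"
check without tcpdump, here is a rough Python sketch of the digest as
RFC 2385 defines it: MD5 over the pseudo-header, the fixed TCP header
with a zero checksum, the segment data, and then the key. It is IPv4
only. The function names, the synthetic segment and the "new" key value
are made up for illustration; only "chel3ixy" comes from the capture.

```python
import hashlib
import hmac
import socket
import struct


def tcp_md5_digest(src_ip, dst_ip, tcp_header, payload, key):
    """MD5 digest for an IPv4 TCP segment, following the RFC 2385 field order."""
    if len(tcp_header) != 20:
        raise ValueError("pass only the fixed 20-byte TCP header, without options")
    # The data-offset field gives the real header length, options included.
    header_len = (tcp_header[12] >> 4) * 4
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, header_len + len(payload))
    )
    # The checksum field (bytes 16-17) is treated as zero.
    header_zero_csum = tcp_header[:16] + b"\x00\x00" + tcp_header[18:]
    return hashlib.md5(pseudo_header + header_zero_csum + payload + key).digest()


def which_key(wire_digest, candidates, **segment):
    """Return the label of the first candidate key that reproduces wire_digest."""
    for label, key in candidates.items():
        if hmac.compare_digest(tcp_md5_digest(key=key, **segment), wire_digest):
            return label
    return None


# Synthetic segment shaped like the Cisco keepalive in the trace (not real capture bytes).
segment = dict(
    src_ip="7.0.0.9",
    dst_ip="7.0.0.18",
    tcp_header=struct.pack("!HHIIBBHHH", 179, 3961, 153, 152, 10 << 4, 0x18, 16149, 0, 0),
    payload=b"\xff" * 16 + struct.pack("!HB", 19, 4),  # BGP KEEPALIVE, 19 bytes
)
keys = {"old": b"chel3ixy", "new": b"hypothetical-new-key"}  # "new" value is invented
wire = tcp_md5_digest(key=keys["new"], **segment)  # stand-in for the digest on the wire
print(which_key(wire, keys, **segment))
```

With a real capture you would take the header, payload and 16-byte
digest from the packet instead of building them by hand.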

And since Juniper is now getting packets with invalid MD5 digests (for
this session, which hasn't yet been properly closed), we get a *long*
sequence (about 10 minutes) of:

21:23:03.854077 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:24:05.941658 IP 7.0.0.9.179 > 7.0.0.18.3961: FP 153:212(59) ack 152 win 16149 <md5:invalid,eol>: BGP, length: 59
21:24:07.858411 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:25:11.862725 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:25:35.028423 IP 7.0.0.9.179 > 7.0.0.18.3961: FP 153:212(59) ack 152 win 16149 <md5:invalid,eol>: BGP, length: 59
21:26:15.867021 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21
21:27:04.115353 IP 7.0.0.9.179 > 7.0.0.18.3961: FP 153:212(59) ack 152 win 16149 <md5:invalid,eol>: BGP, length: 59
21:27:19.871338 IP 7.0.0.18.3961 > 7.0.0.9.179: FP 152:173(21) ack 153 win 17026 <nop,nop,md5:valid>: BGP, length: 21

And for every packet from the Juniper side (trying to close the old
BGP session properly, with the correct MD5 key for the old session),
the Cisco side now logs an invalid digest:

Apr 24 21:23:03.854 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)
Apr 24 21:24:07.857 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)
Apr 24 21:25:11.860 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)
Apr 24 21:26:15.863 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)
Apr 24 21:27:19.871 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)

Meanwhile, the new session (with the new MD5 key) is up and all is
well *on that session*. But because the Cisco side keeps logging
these messages, it *looks* like the new session is somehow not
working.
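
One way to tell the two cases apart is to look at the port numbers in
the log lines. Below is a small Python sketch that groups the BADAUTH
messages by connection and separates those that belong to a session
that is currently established from those that do not. The function
names are mine, and the source port 4012 for the new session is
invented; the real value has to come from wherever you can read the
established session's ports (the router's neighbor detail, or the
capture itself).

```python
import re
from collections import Counter

# Matches the message format shown in the log excerpt.
BADAUTH = re.compile(
    r"%TCP-6-BADAUTH: Invalid MD5 digest from (\S+)\((\d+)\) to (\S+)\((\d+)\)"
)


def badauth_by_connection(lines):
    """Count BADAUTH messages per (src, sport, dst, dport)."""
    counts = Counter()
    for line in lines:
        match = BADAUTH.search(line)
        if match:
            src, sport, dst, dport = match.groups()
            counts[(src, int(sport), dst, int(dport))] += 1
    return counts


def split_by_active(counts, active):
    """Split the counts into (current sessions, everything else)."""
    current = {conn: n for conn, n in counts.items() if conn in active}
    stale = {conn: n for conn, n in counts.items() if conn not in active}
    return current, stale


log = [
    "Apr 24 21:23:03.854 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)",
    "Apr 24 21:24:07.857 %TCP-6-BADAUTH: Invalid MD5 digest from 7.0.0.18(3961) to 7.0.0.9(179)",
]
# Hypothetical: the new session is assumed to use source port 4012.
active = {("7.0.0.18", 4012, "7.0.0.9", 179)}
current, stale = split_by_active(badauth_by_connection(log), active)
print("current:", current)
print("stale:", stale)
```

If everything ends up in "stale", the errors are about the old
connection and say nothing about the health of the new one.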

As far as I can see, the bug here is clearly on the Cisco side. We
will definitely be logging a TAC case about this.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

Yes - I've noticed this when configuring MD5 on sessions. If I let the
old session time out due to the MD5 mismatch, and then let it re-establish,
the session seems to work, but the router continues to log the MD5 errors.

Doing a "clear ip bgp <nei>" on my side removes the problem.

Simon

This is a feature, and hopefully Juniper will also add the same feature
soon. The feature allows you to change the MD5 key without flapping the
BGP session which is very important on large peering sessions. There is
no requirement that MD5 keys must remain constant throughout an entire TCP
session, or to terminate a TCP session when the key changes. As long as
both sides agree, the key can be changed at any time including in the
middle of a TCP session. New packets after the key change are sent with
message digests based on the new key.

Key management is still an issue. It would be nice to be able to "roll"
the MD5 key change similar to more recent protocols. If you had a list
of valid keys, we wouldn't need to perfectly synchronize key changes.
But this would increase CPU utilization for failed packets (i.e. check
key, key + 1, key - 1), increasing the DoS risk.
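
To put a number on that cost, here is a minimal Python sketch of a
receiver that tries a list of keys in order and reports how many MD5
computations each packet needed. The names and key values are invented,
and digest() is a simplified stand-in that hashes the already-assembled
covered bytes followed by the key.

```python
import hashlib
import hmac


def digest(covered, key):
    # Simplified stand-in: hash the covered bytes followed by the key.
    return hashlib.md5(covered + key).digest()


def verify_with_key_list(covered, wire_digest, keys):
    """Return (index of the matching key or None, number of MD5 computations)."""
    attempts = 0
    for index, key in enumerate(keys):
        attempts += 1
        if hmac.compare_digest(digest(covered, key), wire_digest):
            return index, attempts
    return None, attempts


# Hypothetical key list: current key first, then the next and the previous one.
keys = [b"key-current", b"key-next", b"key-previous"]
covered = b"bytes covered by the digest"

good = verify_with_key_list(covered, digest(covered, keys[0]), keys)
rolled = verify_with_key_list(covered, digest(covered, keys[1]), keys)
forged = verify_with_key_list(covered, b"\x00" * 16, keys)
print(good, rolled, forged)
```

A packet signed with the current key costs one computation, while a
forged packet always costs one per configured key, which is the extra
load an attacker could exploit.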

> After a while I decided to change the MD5 key. The session with the new
> key came up and looked fine, but the old session didn't close properly.
> Notice the close is initiated from the Juniper side, and the first
> packet from the Cisco side is now sent with an invalid MD5 digest - it
> turns out that the packet is actually sent with an MD5 digest based on
> the *new* key:

> This is a feature, and hopefully Juniper will also add the same feature
> soon. The feature allows you to change the MD5 key without flapping the
> BGP session which is very important on large peering sessions. There is
> no requirement that MD5 keys must remain constant throughout an entire TCP
> session, or to terminate a TCP session when the key changes. As long as
> both sides agree, the key can be changed at any time including in the
> middle of a TCP session. New packets after the key change are sent with
> message digests based on the new key.

But as long as the session *is* reset anyway, the current situation is
extremely confusing - the log messages (on both Cisco and Juniper) give
no indication that the invalid key in question is for an *old* BGP
session, no longer active!

Steinar Haug, Nethelp consulting, sthaug@nethelp.no

That's why I hope Juniper will fix their implementation not to reset
the session and to stop using an old key. Once the key is changed, all
new packets (including new packets for old sessions) should use the new
key, not the old key.

You think the bug is on Cisco's side, I think the bug is on Juniper's
side. Hence interoperability.

Or, gosh, just use IPsec in AH mode, which solves this problem by
allowing one to use very strong public-key auth (RSA, X.509 SSL certs,
etc.) or simple auth (pre-shared keys plus a variety of symmetric ciphers,
from weak to strong) for initial authentication, and hence
negotiation of a session key to be used for per-packet auth/integrity.

The MD5 hack was invented as a simple stopgap until IPsec became
available. Why perpetuate the hack ever more? Adding rekeying features
to TCP-MD5, eek!

regards,

It is easy to understand that MD5 authenticates the BGP connection, not a
session, so there is no need to reset the session (as Juniper and old Cisco
are doing) instead of just using the new key after it was changed.

If someone wants a 'smooth' transition, they should implement a dual check
(new key, old key, abort) to allow an
interruption-less transition.
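
A minimal Python sketch of such a dual check, in the order given above:
try the new key, then the old key, otherwise abort. Retiring the old key
as soon as the peer is seen using the new one is my own assumption, not
something any vendor is known to do, and the class name and key values
are invented. The digest is simplified to MD5 over the covered bytes
followed by the key.

```python
import hashlib
import hmac


class DualKeyCheck:
    """Accept the new key or, during a transition, the old one."""

    def __init__(self, new_key, old_key):
        self.new_key = new_key
        self.old_key = old_key  # becomes None once retired

    @staticmethod
    def _matches(covered, wire_digest, key):
        expected = hashlib.md5(covered + key).digest()
        return hmac.compare_digest(expected, wire_digest)

    def check(self, covered, wire_digest):
        if self._matches(covered, wire_digest, self.new_key):
            # Assumption: the peer has switched, so the old key is no longer needed.
            self.old_key = None
            return "new"
        if self.old_key is not None and self._matches(covered, wire_digest, self.old_key):
            return "old"
        return "abort"


def sign(covered, key):
    # Same simplified digest as used by the checker.
    return hashlib.md5(covered + key).digest()


receiver = DualKeyCheck(new_key=b"hypothetical-new", old_key=b"hypothetical-old")
data = b"bytes covered by the digest"
results = [
    receiver.check(data, sign(data, b"hypothetical-old")),  # peer has not switched yet
    receiver.check(data, sign(data, b"hypothetical-new")),  # peer switched
    receiver.check(data, sign(data, b"hypothetical-old")),  # old key already retired
]
print(results)
```

The worst case is two MD5 computations per packet, and only while the
old key is still held.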

So, Cisco is 100% correct when it changes the session key instantly.