md5 for bgp tcp sessions

eric, all,

not to pick on eric at all, but since he raised the issue...

> likely need to make modifications to our IGP/EGP setup. Though we filter
> OSPF multicast traffic, we wanted to add in MD5 passwords to our
> neighbors.

just a quick comment here. i would encourage you not to do that.

the md5 password hack to protect tcp sessions is rapidly falling out
of favor for a number of reasons. among them:

1) it protects against a very limited "vulnerability". for operating
systems that stay up for reasonable periods of time, that generate
sufficiently random ISNs and that check for in-windowness of syns and
rsts, there is a very limited exposure.

2) the cure is worse than the disease:
  
  a) many (all?) implementations of md5 protection of tcp expose
new, easy-to-exploit vulnerabilities in host OSes. md5 verification
is slow and done on a main processor of most routers. md5
verification typically takes place *before* the sequence number,
ports, and ip are checked to see whether they apply to a valid
session. as a result, you've exposed a trivial processor DOS to your
box.
  b) coordination problems cause downtime. password
coordination problems are reported to be a major cause of downtime
among peers that i interact with. this downtime is costly and is much
greater than the downtime caused by the (theoretical and not actively
exploited) tcp "vulnerability"

i would encourage everyone to seriously rethink the routine use of MD5
passwords to protect BGP tcp sessions.

t.

> the md5 password hack to protect tcp sessions is rapidly falling out
> of favor for a number of reasons. among them:

> 1) it protects against a very limited "vulnerability". for operating
> systems that stay up for reasonable periods of time, that generate
> sufficiently random ISNs and that check for in-windowness of syns and
> rsts, there is a very limited exposure.

Well, it isn't really as bad as all that (and I don't think I've ever been
accused of being a BGP MD5 lover).

Yes, everyone who knows how TCP works knows that the "vulnerability" that
was "discovered" is precisely how TCP was meant to work, and if you ever
thought it worked otherwise you were really confused and/or misinformed.
But out of all the hoopla, we've ended up widely deploying a fairly nifty
hack that helps prevent this type of attack at the protocol level, across
a wide variety of routers and systems. While it didn't actually stop any
attacks in the wild (because there were never any to begin with), it never
hurts to harden your protocol implementation if there is no tradeoff.

> 2) the cure is worse than the disease:
>
>   a) many (all?) implementations of md5 protection of tcp expose
> new, easy-to-exploit vulnerabilities in host OSes. md5 verification
> is slow and done on a main processor of most routers. md5
> verification typically takes place *before* the sequence number,
> ports, and ip are checked to see whether they apply to a valid
> session. as a result, you've exposed a trivial processor DOS to your
> box.

Well, I think they've finally fixed this one by now, at least everyone
that I'm aware of has done so. Immediately following the whining to start
deploying MD5 it was certainly the case that many implementations did
stupid stuff like process MD5 before running other validity checks like
sequence numbers which are far less computationally intensive, and there
were a few MSS bugs that popped up, but they should have all been worked
out by now. I don't think that anyone running modern code is suffering any
more attack potential because of this.
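
To make the ordering point concrete, here is a minimal sketch of a receive path that runs the cheap checks (known session, in-window sequence number) before spending CPU on MD5. Everything in it is hypothetical and for illustration only: the `Session` and `Segment` types, the `accept` function, and especially `simplified_digest`, which is not the real RFC 2385 computation (that one covers the TCP pseudo-header, header, and payload). No vendor's actual code is being described.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Session:
    rcv_nxt: int   # next sequence number we expect from the peer
    rcv_wnd: int   # receive window size in bytes
    key: bytes     # shared MD5 secret (hypothetical storage)


@dataclass
class Segment:
    four_tuple: tuple  # (src_ip, src_port, dst_ip, dst_port)
    seq: int
    payload: bytes
    digest: bytes      # signature carried in the TCP MD5 option


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    # Sequence numbers wrap modulo 2**32, so compare the offset.
    return (seq - rcv_nxt) % 2**32 < rcv_wnd


def simplified_digest(seg: Segment, key: bytes) -> bytes:
    # Assumption: a stand-in for the RFC 2385 digest, NOT the real layout.
    data = repr(seg.four_tuple).encode() + seg.seq.to_bytes(4, "big")
    return hashlib.md5(data + seg.payload + key).digest()


def accept(seg: Segment, sessions: dict) -> str:
    # 1. Cheap: does this 4-tuple belong to a session we know about?
    session = sessions.get(seg.four_tuple)
    if session is None:
        return "drop: no session"
    # 2. Cheap: is the sequence number inside the receive window?
    if not in_window(seg.seq, session.rcv_nxt, session.rcv_wnd):
        return "drop: out of window"
    # 3. Expensive: only now compute and compare the MD5 signature.
    if not hmac.compare_digest(simplified_digest(seg, session.key), seg.digest):
        return "drop: bad md5"
    return "accept"
```

With this ordering, junk packets that miss the session table or the window never reach the hash at all, which is the property the paragraph above says the early implementations lacked.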

>   b) coordination problems cause downtime. password
> coordination problems are reported to be a major cause of downtime
> among peers that i interact with. this downtime is costly and is much
> greater than the downtime caused by the (theoretical and not actively
> exploited) tcp "vulnerability"

> i would encourage everyone to seriously rethink the routine use of MD5
> passwords to protect BGP tcp sessions.

This one is really the heart of the problem, which has far more to do with
those silly humans behind the keyboards than it does with any protocols.
If you were working with intelligent, responsive, organized folks,
deploying MD5 probably wasn't difficult to do at all. If however you were
working with the clueless, paranoid, unresponsive, disorganized folks that
most of us were dealing with, you probably did a lot of swearing that
week.

Before this incident, it was much more difficult to explain, pick, share,
and configure the MD5 keys with all of your idiot peers, so just the fire
drill effect probably did help organize people a little bit. As long as
you don't get carried away with it, deploying MD5 everywhere is probably
not going to hurt anything, and has become the new path of least
resistance.

Just please realize that this is a trivial layer of security, an extra
little bit of insurance to make it harder to alter the packets in flight
or screw with the delivery protocol, and as such the key is not a state
secret. I am going to seriously hurt the next person who wants to exchange
phone numbers via pgp encrypted email so that we can have a conference
call to set up a meeting where we can whisper MD5 keys to each other in
pig latin while standing under the god damned cone of silence and then
shoot the engineers who configured it on the router afterwards.

It's not just trivial, it's nearly useless.

Would someone please raise their hand if they have ever seen this attack in the wild? Anyone?

Seems the TTL hack is much more effective at guarding against this sort of thing, doesn't require "secrets", far less CPU intensive, easier to configure, etc., etc., etc.

Want security? I suggest you use something that has more benefit than cost.

ras, all,

> a) many (all?) implementations of md5 protection of tcp expose
> new, easy-to-exploit vulnerabilities in host OSes. md5 verification
> is slow and done on a main processor of most routers. md5
> verification typically takes place *before* the sequence number,
> ports, and ip are checked to see whether they apply to a valid
> session. as a result, you've exposed a trivial processor DOS to your
> box.

> Well, I think they've finally fixed this one by now, at least everyone
> that I'm aware of has done so. Immediately following the whining to start
> deploying MD5 it was certainly the case that many implementations did
> stupid stuff like process MD5 before running other validity checks like
> sequence numbers which are far less computationally intensive, and there
> were a few MSS bugs that popped up, but they should have all been worked
> out by now. I don't think that anyone running modern code is suffering any
> more attack potential because of this.

my understanding is that md5 is still checked before the ttl-hack
check takes place on cisco (and perhaps most router platforms). new
attack vector for less security than you had before. oh well. ras:
can you confirm that it is possible to implement ttl-hack and have it
check *before* md5 signature checks?

the chaos (and crappy quality of the implementations) during the panic
demonstrates two other things: rolling out magic code because your
vendor tells you to is a bad idea; slapping together a hack on top of
a well-designed protocol without careful thought and testing is a
terrible idea.

t.

Todd,

> eric, all, not to pick on eric at all, but since he raised the issue...

I always assume and, frankly, hope that when I post something someone will
pipe up and point out anything that's inaccurate, needs clarification,
is a bad idea, etc.

> likely need to make modifications to our IGP/EGP setup. Though we filter
> OSPF multicast traffic, we wanted to add in MD5 passwords to our
> neighbors.

> just a quick comment here. i would encourage you not to do that.

Honestly, I completely agree with you that MD5'ing our OSPF adjacencies isn't
a great idea (I've so far stalled its roll-out). I strongly argued against it
internally. There were, however, those in both the networking and security
groups that were concerned about the OSPF vulnerabilities that were pointed
out recently and were in favor of the MD5s as the mitigation method. I used
the discussion as a point in favor of moving to IS-IS because, since we don't
route CLNS on our campus, IS-IS would be more immune to that form of attack.
I just noted the issue in my response because it was one of the reasons why
we're deciding to move from OSPF to IS-IS, rather than as a recommendation.

Thanks for pointing it out!

Eric :)

Just in case it's not obvious to any onlookers here, Eric was talking about using MD5 authentication in OSPF adjacencies, and Todd is talking about using the TCP MD5 signature option (RFC2385) between BGP peers.

They are two different things (although they both involve routing protocols and the MD5 algorithm): not all arguments for or against one will apply to the other.

Joe

Last I knew, there was still a bug open on this issue that has gotten
little or no action for at least half a year. I would think that in six
months someone at Cisco could take the time to research the bug and fix
it. (I'll leave out the part about releasing TAC-supported code with a
fix.)

  I believe the bugid is CSCee73956

  - Jared

Eric Gauthier <eric@roxanne.org> writes:

> Honestly, I completely agree with you that MD5'ing our OSPF
> adjacencies isn't a great idea (I've so far stalled its roll-out).
> I strongly argued against it internally. There were, however, those
> in both the networking and security groups that were concerned about
> the OSPF vulnerabilities that were pointed out recently and were in
> favor of the MD5s as the mitigation method.

passive-interface is your friend.

                                        ---rob

> my understanding is that md5 is still checked before the ttl-hack
> check takes place on cisco (and perhaps most router platforms). new
> attack vector for less security than you had before. oh well. ras:
> can you confirm that it is possible to implement ttl-hack and have it
> check *before* md5 signature checks?

The TTL hack itself (this is the one where your neighbor sets their
outbound TTL to 255 and then you can drop the packet if it has a TTL less
than 254, in case anyone wasn't paying attention) can be implemented on
the data plane in the receive/loopback hardware based filters before any
TCP processing happens or the packet ever gets near the management cpu,
since it is an IP-specific check. The only thing that the TTL hack
guarantees is that the packet hasn't traveled over any routed network to
get to you, so for example you could still get hit by directly connected
networks (across a public exchange with malicious or compromised
participants, for example). This is different from the issue of whether
the sequence number is checked before the MD5 signature.
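
As a rough illustration of why the TTL hack is so cheap, the whole check boils down to one integer comparison on the IP header. The sketch below follows the description above (neighbor sends with TTL 255, drop anything below 254). The function name `passes_ttl_hack` and the `max_hops` parameter are made up for this example and do not correspond to any vendor's configuration knob.

```python
MAX_TTL = 255  # the TTL the neighbor is expected to send with


def passes_ttl_hack(received_ttl: int, max_hops: int = 1) -> bool:
    """Return True if the packet's TTL is consistent with a nearby sender.

    Assumption: with max_hops=1 the threshold is 254, matching the
    "drop the packet if it has a TTL less than 254" rule described above.
    """
    min_ttl = MAX_TTL - max_hops
    return received_ttl >= min_ttl


# A packet from a directly connected neighbor passes the check.
print(passes_ttl_hack(255))
# A packet that crossed several routers after starting at TTL 64 does not.
print(passes_ttl_hack(60))
```

Because a router on the path can only decrement the TTL, a remote attacker cannot make a packet arrive with a value this high. Anyone on a directly connected network still can, which is the limitation noted above.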

Remember the entire point of this attack was that some bright person
"realized" that with most people having a default TCP window size of 16384
(btw I'm told that this isn't necessarily the case, and that at least some
vendors are lowering their socket buffers on the BGP specific sockets for
other unrelated reasons) or 2^14 you only need to try 2^18 combinations
per ephemeral port and bgp port pair instead of 2^32, times the number of
ephemeral ports you must test, times 2 to handle BGP collision detection
which may set the session up in either direction. You still have to throw
a couple billion packets at the victim and hope for a match, and only
after you get this match do you need to proceed to the next step of
validation on the one packet that managed to get through. If you are doing
MD5 validation on every packet that comes in before you check the other
criteria like sequence number, you are opening yourself up to a very easy
DoS by anyone who wants to throw junk MD5 signed packets at you.
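
The arithmetic above can be checked in a few lines. The window size (2^14) and the factor of 2 for collision detection come from the paragraph above. The number of ephemeral ports the attacker has to cover is NOT given there, so the 4096 used below is purely an assumed figure for illustration. The function names are hypothetical as well.

```python
SEQ_SPACE = 2**32  # size of the TCP sequence number space


def guesses_per_port_pair(window: int) -> int:
    # One guess per window-sized slice of the sequence space is enough
    # to land a segment somewhere inside the receive window.
    return SEQ_SPACE // window


def total_guesses(window: int, ephemeral_ports: int, directions: int = 2) -> int:
    # directions=2 covers BGP collision detection: either side may have
    # initiated the session, so either end may hold the ephemeral port.
    return guesses_per_port_pair(window) * ephemeral_ports * directions


window = 2**14  # the 16384-byte default mentioned above
print(guesses_per_port_pair(window) == 2**18)
# Assumed: 4096 candidate ephemeral ports (not a figure from the thread).
print(total_guesses(window, ephemeral_ports=4096))
```

Under that assumed port count the total comes to 2^31, roughly 2.1 billion packets, which is in line with the "couple billion" mentioned above. A smaller window or a wider ephemeral port range pushes the number up.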

> the chaos (and crappy quality of the implementations) during the panic
> demonstrates two other things: rolling out magic code because your
> vendor tells you to is a bad idea; slapping together a hack on top of
> a well-designed protocol without careful thought and testing is a
> terrible idea.

This wasn't the first or last time the vendors have told us we must
upgrade immediately to some buggy new code because of some secret reason
they can't disclose without killing us afterwards, only to find out it was
a dud issue we wouldn't have cared about if we had been given technical
details beforehand. While I suppose this is slightly better than them
putting out a press release 24 hours after the discovery with everything
but some example exploit code that compiles on linux, I think the point
that we're all trying to make is that we'd like the vendors to find the
happy middle ground between stupidity and paranoia.