carping about CARP

I can't seem to recall anyone griping about this here on our august
little list but google finds that I'm by no means the first to have
been burned by an unholy interaction between VRRP and CARP.

Let's skip the protocol discussions (same protocol number and uses
multicast) [*] and go straight to the behavioral observations.

I turned on VRRP this evening on a pair of routers. All of a sudden a
CARP instance between a pair of pfSense boxes in the rack (which I
didn't even know was there) invited itself to the party and started
flailing all over the place and causing oscillating packet loss for
anything that was going off-segment.

Note that the Ciscos didn't exhibit any untoward behavior, and there
were "passwords" on the VRRP sessions too. Meanwhile, the pfSense box
spazzed out and filled its dmesg logs with stuff like:

arp: 192.0.2.1 moved from 00:00:0c:xx:xx:01 to 00:00:5e:xx:xx:01 on em1
arp: 192.0.2.1 moved from 00:00:5e:xx:xx:01 to 00:00:0c:xx:xx:01 on em1

(no other hosts on the segment were logging such activity)
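
A minimal sketch of how one might tally these "moved from" messages to see which address is flapping. It assumes the log lines look exactly like the two quoted above; the function name `count_arp_moves` is made up for illustration and is not part of pfSense or FreeBSD.

```python
import re
from collections import Counter

# Matches lines such as:
#   arp: 192.0.2.1 moved from 00:00:0c:xx:xx:01 to 00:00:5e:xx:xx:01 on em1
# The pattern is an assumption based on the two sample lines in this post.
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>\S+) to (?P<new>\S+) on (?P<ifname>\S+)"
)


def count_arp_moves(lines):
    """Count 'moved from' events per (ip, interface) pair."""
    moves = Counter()
    for line in lines:
        match = ARP_MOVED.search(line)
        if match:
            moves[(match.group("ip"), match.group("ifname"))] += 1
    return moves
```

Feeding it the dmesg output gives a per-address count; an address whose count keeps climbing is the one being fought over.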

Looks like CARP is a bit loose about believing stuff coming in over
the wire. Seems a bit out of character for OpenBSD, but maybe these
days it's considered all good so long as such a malfunction only
causes an outage, not a core dump.

Anyway, word to the wise, CARP and VRRP is a bit of a dangerous mixture.

-r

[*] The OpenBSD side of the story can be read at
http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol#No_official_Internet_protocol_number

Seems that there is a lesson to be learned here:

"o hai, we wrote this software but can not be bothered to follow your
process or formally write up the protocol, plz to be giving us a
protocol number" ain't gonna fly.

case of the same situation all[1] 'software md5 tcp' implementations have?
sign but never verify...

-chris

[1]: solaris's md5 and I believe the linux one do this :(

[*] The OpenBSD side of the story can be read at
http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol#No_official_Internet_protocol_number

Seems that there is a lesson to be learned here:

"o hai, we wrote this software but can not be bothered to follow your
process or formally write up the protocol, plz to be giving us a
protocol number" ain't gonna fly.

This tells me pretty much everything I need to know about this:

Theo's comments in context here:
http://marc.info/?l=openbsd-misc&m=133832434412686&w=2

The article in question:
http://queue.acm.org/detail.cfm?id=2090149
I recommend reading the comments.

From where I stand, the OpenBSD project has been consistent on
insulating itself against future legal issues, no matter how remote,
with the idea that your security should not be restrained by anyone
other than you.
I believe that idea has legs regardless of practical considerations
and stands on its own.

Besides, I won't discount OpenBSD out of hand for forging ahead,
withstanding practical issues, considering the runs they've got on the
board and the many facepalm fails we see in the diametrically opposed
corporate world.
It might be a very good thing they've bothered to take the time on this.

Best wishes.

case of the same situation all[1] 'software md5 tcp' implementations
have? sign but never verify...

and freebsd :(

The amount of detail in the original posting is rather disappointing,
with absolutely no hope of anyone being able to reproduce the problem
with the data given.

Did the vhid and vrrp group overlap? Were there duplicate IP addresses?

David Walker <davidianwalker@gmail.com> writes:

[ patent fight recap ]

Thanks for posting those. I recall the discussions surrounding the
HSRP patents well, but it's been a while and I have proportionally
more gray hair (and less overall) now.

My problem is not with Theo nor with the IETF. My problem is with a
crappy and credulous implementation. When an outage is caused by
redundancy software that comes from an organization that prides itself
on well-written code, the irony meter goes off the scale.

From where I stand, the OpenBSD project has been consistent on
insulating itself against future legal issues, no matter how remote,
with the idea that your security should not be restrained by anyone
other than you.

What is "security" though and what is its aim? To my way of thinking,
what happened to me last night wherein a box misbehaved and caused
indigestion on an entire broadcast domain was a non-trivial security
and availability incident.

On the scale of badness, it's somewhat worse than a "magic packet
causes this box to reboot" flaw, but not as bad as a "box gets owned,
sensitive data gets divulged" incident. In my world, at least,
security and availability are intimately intertwined. Were they not,
one could easily "win" the security "game" by the simple expedient of
turning the host off. Mission accomplished!

I believe that idea has legs regardless of practical considerations
and stands on its own.

Besides, I won't discount OpenBSD out of hand for forging ahead,
withstanding practical issues, considering the runs they've got on the
board and the many facepalm fails we see in the diametrically opposed
corporate world.

It might be a very good thing they've bothered to take the time on this.

The problem here is "insufficient paranoia about packets that come
flying in over the transom, based on naive contemporaneous belief that
a particular protocol number was not in use". I mean, gosh, who would
ever send packets on an unused protocol number? And who other than us
would get frustrated with the process and decide to forge ahead on
their own?

Most of us here are familiar with Postel's oft-quoted RFC793
robustness principle ("be conservative in what you do, be liberal in
what you accept from others"). Yet, when one is engaging in an
off-label use of any protocol, identifier, etc. it is incumbent on the
protocol designer to mark their traffic in a particular way so that it
is easy to identify, both for themselves and for others. Sure, one
could argue that this is merely abstracting away the semantics of the
protocol number field (hopefully to a field with more data space) but
the whole point is to not accidentally interoperate with something
with which you are not prepared to interoperate.

Stated another way, nothing is keeping me from using udp/139 for
something else so long as my packets aren't misinterpreted by SMB
servers out there as being SMB, and so long as I don't accidentally
eat someone else's SMB and do something stupid.

Would you eat food that someone left on your doorstep with no note and
no hint as to who it came from? Obviously from your mom, right? I
mean who else would leave food on your doorstep? How about Halloween
candy with open wrappers? The comparisons in the messages you cited
to a four year old may not be that far off.

-r

Which is fair enough from the ietf's point of view. Having said that,

1. patent US5473599 is pretty general and I can't see why it wouldn't apply
to any host running CARP for router NHRP - although it's not clear that it
would apply to e.g. host service high availability addressing. IOW, it's
not at all clear that this is a legally unencumbered protocol, despite the
bleatings from the openbsd camp.

2. the original patent is due to expire in about 18 months (april 2014) and
i can't immediately see any cip applications which might extend it. This
will render the debate substantially redundant.

Regarding the ruckus between the openbsd camp and the ietf, the ietf's
position is here:

http://www.ietf.org/mail-archive/web/vrrp/current/msg00350.html

It looks like there wasn't any serious attempt on the part of the openbsd
people to engage with the ietf. There were no drafts, barely any mailing
list postings to either the vrrp (now concluded) or routing discussion WGs,
and apparently only a single presentation at a single ietf meeting. Maybe
I've missed something though - I haven't checked the openbsd mailing lists
because apparently their archives aren't publicly accessible.

It's not at all clear why the openbsd people expected that a "petition" to
IANA would result in them being assigned an official protocol number for
CARP. There are only 254 of these available so it's not unreasonable to
decline to register them unless there is a strong written case to do so.
There's a policy in place for this (rfc5237), and it's in place for a good
reason.

As for the openbsd position on the choice of protocol number:

"Consequently we were forced to choose a protocol number which would not
conflict with anything else of value, and decided to place CARP at IP
protocol 112"

My goodness, what a co-incidence that they happened to choose the same
protocol number as VRRP.

http://www.ietf.org/mail-archive/web/ietf/current/msg48988.html

Good thing this wasn't ever going to cause people trouble.

Nick

* Robert E. Seastrom <rs@seastrom.com> [2012-11-30 13:46]:

My problem is not with Theo nor with the IETF. My problem is with a
crappy and credulous implementation. When an outage is caused by
redundancy software that comes from an organization that prides itself
on well-written code, the irony meter goes off the scale.

vrrp and carp share the vhid space. you have to use unique vhids per
network segment, that's about it.

the openbsd box was nice enough to tell you about the mac address
conflict, the others didn't.

if you had looked at the carp boxes you would have seen that carp had
continued to work just fine. the mac address (which is basically "fixed
prefix + vhid") conflict is your "outage". there's nothing we could do
about that.
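
A rough sketch of the "fixed prefix + vhid" point, for anyone who wants to check a segment on paper before turning things on. The prefix 00:00:5e:00:01 is the one RFC 3768 specifies for IPv4 VRRP virtual MACs. That CARP builds its virtual MAC from the same prefix is an assumption here, though it is consistent with the 00:00:5e addresses in the logs above. The function names are made up for illustration.

```python
# Prefix for IPv4 VRRP virtual MAC addresses per RFC 3768.
# Assumption: CARP derives its virtual MAC from the same prefix.
VIRTUAL_MAC_PREFIX = "00:00:5e:00:01"


def virtual_mac(group_id: int) -> str:
    """Return the virtual MAC for a VRRP VRID or CARP vhid (1..255)."""
    if not 1 <= group_id <= 255:
        raise ValueError("group id must be in 1..255")
    return f"{VIRTUAL_MAC_PREFIX}:{group_id:02x}"


def colliding_ids(vrrp_vrids, carp_vhids):
    """Return the ids used by both protocols, i.e. those sharing a virtual MAC."""
    return sorted(set(vrrp_vrids) & set(carp_vhids))
```

With "vrrp 1" on the routers and vhid 1 on the firewalls, `colliding_ids([1], [1])` reports the overlap, and both sides would claim the MAC that `virtual_mac(1)` returns.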

and re IANA, they made it clear they would not give us a proto number
no matter what; we didn't have a choice but to ignore that
industry-money-driven committee.

I can't seem to recall anyone griping about this here on our august
little list but google finds that I'm by no means the first to have
been burned by an unholy interaction between VRRP and CARP.

Let's skip the protocol discussions (same protocol number and uses
multicast) [*] and go straight to the behavioral observations.

I turned on VRRP this evening on a pair of routers. All of a sudden a
CARP instance between a pair of pfSense boxes in the rack (which I
didn't even know was there) invited itself to the party and started
flailing all over the place and causing oscillating packet loss for
anything that was going off-segment.

Note that the Ciscos didn't exhibit any untoward behavior, and there
were "passwords" on the VRRP sessions too. Meanwhile, the pfSense box
spazzed out and filled its dmesg logs with stuff like:

arp: 192.0.2.1 moved from 00:00:0c:xx:xx:01 to 00:00:5e:xx:xx:01 on em1
arp: 192.0.2.1 moved from 00:00:5e:xx:xx:01 to 00:00:0c:xx:xx:01 on em1

(no other hosts on the segment were logging such activity)

All this shows is that the IP address is flip-flopping between
a Cisco MAC address and a CARP/VRRP unicast MAC address.
I would double check the vrrp config and make sure that the vrrp
IP address is *only* configured on vrrp, not ethernet interfaces.

Looks like CARP is a bit loose about believing stuff coming in over
the wire. Seems a bit out of character for OpenBSD, but maybe these
days it's considered all good so long as such a malfunction only
causes an outage, not a core dump.

I don't see anything here indicating that it's to do with CARP
believing things sent over the wire, I suspect the problem would still
occur if CARP were disabled on the pfSense box. (Do people really
run CARP in the wild without authentication anyway?)

openbsd verifies these, btw.

Henning Brauer <hb-nanog@bsws.de> writes:

* Robert E. Seastrom <rs@seastrom.com> [2012-11-30 13:46]:

My problem is not with Theo nor with the IETF. My problem is with a
crappy and credulous implementation. When an outage is caused by
redundancy software that comes from an organization that prides itself
on well-written code, the irony meter goes off the scale.

vrrp and carp share the vhid space. you have to use unique vhids per
network segment, that's about it.

the openbsd box was nice enough to tell you about the mac address
conflict, the others didn't.

pfSense is FreeBSD, but who's counting? The problem is magnified when
ill-behaved software ends up in appliances. Good thing we were able
to get a shell on the box.

if you had looked at the carp boxes you would have seen that carp had
continued to work just fine. the mac address (which is basically "fixed
prefix + vhid") conflict is your "outage". there's nothing we could do
about that.

and re IANA, they made it clear they would not give us a proto number
no matter what; we didn't have a choice but to ignore that
industry-money-driven committee.

Between choosing an Ethernet OUI which was assigned to IANA by IEEE
(another "industry-money-driven committee") and choosing protocol 112
(odds of coincidence 1 in what, 120 or so at the time?), "ignore" is
not the word I would have chosen here.

-r

Jussi Peltola <pelzi@pelzi.net> writes:

The amount of detail in the original posting is rather disappointing,
with absolutely no hope of anyone being able to reproduce the problem
with the data given.

It was not intended as a bug report, instead merely an expression of
disappointment and an advisory to fellow travelers to watch their backs.
Sometimes a report of muggings in a locale is useful, even without a
detailed description of the attacker.

Did the vhid and vrrp group overlap? Were there duplicate IP addresses?

Yes, "vrrp 1" turned out to be a bad plan here.

Turned off vrrp on the router and went with HSRP. There is enough
documentation on HSRP vs VRRP around (heck, even Wikipedia) to surmise
that something that interacted poorly with VRRP would likely not do
the same to HSRP. Docs on CARP are thin on the ground. Never even an
I-D. Didn't have time to read the source code when the network was
acting up.

-r

Stuart Henderson <stu@spacehopper.org> writes:

I don't see anything here indicating that it's to do with CARP
believing things sent over the wire, I suspect the problem would still
occur if CARP were disabled on the pfSense box. (Do people really
run CARP in the wild without authentication anyway?)

1) it did not.

2) standard, out of the box pfSense distribution. Haven't run that
codebase lately myself, and not sufficiently interested this morning
to dig through the code.

Just watch your back, that's all. :)

-r

Comments inline ... as best I can.

David Walker <davidianwalker@gmail.com> writes:

[ patent fight recap ]

Thanks for posting those. I recall the discussions surrounding the
HSRP patents well, but it's been a while and I have proportionally
more gray hair (and less overall) now.

My problem is not with Theo nor with the IETF. My problem is with a
crappy and credulous implementation. When an outage is caused by
redundancy software that comes from an organization that prides itself
on well-written code, the irony meter goes off the scale.

You should hammer on OpenBSD.

However, as yet this is an unknown.

As far as irony goes, there is some here but I'm not sure what you've got
is countable yet.

From where I stand, the OpenBSD project has been consistent on
insulating itself against future legal issues, no matter how remote,
with the idea that your security should not be restrained by anyone
other than you.

What is "security" though and what is its aim? To my way of thinking,
what happened to me last night wherein a box misbehaved and caused
indigestion on an entire broadcast domain was a non-trivial security
and availability incident.

Of course.

On the scale of badness, it's somewhat worse than a "magic packet
causes this box to reboot" flaw, but not as bad as a "box gets owned,
sensitive data gets divulged" incident. In my world, at least,
security and availability are intimately intertwined. Were they not,
one could easily "win" the security "game" by the simple expedient of
turning the host off. Mission accomplished!

The phrase you're looking for is denial of service, a known security phenomenon.

I believe that idea has legs regardless of practical considerations
and stands on its own.

Besides, I won't discount OpenBSD out of hand for forging ahead,
withstanding practical issues, considering the runs they've got on the
board and the many facepalm fails we see in the diametrically opposed
corporate world.

It might be a very good thing they've bothered to take the time on this.

The problem here is "insufficient paranoia about packets that come
flying in over the transom, based on naive contemporaneous belief that
a particular protocol number was not in use". I mean, gosh, who would
ever send packets on an unused protocol number? And who other than us
would get frustrated with the process and decide to forge ahead on
their own?

As far as not using the same protocol number, that's neither here nor there.
Something I've noticed looking at information security is the taxonomy
of Confidentiality, Integrity, Availability - which addresses your
previous points.
Something else I've noticed is the notion of security through
obscurity and how it cedes the initiative to the attacker.
Experience tells me this is not lost on the OpenBSD folks.
Translation, it's commonly understood that secure protocols shouldn't
rely on trusting others to obey the rules ... and whether or not it's
OpenBSD or Johnny Black Hat that's on 112 or whatever, if that causes
issues then it's either down to the protocol or the administrator. I
have no doubt OpenBSD understood all this.
If I take Theo's word for it, he employed a mechanism available in the
rfc (i.e. VRRP) to allow traffic to be differentiated.
Regardless, if a competing implementation can cause a DoS or any other
issue that's either a design failure that should be addressed in a
subsequent rfc or if it's a design limitation, then it's a failure to
concomitantly secure the network.
Blaming OpenBSD for protocol number won't fly.
If I'm to take Stuart's cue then somebody hasn't read the documentation. Simple.

Most of us here are familiar with Postel's oft-quoted RFC793
robustness principle ("be conservative in what you do, be liberal in
what you accept from others"). Yet, when one is engaging in an
off-label use of any protocol, identifier, etc. it is incumbent on the
protocol designer to mark their traffic in a particular way so that it
is easy to identify, both for themselves and for others. Sure, one
could argue that this is merely abstracting away the semantics of the
protocol number field (hopefully to a field with more data space) but
the whole point is to not accidentally interoperate with something
with which you are not prepared to interoperate.

At a casual reading, looking at the security considerations of for example ...
http://tools.ietf.org/rfc/rfc3768.txt
... suggests to me that there are exploitable vectors inherent to this protocol.
I'll say it again, I'm no subject matter expert.
I'd be happy for you to point me in the right direction, otherwise you're
going to have to wait for me to get up to speed.

Otherwise see previous, if there are no mechanisms to secure VRRP or
CARP then either the network or the machine needs to be secure or the
protocol shouldn't be in service or relied upon.

Stated another way, nothing is keeping me from using udp/139 for
something else so long as my packets aren't misinterpreted by SMB
servers out there as being SMB, and so long as I don't accidentally
eat someone else's SMB and do something stupid.

No matter what protocol we look at, ultimately that comes down to
protocol design.
After that is network design.
If a protocol is open to attack by unauthenticated users then it's up
to me to secure the network against unauthenticated users.
Expecting only legitimate traffic no matter what the door or window
we're looking at is not the right way to do it.
The bad guys certainly don't care either way whether you want
malformed packets or not or complementary-looking implementations or
not.

and re IANA, they made it clear they would not give us a proto number

As they should have. IANA abides by the rules laid down for it by the IETF/IESG/IAB. The openbsd folks couldn't be bothered to even write up a draft and chose to squat on a protocol number.

no matter what;

BS. If the openbsd folks followed the rules, they'd have gotten the number(s) they requested (assuming they were justified). There is no grand persecution here. There is management of a limited resource.

we didn't have a choice but to ignore that industry-money-driven committee.

Which 'industry-money-driven committee' would that be?

Regards,
-drc

This issue came up originally during my tenure at IANA, and FWIW I
concur with David. I have a vague memory of engaging directly with some
folks from OpenBSD and letting them know that I was sympathetic with
their situation, but IANA has strict rules to follow, and unless they
followed procedure my hands were tied.

Re the "industry-money-driven committee" bit, at the time (and in fact,
up until recently) I was a FreeBSD committer myself, so if anything I
was *more* inclined to be sympathetic to those from the OS community who
submitted applications. I can also assure you that we did assign code
points to a non-trivial number of open source applicants _who followed
the documented procedures_.

Doug

I believe that idea has legs regardless of practical considerations
and stands on its own.

Besides, I won't discount OpenBSD out of hand for forging ahead,
withstanding practical issues, considering the runs they've got on the
board and the many facepalm fails we see in the diametrically opposed
corporate world.

It might be a very good thing they've bothered to take the time on this.

The problem here is "insufficient paranoia about packets that come
flying in over the transom, based on naive contemporaneous belief that
a particular protocol number was not in use". I mean, gosh, who would
ever send packets on an unused protocol number? And who other than us
would get frustrated with the process and decide to forge ahead on
their own?

Perhaps we should ask IETF/IANA to allocate a group of protocol numbers
to "the wild west". A protocol-number equivalent of RFC-1918 or private ASNs.
You can use these for whatever you want, but so can anyone else and if you
do, you do so at your own risk.

This won't entirely solve the problem, but at least it would provide some
level of shield for protocol numbers that are registered to particular
purposes through the IETF/IANA process.

Owen

> and re IANA, they made it clear they would not give us a proto number

As they should have. IANA abides by the rules laid down for it by the
IETF/IESG/IAB. The openbsd folks couldn't be bothered to even write up a
draft and chose to squat on a protocol number.

> no matter what;

BS. If the openbsd folks followed the rules, they'd have gotten the
number(s) they requested (assuming they were justified). There is no
grand persecution here. There is management of a limited resource.

IETF already decided that VRRP was the way to go. So an alternative
implementation would not have been accepted. The result would be a draft
that would never be adopted and so it is back to start.

Still carp packets can coexist with vrrp packets. They use different
version numbers. Also you need to use a different vhid but the same thing
is true if you have 2 groups of vrrp on the same lan. If you configure
something like VRRP you should run a quick tcpdump first and check
that no unexpected packets are showing up. This is especially
important for any protocol that does a link local multicast or broadcast.
This is basic network admin best practice (at least I expect that from a
network engineer).
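
As a sketch of what such a pre-flight check could look for: the function below takes the raw bytes of one IPv4 packet (however captured) and, if it carries IP protocol 112, returns the version nibble, the type nibble and the group id. It assumes the second payload byte is the VRID for VRRP and the vhid for CARP, and that both put the version in the high nibble of the first byte. The function name is hypothetical, and the sketch does not try to tell the two protocols apart.

```python
PROTO_112 = 112  # IP protocol number assigned to VRRP and also used by CARP


def group_id_from_ipv4(packet: bytes):
    """Return (version, type, group_id) for a protocol-112 IPv4 packet, else None."""
    if len(packet) < 20:
        return None  # too short to hold an IPv4 header
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        return None  # not IPv4
    header_len = (version_ihl & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len + 2:
        return None  # bad header length or truncated payload
    if packet[9] != PROTO_112:
        return None  # some other protocol
    first, group_id = packet[header_len], packet[header_len + 1]
    return first >> 4, first & 0x0F, group_id
```

Any group id that shows up on the segment before VRRP is enabled is one to avoid when picking a VRID.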

> we didn't have a choice but to ignore that industry-money-driven committee.

Which 'industry-money-driven committee' would that be?

Did you ever read any of the IETF mailing lists and look at the email
addresses of those people pushing the hardest? At least in the ones I'm
subscribed to the bias is obvious.

Horse pucky. On the Internet, the secure and reliable players
co-ordinate their protocol actions through the IANA, using the
published IANA rules for how you get a protocol identifier. This case
is a straightforward example of a bunch of people angry at things not
going their way, and treading all over a well-defined, open process
because they didn't like the actions of some of the participants.

I don't like those actions either, but if proponents cannot bother to
publish an Internet-Draft describing CARP, it's pretty hard to take
CARP seriously as anything like a "protocol". It's just rude
behaviour on someone else's well-defined port.

A

implementation would not have been accepted. The result would be a draft
that would never be adopted and so it is back to start.

"Adopted" by whom? The procedure, even at the time, did not require
in any way IETF consensus. Getting a number requires that you tell
others what is going on, not that you justify the going on itself.

Did you ever read any of the IETF mailing lists and look at the email
addresses of those people pushing the hardest? At least in the ones I'm
subscribed to the bias is obvious.

I think that _ad hominem_ arguments are fallacious, and should be
dismissed as such.

A