BGP Experiment

NANOG,

We would like to inform you of an experiment to evaluate alternatives
for speeding up adoption of BGP route origin validation (research
paper with details [A]).

Our plan is to announce prefix 184.164.224.0/24 with a valid
standards-compliant unassigned BGP attribute from routers operated by
the PEERING testbed [B, C]. The attribute will have flags 0xe0
(optional and transitive, with the Partial bit also set [RFC 4271,
Section 4.3]), type 0xff (reserved for development), and a length of
0x20 (32 bytes, i.e., 256 bits).

Our collaborators recently ran an equivalent experiment with no
complaints or known issues [A], and so we do not anticipate any
arising. Back in 2010, an experiment by the RIPE NCC and Duke University
using an unassigned attribute caused disruption in Internet routing due to
a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
attributes have been assigned (BGPsec-path) and adopted (large
communities). We have successfully tested propagation of the
announcements on Cisco IOS-based routers running versions 12.2(33)SRA
and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
1.6.3.
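Propagation testing of this kind exercises the RFC 4271 rules for attributes a router does not recognize (Sections 5 and 6.3). The following Python sketch shows those rules under the assumption of a simple in-memory attribute type; `PathAttribute`, `handle_unrecognized`, and the exception name are hypothetical, and real implementations differ in many details.

```python
from dataclasses import dataclass, replace
from typing import Optional

FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20


@dataclass(frozen=True)
class PathAttribute:
    flags: int
    type_code: int
    value: bytes


class UnrecognizedWellKnownAttribute(Exception):
    """RFC 4271, Section 6.3: UPDATE Message Error, subcode 2 (NOTIFICATION)."""


def handle_unrecognized(attr: PathAttribute) -> Optional[PathAttribute]:
    """Return the attribute to pass along, or None to drop it quietly."""
    if not attr.flags & FLAG_OPTIONAL:
        # An unrecognized well-known attribute is an error.
        raise UnrecognizedWellKnownAttribute(attr.type_code)
    if not attr.flags & FLAG_TRANSITIVE:
        # Optional non-transitive: quietly ignore, do not propagate.
        return None
    # Optional transitive: propagate with the Partial bit set.
    return replace(attr, flags=attr.flags | FLAG_PARTIAL)


experimental = PathAttribute(0xE0, 0xFF, bytes(32))
print(hex(handle_unrecognized(experimental).flags))  # 0xe0
```

An attribute flagged 0xe0 therefore should be passed along unchanged by a speaker that does not know type 0xff.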

We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
predefined period of 15 minutes starting 14:30 GMT, from Monday to
Thursday, between the 7th and 22nd of January, 2019 (full schedule and
locations [E]). We will stop the experiment immediately in case any
issues arise.
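As a rough cross-check of the schedule as stated in prose, this Python sketch lists the 15-minute windows starting 14:30 on each Monday through Thursday between 7 and 22 January 2019. It assumes the end date is inclusive and treats GMT as UTC; the spreadsheet at [E] remains the authoritative schedule, and `announcement_windows` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta, timezone

START_DAY = date(2019, 1, 7)
END_DAY = date(2019, 1, 22)  # assumption: inclusive
WINDOW = timedelta(minutes=15)


def announcement_windows(start: date, end: date):
    """Yield (begin, end) UTC datetimes for each Monday-Thursday window."""
    day = start
    while day <= end:
        if day.weekday() <= 3:  # Monday=0 ... Thursday=3
            begin = datetime(day.year, day.month, day.day, 14, 30,
                             tzinfo=timezone.utc)
            yield begin, begin + WINDOW
        day += timedelta(days=1)


windows = list(announcement_windows(START_DAY, END_DAY))
for begin, finish in windows:
    print(begin.strftime("%a %d %b %H:%M"), "-", finish.strftime("%H:%M UTC"))
```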

Although we do not expect the experiment to cause disruption, we
welcome feedback on its safety and especially on how to make it safer.
We can be reached at disco-experiment@googlegroups.com.

Amir Herzberg, University of Connecticut
Ethan Katz-Bassett, Columbia University
Haya Shulman, Fraunhofer SIT
Ítalo Cunha, Universidade Federal de Minas Gerais
Michael Schapira, Hebrew University of Jerusalem
Tomas Hlavacek, Fraunhofer SIT
Yossi Gilad, MIT

[A] https://conferences.sigcomm.org/hotnets/2018/program.html
[B] http://peering.usc.edu
[C] https://goo.gl/AFR1Cn
[D] https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
[E] https://goo.gl/nJhmx1

Dear Italo,

Thanks for giving the community a heads-up on your plan! I think
announcements like these are the best anyone can do when trying out
legal but new BGP path attributes.

I'll forward this message to other NOGs and make sure that our NOC
adds it to their calendar.

Kind regards,

Job

NANOG,

We performed the first announcement in this experiment yesterday,
and, despite the announcement being compliant with BGP standards, FRR
routers reset their sessions upon receiving it. Upon noticing the
problem, we halted the experiments. The FRR developers confirmed that
this issue is an unintended consequence of how FRR handles attribute
type 0xFF (reserved for development), which we used. The FRR developers
have already merged a fix and notified users.

We plan to resume the experiments January 16th (next Wednesday), and
have updated the experiment schedule [A] accordingly. As always, we
welcome your feedback.

[A] https://goo.gl/nJhmx1

* cunha@dcc.ufmg.br (Italo Cunha) [Tue 08 Jan 2019, 17:42 CET]:

[A] https://goo.gl/nJhmx1

For the archives, since goo.gl will cease to exist soon, this links to
https://docs.google.com/spreadsheets/d/1U42-HCi3RzXkqVxd8e2yLdK9okFZl77tWZv13EsEzO0/htmlview

After seeing this initial result I'm wondering why the researchers couldn't set up their own sandbox first before breaking code on the internet. I believe FRR is a free download and comes with GNU autoconf.

  -- Niels.

There are a fair number of open source BGP implementations now. It would require additional effort to test all of them.

Tom

Hi Niels, we did run the experiment in a controlled environment with
different versions of Cisco, BIRD, and Quagga routers and observed no
issues. We did add FRR to the test suite yesterday for future tests.

Hey,

After seeing this initial result I'm wondering why the researchers
couldn't set up their own sandbox first before breaking code on the
internet. I believe FRR is a free download and comes with GNU autoconf.

We probably should avoid anything which might demotivate future good
guys from finding breaking bugs and reporting them while sending
perfectly standard-compliant messages. The only ones who will win are
the bad guys who collect libraries of how to break the internet.
There are certainly several transit packets of death and BGP parser
bugs in each implementation. I'd rather have a good guy trigger them
and tell me why my network broke than have a bad guy store them for
future use.

Perhaps you'd like to supply the researchers (and us) with a *complete*
list of all BGP-speaking software in use on the Internet? (Personally, I'd
never heard of FRR before)

The researchers didn't break code; their test unearthed broken code.

That code has now been fixed, so this is a good result.

Nick

* thomasammon@gmail.com (Tom Ammon) [Tue 08 Jan 2019, 17:59 CET]:

There are a fair number of open source BGP implementations now. It would require additional effort to test all of them.

In the real world, doing the correct thing is often harder than doing an incorrect thing, yes.

  -- Niels.

* valdis.kletnieks@vt.edu (valdis.kletnieks@vt.edu) [Tue 08 Jan 2019, 18:06 CET]:

(Personally, I'd never heard of FRR before)

Martin Winter of OSR/FRR has attended many NANOG, RIPE, and other industry meetings, so it's not for lack of trying.

  -- Niels.

After seeing this initial result I’m wondering why the researchers
couldn’t set up their own sandbox first before breaking code on the
internet. I believe FRR is a free download and comes with GNU autoconf.

There are a fair number of open source BGP implementations now. It would require additional effort to test all of them.

Not just every implementation, but also every version and every configuration permutation. This type of black-box testing is not scalable. It is not feasible work, nor is it the job of these researchers. It's the job of the software developer to ensure the product is standards-compliant.
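A quick way to see why this does not scale is to count the test cases. The Python sketch below crosses the six implementation/version pairs named in the original announcement with a set of on/off configuration knobs; the knob names are purely hypothetical, and real configuration spaces are far larger than four booleans.

```python
from itertools import product

# Implementation/version pairs named in the original announcement.
TESTED = [
    ("Cisco IOS", "12.2(33)SRA"), ("Cisco IOS", "15.3(1)S"),
    ("Quagga", "0.99.23.1"), ("Quagga", "1.1.1"),
    ("BIRD", "1.4.5"), ("BIRD", "1.6.3"),
]

# Hypothetical on/off knobs that might change how an UPDATE is handled.
KNOBS = ["knob_a", "knob_b", "knob_c", "knob_d"]


def test_matrix(builds, knobs):
    """Every build crossed with every on/off combination of the knobs."""
    for build, setting in product(builds,
                                  product([False, True], repeat=len(knobs))):
        yield build, dict(zip(knobs, setting))


cases = list(test_matrix(TESTED, KNOBS))
print(len(cases))  # 96 = 6 builds x 2**4 settings
```

Each extra boolean knob doubles the matrix, and each extra implementation or release adds another full copy of it.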

In the case of FRR:

  • improper use of the 0xFF codepoint
  • FRR is not compliant with RFC 7606 (the devs indicated they will be working on this)
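RFC 7606 replaces the RFC 4271 default (send a NOTIFICATION and reset the session on any UPDATE error) with less disruptive options. The Python sketch below names the four approaches RFC 7606 lists and applies a deliberately simplified policy for a malformed attribute; the two boolean inputs are hypothetical stand-ins for the per-attribute rules the RFC actually spells out. An unrecognized optional transitive attribute, as in this experiment, is not an error at all under RFC 4271 and should never reach this decision.

```python
from enum import Enum


class ErrorAction(Enum):
    # Approaches listed in RFC 7606, Section 2, most to least disruptive.
    SESSION_RESET = "session reset"
    AFI_SAFI_DISABLE = "AFI/SAFI disable"
    TREAT_AS_WITHDRAW = "treat-as-withdraw"
    ATTRIBUTE_DISCARD = "attribute discard"


def choose_action(nlri_parseable: bool,
                  affects_route_selection: bool) -> ErrorAction:
    """Simplified policy for a malformed path attribute (illustrative only)."""
    if not nlri_parseable:
        # Without the NLRI, the affected routes cannot be withdrawn reliably.
        return ErrorAction.SESSION_RESET
    if affects_route_selection:
        # Withdraw only the routes in this UPDATE; the session stays up.
        return ErrorAction.TREAT_AS_WITHDRAW
    # The attribute cannot change route selection: drop it, keep the routes.
    return ErrorAction.ATTRIBUTE_DISCARD


print(choose_action(nlri_parseable=True, affects_route_selection=True).value)
```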

Ultimately, the developers are responsible for their product, not random other internet users. This situation would have been avoided if standards had been followed.

I’m happy the FRR developers quickly identified the issue and published a fix. We can now all move on.

Kind regards,

Job

Yeah, I think it also gets complicated because some of us have our own internal BGP speakers as well. Taking MRT files from Route Views or RIPE RIS and replaying them is certainly helpful for simulating certain events. I’ve found a lot of interesting “new attribute” experiments back when I had a poorly written MRT parser that would choke periodically when something new hit the internet.
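One way to keep such a parser from choking on new attributes is to walk the path-attribute TLVs generically and keep unknown types as raw bytes. This Python sketch assumes the raw path-attributes field has already been extracted from an UPDATE (for example, from an MRT BGP4MP record); MRT framing itself is not handled, and `iter_path_attributes` and `summarize` are hypothetical names.

```python
from typing import Dict, Iterator, Tuple

EXTENDED_LENGTH = 0x10

# A few assigned attribute type codes; anything else is reported as unknown.
KNOWN = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 8: "COMMUNITY",
         32: "LARGE_COMMUNITY"}


def iter_path_attributes(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (flags, type_code, value); raise ValueError on truncation."""
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError("truncated attribute header")
        flags, type_code = data[pos], data[pos + 1]
        if flags & EXTENDED_LENGTH:
            # Extended Length bit set: 2-octet length field.
            if pos + 4 > len(data):
                raise ValueError("truncated extended length")
            length = int.from_bytes(data[pos + 2:pos + 4], "big")
            pos += 4
        else:
            length = data[pos + 2]
            pos += 3
        if pos + length > len(data):
            raise ValueError("attribute overruns buffer")
        yield flags, type_code, data[pos:pos + length]
        pos += length


def summarize(data: bytes) -> Dict[str, int]:
    """Count attributes by name, labelling unknown types UNKNOWN_<code>."""
    counts: Dict[str, int] = {}
    for _flags, type_code, _value in iter_path_attributes(data):
        name = KNOWN.get(type_code, f"UNKNOWN_{type_code}")
        counts[name] = counts.get(name, 0) + 1
    return counts


# ORIGIN=IGP followed by a 32-byte attribute of type 0xff (flags 0xe0).
blob = bytes([0x40, 0x01, 0x01, 0x00]) + bytes([0xE0, 0xFF, 0x20]) + bytes(32)
print(summarize(blob))
```

Malformed lengths still raise, so a caller can decide per record whether to skip it or stop, instead of crashing on every unfamiliar type code.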

(FRR is a descendant of the Zebra/Quagga world.)

- Jared

Hi Saku,

After seeing this initial result I'm wondering why the researchers couldn't set up their own sandbox first before breaking code on the internet. I believe FRR is a free download and comes with GNU autoconf.

We probably should avoid anything which might demotivate future good
guys from finding breaking bugs and reporting them while sending
perfectly standard-compliant messages. The only ones who will win are
the bad guys who collect libraries of how to break the internet.
There are certainly several transit packets of death and BGP parser
bugs in each implementation. I'd rather have a good guy trigger them
and tell me why my network broke than have a bad guy store them for
future use.

I fully agree with you. However, this doesn't give 'good guys' carte blanche to break stuff. I'm glad they've already taken action to improve their practices as confirmed by Italo Cunha in his earlier mail.

  -- Niels.

And other times you just get BGP as art

https://twitter.com/powerdns_bert/status/878291436034170881

- jared

8 Jan. 2019, 20:19 <niels=nanog@bakker.net>:

In the real world, doing the correct thing

— such as writing RFC compliant code —

is often harder than doing
an incorrect thing, yes.

Evidently, yes.

There is no such thing as a fully RFC-compliant BGP:

https://www.juniper.net/documentation/en_US/junos/topics/reference/standards/bgp.html does not list 7606.

Cisco Bug: CSCvf06327 - Error Handling for RFC 7606 not implemented for NXOS.

This is as of today and a 2-second Google search… anyone running code from before RFC 7606 (2015) would also not be compliant. I did not see Juniper on the list of BGP speakers tested.

We plan to resume the experiments January 16th (next Wednesday), and
have updated the experiment schedule [A] accordingly. As always, we
welcome your feedback.

i did not realize that frr updates propagated so quickly. very cool.

randy

I "grew up" during the early days of PPP. As a member of the press I
attended an "inter-op" session at Telebit's campus, and watched as a
collection of engineers and programmers matched up implementations of
PPP and found bugs in both the Proposed Standard and in the
implementations thereof.

Watching these guys with all sorts of data monitors trying to figure out
who goofed was an interesting and fascinating experience.

My stint with the Telecommunications Industry Association TR-30
committee, hashing out modem standards like V.32 et al. and V.25 ter,
was a similar exercise -- one that led to my being in a near fight in a
parking lot in San Jose with a Microsoft engineer over clarity problems
with the proposed standard for a side-channel protocol. "Can you do
better?" "Yes." "Prove it." And I did. My proposal was accepted by
all, even the Microsoft guy.

(We continued to collaborate until he cashed out of the company.)

Steve Noble
Sent: Tuesday, January 8, 2019 6:42 PM

There is no such thing as a fully RFC compliant BGP :

Which RFCs do you mean when you say fully RFC-compliant BGP, please: 6286, 6608, 6793, 7606, 7607, 7705, or 8212?

https://www.juniper.net/documentation/en_US/junos/topics/reference/standards/bgp.html does not list 7606

Cisco Bug: CSCvf06327 - Error Handling for RFC 7606 not implemented for
NXOS

This is as of today and a 2 second google search.. anyone running code from
before RFC 7606 (2015) would also not be compliant.

With regard to RFC 7606 (Revised Error Handling for BGP UPDATE Messages),
my recollection is that there was a very long discussion, with working code, preceding the various drafts as well as the final RFC.
Regarding the Juniper case specifically a bit of googling reveals that:
All Junos software releases built on or after 2009-06-29 have been enhanced to be more tolerant of malformed optional, transitive attributes. Releases containing the coding change specifically include: 9.1S2, 9.3R3, 9.6R1 and all subsequent releases (i.e. all releases built after 9.6R1).
So it's not quite black and white: there are levels of protection available in current releases (albeit not fully RFC-compliant per se).
The question is whether folks out there actually have it enabled.
Oh, and then there are bugs associated with the new feature, like the one in some versions of Junos which, upon receiving a malformed update, won't bring the session down but rather the whole rpd if the bgp-error-tolerance feature is enabled.

adam