do not filter your customers

I think if we asked telstra why they didn't filter their customer we'd
get some answer like:
1) we did, we goofed, oops!
2) we don't, it's too hard
3) filters? what?

I suspect in the case of 1 it's a software problem that needs more
belts/suspenders
I suspect in the case of 2 it's a problem that could be shown to be
simpler with some resource-certification in place
I suspect 3 is not likely... (or I hope so).

So, even without defining what a leak is, providing a tool to better
create/verify filtering would be a boon.

Yes, I agree!
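
As a rough illustration of the create/verify tooling mentioned above, here is a minimal sketch. It assumes the customer's registered routes are already available as a plain list of prefixes; the function names, the sample prefixes and the /24 length cap are made up for the example and are not taken from any existing tool.

```python
import ipaddress

def build_filter(registered):
    """Turn registered prefix strings into network objects."""
    return [ipaddress.ip_network(p) for p in registered]

def permitted(announcement, allowed, max_len=24):
    """Accept an announcement only if it sits inside a registered prefix
    and is no more specific than max_len (an assumed IPv4 policy knob)."""
    net = ipaddress.ip_network(announcement)
    if net.prefixlen > max_len:
        return False
    # Compare versions first: subnet_of() raises TypeError on a v4/v6 mix.
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)

def rejected(received, allowed):
    """Verification pass: list what the filter would drop from a feed."""
    return [r for r in received if not permitted(r, allowed)]

# Hypothetical customer registration and a feed to check against it.
allowed = build_filter(["10.0.0.0/16", "192.0.2.0/24"])
print(rejected(["10.0.1.0/24", "203.0.113.0/24"], allowed))
```

Running the verification pass against a saved copy of what the customer actually sent is one cheap way to get the extra belts/suspenders for case 1.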

What I'd hate to see is:

4) We fully deployed BGPSEC, and RPKI, and upgraded our
infrastructure, and retooled provisioning, operations and processes
to support it all fully, and required our customers and peers to use it,
and even then this still happened - WTF was the point?

This "leak" thing is a key vulnerability that simply can't be brushed
aside - that's the crux of my frustration with the current effort.

-danny

What I'd hate to see is:

4) We fully deployed BGPSEC, and RPKI, and upgraded our
infrastructure, and retooled provisioning, operations and processes
to support it all fully, and required our customers and peers to use it,
and even then this still happened - WTF was the point?

I think this is the point:
<https://twitter.com/#!/atoonk/status/165245731429564416>

This "leak" thing is a key vulnerability that simply can't be brushed
aside - that's the crux of my frustration with the current effort.

You seem to think that there's some extension/modification to BGPSEC
that would fix route leaks in addition to the ASPATH issues that
BGPSEC addresses right now. Have you written this up anywhere? I
would be interested to read it.

--Richard

I don't, actually -- as I haven't presupposed that "BGPSEC" is the
answer to all things routing security related, nor have I excluded it.

I didn't realize it was unacceptable to acknowledge a problem exists
without having solved it already. I might have that backwards though.

-danny

Steve,

the problem is that you have yet to rigorously define it and how to
unambiguously and rigorously detect it. lack of that will prevent
anyone from helping you prevent it.

You referred to this incident as a "leak" in your message:

"a customer leaked a full table"

I was simply agreeing with you -- i.e., looked like a "leak", smelled
like a "leak" - let's call it a leak.

I'm optimistic that all the good folks focusing on this in their day
jobs, and expressly funded and resourced to do so, will eventually
recognize what I'm calling "leaks" is part of the routing security
problem.

Sure; I don't disagree, and I don't think that Randy does. But just
because we can't solve the whole problem, does that mean we shouldn't
solve any of it?

Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.

As has been discussed in the SIDR WG, BGPSEC will _increase_ state in BGP, (more DRAM needed in PE's and RR's, crypto processors to verify sigs, more UPDATE traffic for beaconing). And, at the end of the day, ISP's are going to go to their customers and say to them:
- BGP convergence may be slower than in the past, because we're shipping sigs around in BGP now
- we can prevent a malicious attack from a random third-party (in the right part of the topology);
- *but* I can't protect you from a 20+ year old problem of a transit customer accidentally -or- maliciously stealing/dropping your traffic if they leak routes from one provider to another provider?

As Randy said, we can't even try for a strong technical solution
until we have a definition that's better than "I know it when I see it".

The first step is admitting that we have a problem, then discussing it collectively to try to determine a way to prevent said problem from happening.

-shane

I don't think anyone's ignoring the problem... I think lots of people
have said an equivalent of:
1) "How do I know that this path: A - B - C - D
  is a 'leak'?"

Followed by:
2) "Tell me how to answer this programmatically given the data we have
today in the routing system" (bgp data on the wire, IRR data, RIR
data)

so far ... both of the above questions haven't been answered (well 1
was answered with: "I will know it when i see it" which isn't helpful
at all in finding a solution)

-chris

In a message written on Fri, Feb 24, 2012 at 01:04:20PM -0700, Shane Amante wrote:

Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.

Not all "leaks" are bad.

I remember when there was that undersea landslide in Asia that took
out a bunch of undersea cables. Various providers quickly did
mutual transit and other arrangements to route around the problem,
getting a number of things back up quite quickly. These did not
match IRR records though, and likely would not have matched BGPSEC
information, at least not initially.

There are plenty of cases where someone "leaks" more specifics with
NO_EXPORT to only one of their BGP peers for the purposes of TE.

The challenge of securing BGP isn't crypto, and it isn't enough
ram/cpu/whatever to process it. The challenge is getting a crypto
scheme that operators can use to easily represent the real world.
It turns out the real world is quite messy though, often full of
temporary hacks, unusual relationships and other issues.

I'm sure it will be solved, one day.

well.... for bgpsec so if the paths were signed, and origins signed,
why would they NOT pass BGPSEC muster?

I can see that if the IRR data didn't match up sanely
prefix-lists/filters would need some cajoling, but that likely
happened anyway in this case.

-chris

In a message written on Fri, Feb 24, 2012 at 04:07:28PM -0500, Christopher Morrow wrote:

well.... for bgpsec so if the paths were signed, and origins signed,
why would they NOT pass BGPSEC muster?

I honestly have trouble keeping the BGP security work straight.
There is work to secure the sessions, work to authenticate route
origin, work to authenticate the AS-Path, the peer relationships,
and so on.

I believe BGPSEC authenticates the AS-Path, and thus turning up a
new peer requires them to each sign each others "path object".

During the time period between when the route propagates and the
signature propagates these routes appear to be a leak. I don't
believe the signature data is moved via BGP. Worse, in this case,
imagine if one of the parties was "cut off" from the signature
distribution system. They would need to bring up their (non-validating)
routes to reach the signature distribution system before their
routes would be accepted!

In fact, this happens today with those who strict IRR filter. Try
getting a block from ARIN, and then service from a provider who
only uses IRR filters. The answer is to go to some other already
up and working network to submit your IRR data to the IRR server,
before your network can come up and be accepted!

On a new turn up for an end-user, not a big deal. When you look at the
problems that might occur in the face of natural or man made disasters
though, like the cable cut, it could result in outages that could have
been fixed in minutes with a non-validating system taking hours to fix in
a validating one.

That may be an acceptable trade off to get security; but it depends on
exactly what the trade off ends up being. To date, I personally have
found "insecure" BGP, even with the occasional leaks, to be a better
overall solution.

In a message written on Fri, Feb 24, 2012 at 04:07:28PM -0500, Christopher Morrow wrote:

well.... for bgpsec so if the paths were signed, and origins signed,
why would they NOT pass BGPSEC muster?

I honestly have trouble keeping the BGP security work straight.

yes

There is work to secure the sessions, work to authenticate route
origin, work to authenticate the AS-Path, the peer relationships,
and so on.

I believe BGPSEC authenticates the AS-Path, and thus turning up a
new peer requires them to each sign each others "path object".

well currently it doesn't do anything (really) but the PLAN is that
you'd be able to look at the origin, view some transitive
community/attribute and say: "That validates with the roa data" - some
cert-check/hash-check/etc.

then later on you'd be able to say for each AS in the ASPATH:
  "Yes, the route is signed by AS1, the signature validates. Yes the
route is signed by AS2, the signature validates (wash/rinse/repeat for
the whole path)"
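
A toy version of that wash/rinse/repeat loop, to make the idea concrete. This is not the BGPSEC wire format or its crypto: HMAC with made-up per-AS secrets stands in for per-AS public-key signatures so the sketch stays stdlib-only, and the AS numbers, keys and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-AS secrets (stand-ins for real per-AS signing keys).
KEYS = {64496: b"key-a", 64497: b"key-b", 64498: b"key-c"}

def sign_hop(prefix, path_so_far, prev_sig, asn):
    """Sign the prefix, the path up to this AS, and the previous signature."""
    text = prefix + "|" + " ".join(str(a) for a in path_so_far) + "|"
    return hmac.new(KEYS[asn], text.encode() + prev_sig, hashlib.sha256).digest()

def announce(prefix, path):
    """path is in propagation order, origin first. Returns one sig per AS."""
    sigs, prev = [], b""
    for i, asn in enumerate(path):
        prev = sign_hop(prefix, path[: i + 1], prev, asn)
        sigs.append(prev)
    return sigs

def validate(prefix, path, sigs):
    """Check every hop in turn; any mismatch fails the whole path."""
    if len(sigs) != len(path):
        return False
    prev = b""
    for i, asn in enumerate(path):
        if asn not in KEYS:
            return False
        expected = sign_hop(prefix, path[: i + 1], prev, asn)
        if not hmac.compare_digest(expected, sigs[i]):
            return False
        prev = sigs[i]
    return True

path = [64496, 64497, 64498]
sigs = announce("192.0.2.0/24", path)
print(validate("192.0.2.0/24", path, sigs))
```

Note what this does and does not check: a path with an AS removed or a different prefix fails, but a path that every AS really did propagate validates regardless of whether any of them meant to send it that way.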

During the time period between when the route propagates and the
signature propagates these routes appear to be a leak. I don't

signatures follow inside the announcement as currently draft-spec'd.

believe the signature data is moved via BGP. Worse, in this case,
imagine if one of the parties was "cut off" from the signature
distribution system. They would need to bring up their (non-validating)
routes to reach the signature distribution system before their
routes would be accepted!

the sig data for an NLRI follows along inside the announcement.
the cache of data is probably updated inside of a day... there's
likely some skew, but provided the origins don't change and no one has
to emergency release new key materials, I think it's not important for
this discussion.

you simply start hearing routes with same origin as previously on
different paths. "new customers" essentially pop up en masse. This
isn't a problem as long as the customers are the same origin-as as
before... it'd mean some rejiggering of prefix-lists (as I said
before) but ... you'd be doing that anyway.

In fact, this happens today with those who strict IRR filter. Try
getting a block from ARIN, and then service from a provider who
only uses IRR filters. The answer is to go to some other already
up and working network to submit your IRR data to the IRR server,
before your network can come up and be accepted!

right, there's some lag between publication and acceptance/update. I
think in the case of (for example) L(3) the lag is ~6hrs in the worst
case.

On a new turn up for an end-user, not a big deal. When you look at the
problems that might occur in the face of natural or man made disasters
though, like the cable cut, it could result in outages that could have
been fixed in minutes with a non-validating system taking hours to fix in
a validating one.

I don't think that's really the case, but walking through the
processes/requirements seems like a sane thing to do.

That may be an acceptable trade off to get security; but it depends on
exactly what the trade off ends up being. To date, I personally have
found "insecure" BGP, even with the occasional leaks, to be a better
overall solution.

how's that chinese leak of F-root doing for you? :)

-chris

I repeat -- we're in violent agreement that route leaks are
a serious problem. No one involved in BGPSEC -- not me, not Randy,
not anyone -- disagrees. Give us an actionable definition and
we'll try to build a defense. Right now, we have nothing better
than what Justice Potter Stewart once said in an opinion: "I shall
not today attempt further to define the kinds of material I
understand to be embraced within that shorthand description
["hard-core pornography"]; and perhaps I could never succeed
in intelligibly doing so. But I know it when I see it..."

Again -- *please* give us a definition.

    --Steve Bellovin, https://www.cs.columbia.edu/~smb

P.S. It was routing problems, including leaks between RIP and either
EIGRP or OSPF (it's been >20 years; I just don't remember), that got
me involved in Internet security in the first place. I really do
understand the issue.

I can think of a way to do it but it would require some trust and it would require that people actually *used* it. What one would do is feed the routes they are proposing to send to a BGP peer to a RIR front-end. The receiving peer would "sign off" on the proposal and the routes would be then entered into the RIR. That is the step that is currently missing. Anyone can enter practically anything into an RIR and the receiving side never gets to "sanity check" the information before it actually gets written to the database. Once you have this base of information, route filtration generated from the database becomes more reliable.

In fact, a network might have several "canned" profiles of different route packages registered in the front end. A "transit" package, a "customer routes" package and maybe some specialized packages for peering at various private/public exchange points. If you pick up a new peer at a transit point, you select the package for that point, it proposes that to the peer, peer approves it, and they can both generate their route filters from that information.

It could even highlight some glaring errors automatically to spot what might be a typo or even attempted nefarious activity. The receiver of a proposed change might be alerted to the fact that the new route(s) being offered are inconsistent with the database information (routes already being sourced by an AS that the proposed sender is not peering with) which could be overridden by the receiver (or just ignored) but having something show up in some way that highlights a possible inconsistency might generate a closer look at that proposal and head off problems later.

But the fundamental problem is that the current system is "open loop".
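
A small sketch of what closing that loop could look like, purely as an illustration of the propose / sign-off / generate-filter flow described above. No RIR offers this interface as far as this thread says; the class, method names and sample data are all hypothetical.

```python
class Registry:
    """Toy front-end: routes only become filter input after the receiver approves."""

    def __init__(self):
        self.pending = {}    # (sender, receiver) -> {prefix: origin AS}
        self.approved = {}   # (sender, receiver) -> {prefix: origin AS}
        self.origins = {}    # prefix -> origin AS from earlier approvals

    def propose(self, sender, receiver, routes):
        """Queue a route package and return prefixes that look inconsistent
        with what is already recorded (possible typo or something worse)."""
        warnings = sorted(
            p for p, origin in routes.items()
            if p in self.origins and self.origins[p] != origin
        )
        self.pending[(sender, receiver)] = dict(routes)
        return warnings

    def approve(self, sender, receiver):
        """Receiver signs off; only now does the package enter the database."""
        routes = self.pending.pop((sender, receiver))
        self.approved[(sender, receiver)] = routes
        self.origins.update(routes)

    def filter_for(self, sender, receiver):
        """Prefix list both sides can generate from the approved data."""
        return sorted(self.approved.get((sender, receiver), {}))


reg = Registry()
reg.propose("AS64496", "AS64500", {"192.0.2.0/24": 64496})
reg.approve("AS64496", "AS64500")
print(reg.filter_for("AS64496", "AS64500"))
```

The warning list is advisory, matching the idea that the receiver can override or ignore it; the point is only that something flags the inconsistency before the data is written.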

Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.

I don't think anyone's ignoring the problem... I think lots of people
have said an equivalent of:
1) "How do I know that this path: A - B - C - D
is a 'leak'?"

If you are receiving a path of the form (A B C D), and the origination of the prefix at D is good, then the only way you can figure out that this is a leak, as compared to the intentional operation of BGP, is not by looking at the operation of the protocol per se, but by looking at the routing policy intentions of A, B, C and D and working out if what you are seeing is intentional within the scope of the routing policies of these entities. RPSL is one such approach to describing such policy in a manner that lets one perform some basic computation over the data.

It exposes a broader issue here about the difference between routing intent and protocol correctness. From the perspective of protocol correctness, regardless of whether the information was intended to be propagated, a protocol correctness tool should be able to tell you that the information has been faithfully propagated, but cannot tell you whether such propagation was intentional or not.
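
To show how much the answer depends on policy data rather than on the protocol, here is a sketch of one commonly cited heuristic, the "valley-free" rule: once a route has been passed to a peer or a customer, it should not be passed to a provider or another peer. This is not a definition anyone in this thread has agreed on, and it assumes you already know the business relationship of every adjacent pair, which is exactly the information that is missing today. The relationship table and names are hypothetical.

```python
def is_valley_free(as_path, rel):
    """as_path is as received: nearest AS first, origin last.
    rel[(x, y)] says what y is to x: "provider", "peer" or "customer"."""
    hops = list(reversed(as_path))  # propagation order, origin first
    descending = False  # True once the route crossed a peer or went downhill
    for sender, receiver in zip(hops, hops[1:]):
        kind = rel[(sender, receiver)]
        if descending and kind in ("provider", "peer"):
            return False  # route climbed back up: looks like a leak
        if kind in ("peer", "customer"):
            descending = True
    return True

# C buys transit from both D and B and re-announces D's route to B.
leaky = {("D", "C"): "customer", ("C", "B"): "provider", ("B", "A"): "peer"}
print(is_valley_free(["A", "B", "C", "D"], leaky))
```

With a different relationship table the identical path (A B C D) passes, so nothing on the wire distinguishes the two cases. Arrangements such as emergency mutual transit would also need their own entries in the table, or they would be flagged.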

Followed by:
2) "Tell me how to answer this programmatically given the data we have
today in the routing system" (bgp data on the wire, IRR data, RIR
data)

I wish.

so far ... both of the above questions haven't been answered (well 1
was answered with: "I will know it when i see it" which isn't helpful
at all in finding a solution)

Some longstanding problems are longstanding because we have not quite managed to apply the appropriate analytical approach to the problem. Others are longstanding problems because they are damn difficult and this makes me wonder if we really understand the nature of the space we are working in. For example, if you think about routing not as a topology and reachability tool, but as a distributed algorithm to solve a set of simultaneous equations (policies), would that provide a different insight as to the way in which routing policies and routing protocols interact?

Geoff

Solving for route leaks is /the/ "killer app" for BGPSEC. I can't
understand why people keep ignoring this.

I'd be interested to hear your opinions on exactly how rpki in its current
implementation would have prevented the optus/telstra problem. Could you
elaborate?

Here's a quote from draft-ietf-sidr-origin-ops:

   As the BGP origin AS of an update is not signed, origin validation is
   open to malicious spoofing. Therefore, RPKI-based origin validation
   is designed to deal only with inadvertent mis-advertisement.

   Origin validation does not address the problem of AS-Path validation.
   Therefore paths are open to manipulation, either malicious or
   accidental.

An optus/telstra style problem might have been mitigated by an rpki based
full path validation mechanism, but we don't have path validation. Right
now, we only have a draft of a list of must-have features -
draft-ietf-sidr-bgpsec-reqs. This is only the first step towards designing
a functional protocol, not to mind having running code.
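
For reference, origin validation as described in the quoted text boils down to something like the sketch below. The ROA entries are invented, and the three outcomes follow the usual valid / invalid / not-found classification; treat the details as an illustration rather than as the draft's exact procedure.

```python
import ipaddress

# Hypothetical ROA data: (prefix, max length, authorized origin AS)
ROAS = [
    ("192.0.2.0/24", 24, 64500),
    ("10.0.0.0/16", 24, 64501),
]

def validate_origin(prefix, origin_as, roas=ROAS):
    """Classify an announcement by its origin AS only; the path is ignored."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if asn == origin_as and net.prefixlen <= max_len:
            return "valid"
    return "invalid" if covered else "not-found"

print(validate_origin("192.0.2.0/24", 64500))
print(validate_origin("192.0.2.0/24", 64511))
print(validate_origin("203.0.113.0/24", 64500))
```

A leaked route keeps its original, correct origin AS, so a check of this kind returns "valid" for it no matter which way it was re-announced.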

Nick

... and, if you create a top-down control mechanism to be superimposed upon
the current fully distributed control mechanism, you will soon find that
politicians and regulators will take a very keen interest in BGP once they
realise that they can turn off specific prefixes from a single point.

Whatever about temporary hacks and unusual relationships, the entropy
introduced by layers 9 through 12 is almost always insufferable.

Nick

I'm optimistic that all the good folks focusing on this in their day
jobs, and expressly funded and resourced to do so, will eventually
recognize what I'm calling "leaks" is part of the routing security
problem.

Sure; I don't disagree, and I don't think that Randy does. But just
because we can't solve the whole problem, does that mean we shouldn't
solve any of it?

is it a *security* problem? it is a violation of business intent. and
one we would like to solve. but it is not clear to me that 'leaks' are
really a security issue.

randy

Solving for route leaks is /the/ "killer app" for BGPSEC.

as would be solving world hunger, war, bad cooking, especially bad
cooking.

route leaks, as much as i understand them
  o are indeed bad ops issues
  o are not security per se
  o are a violation of business relationships
  o and 20 years of fighting them have not given us any significant
    increase in understanding, formal definition, or prevention.

i would love to see progress on the route leak problem. i do not
confuddle it with security.

randy

1. Make your customers register routes, then filter them.
     (may be time for big providers to put routing tools into
     open source for the good of the community - make it
     less hard?)

2. Implement the "1-hop" hack to protect your BGP peering.

98% of problem solved on the Internet today

3. Implement a "# of routes-type" filter to make your peers
     (and transit customers) phone you if they really do want
     to add 500,000 routes to your session ( or the wrong set
     of YouTube routes...).

99.9% of problem solved.

4. Implement BGP-Sec

99.91% of "this" problem solved.

Because #1 is 'just too hard' and because #4 is just too sexy
as an academic pursuit we all suffer the consequences. It's
a shame that tier one peering agreements didn't evolve with
a 'filter your customers' clause (aka do the right thing) as well
as a 'like for like' (similar investments) clause in them.

I'm not downplaying the BGP-SEC work, I think it's valid and
may one day save us from some smart bunny who wants to
make a name for himself by bringing the Internet to a halt. I
don't believe that's what we're battling here. We're battling the
operational cost of doing the right thing with the toolset we have
versus waiting for a utopian solution (foolproof and free) that may
never come.

jy

ps. my personal view.

Availability is a key aspect of security - the most important one, in many cases/contexts. The availability of the control plane itself (i.e., being stable/resilient enough to continue doing its job even under various forms of duress) as well as the availability of the information about paths it propagates in order to allow the routing of transit traffic both fall squarely within the rubric of security, IMHO.

The disruption of transit traffic routing often caused by route leaks, as in this particular case, has a negative impact on the overall availability of affected networks/endpoints/applications/services/data. However, route leaks are only one potential cause of such hits to availability - and while there are several BCPs which can and should be adopted in order to protect against control-plane disruption, they are in many cases honored more in the breach than in the observance due to complexity, opex (as is the case with many - some would say most - security-related BCPs), and so forth.

The single best thing which could be done to improve the stability/resiliency of the control-plane on IP networks in general would be to change the nature of the control-plane (not just BGP, but the IGPs, as well) from in-band to out-of-band, IMHO. I know this will probably never happen, but wanted to be sure that the point was made in relation to this specific topic for the sake of completeness, if nothing else.

1. Make your customers register routes, then filter them.
(may be time for big providers to put routing tools into
open source for the good of the community - make it
less hard?)

not a big provider, but ras@e-gerbil did release irr-tools no?

2. Implement the "1-hop" hack to protect your BGP peering.

98% of problem solved on the Internet today

which problem? GTSM only protects your actual bgp session, not the
content of the session(s) or the content across the larger network.

3. Implement a "# of routes-type" filter to make your peers
(and transit customers) phone you if they really do want
to add 500,000 routes to your session ( or the wrong set
of YouTube routes...).

max-prefix already exists... sometimes it works, sometimes it's a
burden. It doesn't tell you anything about the content of the session
though (the YT routes example doesn't actually work that way)
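
A bare-bones model of what a max-prefix limit does, to make the distinction concrete. The class and its behavior on overflow are a simplified assumption for illustration, not any vendor's implementation.

```python
class Session:
    """Toy BGP session that tears itself down past a prefix-count limit."""

    def __init__(self, max_prefix):
        self.max_prefix = max_prefix
        self.prefixes = set()
        self.up = True

    def receive(self, prefix):
        """Return True if the route was accepted, False otherwise."""
        if not self.up:
            return False
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefix:
            # Limit exceeded: drop the session and everything learned on it.
            self.up = False
            self.prefixes.clear()
            return False
        return True


s = Session(max_prefix=2)
for p in ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]:
    print(p, s.receive(p))
```

The limit only counts. A single wrong prefix that stays under the threshold is accepted without complaint, which is why a handful of hijacked routes gets through while a full-table leak trips it.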

99.9% of problem solved.

? not sure about that number

4. Implement BGP-Sec

99.91% of "this" problem solved.

Because #1 is 'just too hard' and because #4 is just too sexy
as an academic pursuit we all suffer the consequences. It's

there are folks working on the #4 problem, not academics even. It's
not been particularly sexy though :(

a shame that tier one peering agreements didn't evolve with
a 'filter your customers' clause (aka do the right thing) as well
as a 'like for like' (similar investments) clause in them.

I'm missing something here... it's not clear to me that 'tier1'
providers matter a whole lot in the discussion. Many of them have
spoken up saying: "Figuring out the downstream matrix in order to put
a prefix-list on my SFP peer is not trivial, and probably not workable
on gear today." (shane I think has even said this here...)

I'm not downplaying the BGP-SEC work, I think it's valid and
may one day save us from some smart bunny who wants to
make a name for himself by bringing the Internet to a halt. I
don't believe that's what we're battling here. We're battling the
operational cost of doing the right thing with the toolset we have

right, so today you have to do a lot of math/work to figure out if
your customer's prefixes are hers, and if they should be permitted
into your RIB. Tomorrow you COULD get a better end result with less
work and more assurance given a populated resource certification
system.

Extending some into the land of BGPSEC you COULD also know that the
route you hear originated from the correct ASN and later you'd be able
to tell that the path the route traveled was the same as the ASPATH in
the route...

-chris