ROVER routing security - it's not enumeration

Hi,

Just wanted to clarify a few things about the ROVER approach. One key misunderstanding seems to
be that ROVER is an approach for enumerating all potentially valid routes. This is not the case.
Slides from the ROVER talk at NANOG 55 have been posted, and there was an additional Lightning
Talk on Monday at NANOG.

The main misunderstandings are summarized and addressed below:

Summarizing a few other things other people have mentioned:

- The normal operating mode with RPKI is to fetch everything rather
  than do a point query. We've spent the last decade or so making
  that harder to do with DNS (blocking AXFR/IXFR, using NSEC3 instead
  of NSEC, etc). This makes it fairly difficult to know in advance
  what queries one should be asking ROVER (as Paul Vixie puts it,
  ROVER isn't a catalogue). When I pressed the ROVER folks about this
  at the Paris IETF meeting, they mumbled something about maybe
  walking the IRR or other external databases as a way of knowing what
  DNS queries to issue.

ROVER's operational model is to ask a question and get an answer. ROVER is not
an enumeration method. RPKI does provide enumeration, but ROVER is not trying to
duplicate RPKI.

I think the first step is to step back and ask whether every operational model needs
enumeration. For example, the talk yesterday by Level3 used the DNS and IRR and
did not need such an enumeration. Enumeration is not a goal in itself.
There are a number of operational models that provide the needed routing protection
without enumeration.

- Circular dependencies are a problem. Helical dependencies can be
  made to work, but this says that one probably should not be
  depending on routing to make a point query to make decisions about
  routing. If you look at the architecture of the existing RPKI
  validators (well, mine and BBN's, anyway, not sure about RIPE's but
  suspect they took the same approach), we've gone to some trouble to
  make sure that the validator will continue to work across network
  outages as long as the collected data haven't expired or been
  revoked. In theory one could do the same thing with bulk transfers
  of DNS (whether AXFR/IXFR or NSEC walking, if they worked) but it
  would not work well with point queries.

Or a simpler approach that requires neither bulk zone transfers nor zone walking:
plain DNS caching, which already exists and is well understood.

More broadly, whether one calls it a cache or an RPKI validator or whatever, you
can build it with redundancy. One can certainly make either system work across
network outages.

- ROVER gives us no traction on path validation (BGPSEC), it's limited
  to origin validation. RPKI can certify both prefixes and ASNs,
  which gives it the basics needed to support path validation as well
  as origin validation. ASNs have no hierarchical structure, thus
  would be a very poor match for encoding as DNS names.

The focus is on origin and sub-prefix hijacks. There are certainly discussions and
early experiments with future additions, but the work is focused on origin/sub-prefix
events.

- Some of the DNS aspects of ROVER are a little strange. In
  particular, as currently specified ROVER requires the relying party
  to pay attention to DNS zone cuts, which is not normal in DNS (the
  basic DNS model since RFC 883 has been that zones are something for
  the zone administrator to worry about, resolvers mostly just see a
  tree of RRsets). ROVER requires the relying party to check for the
  same data in multiple zones and pay close attention to zone cuts.
  While it is certainly possible to do all this, it is not a matter of
  issuing a simple DNS query and you're done. DNS caching effects can
  also complicate matters here if the zone structure is changing:
  think about what happens if you have cached responses to some (but
  not all) of the queries you need to make to figure out whether to
  allow a more specific route punched out of a larger prefix block.

This is a misunderstanding of the ROVER approach.
Multiple copies of the data do not exist in multiple zones. There is a one-to-one mapping
between a prefix and a DNS name. The resolver simply finds the data and has no need to
understand where zone cuts occur.

On the other hand, DNS administrators do care about how they make zone cuts and delegate to
their customers. They can take a /16 and delegate two /17's, or they can manage the whole thing
in a single zone. Their choice.

A resolver simply issues a query for the unique DNS name associated with a prefix. This could be
done with anything from a complex tool set to a simple command-line tool like dig.
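
For octet-aligned prefixes the name is just the reversed octets under in-addr.arpa;
non-aligned prefixes need the label encoding from the ROVER naming draft, which this
sketch skips. A rough illustration in Python (dnspython for the lookup; TXT stands in
for the actual ROVER record type, and the helper names are ours, not ROVER's):

    # Sketch: map an octet-aligned IPv4 prefix to its reverse-DNS name and
    # issue a point query for it.  TXT is a stand-in for the ROVER record
    # type; non-octet-aligned prefixes need the draft's label encoding.
    import dns.resolver

    def prefix_to_name(prefix):
        """'10.45.0.0/16' -> '45.10.in-addr.arpa.' (octet-aligned only)."""
        addr, plen = prefix.split('/')
        plen = int(plen)
        assert plen % 8 == 0, "non-aligned prefixes need the draft encoding"
        octets = addr.split('.')[:plen // 8]
        return '.'.join(reversed(octets)) + '.in-addr.arpa.'

    def query_route_data(prefix):
        name = prefix_to_name(prefix)
        try:
            return [r.to_text() for r in dns.resolver.resolve(name, 'TXT')]
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None    # no routing data at this name; see below

    print(prefix_to_name('10.45.0.0/16'))    # 45.10.in-addr.arpa.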

The confusion here may arise from what happens if you get an *authenticated* response
saying there is no routing data at this name. This could mean 1) the prefix should not be announced
or 2) the reverse DNS happens to be signed with DNSSEC but the site is not participating in
routing security via DNS.

To determine which, you issue a second query: is an RLOCK record present along with the DNSKEY
used to sign the data? The existence of an RLOCK proves participation.
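
In rough pseudocode, the decision procedure looks like this (a sketch; route_data()
and rlock_exists() are hypothetical helpers wrapping DNSSEC-validated queries):

    # Sketch of the two-query logic above.  Both answers are assumed to be
    # DNSSEC-authenticated; helper names are hypothetical, not from ROVER.
    def origin_check(prefix):
        data = route_data(prefix)        # query 1: routing data at the name
        if data is not None:
            return ('validate the announcement against', data)
        # Authenticated denial: tell "do not announce" apart from
        # "not participating" with a second query for the RLOCK.
        if rlock_exists(prefix):
            return ('participating; prefix should not be announced', None)
        return ('not participating; no statement either way', None)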
    

- The reuse of existing infrastructure argument for ROVER is somewhat
  disingenuous -- it's only partial reuse of existing infrastructure.
  ROVER's new encoding of prefixes as DNS names means that a lot of
  new stuff would need to be deployed, and attempting to be backwards
  compatible with the existing DNS reverse tree adds some complexity
  to ROVER's architecture.

I strongly disagree with this. ROVER does use a naming convention.

This is simply a convention, not a protocol change. The best analogy here is
that one may have an internal naming convention for naming routers or particular
servers or so forth. You should follow this convention and build this into your
provisioning scripts where appropriate.

Clearly it is enormously better if there is a consistent way to name prefixes, so
that we have a common convention for naming the data. Everyone putting data in
uses the convention, and we are working to get the convention standardized.
The convention is also useful for storing other data at prefixes; geolocation is one example.

(conflicting data for same prefix can appear
  in multiple zones, relying party has to sort this out, yum).

Again, this is simply a naming convention. Each prefix has a unique name, and to
DNS that is a name like any other: it belongs to exactly one zone and cannot
appear in multiple zones.

ROVER is not trying to do exactly what RPKI is doing. Much of this seems to be an
attempt to build a form of enumeration into ROVER. See the Level3 NANOG talk
from Monday (6/4/12) for a concrete example of a different model. There are many different
operational models. We seek a common convention for data publishing, but believe
strongly there can and should be different operational models for how you do
validation in your network.

Thanks,
Dan and Joe

One correction below.

[--snip--]

I think the first step is to step back and ask whether every operational model needs
enumeration. For example, the talk yesterday by Level3 used the DNS and IRR and
did not need such an enumeration.

To clarify the above, the IRR _does_ provide an enumerated list of "Candidate" (IP prefix + Origin_AS) pairs. The second step is to walk through those "Candidate" pairs and ask DNSSEC, in a question/answer process, to validate whether the "Candidate" IRR (IP prefix, Origin_AS) pairs are authentic or not. So, considering each step independently: the first (IRR data) is enumeration, the second is not. However, in the context of this specific operational model, the end result is an enumerated list of validated (IP Prefix, Origin_AS) pairs.

-shane

did not need such an enumeration. Enumeration is not a goal in itself.
There are a number of operational models that provide the needed routing protection
without enumeration.

which are?

I can see a use-case for something like:
  "Build me a prefix list from the RIR data"

which is essentially:
  1) pull IRR data for customer-X
  2) validate all entries with 'resource certification' data
  3) deploy new filter to edge-link-to-customer-X (only if changes occur)
(shane seems to point at this as the method in question...)
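
Roughly, in code (a sketch of steps 1-3 above; every helper name here is
hypothetical):

    # Sketch of the offline filter build in steps 1)-3) above.
    def rebuild_customer_filter(customer, current_filter):
        candidates = fetch_irr_routes(customer)            # 1) pull IRR data
        validated = [(pfx, origin) for pfx, origin in candidates
                     if dns_validate(pfx, origin)]         # 2) ROVER point queries
        if set(validated) != set(current_filter):          # 3) deploy only on change
            push_filter_to_edge(customer, validated)
        return validated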

I think this means that the customer here has to keep their
DNS data and their IRR data updated, and in the case (today) of 'ROVER'
getting no answer, the customer skates... (no validation is possible).

I'm not sure you can extend usage of 'ROVER' to things which are not
'offline processed' though, and it's not clear to me that the
fail-open answer is good for us, absent some signal that 'customer-x
will not be playing today'.

- Circular dependencies are a problem. Helical dependencies can be
made to work, but this says that one probably should not be
depending on routing to make a point query to make decisions about
routing. If you look at the architecture of the existing RPKI
validators (well, mine and BBN's, anyway, not sure about RIPE's but
suspect they took the same approach), we've gone to some trouble to
make sure that the validator will continue to work across network
outages as long as the collected data haven't expired or been
revoked. In theory one could do the same thing with bulk transfers
of DNS (whether AXFR/IXFR or NSEC walking, if they worked) but it
would not work well with point queries.

Or a simpler approach that requires neither bulk zone transfers nor zone walking:
plain DNS caching, which already exists and is well understood.

caching implies that:
  1) the cache is filled
  2) the timeout on records is longer than the outage(s)
  3) the timeout is still short-enough to meet user change requirements
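
i.e., something like (made-up numbers, just to show the squeeze):

    # Sketch of constraints 2) and 3): a workable TTL has to sit between
    # the longest outage you want to ride out and the fastest change
    # turnaround you promise.  The numbers are invented for illustration.
    MAX_OUTAGE = 4 * 3600       # survive a 4-hour outage on cached data
    MAX_TURNUP = 24 * 3600      # changes must be visible within a day

    def ttl_workable(ttl):
        return MAX_OUTAGE <= ttl <= MAX_TURNUP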

- ROVER gives us no traction on path validation (BGPSEC), it's limited
to origin validation. RPKI can certify both prefixes and ASNs,
which gives it the basics needed to support path validation as well
as origin validation. ASNs have no hierarchical structure, thus
would be a very poor match for encoding as DNS names.

The focus is on origin and sub-prefix hijacks. There are certainly discussions and

in somewhat real-time on the router (get update, lookup dns records,
decide)? or via offline compute and peer filter-updates?

- Some of the DNS aspects of ROVER are a little strange. In
particular, as currently specified ROVER requires the relying party
to pay attention to DNS zone cuts, which is not normal in DNS (the
basic DNS model since RFC 883 has been that zones are something for
the zone administrator to worry about, resolvers mostly just see a
tree of RRsets). ROVER requires the relying party to check for the
same data in multiple zones and pay close attention to zone cuts.
While it is certainly possible to do all this, it is not a matter of
issuing a simple DNS query and you're done. DNS caching effects can
also complicate matters here if the zone structure is changing:
think about what happens if you have cached responses to some (but
not all) of the queries you need to make to figure out whether to
allow a more specific route punched out of a larger prefix block.

This is a misunderstanding of the ROVER approach.
Multiple copies of the data do not exist in multiple zones. There is a one-to-one mapping

1.23.45.10.in-addr.arpa.
<rover prefix entry-10.45/16>

that's 2 copies... what about:
1.23.45.10.in-addr.arpa.
<rover-covering-route entry>
<rover-customer-allocation-10.45.16/19>
<rover-customer-of-customer-allocation-10.45.23/24>

that's 4 copies.

between a prefix and a DNS name. The resolver simply finds the data and has no need to
understand where zone cuts occur.

don't I have to walk up the tree a few times in the above example
though? "Is this the covering route? the customer route? the
customer-of-customer-route? the-hijack? Wait, no RLOCK, so this was a
giant waste of time..."

A resolver simply issues a query for the unique DNS name associated with a prefix. This could be
done with anything from a complex tool set to a simple command-line tool like dig.

'resolver' here is what? router? unix-y-box-thing doing
filter-generation? near-line-query/response-box for
router-real-time-lookup?

The convention is also useful for storing other data at prefixes; geolocation is one example.

not to nit-pick, but near as I can tell no one uses the geoloc entries
in dns... also they aren't very well kept up to date by those few who
actually do put them into dns :(

(conflicting data for same prefix can appear
in multiple zones, relying party has to sort this out, yum).

Again, this is simply a naming convention. Each prefix has a unique name, and to
DNS that is a name like any other: it belongs to exactly one zone and cannot
appear in multiple zones.

10.45.23.0/24
10.45.16.0/19
10.45.0.0/16
10.0.0.0/8

ROVER is not trying to do exactly what RPKI is doing. Much of this seems to be an
attempt to build a form of enumeration into ROVER. See the Level3 NANOG talk
from Monday (6/4/12) for a concrete example of a different model. There are many different

you referenced this a few times:
  <http://www.nanog.org/meetings/nanog55/agenda.php>

doesn't mention a talk from L3 on 6/4 ... got link?

-chris

There are a number of operational models that provide the needed
routing protection without enumeration.

I can see a use-case for something like:
  "Build me a prefix list from the RIR data"

this requires a full data fetch, not doable in dns.

and, at the other end of the spectrum, for any dynamic lookup on
receiving a bgp announcement, the data had best be already in the
router. a full data set on an in-rack cache will go nuts on any
significant bgp load. beyond that, you are in non-op space.

randy

does it? shane implied (and it doesn't seem UNREASONABLE, modulo some
'doing lots of spare queries') to query for each filter entry at
filter creation time, no?

get-as-GOOGLE = 216.239.32.0/19
lookup-in-dns = <rover-query-for-/19> + <rover-query-for-/20> +
<rover-query-for-/21>.....
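
Mechanically generating that query set (a sketch; /24 is an arbitrary cutoff):

    # Sketch: enumerate 216.239.32.0/19 and all of its more-specifics down
    # to /24, i.e. the names one would have to query to cover the block.
    import ipaddress

    def query_set(prefix, max_len=24):
        net = ipaddress.ip_network(prefix)
        out = [net]
        for plen in range(net.prefixlen + 1, max_len + 1):
            out.extend(net.subnets(new_prefix=plen))
        return out

    print(len(query_set('216.239.32.0/19')))    # 63 prefixes, /19 through /24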

that could be optimized I bet, but it SEEMS doable, cumbersome, but
doable. the 'fail open' answer also seems a bit rough in this case
(but no worse than 'download irr, upload to router, win!' which is
today's model).

-chris

routing protection without enumeration.

I can see a use-case for something like:
"Build me a prefix list from the RIR data"

this requires a full data fetch, not doable in dns.

does it? shane implied (and it doesn't seem UNREASONABLE, modulo some
'doing lots of spare queries') to query for each filter entry at
filter creation time, no?

what is the query set, every prefix /7-/24 for the whole fracking ABC
space?

that could be optimized I bet, but it SEEMS doable, cumbersome, but
doable. the 'fail open' answer also seems a bit rough in this case
(but no worse than 'download irr, upload to router, win!' which is
today's model).

irr, i do have the 'full' set. but you said RIR (the in-addr roots),
not IRR. was it a mis-type?

and i am not gonna put my origin data in the irr and the dns.

randy

routing protection without enumeration.

I can see a use-case for something like:
"Build me a prefix list from the RIR data"

this requires a full data fetch, not doable in dns.

does it? shane implied (and it doesn't seem UNREASONABLE, modulo some
'doing lots of spare queries') to query for each filter entry at
filter creation time, no?

what is the query set, every prefix /7-/24 for the whole fracking ABC
space?

that could be optimized I bet, but it SEEMS doable, cumbersome, but
doable. the 'fail open' answer also seems a bit rough in this case
(but no worse than 'download irr, upload to router, win!' which is
today's model).

irr, i do have the 'full' set. but you said RIR (the in-addr roots),
not IRR. was it a mis-type?

oh hell :( yes, I meant IRR.

and i am not gonna put my origin data in the irr and the dns.

yea... so today people already fill in:

   RIR (swip/rwhois)
   IRR (routing filter updates)
   DNS (make sure your mailserver has PTRs!)

putting origin-validation data into IRR's happens today, it's not
'secured' in any fashion, and lots of proof has shown that 'people
fill it with junk' :( So being able to bounce the IRR data off some
verifiable source of truth seems like a plus. How verifiable is the
rdns-rover tree though? how do I get my start in that prefix hierarchy
anyway? by talking to IANA? to my local RIR? to 'jimbo the dns guy
down the street?' (I realize that referencing the draft would probably
get me this answer but it's too hard to look that up in webcrawler
right now...)

-Chris

putting origin-validation data into IRR's happens today, it's not
'secured' in any fashion, and lots of proof has shown that 'people
fill it with junk' :( So being able to bounce the IRR data off some
verifiable source of truth seems like a plus.

so i should use the sow's ear as the authoritative definition of the
full set?

randy

Shane A. gave a Lightning Talk; the slides will be posted soon.
They came in at the last minute, which is why they're not up already.

Tony

Shane A. gave a Lightning Talk; the slides will be posted soon.

I figured the talk was shane's.

They came in at the last minute, which is why they're not up already.

ok, cool. thanks
-chris

I think we debate the superficial here, and without sufficient imagination. The enumeration vs query issue is a NOOP as far as I am concerned. With a little imagination, one could envision building a box that takes a feed of prefixes observed, builds an aged cache of prefixes of interest, queries for their SRO records, re-queries for those records before their TTLs expire, and maintains a white list of "SRO valid" prefix/origin pairs that it downloads to the router.
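
A minimal sketch of one pass of such a box (lookup_sro() is a hypothetical
DNSSEC-validated point query; everything else is invented for illustration):

    # Sketch of the cache described above: consume a feed of observed
    # prefixes, age the entries, re-query before TTLs expire, and hand the
    # router a white list of "SRO valid" pairs.  Helper names hypothetical.
    import time

    def cache_pass(observed_prefixes, cache, lookup_sro):
        now = time.time()
        for prefix in observed_prefixes:          # e.g. fed from iBGP
            entry = cache.get(prefix)
            if entry is None or entry['expires'] - now < 60:   # refresh early
                origin, ttl = lookup_sro(prefix)  # DNSSEC point query
                cache[prefix] = {'origin': origin, 'expires': now + ttl}
        # White list to download to the router (rtr-protocol-like channel).
        return [(p, e['origin']) for p, e in cache.items()
                if e['origin'] is not None and e['expires'] > now]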

Let's call that box an SRO validating cache.

Where do you get the feed of prefixes of interest? From your own RIBs if you are only interested in white lists proportional to the routes you actually see, e.g., feed the box iBGP. From other sources (monitors, etc) if you would like a white list of every known prefix that anyone has seen.

What about a completely new prefix being turned up? ... we could talk through those scenarios in each approach.

How does the cache download the white list to the router ... we already have one approach for that. Add a bit to the protocol to distinguish SRO semantics from ROA semantics if necessary.

Point being, with a little imagination I think one could build components with either approach with similar black box behavior.

If there are real differences in these approaches, they will be in their inherent trust models, the processes that maintain those trust models, the system-level behavior of the info creation and distribution systems, and the expressiveness of their validation frameworks.

dougm

Doug Montgomery <dougm.tlist@gmail.com> writes:

> ...

I think we debate the superficial here, and without sufficient imagination.
The enumeration vs query issue is a NOOP as far as I am concerned. With
a little imagination, one could envision building a box that takes a feed
of prefixes observed, builds an aged cache of prefixes of interest, queries
for their SRO records, re-queries for those records before their TTLs
expire, and maintains a white list of "SRO valid" prefix/origin pairs that
it downloads to the router.

this sounds like a steady state system. how would you initially populate it,
given for example a newly installed core router having no routing table yet?

if the answer is, rsync from somewhere, then i propose, rsync from RPKI.

if the answer is, turn off security during bootup, then i claim, bad idea.

...

Point being, with a little imagination I think one could build components
with either approach with similar black box behavior.

i don't think so. and i'm still waiting for a network operator to say what
they think the merits of ROVER might be in comparison to the RPKI approach.
(noting, arguments from non-operators should and do carry less weight.)

Doug Montgomery <dougm.tlist@gmail.com> writes:

> ...

I think we debate the superficial here, and without sufficient imagination.
The enumeration vs query issue is a NOOP as far as I am concerned. With
a little imagination, one could envision building a box that takes a feed
of prefixes observed, builds an aged cache of prefixes of interest, queries
for their SRO records, re-queries for those records before their TTLs
expire, and maintains a white list of "SRO valid" prefix/origin pairs that
it downloads to the router.

this sounds like a steady state system. how would you initially populate it,
given for example a newly installed core router having no routing table yet?

if the answer is, rsync from somewhere, then i propose, rsync from RPKI.

if the answer is, turn off security during bootup, then i claim, bad idea.

Well, I should probably let the ROVER guys say what they have in mind.

The above started from my imagination that if you did not want routers
actually doing route-by-route queries, it would be easy to build a
validating cache that behaves similarly to an RPKI validating cache, but
pulls the info from rDNS as opposed to RPKI.

Maybe the ROVER guys have something else in mind (e.g., routers doing the
queries themselves, or some other model of how the info ... or its impacts
... is effected on the router).

IFF you do imagine that there is an SRO validating cache box, you can
decompose the question of how one solves state skew between (1) rtr and
cache, (2) cache and the authoritative info source, and (3) how new
authoritative information gets globally distributed/effected in the system.

Looking at just (1) (your question I think), we have a couple of different
questions to look at.

a. How does a router with no origin info (new router, router reboot)
synchronize with the cache (assuming the cache has state)?

The current machinery of rtr-to-cache would work fine here. Might need to
add a bit or two, but the basic problem is the same.

b. How does a cache with no state build a list of prefix-origin pairs?
Clearly if one builds an SRO validating cache box, the usual techniques of
checkpointing state, having redundant caches, etc. could be used ... But at
some level the question of having to get initial state, and what the
router does during that period (assuming that the stateless cache is his
only source), must be answered.

One way of thinking about these questions, is to ask how would it work in
RPKI?

If for origin validation we have a strict "don't fail open" during resets
requirement, then there are a lot of initialization questions we must
address in any system. I.e., what does the router do if its only RPKI
cache has to rebuild state from zero? What does such a router do if it
loses contact with its cache?

At this point, I could propose more ideas, but probably going further with
my imagination is not important. The ROVER guys should tell us what they
have in mind, or someone interested in building a ROVER validating cache
should design one and tell us.

But maybe stepping back one level of abstraction, you can think of things
this way.

We have a top-down-enumeration vs query model. One could put a cache in
the query model to make it approximate an enumeration model, but only
to the point that one has, or can build, a reasonably complete list of
prefixes of interest.

If one admits that sometimes there will be cache misses (in the
query/cache model) and one might have to query in those cases, then the
trade-off seems to be how often that occurs vs the responsiveness one
would get out of such a system for situations when the authoritative
information itself changes (case 3 above). I.e., how fast could you turn
up a new prefix in each system?

Maybe the ROVER guys don't believe in caches at all. In which case I
return you to the original "OMG! Enumeration vs Query thread".

I just don't think that is the most significant difference between the two
approaches.

dougm