where was my white knight....

Dobbins_Roland · November 8, 2011, 10:26pm

A cache that's persistent across reboots?

Dobbins_Roland · November 8, 2011, 10:29pm

They don't have to be directly-connected - they could be on the DCN, which ought to have at least some static 'hints' to critical resources.

Leo_Bicknell1 · November 8, 2011, 10:32pm

In a message written on Tue, Nov 08, 2011 at 10:19:24PM +0000, Nick Hilliard wrote:

One solution is to have directly-connected rpki caches available to all
your bgp edge routers throughout your entire network. This may turn out to
be expensive capex-wise, and will turn out to be yet another critical
infrastructure item to maintain, increasing opex.

Couldn't you just have a couple of these boxes on your network and
route them in your IGP, removing any BGP dependency? KISS.

bill3 · November 8, 2011, 10:32pm

are they actually coherent enough to be read & understood?

/bill

Matthias_Waehlisch1 · November 8, 2011, 10:38pm

I think so: at least a Bachelor student of mine got along with them for
his thesis.

Btw: There is also a very nice overview by Geoff published in Cisco
IPJ:

Christopher_Morrow · November 8, 2011, 10:46pm

not across reboots, but in this case routers didn't necessarily reboot
(parts of them did though).
in the case of a reboot, sure, pull from your local cache, no 'walk up
the chain' is required here.

Bandy_Rush1 · November 9, 2011, 3:14am

I understand what the manual says (actually, i read it).

cheating!!!!

I'm just curious as to how this is going to work in real life. Let's
say you have a router cold boot with a bunch of ibgp peers, a transit
or two and an rpki cache which is located on a non-connected network -
e.g. small transit pop / AS boundary scenario. The cache is not
necessarily going to be reachable until it sees an update for its
connected network.

once again,
  o when you have no connection to a cache or no covering roa for a
    prefix, the result is specified as NotFound
  o we recommend you route on NotFound

so the result is the same as today.
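
For readers following along, the two bullets above can be sketched roughly as follows. This is a simplified illustration of the Valid / Invalid / NotFound outcomes, not any router's actual implementation; the names `Roa`, `State` and `validate` are made up for this example, passing `roas=None` is an assumed way to model "no connection to a cache", and the prefixes and AS numbers are documentation values.

```python
from enum import Enum
from ipaddress import ip_network
from typing import Iterable, NamedTuple, Optional


class State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    NOT_FOUND = "NotFound"


class Roa(NamedTuple):
    """Hypothetical, simplified ROA record: prefix, max length, origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int,
             roas: Optional[Iterable[Roa]]) -> State:
    """Classify a route's origin against a set of ROAs.

    roas=None is this sketch's way of saying "no cache reachable".
    """
    if roas is None:
        return State.NOT_FOUND

    route = ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True  # at least one ROA covers this route
        if route.prefixlen <= roa.max_length and origin_asn == roa.asn:
            return State.VALID

    # Covered but never matched is Invalid; no covering ROA is NotFound.
    return State.INVALID if covered else State.NOT_FOUND


roas = [Roa("192.0.2.0/24", 24, 64496)]
print(validate("192.0.2.0/24", 64496, roas).value)     # matching origin
print(validate("192.0.2.0/24", 64511, roas).value)     # covered, wrong origin
print(validate("198.51.100.0/24", 64496, roas).value)  # no covering roa
print(validate("192.0.2.0/24", 64511, None).value)     # no cache reachable
```

Note that the last call returns NotFound even though the same route is Invalid once the ROA data is available, which is the gap the rest of this exchange is arguing about.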

Until this happens, there will be no connectivity from the router to
the cache

false

Look, i understand that you're designing rpki <-> interactivity such that
things will at least work in some fashion when your routers lose sight of
their rpki caches. The problem is that this approach weakens rpki's
strengths - e.g. the ability to help stop youtube-like incidents from
recurring by ignoring invalid prefix injection.

you can't have your cake and eat it too. you can not detect invalid
originations until you have the data to do so.

randy

Bandy_Rush1 · November 9, 2011, 3:28am

fwiw, we have not tested the scaling of rpki-rtr performance as much as
we might have. we synthesized an rpki cache with roas for all the
prefixes in a current table, 370k of them or whatever, and let routers
load that cache from zip to full. for low-end routers and a mediocre
cache server, either local or across noam, it took less than five
seconds. this was small enough that we moved on to other stuff.

randy

Bandy_Rush1 · November 9, 2011, 3:33am

Indeed, we can expect new and exciting ways to blow up networks with
SIDR.

the black helicopters spraying fud are especially vicious

Owen_DeLong · November 9, 2011, 3:54am

Did you do this on routers that already had fully converged tables, or,
did you bootstrap the table load into the routers at the same time
as would be the case in a power failure, post-crash reboot, software
upgrade, etc.?

If only the former, may I suggest that at least doing some level of the
latter might prove a useful exercise?

I apologize for this mildly operational question. Y'all can go back to
Randy's fud-laden black helicopters now.

Owen

Nick_Hilliard3 · November 9, 2011, 11:43am

once again,
  o when you have no connection to a cache or no covering roa for a
    prefix, the result is specified as NotFound
  o we recommend you route on NotFound

so the result is the same as today.

Well no, not really because when the cache becomes reachable again, you
need to revalidate everything which got a NotFound. This will cause extra
bgp churn where revalidation caused a local policy change.
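
To make the revalidation concern concrete, here is a minimal sketch that compares each route's state before and after the cache becomes reachable and lists the ones that change. It is an illustration under stated assumptions only: `origin_state` and `revalidate` are hypothetical helper names, ROAs are plain `(prefix, max_length, asn)` tuples, `None` stands for "cache unreachable", and whether a state change actually produces BGP churn depends on local policy, which is not modelled here.

```python
from ipaddress import ip_network


def origin_state(prefix, origin_asn, roas):
    """Return 'Valid', 'Invalid' or 'NotFound'; roas=None means no cache."""
    if roas is None:
        return "NotFound"
    route = ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        roa_net = ip_network(roa_prefix)
        # Compare versions first: subnet_of() rejects mixed IPv4/IPv6.
        if route.version == roa_net.version and route.subnet_of(roa_net):
            covered = True
            if route.prefixlen <= max_len and origin_asn == roa_asn:
                return "Valid"
    return "Invalid" if covered else "NotFound"


def revalidate(routes, roas_before, roas_after):
    """List (prefix, asn, old_state, new_state) for routes whose state changes."""
    changes = []
    for prefix, asn in routes:
        old = origin_state(prefix, asn, roas_before)
        new = origin_state(prefix, asn, roas_after)
        if old != new:
            changes.append((prefix, asn, old, new))
    return changes


routes = [
    ("192.0.2.0/24", 64496),     # will match the roa
    ("192.0.2.0/24", 64511),     # covered, but wrong origin
    ("198.51.100.0/24", 64496),  # no covering roa at all
]
roas = [("192.0.2.0/24", 24, 64496)]

# Cache unreachable at boot (None), then reachable with one roa loaded.
for change in revalidate(routes, None, roas):
    print(change)
```

In this toy case two of the three routes change state once the cache is reachable; only those would be candidates for a local policy change, while the route with no covering roa stays NotFound throughout.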

Even if you have a local cache, this will still cause problems due to the
problem you summarised in draft-ietf-sidr-origin-ops, section 6:

"Like the DNS, the global RPKI presents only a loosely consistent view,
depending on timing, updating, fetching, etc. Thus, one cache or router
may have different data about a particular prefix than another cache or
router. There is no 'fix' for this, it is the nature of distributed data
with distributed caches."

Local caches may miss updates due to interior unreachability. Routers will
not revalidate after cache updates. So this loosely consistent view will
propagate into your routers' bgp views. Do I really want this? Or, more
to the point, is a perpetually inconsistent bgp network view better or
worse than the occasional more serious reachability problem that rpki is
attempting to solve? This isn't clear to me.

Until this happens, there will be no connectivity from the router to
the cache

false

Not false in the scenario I described. Please read what I said, not what
your straw man whispers in your ear.

Nick