Theoretical question about cyclic dependency in IRR filtering

Hello everyone,

While discussing IRR on some groups recently, I started wondering whether there can be (and whether there actually is) a cyclic dependency in filtering, where an IRR operator (whoever it may be: APNIC, RIPE, RADB, etc.) has upstreams that accept only routes with an existing & valid route object.

So hypothetical case (can apply to any IRR):

  1. The APNIC registry source is whois.apnic.net, which points to 202.12.28.136 / 2001:dc0:1:0:4777::136. The aggregates covering both of these have valid route objects in the APNIC registry itself.

  2. Their upstreams, say AS X, Y, and Z, have tooling in place to generate and push filters by checking all the popular IRRs. All is well up to this point.

  3. Say APNIC has some server/service issue for a few minutes, and X, Y, and Z happen to be updating their filters at the same time. They cannot contact whois.apnic.net and hence miss generating filters for all prefixes whose route objects live in the APNIC IRR.

  4. X, Y, and Z drop APNIC prefixes, including those of the IRR itself, and the loop persists from this point onwards.

So my question is: can that actually happen?
If not, do X, Y, and Z (and possibly all upstreams up to the default-free zone) treat these prefixes in a special manner to avoid such a loop?
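To make the loop concrete: here is a minimal, purely hypothetical Python sketch of steps 3 and 4 (all names are invented; a real pipeline would shell out to bgpq4 or similar). The bug is a filter generator that fails "closed", treating "IRR unreachable" the same as "no route objects exist":

```python
def fetch_route_objects(query_irr):
    """Ask the IRR for the set of permitted prefixes.

    query_irr() returns a set of prefixes, or raises OSError
    if the IRR server cannot be reached.
    """
    try:
        return query_irr()
    except OSError:
        # BUG: failing "closed" here turns a brief IRR outage
        # into an empty prefix filter.
        return set()

def update_filters(query_irr, deployed):
    """One run of the upstream's periodic filter-update job."""
    allowed = fetch_route_objects(query_irr)
    deployed.clear()
    deployed.update(allowed)
    return deployed

# Simulate the loop: the IRR's own prefix starts out in the filter,
# the IRR has a short outage, and the regenerated (empty) filter now
# drops the IRR itself -- so every later query also fails.
deployed = {"202.12.28.0/24"}

def irr_query():
    if "202.12.28.0/24" not in deployed:  # IRR unreachable once filtered
        raise OSError("whois.apnic.net unreachable")
    return {"202.12.28.0/24"}

def outage_query():
    raise OSError("whois.apnic.net temporarily down")

update_filters(outage_query, deployed)  # step 3: brief outage
update_filters(irr_query, deployed)     # step 4: IRR is up again, but filtered
print(deployed)                         # -> set(): the loop never recovers
```

The second `update_filters` run shows the persistence: even after the IRR service itself recovers, the already-deployed empty filter keeps it unreachable.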

Thanks!

Hi Anurag,

Circular dependencies definitely are a thing to keep in mind when designing IRR and RPKI pipelines!

In the case of IRR: it is quite rare to query the RIR IRR services directly. Instead, the common practice is that utilities such as bgpq3, peval, and bgpq4 query “IRRd” (https://IRRd.net) instances at, for example, whois.radb.net and rr.ntt.net. You can verify this with tcpdump. These IRRd instances serve as intermediate caches, and will continue to serve old cached data in case the origin is down. This property of the global IRR deployment avoids a lot of potential circular dependencies.

Also, some organisations use threshold checks before deploying new IRR-based filters to reduce the risk of “misfiring”.
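A threshold check of that kind might look roughly like this sketch (the 20% bound and all names are invented for illustration, not any particular organisation's tooling):

```python
def safe_to_deploy(old_prefixes, new_prefixes, max_shrink=0.20):
    """Refuse to deploy a newly generated prefix filter if it shrank
    suspiciously, which usually means the IRR query partially failed
    rather than that operators really withdrew that many objects."""
    if not new_prefixes:
        return False  # empty filter: almost certainly a query failure
    if not old_prefixes:
        return True   # nothing deployed yet; bootstrap case
    shrink = 1 - len(new_prefixes) / len(old_prefixes)
    return shrink <= max_shrink

# Example with documentation prefixes (192.0.2.0/24, RFC 5737):
old = {f"192.0.2.{i}/32" for i in range(100)}
print(safe_to_deploy(old, set()))                # False: refuse empty result
print(safe_to_deploy(old, set(list(old)[:95])))  # True: 5% shrink is plausible
print(safe_to_deploy(old, set(list(old)[:50])))  # False: 50% shrink looks like a misfire
```

A check like this is what lets an upstream ride out the brief-outage scenario above: the suspicious new filter is simply not deployed, and the previous one stays in place.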

The RPKI case is slightly different: the timers are far more aggressive than in IRR, and until “Publish in Parent” (RFC 8181) becomes commonplace, there are more publication points, and thus more potential for operators to paint themselves into a corner.

Certainly, in the case of RPKI, all Publication Point (PP) operators need to take special care not to host CAs that have the PP’s INRs listed as subordinate resources inside the PP.

See RFC 7115 Section 5 for more information: “Operators should be aware that there is a trade-off in placement of an RPKI repository in address space for which the repository’s content is authoritative. On one hand, an operator will wish to maximize control over the repository. On the other hand, if there are reachability problems to the address space, changes in the repository to correct them may not be easily accessed by others.”

Ryan Sleevi once told me: “yes, it strikes me that you should prevent self-compromise from being able to perpetually own yourself, by limiting an attacker’s ability to persist beyond remediation.”

A possible duct-tape approach is outlined at https://bgpfilterguide.nlnog.net/guides/slurm_ta/
However, I can’t really recommend the SLURM file approach. Instead, RPKI repository operators are probably best off hosting their repository outside their own address space.
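For reference, that duct-tape approach boils down to a SLURM file (RFC 8416) with a locally added assertion, so the relying party always treats a route to the repository’s prefix as valid even when the covering ROA cannot be fetched. A rough, hypothetical fragment (the ASN and prefix are documentation-space placeholders, not any real repository’s resources):

```json
{
  "slurmVersion": 1,
  "validationOutputFilters": {
    "prefixFilters": [],
    "bgpsecFilters": []
  },
  "locallyAddedAssertions": {
    "prefixAssertions": [
      {
        "asn": 64511,
        "prefix": "192.0.2.0/24",
        "maxPrefixLength": 24,
        "comment": "Always-valid fallback for the repository's own prefix"
      }
    ],
    "bgpsecAssertions": []
  }
}
```

The downside, and presumably why it reads as duct tape: a locally maintained static assertion has to be kept in sync with the repository operator’s actual announcements by hand.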

Just like with authoritative DNS servers, make sure you can also serve your records via a competitor! :slight_smile:

For example, if ARIN moved one of their three publication point clusters into address space managed by any of the other four RIRs, some risk would be reduced.

Kind regards,

Job

beyond just ‘did the deployed filter change by +/- X%’,
you probably don’t want to deploy content if you can’t actually talk to the source… which was anurag’s proposed problem.

I suppose there are a myriad of actual failure modes though :wink: and we’ll always find more as deployments progress… hurray?


Hi Chris,

>> Hi Anurag,
>>
>> Circular dependencies definitely are a thing to keep in mind when
>> designing IRR and RPKI pipelines!
>>
>> In the case of IRR: it is quite rare to query the RIR IRR services
>> directly. Instead, the common practice is that utilities such as bgpq3,
>> peval, and bgpq4 query “IRRd” (https://IRRd.net) instances at, for example,
>> whois.radb.net and rr.ntt.net. You can verify this with tcpdump. These
>> IRRd instances serve as intermediate caches, and will continue to serve old
>> cached data in case the origin is down. This property of the global IRR
>> deployment avoids a lot of potential circular dependencies.
>>
>> Also, some organisations use threshold checks before deploying new
>> IRR-based filters to reduce the risk of “misfiring”.
>
> beyond just ‘did the deployed filter change by +/- X%’,
> you probably don’t want to deploy content if you can’t actually talk to the
> source… which was anurag’s proposed problem.

The point that Job was (I think?) trying to make was that by querying a
mirror for IRR data at filter generation time, as opposed to the source
DB directly, the issue that Anurag envisioned can be avoided.

I would recommend that anyone (esp. transit operators) using IRR data
for filter generation run a local mirror whose reachability is not
subject to IRR-based filters.

Of course, disruption of the NRTM connection between the mirror and the
source DB can still result in local data becoming stale/incomplete.

You can imagine a situation where an NRTM update to an object covering
the source DB address space is missed during a connectivity outage, and
that missed change causes the outage to become persistent.
However, I think that is fairly contrived. I have certainly never seen
it in practice.
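One way to bound that stale-mirror risk is to track how long it has been since the mirror last applied an NRTM update, and refuse to regenerate filters from data older than some limit. A hypothetical sketch (the names and the 24-hour bound are invented; pick a bound per local policy):

```python
import time

MAX_STALENESS = 24 * 3600  # seconds; a local-policy choice, not a standard

def mirror_is_fresh(last_nrtm_update_ts, now=None):
    """True if the local mirror applied an NRTM update recently enough
    that generating filters from its data is considered safe."""
    now = time.time() if now is None else now
    return (now - last_nrtm_update_ts) <= MAX_STALENESS

now = 1_000_000.0
print(mirror_is_fresh(now - 3600, now))       # updated an hour ago -> True
print(mirror_is_fresh(now - 3 * 86400, now))  # three days stale -> False
```

On a stale mirror the job then keeps the previously deployed filters in place, rather than generating new ones from incomplete data.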

Cheers,

Ben

> The point that Job was (I think?) trying to make was that by querying a
> mirror for IRR data at filter generation time, as opposed to the source
> DB directly, the issue that Anurag envisioned can be avoided.
>
> I would recommend that anyone (esp. transit operators) using IRR data
> for filter generation run a local mirror whose reachability is not
> subject to IRR-based filters.

yup, sure; “remove external dependencies, move them internal” :slight_smile:
you can STILL end up with zero prefixes even in this case, of course.

> Of course, disruption of the NRTM connection between the mirror and the
> source DB can still result in local data becoming stale/incomplete.

yup!

> You can imagine a situation where an NRTM update to an object covering
> the source DB address space is missed during a connectivity outage, and
> that missed change causes the outage to become persistent.
> However, I think that is fairly contrived. I have certainly never seen
> it in practice.

sure, there’s a black-swan comment in here somewhere :slight_smile:
The overall takeaway here is really:
“Plan for errors, and for graceful resumption of service when they occur”
(and planning is hard)