ARIN RPKI Trust Anchor Issue

Hello folks,

Has anyone received any similar event notifications (from PacketVis or other)? Trying to work out if it’s a false-positive.

Regards,
Christopher Hawker

Dear all,

I analysed the alert, here is my assessment.

If I recall correctly, Packetvis uses multiple data sources (different
versions of validator implementations) and alerts on anomalies spotted
by more than a single data source.

Most RPKI Validator implementations limit the maximum allowable file
size of RPKI Signed Objects. It appears one particular Manifest in
ARIN's "Hosted CA" system exceeded a threshold known to exist in older
implementations, rendering all subordinate ROAs on that manifest invalid
for those instances.

Timeline (based on rpkiviews.org):

* On Tue 28 Jan 2025 14:02:07 +0000, one Large CA's keypair signed a
  Manifest with 50,157 FileAndHash entries. This manifest was nominally
  valid until Thu 30 Jan 2025 10:00:00 +0000 and was 3,964,618 bytes in
  size. Note that this is very large compared to other Manifest
  objects: it is 174% larger than the second largest Manifest, and 456%
  larger than the third largest Manifest.

* On Tue 28 Jan 2025 17:54:07 +0000, this Large CA's keypair signed a
  Manifest with 51,014 FileAndHash entries. That particular issuance was
  4,032,321 bytes in size and exceeded a threshold (4M bytes). Following
  the "Failed Fetch" mechanism described in RFC 9286 Section 6.6,
  affected instances continued to use the older "Tue 28 Jan 2025
  14:02:07" manifest until it expired (which happened today, Thu 30
  Jan 2025, at 10:00:00).
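
The two data points in the timeline allow a rough estimate of how
Manifest size scales with the number of FileAndHash entries. The shell
sketch below fits a simple linear model through them. It assumes that
"4M bytes" means 4,000,000 bytes (the 4,032,321-byte Manifest would not
have exceeded 4 MiB) and that growth stays linear; the variable names
are illustrative only.

```shell
# Figures taken from the timeline above.
entries_a=50157; bytes_a=3964618
entries_b=51014; bytes_b=4032321

# Assumption: the "4M bytes" threshold is 4,000,000 bytes.
limit=4000000

# Growth per additional FileAndHash entry (integer division).
per_entry=$(( (bytes_b - bytes_a) / (entries_b - entries_a) ))

# Fixed part of the object under a simple linear model.
fixed=$(( bytes_a - per_entry * entries_a ))

# Largest entry count whose estimated size stays within the limit.
max_entries=$(( (limit - fixed) / per_entry ))

echo "bytes per entry: $per_entry"
echo "fixed bytes: $fixed"
echo "entries that fit within $limit bytes: $max_entries"
```

With these inputs the model gives 79 bytes per entry and 2,215 fixed
bytes, so roughly 50,604 entries fit before the assumed threshold is
crossed.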

It is interesting that the 'trigger event' happened two days ago, but it
is only just now that it became quite tangible! It seems this anomaly
could've been alerted for earlier on.

I noted in my "RPKI's 2024 Year In Review" report:

  """
  "Efficiency" in this context arises from validators spending the
  computational cost of validating a single EE certificate and
  yielding more than 1 ROAIPAddress. Under the RIPE NCC TAL one
  yields 6.5 prefixes per ROA, while in the ARIN region this
  number is 1.1 prefixes per ROA. As stewards of this technology,
  we need to keep an eye on the overall efficiency of the RPKI to
  ensure things don't get out of hand.
  """
  source: https://mailman.nanog.org/pipermail/nanog/2025-January/227166.html

When a resource holder creates many ROAs (tens of thousands), it'll
result in many Manifest FileAndHash entries (again, tens of thousands),
which increases the file size of the Manifest (to the point that some
validators may consider such a Manifest object invalid). When this
happens, the validator marks the ROAs as invalid, and in turn BGP
routers will consider the covered routes 'not-found'.

Another downside of systems signing over only a single prefix per ROA is
that each individual ROA object comes with 1500~2100 bytes 'overhead'
regardless of how many prefixes are encoded inside of it (due to the
embedded X.509 End-Entity certificate and signature). Expressed as a
percentage, this overhead is ~ 98% in the case of single prefixes. While
Manifests grow linearly, the per-ROA overhead makes for a somewhat steep
curve, which in turn directly impacts the size of RRDP snapshots (in
which all Manifests + ROAs are bundled together). Publication point
operators are advised to keep in mind that RRDP snapshot size is
another limit that can be tripped.
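
To make the per-ROA overhead concrete, here is a small shell sketch. It
takes a hypothetical CA with 50,000 prefixes and applies the two
prefixes-per-ROA ratios quoted above (1.1 and 6.5, which are TAL-wide
averages rather than per-CA figures) together with the 1500~2100 byte
overhead range. The function name and the prefix count are assumptions
made for illustration.

```shell
# roa_overhead PREFIXES RATIO_TENTHS
# RATIO_TENTHS is the prefixes-per-ROA ratio times ten (11 means 1.1),
# which keeps the arithmetic in integers.
roa_overhead() {
    prefixes=$1
    ratio_tenths=$2
    # Round up: a partially filled ROA is still a whole object.
    roas=$(( (prefixes * 10 + ratio_tenths - 1) / ratio_tenths ))
    # Overhead range of 1500 to 2100 bytes per ROA, as stated above.
    low=$(( roas * 1500 ))
    high=$(( roas * 2100 ))
    echo "$roas ROAs, $low to $high bytes of overhead"
}

roa_overhead 50000 11   # 1.1 prefixes per ROA
roa_overhead 50000 65   # 6.5 prefixes per ROA
```

Each ROA is also one FileAndHash entry on the Manifest, so the object
count printed here is the number of Manifest entries the same CA would
need.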

Stakeholders operating certification services should keep in mind that
validator implementations might restrict the file size of individual
objects, and the number of objects, but also impose limits on the size
of the RRDP snapshot, the duration of the synchronization task, etc.

Kind regards,

Job

We at ARIN have verified that all our systems are functioning optimally and there were no indications of any issues at the time of the PacketVis alerts.

Brad Gorman

Director, Customer Technical Services

American Registry for Internet Numbers

Dear Job,

> I analysed the alert, here is my assessment.

Thanks a lot for the analysis. I had also received the alert (Randy
Bush and others as well, see "Subject: TA Malfunction??" thread :-) ) and
was wondering... your analysis makes sense as far as I can judge (which
is not very far).

[...]

> It is interesting that the 'trigger event' happened two days ago, but it
> is only just now that it became quite tangible! It seems this anomaly
> could've been alerted for earlier on.

Can you elaborate how? (Looking for overly-large or otherwise suspicious
manifests signed by CAs?)

> I noted in my "RPKI's 2024 Year In Review" report:

Thanks for that one as well. It has interesting information and
reflections that should be discussed in the operator/sidrops community,
preferably by people more knowledgeable than me...

Cheers,

One could develop a simple monitoring utility which checks for 'overly'
large file sizes of signed objects in the Relying Party's cache. I don't
recommend the below for production monitoring; it is merely an
illustration.

For example, using rpki-client on Debian Linux, the following displays
the top 10 largest objects:

  $ cd /var/lib/rpki-client/cache
  $ find * -type f | xargs du -ka | sort -nr | head
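
A slightly more targeted variant of the same idea is sketched below: it
lists only the objects above a given byte threshold, along with their
sizes. The function name is made up for this example, and the threshold
is for the operator to choose; only POSIX find and wc are used.

```shell
# report_large_objects DIR LIMIT_BYTES
# Print "SIZE PATH" for every regular file under DIR that is larger
# than LIMIT_BYTES.
report_large_objects() {
    dir=$1
    limit=$2
    # "-size +Nc" matches files larger than N bytes.
    find "$dir" -type f -size +"${limit}c" -print |
    while IFS= read -r path; do
        # The arithmetic expansion strips padding some wc versions add.
        size=$(( $(wc -c < "$path") ))
        echo "$size $path"
    done
}
```

It could be invoked as
`report_large_objects /var/lib/rpki-client/cache 3600000`, where the
3,600,000-byte warning level is an arbitrary choice somewhat below
4,000,000 bytes.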

As another example, one could monitor the RRDP snapshot size simply by
fetching it:

  $ curl -s https://rrdp.arin.net/notification.xml | grep snapshot
  <snapshot uri="https://rrdp.arin.net/4a394319-7460-4141-a416-1addb69284ff/99127/snapshot.xml" hash="3f2acde605e9aa4b2370e41299d445b5c01a47f78d5ac8df4c8cdc69cf837a98"/>
  $ wget --no-verbose --compression=gzip https://rrdp.arin.net/4a394319-7460-4141-a416-1addb69284ff/99127/snapshot.xml
  2025-01-30 15:22:52 URL:https://rrdp.arin.net/4a394319-7460-4141-a416-1addb69284ff/99127/snapshot.xml [532342274] -> "snapshot.xml" [1]

In a similar way, the notification.xml can be used to find RRDP deltas
and monitor those for size and trends in size.
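
The manual steps above could be wrapped in two small shell functions,
sketched here for illustration only. The function names are invented
for this example, and the sed expression assumes the snapshot element
carries its uri attribute in the form shown in the output above.

```shell
# snapshot_uri: read a notification.xml on stdin and print the URI of
# the snapshot it points to.
snapshot_uri() {
    sed -n 's/.*<snapshot[^>]*uri="\([^"]*\)".*/\1/p'
}

# rrdp_snapshot_size NOTIFICATION_URL
# Fetch the notification file, then the snapshot, and print the size of
# the decoded snapshot body in bytes.
rrdp_snapshot_size() {
    uri=$(curl -s "$1" | snapshot_uri)
    curl -s --compressed "$uri" | wc -c
}
```

For example, `rrdp_snapshot_size https://rrdp.arin.net/notification.xml`
could be run periodically and the result compared against earlier
values to spot growth.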

There also are all kinds of metrics available in OpenMetrics format in
/var/lib/rpki-client/metrics
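
To get an overview of what is in that file, a sketch like the following
lists the distinct metric names. It relies only on the general
OpenMetrics text layout (comment lines start with '#', and a sample
line is a name, optional labels in braces, then a value); the function
name is made up for this example.

```shell
# metric_names: read OpenMetrics text on stdin and print each distinct
# metric name once.
metric_names() {
    # Skip comments and blank lines, then cut everything from the first
    # '{' or space onwards, leaving only the metric name.
    awk '!/^#/ && NF { sub(/[{ ].*/, ""); print }' | sort -u
}
```

Usage would be `metric_names < /var/lib/rpki-client/metrics`.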

All in all - there are hundreds of metrics to look at! :-)

Kind regards,

Job