We learned from Cloudflare’s https://isbgpsafeyet.com/ that some ASes have deployed RPKI Route Origin Validation (ROV). However, we downloaded BGP collection data from the RouteViews and RIPE RIS platforms and found that some ROV ASes still announce invalid routes. For example, in the RIB data at 2022-10-31 00:00:00, 13 of the 17 ASes that declared ROV deployment announced invalid routes, and we list the number of related prefixes for each AS below.
We can see that the ROV ASes announced noticeably fewer invalid routes than the non-ROV ASes, though they did not filter all the invalids.
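For readers who want to reproduce this kind of count, here is a minimal sketch of the origin validation check described in RFC 6811, applied to RIB entries grouped by the collector peer that announced them. The ROA entries, ASNs, and the `(peer_asn, prefix, origin_asn)` tuple layout are hypothetical placeholders; parsing the RouteViews / RIPE RIS dumps and loading real ROAs are left out.

```python
import ipaddress
from collections import Counter

# Hypothetical ROA entries: (prefix, max_length, authorized origin ASN).
ROAS = [
    ("192.0.2.0/24", 24, 64500),
    ("203.0.113.0/24", 25, 64501),
]

def rov_state(prefix, origin_asn, roas=ROAS):
    """Classify a route as 'valid', 'invalid', or 'not-found' (RFC 6811)."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if route.version != roa_net.version:
            continue  # IPv4 and IPv6 never cover each other
        if route.subnet_of(roa_net):
            covered = True  # at least one ROA covers this route
            if route.prefixlen <= max_len and origin_asn == roa_asn:
                return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"

def count_invalids_per_peer(rib_entries, roas=ROAS):
    """rib_entries: iterable of (peer_asn, prefix, origin_asn) tuples.

    Returns a Counter mapping each peer ASN to the number of distinct
    prefixes it announced with an ROV-invalid origin.
    """
    seen = set()
    for peer_asn, prefix, origin_asn in rib_entries:
        if rov_state(prefix, origin_asn, roas) == "invalid":
            seen.add((peer_asn, prefix))
    return Counter(peer for peer, _ in seen)
```

Counting distinct prefixes per peer (rather than raw RIB lines) avoids inflating the numbers when the same route appears at several collectors.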
AS6939 announced noticeably more invalid routes than the other ROV ASes. We learned from a discussion two years ago (Reactive RPKI ROV (Was: Hurricane Electric has reached 0 RPKI INVALIDs)) that AS6939 uses reactive ROV. That is, route collectors identify invalid routes and write them into scripts that are sent to the routers, which then send “withdrawals” for the invalids based on those scripts.
However, for the BGP collection time 2022-10-31 00:00:00, we downloaded the following two hours of updates and found very few withdrawals from AS6939 for those invalid routes in the first hour. In the second hour, AS6939 withdrew hundreds of invalid prefixes, but most of these withdrawals were followed by another invalid announcement with the same prefix and the same invalid origin AS.
Can anyone help us to correctly interpret this case? Thank you very much.
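The withdraw-then-re-announce pattern described above can be checked with something like the following sketch. It assumes the updates from one collector peer have already been parsed into `(timestamp, kind, prefix, origin_asn)` tuples, with `kind` being `"A"` or `"W"`; that layout, the function name, and the state labels are all made up for illustration.

```python
def classify_withdrawals(updates, invalid_routes):
    """Track what happens to each RIB-invalid route in the later updates.

    updates:        iterable of (timestamp, kind, prefix, origin_asn);
                    kind is "A" (announce) or "W" (withdraw), and
                    origin_asn is None for withdrawals.
    invalid_routes: iterable of (prefix, invalid_origin_asn) taken from
                    the RIB snapshot.

    Returns {prefix: state} where state is one of:
      "announced"     no withdrawal seen, still as in the snapshot
      "withdrawn"     withdrawn and not announced again
      "re-announced"  announced again with the same invalid origin after
                      having been withdrawn or replaced
      "replaced"      last announcement carried a different origin
    """
    origin_of = dict(invalid_routes)
    state = {prefix: "announced" for prefix in origin_of}
    for _ts, kind, prefix, origin in sorted(updates, key=lambda u: u[0]):
        if prefix not in origin_of:
            continue  # not one of the invalid routes we are tracking
        if kind == "W":
            state[prefix] = "withdrawn"
        elif kind == "A":
            if origin == origin_of[prefix]:
                # Same invalid origin again; only a change if the route
                # had been withdrawn or replaced in the meantime.
                if state[prefix] != "announced":
                    state[prefix] = "re-announced"
            else:
                state[prefix] = "replaced"
    return state
```

A large share of "re-announced" entries would match what we observed in the second hour, while "announced" entries are the invalids for which no withdrawal was seen at all.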
We learned from Cloudflare's https://isbgpsafeyet.com/ that some ASes
have deployed RPKI Origin Validation (ROV). However, we downloaded BGP
collection data from RouteViews and RipeRis platforms and found that
some ROV-ASes can announce some invalid routes. For example, from RIB
data at 2022-10-31 00:00:00, 13 out of 17 ASes which declared to
deploy ROV announced invalid routes, and we list the number of related
prefixes for each AS below.
[snip]
As a comparison, we count the invalid routes that the non-ROV ASes (also
listed at https://isbgpsafeyet.com/) announce, as below:
We can see that ROV ASes announced apparently fewer invalid routes
compared to the non-ROV ASes, though they did not filter all the
invalids.
[snip]
Can anyone help us to correctly interpret this case? Thank you very much.
You ask great questions! I hope an answer to your questions can be found
in a message I sent a year ago:
The summary: in any sufficiently large network, chances are that not
100% of the equipment supports RPKI-based BGP Route Origin Validation;
in such cases a handful of invalid routes may still percolate through
the system. Another contributing factor might be certain types of
software upgrades, where ROV is temporarily disabled on one or more
devices. Or perhaps an ISP made a handful of exceptions for test/beacon
invalid routes to propagate.
aside from technical reasons for an ROV-supporting AS (RAS) to announce
an ROV invalid prefix, there is an administrative one. the RAS's
customers *pay* RAS to announce the customers' prefixes. so RAS is
configured to propagate their customers' announcements without dropping
invalids.
Hello Job,
Thank you very much for your reply! I understand that no AS can actually filter all the invalids. Still, I was trying to figure out why we couldn't see a reasonable number of withdrawals from AS6939 for the invalid prefixes, given how they explained their ROV implementation (Reactive RPKI ROV (Was: Hurricane Electric has reached 0 RPKI INVALIDs)). Perhaps we need to learn more about the details of their implementation.
Thank you very much!
There are 2 sides to the bgp conversation for any ASN, and then really 4 sides.
customer -> RAS -> peer (settlement-free)
peer(sfp) -> RAS -> customer
customer -> RAS -> transit
transit -> RAS -> customer
Depending on the RAS's capabilities or status in their journey to
'fully RAS', it's possible that they may have:
o "We OV all customer sessions" (notably not SFP peers)
o "We OV all sessions(*)" (noting not all, and maybe depending on
platform specifics)
There are a bunch of ways this goes wrong. This also doesn't really
tell what sort of peering the RAS has set up with RouteViews
(customer? peer? partial peer?)
Also, also, possibly the output path on the session(s) here is not
filtering in an OV fashion.
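To make the session-type point concrete, here is a small sketch that lists which (learned-from, sent-to) directions can still carry an invalid, given the set of session types on which a network does OV on import. The conventional "customer routes go to everyone, peer/transit routes go only to customers" export rule is an assumption here, not something any particular network has confirmed, and the function names are made up.

```python
SESSION_TYPES = ("customer", "peer", "transit")

def exports_to(learned_from, sent_to):
    """Assumed conventional export rule: customer routes are sent to
    everyone; peer and transit routes are sent only to customers."""
    return learned_from == "customer" or sent_to == "customer"

def leaking_paths(ov_on_import):
    """ov_on_import: set of session types where invalids are dropped on
    import. Returns the (learned_from, sent_to) pairs over which an
    invalid route can still be propagated."""
    return [
        (src, dst)
        for src in SESSION_TYPES
        for dst in SESSION_TYPES
        if src not in ov_on_import and exports_to(src, dst)
    ]

# "We OV all customer sessions" (but not peers or transit):
print(leaking_paths({"customer"}))
```

With only customer sessions covered, the result is `[('peer', 'customer'), ('transit', 'customer')]`, so anything that receives the customer view (possibly including a route collector, depending on how that session is set up) can still see invalids.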
ROV belongs on the input path, let's not ROV on the output towards
customers / route collectors.
Announcing bigger, ROV valid/unknown aggregates, while really routing
based on possibly ROV-invalid more specifics in the FIB is akin to
actively obscuring routing security, "cheating" your way to a RAS.
Yes, there are some very specific situations where output ROV is
beneficial (a peering box not supporting ROV and you ask your peer to
ROV their output), but let's not normalize ROV on the output path.
Sure. This assumes 100% coverage for all inputs to the rib-out on
the customer port we're talking about, though.
If you don't have 100% coverage you'll end up with the leaks
seen/reported by the OP.
I don't mean to say/imply:
"Hey, everyone(anyone) should do OV on output"
I mean to say that:
"Hey, if you see OV failures leaking, this is probably a side effect
of the behavior/design choices a network made." (not doing OV
filtering on one of peer/customer/transit type peerings.)