RPKI race

Hello

I noticed that we regressed and started failing the test at https://isbgpsafeyet.com/. Investigating, I found that we apparently had some routes in the validation state “unknown” that should have been either invalid or valid. This included the test prefix, which was received via NL-IX (and via Cogent on IPv6).

We do however have plenty of prefixes that are validated and received from the same sources.

This is a Juniper MX204 router running 20.1R1.11. I tried a few things, including “clear bgp neighbor xxx soft-inbound” (which is supposed to rerun the import policy where the RPKI marking and check happen), but that did not fix it. Doing a “clear bgp neighbor xxx”, which disconnects the peer and reconnects after a slight delay, did however fix the issue. But I have to do that for every peer we received the prefix from, and potentially we could have trouble with every peer we have :(

This router had a software upgrade and was rebooted two days ago. I suspect a race condition. What if the router started its BGP sessions before it was able to communicate with the RPKI validation server, or before the RPKI database was synchronized?
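To illustrate why such a race would produce exactly this symptom, here is a minimal sketch of route origin validation in the style of RFC 6811. It is not Junos code. The ROA entries, prefixes and AS numbers are hypothetical documentation values, and `validate` is a name made up for this example. The point is only that a route checked against an empty ROA table always comes out as "unknown", and stays that way until something re-evaluates it.

```python
import ipaddress

# Hypothetical ROA table: (prefix, max_length, origin_asn).
# Documentation prefixes and private-use ASNs, not real RPKI data.
ROAS = [
    ("192.0.2.0/24", 24, 64500),
    ("2001:db8::/32", 48, 64500),
]

def validate(prefix, origin_asn, roas):
    """Return 'valid', 'invalid' or 'unknown' for a route (RFC 6811 style)."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if roa_net.version != route.version:
            continue
        # A ROA covers the route if the route is equal to, or more
        # specific than, the ROA prefix.
        if not route.subnet_of(roa_net):
            continue
        covered = True
        if roa_asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one ROA but no match: invalid.
    # Not covered by any ROA: unknown (NotFound).
    return "invalid" if covered else "unknown"

routes = [
    ("192.0.2.0/24", 64500),     # matches a ROA
    ("192.0.2.0/24", 64511),     # covered, wrong origin
    ("198.51.100.0/24", 64500),  # no covering ROA
]

# Validation run before the ROA database has been synchronized.
before = [validate(p, a, []) for p, a in routes]
# The same routes re-evaluated once the ROAs are loaded.
after = [validate(p, a, ROAS) for p, a in routes]
print(before)
print(after)
```

If the router behaves like `before` at session start and nothing triggers the second pass, both valid and invalid routes would be stuck in "unknown", which matches what I am seeing.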

I find it a bit disappointing that we ended up with a bad validation state this easily, and that apparently there is little I can do about it except walk through all our peers and reset their BGP sessions, which frankly is an unacceptable disruption of traffic flow.

Regards,

Baldur

Any default route to a non-ROV-enabled upstream?
Do you receive the test prefix from more than one upstream, so that the previous test success could have been a function of upstream ROV?

Rubens
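The reasoning behind the default-route question can be sketched as a longest-prefix-match lookup. This is an illustration only: the table contents, the next-hop label and the `lookup` helper are hypothetical, and the address is a documentation address. If ROV drops the invalid more-specific route but a default route remains, traffic to the invalid prefix still leaves via the default.

```python
import ipaddress

def lookup(fib, destination):
    """Longest-prefix match: next hop of the most specific covering route."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if net.version == addr.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, next_hop)
    return best[1] if best else None

# Hypothetical forwarding tables after ROV has rejected the invalid
# more-specific route for 192.0.2.0/24.
fib_with_default = {"0.0.0.0/0": "upstream-without-rov"}
fib_without_default = {}

print(lookup(fib_with_default, "192.0.2.1"))     # still forwarded via the default
print(lookup(fib_without_default, "192.0.2.1"))  # no route, traffic is dropped
```

In the first case the outcome of the test depends entirely on whether the upstream behind the default route performs ROV.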

No, this is how it looks:

admin@gc-edge1> show route 2606:4700:7000::6715:f40f

internet.inet6.0: 92472 destinations, 288208 routes (90838 active, 0 holddown, 6565 hidden)

+ = Active Route, - = Last Active, * = Both

2606:4700:7000::/48*[BGP/170] 1d 21:46:42, MED 100, localpref 100, from 2001:7f8:13::a503:4307:1
      AS path: 13335 I, validation-state: unknown
      to 2001:7f8:13::a501:3335:1 via nl-ix
    [BGP/170] 1d 21:46:39, MED 100, localpref 100, from 2001:7f8:13::a503:4307:2
      AS path: 13335 I, validation-state: unknown
      to 2001:7f8:13::a501:3335:1 via nl-ix
    [BGP/170] 1d 21:46:50, MED 290, localpref 100
      AS path: 174 37100 13335 I, validation-state: unknown
      to 2001:978:2:d::25:1 via cogent

admin@gc-edge1> show route 103.21.244.14

internet.inet.0: 818706 destinations, 2528384 routes (816242 active, 4 holddown, 32715 hidden)

+ = Active Route, - = Last Active, * = Both

103.21.244.0/24 *[BGP/170] 1d 21:35:34, MED 100, localpref 100, from 193.239.117.0
      AS path: 13335 I, validation-state: unknown
      to 193.239.117.114 via nl-ix
    [BGP/170] 1d 21:35:29, MED 100, localpref 100, from 193.239.116.255
      AS path: 13335 I, validation-state: unknown
      to 193.239.117.114 via nl-ix

Plenty of prefixes in valid state:

admin@gc-edge1> show route table internet.inet.0 validation-state valid

internet.inet.0: 811569 destinations, 2519989 routes (809383 active, 1 holddown, 28989 hidden)

+ = Active Route, - = Last Active, * = Both

1.9.0.0/16 *[BGP/170] 08:05:51, MED 100, localpref 100, from 193.239.117.0
      AS path: 6939 4788 I, validation-state: valid
      to 193.239.116.14 via nl-ix
    [BGP/170] 08:04:24, MED 100, localpref 100, from 193.239.116.255
      AS path: 6939 4788 I, validation-state: valid
      to 193.239.116.14 via nl-ix
1.9.250.0/24 *[BGP/170] 08:05:48, MED 100, localpref 100, from 193.239.117.0
      AS path: 6939 4788 I, validation-state: valid
      to 193.239.116.14 via nl-ix
    [BGP/170] 08:04:21, MED 100, localpref 100, from 193.239.116.255
      AS path: 6939 4788 I, validation-state: valid
      to 193.239.116.14 via nl-ix
1.32.218.0/24 [BGP/170] 2d 05:48:33, MED 210, localpref 100
      AS path: 1299 2914 64050 4842 I, validation-state: valid
      to 62.115.180.72 via telia
    [BGP/170] 2d 05:47:35, MED 290, localpref 100
      AS path: 174 2914 64050 4842 I, validation-state: valid
      to 149.6.137.49 via cogent
etc

After clearing the relevant BGP sessions the Cloudflare invalid prefixes are gone from our routing table and we pass the test again.
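For anyone wanting to find which sessions are affected without eyeballing the whole table, here is a rough sketch that scans saved “show route” text for entries in a given validation state and groups the prefixes by the name that follows “via”. It assumes the output looks like the excerpts above; the function name and regular expressions are my own, not anything provided by Junos, and other output variants may need adjustments.

```python
import re
from collections import defaultdict

# Patterns written against the "show route" excerpts in this thread.
PREFIX_RE = re.compile(r"^\s*([0-9A-Fa-f:.]+/\d+)")
STATE_RE = re.compile(r"validation-state:\s*(\w+)")
VIA_RE = re.compile(r"\bvia\s+(\S+)")

def routes_by_interface(output, state="unknown"):
    """Map the name after 'via' to the set of prefixes in the given state."""
    result = defaultdict(set)
    prefix = None
    pending = False  # True while the current entry has the wanted state
    for line in output.splitlines():
        m = PREFIX_RE.match(line)
        if m:
            prefix = m.group(1)
        m = STATE_RE.search(line)
        if m:
            pending = m.group(1) == state
            continue
        m = VIA_RE.search(line)
        if m and pending and prefix:
            result[m.group(1)].add(prefix)
            pending = False
    return dict(result)

sample = """\
2606:4700:7000::/48*[BGP/170] 1d 21:46:42, MED 100, localpref 100, from 2001:7f8:13::a503:4307:1
AS path: 13335 I, validation-state: unknown
to 2001:7f8:13::a501:3335:1 via nl-ix
[BGP/170] 1d 21:46:50, MED 290, localpref 100
AS path: 174 37100 13335 I, validation-state: unknown
to 2001:978:2:d::25:1 via cogent
"""
print(routes_by_interface(sample))
```

The grouping key is only the label printed after “via”, so mapping it back to individual BGP neighbors is still a manual step.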

Regards,

Baldur

Are you running RTR to the validator for the router, or using RPKI communities?

Mark.

That was missing from the config. After adding it and running the command “request validation policy” I got the prefixes validated.

Thanks for the help.

Baldur

Not to sound funny, but this is one of the reasons I am still afraid to run the Internet in a VRF. There are a lot more things to consider, I’ve often found, compared to what you take for granted in the global table.

That said, this is great to know, given that many operators run the Internet in a VRF, and we need RPKI + ROV to be supported far and wide.

Mark.

Hi all,

We (Juniper) are aware of the challenges with internet-in-a-VRF and RPKI OV. Hence work is in progress to solve some of these issues.
If there’s news (and I remember this promise) I will update. Feel free to ping me.

Cheers,
Melchior