Dear Lars (and NANOG), sorry for the late reply. We looked carefully at your feedback and made a few relevant fixes in the paper, e.g., we now mention that we use Serial-2 - we definitely should have done that, so thanks for pointing it out.
You’re most welcome to take a look at the revised (camera-ready) version; we plan to have a `full version’ later on, so if you have any more feedback we’ll be happy to consider it and modify that version accordingly. You can download it from: https://www.researchgate.net/publication/346777643_ROV_Improved_Deployable_Defense_against_BGP_Hijacking
Let me respond to all your comments/questions:
Regarding ROV++ v1: Let’s modify your example in Figure 2a slightly such that AS 666 announces 1.2.3/24 also via AS 86. Further, let’s say AS 88 also uses ROV++ v1. Now, let’s replay your example from the paper. AS 78 still sees the same announcements you describe, and you recommend using a different, previously less-preferred route for 1.2/16. Yet, all routes available to AS 78 ultimately run into the same hijack behavior (which is not visible from AS 78’s routing table alone).
Lars, this is incorrect: in your example AS 88 uses ROV++, so it would ignore the hijack from AS 666 and route correctly to AS 99. But let me clarify: there are scenarios where ROV++ (all versions) fails to prevent the hijack, for different reasons, including some which you may consider `disappointing’; we never claimed otherwise (and we present these results). Clearly, further improvement would be interesting!
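To make the distinction concrete, here is a toy sketch of the difference between plain ROV and the ROV++ v1 idea on a subprefix hijack. This is only my illustration, not the paper’s simulator; `Ann`, `Roa`, and the string-prefix containment test are simplifications I made up for readability.

```python
# Toy model (illustrative only) of ROV vs. ROV++ v1 on a subprefix hijack.
from collections import namedtuple

Ann = namedtuple("Ann", "prefix plen origin")     # simplified BGP announcement
Roa = namedtuple("Roa", "prefix max_len origin")  # simplified ROA

def validity(ann, roa):
    """Return 'valid', 'invalid', or 'unknown' under (simplified) ROV."""
    if not ann.prefix.startswith(roa.prefix):     # toy containment test
        return "unknown"
    if ann.origin != roa.origin or ann.plen > roa.max_len:
        return "invalid"
    return "valid"

def rov(anns, roa):
    """Plain ROV drops invalid routes, but traffic to the hijacked
    subprefix still follows the covering (less specific) route."""
    return [a for a in anns if validity(a, roa) != "invalid"]

def rovpp_v1(anns, roa):
    """ROV++ v1 additionally records a local blackhole for each invalid
    subprefix, so that traffic is dropped locally rather than leaked
    toward the attacker via the covering route."""
    kept = [a for a in anns if validity(a, roa) != "invalid"]
    holes = [a.prefix for a in anns if validity(a, roa) == "invalid"]
    return kept, holes

# Example mirroring the discussion: AS 99 legitimately originates
# 1.2/16; AS 666 hijacks the subprefix 1.2.3/24.
roa = Roa("1.2", 16, 99)
anns = [Ann("1.2", 16, 99), Ann("1.2.3", 24, 666)]
kept, holes = rovpp_v1(anns, roa)  # legit route kept, 1.2.3 blackholed
```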
Btw, we are also not claiming our results `prove’ anything. This is not something we can prove just by simulations - we know that, and we are now continuing with a pilot deployment. Although, frankly, I’m quite sure that ROV++ v1 helps a lot, especially for edge ASes.
Regarding ROV++ v2: A simple sub-prefix hijack would still not yield a “valid” during your ROV. The moment you propagate such a route, you reject the entire idea of ROV. I understand that you drop the traffic, but your proposal still feels like a step backward. However, I’m not an expert on this—I might just be wrong.
We definitely don’t reject ROV! It does improve security considerably - although our results do show there seems to be room for improvement.
ROV++ v2 doesn’t just propagate the hijack; it turns it into a `blackhole announcement’. But, based on our results, we don’t recommend deploying it for announced prefixes; it does, however, provide significant value for unannounced prefixes - which are often abused, e.g., for DDoS, spam, etc.
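In the same toy style as above (again my own illustration, not the paper’s code, with made-up field names), the v2 idea is that instead of silently dropping the invalid subprefix, the AS exports it to neighbors marked as a blackhole, so they too drop traffic to it rather than leaking it toward the hijacker:

```python
# Toy sketch of the ROV++ v2 export step (illustrative only).

def rovpp_v2_export(invalid_subprefix, neighbors):
    """Turn a detected hijacked subprefix into one blackhole
    announcement per neighbor (dict fields are hypothetical)."""
    return [
        {"to": n, "prefix": invalid_subprefix, "blackhole": True}
        for n in neighbors
    ]

# The hijacked 1.2.3/24 is announced as a blackhole to both neighbors.
exports = rovpp_v2_export("1.2.3/24", ["AS78", "AS88"])
```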
Regarding goals: I think that you only meet your first design goal since your definition of ‘harm’ is very restricted. The moment you add more dimensions, e.g., QoS degradation for previously unaffected traffic, this goal is no longer met.
Well, we definitely cannot claim that we meet all intuitive interpretations of `do no harm’; maybe our text was a bit misleading here, so we tried to make it clearer.
Regarding your evaluation: Which of CAIDA’s serials do you use? Serial-1 is known to miss a significant fraction of peering links, while Serial-2 contains potentially non-existing links (as they are inferred using heuristics).
Serial-2 - I think most works in the area use this.
Since coverage and validity of links varies drastically between serials (and for serial-2 even between snapshots), it is unclear to which degree your topology reflects reality. I like that you assumed the basic Gao-Rexford Model for the best-path decision process. Yet, you ignored that various networks deploy things like prefix-aggregation, peer-locking, or more-specifics (referring to /25 … /30 IPv4 prefixes) filters.
We definitely agree that it should be possible to do better simulations/evaluations by taking such aspects into consideration. But: (1) what we did is the same as what was done, afaik, in all previous works (except that our implementation seems better optimized), and (2) we are working toward a better simulation/evaluation mechanism; in fact, we believe we already have a first version working. But we couldn’t use it for this evaluation, since it is a non-trivial change of evaluation method, and we still have quite a lot of work to complete it and to evaluate it very well - clarifying: I mean evaluating the correctness of the improved evaluation/simulation mechanism itself. That’s why we didn’t use it yet. We are the first to agree that the current methodology is not the best!
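For readers unfamiliar with it, the basic Gao-Rexford best-path decision mentioned above can be sketched in a few lines: prefer customer routes over peer routes over provider routes, breaking ties by shortest AS path. This is my paraphrase of the standard model, not the paper’s implementation:

```python
# Minimal sketch of Gao-Rexford route preference (illustrative only).

REL_PREF = {"customer": 0, "peer": 1, "provider": 2}

def best_route(routes):
    """routes: list of (relationship, as_path) tuples.
    Lower relationship preference wins; ties broken by path length."""
    return min(routes, key=lambda r: (REL_PREF[r[0]], len(r[1])))

# The customer route wins despite having the longest AS path.
best = best_route([
    ("provider", [3, 2, 99]),
    ("peer", [7, 99]),
    ("customer", [5, 4, 6, 99]),
])
```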
Further, I do not get why you randomly picked ROV-deploying networks. I am sure people like Job Snijders or Cecilia Testart could have provided you an up-to-date list of ASes that currently deploy ROV. It is not clear to me why it is useful to look at scenarios in which those networks potentially no longer deploy ROV.
Excellent point - this may indeed be a more interesting/realistic measurement. I must admit I just didn’t think of it. Stupid… Cecilia sent us a list, and although it’s just by email, we’ll use it to do an additional evaluation, Real Soon Now.
best, Amir