RIPE NCC publishes case study of youtube.com hijack

Tom_Quilling · February 29, 2008, 2:04pm

for those interested in the matter

tom

David_A_Ulevitch · February 29, 2008, 2:46pm

The report states:

Sunday, 24 February 2008, 20:07 (UTC): AS36561 (YouTube) starts announcing 208.65.153.0/24. With two identical prefixes in the routing system, BGP policy rules, such as preferring the shortest AS path, determine which route is chosen. This means that AS17557 (Pakistan Telecom) continues to attract some of YouTube's traffic.

It's worth noting that from where I sit, it appears as though none of YouTube's transit providers accepted this announcement. Only their peers.

The point is -- Restrictive customer filtering can also bite you in the butt. Trying to require your providers to do a "ge 19 le 25" (or whatever your largest supernet is), rather than filters for specific prefix sizes seems a worthwhile endeavor so you can de-aggregate on the fly, as necessary.
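To make the "ge 19 le 25" idea concrete, here is a minimal sketch of how such a prefix-list entry matches, written in Python rather than router configuration. The function name `permits` and the 198.18.0.0/19 block are hypothetical and used only for illustration; real router syntax varies by vendor.

```python
from ipaddress import ip_network

def permits(entry, ge, le, announced):
    """Return True if `announced` lies inside `entry` and its prefix
    length is between `ge` and `le` inclusive (prefix-list style match)."""
    covering = ip_network(entry)
    route = ip_network(announced)
    return route.subnet_of(covering) and ge <= route.prefixlen <= le

# Hypothetical customer aggregate, filtered as "ge 19 le 25".
print(permits("198.18.0.0/19", 19, 25, "198.18.4.0/24"))    # de-aggregate allowed
print(permits("198.18.0.0/19", 19, 19, "198.18.4.0/24"))    # exact-match filter blocks it
```

With the range form the customer can announce a more specific on short notice; with the exact-match form the same announcement is dropped until the provider updates the filter.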

-David

Danny_McPherson4 · February 29, 2008, 5:32pm

The report states:

Sunday, 24 February 2008, 20:07 (UTC): AS36561 (YouTube) starts announcing 208.65.153.0/24. With two identical prefixes in the routing system, BGP policy rules, such as preferring the shortest AS path, determine which route is chosen. This means that AS17557 (Pakistan Telecom) continues to attract some of YouTube's traffic.

It's worth noting that from where I sit, it appears as though none of YouTube's transit providers accepted this announcement. Only their peers.

A simple artifact of shortest AS path route selection. In addition, many
providers employ policies that apply preference for prefixes learned from
customers over those learned from peers, assuming they're of the same
length.
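The selection order described here (longest match first, then customer-over-peer preference, then shortest AS path) can be sketched roughly as follows. This is a simplified model, not a full BGP decision process: the `Route` class, the local-pref values (200 for customer, 100 for peer), the transit AS 64500, and the /22 aggregate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class Route:
    prefix: str       # e.g. "208.65.153.0/24"
    as_path: tuple    # AS numbers, nearest neighbour first, origin last
    local_pref: int   # assumed policy: 200 = learned from customer, 100 = from peer

def best_route(routes, destination):
    """Pick the route used to forward `destination`, or None if nothing covers it."""
    dest = ip_address(destination)
    candidates = [r for r in routes if dest in ip_network(r.prefix)]
    if not candidates:
        return None
    # Longest prefix wins outright; local-pref and AS-path length only
    # break ties between routes for the same prefix length.
    return max(
        candidates,
        key=lambda r: (ip_network(r.prefix).prefixlen, r.local_pref, -len(r.as_path)),
    )

routes = [
    Route("208.65.152.0/22", (36561,), 200),        # assumed aggregate, via customer
    Route("208.65.153.0/24", (64500, 17557), 100),  # hijacked /24, via a peer
]
print(best_route(routes, "208.65.153.238"))  # the /24 wins despite lower local-pref
```

Once the legitimate origin announces the same /24, the tie-breakers apply, which is why the outcome then depends on where each network learns the two /24s.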

Had those same providers explicitly not accepted the /24 announcement
from AS 17557 via their peers you wouldn't have been affected at all.

The point is -- Restrictive customer filtering can also bite you in the butt. Trying to require your providers to do a "ge 19 le 25" (or whatever your largest supernet is), rather than filters for specific prefix sizes seems a worthwhile endeavor so you can de-aggregate on the fly, as necessary.

Deaggregation in order to mitigate less specific route hijacking is a hack
that in most cases only half fixes the problem, if that. If providers didn't
have those policies in place it'd be /32s that were being hijacked and
route table growth and churn would be far worse than it already is.

You prevent this by ubiquitous deployment of explicit customer and inter-provider
prefix filters; you don't open things up more so that when problems
occur, folks can try to hack around them.

-danny

Jeff_Aitken · February 29, 2008, 6:14pm

If you support community-based blackholes, your customers want/need to be
able to advertise up to /32. At a previous job we defined customer
prefix-filters as "prefix/mask upto 32" and then applied a reasonable
max-prefix setting[1]. This allowed customers to send us a reasonable
number of deaggregates for blackholing or TE purposes but protected us
from a full-on leak/deaggregation event. Needless to say, each prefix
with a mask longer than /24 was tagged with no-export as well, so those
longer prefixes weren't propagated beyond our network.

[1] We had a limited number of customer buckets... IIRC something like 2500,
5000, 15000, and 25000. That keeps the number of different configurations
to a minimum number but still gives adequate protection.
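A rough sketch of the policy described above, in Python rather than router configuration: accept anything inside the customer's registered space up to /32, tag prefixes longer than /24 with no-export, and give up once a max-prefix limit is exceeded. The function name, the 198.18.0.0/19 block, and the use of an exception to stand in for a session teardown are assumptions for illustration.

```python
from ipaddress import ip_network

NO_EXPORT = "no-export"

def process_customer_routes(registered, announced, max_prefix):
    """Return {prefix: communities} for accepted routes.

    Raises RuntimeError when more than `max_prefix` routes are accepted,
    standing in for the session being shut down by a max-prefix limit.
    """
    registered_nets = [ip_network(p) for p in registered]
    accepted = {}
    for prefix in announced:
        route = ip_network(prefix)
        if not any(route.subnet_of(net) for net in registered_nets):
            continue  # outside the customer's registered space: drop
        # Anything longer than /24 is kept inside the provider's network.
        accepted[prefix] = {NO_EXPORT} if route.prefixlen > 24 else set()
        if len(accepted) > max_prefix:
            raise RuntimeError("max-prefix exceeded")
    return accepted

print(process_customer_routes(
    ["198.18.0.0/19"],
    ["198.18.0.0/19", "198.18.4.7/32", "192.0.2.0/24"],
    max_prefix=2500,
))
```

The /32 here would typically be a blackhole request; it is accepted but never leaves the provider, while the unrelated 192.0.2.0/24 is dropped.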

--Jeff

David_A_Ulevitch · February 29, 2008, 6:49pm

Danny McPherson wrote:

It's worth noting that from where I sit, it appears as though none of YouTube's transit providers accepted this announcement. Only their peers.

A simple artifact of shortest AS path route selection.

Well, we (youtube and opendns) share some common transit providers -- and so I had expected to see all announcements from one customer to another customer directly downstream from the provider. But you very well could be right.

Had those same providers explicitly not accepted the /24 announcement
from AS 17557 via their peers you wouldn't have been affected at all.

Of course... In fact, wouldn't even providers benefit from having some logic that says "don't ever accept a more specific of a customer-announced prefix"?

Customers might not like that though...

You prevent this by ubiquitous deployment of explicit customer and inter-provider
prefix filters; you don't open things up more so that when problems occur, folks can try to hack around them.

Like most things, ymmv.

-David

Danny_McPherson4 · February 29, 2008, 7:35pm

Of course... In fact, wouldn't even providers benefit from having some logic that says "don't ever accept a more specific of a customer-announced prefix"?

Sure, that'd suck less, I guess, although then you have to punch
holes for multi-homed customers, etc., which is actually trivial if
policy is generated automatically based on RPSL or the like and
the policy is registered accordingly. But I still prefer explicit route
policy where what an AS is permitted to announce is all that's
permitted and any other prefixes are discarded. Any policy that
allows lots of more specifics to be announced in case of route
hijacking to me is like putting a band-aid on a headache.
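For illustration, the "never accept a more specific of a customer-announced prefix" rule, with holes punched for multi-homed customers, might look roughly like this. The function name and the `exceptions` parameter are hypothetical, and the /22 aggregate is an assumed example value.

```python
from ipaddress import ip_network

def accept_from_peer(announced, customer_prefixes, exceptions=()):
    """Decide whether a route learned from a peer should be accepted.

    Rejects anything equal to or more specific than a customer prefix,
    unless it is listed in `exceptions` (e.g. a multi-homed customer's
    prefix that may legitimately arrive via another provider).
    """
    route = ip_network(announced)
    if any(route == ip_network(e) for e in exceptions):
        return True
    return not any(route.subnet_of(ip_network(c)) for c in customer_prefixes)

customers = ["208.65.152.0/22"]  # assumed customer aggregate
print(accept_from_peer("208.65.153.0/24", customers))   # more specific via peer: rejected
print(accept_from_peer("208.65.153.0/24", customers,
                       exceptions=["208.65.153.0/24"]))  # registered hole: accepted
```

The exception list is the part that has to be maintained per customer, which is why generating it from registered policy matters.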

Customers might not like that though...

Right, it breaks fail-over with multi-homing, in particular. As far
as other more specifics for TE and the like, well, they're certainly
welcome to register those prefixes such that they can be reflected
in routing policies, but this would also ease announcements of
unintentional more-specifics.

I don't consider this one of those 'YMMV' things. Today, if
providers explicitly filter at all they filter customer routes based
on some IRR data or other internal database. They may put a
few safety nets in place for bogon prefixes and certain prefix
length policies or ASNs, or perhaps not accept their own aggregate
or more specifics from peers.

However, they accept everything else from peers, which means
tomorrow, when this happens again, all they can do is get pissed
because some monkey on the other side of the world fat-fingered
a 2 instead of a 3, or forgot to attach a no_advertise, no_export
or other explicit non-transit community to a blackhole route... and
now some other site "that presumably matters" is offline, or half
reachable, or whatever...

Further, we can keep experiencing more extraneous route table
bloat because of folks advertising more specifics of their own
aggregates in order to minimize any impact a potential hijacking
might have on their own space...

Or, we could start implementing explicit inter-provider filtering.

Explicit policy on all inter-domain peers, customer or provider, based
on RIR allocations, IRR objects and RPSLish language, and work on
removing deployment barriers (e.g., stale IRR data, allocation
authentication, IRR update vulnerabilities, router configuration scale
and load issues, TTM for newly announced prefixes, etc.), with
real deployment likely in an incremental bi-lateral manner between
ISPs that employ IRR data for customer route policy today and already
have tools to manage and deploy new policy.
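As a minimal sketch of what "explicit policy based on IRR objects" could mean in practice, the following builds an exact-match allow-list from RPSL-style route objects and checks announcements against it. It assumes objects are separated by blank lines and only reads the `route:` and `origin:` attributes; the function names and sample data are hypothetical, and real tooling handles far more (route6, as-sets, authentication, stale data).

```python
from collections import defaultdict
from ipaddress import ip_network

def build_allow_list(rpsl_text):
    """Map each origin AS to the set of prefixes registered for it."""
    allowed = defaultdict(set)
    for block in rpsl_text.strip().split("\n\n"):
        attrs = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                attrs.setdefault(key.strip().lower(), value.strip())
        if "route" in attrs and "origin" in attrs:
            allowed[attrs["origin"].upper()].add(ip_network(attrs["route"]))
    return allowed

def permitted(allowed, origin_as, announced):
    """Accept only prefixes registered for exactly this origin AS."""
    return ip_network(announced) in allowed.get(origin_as.upper(), set())

sample = """route: 208.65.152.0/22
origin: AS36561

route: 198.18.0.0/19
origin: AS64500
"""
allow = build_allow_list(sample)
print(permitted(allow, "AS36561", "208.65.152.0/22"))  # registered: accepted
print(permitted(allow, "AS17557", "208.65.153.0/24"))  # not registered: discarded
```

Under this model an unregistered more specific is discarded regardless of who announces it, so a legitimate de-aggregate has to be registered first.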

I challenge providers to step up here, the onus is on you and nothing
else is going to make this problem go away. There's tangible
incremental benefit to any provider that institutes such a policy, and
by its very nature, the right ISPs will encourage other sites on the
Internet to begin employing IRRs and similar mechanisms, if for no
reason other than to enable propagation of their own legitimate routes
more quickly.

-danny