Request comment: list of IPs to block outbound

The following list is what I'm thinking of using for blocking traffic
between an edge router acting as a firewall and an ISP/upstream. This
table is limited to address blocks only; TCP/UDP port filtering, and IP
protocol filtering, is a separate discussion. This is for an
implementation of BCP-38 recommendations.

I'm trying to decide whether the firewall should just blackhole these
addresses in the routing table, or use rules in NFTABLES against source
and destination addresses, or some combination. If NFTABLES, the best
place to put the blocks would be in the FORWARD chain, covering both
inbound and outbound traffic. (N.B. for endpoint boxes, they go into
the OUTPUT chain.)
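
For the routing-table option, a minimal sketch (prefixes here are just
examples taken from the table below):

  # Null-route selected prefixes; rp_filter can then also use these
  # routes to reject packets arriving with a matching source address.
  ip route add blackhole 192.0.2.0/24
  ip route add blackhole 198.51.100.0/24
  ip -6 route add blackhole 2001:db8::/32
  # "unreachable" instead of "blackhole" returns an ICMP error rather
  # than silently discarding.

Note that a plain blackhole route only matches on the destination
address; catching spoofed sources still needs rp_filter or an explicit
rule.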

In trying to research what would constitute "best practice", the papers
I found were outdated, potentially incomplete (particularly with
reference to IPv6), or geared toward other applications. This table
currently does not have exceptions -- some may need to be added as a
specific "allow" route or list.

The Linux rp_filter knob is effective for endpoint servers and
workstations, and I turn it on religiously (easy because it's the
default). For a firewall router without blackhole routes, it's less
effective because, for incoming packets, a source address matching one
of your inside netblocks will pass. A subset of the list would be
useful in endpoint boxes to relieve pressure on the upstream edge router
-- particularly if a ne'er-do-well successfully hijacks the endpoint box
to participate in a DDoS flood.
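
For reference, the knob in question, as a sysctl snippet (the
per-interface override and interface name are only examples):

  # 1 = strict reverse-path check, 2 = loose, 0 = off
  net.ipv4.conf.all.rp_filter = 1
  net.ipv4.conf.default.rp_filter = 1
  # Loose mode on an interface with asymmetric routing, if needed:
  # net.ipv4.conf.eth0.rp_filter = 2

(There is no equivalent IPv6 sysctl; on IPv6 a reverse-path check has
to be done in nftables, e.g. with its fib expression.)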

IPv4
Address block       Scope            Description
0.0.0.0/8           Software         Current network (only valid as a
                                     source address).
10.0.0.0/8          Private network  Used for local communications
                                     within a private network.
100.64.0.0/10       Private network  Shared address space for
                                     communications between a service
                                     provider and its subscribers when
                                     using a carrier-grade NAT.
127.0.0.0/8         Host             Used for loopback addresses to the
                                     local host.
169.254.0.0/16      Subnet           Used for link-local addresses
                                     between two hosts on a single link
                                     when no IP address is otherwise
                                     specified, such as would normally
                                     have been obtained from a DHCP
                                     server.
172.16.0.0/12       Private network  Used for local communications
                                     within a private network.
192.0.0.0/24        Private network  IETF Protocol Assignments.
192.0.2.0/24        Documentation    Assigned as TEST-NET-1 for
                                     documentation and examples.
192.88.99.0/24      Internet         Reserved; formerly used for the
                                     IPv6-to-IPv4 (6to4) relay.
192.168.0.0/16      Private network  Used for local communications
                                     within a private network.
198.18.0.0/15       Private network  Used for benchmark testing of
                                     inter-network communications
                                     between two separate subnets.
198.51.100.0/24     Documentation    Assigned as TEST-NET-2 for
                                     documentation and examples.
203.0.113.0/24      Documentation    Assigned as TEST-NET-3 for
                                     documentation and examples.
224.0.0.0/4         Internet         In use for IP multicast.
240.0.0.0/4         Internet         Reserved for future use.
255.255.255.255/32  Subnet           Reserved for the "limited
                                     broadcast" destination address.

IPv6
Address block       Usage            Purpose
::/0                Routing          Default route.
::/128              Software         Unspecified address.
::1/128             Host             Loopback address to the local host.
::ffff:0:0/96       Software         IPv4-mapped addresses.
::ffff:0:0:0/96     Software         IPv4-translated addresses.
64:ff9b::/96        Global Internet  IPv4/IPv6 translation.
100::/64            Routing          Discard prefix.
2001::/32           Global Internet  Teredo tunneling.
2001:20::/28        Software         ORCHIDv2.
2001:db8::/32       Documentation    Addresses used in documentation
                                     and example source code.
2002::/16           Global Internet  The 6to4 addressing scheme.
fc00::/7            Private network  Unique local addresses.
fe80::/10           Link             Link-local addresses.
ff00::/8            Global Internet  Multicast addresses.
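
If the NFTABLES route is chosen instead (or as well), one way to
express the tables above is a pair of named sets referenced from the
forward chain. This is a sketch only, not a drop-in ruleset; the IPv6
set is abbreviated, and whether entries such as Teredo (2001::/32) and
6to4 (2002::/16) belong in it at all is exactly the question raised
below.

  table inet bogons {
      set bogons_v4 {
          type ipv4_addr
          flags interval
          elements = { 0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10,
                       127.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12,
                       192.0.0.0/24, 192.0.2.0/24, 192.88.99.0/24,
                       192.168.0.0/16, 198.18.0.0/15, 198.51.100.0/24,
                       203.0.113.0/24, 224.0.0.0/4, 240.0.0.0/4 }
          # 255.255.255.255/32 is already covered by 240.0.0.0/4
      }
      set bogons_v6 {
          type ipv6_addr
          flags interval
          elements = { ::/128, ::1/128, ::ffff:0:0/96, 100::/64,
                       2001:db8::/32, fc00::/7, fe80::/10 }
      }
      chain forward {
          type filter hook forward priority 0; policy accept;
          ip  saddr @bogons_v4 drop
          ip  daddr @bogons_v4 drop
          ip6 saddr @bogons_v6 drop
          ip6 daddr @bogons_v6 drop
      }
  }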

Hi,

sorry - but why would you want to block Teredo / 6to4?

https://www.team-cymru.com/bogon-reference-http.html

The following list is what I’m thinking of using for blocking traffic
between an edge router acting as a firewall and an ISP/upstream. This
table is limited to address blocks only; TCP/UDP port filtering, and IP
protocol filtering, is a separate discussion. This is for an
implementation of BCP-38 recommendations.

BCP-38 as it applies to outbound traffic is more about blocking SOURCE IP addresses. You should block everything whose source IP address is not within your assigned address space.
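
In other words, for the outbound direction the simpler expression is an
allow-list of your own space rather than a deny-list of everyone
else's. A sketch in nftables terms, with documentation prefixes and a
made-up interface name standing in for the real allocation and uplink:

  define our_v4 = 192.0.2.0/24        # placeholder for your allocation
  define our_v6 = 2001:db8::/32       # placeholder for your allocation

  table inet bcp38 {
      chain forward {
          type filter hook forward priority 0; policy accept;
          # Anything heading out to the ISP that is not sourced from
          # our own space is dropped (BCP-38 egress filtering).
          oifname "wan0" ip  saddr != $our_v4 drop
          oifname "wan0" ip6 saddr != $our_v6 drop
      }
  }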

100.64.0.0/10 Private network Shared address space[3] for
communications between a service
provider and its subscribers
when using a carrier-grade NAT.

This space is set aside for your ISP to use, like RFC 1918 but for ISPs. It is not specifically CGNAT. Unless you are an ISP using this space, you should not block destinations in it.

224.0.0.0/4 Internet In use for IP multicast.
240.0.0.0/4 Internet Reserved for future use.
255.255.255.255/32 Subnet Reserved for the “limited
broadcast” destination address.

This can be covered with a single rule: 224.0.0.0/3
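
That is, 224.0.0.0/3 spans 224.0.0.0 through 255.255.255.255, so in nft
terms those three table entries collapse to something like:

  # multicast + class E + limited broadcast in one interval
  ip daddr 224.0.0.0/3 drop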

IPv6
Address block Usage Purpose
::/0 Routing Default route.

The current IPv6 Internet is 2000::/3, not ::/0 and that won’t change in the foreseeable future. You can tighten your filter to allow just that.
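
A sketch of that allow-list form, again as forward-chain rules (note
this would also drop ULA traffic, so add exceptions if fc00::/7 is
routed internally):

  # Only global unicast (2000::/3) is expected on the Internet today.
  ip6 saddr != 2000::/3 drop
  ip6 daddr != 2000::/3 drop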

Regards,
Bill Herrin

Hi,

Only do this if this isn't a CLI-jockey network, now or in the future.

Hi,

sorry - but why would you want to block Teredo?

I know nothing about Teredo tunneling.

In computer networking, Teredo is a transition technology that gives
full IPv6 connectivity for IPv6-capable hosts that are on the IPv4
Internet but have no native connection to an IPv6 network. Unlike
similar protocols such as 6to4, it can perform its function even from
behind network address translation (NAT) devices such as home routers.

Teredo operates using a platform independent tunneling protocol that provides IPv6 (Internet Protocol version 6) connectivity by encapsulating IPv6 datagram packets within IPv4 User Datagram Protocol (UDP) packets. Teredo routes these datagrams on the IPv4 Internet and through NAT devices. Teredo nodes elsewhere on the IPv6 network (called Teredo relays) receive the packets, un-encapsulate them, and pass them on.

Are you saying that Teredo should come off the list? Is this useful
between an ISP and an edge firewall fronting an internal network? Would
I see inbound packets with a source address in the 2001::/32 netblock?

sorry - but why would you want to block 6to4?

In my research, this is marked as deprecated. Would I see packets with
a source address in the 2002::/16 netblock?

The Linux rp_filter knob is effective for endpoint servers and workstations, and I turn it on religiously (easy because it's the default).

I think it's just as effective on routers as it is on servers and workstations.

For a firewall router without blackhole routes, it's less effective because, for incoming packets, a source address matching one of your inside netblocks will pass.

I'm not following that statement. Is incoming a reference to packets from the Internet to your LAN? Or is incoming a reference to packets coming in any interface, thus possibly including from your LAN to the Internet?

Even without blackhole (reject) routes, a packet from the Internet spoofing a LAN IP will be rejected by rp_filter because it's coming in an interface that is not an outgoing interface for the purported source IP.

A subset of the list would be useful in endpoint boxes to relieve pressure on the upstream edge router -- particularly if a ne'er-do-well successfully hijacks the endpoint box to participate in a DDoS flood.

rp_filter will filter packets coming in from the internal endpoint that's been compromised if the packets spoof a source from anywhere but the local LAN. (No comment about spoofing different LAN IPs.)

I've been exceedingly happy with rp_filter and blackhole (reject) routes.

I've taken this to another level where I have multiple routing tables and rules that cascade across tables. One of the later rules is a routing table for any and all bogons & RFC 3330. I am still able to access specific networks that fall into RFC 3330 on internal lab networks without a problem because those prefixes are found in routing tables that are searched before the bogon table that black holes (rejects) the packets. IMHO it works great. (I really should do a write up of that.)
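
A minimal sketch of that cascade, with made-up table numbers, prefixes
and device names: routes for the "real" lab uses of special-use space
go in an early table, the bogon blackholes in a later one, and anything
matching neither falls through to the normal main table:

  # Lab prefixes that legitimately use special-use space (pref 100).
  ip route add 10.20.0.0/16 dev lab0 table 100
  ip rule add pref 100 lookup 100
  # Bogon table (pref 200) - only consulted if nothing earlier matched;
  # the main table keeps its default preference (32766) after this.
  ip route add blackhole 10.0.0.0/8     table 200
  ip route add blackhole 192.168.0.0/16 table 200
  ip route add blackhole 198.18.0.0/15  table 200
  ip rule add pref 200 lookup 200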

I think you should seriously re-consider using rp_filter on a router.

Are you saying that Teredo should come off the list? Is this useful
between an ISP and an edge firewall fronting an internal network? Would
I see inbound packets with a source address in the 2001::/32 netblock?

If you are running services which are "generally available to the public", you can absolutely expect to see these. Anyone stuck behind an IPv6-hostile NAT44 is likely to end up using Teredo as the "transition mechanism of last resort". It usually works, albeit with poor performance, in almost all situations unless the IPv6-hostile network has actively blocked it in their IPv4 ruleset.

I personally use Teredo somewhat frequently. Yes, I could set up a similar tunneling mechanism to a network I control and get "production" addressing and probably better quality of service, but Teredo is as simple as "apt-get install miredo". It's also available on stock Windows albeit (I think) disabled by default.

If your network only talks to specific, known destinations, then it's up to you. Your network; your rules. It's certainly unlikely you'll ever see any publicly accessible services of consequence being hosted in 2001::/32 if only because the addressing tends to be somewhat transient and NAT hole punching unreliable for inbound, unsolicited data.

In my research, this is marked as deprecated. Would I see packets with
a source address in the 2002::/16 netblock?

In theory, this is just as legitimate as Teredo. In practice, it is indeed deprecated, and almost anyone who can set up 6to4 can get a "production" tunnel to someone like HE.net or likely has 6rd available from their native IPv4 provider. It can also be tricky to prevent reflection type attacks using 6to4 address space.

IIRC, Windows used to set up 6to4 by default if it found it had what it believed to be publicly routable IPv4 connectivity, but I think this may now be disabled. Some consumer routers did the same. It was handy because you got a full /48 allowing non-NAT addressing of subtended networks and even prefix delegation if you wanted it.

While this probably falls under the same justifications as the above, in practice I'd say 6to4 is probably all but dead in terms of legitimate uses on the public Internet of today. I haven't personally run 6to4 in over a decade.

6to4 was a neat idea, but I think it's dead, Jim.

rp_filter is one of the most expensive features in modern routers; you
should only use it if PPS performance is not important. If PPS
performance is important, an ACL is much faster. ACLs are also
applicable to more scenarios, such as BGP customers.

How much performance impact should we expect with uRPF?

Thanks.

Depends on the platform, but often it's a 2nd lookup, so potentially a
50% decrease in performance. On some platforms it means FIB
duplication. And ultimately it doesn't really offer anything over an
ACL, which is, in comparison, a much cheaper feature.
I would encourage people to toolise this; then the ACL generation adds
no cost or complexity. And you can use an ACL for many BGP customers
too: as you create the 'perfect' prefix-list for a customer, you can
reference that same prefix-list in the ACL, without actually needing
the customer to announce the prefix, as it's entirely valid to
originate traffic from an allowable prefix without advertising the
prefix (to you).
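
One way to read "toolise this": generate both artifacts from a single
source of truth so the BGP filter and the ACL cannot drift apart. The
sketch below is purely illustrative (file names and output syntax are
made up; in practice the prefix data would come from the IRR, e.g. via
a tool such as bgpq4):

  #!/bin/sh
  # Render a per-customer BGP prefix-list and a matching ingress ACL
  # from the same prefix file (one prefix per line).
  customer="$1"
  src="prefixes-${customer}.txt"
  {
      echo "prefix-list ${customer}-in:"
      sed 's/^/  permit /' "${src}"
  } > "${customer}-prefix-list.cfg"
  {
      echo "acl ${customer}-ingress:"
      sed 's/^/  permit source /' "${src}"
  } > "${customer}-acl.cfg"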

Hello!

> How much performance impact should we expect with uRPF?

Depends on the platform, but often it's a 2nd lookup, so potentially a
50% decrease in performance. On some platforms it means FIB
duplication. And ultimately it doesn't really offer anything over an
ACL, which is, in comparison, a much cheaper feature.
I would encourage people to toolise this; then the ACL generation adds
no cost or complexity. And you can use an ACL for many BGP customers
too: as you create the 'perfect' prefix-list for a customer, you can
reference that same prefix-list in the ACL, without actually needing
the customer to announce the prefix, as it's entirely valid to
originate traffic from an allowable prefix without advertising the
prefix (to you).

This has the potential to break things, because it requires symmetry
and perfect IRR accuracy. Just because the prefix would be rejected by
BGP does not mean there is not a legitimate announcement for it in the
DFZ (which is the exact difference between uRPF loose mode and the ACL
approach).

For BGP customers where I control the announced IP space (it's mine,
the customer has a private ASN and the only reason for BGP is so he
can multi-home to different nodes of my network), sure.

For real "IP Transit" where the customer may itself have multiple
downstream ASNs, there is no guarantee that everyone in the chain will
update the IRR records 24 - 48 hours before actually sourcing traffic
from a new prefix (or enabling that new downstream as-path). Some
other transit may just allow prefixes "manually" (for example, because
of LOAs or inetnum objects, as opposed to route objects), so *a valid
announcement is in the DFZ*, you are just not accepting it on your
customer's BGP session.

In fact, maybe my downstream customer just wants to send traffic to my
network, but not receive any, so I don't actually have to include that
customer in my AS-macro (an exotic use-case for sure, just trying to
point out that there will always be asymmetry).

Routing, BGP and the IRR data are asymmetric by definition and neither
real-time nor 100% accurate. That's not a problem for BGP and strict
ingress prefix-lists, but it is a problem for ingress ACLing, because
the latter effectively blackholes traffic, while uRPF loose mode does
not (if there is an announcement for it in the DFZ).

So I don't think ACLs can replace uRPF loose mode in the DFZ, and
frankly I find this proposal to be a bit dangerous.

If my transit provider did this without telling me, and I turned up a
new transit customer with an incomplete IRR record, causing an
immediate partial outage for them, I would be *very* surprised (along
with some other emotions).

cheers,
lukas

It's also interesting to think about when is a good time to break things.

CustomerA buys transit from ProviderB and ProviderA.

CustomerA gets a new prefix, but does not appropriately register it.

ProviderB doesn't filter anything, so it works. ProviderA does filter
and does not accept this new prefix. Neither provider has an ACL.

Some time passes, and the ProviderB connection goes down; the new
prefix, which is by now an old prefix, experiences a total outage.
CustomerA is not happy.

Would it have been better if ProviderA had ACLd the traffic from
CustomerA, forcing the problem to be evident while the prefix is young
and not in production? Or was it better that it broke later on?

Hello,

It's also interesting to think about when is a good time to break things.

CustomerA buys transit from ProviderB and ProviderA.

CustomerA gets a new prefix, but does not appropriately register it.

ProviderB doesn't filter anything, so it works. ProviderA does filter
and does not accept this new prefix. Neither provider has an ACL.

Some time passes, and the ProviderB connection goes down; the new
prefix, which is by now an old prefix, experiences a total outage.
CustomerA is not happy.

Would it have been better if ProviderA had ACLd the traffic from
CustomerA, forcing the problem to be evident while the prefix is young
and not in production? Or was it better that it broke later on?

That's an orthogonal problem and its solution hopefully doesn't
require a traffic-impacting ingress ACL.

I'm saying this breaks valid configurations because even with textbook
IRR registrations there is a race condition between the IRR
registration (not a route-object, but a new AS in the AS-MACRO), the
ACL update and the BGP turn-up of a new customer (on AS further down).

Here's the environment for the examples below:

Customer C1 uses existing transits Provider P11 and P12 (meaning C1 is
actually a production network; dropping traffic sourced by it in the
DFZ is very bad; P11 and P12 are otherwise irrelevant).
Customer C1 is about to turn up a BGP session to Provider P13.
Provider P13 is a Tier2 and buys transit from Tier1 Providers P1 and P2.
Provider P2 deploys ingress ACLs built from IRR data, based on P13's AS-MACRO.

Example 1:

P13's AS-MACRO is updated last-minute because:

- provisioning was last minute, OR
- provisioning was wrong initially, OR
- it's an emergency turn-up.

Whatever the case, IRR records are corrected only 60 minutes before the
turn-up, and C1 is aware that traffic towards C1 will completely
converge only after an additional 24 hours (but that's accepted,
because $reasons; maybe C1 just needs TX bandwidth - in a hypothetical
emergency turn-up, for example).

At the turn-up of C1_P13, traffic with as-path C1_P13_P2 is dropped,
because the ingress ACL at P2 wasn't updated yet (it is updated only
once every night). P13 expected the prefixes not to be accepted at P2
on the BGP session, but would never have imagined that traffic sourced
from valid prefixes present in the DFZ would be dropped.

Example 2:

Just as in example 1, C1 turns up BGP with P13, but the provisioning
was "normal". P13's AS-MACRO was updated correctly 36 hours before the
turn-up.

However, at P2 the nightly cronjob for IRR updates (prefix-lists and
ACL ingress filters) failed. It is monitored and a ticket about the
failing cronjob was raised, however they either:

- did not recognize the severity, because "worst-case some new
prefixes are not allowed in ingress tomorrow"
- were unable to fix it in just a few hours
- did fix it, but did not trigger a subsequent full rerun ("it will
run next time", or "it could not complete anyway before the next run")
- maybe the node was actually just unreachable for a regular
maintenance, so automation could not connect this time around
- or maybe automation just couldn't connect to the $node, because
someone regenerated the SSH key by mistake this morning

Whatever the case, the point is: due to internal problems at P2, the
ACL wasn't updated during the night like it usually is. And at the
turn-up of C1_P13, C1_P13_P2 traffic is again dropped on the floor.

When you reject a BGP prefix, you don't blackhole traffic; with an
ingress ACL you do. That is a big difference, and because of it, you
*but more importantly every single downstream ASN* need to account for
race conditions and failures in the entire process, including their
immediate resolution, which is not required for BGP strict
prefix-lists and uRPF loose mode.

Is this deployed like this in a production transit network? How does
this network handle a failure like in example 2? How do its
downstream customers handle race conditions like in example 1?

For the record: I'm imagining myself operating P13 getting blamed in
both examples for partially blackholing C1's traffic at the turn-up.

Thanks,
Lukas

This has the potential to break things, because it requires symmetry
and perfect IRR accuracy. Just because the prefix would be rejected by
BGP does not mean there is not a legitimate announcement for it in the
DFZ (which is the exact difference between uRPF loose mode and the ACL
approach).

It's also interesting to think about when is a good time to break things.

CustomerA buys transit from ProviderB and ProviderA.

CustomerA gets a new prefix, but does not appropriately register it.

ProviderB doesn't filter anything, so it works. ProviderA does filter
and does not accept this new prefix. Neither provider has an ACL.

Some time passes, and the ProviderB connection goes down; the new
prefix, which is by now an old prefix, experiences a total outage.
CustomerA is not happy.

Would it have been better if ProviderA had ACLd the traffic from
CustomerA, forcing the problem to be evident while the prefix is young
and not in production? Or was it better that it broke later on?

Having been through this exact situation recently (made worse by the fact that it was caused by Provider B's upstreams not having updated their filters, and not Provider B itself), I would suggest it's 100 times better for it to happen right at the start rather than randomly down the track.

Hey Lukas,

I'm saying this breaks valid configurations because even with textbook
IRR registrations there is a race condition between the IRR
registration (not a route-object, but a new AS in the AS-MACRO), the
ACL update and the BGP turn-up of a new customer (on AS further down).

I'm not proposing an answer, I'm asking a question.
Could it be that the utter disinterest in working BGP filters is a
consequence of it not actually mattering in turn-ups in the typical
case?
And would the examples be the same if we were not so disinterested in
having proper BGP filters in place?
If in the common case we did ACLs, would we evolve different
mechanisms to ensure correctness of filtering before the fact? Perhaps
a common API to query the state of filters in provider networks?
Perhaps a maintenance window to turn up new transit, with the option
to fall back immediately and complain about their configurations?

Is this deployed like this in a production transit network? How does
this network handle a failure like in example 2? How do its
downstream customers handle race conditions like in example 1?

Yes, I've run BGP prefix-list == firewall filter (the same prefix-list
referred to verbatim in BGP and in the firewall) for all transit
customers in one network for over a decade. Few problems were had, and
the majority of customers were happy after the logic behind it was
explained to them. But this was a Tier2 in Europe, and data quality in
Europe is high compared to other markets, so it doesn't say much about
the global state of affairs. I would not feel comfortable doing
something like this in a Tier1 for US+Asia markets.
But there is also no particular reason why we couldn't get there. If
we as a community decided it is what we want, it would fix not just
unexpected BGP filter outages but also several DoS and security
issues, by killing spoofing. It would give us an incentive to do BGP
filtering properly.

Hello,

> Is this deployed like this in a production transit network? How does
> this network handle a failure like in example 2? How do its
> downstream customers handle race conditions like in example 1?

Yes, I've run BGP prefix-list == firewall filter (the same prefix-list
referred to verbatim in BGP and in the firewall) for all transit
customers in one network for over a decade. Few problems were had, and
the majority of customers were happy after the logic behind it was
explained to them. But this was a Tier2 in Europe, and data quality in
Europe is high compared to other markets, so it doesn't say much about
the global state of affairs. I would not feel comfortable doing
something like this in a Tier1 for US+Asia markets.

Ok, that is a very different message than what I interpreted from your
initial post about this: just enable it, it's free, nothing will
happen and your customers won't notice.

But there is also no particular reason why we couldn't get there. If
we as a community decided it is what we want, it would fix not just
unexpected BGP filter outages but also several DoS and security
issues, by killing spoofing. It would give us an incentive to do BGP
filtering properly.

I agree this is something that should be discussed, but to get there
is probably a very long road. Just look at the sorry state of BGP
filtering itself. And this requires even more precision, automation,
carefulness and *process changes*.

I just want to emphasize that when I buy IP Transit and my provider
does this *without telling me beforehand*, I will be very surprised
and very unhappy (as I'm probably discovering this configuration
because of a partial outage).

Lukas

BGP is broken because it can be; if it could not be, it would not be.
This would make BGP filters a market-driven fact instead of a nice
thing some nerds care about. The transition would invariably cause
some gray hairs, but the Internet is robust against technical and
non-technical problems.

I still think that ACL rules should go hand in hand with eBGP prefixes by default.
But the ACLs should be updated automatically, based on the advertised and accepted eBGP prefixes (so not dependent on external data).
If IRR data accuracy and AS-MACROs get solved, the filtering problem would be solved as well.
If such a mechanism were enabled by default in all vendors' implementations, it would address the double-lookup problem of uRPF while accomplishing the same thing, and even address the source-IP spoofing problem.
3 simple rules:

Rule 1)
If you are advertising a prefix,
then allow it as a source prefix in your egress ACL
and allow it as a destination prefix in your ingress ACL.
(Because why do you advertise a prefix? Well, you expect to send traffic sourced from IPs covered by that prefix, and you expect to get a response back, right?)
As a result:
Traffic sourced from IPs you haven't advertised via a particular link would be blocked at egress from your AS (on that link) - boundary A1.
Traffic destined to IPs you haven't advertised via a particular link would be blocked at ingress to your AS (on that link).

Rule 2)
If you are accepting a prefix,
then allow it as a source in your ingress ACL
and allow it as a destination in your egress ACL.
(Because why do you accept a prefix? Well, you expect to send traffic towards IPs covered by that prefix, and you'd want those IPs to be able to respond back, right?)
As a result:
Traffic sourced from IPs you haven't accepted via a particular link would be blocked at ingress to your AS (on that link) - boundary A2.
Traffic destined to IPs you haven't accepted via a particular link would be blocked at egress from your AS (on that link) - required because there's already an egress ACL blocking everything.

Rule 3)
If the interface can't be uniquely identified based on the IPs used for the eBGP session, warn the operator about the condition.

The obvious drawback, especially for TCAM-based systems, is scale: not only would we need to worry whether our FIB can hold 800k prefixes, but also whether the filter memory can hold the same amount - in addition to whatever additional filtering we're doing at the edge (comb filters for DoS protection, etc.).
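
This is of course about hardware ACLs on transit routers rather than
Linux, but one reading of rules 1 and 2 combined, per eBGP link, can be
sketched in nftables terms (illustrative only; the @advertised and
@accepted sets stand for prefix sets regenerated from that session's
BGP state whenever policy changes, the interface name is made up, and
only IPv4 is shown):

  table inet transit_acl {
      set advertised { type ipv4_addr; flags interval; }  # filled from BGP state
      set accepted   { type ipv4_addr; flags interval; }  # filled from BGP state

      chain fwd {
          type filter hook forward priority 0; policy accept;
          # Ingress direction (from the eBGP neighbour):
          iifname "transit0" ip saddr @accepted ip daddr @advertised accept
          iifname "transit0" drop
          # Egress direction (towards the eBGP neighbour):
          oifname "transit0" ip saddr @advertised ip daddr @accepted accept
          oifname "transit0" drop
      }
  }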

adam