BCP38 on public-facing Ubuntu servers

Not every uplink service implements BCP38. When putting up servers connected more-or-less directly to the Internet through these uplinks, it would be nice if the servers themselves were able to implement ingress and egress filtering according to BCP38. (Sorry about the typo in the subject lines of my previous message -- not everyone can get a BGP feed.)

(Or, when using Ubuntu server edition to implement edge routers.)

My earlier query was asking if anyone has encoded the blackhole routes in YAML for inserting in netplan(5). My prior message contains the routes to be blackholed. That takes care of egress routing.

(I think I can write a Python program to take my list and convert it to the YAML that netplan(5) wants to see. That way, the routes are inserted when the public interface is up, and removed when the public interface is down.)
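The conversion described above might look roughly like this in Python. This is only a sketch: the prefix list is a small illustrative sample (RFC 1918 and RFC 5737 ranges), not the list from the earlier message; the function name is made up; and it assumes a netplan version that accepts `type: blackhole` on a route without a `via:` key, which is worth checking with `netplan try` before relying on it.

```python
import ipaddress

# Illustrative sample only; substitute the real list of prefixes to blackhole.
SAMPLE_PREFIXES = [
    "10.0.0.0/8",       # RFC 1918
    "172.16.0.0/12",    # RFC 1918
    "192.168.0.0/16",   # RFC 1918
    "192.0.2.0/24",     # RFC 5737 (documentation)
    "198.51.100.0/24",  # RFC 5737 (documentation)
    "203.0.113.0/24",   # RFC 5737 (documentation)
]


def netplan_blackhole_yaml(prefixes, interface="enp1s0"):
    """Return netplan YAML text with one blackhole route per prefix.

    The YAML is built by hand so no third-party YAML module is needed.
    Each prefix is validated first; a malformed prefix or one with host
    bits set raises ValueError.
    """
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {interface}:",
        "      routes:",
    ]
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix, strict=True)
        lines.append(f"        - to: {net}")
        lines.append("          type: blackhole")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(netplan_blackhole_yaml(SAMPLE_PREFIXES), end="")
```

The output could be redirected into a file under /etc/netplan/ and merged with the existing definition of the public interface.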

Ingress filtering appears to be a one-line addition. iptables can be told to weed out packets with unroutable source addresses. My experiments will add something like this line to the firewall:

# iptables -A INPUT -m addrtype -i enp1s0 --src-type BLACKHOLE -j DROP

THIS HAS NOT BEEN VERIFIED. I'm building a web server that will integrate this idea, and try it out.

Maybe you can explore the in-kernel feature called RP filter, or reverse path filter. On router gear it's called uRPF.

cat /proc/sys/net/ipv4/conf/default/rp_filter

There are 2 modes: Loose or strict.

If your server is BGP multi-homed, then you must use loose. Loose is still very powerful and useful.
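For what it's worth, the kernel's ip-sysctl documentation says the value actually applied to an interface is the larger of conf/all/rp_filter and conf/<interface>/rp_filter, so checking only the "default" entry can mislead. Below is a small Python sketch of reading that. The helper names are mine, and the optional root argument exists only so the reader function can be pointed somewhere other than the real /proc/sys.

```python
from pathlib import Path

# Meaning of the rp_filter values per the kernel ip-sysctl documentation.
MODES = {0: "off", 1: "strict", 2: "loose"}


def read_rp_filter(scope, root="/proc/sys"):
    """Read net.ipv4.conf.<scope>.rp_filter as an integer."""
    path = Path(root) / "net/ipv4/conf" / scope / "rp_filter"
    return int(path.read_text().strip())


def effective_mode(all_value, iface_value):
    """The kernel uses the maximum of the 'all' and per-interface values."""
    return MODES[max(all_value, iface_value)]


if __name__ == "__main__":
    iface = "enp1s0"  # interface name taken from the iptables line earlier
    try:
        mode = effective_mode(read_rp_filter("all"), read_rp_filter(iface))
        print(f"{iface}: rp_filter is effectively {mode}")
    except OSError as exc:
        print(f"could not read rp_filter: {exc}")
```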

Basically, RP does what a router does, but in the opposite direction. When a packet arrives on your server, the kernel checks the routing table for the destination next hop, and RP also checks whether the packet arrived on the correct source interface. If your routing is asymmetric or the source is spoofed, RP drops the packet.
It's a nice feature, but it does a second route lookup, so it's certainly slightly slower. I'm not sure we can say it's twice as slow, though.

I assume your network is not asymmetric, so RP would help you with ingress traffic. For egress, add blackhole routes to a /dev/null-style interface, or use the bogon scripts in Python. I wouldn't use iptables for that since it's purely routing, but there are many ways to achieve the same goal.

I recommend exploring rp_filter, as it might do what you're looking for.

As a side note, iptables is super slow when under attack and/or under heavy load.
There are a lot of limitations; for example, the kernel can only forward ~1.4 Mpps per CPU/socket with iptables. It's too slow in my opinion, and this was still true recently, but I can't confirm it for the latest 5.x kernel. It could have been fixed or improved.

Finally, can you share with us which provider doesn't filter BCP38 in their uplink? #JustCurious. :blush:

Jean

And by that he means: “only a few” =D.

Maybe you can explore the in-kernel feature called RP filter, or reverse path filter. On router gear it's called uRPF.

cat /proc/sys/net/ipv4/conf/default/rp_filter

+100 to rp_filter

There are 2 modes: Loose or strict.

If your server is BGP multi-homed, then you must use loose. Loose is still very powerful and useful.

I think loose mode with any default route will fail to do what you want. If you are running your router without a default route, then loose would probably be okay.

Basically, RP does what a router does, but in the opposite direction. When a packet arrives on your server, the kernel checks the routing table for the destination next hop, and RP also checks whether the packet arrived on the correct source interface.

For strict mode, the router allows the incoming packet if the incoming interface would be the outgoing interface when sending a packet to the incoming packet's source IP.

If your routing is asymmetric or the source is spoofed, RP drops the packet. It's a nice feature, but it does a second route lookup, so it's certainly slightly slower. I'm not sure we can say it's twice as slow, though.

I'm confident that it is at least somewhat slower. However ...

I have a lowly AMD E-350 APU (lscpu says it's at 918 MHz) processing multiple hundred Mbps on GPON against a full DFZ feed with no noticeable delay. (I've never felt the need nor desire to instrument it.)

As such, I'm confident that any system that would be used in a greenfield deployment will be able to *easily* handle the traffic that most servers will see.

I assume your network is not asymmetric, so RP would help you with ingress traffic. For egress, add blackhole routes to a /dev/null-style interface, or use the bogon scripts in Python. I wouldn't use iptables for that since it's purely routing, but there are many ways to achieve the same goal.

"unreachable" routes (in Linux parlance) or "null" routes (in Cisco parlance) combined with Reverse Path Filtering (RPF) is a HUGE win in my book.

I've expanded this methodology to federate Fail2Ban between multiple systems: EBGP via bird to trade Fail2Ban-specific tables between machines, and ip rule to make sure the Fail2Ban table is processed. Works great in my opinion.

I recommend exploring rp_filter, as it might do what you're looking for.

+100

As a side note, iptables is super slow when under attack and/or under heavy load. There are a lot of limitations; for example, the kernel can only forward ~1.4 Mpps per CPU/socket with iptables. It's too slow in my opinion, and this was still true recently, but I can't confirm it for the latest 5.x kernel. It could have been fixed or improved.

That may be the case. However, that's Apples (iptables) to walnuts (RPF). They are both food (processing packets), but they are significantly different.

rp_filter is great until your network is slightly less than a perfect
hierarchy. Then your Linux "router" starts mysteriously dropping
packets and, as with allow_local, Linux doesn't have any way to
generate logs about it so you end up with these mysteriously
unexplained packet discards matching no conceivable rule in
iptables... This failure has too often been the bane of my existence
when using Linux for advanced networking.

Regards,
Bill Herrin

I don't remember the particulars, but I thought that was the domain of log_martians (net.ipv4.conf.*.log_martians).

Without log_martians or explicitly looking for such, no, you won't get any indication of such drops.

Yes, enabling the log_martians sysctl will generate a kernel log
message for each rp_filter failure (subject to rate limiting). There
are also stat counters in /proc/net/stat/rt_cache (one line per CPU) for
in_martian_dst and in_martian_src which increment regardless of the
log_martians setting.
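A quick way to watch those counters from a script, sketched in Python. It assumes the usual layout of /proc/net/stat/rt_cache: a header line of field names followed by one line of hexadecimal values per CPU. The function name is mine.

```python
def martian_counters(text):
    """Sum in_martian_dst and in_martian_src across all CPU lines.

    `text` is the content of /proc/net/stat/rt_cache; values are assumed
    to be hexadecimal, one row per CPU, under a header row of field names.
    """
    lines = text.strip().splitlines()
    names = lines[0].split()
    totals = {"in_martian_dst": 0, "in_martian_src": 0}
    for line in lines[1:]:
        fields = dict(zip(names, line.split()))
        for key in totals:
            totals[key] += int(fields[key], 16)
    return totals


if __name__ == "__main__":
    try:
        with open("/proc/net/stat/rt_cache") as handle:
            print(martian_counters(handle.read()))
    except OSError as exc:
        print(f"could not read rt_cache: {exc}")
```

Running it twice a few seconds apart and comparing the totals shows whether martian drops are happening right now, even with log_martians off.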

  The rp_filter sysctl defaults to strict mode (== 1) on Ubuntu,
but can be set to loose mode (== 2); the difference is, essentially, in
strict mode the reverse path must be the same interface as the ingress
interface, whereas in loose mode the reverse path can be any interface
(as long as the source address is reachable).

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst
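To make the strict/loose difference concrete, here is a toy model in Python. It is not the kernel's algorithm, just the rule as described in this thread: look up the source address with a longest-prefix match, fail if there is no usable route, and in strict mode additionally require that the route points back out the ingress interface. The route table, the second interface name enp2s0, and the function names are all made up for illustration; a device of None stands for a blackhole/unreachable route.

```python
import ipaddress

# (prefix, device) pairs; device None models a blackhole/unreachable route.
ROUTES = [
    ("0.0.0.0/0", "enp1s0"),        # default route via the public interface
    ("192.168.50.0/24", "enp2s0"),  # hypothetical inside network
    ("10.0.0.0/8", None),           # blackholed bogon
]


def lookup(routes, address):
    """Longest-prefix match; returns (network, device) or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best


def rp_check(routes, src, ingress_dev, mode):
    """Return True if a packet from `src` arriving on `ingress_dev` passes.

    mode 0 = off, 1 = strict, 2 = loose (same numbering as rp_filter).
    """
    if mode == 0:
        return True
    match = lookup(routes, src)
    if match is None or match[1] is None:
        return False  # no route back, or the route back is a blackhole
    if mode == 2:
        return True   # loose: any usable route to the source is enough
    return match[1] == ingress_dev  # strict: must be the same interface


if __name__ == "__main__":
    for src in ("192.168.50.7", "10.1.2.3", "8.8.8.8"):
        for mode in (1, 2):
            print(src, "mode", mode, rp_check(ROUTES, src, "enp1s0", mode))
```

In this model, loose mode with a default route accepts any source that is not explicitly blackholed, which matches the concern raised earlier in the thread and shows why blackhole routes and RPF work well together.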

  -J

Hey,

To my knowledge there is no IPv6 equivalent for net.ipv4.conf.all.rp_filter.

Therefore I use netfilter to do the RP filtering for both address families.

ip(6)tables -t raw -I PREROUTING -m rpfilter --invert -j DROP

Using the raw table consumes fewer resources, but you could also choose other tables.
Details about rpfilter can be found here [1].

This can also be achieved using nftables [2].
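Since "ip(6)tables" above is shorthand for running the same rule under both tools, a provisioning script might generate the pair like this. This is a minimal Python sketch with a made-up function name; it only prints the commands rather than executing them, because actually inserting the rules needs root.

```python
import shlex

# The rule from the message above, shared by both address families.
RULE = ["-t", "raw", "-I", "PREROUTING", "-m", "rpfilter", "--invert", "-j", "DROP"]


def rpfilter_commands():
    """Return the argv lists for the IPv4 and IPv6 variants of the rule."""
    return [[tool, *RULE] for tool in ("iptables", "ip6tables")]


if __name__ == "__main__":
    for argv in rpfilter_commands():
        print(shlex.join(argv))
```

Each argv list could be passed to subprocess.run() when the script runs as root.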

Best

Fran

[1] Man page of iptables-extensions
[2] Matching routing information - nftables wiki

I've been in discussions on how to filter packets with bad source addresses on several mailing lists, including this one. For the last few weeks, I've been searching for all the information I can find on how Linux implements rp_filter, which appears to have some holes.

Looking at /proc/sys/net/ipv6, there is no knob for rp_filter, so if your system is IPv6 enabled you have to use the built-in firewall.

For IPv4, I found kernel documentation, but it doesn't tell the whole story. For that, I had to comb the kernel sources to find out all the details of rp_filter. I've prepared an RFC letter describing what I think I found, to be sent to the kernel developers. Here is the text of what I'll be sending, subject to any constructive criticism I get from here:

Letter begins:

After looking at the source that appears to implement rp_filter
     linux/net/ipv4/fib_frontend.c
I believe that I now understand the tests rp_filter performs to
validate the source address when net.ipv4.conf.*.rp_filter is
set to one or two for a given interface.

Does the new paragraph I have written accurately reflect what
happens? If so, then I will find out how to submit a patch to add the
clarification to the kernel document.

Description of rp_filter from
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Bingo!

With -t raw, you can bypass the 1.2 Mpps per-CPU/socket limitation in iptables, because it does a very early drop without traversing the full set of iptables kernel modules.

You can get close to wire speed with -t raw, compared to using the same iptables rule without -t raw.

Jean