The Making of a Router

sten_rulz · December 28, 2013, 7:09am

Hello Baldur,

Your design of running proxy ARP on every VLAN might hit some issues. If
you look through the NANOG archives you will find people who had problems
running proxy ARP across a large number of VLANs. What is your requirement
for proxy ARP? Doing something at the access switch, such as PVLAN or the
Brocade "ip follow ve" statement, will most likely work better for you. If
you are planning to put clients on the same subnet, what are you planning
to put in place to stop clients from stealing each other's IPs? Only a few
Brocade devices support ARP ACL rules, which are a really nice feature. IP
Source Guard works reasonably well if you use a DHCP server; otherwise you
need to specify the MAC address. Some other brands of switch support
filtering ARP packets per access port.

Regards,
Steven.

Baldur_Norddahl · December 29, 2013, 2:31am

This is a complex question that depends entirely on the capabilities of the
equipment I can get. I was considering an OpenFlow solution, where this is
easy: I would make rules that would only forward traffic with the correct
source IP from each VLAN. If the user tries something funny, nothing
happens and his traffic is just dropped.
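As a rough sketch of what such rules could look like, here they are written as flow entries in Open vSwitch's `ovs-ofctl` flow syntax. The helper name `user_flows`, the port and VLAN numbers, the priorities and the use of the `NORMAL` action are all assumptions for illustration. Q-in-Q handling is left out (the switch is assumed to see one VLAN tag per user), and DHCP is not covered. Note that matching on IPv6 fields needs OpenFlow 1.2 or later, which is the OpenFlow 1.0 limitation discussed next.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ovs-ofctl-style flow entries that let one
# user VLAN, arriving on the user-facing switch port, send only traffic
# from that user's own addresses.
#   $1 = access port number, $2 = VLAN ID,
#   $3 = user's IPv4 address, $4 = user's IPv6 prefix
# Assumptions: one visible VLAN tag per user (no Q-in-Q handling) and
# forwarding via the NORMAL action; DHCP and other corner cases are
# not covered.
user_flows() {
  local port=$1 vlan=$2 v4=$3 v6=$4
  local m="in_port=${port},dl_vlan=${vlan}"
  # IPv4 and ARP are allowed only with the user's own source address
  echo "priority=200,${m},ip,nw_src=${v4},actions=NORMAL"
  echo "priority=200,${m},arp,arp_spa=${v4},actions=NORMAL"
  # IPv6 is allowed from the user's prefix and from link-local
  # addresses (neighbor discovery uses link-local sources)
  echo "priority=200,${m},ipv6,ipv6_src=${v6},actions=NORMAL"
  echo "priority=200,${m},ipv6,ipv6_src=fe80::/10,actions=NORMAL"
  # Anything else from this user is dropped
  echo "priority=100,${m},actions=drop"
}

# Example: user on access port 1, VLAN 5, 1.1.1.5 and 2001:db8:0:500::/56
user_flows 1 5 1.1.1.5 2001:db8:0:500::/56
```

The output could be saved to a file and loaded with `ovs-ofctl add-flows br0 FILE` (the bridge name `br0` is hypothetical). The low-priority drop entry is what makes "nothing happens" the default for anything the user is not entitled to send.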

But I am a bit let down by the capabilities of current OpenFlow switches.
Most only support OpenFlow 1.0, which is simply not good enough: it has no
IPv6 support, which is naturally a requirement. I know about the HP
offerings, but they only support 4k rules in hardware, which is far from
enough. NoviFlow are still working on getting me a quote. If they can give
me a competitive price I might still consider OpenFlow.

The problem is this: A conventional approach assigns a full IPv4 subnet to
each user. This uses a minimum of 4 addresses for each user. I currently
have to pay somewhere between $10 and $20 for each address and this will
only become more expensive in the future.

The users each have a unique VLAN (Q-in-Q). The question is, what do I put
on those VLANs, if I do not want to put a full IPv4 subnet on each?

My own answer to that is to have the users share a larger subnet, for
example, I could have a full class C-sized subnet shared between 253
users/VLANs.
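Putting these numbers side by side gives a back-of-the-envelope sketch. It uses Baldur's quoted $10 to $20 per address and assumes a /30 per user in the conventional case, and three addresses of the shared /24 lost to network, broadcast and gateway, which is where 253 comes from:

```latex
\begin{aligned}
\text{/30 per user: } & 4 \times \$10 \text{ to } 4 \times \$20 = \$40 \text{ to } \$80 \text{ per user} \\
\text{shared /24, 253 users: } & \tfrac{256}{253} \times \$10 \text{ to } \tfrac{256}{253} \times \$20 \approx \$10.12 \text{ to } \$20.24 \text{ per user}
\end{aligned}
```

In other words, sharing the subnet cuts the address cost per user by roughly a factor of four.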

To allow these users to communicate with each other and with the default
gateway IP, I will need proxy ARP. In a non-OpenFlow solution I will also
need the associated security functions, such as DHCP snooping, to prevent
hijacking of IP addresses.

Which devices can solve this task?

To me the work seems quite simple. For outbound packets, check that the
source IP matches the expected IP on the VLAN, then forward the packet
according to the routing table. For inbound packets, look up the destination
IP to find the correct VLAN, then push the VLAN tag on the packet and
forward it using the normal MAC lookup. For ARP packets, look up the
destination VLAN from the destination IP, change the VLAN tag and forward
the packet.
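On a plain Linux router the same three steps can be approximated with standard kernel features instead of per-packet rules. A /32 host route per user handles the inbound lookup, strict reverse-path filtering handles the outbound source check, and proxy ARP handles ARP. This is a swapped-in technique, not what Baldur proposes: proxy ARP answers ARP requests with the router's own MAC and then routes the traffic, rather than rewriting the VLAN tag and forwarding the ARP packet. The helper name and the `eth0.<vlan>` interface naming are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not execute) Linux commands that
# approximate the per-VLAN behaviour described above for one user.
#   $1 = VLAN ID, $2 = user's IPv4 address on the shared subnet
# Assumptions: VLAN sub-interfaces are named eth0.<vlan> and the
# router holds the shared subnet's gateway address (e.g. on lo).
linux_user_cmds() {
  local vlan=$1 ip=$2
  local dev="eth0.${vlan}"
  # Inbound: a /32 host route maps the user's address to its VLAN
  echo "ip route add ${ip}/32 dev ${dev}"
  # Outbound: strict reverse-path filtering drops packets whose source
  # is not routed back out this VLAN (the kernel uses the higher of
  # conf/all and the per-interface value, so conf/all must not be 2)
  echo "echo 1 > /proc/sys/net/ipv4/conf/${dev}/rp_filter"
  # ARP: answer, with the router's own MAC, requests for addresses that
  # are routed via another interface, i.e. the other users' VLANs
  echo "echo 1 > /proc/sys/net/ipv4/conf/${dev}/proxy_arp"
}

linux_user_cmds 5 1.1.1.5
```

Whether this scales to thousands of VLANs is exactly the concern Steven raised earlier in the thread, so treat it as a starting point to benchmark rather than a recommendation.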

There is no reason a device should not be able to handle a large number of
rules like these. The NoviSwitch will do it. However, it appears that a lot
of devices are quite limited in this regard. I could buy a router/switch
for every few thousand users and split the work between them. Spread
across many users, the extra cost would probably not be prohibitive. This
is the "do the work at the edge" solution, although I would be hosting the
equipment in the same rack as the core router. But why fill a rack with
equipment to do simple, repetitive work that a single device should be able
to manage?

Regards,

Baldur

Laurent_GUERBY · December 29, 2013, 9:10am

Hi Baldur,

Assuming you manage 1.1.1.0/24 and 2001:db8:0::/48 and
have a Linux box on both ends, you can get rid of
IPv4 and v6 interconnect subnets and proxy ARP the following way:

1/ on the gateway
ip addr add 1.1.1.0/32 dev lo

for each client VLAN "NN" on eth0:
ip -6 addr add fe80::1/64 dev eth0.NN
ip -6 route add 2001:db8:0:NN00::/56 via fe80::1:NN dev eth0.NN
ip route add 1.1.1.NN/32 dev eth0.NN

2/ on user CPE number "NN", with the CPE WAN interface being eth0:
ip addr add 1.1.1.NN/32 dev eth0
ip route add 1.1.1.0/32 dev eth0
ip route add default via 1.1.1.0
ip -6 addr add fe80::1:NN/64 dev eth0
ip -6 route add default via fe80::1 dev eth0
# ip -6 addr add 2001:db8:0:NN00::1/56 dev eth0 # optional

Note: NN is written in hex in the IPv6 addresses.
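Because these commands repeat for every client, and NN has to be decimal for the VLAN and IPv4 but hex for IPv6, a small generator helps avoid slips. This is a minimal sketch, not Laurent's tooling. The function names are hypothetical, it assumes the VLAN ID equals the client number NN (1 to 255), and it uses the example prefixes above. It also prints the IPv4 /32 host route the gateway needs to reach 1.1.1.NN on eth0.NN. The one-off `ip addr add 1.1.1.0/32 dev lo` on the gateway is not repeated.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the per-client commands from the recipe
# above for client number NN (decimal, 1-255). NN is used as-is for the
# VLAN ID and the IPv4 address, and converted to hex for IPv6.
# Assumes the example prefixes 1.1.1.0/24 and 2001:db8:0::/48.
gateway_cmds() {
  local nn=$1 hex
  [ "$nn" -ge 1 ] && [ "$nn" -le 255 ] || return 1
  hex=$(printf '%x' "$nn")
  echo "ip -6 addr add fe80::1/64 dev eth0.${nn}"
  echo "ip -6 route add 2001:db8:0:${hex}00::/56 via fe80::1:${hex} dev eth0.${nn}"
  # IPv4 host route so the gateway knows which VLAN 1.1.1.NN is on
  echo "ip route add 1.1.1.${nn}/32 dev eth0.${nn}"
}

cpe_cmds() {
  local nn=$1 hex
  [ "$nn" -ge 1 ] && [ "$nn" -le 255 ] || return 1
  hex=$(printf '%x' "$nn")
  echo "ip addr add 1.1.1.${nn}/32 dev eth0"
  echo "ip route add 1.1.1.0/32 dev eth0"
  echo "ip route add default via 1.1.1.0"
  echo "ip -6 addr add fe80::1:${hex}/64 dev eth0"
  echo "ip -6 route add default via fe80::1 dev eth0"
}

gateway_cmds 10
```

For client 10, for example, the IPv6 route comes out as `2001:db8:0:a00::/56 via fe80::1:a dev eth0.10`, while the VLAN and IPv4 address stay `eth0.10` and `1.1.1.10`. The output can be reviewed before being run as root on the relevant box.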

The trick in IPv4 is that Linux by default answers ARP requests for
"1.1.1.0" on all interfaces, even though the address is on the loopback.
In IPv6, use static link-local addresses on both ends. You can replace
"1.1.1.0" with any IPv4 address, but since ".0" addresses are rarely
assigned to end users, it doesn't waste anything and keeps traceroute
showing a public IPv4 address.

The nice thing about this setup is that it "virtualizes" the routing from
the client's point of view: you can split/balance your clients across
multiple physical gateways without changing a line of the client
configuration while it's being moved; you just have to configure your IGP
between gateways to properly distribute internal routes.

We (AS197422 / tetaneutral.net) use this for virtual machines too (with
"tapNN" interfaces from KVM instead of "eth0.NN"): it allows us to move
virtual machines between physical machines without user reconfiguration,
waste no IPv4, and avoid all the issues of a shared L2 (rogue RA, ARP
spoofing, whatever), since there's no shared L2 between user VMs anymore.
It also means we don't have to pre-split our IPv4 space into a fixed
scheme: we manage only /32s, so there's no waste at all.

Of course you still have work to do on packets-per-second (PPS) tuning.

Sincerely,

Laurent GUERBY
AS197422 http://tetaneutral.net peering http://as197422.net

PS: minimum settings on a Linux router
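echo 1 > /proc/sys/net/ipv4/ip_forward
# no IPv6 autoconfiguration or router advertisements on any interface
for i in /proc/sys/net/ipv6/conf/*; do for j in autoconf accept_ra; do echo 0 > "$i/$j"; done; done
echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
# raise the limit on the number of IPv6 routes
echo 65536 > /proc/sys/net/ipv6/route/max_size
# use the best local address as the source of ARP requests
for i in /proc/sys/net/ipv4/conf/*/arp_announce; do echo 2 > "$i"; done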

PPS: we also like to give a /56 to our users in IPv6; it makes a nice /24
IPv4 <=> /48 IPv6 correspondence (256 users).
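As a quick check of that correspondence, a /48 holds exactly as many /56 prefixes as a /24 holds IPv4 addresses:

```latex
\underbrace{2^{56-48}}_{\text{/56s in a /48}} = 256 = \underbrace{2^{32-24}}_{\text{/32s in a /24}}
```

In the scheme above 1.1.1.0 is the gateway address, so at most 255 customers (1.1.1.1 to 1.1.1.255) get a /32, and the /56 with NN = 00 is left spare.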

Ray_Soucy · December 29, 2013, 2:11pm

> for i in /proc/sys/net/ipv4/conf/*/arp_announce; do echo 2 > $i;done

+1. Setting arp_announce is essential if Linux is being used as a router
with more than one subnet.

I would also recommend setting arp_ignore. For Linux-based routers, I've
found the following settings to be optimal:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_announce
echo 2 > /proc/sys/net/ipv4/conf/all/arp_ignore
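One interaction worth checking with these settings comes from the kernel's ip-sysctl documentation. arp_ignore values 1 and 2 make Linux answer ARP only for addresses configured on the interface the request arrived on, and the kernel applies the higher of the conf/all and per-interface values. An address that lives only on lo, as in the 1.1.1.0 trick earlier in the thread, would then go unanswered. The hypothetical helper below encodes that check for the documented values 0, 1, 2, 3 and 8. It assumes the lo address was added without an explicit scope and so has global scope. Treat it as a sketch, not a substitute for testing on the actual router.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the conf/all and per-interface arp_ignore
# values, report (via exit status) whether Linux should still answer
# ARP for an address configured only on lo, as in the 1.1.1.0 trick.
# The kernel applies the higher of the two values.
lo_arp_trick_ok() {
  local all=$1 dev=$2 eff
  eff=$(( all > dev ? all : dev ))
  case "$eff" in
    0) return 0 ;;  # reply for any local address, whatever the interface
    3) return 0 ;;  # reply unless the address has host scope
                    # (1.1.1.0 added without a scope is global)
    *) return 1 ;;  # 1, 2: address must be on the incoming interface;
                    # 8: never reply; 4-7 are reserved, not modelled
  esac
}

if lo_arp_trick_ok 2 0; then echo "ok"; else echo "lo address not answered"; fi
```

With the conf/all value of 2 suggested here, the check fails, while the default `lo_arp_trick_ok 0 0` succeeds. So a router combining both recipes would need to keep arp_ignore at 0 (or 3) on the client-facing interfaces.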

On a side note, this underscores what a lot of people on-list are saying:

If you don't understand the internals of a Linux system, for example,
"rolling your own" will bite you.

It's also pretty rare to find a network engineer who is also a Linux
system-level developer, so finding and maintaining that talent can often be
a challenge.

Many make a leap and go on to assert that, because of this, software-based
systems can never be viable, which I disagree with. After all, the latest
OS offerings from Cisco run a Linux kernel. Nearly all the Ciena DWDM and
ME gear I run is built on Linux. These companies aren't doing quite as
much with hardware acceleration as they would lead you to believe.

I think Intel DPDK will be a disruptive technology for networking.

At the end of the day, I'm pretty anxious to see the days of over-priced
routers driving up network service costs go away.
