IP4 address conservation method

Mikael_Abrahamsson · June 4, 2013, 10:34pm

I read:

http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservation.24.pdf

I would like to point out RFC 3069. On most Cisco equipment this is done using static routes and "ip unnumbered".

So my question is basically: What am I missing? Why can't data center guys build their network the same way regular ETTH is done? Either one vlan per customer and sharing the IPv4 subnet between several vlans, or having several customers in the same vlan but using antispoofing etc. (IETF SAVI-wg functionality) to handle the security stuff?

One vlan per customer also works very well with IPv6.

Dan_White · June 5, 2013, 2:44pm

I read:

http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservation.24.pdf

I would like to point out RFC 3069. On most Cisco equipment this is done using static routes and "ip unnumbered".

So my question is basically: What am I missing? Why can't data center guys build their network the same way regular ETTH is done? Either one vlan per customer and sharing the IPv4 subnet between several vlans, or having several customers in the same vlan but using antispoofing etc. (IETF SAVI-wg functionality) to handle the security stuff?

VLAN-per-subscriber (1 customer per VLAN) can require more costly routing
equipment, particularly if you're performing double tagging (outer tag for
switch, inner tag for customer). Sharing an IPv4 subnet among customers is
appropriate for residential and small business services, which is how we
typically deliver service, but may be less appropriate for larger business
customers (and I presume hosting customers) where the number of IPs is
large enough that you're throwing away fewer addresses ratio-wise. Generally
the simpler deployment model wins out in that type of scenario. Also, the
'ip unnumbered' approach may require some layer-3 security features.
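
To put rough numbers on the "ratio-wise" point, here is a minimal sketch. It assumes the conventional model in which every routed subnet gives up three addresses (network, broadcast and one gateway address); the function name and the 192.0.2.0 documentation prefix are only illustrative.

```python
import ipaddress

def subnet_overhead(prefix_len):
    """Return (total, customer_usable, overhead_fraction) for one IPv4 subnet.

    Assumption: three addresses per subnet are not available to the customer
    (network address, broadcast address, router/gateway address).
    """
    net = ipaddress.ip_network(f"192.0.2.0/{prefix_len}", strict=False)
    total = net.num_addresses
    usable = total - 3
    return total, usable, 3 / total

# Compare small per-customer subnets with a larger allocation.
for plen in (30, 29, 28, 27, 24):
    total, usable, overhead = subnet_overhead(plen)
    print(f"/{plen}: {total:3d} addresses, {usable:3d} for the customer, "
          f"{overhead:.1%} overhead")
```

Under that assumption a /30 loses three of its four addresses, while a /24 loses three of 256, which is why the simpler per-customer subnet model becomes easier to justify as allocations grow.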

VLAN-per-service (>1 customer sharing a VLAN) is problematic, and typically
pushes a lot of IPv4 specific layer-3 security features (MACFF, DHCPv4
snooping, proxy arp, broadcast forwarding/split horizon) down into the
access equipment, and that's rarely a perfect feature set. In my
experience, IPv6 services lag behind on such equipment because those v4
security features break v6.

One vlan per customer also works very well with IPv6.

+1

William_Herrin · June 5, 2013, 4:06pm

Both the router and host have to support sending and accepting invalid
ARP requests. Since the Linux kernel already mishandles arp by
default, you're probably begging for unexpected behavior. Double down
on that if the customer controls the server image.

I don't have any experience with softlayer but I have had to abandon a
handful of VPS providers due to bizarre routing failures they couldn't
fix. I was particularly thrilled with the one where if I didn't ping
the second-hop router from each of the VPS's IPs at least once every
15 seconds it would eventually forget how to reach the address. I
could log in via one of the other addresses and confirm with tcpdump
that no arps or anything else would appear on the interface. Their
advice? Disable iptables. Thanks guys, real helpful.

-Bill

Mikael_Abrahamsson · June 5, 2013, 4:11pm

Exactly what is wrong with the ARP answers and requests sent using local-proxy-arp?

William_Herrin · June 5, 2013, 4:30pm

Nothing. The problem is that the arp source IP doesn't fall within the
interface netmask at the receiver. Some receivers ignore that... after
all, why do they care what the source IP is? They only care about the
source MAC. Other receivers see a spoofed packet and drop it.

Regards,
Bill Herrin
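
The two receiver behaviours described above can be sketched as follows. This is a simplified model, not any vendor's implementation; the function name and the `strict` flag are hypothetical.

```python
import ipaddress

def accepts_arp_sender(interface_cidr, sender_ip, strict):
    """Decide whether a receiver would accept an ARP packet's sender IP.

    strict=False models receivers that only care about the sender MAC.
    strict=True models receivers that treat a sender IP outside the
    interface's own subnet as spoofed and drop the packet.
    """
    if not strict:
        return True
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(sender_ip) in network

# Receiver interface configured as 192.0.2.10/28 (documentation addresses).
print(accepts_arp_sender("192.0.2.10/28", "192.0.2.1", strict=True))
print(accepts_arp_sender("192.0.2.10/28", "198.51.100.1", strict=True))
print(accepts_arp_sender("192.0.2.10/28", "198.51.100.1", strict=False))
```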

Mikael_Abrahamsson · June 5, 2013, 4:57pm

Why wouldn't it be within the source IP mask? I would imagine local-proxy-arp would work exactly the same way as if a directly connected host with the IP the ARP request was for had answered.

Dan_White · June 5, 2013, 6:42pm

I've seen two vendors get it wrong: 1) when originating an ARP request, the
router uses a source IP that does not match the subnet of the IP being
requested (happened when the interface on the router had secondary IPs); 2)
when a customer had more than one IP address assigned on an interface/VLAN,
and one device ARPed the other, the router responded with its own MAC,
creating a race condition where sometimes traffic between those two devices
was forced up through the router.

Christopher_Papandre · June 5, 2013, 9:20pm

Hi Mikael,

(Sorry if you are getting a duplicate copy of this.)

In our network we had a couple of problems with RFC3069. Not all the hardware we currently use supports the RFC so we tried to come up with a solution that worked and didn't have us opening a lot of ERs (I know I reference 1 ER in the presentation but that's just 1 rather than a lot). We have more than just routers to consider (i.e. load balancers, firewalls, etc..) and don't want to lock ourselves in to any particular vendor. We also wanted a solution that we could easily migrate our customers into rather than completely taking them off line while we "retrofit" them into a new config (as probably would've been the case if we tried implementing RFC3069). Additionally, for a number of our customers we needed a solution that worked with a FHRP. I don't currently see a way to do that with RFC3069 but if I've missed something please let me know.

Thanks,
ChrisP.
SoftLayer Technologies
chrisp@softlayer.com

Ricky_Beam · June 5, 2013, 10:25pm

I won't argue against calling Linux "wrong". However, the linux way of dealing with ARP is well tuned for "host" and not "router" duty. It's just not designed for the kernel to maintain huge arp tables for extended periods. Generally, a host speaks to very few L2 neighbors. Even a "server" tends to speak to few of its L2 neighbors -- esp. for an internet service (www, ftp, irc, etc.). However, a ROUTER speaks to everything on most of its links. As such, out-of-the-box, linux makes for a very BAD router... its neighbor cache goes "stale" in 30s (avg), and entries are dropped on a scale of minutes. Real Routers(tm) hold on to ARP entries for *hours* -- because broadcast traffic requires CPU attention.

That said, I do use a stripped Debian box as an inter-vlan router. You don't want to see the pages of tweaks it's taken to stop it being a broadcast storm generator. (and no, "arpd" is a stupid hack.) It's a beautiful thing to run "tcpdump ... broadcast" and see no packets!

(And I'm not too happy with the BS 32 interface limit for multicast routing.)

--Ricky

William_Herrin · June 5, 2013, 11:15pm

I love Linux and use it throughout my work but I can't tell you the
number of times its ARP behavior has bitten me. If you send a packet
to a VIP on a Linux box and it doesn't have an arp entry for the
default gateway, the Linux box will send an arp request... with the
vip as the source. That is just wrong. Wrong, wrong, wrong. Use the
damn interface IP when you arp for something on that interface. If the
router doesn't happen to like the bad arp (since the VIP isn't on the
router's LAN) the router will ignore it. And your service will merrily
pop up and down depending on whether the Linux box has any traffic to
originate.

Okay, I'm done venting now.

-Bill

Robert_Drake · June 6, 2013, 3:11am

That said, I do use a stripped debian box as an inter-vlan router. You
don't want to see the pages of tweaks it's taken to stop it being a
broadcast storm generator. (and no, "arpd" is a stupid hack.) It's a
beautiful thing to run "tcpdump ... broadcast" and see no packets!

(And I'm not too happy with the BS 32 interface limit for multicast
routing.)

Actually, I'd love to see the pages of tweaks. Seems like it would be useful if I need to do this in the future. Maybe drop it on the Debian wiki somewhere if you get the chance.

Or at the least it would be nice to know what issues you're hitting now. You can tune the neighbor cache size and timeout via sysctl, so I would think it would be more of a memory limit than anything (unless the kernel uses a really poor hash lookup for arp entries).

--Robert
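
As a starting point for the sysctl tuning mentioned above, a minimal sketch that reads the current IPv4 neighbor-cache tunables on a Linux box. The tunable names are the ones listed in the kernel's ip-sysctl documentation; the function name and the `base` parameter are only illustrative, and suitable values for a router are not given in this thread.

```python
from pathlib import Path

# IPv4 neighbor (ARP) cache tunables: table size thresholds and timeouts.
NEIGH_TUNABLES = (
    "gc_thresh1",
    "gc_thresh2",
    "gc_thresh3",
    "gc_stale_time",
    "base_reachable_time_ms",
)

def read_neigh_tunables(base="/proc/sys/net/ipv4/neigh/default"):
    """Return {tunable: int value} for every tunable file found under base."""
    values = {}
    for name in NEIGH_TUNABLES:
        path = Path(base) / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values

if __name__ == "__main__":
    for name, value in read_neigh_tunables().items():
        print(f"{name} = {value}")
```

On a system without that /proc tree the function simply returns an empty dict.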

James_Hess · June 6, 2013, 3:30am

[snip]

(And I'm not too happy with the BS 32 interface limit for multicast
routing.)

Actually, I'd love to see the pages of tweaks. Seems like it would be
useful if I need to do this in the future

The great thing about open source operating system kernels is that if an
arbitrary limit or system misbehavior causes you problems, or a tweak is
needed to fix incorrect behavior, you can work out a patch to correct
the situation, or add an optional configuration setting to fix the
problem, and submit the improvement to the maintainer in the form of a
patch.

Blake_Hudson · June 6, 2013, 2:29pm

Dan White wrote the following on 6/5/2013 9:44 AM:

I read:

http://www.nanog.org/sites/default/files/tues.general.Papandreou.conservation.24.pdf

I would like to point out RFC 3069. On most Cisco equipment this is done using static routes and "ip unnumbered".

So my question is basically: What am I missing? Why can't data center guys build their network the same way regular ETTH is done? Either one vlan per customer and sharing the IPv4 subnet between several vlans, or having several customers in the same vlan but using antispoofing etc. (IETF SAVI-wg functionality) to handle the security stuff?

VLAN-per-subscriber (1 customer per VLAN) can require more costly routing
equipment, particularly if you're performing double tagging (outer tag for
switch, inner tag for customer). Sharing an IPv4 subnet among customers is
appropriate for residential and small business services, which is how we
typically deliver service, but may be less appropriate for larger business
customers (and I presume hosting customers) where the number of IPs is
large enough that you're throwing away fewer addresses ratio-wise. Generally
the simpler deployment model wins out in that type of scenario. Also, the
'ip unnumbered' approach may require some layer-3 security features.

One thing not mentioned so far in this discussion is using PPPoE or some other tunnel/VPN technology for efficient IP utilization. The result could be zero wasted IP addresses without the need to resort to non-routable IP addresses in a customer's path (as the pdf suggested) and without some of the quirkiness or vendor lock-in of using ip unnumbered.

PPPoE (and other VPNs) have many of the same downsides as mentioned above though: they add routing cost and increase the complexity of the network. The question becomes which deployment has more cost: the simple, yet wasteful, design or the efficient, but complex, design.

--Blake

Bjorn_Mork · June 6, 2013, 7:00pm

William Herrin <bill@herrin.us> writes:

William_Herrin · June 6, 2013, 9:19pm

William Herrin <bill@herrin.us> writes:

I won't argue against calling Linux "wrong". However, the linux way of
dealing with ARP is well tuned for "host" and not "router" duty.

I love Linux and use it throughout my work but I can't tell you the
number of times its ARP behavior has bitten me. If you send a packet
to a VIP on a Linux box and it doesn't have an arp entry for the
default gateway, the Linux box will send an arp request... with the
vip as the source. That is just wrong. Wrong, wrong, wrong. Use the
damn interface IP when you arp for something on that interface. If the
router doesn't happen to like the bad arp (since the VIP isn't on the
router's LAN) the router will ignore it. And your service will merrily
pop up and down depending on whether the Linux box has any traffic to
originate.

Did you try setting net.ipv4.conf.all.arp_announce=2?

Yes, of course I changed the sysctl. Yes, of course that worked. Every
time I've run into the problem. On server after server after server.

Yes, the system default may be tuned for host/desktop usage

No, it doesn't default to reasonable desktop settings for ARP... it
defaults to a version of wrong that on a desktop with one NIC and one
IP doesn't happen to break anything. It'd be nice if it defaulted to
RFC compliant instead and let the few folks with wacky needs move it
off the standard behavior.

-Bill
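
For readers who have not hit this, a simplified model of how the arp_announce level changes the sender IP of an ARP request. It paraphrases the kernel's ip-sysctl description and the behaviour reported in this thread; it is not kernel code, the function and parameter names are hypothetical, and it assumes the outgoing interface has at least one address.

```python
import ipaddress

def arp_sender_ip(packet_src, target_ip, iface_addrs, arp_announce=0):
    """Pick the sender IP for an ARP request (simplified model).

    packet_src:  source IP of the packet that triggered the ARP request
    target_ip:   address being resolved, e.g. the default gateway
    iface_addrs: CIDR strings on the outgoing interface, primary first
    """
    src = ipaddress.ip_address(packet_src)
    target = ipaddress.ip_address(target_ip)
    ifaces = [ipaddress.ip_interface(a) for a in iface_addrs]
    # Subnets on the outgoing interface that contain the target.
    matching = [i for i in ifaces if target in i.network]

    if arp_announce == 0:
        # Level 0: the packet's source is used as-is (the default
        # behaviour complained about in the thread).
        return str(src)
    if arp_announce == 1 and any(src in i.network for i in matching):
        # Level 1: keep the source only if it shares a subnet with the target.
        return str(src)
    # Level 2 (and the level 1 fallback): prefer an interface address in the
    # target's subnet, else the first address on the outgoing interface.
    chosen = matching[0] if matching else ifaces[0]
    return str(chosen.ip)

# A VIP (203.0.113.10) answering via a gateway on 192.0.2.0/24.
for level in (0, 1, 2):
    print(level, arp_sender_ip("203.0.113.10", "192.0.2.1",
                               ["192.0.2.10/24"], level))
```

With level 0 the request carries the VIP, which a strict router may ignore; with level 2 it carries the interface address on the gateway's subnet.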

James_Hess · June 7, 2013, 4:06am

Yes, the system default may be tuned for host/desktop usage

No, it doesn't default to reasonable desktop settings for ARP... it
defaults to a version of wrong that on a desktop with one NIC and one
IP doesn't happen to break anything. It'd be nice if it defaulted to
RFC compliant instead and let the few folks with wacky needs move it
off the standard behavior.

I find Linux's arp defaults annoying also, but they're not "wrong"
or "non-RFC compliant".

An interpretation that applies in the design of Linux networking is
that IP addresses belong to the host, and IP addresses do not belong
to IP interfaces (excepting 'scope local' IPs, such as IPv6
link-local).

An interface has a source IP address assigned to it for outgoing
traffic from the host.
All destination IPs for incoming traffic to the host belong to no
specific interface on the host.

Any IP address added to any interface, belongs to the host as a
valid destination IP, and can be ARP'ed on any of the host's IP
interfaces.

Excepting a firewall rule to the contrary, traffic for any of the
host's destination IPs can come in any interface.

This is a totally valid and correct way of a host managing that
host's IP addresses.
However, it is a tad inconvenient for the administrator, in some
real-world circumstances; mainly unusual configs such as servers with
multiple NICs plugged into different subnets, or servers behind a load
balancer.

And the ARP behavior is counterintuitive, because regardless of
that fact, in Linux you _still_ configure IP addresses on
interfaces; every interface has a preferred IP, and maybe some
alias IPs.

In most cases, Linux's choice not to restrict ARP to the specific
interface bound to the IP is not useful.

However, it is useful if you have a host that has multiple NICs
plugged into the same network.

The kernel has its defaults, but distribution vendors such as
Red Hat/Ubuntu/Debian are free to supply their own defaults through
sysctl.conf or their NetworkManager packages or network configuration
scripts...

It's interesting to note they have so far chosen to go (mostly) with
the defaults.

I'm sure most people do not have a problem, or else someone would
have updated the defaults by now.

William_Herrin · June 7, 2013, 5:36am

Hi Jimmy,

I reread RFC 826 and much to my annoyance it doesn't directly speak to
this question. But it does speak to it in a backhanded way, setting a
requirement that makes sense only if the ARP source address is part of
the subnet on which the arp request is made.

826 says, "The Address Resolution module then sets the [...] ar$spa
with the protocol address of itself." "Itself" is never explicitly
defined.

But 826 also says, "The sender hardware address and sender protocol
address are absolutely necessary. It is these fields that get put in
a translation table." It says that in a context that appears to apply
to both request and response ARPs. RFC 5227 confirms this
interpretation, insisting that gratuitous arps and defensive arps are
arp-request packets, not arp-reply packets.

That would yield a nonsensical activity from the ARP request message
*unless* the source layer 3 address is part of the subnet defined on
that layer 2 network. Not just any source address will do; it must be
one of the machine's addresses that would form a valid entry in the
target's arp cache.

Linux's default behavior copies the source IP address of the outgoing
IP packet to the ARP request, regardless of whether that IP is valid
for that particular LAN subnet. So, I reiterate that Linux's default
for selecting the ARP source address does not match what the RFC says.

Postel's law cuts Linux some slack with respect to accepting ARPs on
the wrong interface, even though that's almost always the wrong thing
to do. On the other hand, it reinforces the errant nature of Linux's
behavior with respect to source address selection when originating ARP
requests.

-Bill
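
The RFC 826 reception steps behind this argument can be sketched as below. This is a simplified model that assumes Ethernet/IPv4 and skips the hardware and protocol type checks; the class and function names are illustrative.

```python
from dataclasses import dataclass

REQUEST, REPLY = 1, 2  # ARP opcodes

@dataclass
class ArpPacket:
    op: int
    sha: str  # sender hardware address
    spa: str  # sender protocol address
    tha: str  # target hardware address
    tpa: str  # target protocol address

def receive(packet, my_ip, my_mac, table):
    """Apply the RFC 826 packet-reception steps; return a reply or None."""
    merged = False
    if packet.spa in table:
        # Known sender: refresh its hardware address.
        table[packet.spa] = packet.sha
        merged = True
    if packet.tpa != my_ip:
        return None
    if not merged:
        # We are the target: learn the sender, even from a request.
        table[packet.spa] = packet.sha
    if packet.op == REQUEST:
        # Swap fields and answer with our own addresses as sender.
        return ArpPacket(REPLY, my_mac, my_ip, packet.sha, packet.spa)
    return None

# A router at 192.0.2.1 receives a request whose sender IP is an
# off-subnet VIP (203.0.113.10).
table = {}
request = ArpPacket(REQUEST, "02:00:00:00:00:aa", "203.0.113.10",
                    "00:00:00:00:00:00", "192.0.2.1")
reply = receive(request, "192.0.2.1", "02:00:00:00:00:01", table)
print(table)
```

Following the RFC literally, the router ends up with a translation-table entry for an address that is not on its LAN, which is the nonsensical result described above.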

Tore_Anderson1 · June 7, 2013, 6:35am

* Blake Hudson

One thing not mentioned so far in this discussion is using PPPoE or some
other tunnel/VPN technology for efficient IP utilization. The result
could be zero wasted IP addresses without the need to resort to
non-routable IP addresses in a customer's path (as the pdf suggested)
and without some of the quirkiness or vendor lock-in of using ip
unnumbered.

PPPoE (and other VPNs) have many of the same downsides as mentioned
above though: they add routing cost and increase the complexity of
the network. The question becomes which deployment has more cost: the
simple, yet wasteful, design or the efficient, but complex, design.

Or, simply just use IPv6, and use a stateless translation service
located in the core network to provide IPv4 connectivity to the public
Internet services.

This allows for 100% efficient utilisation of whatever IPv4 addresses
you have left - nothing needs to go to waste due to router interfaces,
subnet power of 2 overhead, internal servers/services that have no
Internet-available services, etc...all without requiring you to do
anything special on the server/application stacks to support it (like
set up tunnel endpoints), add dual-stack complexity into your network,
or introduce any form of stateful translation or VPN service into your
network.
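
The post does not spell out the address mapping. As one illustration of why such translation can be stateless, here is a sketch of the RFC 6052 style embedding of an IPv4 address in the low 32 bits of a /96 IPv6 prefix. The well-known prefix 64:ff9b::/96 and the function names are assumptions made for the example, not details taken from this thread.

```python
import ipaddress

def embed_ipv4(prefix96, ipv4):
    """Map an IPv4 address into a /96 IPv6 prefix (low 32 bits)."""
    net = ipaddress.ip_network(prefix96)
    if net.version != 6 or net.prefixlen != 96:
        raise ValueError("expected an IPv6 /96 prefix")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(net.network_address) | int(v4))

def extract_ipv4(prefix96, ipv6):
    """Recover the IPv4 address from an IPv6 address inside the /96 prefix."""
    net = ipaddress.ip_network(prefix96)
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in net:
        raise ValueError("address is not inside the translation prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)

mapped = embed_ipv4("64:ff9b::/96", "192.0.2.1")
print(mapped)
print(extract_ipv4("64:ff9b::/96", mapped))
```

Because the mapping is pure arithmetic in both directions, the translator needs no per-flow state.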

Here are some more resources:

http://fud.no/talks/20130321-V6_World_Congress-The_Case_for_IPv6_Only_Data_Centres.pdf

In case you're interested in more, Ivan Pepelnjak and I will host a
(free) webinar about the approach next week. Feel free to join!

http://www.ipspace.net/IPv6-Only_Data_Centers

BTW: I hear Cisco has implemented support for this approach in their
latest ASR1K code, although I haven't confirmed this myself yet.

Tore

Bjorn_Mork · June 7, 2013, 8:01am

Jimmy Hess <mysidia@gmail.com> writes:

The kernel has its defaults, but distribution vendors such as
Red Hat/Ubuntu/Debian are free to supply their own defaults through
sysctl.conf or their NetworkManager packages or network configuration
scripts...

It's interesting to note they have so far chosen to go (mostly) with
the defaults.

I'm sure most people do not have a problem, or else, someone would
have updated the defaults by now

Changing defaults will break stuff for people relying on those defaults.
This is usually not acceptable. At least not in the kernel.

The behaviour is well documented and easy to change. Whining about the
defaults not matching personal preferences is useless noise.

Bjørn