Policy-based routing is evil? Discuss.

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

In my opinion the main problems with this are:

  - It's brittle, when a line fails, traffic doesn't re-route
  - None of the usual debugging tools work properly
  - Adding a new user is complicated because it has to be done in (at
    least) two places

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

Am I out to lunch?

-w

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

In my opinion the main problems with this are:

- It's brittle, when a line fails, traffic doesn't re-route
- None of the usual debugging tools work properly

I think this all depends on how it's configured, and if you can monitor/detect failures.

I've seen folks do things like this with a Linux box with "multiple routing tables". If you have something validating that the link is working, you can easily have it "fail over". This all depends on the admin doing it right.
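
As a rough illustration of the "multiple routing tables" idea, the sketch below builds the iproute2 commands that would pin each source prefix to an uplink and move it to a surviving uplink when its preferred one is marked unhealthy. The table numbers, interface names, gateway addresses, and the `Uplink` and `build_commands` names are all hypothetical, and the sketch only returns command strings; it does not run them or clean up stale rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    name: str      # hypothetical label for the line
    dev: str       # hypothetical interface name
    gateway: str   # hypothetical next-hop address
    table: int     # routing table number reserved for this uplink


def build_commands(assignments, uplinks, healthy):
    """Return iproute2 commands that map source prefixes to uplinks.

    assignments: {source_prefix: preferred uplink name}
    uplinks:     {uplink name: Uplink}
    healthy:     set of uplink names currently passing health checks
    """
    cmds = []
    # One default route per healthy uplink, each in its own table.
    for up in uplinks.values():
        if up.name in healthy:
            cmds.append(
                f"ip route replace default via {up.gateway} dev {up.dev} table {up.table}"
            )
    # One rule per source prefix, pointing at the preferred uplink's table,
    # or at the first healthy uplink if the preferred one is down.
    for prefix, preferred in sorted(assignments.items()):
        if preferred in healthy:
            target = preferred
        else:
            candidates = sorted(healthy)
            if not candidates:
                continue  # nothing is up; leave the prefix to the main table
            target = candidates[0]
        cmds.append(f"ip rule add from {prefix} table {uplinks[target].table}")
    return cmds


if __name__ == "__main__":
    uplinks = {
        "dsl1": Uplink("dsl1", "eth1", "192.0.2.1", 101),
        "dsl2": Uplink("dsl2", "eth2", "198.51.100.1", 102),
    }
    assignments = {"10.0.1.0/24": "dsl1", "10.0.2.0/24": "dsl2"}
    for line in build_commands(assignments, uplinks, healthy={"dsl1"}):
        print(line)
```

Something still has to decide what goes into `healthy`, and a real script would also need to remove the rules it installed earlier before adding new ones.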

- Adding a new user is complicated because it has to be done in (at
   least) two places

This all depends on the tool set in use/available.

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

Am I out to lunch?

No, but most people I've seen either

a) set it up, it works (or seems to) and cross their fingers and move to the next fire
b) try to over-engineer the crap out of it so it's got what they feel is "100% availability" but isn't sustainable or maintainable by someone other than themselves.

The simple answer is: RFC 1925 rules 7a and 8 apply

- Jared

But I'm having a distinct lack of success locating rants and diatribes or even well-reasoned articles supporting this opinion.

Possibly because it's so commonly known that PBR is generally a Very Bad Idea for the reasons you cite, and more, that nobody has felt the need to re-state the obvious?

;>

Am I out to lunch?

Not with regards to PBR, at least, IMHO.

;>

It's to be avoided if at all possible.

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

In my opinion the main problems with this are:

- It's brittle, when a line fails, traffic doesn't re-route

it's brittle

- None of the usual debugging tools work properly
- Adding a new user is complicated because it has to be done in (at
   least) two places

you take all the useful information that an IGP could be (or is) providing you, and then you ignore it and do something else.

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

Am I out to lunch?

evil is not a synonym for ugly patch placed over a problem that could be handled better. If it's being used as an alternative to VRF, it isn't.

On 11/10/2013 19:41, joel jaeggli wrote:

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

In my opinion the main problems with this are:

- It's brittle, when a line fails, traffic doesn't re-route

it's brittle

- None of the usual debugging tools work properly
- Adding a new user is complicated because it has to be done in (at
   least) two places

you take all the useful information that an IGP could be (or is)
providing you, and then you ignore it and do something else.

I like that phrase. ;)

mh

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

Am I out to lunch?

evil is not a synonym for ugly patch placed over a problem that could
be handled better. If it's being used as an alternative to VRF, it isn't.

BGP is nothing if not policy-based routing, but I think I see your
concern with an approach that essentially statically locks in a
particular set of paths to links.

Not knowing what if any routing is configured between the end points,
perhaps just point out there are alternative means to achieve load
balancing. Perhaps using LOCAL_PREF for some set of ASNs over one path
or the other, or alternatively doing some sort of flow-based load
balancing might be sufficient.

John

    > you take all the useful information that an IGP could be (or is)
    > providing you, and then you ignore it and do something else.

Yes, that's another part of the conversation, encouraging the use of
an IGP, which has been a source of trouble for them because of broken
wireless bridges from a very commonly used vendor that randomly eat
multicast packets, so it's not as straightforward as it should be.

    > evil is not a synonym for ugly patch placed over a problem that
    > could be handled better.

Ok, fair enough. My first experience with PBR was as a summer intern in
the mid-1990s who inherited management of a large ATM network that had
a big VPN-esque thing built entirely that way and with no
documentation. It certainly felt evil at the time. ;)

-w

I've done exactly this with Linux routers doing SNAT and multiple upstream connections (ip route and ip rule are the commands used to set up the "multiple tables" and the rules that determine routing policy). Depending on the level of segregation needed, adding a new "user" can be as simple as plugging them into the appropriate network.

Is it ideal? No. But when $ is the deciding factor between a real router with real upstream connections supporting BGP and a Linux router with DSL and cable and no routing protocol, policy routing with some intelligence to fail-over if a link fails (and go back when it recovers) can work acceptably.

Most if not all IGPs can be configured to work without multicast. Now if
you're talking IPv6 you may have some issues...

Well, I tell you what.

My perception of where this was a good idea is the use case a recent
client might have for it:

Two consumer-grade uplinks (FiOS 150 and RR 100, specifically); primary
application is callcenter, VoIP to a service provider Elsewhere.

I would set it up so that all the VoIP and callcenter web traffic went over
FiOS *until it failed*, and everything else went Road Runner *unless it
failed*.

This keeps the general traffic out of the hair of the latency/PPS sensitive
traffic whenever possible.

Is that not policy-based routing?

Why is it bad?

Cheers,
-- jra

I think really PBR violates this:
  <http://en.wikipedia.org/wiki/Principle_of_least_astonishment>

I see ISP folks MOSTLY avoid PBR, because it does weird things that
NOC/ops folks just plain don't expect. I see Enterprise network folks
fall back to PBR often, for reasons that they seem happy with... but
man it makes things confusing :)

-chris

I think they are referring to something like Cisco PBR, where you
configure routing policy statically on each hop. Yes, it can be
configured to fail over, etc, but inherently it is a management nightmare
if you are configuring PBR on each device in your network. May as well
move back to static routing on everything...

Used sparingly, I'd agree that it does have its uses. One use I can think
of is to use PBR to direct traffic for testing a new circuit or path while
not cutting everything over. That is, until it is sufficiently tested,
and then everything would be cut over and the PBR removed...

Hi all,

We use Linux for our edge routers which have multiple interfaces to
different BGP peers. Policy-based routing allows us to ensure that
traffic originating from a particular external IP address on the router
goes out the matching network.

We have also used it on client systems to force particular protocols out
particular providers.

It's not that easy to do on Linux, as you need to make sure you have all
the proper link routes in place and positioned properly in the rule
chain, or you can really break things.

Stu

Doing this with actual routing, in a way that doesn't become fragile, is
hard. It is not impossible, as Jared points out, but it is non-trivial.

However there is a variant which is much less brittle, but is more
annoying to configure with most tools. The idea is that the gateway
box is a NAT, with an outbound IP on each of the two uplinks. The
box can then make intelligent decisions about which provider to use
based on layer 8+9 information.

I've seen this done multiple times where for instance there is high
bandwidth satellite, and low bandwidth terrestrial services. Latency
sensitive traffic (DNS, SSH, etc.) is sent over the low bandwidth
terrestrial, while bulk downloads go over satellite. It's quite
robust and useful in these situations.
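
A minimal sketch of that kind of decision, assuming the NAT box classifies flows purely by destination port: the link names, the port set, and the `choose_uplink` helper are hypothetical, and a real gateway would likely classify on more than the port.

```python
# Ports treated as latency-sensitive; SSH and DNS come from the text above,
# anything else added here would be a local policy choice.
LATENCY_SENSITIVE_PORTS = {22, 53}


def choose_uplink(dst_port, links_up):
    """Pick an uplink for a new flow, falling back if the preferred one is down.

    dst_port: destination port of the flow
    links_up: set of uplink names currently considered usable
    """
    if dst_port in LATENCY_SENSITIVE_PORTS:
        preferred, fallback = "terrestrial", "satellite"
    else:
        preferred, fallback = "satellite", "terrestrial"
    if preferred in links_up:
        return preferred
    if fallback in links_up:
        return fallback
    return None  # no uplink available


if __name__ == "__main__":
    both = {"terrestrial", "satellite"}
    print(choose_uplink(53, both))              # DNS query
    print(choose_uplink(443, both))             # bulk HTTPS download
    print(choose_uplink(443, {"terrestrial"}))  # satellite is down
```

Because the box is a NAT with an outbound address on each uplink, the choice only has to be made once per flow; return traffic follows the translated address.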

Making open source boxes do this is possible, but quite annoying
in my experience. I don't think it's possible to make a Cisco or
Juniper do this sort of thing in any reasonable way. A number of
manufacturers have developed custom solutions around this idea.

In my opinion the main problems with this are:
  - It's brittle, when a line fails, traffic doesn't re-route

Yes, but this is no worse than if you just had one single DSL link.
Manual failover is a perfectly valid solution for very small networks where
a less-than-enterprise-grade solution such as DSL is suitable.

I'd be more concerned about the question of /have you implemented a proper
firewall solution/ ?

  - None of the usual debugging tools work properly

  - Adding a new user is complicated because it has to be done in (at
    least) two places

Not necessarily.

You might pick a /20 RFC 1918 network, and then assign a /24 of source
addresses out of that network to each link. Then you won't need to adjust
two places every time a device is added; just IP it appropriately, or
set the appropriate DHCP reservation, or, best of all, subnet the local
network based on choice of outgoing WAN link, and select the client's
VLAN based on desired WAN link...
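
To make the addressing idea concrete, here is a small sketch using Python's standard `ipaddress` module. The 10.20.0.0/20 block, the link names, and the `plan_subnets` and `link_for` helpers are illustrative assumptions, not anything from the network being discussed.

```python
import ipaddress


def plan_subnets(block, links, new_prefix=24):
    """Assign one subnet of the block to each link, in order."""
    network = ipaddress.ip_network(block)
    subnets = network.subnets(new_prefix=new_prefix)
    # zip() stops at the shorter input, so unused /24s stay unassigned.
    return {link: subnet for link, subnet in zip(links, subnets)}


def link_for(address, plan):
    """Return the link whose subnet contains the address, or None."""
    addr = ipaddress.ip_address(address)
    for link, subnet in plan.items():
        if addr in subnet:
            return link
    return None


if __name__ == "__main__":
    plan = plan_subnets("10.20.0.0/20", ["dsl1", "dsl2", "dsl3"])
    for link, subnet in plan.items():
        print(link, subnet)
    print(link_for("10.20.1.57", plan))
```

With a plan like this the policy is written once per /24, and adding a user only means giving the device an address (or DHCP reservation) in the right subnet.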

Another alternative to PBR is to have an extra router for each DSL link,
each providing a default gateway.

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

There are plenty of downsides to PBR in various scenarios, but the PBR
functionality on these devices doesn't exist just at the whim of the device
manufacturer --- operators look for the functionality.

It is perfectly valid and very good to use PBR, as long as you understand
any limitations and drawbacks that apply to your specific situation.

The main drawback is ease-of-maintenance challenges.

-w

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

I wouldn't say "evil", I have found it really useful in some cases. You
just need a different approach to the network design.

I'd just say it's not the easiest way and yeah, I try to generally avoid it.

  - It's brittle, when a line fails, traffic doesn't re-route

This depends on how flexible the PBR implementation on your router is.
If your router can have conditionals like this:

* match: source address A && link P available --> send it to link P
* match: source address A --> unconditionally send it to fallback link F

Then your users will converge quite nicely. Also, make sure you prepare
for router redundancy.
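
The two conditionals above amount to an ordered rule list where the first matching rule wins. A minimal sketch of that evaluation, with hypothetical names (`Rule`, `select_link`) and made-up prefixes:

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    source: str       # source prefix the rule matches
    link: str         # link to send matching traffic to
    require_up: bool  # if True, the rule only matches while the link is up


def select_link(src, rules, links_up) -> Optional[str]:
    """Evaluate rules in order and return the link of the first match."""
    addr = ipaddress.ip_address(src)
    for rule in rules:
        if addr not in ipaddress.ip_network(rule.source):
            continue
        if rule.require_up and rule.link not in links_up:
            continue  # preferred link is down, fall through to the next rule
        return rule.link
    return None  # no policy matched; normal routing would apply


if __name__ == "__main__":
    rules = [
        Rule("10.0.1.0/24", "P", require_up=True),   # A && P available -> P
        Rule("10.0.1.0/24", "F", require_up=False),  # A -> fallback F
    ]
    print(select_link("10.0.1.5", rules, links_up={"P", "F"}))
    print(select_link("10.0.1.5", rules, links_up={"F"}))
```

Each additional link adds rules to every source group's chain, which is one way to see why link addition can force a redesign of the whole policy.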

Configuration can get pretty complex, though, and link addition can
require redesign of the whole policy.

  - None of the usual debugging tools work properly

No, but then, they can't expect the usual debugging tools to work in an
unusual scenario. You may need to develop some new tools and teach them
how to use them.

  - Adding a new user is complicated because it has to be done in (at
    least) two places

With a good design this burden can be significantly lowered to the point
of being not 100% but 80 or 90% effective, so to speak. Consider a good
topology and a good addressing plan.

It doesn't necessarily have to be that complex OR brittle.

I would suggest the use of recursive next-hop with PBR to the loopback
/32 of a peer router that is not associated with a directly connected
network.

If that /32 route happens to be down, then the recursive lookup of
the next-hop results in a lookup of the default route.
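
A toy model of that behaviour, assuming the PBR next-hop is resolved by an ordinary longest-prefix match against the routing table. The table contents, exit names, and helper functions are hypothetical, and real platforms differ in how they verify next-hop reachability.

```python
import ipaddress


def longest_match(table, address):
    """Return the exit for the most specific prefix containing the address."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, exit_via in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, exit_via)
    return best[1] if best else None


def resolve_pbr_next_hop(table, next_hop):
    """Resolve the policy's next-hop recursively through the routing table."""
    return longest_match(table, next_hop)


if __name__ == "__main__":
    peer_loopback = "192.0.2.1"  # hypothetical /32 loopback of the peer router
    table = {
        "0.0.0.0/0": "default-uplink",
        "192.0.2.1/32": "preferred-uplink",
    }
    print(resolve_pbr_next_hop(table, peer_loopback))
    del table["192.0.2.1/32"]  # the /32 is withdrawn when the peer goes away
    print(resolve_pbr_next_hop(table, peer_loopback))
```

The policy itself never changes; only the presence or absence of the /32 decides where matching traffic ends up.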

Yeah.

Just do it in private and wash your hands afterwards.

(Sorry, but a Lazarus Long quote seemed appropriate.)

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

In my opinion the main problems with this are:

  - It's brittle, when a line fails, traffic doesn't re-route

You can always know what IPs are on the other end of the link, add static
routes for them to make sure they're reachable and based on ping results
use the link or not. It works fairly well if 1-2 minutes of downtime is not
an issue. I've done this using Linux and a bash script and it worked to
balance traffic across two links with up/down detection. iproute2 does
wonders.
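
The up/down detection part can be sketched as a small state machine that only flips a link's state after several consecutive probe results disagree with it. This is not the bash script mentioned above; the `LinkMonitor` name and the thresholds are assumptions, and the probe itself (for example a ping sent out of the specific link) is left to the caller.

```python
class LinkMonitor:
    """Track one link's state from a stream of probe results."""

    def __init__(self, fail_after=3, recover_after=2):
        self.fail_after = fail_after        # failed probes before marking down
        self.recover_after = recover_after  # good probes before marking up
        self.up = True
        self._streak = 0  # consecutive results contradicting the current state

    def observe(self, probe_ok):
        """Feed one probe result; return True if the link state changed."""
        if probe_ok == self.up:
            self._streak = 0
            return False
        self._streak += 1
        needed = self.recover_after if probe_ok else self.fail_after
        if self._streak >= needed:
            self.up = probe_ok
            self._streak = 0
            return True
        return False


if __name__ == "__main__":
    monitor = LinkMonitor()
    for result in [True, False, False, False, True, True]:
        changed = monitor.observe(result)
        print(result, monitor.up, changed)
```

With, say, a 30-second probe interval and `fail_after=3`, a dead link would be noticed after roughly 90 seconds, which is in line with the 1-2 minutes of downtime described above.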

  - None of the usual debugging tools work properly

As long as you don't have asymmetric routing in place, debugging will be
the same. Even so, you can (at least on Linux) do a "tcpdump -i any" and
see what goes in/out of your box :)

  - Adding a new user is complicated because it has to be done in (at
    least) two places

I agree it's not scalable, but for when all you have are DSL lines or low
capacity lines over which you cannot run an IGP, you'll have to make it work
with what you have :)

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

I would go for the "right tools for the right job" idea and say that PBR in
the case you're mentioning is a valid use and probably the most effective
way of doing business for them.

Also take into consideration that in many parts of the world, the effort of
configuring and maintaining a setup like this falls within the day-to-day
job of one or several network admins. Also, most of the time it is cheaper
to hire more people than to go and buy, let's say, professional networking
equipment.

Regards,
Eugeniu

I'm having a discussion with a small network in a part of the world
where bandwidth is scarce and multiple DSL lines are often used for
upstream links. The topic is policy-based routing, which is being
described as "load balancing" where end-user traffic is assigned to a
line according to source address.

In my opinion the main problems with this are:

- It's brittle, when a line fails, traffic doesn't re-route

You can always know what IPs are on the other end of the link, add static
routes for them to make sure they're reachable and based on ping results
use the link or not. It works fairly well if 1-2 minutes of downtime is not
an issue. I've done this using Linux and a bash script and it worked to
balance traffic across two links with up/down detection. iproute2 does
wonders.

Or you could run FreeBSD with PF and ifstated and it would be an almost instantaneous failover.

- None of the usual debugging tools work properly

As long as you don't have asymmetric routing in place, debugging will be
the same. Even so, you can (at least on Linux) do a "tcpdump -i any" and
see what goes in/out of your box :)

Asymmetric routing is a fact of life and is fairly common.

- Adding a new user is complicated because it has to be done in (at
   least) two places

I agree it's not scalable, but for when all you have are DSL lines or low
capacity lines over which you cannot run an IGP, you'll have to make it work
with what you have :)

But I'm having a distinct lack of success locating rants and diatribes
or even well-reasoned articles supporting this opinion.

I would go for the "right tools for the right job" idea and say that PBR in
the case you're mentioning is a valid use and probably the most effective
way of doing business for them.

Also take into consideration that in many parts of the world, the effort of
configuring and maintaining a setup like this falls within the day-to-day
job of one or several network admins. Also, most of the time it is cheaper
to hire more people than to go and buy, let's say, professional networking
equipment.

Hmm, really? The professional networking equipment required for this type of thing would be in the ~10k range new and significantly cheaper used. That's not a lot of salary.

Mike