BGP Design question.

Bret_Palsson · June 22, 2011, 10:27pm

Here is my current setup in ASCII art. (Please view in a fixed width font.) Below the art I'll write out the setup.

     +--------+    +--------+
     | Peer A |    | Peer A |  <-Many carriers. Using 1 carrier
     +---+----+    +----+---+    for this scenario.
         |eBGP          | eBGP
         |              |
     +---+----+iBGP+----+---+
     | Router +----+ Router |  <-Netiron CERs Routers.
     +-+------+    +------+-+
       |A   `.P    A.'    |P   <-A/P indicates Active/Passive
       |      `.  .'      |      link.
       |        ::        |
     +-+------+'  `+------+-+
     |Act. FW |    |Pas. FW |  <-Firewalls Active/Passive.
     +--------+    +--------+

To keep this scenario simple, I'm multihoming to one carrier.
I have two NetIron CERs. Each has an eBGP connection to the same peer.
The CERs have an iBGP connection to each other.
That all works fine and dandy. Feel free to comment, however, if you think there is a better way to do this.

Here comes the tricky part. I have two firewalls in an Active/Passive setup. When one fails, the other, which is configured exactly the same,
picks up where it left off. (Yes, all the sessions etc. are actively mirrored between the devices.)

I am using OSPFv2 between the CERs and the firewalls. Failover works just fine; however, when I fail an OSPF link that has the active default route, ingress traffic still routes fine and dandy, but egress traffic doesn't. OSPF on both NetIrons is set up to advertise them as the default route.

What I'm wondering is whether OSPF is the right solution for this. How do others solve this problem?

Thanks,

Bret

Note: Since IPv6 has been a hot topic lately, I'll state that once we get BGP all figured out and working properly, IPv6 is our next project.

Brant_I_Stevens1 · June 22, 2011, 10:33pm

> What I'm wondering is, if OSPF is the right solution for this. How do
> others solve this problem?

You could also run an eBGP session through the firewall, between the outside
routers and routers on the inside of the firewall, passing only the default
route to the inside routers.

Owen_DeLong · June 22, 2011, 10:44pm

I would suggest running VRRP on the routers towards the firewalls and using OSPF only
to advertise the ingress routes. Statically route default to the VRRP group.

Implemented as follows:

[RA]------[switch]-----[switch]------[RB]
             |            |
           [AFW]        [PFW]

Make sense?

AFW/PFW advertise the interior routes via OSPF so that RA/RB know how to reach
them, but RA/RB don't have to advertise anything, and AFW/PFW have static
default routes to a VRRP group address shared between RA/RB.
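
A minimal sketch of why this helps, written in Python as a toy model rather than router configuration. It assumes a simplified election (highest priority among live routers wins; real VRRP also has timers, preemption settings and an address tie-break), and the priorities and the VIP address are hypothetical placeholders:

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int      # higher priority wins the election
    up: bool = True


def vrrp_master(group):
    """Return the live router with the highest priority, or None if all are down."""
    alive = [r for r in group if r.up]
    return max(alive, key=lambda r: r.priority, default=None)


# Hypothetical values: names follow the diagram above, the address is a
# documentation-range placeholder.
VIP = "192.0.2.1"
ra = Router("RA", priority=110)
rb = Router("RB", priority=100)
group = [ra, rb]

# The firewalls' only route toward the Internet is a static default to the VIP.
firewall_default_next_hop = VIP

before = vrrp_master(group).name   # RA answers for the VIP
ra.up = False                      # RA, or its link toward the firewalls, fails
after = vrrp_master(group).name    # RB takes over the same address

print(f"default via {firewall_default_next_hop}: {before} -> {after}")
```

The point of the model is that the firewalls' next hop never changes; only the router answering for that address does.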

If you want to make OSPF work, then, try making sure you have default-information originate always
on both RA and RB.

Owen

Bandy_Rush1 · June 22, 2011, 11:02pm

VRRP?

Ingo_Flaschberger2 · June 22, 2011, 11:07pm

Hi Bret,

> I am using OSPFv2 between the CERs and the Firewalls. Failover works just fine, however when I fail an OSPF link that has the active default route, ingress traffic still routes fine and dandy, but egress traffic doesn't. Both Netiron's OSPF are setup to advertise they are the default route.

Linux firewall?
Have you disabled rp_filter?

> What I'm wondering is, if OSPF is the right solution for this. How do others solve this problem?

I do something similar with FreeBSD. Always make sure the backbone area 0.0.0.0 does not break into two parts; perhaps use an extra link between the two firewalls just for this.
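
A rough way to reason about the partition risk is a plain connectivity check on the area 0 links. The sketch below is Python, and the four-node topology in it is a hypothetical simplification for illustration, not the topology from the original post:

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: True if every node is reachable from the first one."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == set(nodes)


# Hypothetical backbone: each router has one link to "its" firewall.
nodes = ["RA", "RB", "FW1", "FW2"]
base = {("RA", "RB"), ("RA", "FW1"), ("RB", "FW2")}
failed = ("RA", "RB")

# Without an extra link, losing RA-RB splits the area into two halves.
without_extra = is_connected(nodes, base - {failed})
# With a FW1-FW2 link, the same failure leaves the area in one piece.
with_extra = is_connected(nodes, (base | {("FW1", "FW2")}) - {failed})

print(without_extra, with_extra)
```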

Kind regards,
Ingo Flaschberger

Hammer · June 22, 2011, 11:11pm

Another option would be to insert switches between your routers and FWs. Run OSPF from the routers to the switches (yes, switches running L3 OSPF) and then HSRP/VRRP/etc. to the FWs. This way routing changes don't affect the FWs. The FWs simply have a default route to the HSRP/VRRP/etc. VIP. Then the primary switch routes to the routers, which then route out to their eBGP peers. The only caveat is to make sure you are redistributing only 0/0 into OSPF, not the full route table.

-Hammer-

William_Cooper · June 22, 2011, 11:22pm

Couple of questions for clarification (inline):

> Here is my current setup in ASCII art. (Please view in a fixed width font.) Below the art I'll write out the setup.
>
> +--------+    +--------+
> | Peer A |    | Peer A |  <-Many carriers. Using 1 carrier
> +---+----+    +----+---+    for this scenario.
>     |eBGP          | eBGP
>     |              |
> +---+----+iBGP+----+---+
> | Router +----+ Router |  <-Netiron CERs Routers.
> +-+------+    +------+-+
>   |A   `.P    A.'    |P   <-A/P indicates Active/Passive
>   |      `.  .'      |      link.
>   |        ::        |
> +-+------+'  `+------+-+
> |Act. FW |    |Pas. FW |  <-Firewalls Active/Passive.
> +--------+    +--------+

(Tony) What's behind this point?

> I am using OSPFv2 between the CERs and the Firewalls. Failover works just fine, however when I fail an OSPF link that has the active default route, ingress traffic still routes fine and dandy, but egress traffic doesn't. Both Netiron's OSPF are setup to advertise they are the default route.

(Tony) (Apologies for the seemingly dumb question) but by egress, do
you mean from behind the FW towards your carrier?

PC11 · June 22, 2011, 11:33pm

Who makes the firewall?

To make this work and be "hitless", your firewall vendor must support
stateful replication of routing protocol data (including OSPF). For
example, Cisco didn't support this in their ASA product until version 8.4 of
code.

Otherwise, a failover requires OSPF to re-converge -- and quite frankly,
will likely cause some state of confusion on the upstream OSPF peers, loss
of adjacency, and a loss of routing until this occurs. It's like someone
just swapped a router with the same IP to the upstream device -- assuming
your active/standby vendor's implementation only presents itself as one
device.

However, once this is successful, your current failover topology should work
fine -- even if it takes some time to fail over.

In my opinion though, unless the firewall is serving as "transit" to
downstream routers or other layer 3 elements, and you need to run OSPF to it
(and through it) as a result, it's often just easier to point a static default
route out from the firewall(s) and redistribute a static route on the upstream
routers for the subnets behind the firewalls. It also helps ensure
symmetrical traffic flows, which is important for stateful firewalls and can
become moderately confusing when your firewalls start having many interfaces.
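
To make the static-route idea concrete, here is a small Python sketch of the two routing tables involved, using longest-prefix match. Every prefix and address is a placeholder from the documentation ranges, and the string "carrier" merely stands in for whatever the BGP-learned routes point at:

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: return the next hop of the most specific matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# All values below are hypothetical placeholders.
INSIDE = ipaddress.ip_network("198.51.100.0/24")   # subnet behind the firewalls
FW_OUTSIDE = "192.0.2.10"                          # address the firewall pair presents upstream
ROUTER_SIDE = "192.0.2.1"                          # upstream router (or VIP) facing the firewalls
DEFAULT = ipaddress.ip_network("0.0.0.0/0")

firewall_routes = [
    (DEFAULT, ROUTER_SIDE),      # static default out of the firewall
]
router_routes = [
    (INSIDE, FW_OUTSIDE),        # static route for the inside subnet, redistributed upstream
    (DEFAULT, "carrier"),        # stands in for the BGP-learned routes
]

egress_hop = lookup(firewall_routes, "203.0.113.7")     # inside host -> Internet
ingress_hop = lookup(router_routes, "198.51.100.25")    # Internet -> inside host

print(egress_hop, ingress_hop)
```

Both directions of a flow cross the same firewall address pair, which is the symmetry a stateful firewall needs.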

Hammer · June 22, 2011, 11:37pm

Do people really run routing protocols with their public address space on their FWs? I'm not saying right or wrong. Just curious. Seems like the last thing I would want to do would be to have my FW participate in a routing protocol unless it was absolutely necessary. Better to static the FW with a default route? I'd love to hear arguments for or against....

-Hammer-

William_Herrin · June 22, 2011, 11:42pm

> I am using OSPFv2 between the CERs and the Firewalls.
> Failover works just fine, however when I fail an OSPF link
> that has the active default route, ingress traffic still routes
> fine and dandy, but egress traffic doesn't. Both Netiron's
> OSPF are setup to advertise they are the default route.

Hi Bret,

I have a setup that is almost identical except there is a pair of
simple switches between the routers and firewalls interconnecting all
into a LAN and I'm working with Cisco 2811's instead of Netiron CERs.
Can you expand on the interface addressing and what the firewalls see
via OSPF during your failure scenario?

> What I'm wondering is, if OSPF is the right solution for
> this. How do others solve this problem?

My failover firewall also connects to the switches (inside and out)
and turns down ports which connect to the primary firewall. During a
failure, the primary can't be depended on to completely take itself
out of line. If it was in a working state that could be depended on,
it wouldn't have failed.

Regards,
Bill Herrin

Bret_Palsson · June 23, 2011, 1:04am

Couple of questions for clarification (inline):

> Here is my current setup in ASCII art. (Please view in a fixed width
> font.) Below the art I'll write out the setup.
>
> +--------+    +--------+
> | Peer A |    | Peer A |  <-Many carriers. Using 1 carrier
> +---+----+    +----+---+    for this scenario.
>     |eBGP          | eBGP
>     |              |
> +---+----+iBGP+----+---+
> | Router +----+ Router |  <-Netiron CERs Routers.
> +-+------+    +------+-+
>   |A   `.P    A.'    |P   <-A/P indicates Active/Passive
>   |      `.  .'      |      link.
>   |        ::        |
> +-+------+'  `+------+-+
> |Act. FW |    |Pas. FW |  <-Firewalls Active/Passive.
> +--------+    +--------+

(Tony) What's behind this point?

We have a few gigs of voice (RTP) traffic at any given time of the day. We
want/need hitless failover. Currently we provide this, but we use our
provider's BGP mix. We will be peering with many carriers directly now and
are changing our topology to do so. Before, we had an HSRP L3 hand-off to two
switches in the same VLAN. On our Juniper SSGs we bonded ports and used
NSRP for all the RTOs, which provided hitless failover.

> I am using OSPFv2 between the CERs and the Firewalls. Failover works just
> fine, however when I fail an OSPF link that has the active default route,
> ingress traffic still routes fine and dandy, but egress traffic doesn't.
> Both Netiron's OSPF are setup to advertise they are the default route.

(Tony) (Apologies for the seemingly dumb question) but by egress, do
you mean from behind the FW towards your carrier?

Yes.

Bret_Palsson · June 23, 2011, 1:07am

> Who makes the firewall?

Juniper SSG. We use NSRP and replicate all the RTOs. We have had hitless
failover on the firewalls for years. We're now peering with our own carriers
vs. using our datacenter's mix.

A static route from the Junipers to the VIP (VRRP) is probably the way to
go, I think.


Jason_Roysdon · June 23, 2011, 3:42am

I second the static routes, especially from a simplicity standpoint. Add
in a pair of layer two switches to simplify further:

     +--------+    +--------+
     | Peer A |    | Peer A |  <-Many carriers. Using 1 carrier
     +---+----+    +----+---+    for this scenario.
         |eBGP          | eBGP
         |              |
     +---+----+iBGP+----+---+
     | Router +    + Router |  <- Routers. Not directly connected
     +-+------+    +------+-+
       |                  |
     +-+------+    +------+-+
     |L2Switch|----|L2Switch|  <- Layer 2 switches, can be stacked
     +--------+    +--------+
       |                  |
     +-+------+    +------+-+
     |Act. FW |----|Pas. FW |  <-Firewalls Active/Passive.
     +--------+    +--------+

You can lose all of the left leg, or all of the right leg, and still be
up. If you want to complicate things, you can add crossing links
between it all, but again, beyond BGP and VRRP, this is a very simple
design you can easily troubleshoot at 3AM. It's also much easier to
document the troubleshooting steps (so you can go on vacation and
someone else can solve without calling you) and test upgrades.

You can nearly evenly split the traffic by having a VRRP VIP on each
edge router, with the other router backing up the first. The firewalls
can have two static routes, one to each VIP, and this will roughly
load-balance the traffic out on a packet basis. As you peer with the
same ISP, this will work just fine. If they have an outage, your edge
routers will learn, and even if the circuit drops it'll know, and
basically the VIP will just redirect traffic to the other router.
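
A toy Python model of the two-VIP arrangement, under stated assumptions: the priorities are hypothetical, the election is simply "highest priority among live routers", and the firewall is modelled as alternating packets strictly between its two static routes (a real firewall may well hash per flow instead):

```python
def owner(vip_priorities, up):
    """Pick the live router with the highest priority for one VIP."""
    alive = {r: p for r, p in vip_priorities.items() if r in up}
    return max(alive, key=alive.get) if alive else None


# Hypothetical priorities: each router is preferred for one VIP and backs up the other.
vips = {
    "VIP1": {"RA": 110, "RB": 100},
    "VIP2": {"RA": 100, "RB": 110},
}


def spread(packets, up):
    """Alternate packets across the two static routes and count where they land."""
    counts = {}
    names = sorted(vips)
    for i in range(packets):
        router = owner(vips[names[i % len(names)]], up)
        counts[router] = counts.get(router, 0) + 1
    return counts


both_up = spread(1000, {"RA", "RB"})   # traffic splits across both routers
ra_down = spread(1000, {"RB"})         # both VIPs land on the surviving router

print(both_up, ra_down)
```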

Now all your firewalls have to do is maintain stateful session
information, not OSPF.

If you had two different ISPs (especially if they are not roughly evenly
connected), then not having intelligence of the BGP paths in your
firewalls can cause an extra hop when traffic hits the router with the
longer path, which will redirect it to the router with the shorter path.

Speaking from a Cisco/HSRP point of view, you could be more intelligent
(read: more complicated, and complication means harder troubleshooting and
more documentation needed) during problem periods by having the VIP move
routers automatically based on the WAN link dropping and/or a route
beyond it being lost (others can comment on whether VRRP supports this).
This would save one hop to the "broken" router when the BGP path or WAN
is down.
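
The tracking idea can be sketched as a priority decrement. This is a Python illustration only: the base priorities and the decrement of 20 are hypothetical, and it assumes the standby router is allowed to preempt once its priority is the higher one:

```python
def effective_priority(base, tracked_up, decrement):
    """Lower the configured priority when the tracked WAN link or route is down."""
    return base if tracked_up else base - decrement


def vip_owner(routers):
    """routers maps name -> (base priority, tracked object up?, decrement)."""
    scores = {name: effective_priority(*cfg) for name, cfg in routers.items()}
    return max(scores, key=scores.get)


# Hypothetical configuration: RA is normally preferred.
healthy = {"RA": (110, True, 20), "RB": (100, True, 20)}
# RA's WAN link (or the tracked route beyond it) goes away: 110 - 20 = 90 < 100.
wan_down = {"RA": (110, False, 20), "RB": (100, True, 20)}

print(vip_owner(healthy), vip_owner(wan_down))
```

The decrement has to be larger than the gap between the two base priorities, otherwise the VIP stays on the router with the broken uplink.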

Jason Roysdon

PC11 · June 23, 2011, 4:04am

A quick Google search says you should be OK with ScreenOS 6.0 or later for
the routing protocol replication.

I'm looking at your diagram again though. You will want a switch in the
middle of your Firewalls and routers, as the firewalls are in an
active/standby mode and do not independently run OSPF. And in this case,
throw them all on one vlan, and let them peer with each other (2x1). This
could actually be your problem.

Nonetheless, I agree: why involve it in OSPF and make it complex if
there's no real need to? I think your static route idea is the best way to
go, given the FW supports presenting itself as a "single" entity.

Hank_Nussbacher1 · June 23, 2011, 6:02am

Let me be a bit of a heretic here. How often does your router fail? Or your firewall? In the 25 years I have gone into customers I have found when they did a cross setup as proposed below by Bret and Jason, only one person truly knew the complete setup and if something broke only he was able to fix it. There is never complete printed documentation: routing design, IPs on all interfaces, subnetting schematic, etc. And if there was at one point, after 2 years it was outdated and never updated and only the *1* guy knew the changes in his head.

In that kind of situation, when something stopped working they always had to call in the "guru" to fix it. On the other hand, a simple design of only *one* path (pick either left or right side of each of the ASCII arts), made it possible that even junior network engineers as well as technicians called in on emergency with 4 hours notice, were able to fix the situation much more quickly than the "cross" design. And the MTBF on a single path solution, IMHO, is around 3-4 years. And if you need redundancy, keep a spare box on a shelf, completely loaded with the latest config so that it can be hot-swapped in within 15 minutes of failure.
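
For a rough sense of scale, the figures above can be plugged into the usual steady-state availability formula, MTBF / (MTBF + MTTR). The Python sketch below assumes the 15-minute swap is the entire repair time, which is an idealisation:

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_years, mttr_minutes):
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    mtbf_h = mtbf_years * HOURS_PER_YEAR
    mttr_h = mttr_minutes / 60
    return mtbf_h / (mtbf_h + mttr_h)


def downtime_minutes_per_year(mtbf_years, mttr_minutes):
    """Average yearly downtime implied by the availability figure."""
    return (1 - availability(mtbf_years, mttr_minutes)) * HOURS_PER_YEAR * 60


for years in (3, 4):
    a = availability(years, 15)
    d = downtime_minutes_per_year(years, 15)
    print(f"MTBF {years} y: availability {a:.6%}, about {d:.1f} min/year on average")
```

Averaged out, that is only a few minutes per year, although in practice it arrives as one 15-minute outage every few years rather than as a steady trickle.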

This 1-path design is not for everyone. The vendors always recommend the "cross" design since they sell 2x the amount of boxes but I have found that life works fine with just a 1-path design as well.

-Hank

Bret_Palsson · June 23, 2011, 6:07am

That's fine if you are running a website. When it comes to telecommunications, a 15 minute outage is pretty huge. Especially with certain types of customers: emergency services for example.

-Bret

Hammer · June 23, 2011, 12:44pm

Agreed. At an enterprise level, there is no need to risk extended downtime to save a buck or two. Redundant hardware is always a good way to keep Murphy out of the equation. And as far as hardware failures go, it's not that common. Nowadays it's the bugs in overly complicated code on your gear that get you first. I miss IOS 11.3.....

-Hammer-

Valdis_Kletnieks · June 23, 2011, 1:59pm

So what you're saying is we're more likely to take an outage due to tripping
over a bug, so we should go for the simplest non-crossover config to minimize
the chances of hitting a bug.

Hammer · June 23, 2011, 3:44pm

HaHa! I agree with keeping it simple. I keep my routers simple. I keep my switches simple. Sometimes it's not as easy on a Layer 7 FW or a load balancer. So plan accordingly.

-Hammer-

George_Bonser · June 23, 2011, 5:04pm

> I am using OSPFv2 between the CERs and the Firewalls. Failover works
> just fine, however when I fail an OSPF link that has the active default
> route, ingress traffic still routes fine and dandy, but egress traffic
> doesn't. Both Netiron's OSPF are setup to advertise they are the
> default route.
>
> What I'm wondering is, if OSPF is the right solution for this. How do
> others solve this problem?
>
> Thanks,
>
> Bret

Man, I would have a lot of questions. The CERs are layer 2/3 switches.
What is the topology and how are you "failing" the link? Are the links
to the firewalls on a VLAN with the interfaces being a ve on the CERs, or
are the interfaces to the firewalls "route-only"? Is that VLAN trunked
across on the link between the two switches? How are you failing it
over? There are lots of "failover" things you could be doing (turning
off the left router, turning off the left firewall, disabling the
primary port from the left router to the left firewall). When you say
it doesn't work, are you saying that it doesn't work if you disable the
port from the left router to the left firewall, or are you saying it
doesn't work when the right firewall takes over from the left, or what?

There are so many subtle configuration possibilities with these units
that just given a wiring diagram without also seeing the config makes it
hard to help.

I am guessing that the connections to the firewalls are not MCT cluster
trunks because you can't run layer3 routing protocols with MCT (yet) on
the CERs. Is it link failover or device failover that isn't working?