Question about EX - SRX redundancy

Anurag_Bhatia · April 2, 2015, 2:12pm

Hello everyone!

I have two Juniper EX series switches (in a virtual chassis) and two SRX
devices in a native chassis cluster.

I am trying to build highly available connectivity between them with at least
2Gbps of capacity at all times, but so far I am failing. I followed Juniper's
official page here
<http://kb.juniper.net/InfoCenter/index?page=content&id=KB22474> as well as
this detailed forum thread here
<http://forums.juniper.net/t5/SRX-Services-Gateway/Best-way-of-redundancy-between-SRX-and-EX/td-p/181365>.

I want the devices to be connected criss-cross. Following the documentation,
I end up with two ae bundles on the EX side and a single reth bundle on the
SRX side. Both ae bundles on the EX side have identical configuration, and
both ae interfaces are members of the VLAN.

If I do not go for criss-cross connectivity and cable it like this:

EX0 (ae1) >> Two Patches to SRX0 (reth1)
EX1 (ae2) >> Two Patches to SRX1 (reth1)

Then it all works and redundancy is fine. In this case, as long as at least 1
of the 4 patch cables is connected, connectivity stays up, but this has the
trade-off that if one EX goes down then I cannot make use of the corresponding SRX.

If I do criss-cross connectivity, something like:

EX0 (ae1) >> Two Patches to SRX0 (reth1)
EX0 (ae1) >> One patch to SRX1 (reth1)

EX1 (ae2) >> Two Patches to SRX1 (reth1)
EX1 (ae2) >> One patch to SRX0 (reth1)

In this config the system behaves very oddly: one ae bundle (and its
corresponding physical ports) works well, while failover to the other ae
bundle fails completely.

I was wondering if someone could point me in the right direction here.

Appreciate your time and help!

Bill_Blackford · April 2, 2015, 2:29pm

It's my understanding that a cross chassis LAG is not supported. If there is a way, I'm not aware of it. I'm running the same set up as your working example in my locations and for now, this suits my requirements.

Anurag_Bhatia · April 2, 2015, 3:17pm

Hi

I thought cross-chassis LAG is supported through the use of reth bundles at
the SRX end. I read that this is basically the major difference between reth
and ae bundles on SRX.

Interesting factor here is that ae bundles can spread across multiple EX
chassis in a virtual chassis environment but this cannot be the case with
ae bundles in SRX.

Thanks.

Hugo_Slabbert1 · April 2, 2015, 3:51pm

In:

> EX0 (ae1) >> Two Patches to SRX0 (reth1)
> EX1 (ae2) >> Two Patches to SRX1 (reth1)

with:

> that if one EX goes down then I cannot make use of other corresponding SRX.

Do you mean that e.g. if SRX0 is the chassis cluster primary and EX0 goes down, then you can't use SRX0, but you would like to be able to survive EX0 going down *without* failing over the SRX chassis cluster to SRX1?

Anurag_Bhatia · April 2, 2015, 6:20pm

Hi

Yes.

SRX0 is connected only to EX0, and SRX1 only to EX1. Thus either pair 0
works or pair 1 works. If criss-crossing worked, the failure of one EX
would still leave both SRXs available.

In the current setup the worst case is that a failure of EX0 together with
SRX1 causes a full outage.

Thanks.

Hugo_Slabbert1 · April 2, 2015, 6:55pm

Putting the EXs in a VC and splitting your AEs across the 2x VC members takes care of that.

EXVC (ae1) >> Two Patches to SRX0 (reth1)
EXVC (ae2) >> Two Patches to SRX1 (reth1)

...where EXVC is a VC composed of EX0 and EX1, and ae1 and ae2 both have one member interface from each VC member.

In a failure of EX0 or EX1, your throughput on ae1 and ae2 halves as they each lose a LAG member, but both SRX0 and SRX1 are still reachable.
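To make the arithmetic above concrete, here is a minimal Python sketch of that layout. It is only a model, not Junos configuration: the 1 Gbps per-link figure and the helper names `ae_capacity` and `reachable_srx` are assumptions made up for illustration.

```python
# Hypothetical model of the layout above: each AE on the EX VC has one
# member port on each VC member, and each AE is dedicated to one SRX node.
# Tuples are (vc_member, ae_name, srx_node); all names are illustrative.
LINKS = [
    ("EX0", "ae1", "SRX0"),
    ("EX1", "ae1", "SRX0"),
    ("EX0", "ae2", "SRX1"),
    ("EX1", "ae2", "SRX1"),
]
LINK_GBPS = 1  # assumed per-link speed


def ae_capacity(links, failed_members=()):
    """Return {ae_name: surviving Gbps} after the given VC members fail."""
    capacity = {}
    for member, ae, _srx in links:
        capacity.setdefault(ae, 0)
        if member not in failed_members:
            capacity[ae] += LINK_GBPS
    return capacity


def reachable_srx(links, failed_members=()):
    """Return the set of SRX nodes that still have at least one live link."""
    return {srx for member, _ae, srx in links if member not in failed_members}


if __name__ == "__main__":
    print(ae_capacity(LINKS))                     # {'ae1': 2, 'ae2': 2}
    print(ae_capacity(LINKS, {"EX0"}))            # {'ae1': 1, 'ae2': 1}
    print(sorted(reachable_srx(LINKS, {"EX0"})))  # ['SRX0', 'SRX1']
```

With EX0 removed, each AE drops from 2 to 1 Gbps in this model, but both SRX nodes keep a live link.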

Anurag_Bhatia · April 2, 2015, 9:11pm

Hi

I tried exactly that. Note: it's ae18 and ae20 on the EX side and reth4 on
the SRX side.

It worked initially, but when I took down ae18 (i.e. ae18 is disabled), this
is what I get on ae20:

show interfaces ae20
Physical interface: ae20, Enabled, Physical link is Up
  Interface index: 533, SNMP ifIndex: 924
  Link-level type: Ethernet, MTU: 1514, Speed: 2Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0

on reth4 on SRX I am getting:

show interfaces reth4
Physical interface: reth4, Enabled, Physical link is Down
  Interface index: 132, SNMP ifIndex: 696

Any idea why? All physical ports are up (none is shut), and the only thing
I shut is one of the ae bundles. Also, if I disable the associated physical
ports rather than disabling ae18, the behavior is just the same, i.e. reth4
goes down.

Thanks for your time and help!

Hugo_Slabbert1 · April 2, 2015, 10:22pm

I just want to confirm your setup.

The "criss-cross" setup you were describing is different from what I described.

You listed:

> EX0 (ae1) >> Two Patches to SRX0 (reth1)
> EX0 (ae1) >> One patch to SRX1 (reth1)
>
> EX1 (ae2) >> Two Patches to SRX1 (reth1)
> EX1 (ae2) >> One patch to SRX0 (reth1)

...meaning that your AEs cannot survive losing either one of the EX VC members, and you're splitting each AE's connectivity across the two SRX chassis cluster members. You need to dedicate an AE to an SRX chassis cluster member.

IOW: ae18 should have one LAG member on EX0 and one member on EX1, and both of those physical ports go to SRX0. Likewise, ae20 should have one LAG member on EX0 and one member on EX1, and both of those physical ports go to SRX1.

When you shut one of the AEs (e.g. ae18) in the setup I describe, you *will* lose connectivity to its corresponding SRX, as those are fate-sharing. You would need to configure interface monitoring on the chassis cluster to flip over the primary to 2nd SRX in order to survive that, since the second AE (ae20) that is tied to the 2nd SRX is still up.

Your failure modes are:

e.g. 1: lose an EX, you lose the throughput that's being contributed to the AE by that VC member's ports, but both SRXs remain available and the primary shouldn't flip (provided your node priorities and interface-monitoring weights are set accordingly).

e.g. 2: shut an AE (which spans both EX VC members), one SRX goes dark since you've killed the AE that's dedicated to it, and the primary will need to flip (either through interface monitoring or manually) in order for the setup to remain online.
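The interface-monitoring weights mentioned above can be sketched in Python. As far as I know, a redundancy group starts with a threshold of 255, each monitored interface that goes down subtracts its configured weight, and the group fails over once the threshold reaches zero. The interface names and the weights of 128 below are assumptions chosen for illustration, and the sketch models only the weight arithmetic, not LACP or link-state detection.

```python
RG_THRESHOLD = 255  # starting threshold of a chassis cluster redundancy group

# Hypothetical child links of the reth on the primary node. The weights are
# assumed values, chosen so that losing one link is tolerated but losing
# both exhausts the threshold.
WEIGHTS = {"ge-0/0/4": 128, "ge-0/0/5": 128}


def rg_fails_over(weights, down_interfaces):
    """Return True if the weights of the down interfaces exhaust the threshold."""
    lost = sum(weights[ifname] for ifname in down_interfaces)
    return RG_THRESHOLD - lost <= 0


if __name__ == "__main__":
    # Failure mode 1: one EX lost, so one child link on the primary is down.
    print(rg_fails_over(WEIGHTS, ["ge-0/0/4"]))              # False
    # Failure mode 2: the whole AE is shut, so both child links are down.
    print(rg_fails_over(WEIGHTS, ["ge-0/0/4", "ge-0/0/5"]))  # True
```

Setting each weight to 255 instead would make the loss of a single link enough to flip the primary, which is usually not what you want with a two-member LAG per node.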

Randy1 · April 3, 2015, 3:53am

I've started to get messages today from Google claiming that my computer or network was sending automated queries, and they are blocking me.
I'm not sending automated queries. I've logged all of my outbound traffic, and there is only my browser traffic going to Google.

I'm not responsible for anyone else on "my network" since it is owned by my ISP, and blocking me solely because of what someone else with a nearby IP address is doing is not an acceptable practice for an address used for personal web browsing.
I would like to know if there is any way to get in contact with Google about this other than by legal means?

Pedro_Cavaca · April 3, 2015, 10:39am

https://support.google.com/websearch/answer/86640?hl=en

_Matt_Palmer · April 3, 2015, 9:53pm

Or, to answer your question more simply: "No".

- Matt

Joe · April 3, 2015, 10:14pm

Maybe - 1 (650) 253-0000? At least that's what comes up when I google
google for google's phone number.... Of course, the more apparent course
of action would be to follow the directions and contact your ISP. I
highly doubt they'd be trying to block some IP address that's close to
yours and accidentally block the wrong one. I know the dude that does
all the blocking and it's not likely he fat-fingered it. He doesn't
make mistakes like that unless you made him really mad....

Happy Friday All!
-Joe

Fred · April 3, 2015, 10:28pm

I need to contact a Google network admin as well. We are having some serious issues with our clients reaching Google services.

Pedro_Cavaca · April 3, 2015, 10:54pm

> Or, to answer your question more simply: "No".

That completely mischaracterizes my answer.

Christopher_Morrow · April 4, 2015, 1:16am

it always helps to provide more data in your request ...

Eduardo_Schoedler1 · April 4, 2015, 5:52am

Inoc-dba?

Randy1 · April 5, 2015, 8:01am

> Randy, you can just use the contact details on their page about it:
>
> "Unusual traffic from your computer network" - Google Search Help
>
> Ask them for the netflow or other source of proof. My understanding was they blocked on /32s not larger subnets which would indicate that the traffic is coming from your network, and not someone with a similar address, but you should be able to check once they give you the info.
>
> Hope that helps a bit
> Lou

I've gotten a new IP from my ISP and the issue has gone away, but it came with extra cost and inconvenience, as I have to set up my non-Google services again. I'm a developer, and one who's adept at removing malware; I broke my computer twice trying to find any trace of it. Google has not yet responded to the contact form I submitted originally. I've also found several places/forums where people are having the same issue I was. Hopefully it doesn't come back.

Harald_Koch · April 5, 2015, 2:33pm

> Randy, you can just use the contact details on their page about it:
>
> "Unusual traffic from your computer network" - Google Search Help
>
> Ask them for the netflow or other source of proof. My understanding was
> they blocked on /32s not larger subnets which would indicate that the
> traffic is coming from your network, and not someone with a similar
> address, but you should be able to check once they give you the info.

This reply suggests you've never actually used that contact page. Have you
received a response from them?

I get this message about once a month using one or both of my Linode-based
web proxies. Google remains silent; as they say in the contact page: the
process is completely automated and there's nothing mere humans can do
about it.

Bow to our robot overlords.

Anurag_Bhatia · June 13, 2015, 7:39pm

Hello everyone

Just an update to say that I was able to get it working as needed.
Some quick points on building redundancy between Juniper EX and SRX
devices:

   1. Virtual chassis on EX is very different from clustering on SRX, and I
   did not realize this initially.
   2. The key difference is that in a virtual chassis both devices run in a
   stacked config and act as a single device, while the routing engine of
   the primary/master EX is used.
   3. In the case of SRX, only one device runs at a time, and the ports of the
   other SRX (slave) do not accept traffic at all as long as it sees the
   master is up via heartbeat.

So, keeping the above points in mind, I ran 6 cables between the EX and the
SRX. On the SRX side all 6 ports belong to the same reth bundle (3 on SRX-0
and 3 on SRX-1). On the EX side the configuration uses two ae bundles. If
we use the same ae bundle for all 6 ports, a problem comes up: a share of the
traffic will hit SRX-1 (slave/secondary) and be dropped, which is not
desired. Hence we need two ae bundles, say ae1 and ae2. One bundle goes
towards one SRX (say the master, SRX-0) and the other bundle goes towards
the other, SRX-1. Ports in ae1 and ae2 can be distributed across
multiple EX members to ensure redundancy.

So setup can work as:

EX-0 >> 2 patches >> SRX-0
EX-1 >> 2 patches >> SRX-1

EX-0 >> 1 patch >> SRX-1
EX-1 >> 1 patch >> SRX-0

Now all ports on the EX side towards SRX-0 go in ae1, and all ports towards SRX-1 go in ae2.
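As a quick sanity check of that grouping, here is a small Python sketch. It is a model only, not device configuration, and the helper names `build_bundles` and `spans_both_members` are made up for illustration. It takes the six cables listed above, assigns each EX port to an ae bundle according to the SRX node it lands on, and confirms that each bundle has ports on both EX members.

```python
from collections import defaultdict

# The six cables from the layout above, as (ex_member, srx_node) pairs.
CABLES = [
    ("EX-0", "SRX-0"), ("EX-0", "SRX-0"),
    ("EX-1", "SRX-1"), ("EX-1", "SRX-1"),
    ("EX-0", "SRX-1"),
    ("EX-1", "SRX-0"),
]

# One ae bundle per SRX node, as described above.
AE_FOR_SRX = {"SRX-0": "ae1", "SRX-1": "ae2"}


def build_bundles(cables):
    """Group EX-side ports into ae bundles by the SRX node they terminate on."""
    bundles = defaultdict(list)
    for ex_member, srx_node in cables:
        bundles[AE_FOR_SRX[srx_node]].append(ex_member)
    return dict(bundles)


def spans_both_members(bundle):
    """True if the bundle has at least one port on each EX member."""
    return set(bundle) == {"EX-0", "EX-1"}


if __name__ == "__main__":
    for ae, members in sorted(build_bundles(CABLES).items()):
        print(ae, members, spans_both_members(members))
    # ae1 ['EX-0', 'EX-0', 'EX-1'] True
    # ae2 ['EX-1', 'EX-1', 'EX-0'] True
```

Each bundle ends up with three ports spread over both EX members, so losing one EX member leaves both bundles with at least one live port.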

Thanks everyone for help and inputs. Have a good weekend ahead!

Rob_Greenwood · June 13, 2015, 10:12pm

> 3. In the case of SRX, only one device runs at a time, and the ports of the
> other SRX (slave) do not accept traffic at all as long as it sees the
> master is up via heartbeat.

Not entirely accurate. The control plane (routing engine) is only active on one SRX at a time; however, the data plane is active on both. This means that in your configuration, both ae1 and ae2 on the EX will be passing traffic to both SRXs.

It’s also worth noting you can create multiple redundancy groups and split them between both SRXs. If a device fails, all redundancy groups on the failed device will be migrated to the remaining one.

-Rob