Juniper Config Commit causes Cisco Etherchannels to go into err-disable state

I have cases open with both Cisco and Juniper on this, but wanted to see if
anyone else had seen an issue like this because support has no idea.

I have a Juniper QFX5100 core running in Virtual Chassis mode with 4
switches. I have 4 separate stacks of Cisco 3750 switches, each with 2x1Gb
uplinks bundled into its own LACP trunk (4 trunks in total). It has now
happened twice that applying a trunk port config (not an LACP trunk) to a
port that isn't part of any of the LACP trunks causes all 4 of the
Etherchannels on the Cisco stacked switches to go into an err-disable state
with these messages:

Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi1/0/48, putting Gi1/0/48 in err-disable state

Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po17, putting Gi1/0/48 in err-disable state

Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po17, putting Po17 in err-disable state

Mar 14 07:11:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/48, changed state to down

Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi2/0/48, putting Gi2/0/48 in err-disable state (CA-TOR-1-7-2)

Mar 14 07:11:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/0/48, changed state to down

Mar 14 07:11:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel17, changed state to down
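
When this hits several stacks at once, it can help to pull the affected interfaces out of the collected syslog rather than reading it by eye. A minimal Python sketch follows, assuming the messages keep the format quoted above; the function name and the sample text are illustrative only, and the whitespace normalisation is there because mail clients tend to wrap these lines.

```python
import re

# Matches the Cisco IOS %PM-4-ERR_DISABLE message format quoted in this thread.
ERR_DISABLE = re.compile(
    r"%PM-4-ERR_DISABLE: (?P<cause>\S+) \((?P<feature>\w+)\) error detected "
    r"on (?P<detected_on>[^,\s]+), putting (?P<disabled>\S+) in err-disable state"
)


def err_disabled_ports(log_text):
    """Return {cause: sorted list of interfaces that were err-disabled}."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    found = {}
    for match in ERR_DISABLE.finditer(flat):
        found.setdefault(match["cause"], set()).add(match["disabled"])
    return {cause: sorted(ports) for cause, ports in found.items()}


# Sample taken from the messages above, including one wrapped line.
SAMPLE = """\
Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected
on Gi1/0/48, putting Gi1/0/48 in err-disable state
Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po17, putting Po17 in err-disable state
Mar 14 07:11:33: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi2/0/48, putting Gi2/0/48 in err-disable state (CA-TOR-1-7-2)
Mar 14 07:11:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel17, changed state to down
"""

if __name__ == "__main__":
    for cause, ports in err_disabled_ports(SAMPLE).items():
        print(cause, ports)
```

Grouping by cause keeps channel-misconfig events separate from any other err-disable reasons that may show up in the same log.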

Here is the config I am applying to the port that has caused this issue to
happen twice now:

set interfaces ge-0/0/67 description "Firewall Port"
set interfaces ge-0/0/67 unit 0 family ethernet-switching interface-mode trunk
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 9-10
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 29
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 31-32
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 43
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 50-51
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 56
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 58
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 66
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 68
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 90
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 143
set interfaces ge-0/0/67 unit 0 family ethernet-switching vlan members 170

The issue happens within a couple of minutes of committing the config on
the Juniper side. There are no cables plugged into port ge-0/0/67, so
technically there shouldn't be any BPDUs sent out, since there isn't a port
state change.

Juniper Support wants me to turn on traceoptions and then run through a
bunch of scenarios; the issue is that testing this takes down my network.

Just wanted to put it out there to see if anyone else had run into a
situation similar to this.

TIA

Joe

I don't see any issue with the snippet of the config you provided for the "Firewall Port". Is there a chance that the port ge-0/0/67 is referenced somewhere else in the Juniper config, so that applying your trunk setup is what causes the issue?

Just throwing that out off the top of my head, without really thinking it through.

Robert

We have to do this on all of our Cisco Port-channels that lead to Brocade
ICX switches:
no spanning-tree etherchannel guard misconfig

If we don't do it, after a couple of days, the Cisco will err-disable the
Port-channel just as you describe. I guess the misconfig detection is
incompatible with the Brocade OS.
We have seen no ill effects from this, as we are using "mode active" on all
our Port-channels. So if there is a misconfiguration, the LAG does not come
up for that port on either end, and we're good.

Hope that helps.

No, there isn't, but the responses I am getting, both on-list and off-list,
say to just run this on the Cisco switches:

no spanning-tree etherchannel guard misconfig

and that should resolve the issue.

Thanks Everyone.

I am kind of confused by your configuration. If the Cisco side is configured as an LACP trunk, then the Juniper side also needs to be configured as an LACP trunk. Spanning tree would be getting confused because the Cisco is treating the LACP trunk as a single interface for purposes of spanning tree (which should be configured at the port-channel level), while the Juniper is considering them all to be individual ports and would be sending BPDUs over each individual interface. The Cisco is correctly err-disabling the port because it detects individual port BPDUs and determines that the channel is misconfigured. Or am I missing something in your config completely?

If you are configuring ports other than the connected ports as trunks, then your case makes sense. One thing that might cause you issues is the VLAN access list of the LACP trunk. If one side has a VLAN access list and the other side does not, you might get a spanning tree error when you configure a port on a new VLAN. Essentially you have a "trunk all" on one side, and a new VLAN is showing up on a trunk where it is not allowed on the other side. It would also help to see your spanning tree configuration (i.e. are both sides running the same spanning tree mode?). The clue here is that the event triggers even though the port is not up yet. If you configure a new port on a VLAN that is not currently up, the VLAN will immediately come up on all trunks that are allowed to carry all VLANs.
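
To make the "trunk all on one side" scenario concrete, here is a small Python sketch under stated assumptions: each end of a trunk carries its active VLANs filtered by its allowed list, and a trunk with no list carries everything active. The function names and VLAN numbers are hypothetical and only illustrate the hypothesis above; they are not taken from either switch.

```python
def carried_vlans(active, allowed=None):
    """VLANs one end of a trunk carries.

    allowed=None models a trunk with no VLAN restriction ("trunk all").
    """
    active = set(active)
    return active if allowed is None else active & set(allowed)


def one_sided_vlans(side_a, side_b):
    """VLANs carried by exactly one end of the trunk."""
    return sorted(set(side_a) ^ set(side_b))


# Hypothetical example: side A trunks all VLANs, side B has an allowed list.
allowed_on_b = {9, 10, 29}

# Before the change, only VLANs 9, 10 and 29 are active on both switches.
before = one_sided_vlans(
    carried_vlans({9, 10, 29}),
    carried_vlans({9, 10, 29}, allowed_on_b),
)

# After VLAN 143 becomes active, side A starts carrying it on every
# unrestricted trunk, while side B still filters it.
after = one_sided_vlans(
    carried_vlans({9, 10, 29, 143}),
    carried_vlans({9, 10, 29, 143}, allowed_on_b),
)

if __name__ == "__main__":
    print("one-sided before:", before)
    print("one-sided after:", after)
```

Any VLAN that ends up one-sided is a candidate for the kind of mismatch described above, so it is worth comparing the real allowed lists on both ends the same way.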

Steven Naslund
Chicago IL

It really does not resolve anything; it just allows a bad configuration to work. The guard is there so that if one side is configured as a channel and the other side is not, the channel gets shut down. Allowing it to remain up can cause a BPDU loop. Your spanning tree is trying to tell you something, and you should listen or you could end up with issues that are really hard to isolate.

Steven Naslund
Chicago IL

Steve, let me clarify: the config I am applying has nothing to do with an
LACP trunk or any of my existing LACP trunks. It is a completely different
configuration on a completely different interface, the only similarity is
that I am trying to configure a trunk interface on the Juniper side for
multiple vlans. There is no LACP configuration involved.

There are also no new VLANs being used at all. They all already exist
on the switches involved and nothing is being added. In fact what makes
this even weirder is that I already have that exact same port configuration
running on port 1/0/67 of the Juniper and it doesn't cause me any issues
nor did it cause any issues when the config was applied. This existing port
1/0/67 has gone down/up as the firewall has been rebooted and doesn't cause
any issues or hiccups on the network. For reference the attached firewall
is an ASA which doesn't do spanning tree anyways.

set interfaces ge-1/0/67 description "Firewall Port-2"
set interfaces ge-1/0/67 unit 0 family ethernet-switching interface-mode trunk
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 9-10
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 29
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 31-32
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 43
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 50-51
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 56
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 58
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 66
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 68
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 90
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 143
set interfaces ge-1/0/67 unit 0 family ethernet-switching vlan members 170
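
The claim that both ports carry exactly the same VLANs is easy to verify mechanically from the set commands. A rough Python sketch, assuming numeric VLAN IDs or ranges in the "vlan members" statements as shown above (named VLANs are ignored); the helper names are illustrative only.

```python
import re
from collections import defaultdict

# Matches "vlan members <id>" or "vlan members <low>-<high>" at end of line.
MEMBERS = re.compile(
    r"set interfaces (\S+) unit \d+ family ethernet-switching "
    r"vlan members (\d+)(?:-(\d+))?\s*$",
    re.MULTILINE,
)


def vlan_members(config_text):
    """Return {interface: set of VLAN IDs} from Junos set commands."""
    vlans = defaultdict(set)
    for ifname, low, high in MEMBERS.findall(config_text):
        last = int(high) if high else int(low)
        vlans[ifname].update(range(int(low), last + 1))
    return dict(vlans)


# VLAN statements as posted in this thread for both firewall ports.
SPECS = ["9-10", "29", "31-32", "43", "50-51", "56", "58",
         "66", "68", "90", "143", "170"]


def render(ifname):
    """Rebuild the set commands for one interface (for the demo only)."""
    prefix = f"set interfaces {ifname} unit 0 family ethernet-switching"
    return "\n".join(f"{prefix} vlan members {spec}" for spec in SPECS)


if __name__ == "__main__":
    members = vlan_members(render("ge-0/0/67") + "\n" + render("ge-1/0/67"))
    diff = members["ge-0/0/67"] ^ members["ge-1/0/67"]
    print("VLANs that differ between the two ports:", sorted(diff))
```

Running the same function over the full "show configuration | display set" output from the switch would also show whether either port is referenced anywhere else with different VLANs.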

Got it. Do any of those trunks add a new VLAN to the switch that was not active before? If so, that would cause a BPDU over all trunks that allow that VLAN. Even if the port is not up yet, by adding the VLAN to ANY trunk you are implying that it should be active on ALL trunks that are not VLAN limited.

Steve

What it's telling you is totally unclear, though. I've asked TAC to
explain to me the packet behaviour that generates this errdisable, and
haven't been able to get a clear answer from them. It seems to come out
of 'nowhere' on multi-vendor networks, where all other vendors are
perfectly happy and no operational or configuration issue is evident,
other than Cisco shutting the port. As far as I can tell from the
documentation's description of this case, it should not even be possible
for it to trigger when LACP is in use (as the 'port channel' is
negotiated by LACP, not configured by the user...), yet it certainly can.

FWIW, I've also seen this between Juniper and Cisco, and have been
forced to disable the misconfig detection.

If you know exactly what Cisco's STP is telling me happened with this
error, I'd really love to know, it might at least help to understand how
it could be triggering, because it is definitely not 'port-channel
misconfiguration'.

Keenan

Please see the link below, that ugly hack should be disabled asap on all your
Cisco boxes:

https://supportforums.cisco.com/t5/lan-switching-and-routing/spanning-tree-etherchannel-guard-misconfig/td-p/1147273

MD

> It really does not resolve anything it just allows a bad configuration to
> work. The guard is there so that if one side is configured as a channel and the
> other side is not, the channel gets shut down. Allowing it to remain up can
> cause a BPDU loop. Your spanning tree is trying to tell you something, you
> should listen or you could get really hard to isolate issues.
>
> Steven Naslund
> Chicago IL
>
>> From: NANOG [mailto:nanog-bounces@nanog.org] On Behalf Of Joseph Jenkins
>> Sent: Thursday, April 05, 2018 4:16 PM
>> To: Robert Webb
>> Cc: nanog@nanog.org
>> Subject: Re: Juniper Config Commit causes Cisco Etherchannels to go into
>> err-disable state
>>
>> No there isn't, but from what I am getting responses both onlist and off
>> list is to just run this on the Cisco switches:

Sounds like the Juniper is leaking a "default" BPDU as it resets the
various internal chip configurations, which the Cisco receives thus
triggering the err-disable.

/mark

Not sure exactly what your environment looks like, but we encountered
something similar when trunking Cisco-DELL and Cisco-Juniper switches.
We run RSTP on the DELL and Juniper switches, but RPVST+ on Cisco. In the
beginning we allowed only the VLANs we needed between the Cisco and
DELL/Juniper switches, then encountered unexpected err-disable events and
link drops. Later we figured out that Cisco always carries the default VLAN
(VLAN 1) untagged through trunk ports. Hence we explicitly added/allowed
native VLAN 1 (untagged) on all trunk ports on all switches. Problem
solved.