IPV6 Multicast Listener storm control?

Richard_Holbo · September 22, 2014, 11:44pm

(Originally posted to the WISPA IPv6 list; someone there mentioned that
folks here might have some suggestions, so apologies if you are a member of
both.)

I am seeing issues with IPV6 multicast storms in my network that are fairly
low volume (1-2mbit), but that are causing service disruptions, both because
of CPU load on the switches and because the network is a Point to MultiPoint
wireless network.

I have about 500 IPV4 clients on a vlan served by Cisco ME3400, Catalyst
3750 and 3560 switches. These are switched back to a routed interface and
IP addresses are assigned by DHCP. We are not using IPV6 at all, and I
don't have control of the clients.

What I'm seeing is IPV6 Multicast Listener requests from a single client
(different clients at different times) going out on the network. The
switches handle them in software, so CPU goes up (not a lot, but it seems
to impact performance quite a bit). The larger problem is that all
other IPV6 clients respond to the multicast broadcast address, generating a
1-2mbit storm of traffic to all ports all the time. This then transits the
bandwidth-constrained wireless network in a steady state, causing a high
collision rate, which causes _significant_ performance degradation in the
wireless network.

It would appear that this is _generally_ caused by Dell or HP workstations
with buggy network interface cards in hibernate mode.

http://blog.bimajority.org/2014/09/05/the-network-nightmare-that-ate-my-week/

http://packetpushers.net/good-nics-bad-things-blast-ipv6-multicast-listener-discovery-queries/

Now it looks from my reading like Cisco MLD snooping would _help_ with
this. It would not stop the offender from generating the multicast
requests, but it might keep them from reaching _all_ ports. It would still
affect any ports that had _subscribed_ IPV6 clients, though, and it would
require changing the SDM template and a reload on all the switches. So not a
real answer and very painful.

Right now, I'm just tracking the source down and shutting it off. Do not
really want to get into an argument about switched vs routed, and am
working on reducing the size of the broadcast domain now, but this is a new
issue, and I need to come up with some kind of plan to resolve with my
current equipment/network.
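
For the "tracking the source down" step, a rough Python sketch of the tally is below. It assumes you already have raw, untagged Ethernet frames as bytes (for example from a capture on a mirror port; how you obtain them is outside this sketch), and the function name `top_ipv6_multicast_sources` is made up for illustration. It relies only on the standard mapping of IPv6 multicast onto destination MACs starting with 33:33 and on the IPv6 EtherType 0x86DD.

```python
from collections import Counter


def top_ipv6_multicast_sources(frames, n=5):
    """Count IPv6 multicast frames per source MAC and return the top n.

    frames: iterable of raw, untagged Ethernet frames (bytes).
    Returns a list of (source_mac, frame_count) tuples, busiest first.
    """
    counts = Counter()
    for frame in frames:
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        dst, src, etype = frame[0:6], frame[6:12], frame[12:14]
        # IPv6 EtherType is 0x86DD; IPv6 multicast uses 33:33:xx:xx:xx:xx.
        if etype == b"\x86\xdd" and dst[:2] == b"\x33\x33":
            counts[src.hex(":")] += 1
    return counts.most_common(n)
```

A single MAC that dominates the result is the likely offender. 802.1Q-tagged frames are skipped by this sketch because their EtherType sits four bytes later.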

Any thoughts?? Ideas? I suspect this will become more of an issue for more
folks in the near future.

/thanks

Mikael_Abrahamsson · September 23, 2014, 3:55am

If the packets are sent to ff02::1, then this will be sent to all ports even with MLD snooping turned on.

http://www.ietf.org/rfc/rfc4541.txt

"In IPv6, the data forwarding rules are more straight forward because
    MLD is mandated for addresses with scope 2 (link-scope) or greater.
    The only exception is the address FF02::1 which is the all hosts
    link-scope address for which MLD messages are never sent. Packets
    with the all hosts link-scope address should be forwarded on all
    ports."

So I doubt turning on MLD snooping will help.
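
To make the quoted rule concrete, here is a minimal Python sketch of the forwarding decision it describes. This is an illustration only, not how any Cisco switch implements it: the function name `forward_ports` and the `listeners` table are hypothetical, and the sketch ignores router ports and the ingress port, which a real snooping switch would also have to handle.

```python
import ipaddress

# All-nodes link-scope address; per the RFC 4541 text above, MLD reports
# are never sent for it, so a snooping switch has nothing to prune on.
ALL_NODES = ipaddress.IPv6Address("ff02::1")


def forward_ports(dst, all_ports, listeners):
    """Return the set of ports a multicast packet to `dst` would go to.

    dst: destination IPv6 address as a string.
    all_ports: set of port names in the VLAN.
    listeners: dict mapping IPv6Address group -> set of ports where an
        MLD report for that group was seen (hypothetical snooping table).
    """
    addr = ipaddress.IPv6Address(dst)
    if not addr.is_multicast:
        raise ValueError(f"{dst} is not a multicast address")
    if addr == ALL_NODES:
        return set(all_ports)  # flooded everywhere, snooping or not
    return set(listeners.get(addr, set()))  # only ports that asked for it
```

With three ports and one listener on a solicited-node group, `forward_ports("ff02::1", ...)` still returns all three ports, which is why snooping alone would not contain traffic sent to that address.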

Can't your switches do some kind of protocol-based filtering and only allow two EtherTypes, ARP and IPv4?
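
As a sketch of the logic such a filter would apply (not switch configuration, and whether a given platform can do this per port is something to check in its documentation), the Python below permits only ARP and IPv4 frames. The EtherType values are the standard ones; the helper names `ethertype` and `permit` are made up for this example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_VLAN = 0x8100  # 802.1Q tag; the real EtherType follows the tag

ALLOWED = {ETHERTYPE_IPV4, ETHERTYPE_ARP}


def ethertype(frame: bytes) -> int:
    """Return the EtherType of a raw frame, skipping one 802.1Q tag."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (etype,) = struct.unpack_from("!H", frame, 12)
    if etype == ETHERTYPE_VLAN:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q-tagged frame")
        (etype,) = struct.unpack_from("!H", frame, 16)
    return etype


def permit(frame: bytes) -> bool:
    """True if the frame is ARP or IPv4; everything else is dropped."""
    return ethertype(frame) in ALLOWED
```

An IPv6 frame (EtherType 0x86DD) fails `permit`, so the MLD traffic would be dropped at the edge. Note that an allow-list this strict also drops anything else that is not ARP or IPv4, so it only suits a network that, as described above, is not using IPv6 at all.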

Naslund_Steve · September 23, 2014, 4:24am

We have seen the same issue with Lenovo devices. They all seem to have a variety of Intel chipsets. We have not found a good solution other than updating drivers and/or shutting down IPv6, which we really don’t want to do, but that is easier to automate than the driver update. I will be interested in seeing what anyone else has come up with to kill these off. In our case, the biggest issue is wireless clients that show this behavior, because they really bury the access points' CPUs. The switched network seems to absorb the load better.

Steven Naslund
Chicago IL

Rob_Seastrom2 · September 23, 2014, 11:59am

Richard Holbo <holbor@sonss.net> writes:

I have about 500 IPV4 clients on a vlan served by Cisco ME3400, Catalyst
3750 and 3560 switches. These are switched back to a routed interface and
IP addresses are assigned by DHCP. We are not using IPV6 at all, and I
don't have control of the clients.

This configuration is reminiscent of my back lawn. It probably grew
organically, has been neglected for a period of time, and it's going
to require a bit of effort to tame it and bring it under control.

You probably don't have the option of blocking horizontal layer 2
traffic like the WISP guys do, and even if you were able to get away
with that it brings its own set of downsides to it.

The solution here is to chop things into separate broadcast domains,
each one no bigger than a single switch. You might bring each to a
routed interface on another device (or likely more than one other
device depending on your network layout), but on no account should you
have the broadcast domain span more than one port on that device.

Hopefully you don't have any poorly behaved software that depends on
being in the same broadcast domain. It can be difficult to inventory
that and make sure it all works before taking the leap. It could be
easier to just peel off one workgroup of people to configure them that
way as a pilot and see if anyone squawks. Tell them that you're doing
it and that you want feedback, since your current configuration is
conditioning them to just suck it up when the network periodically
flakes.

Hope this helps,

-r

Rob_Seastrom2 · September 23, 2014, 12:00pm

Richard Holbo <holbor@sonss.net> writes:

I am seeing issues with IPV6 multicast storms in my network that are fairly
low volume (1-2mbit), but that are causing service disruptions due to CPU
load on the switches and that the network is a Point to MultiPoint wireless
network.

OK, well one comment in my previous email will sound stupid (not
enough coffee yet) but the upshot remains: more subnetting.

-r