iPhone and Network Disruptions ...

Hank, Warren, & Fellow Nanogers:

Looking at this issue with an 'interoperability lens,' I remain puzzled by a personal observation: at least in the publicized case of Duke University's Wi-Fi network being affected, the "ARP storms" did not negatively impact network operations UNTIL iPhones appeared on campus. The nagging question in my mind, therefore, is: why have other Wi-Fi devices (laptops, HPCs/PDAs, smartphones, etc.) NOT caused the type of ARP flooding that became visible in Duke's Wi-Fi environment? Why did this issue become MOST prominent with the introduction of Apple's iPhone on campus?

In that sense, my original question regarding the iPhone's 'unique' operational circumstance(s) will need to be considered. Initial analysis tells me that we may not be far into that aspect but, we might need to…

Again, I wish to thank you for the responses.

All my best,

Reading the Cisco document, the conclusion seems obvious: the iPhone implements RFC 4436 (Detecting Network Attachment in IPv4), whose unicast ARP probes cause the problem.

I don't have an iPhone on hand to test this and make sure, though.

The difference between an iPhone and other devices (running Mac OS X?) that do the same thing would be that an iPhone is online while the user moves around, while laptops are generally put to sleep prior to moving around.

But I know that I have walked around IETF meetings with my laptop open, and I know others do too, and I don't recall ever hearing about this problem at an IETF meeting from Jim Martin and the other NOC volunteers.


If you look at Kevin's example traces on the EDUCAUSE WIRELESS-LAN listserv
you'll see that the ARP packets are in fact unicast.
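To make the unicast/broadcast distinction concrete, here is a rough frame-level sketch (my own illustration, with made-up MAC and IP values, not taken from Kevin's traces): a DNAv4-style reachability probe is an ordinary ARP request except that the Ethernet destination and the ARP target-MAC field carry the previously cached gateway MAC instead of the broadcast address and zeros, so switches forward it as unicast rather than flooding it.

```python
import struct

# Sketch of an RFC 4436 (DNAv4) unicast ARP probe vs. an ordinary broadcast
# ARP request. All MAC/IP values in the example below are made up.
def arp_request(src_mac: bytes, src_ip: bytes, dst_mac: bytes, dst_ip: bytes,
                unicast: bool) -> bytes:
    """Build an Ethernet II frame carrying an ARP request."""
    # A DNAv4 probe is addressed to the cached gateway MAC; a normal ARP
    # request goes to the broadcast address ff:ff:ff:ff:ff:ff.
    eth_dst = dst_mac if unicast else b"\xff" * 6
    frame = eth_dst + src_mac + struct.pack("!H", 0x0806)   # EtherType = ARP
    frame += struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)      # Ethernet/IPv4, op=request
    frame += src_mac + src_ip                               # sender hw/proto address
    # Target MAC: zeroed in a broadcast request, the cached MAC in a probe.
    frame += (dst_mac if unicast else b"\x00" * 6) + dst_ip
    return frame

probe = arp_request(bytes.fromhex("0016cb000001"), bytes([10, 0, 0, 23]),
                    bytes.fromhex("001a2b000001"), bytes([10, 0, 0, 1]),
                    unicast=True)
```

On the wire, the only visible difference from a normal ARP request is the destination MAC and the filled-in target-MAC field, which is exactly what shows up in the traces.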

Iljitsch's point that iPhones remain on while crossing wireless switch
boundaries is dead on. If you read the security advisory you'll see that
it involves either L3 roaming or two or more WLCs that share a common L2
network. Most wireless clients don't roam in such a big way.


With the exception of our 1000+ Cisco 7920 phones...

Then again, they probably work just fine with Cisco's other products, heh.

  - d.

There is also the weird property of many types of "flood vulnerable" systems that they seem to remain stable until some threshold is reached, then suddenly spiral out of control.

I am not sure of the exact mechanism behind this, but I have seen multiple instances of this happening. The standard scenario is basically:

You have a couple of switches with STP turned off -- someone plugs in some random cable, forming a bridge loop... and everything continues running fine, until some time in the future when it all goes to hell in a hand-basket. Now, I could understand the system remaining stable until the first broadcast / unknown-MAC frame caused flooding to happen, but I have seen such a system remain stable for anywhere from a few days to a few weeks before suddenly exploding.
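A back-of-the-envelope sketch (my own toy model, with made-up numbers, not from any of these incidents) of why a loop can sit quietly and then saturate: Ethernet frames carry no TTL, so every broadcast that enters the loop circulates indefinitely, and the load only ever accumulates.

```python
# Toy model of a bridge loop (my own illustration, not from the post).
# Ethernet has no TTL, so every broadcast that enters the loop keeps
# circulating; each trapped frame consumes frame_bits / loop_rtt_s bits/s
# of the loop link forever, and new broadcasts add load until saturation.
def time_to_saturation(bcasts_per_sec: float, frame_bits: int,
                       loop_rtt_s: float, link_bps: float) -> float:
    """Seconds until the loop link hits 100% utilization."""
    per_frame_load = frame_bits / loop_rtt_s   # b/s one trapped frame consumes
    return link_bps / (bcasts_per_sec * per_frame_load)

# Hypothetical numbers: one broadcast/s entering the loop, 1000-byte frames,
# a 1 ms trip around the loop, a 100 Mb/s link:
seconds = time_to_saturation(1.0, 8000, 0.001, 100e6)
```

In practice a trapped frame circulates at close to line rate, so a single broadcast can saturate the link almost immediately -- consistent with the "almost always goes boom at once" observation; the delayed cases presumably involve some extra damping that bleeds frames out of the loop.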

I have seen the same thing happen in systems other than switches, for example RIP networks with split-horizon turned off, weird frame-relay networks, etc. Unfortunately I have never managed to recreate the event in a controlled environment (in the few cases where I have cared enough to try, I form a loop and everything goes BOOM immediately!), and in the wild I have always just fixed it and run away (it's usually someone else's network and I'm just helping out or visiting or something). I HATE switched networks.....

A few observations:
In *almost* all of the cases, things *do* go boom immediately!
In the instances where they don't, there doesn't seem to be a correlation between load and when it does suddenly spiral out of control [0].
There is not a gradual increase in the sorts of packets that you would expect to cause this (in a switched environment, you do not see flooded packets slowly increase, or even increase exponentially over a long time; there is basically no traffic and then boom! 100%).

Anyway, I have wondered what triggers it, but never enough to actually look into it much....


[0] Except for one case that I remember especially fondly -- it was a switched network with something like 30 switches scattered around -- someone had plugged one of those "silver satin" phone-type cables (untwisted copper) between two ports on a switch -- the cable was bad enough that most of the frames were dropped / corrupted, but under high broadcast traffic loads enough packets would make it through to cause a flood, and then after some time (5-10 minutes) it would die back down...
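That waxing and waning fits a simple steady-state picture (again my own toy model, nothing measured on that network): the bad cable kills some fraction of the circulating frames on every pass while broadcasts inject new ones, so the circulating population settles at a level proportional to the broadcast load -- invisible when load is low, link-saturating when it spikes.

```python
# Toy steady-state model of the lossy-cable loop (my own illustration).
# Each pass around the loop, a fraction p_corrupt of circulating frames is
# dropped on the bad cable while injected_per_pass new broadcasts enter:
#     n' = n * (1 - p_corrupt) + injected_per_pass
# which settles at the fixed point n* = injected_per_pass / p_corrupt.
def steady_state_frames(injected_per_pass: float, p_corrupt: float) -> float:
    """Circulating-frame population where loop losses balance new broadcasts."""
    return injected_per_pass / p_corrupt

# Hypothetical numbers -- low broadcast load keeps the population tiny,
# a burst of broadcasts scales it (and link utilization) up into a storm:
quiet = steady_state_frames(0.1, 0.5)
storm = steady_state_frames(50.0, 0.5)
```

When the broadcast burst ends, the injection rate drops, the population decays back toward the small fixed point, and the storm dies down -- matching the 5-10 minute flare-ups described above.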

If you want to hear about something whacked along those lines -- imagine
two access points with spanning tree disabled, connected to a pair of
switches on a VLAN that wasn't running STP (thanks to platform STP
limitations, the switches running per-VLAN STP and said campus having
>800 VLANs), and said APs would occasionally associate in infrastructure
mode -- which would cause a broadcast storm on that VLAN and fill the
trunk pipes with spaf. Debugging that one was