FCC Issues Staff Report On T-Mobile Outage

The outage was initially caused by an equipment failure and then exacerbated by a network routing misconfiguration that occurred when T-Mobile introduced a new router into its network. In addition, the outage was magnified by a software flaw in T-Mobile's network that had been latent for months and interfered with customers' ability to initiate or receive voice calls during the outage.

44. While fiber link failures are common, PSHSB finds that these steps, taken together, will reduce the likelihood that a fiber link failure could result in the recurrence of a similar event in T-Mobile's network because traffic would be routed to an alternative path that could handle it. Moreover, if such an event recurred on T-Mobile's network, it would not cause such a large service disruption because T-Mobile would have improved its networks' ability to manage congestion in the case of a similar event and would have increased network capacity to maintain the network in a working state even with an increased volume of traffic.


This part I find most interesting as well:

However, they were unable to resolve the issue by restoring the link because
the network management tools required to do so remotely relied on the same
paths they had just disabled.

I can't begin to tell you how often I battled senior mgmt to get some investment
into an OOB network. This only proves the point.
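
For anyone who wants to make the argument concrete, here is a rough sketch of the dependency problem, not anything taken from the report. The topology and node names ("noc", "core1", "oob-console", "router") are made up for illustration; the point is only that once the in-band link is disabled, the NOC can reach the router if and only if a separate management path exists.

```python
from collections import deque


def reachable(links, src, dst, disabled=frozenset()):
    """Breadth-first search over undirected links, skipping disabled ones."""
    adj = {}
    for a, b in links:
        if (a, b) in disabled or (b, a) in disabled:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical topology: management traffic shares the production path...
in_band = [("noc", "core1"), ("core1", "router")]
# ...versus a separate out-of-band path (names are assumptions).
oob = [("noc", "oob-console"), ("oob-console", "router")]

# The link that operators disabled while troubleshooting.
disabled = {("core1", "router")}

in_band_only = reachable(in_band, "noc", "router", disabled)
with_oob = reachable(in_band + oob, "noc", "router", disabled)
print("in-band only:", in_band_only)
print("with OOB:", with_oob)
```

With only the in-band links, the router drops off the management plane the moment the link is disabled; adding the OOB links keeps it reachable.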

Parantap, are you reading this? I know you are.



The larger story here is...

"7. Routing. Routers connect T-Mobile’s LTE towers to T-Mobile’s LTE
network. These routers utilize a routing protocol called Open
Shortest Path First."

Calling Vijay Gill to the courtesy phone.