Is anyone experiencing problems with the Wiltel backbone, or does anyone know of any issues with it? I called their NOC and was told they are experiencing a nationwide routing problem that they are working on, but I couldn't get any further details.
Told me it was related to the L3/Wiltel integration. Most of the breaks I've been seeing seem to be at the point where most of my traffic has been going from Wiltel to L3 in DC or so. Oh, and that the "former Wiltel" tier 1 guys had 40+ calls in the hold queue...
To add the notable quotes: "running scripts updating BGP configs" and "don't know why they're doing it in the middle of the day"
I can confirm this as well, although I have no proof (at this point)
that Level 3 is necessarily to blame. We (or rather the company I
work for) are seeing similar problems between MCI/UU/Verizon and WCG
when reaching some of our clients:
4. 0.so-4-0-0.CL1.LAX15.ALTER.NET 0.0% 102 10.9 11.3 10.9 17.1 0.7
5. POS6-0.GW1.LAX15.ALTER.NET 0.0% 102 10.7 11.1 10.7 13.9 0.5
6. wcgGigELAX-gw.customer.alter.net 98.0% 102 135.4 147.0 135.4 158.5 16.3
7. anhmca1wcx2-pos6-1-oc48.wcg.net 98.0% 102 162.9 156.8 150.8 162.9 8.6
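For anyone wanting to pick out the breaking hop from output like the above automatically, here is a minimal sketch. It assumes the columns follow the usual mtr report order (Loss%, Snt, Last, Avg, Best, Wrst, StDev), which the paste does not state, and the function names and the 50% threshold are made up for illustration. It only reports a hop when the loss persists to the end of the path, since loss at a single intermediate hop is often just ICMP rate limiting.

```python
import re

# Assumed column order (standard mtr report, not stated in the paste):
# hop. host Loss% Snt Last Avg Best Wrst StDev
HOP_RE = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

SAMPLE = """\
4. 0.so-4-0-0.CL1.LAX15.ALTER.NET 0.0% 102 10.9 11.3 10.9 17.1 0.7
5. POS6-0.GW1.LAX15.ALTER.NET 0.0% 102 10.7 11.1 10.7 13.9 0.5
6. wcgGigELAX-gw.customer.alter.net 98.0% 102 135.4 147.0 135.4 158.5 16.3
7. anhmca1wcx2-pos6-1-oc48.wcg.net 98.0% 102 162.9 156.8 150.8 162.9 8.6
"""


def parse_hops(report):
    """Parse mtr-style report lines into dicts; unparseable lines are skipped."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append({
                "hop": int(m.group(1)),
                "host": m.group(2),
                "loss": float(m.group(3)),
                "avg_ms": float(m.group(6)),
            })
    return hops


def first_lossy_hop(hops, threshold=50.0):
    """Return the first hop from which loss >= threshold persists to the
    last hop, or None if the final hop is below the threshold."""
    candidate = None
    for h in hops:
        if h["loss"] >= threshold:
            if candidate is None:
                candidate = h
        else:
            # Loss did not carry through to this later hop; reset.
            candidate = None
    return candidate


if __name__ == "__main__":
    bad = first_lossy_hop(parse_hops(SAMPLE))
    if bad:
        print(f"loss starts at hop {bad['hop']}: {bad['host']} ({bad['loss']}%)")
```

On the trace above this points at the UUNET-to-WCG handoff (hop 6), which matches what people are reporting by eye.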
"Yes, we became aware of a problem within the Level 3 network that affected all
routing from all affiliated ISPs on the Internet. Many of our own customers
called because they were being affected. At approximately 3:45 PM EST, the
Cogent NOC was able to contact the Level 3 NOC and obtain information that the
issue should be resolved within 15-20 minutes from that time. The specifics
were not released either to our NOC or from Level 3's NOC."
FYI Level 3 had a pre-existing peering issue with UUNet/Alternet at the
end of last week that was due to issues on an oc48. If that hasn't been
resolved, it is a separate (or compound) issue compared to today's
Wiltel outage. We have been told by L3 that it was an outage on the
legacy Wiltel network. From what we gather they had to reroute a lot of
that traffic onto other parts of their network, causing increased total
traffic and then latency, but that has not been confirmed to us.
I have a box sitting in a colo off a WCG circuit in Columbus, OH;
traceroutes from the west coast were dying a few hops short of the colo
facility, but I'm not a direct customer of WCG, so calling them for info
would have been pointless...
Steve Sobol wrote:
Was anyone able to get an RFO or post-mortem for this?
"An inaccurate set of BGP policies were distributed to routers connected to AS7911 when an automated update script ran at 1100 MDT. The update script regularly ran every two hours to update the network with current BGP information. Due to the scheduled shutdown of the legacy BGP policy server and subsequent conversion to the Level3 route registry engine, the old server policy server was shutdown. In addition, the scripts used to update routes on the network were to be disabled. One of these scripts wasn't disabled as intended. As a result, the script ran as scheduled at 1300 MDT and consequently pushed partial configurations to production routers because the script was unable to communicate with decommissioned policy server. Incorrect policies were exchanged between AS7911's customers and peers resulted in increased latency; as large route blocks attempted to traverse individual customer connections.
Updated configurations were pushed to all the routers, individual connections were cleaned up and BGP sessions were restored. In addition, the automated BGP script has been shut-off. Maximum prefix list limits have been established across the network as a risk mitigation step."
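To make the failure mode in that RFO concrete, here is a minimal sketch of the two safeguards it implies: an update job that refuses to push anything unless it has a complete policy for every router, and a maximum-prefix check on a session. This is not the script the RFO describes; every name here (`fetch_policy`, `push`, `run_update`, `session_action`) is hypothetical, and the policy server and routers are stand-in callables.

```python
class PolicyServerUnavailable(Exception):
    """Raised when a complete policy cannot be obtained for a router."""


def build_configs(fetch_policy, router_ids):
    """Build configs for every router, or raise. Never returns a partial set.

    fetch_policy is a hypothetical callable that returns the policy text
    for one router; it may raise OSError or return None/empty on failure.
    """
    configs = {}
    for rid in router_ids:
        policy = fetch_policy(rid)
        if not policy:
            raise PolicyServerUnavailable(f"no policy for {rid}")
        configs[rid] = policy
    return configs


def run_update(fetch_policy, push, router_ids):
    """Fail closed: push only after all configs were built successfully."""
    try:
        configs = build_configs(fetch_policy, router_ids)
    except (PolicyServerUnavailable, OSError) as exc:
        # The policy source is unreachable or incomplete: touch nothing.
        return f"aborted: {exc}"
    for rid, cfg in configs.items():
        push(rid, cfg)  # hypothetical callable that deploys one config
    return "pushed"


def session_action(received_prefixes, max_prefixes):
    """Maximum-prefix guard: drop a session that announces more routes
    than the configured limit, otherwise keep it."""
    return "teardown" if received_prefixes > max_prefixes else "keep"
```

The point of building everything before pushing anything is that a dead policy server yields an aborted run rather than partial configurations on production routers. The maximum-prefix check is the second line of defence if bad policy gets out anyway.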