reachability problems Europe->US?

Thomas_Schmid · October 7, 2010, 12:23pm

Hi,

Are there any known problems with reachability from Europe to the US? We have
customer complaints that they can't reach US-based sites like Microsoft and
others. It seems to be source-prefix-based only, but several ISPs in Europe
are affected.

Or is the same problem visible in the States?

Regards,

Thomas

Heath_Jones · October 7, 2010, 12:35pm

> Seems to be only source-prefix-based, but several ISPs in Europe are affected.

Can you post the source and destination IPs?

Thomas_Schmid · October 7, 2010, 1:09pm

Hi,

> Seems to be only source-prefix-based, but several ISPs in Europe are affected.

> Can you post the source and destination IPs?

source: 131.220.0.0/16, 212.201.68.0/22, 212.201.72.0/21
destination: 65.122.178.73, 63.228.223.104

traceroute to 65.122.178.73 (65.122.178.73), 30 hops max, 40 byte packets
  1 er-rz-gig-3-3.stw-bonn.de (131.220.99.62) 1.792 ms 1.275 ms 1.125 ms
  2 xr-bon1-te2-3.x-win.dfn.de (188.1.233.193) 0.705 ms 2.132 ms 0.755 ms
  3 xr-bir1-te2-3.x-win.dfn.de (188.1.144.9) 1.477 ms 1.936 ms 1.051 ms
  4 zr-fra1-te0-7-0-5.x-win.dfn.de (188.1.145.46) 4.034 ms 3.734 ms 4.957 ms
  5 64.213.78.237 (64.213.78.237) 3.866 ms 3.295 ms 26.854 ms
  6 jfk-brdr-04.inet.qwest.net (63.146.26.225) 119.511 ms 92.735 ms 99.019 ms
  7 * * *

Or, quoting from the DE-CIX tech list:

[www.microsoft.com]

Thomas_Schmid · October 7, 2010, 1:50pm

An update:

Heath_Jones · October 7, 2010, 2:06pm

> Seems to be only source-prefix-based, but several ISPs in Europe are affected.

> source: 131.220.0.0/16, 212.201.68.0/22, 212.201.72.0/21
> destination: 65.122.178.73, 63.228.223.104
> traceroute to 65.122.178.73 (65.122.178.73), 30 hops max, 40 byte packets
>   1 er-rz-gig-3-3.stw-bonn.de (131.220.99.62) 1.792 ms 1.275 ms 1.125 ms
>   2 xr-bon1-te2-3.x-win.dfn.de (188.1.233.193) 0.705 ms 2.132 ms 0.755 ms
>   3 xr-bir1-te2-3.x-win.dfn.de (188.1.144.9) 1.477 ms 1.936 ms 1.051 ms
>   4 zr-fra1-te0-7-0-5.x-win.dfn.de (188.1.145.46) 4.034 ms 3.734 ms 4.957 ms
>   5 64.213.78.237 (64.213.78.237) 3.866 ms 3.295 ms 26.854 ms
>   6 jfk-brdr-04.inet.qwest.net (63.146.26.225) 119.511 ms 92.735 ms 99.019 ms

Based on all that, it looks like Qwest is not propagating your routes
within their network.
I was going to recommend route-views, but it might not reflect that
now if you have dropped GBLX.
Historical routing updates will, however, show whether Qwest was
advertising reachability to you (which would be a good indicator of
whether they were filtering at their edge).
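A minimal sketch of the reasoning above: the last hop that answers before the `* * *` lines is roughly where the packets appear to stop. The helper `last_responding_hop` is hypothetical and assumes the classic Linux `traceroute` output format shown in this thread. It only identifies the last responder; with asymmetric routing the actual fault may sit on the return path instead.

```python
import re

# Sample output copied from the traceroute earlier in this thread.
TRACE = """\
traceroute to 65.122.178.73 (65.122.178.73), 30 hops max, 40 byte packets
  1 er-rz-gig-3-3.stw-bonn.de (131.220.99.62) 1.792 ms 1.275 ms 1.125 ms
  2 xr-bon1-te2-3.x-win.dfn.de (188.1.233.193) 0.705 ms 2.132 ms 0.755 ms
  3 xr-bir1-te2-3.x-win.dfn.de (188.1.144.9) 1.477 ms 1.936 ms 1.051 ms
  4 zr-fra1-te0-7-0-5.x-win.dfn.de (188.1.145.46) 4.034 ms 3.734 ms 4.957 ms
  5 64.213.78.237 (64.213.78.237) 3.866 ms 3.295 ms 26.854 ms
  6 jfk-brdr-04.inet.qwest.net (63.146.26.225) 119.511 ms 92.735 ms 99.019 ms
  7 * * *
"""

# A responding hop looks like "N name (a.b.c.d) rtt ms ..."; "N * * *" does not match.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")


def last_responding_hop(output: str):
    """Return (hop_number, hostname, address) of the last hop that answered, or None."""
    last = None
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if m:
            last = (int(m.group(1)), m.group(2), m.group(3))
    return last


print(last_responding_hop(TRACE))
```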

John_van_Oppen1 · October 7, 2010, 3:59pm

Global Crossing has been having major issues in Seattle (since yesterday, actually). Every path I see to dfn.de is via GBLX, and Microsoft hosts most of those sites out of the Seattle area, so they may be seeing the same issue.

From what we can see, GBLX has a broken port-channel or something similar here, as random traffic (into) their network via our transit link gets black-holed. We could not even reach Global Crossing's own name servers for a while. We gave up and turned down BGP yesterday until we hear back from them. Based on our graphs at the time things broke, they appeared to be black-holing roughly 1/4 of what we were sending them.
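A rough model of why a broken bundle member black-holes only a fraction of traffic: routers typically pick a member link per flow by hashing header fields, so if one of N members silently drops packets, about 1/N of flows die. This is a sketch under stated assumptions: the 4-member bundle, the SHA-256 hash over the 5-tuple, and the helper names are all hypothetical (real routers use vendor-specific hashes), chosen only because one dead link out of four would match the roughly 1/4 loss described above.

```python
import hashlib
import random


def member_for_flow(src, dst, sport, dport, proto, n_members):
    """Pick a bundle member by hashing the flow's 5-tuple (hypothetical hash, not a vendor's)."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_members


def blackholed_fraction(n_members=4, dead_member=2, n_flows=100_000, seed=1):
    """Fraction of random TCP flows that hash onto the dead member link."""
    rng = random.Random(seed)
    dropped = 0
    for _ in range(n_flows):
        src = f"10.{rng.randrange(256)}.{rng.randrange(256)}.{rng.randrange(256)}"
        dst = f"192.0.2.{rng.randrange(256)}"
        flow = (src, dst, rng.randrange(1024, 65536), 80, 6)
        if member_for_flow(*flow, n_members) == dead_member:
            dropped += 1
    return dropped / n_flows


print(f"black-holed: {blackholed_fraction():.3f}")
```

Because the member is chosen per flow, a given 5-tuple consistently either works or fails in this model, while a different source port from the same host may hash onto a healthy link.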

Thanks,
John van Oppen
Spectrum Networks / AS 11404

Heath_Jones · October 7, 2010, 4:22pm

> ... random traffic (into) their network via our transit link gets black-holed.

So for the same source & destination, sometimes it works, sometimes it doesn't?

Heath_Jones · October 7, 2010, 4:24pm

From the symptoms the OP was seeing, it seemed that Qwest was the issue.
Has GBLX reported to you that they are having a fault? If not, perhaps
try tagging your routes exported to GBLX with 8010, as per this:
http://onesc.net/communities/as3549/

John_van_Oppen1 · October 7, 2010, 4:44pm

I know for certain it was GBLX; their NOC confirmed it. We saw this to multiple destinations (not just DFN), all with the outbound path towards GBLX. We are on the same GBLX PoP (Westin) that the sites they are talking about are connected to, and almost every path I see back to DFN (from seven upstreams in Seattle) was via GBLX, not Qwest. The only exceptions were Level3's and Savvis' routes, which are via AS1299.

I think the asymmetric routing was obfuscating the problem a bit for the guys attached to DFN.

John

John_van_Oppen1 · October 7, 2010, 4:46pm

It looked like a broken aggregated Ethernet bundle or something similar. Most annoying was that the issue moved around a bit: over about five hours, all the broken test IPs we had started working again, and then other destinations started failing. All was well when we turned down GBLX. As of now, though, we are seeing the issue as fixed and have turned GBLX up again.

Thanks,
John

Thomas_Schmid · October 7, 2010, 5:12pm

Yes, I can confirm that the situation is back to normal now that we have
re-enabled the GBLX session. I heard from others that it was again a
broken-LSP problem in GBLX (unconfirmed).

Cheers,

Thomas

Richard_A_Steenbegen · October 7, 2010, 7:07pm

Global Crossing recently started deploying Foundry/Brocade XMRs in
their MPLS core, as a lower-cost alternative to their old T640/OC192
MPLS core model. Unfortunately these boxes are buggy as all hell, and
seem to blackhole LSPs somewhere in their network on at least a weekly
basis. I think we've seen at least a dozen issues similar to this over
the last couple of months, though most of them were out of LA, so I didn't
know they had actually done a Seattle deployment.

Honestly, GX deserves what they get on this one. I'm not aware of any
other large network that has ever done a serious MPLS deployment using
these boxes (and if you're thinking of replying to this and saying "hey,
we do some VLLs between 2 routers and it seems to work", stop and think
about what I might mean when I say a SERIOUS MPLS deployment first :P),
so this was pretty much to be expected. I'll also say that I'm
remarkably underwhelmed by their response to this issue, and suggest
that anyone who doesn't want their packets blackholed by the Floundrys
be prepared to vote with their wallet.