Carrier Circus (was RE: Intermedia (ICIX) brokenness...)

Richard_A_Steenbegen · May 4, 2001, 7:47pm

Personally, I'm still trying to figure out why Exodus, in all their
apparent wisdom (or lack thereof), has stopped using the GBLX OC-48s
in the former GlobalCenter facilities (or at least SNV3), and is now
shuttling all its traffic out a single Exodus OC-12. Prior to
yesterday these traces would've shown gblx.net routers (on different
IPs), and would never have touched an Exodus backbone...

Hrm, let's think about that for a moment, shall we? Could it be, perhaps,
that Exodus purchased GlobalCenter and is integrating those facilities
into their network? Could it also be that Exodus has a well designed
network where most of the traffic is quickly sent to peers and an OC48
backbone is not required? I don't see any congestion on that OC12, so
perhaps that is the case? I also don't see a damn thing wrong with the
traceroute you provided, and an OC12 peer to UU is pretty good. Was there
some other complaint or do you just not like it when your traceroute
changes?

Of course, this is probably a move I should've expected from Exodus,
after the mongolian flustercluck that was the AS change in SNV3.
You'd think they would do something like that carefully, as you can
-seriously- bone customers. But noooooo. One of our junior admins
made the change (since I was out of town, but hey, it's cut and
paste!). He, and all of the other affected customers in SNV3 on the
conference call, were left on hold for about half an hour (plus the
call started half an hour late), whereupon the Exodus engineering team
popped back in and said "We're done with our side, you guys go
ahead!".

Actually I was awake for that. I guess your junior engineer wasn't able to
figure out that if he simply put in an additional neighbor statement with
the new AS your downtime would have been less than 30 seconds as BGP came
back up. 30-second outages are pretty light in the history of GCTR and
GBLX outages; if you can't handle maint then you should have set up static
routes out or multihomed, but you shouldn't blame your stupidity or lack
of forethought on other networks.

Now. Does it seem logical to kill connectivity over BOTH of your
hosting routers at once, thus killing every single BGP-running
customer you have that isn't physically in their cage at the time?
Or would it seem better to do what I assumed they'd do, which is do
one router, wait for everyone to make changes, then do the other?

ASN changes are not exactly easy or frequent, but I seem to recall that
one going over rather smoothly. Customers were given ample warning and a
conference call was set up to handle any outstanding issues, of which there
were none.

I guess this is what happens when I assume intelligence at a
hosting/backbone provider.

Or when we assume intelligent posts to NANOG...

Christian_Nielsen2 · May 4, 2001, 10:05pm

Just to give correct information, from SNV3 Exodus has an OC48 and an OC12
to two other datacenters in Santa Clara and an OC48 to Chicago. There is
enough bandwidth to carry the traffic.

No peering or backbone link was hurt during this move.

Thanks

Christian

Yes, I work for Exodus.

Majdi_S_Abbas1 · May 5, 2001, 2:49am

I would definitely have to agree. Personally, I think this
speaks volumes.

I have spent 4 hours on the phone with GCTR trying to find
someone capable of understanding that HSRP does not work when you put
interfaces in the same subnet on different VLANs.....

I have spent 5 hours waiting for GCTR to diagnose a simple
failed linecard. That was just the diagnosis; it took another 7
hours or so to actually replace it.

You're right about multihoming; anyone in a GCTR facility
should /definitely/ be multihomed.

--msa