We're a small ISP / datacenter with a Time Warner fiber-based DIA contract
that is coming up for renewal.
We're getting much better pricing offers from Cogent, and are finding out
what Level 3 can do for us as well. Both providers will use Time Warner
fiber for last mile.
My questions are:
- Will we be sacrificing quality if we spring for Cogent?
(yesterday's Cogent/Verizon thread provided some cold chills for my spine)
- Is there a risk with contracting a carrier that utilizes another
carrier (such as Time Warner) for the last mile? (i.e. if there is a
downtime situation, are we going to be caught in a web of confusion and
finger-pointing that delays problem resolution)?
- How are people's experiences with L3 vs TWC?
Although I assume everyone on the list would be interested in what others
have to say about these questions, out of respect for the carriers in
question, I encourage you to email frank opinions off list.
Or if there are third party tools or resources you know that I could consult
to deduce the answers to these questions myself, they are most welcome.
We have had Cogent over Verizon's fiber for more than a few years now. Cogent goes down at least once a year. They had two outages in a single day a couple of days ago in Northern NJ. The one in the AM was "..caused by a power outage in a vendor data center where Cogent is collocated." They went on to have another outage at around 9:30 PM the same day, for which I'm still waiting for an RFO. During this outage, they were still advertising our BGP routes, so we didn't fail over to our second provider. I notice that happens a lot with them: when they go down, they still advertise your routes.
As far as price goes, for us Cogent is cheap but Lightpath is cheaper.
Our college is kind of far from things, so we don't have a lot of outside fiber coming in. For both of our connections, the last-mile fiber comes from a different carrier than the Internet provider. I've never had a big issue with the two working with each other. The only issue we had was that I suspected we weren't getting as much bandwidth as we paid for, and they had to work out where the policer and/or bottleneck was. That is the only issue we've had in 5 years with this setup, and it got resolved. IME, when there is a full outage, it's always been clear who the responsible party is.
Cogent always has the cheapest rates, but they also have the most peering disputes of any operator. I've seen intra-data center hops between Cogent and Verizon take over 150ms.
As with all things Internet, your mileage may vary. I would not put something with a five-nines uptime requirement on Cogent without a failover circuit. For less sensitive applications it seems like a win.
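For a rough sense of what a five-nines requirement means, the arithmetic can be sketched as below. The availability figures are illustrative assumptions, not measured numbers for any carrier, and the two-circuit formula assumes the circuits fail independently, which a shared last-mile fiber would violate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(a: float, b: float) -> float:
    """Availability of two circuits where either one alone is sufficient.

    Assumes independent failures (hypothetical; shared fiber or a shared
    building entrance would break this assumption).
    """
    return 1.0 - (1.0 - a) * (1.0 - b)


if __name__ == "__main__":
    # Illustrative availability levels, not carrier SLAs.
    for label, a in [("three nines", 0.999), ("five nines", 0.99999)]:
        print(f"{label}: {allowed_downtime_minutes(a):.1f} min/year")
    print(f"two 99.9% circuits: {combined_availability(0.999, 0.999):.6f}")
```

Five nines leaves a little over five minutes of downtime per year, so a single multi-hour outage on a single-homed circuit blows the budget for decades.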
The Internet is both incredibly robust and fragile simultaneously.
When I priced out providers 2 years ago for 500 Mbps over a 1 Gbps fiber link, the list from most expensive to least expensive was:
Verizon-->XO-->Cogent-->Lightpath
This is for Northern NJ. Abovenet and some of the other big providers couldn't reach our campus. Lightpath ate the cost of running fiber to our campus, while the others weren't willing to do that.
I use Cogent as well, no real issues other than I wouldn't single-home to
them. Personally, though, I don't understand why anyone would depend on a
single provider for connectivity...
When you say that "they still advertise your routes", do you mean:
A: That they kept doing so when they had problems? Or...
B: That routes you were originating continued to be propagated by them,
even though your session with them was down? Or...
C: Something else.
I ask, as we are considering some cheap Cogent bandwidth in the
not-too-distant future, to allow us to keep commit rates low on higher
quality connections. 'A' wouldn't be a real problem, since we run our own
AS and originate our own routes; 'B' could be potentially devastating.
B) We have our own AS and IP space. I advertise them to both Cogent and our other ISP. I use the local preference attribute to share the load for incoming traffic between both ISPs. In the last 5 outages over the last few years, this has happened twice. I'm waiting on the RFO so I can further investigate why this happened. I think someone mentioned this in a post a few months ago too.
It sucks for us, because we're a small school and don't have someone in a NOC to monitor our networks 24x7. I literally had to get out of bed and disable our BGP session with them for us to get through the outage. I was getting around 90% packet loss from my home to our router.
Based on your description, it sounds like the outage did not bring your BGP session down, so you remained connected and kept advertising to the broken service provider.
For example, Cogent typically does multi-hop BGP, so if there is a network outage past the BGP router, you will experience the situation you described.
You might have to deploy some other means (a script?) to bring down your BGP session with the 'broken' service provider.
To the best of my knowledge, BGP does not have any mechanism to detect broken connectivity upstream, past the router your BGP session is up with.
Faisal Imtiaz
Snappy Internet & Telecom
7266 SW 48 Street
Miami, FL 33155
Tel: 305 663 5518 x 232
Well, technically there's BFD, which might do the trick. But of course it probably won't be available; it usually isn't, especially with Cogent...
But maybe the link was in fact just overloaded.
Well, if you are trying to balance the incoming traffic load with the local-pref attribute, I can understand your disappointment,
since it doesn't work that way at all: local-pref is local to an AS and affects outgoing traffic only.
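To illustrate the point about local-pref, here is a minimal sketch of a heavily simplified best-path choice as a remote network would make it. The `Route` type, the function names, and the AS numbers (taken from the documentation range) are all hypothetical, and real BGP has many more tie-breakers. The point it shows is that the remote side never sees our local-pref, so AS-path prepending is used here as one common way to nudge inbound traffic.

```python
from dataclasses import dataclass
from typing import List, Tuple

DEFAULT_LOCAL_PREF = 100  # a common vendor default; an assumption here


@dataclass(frozen=True)
class Route:
    via: str                   # which of our upstreams the path goes through
    as_path: Tuple[int, ...]   # AS numbers as seen by the remote network
    local_pref: int = DEFAULT_LOCAL_PREF


def best_path(routes: List[Route]) -> Route:
    """Simplified choice: highest local-pref, then shortest AS path.

    Real BGP continues with origin, MED, eBGP over iBGP, and so on.
    """
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


def as_seen_remotely(via: str, upstream_as: int, our_as: int,
                     prepends: int = 0) -> Route:
    """Build the route a remote AS receives for our prefix.

    Our own local-pref is not carried across eBGP, so the remote side
    applies its own value (the default, in this sketch).
    """
    path = (upstream_as,) + (our_as,) * (1 + prepends)
    return Route(via=via, as_path=path)


if __name__ == "__main__":
    OUR_AS = 64500  # documentation-range AS numbers, hypothetical
    routes = [
        as_seen_remotely("upstream A", 64501, OUR_AS, prepends=2),
        as_seen_remotely("upstream B", 64502, OUR_AS),
    ]
    print(best_path(routes).via)
```

Raising local-pref on our own router changes which upstream our outbound packets take; it has no effect on the comparison above, which happens in someone else's AS.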
Based on my understanding of BFD, it will not help you. BFD detects the directly connected port going down more quickly and forces the BGP session down (faster than the BGP session timers would take to determine that something is broken).
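The "faster than the BGP timers" part can be put in rough numbers. This is only a sketch: the 300 ms interval, multiplier of 3, and 180 s hold time are assumed example values, not anything a particular provider is known to configure, and the function names are made up for illustration.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate BFD detection time: negotiated interval times multiplier."""
    return interval_ms * multiplier


def bgp_worst_case_detection_ms(hold_time_s: int) -> int:
    """Without BFD, a silent neighbor is declared down when the hold timer expires."""
    return hold_time_s * 1000


def speedup(interval_ms: int, multiplier: int, hold_time_s: int) -> float:
    """How many times faster BFD notices a dead neighbor than the hold timer."""
    return bgp_worst_case_detection_ms(hold_time_s) / bfd_detection_time_ms(
        interval_ms, multiplier
    )


if __name__ == "__main__":
    # Assumed example timers only.
    print(bfd_detection_time_ms(300, 3), "ms with BFD")
    print(bgp_worst_case_detection_ms(180), "ms waiting on the hold timer")
    print(f"{speedup(300, 3, 180):.0f}x faster")
```

Either way, both timers only watch the path to the BGP neighbor itself, so neither says anything about an outage further upstream.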
This is the common challenge: how to detect an upstream path outage and then do the appropriate route engineering automatically.
Maybe an SLA-monitor type of scripting/configuration would be useful in your case.
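As a rough idea of what such a monitor could look like, here is a minimal sketch of just the decision logic. Everything in it is an assumption for illustration: the class and function names, the thresholds, and the idea of injecting a `probe` callable (for example a ping wrapper). How the session actually gets disabled is router-specific and deliberately left out.

```python
from typing import Callable, Sequence


def upstream_is_healthy(targets: Sequence[str],
                        probe: Callable[[str], bool],
                        min_reachable_fraction: float = 0.5) -> bool:
    """True if enough far-side targets answer through the upstream under test."""
    if not targets:
        raise ValueError("need at least one probe target")
    reachable = sum(1 for t in targets if probe(t))
    return reachable / len(targets) >= min_reachable_fraction


class FailoverMonitor:
    """Turns a stream of health checks into 'disable' / 'enable' / 'hold'.

    Consecutive-failure and consecutive-success thresholds add hysteresis
    so that a single lost probe does not flap the BGP session.
    """

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.session_enabled = True
        self._bad = 0
        self._good = 0

    def observe(self, healthy: bool) -> str:
        if healthy:
            self._good += 1
            self._bad = 0
            if not self.session_enabled and self._good >= self.recover_threshold:
                self.session_enabled = True
                return "enable"
        else:
            self._bad += 1
            self._good = 0
            if self.session_enabled and self._bad >= self.fail_threshold:
                self.session_enabled = False
                return "disable"
        return "hold"
```

Two design points matter in practice: the probes have to be forced out through the upstream being tested (otherwise they just measure the other provider), and they need to keep working over that link after the session is shut, or the monitor can never see it recover.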
Did you verify your problem was announcements on the other side of the
outage? This sounds to me like you are using a BGP-announced default
route from Cogent, which is always sent. I think the problem was that you
were sending traffic out a path that was broken. Since you mentioned your
outbound balancing, this would explain some packet loss and not 100%
loss.
We don't get a default route from them. At the time of the outage my BGP session was up and I had a full routing table from them. I didn't have much time to troubleshoot it in that state since we were down, so I had to disable the session ASAP. Once the RFO comes in, I'll be asking a lot more questions about it. My only experience with BGP is as a customer, so I'm not too familiar with the intricacies on the provider side. We had an outage in the AM the same day and we failed over just fine. I'm very curious why the same didn't happen in the evening.