OT: Question/Netflix issues?

Greetings,

  I know this is way off topic, but is anyone else getting calls/tickets
about Netflix access problems?

I tried (successfully) to duplicate the issues. I'm seeing extremely slow
responses from the servers I have tested, and the web servers also seem
to be either overloaded or just dropping packets. Just wondering if
anyone else is seeing the same.

Kind Regards,
-Joe Blanchard

What does the AS path look like from them to you?

-RR

We're sorry, the Netflix website and the ability to instantly watch
movies are both temporarily unavailable. However, our shipping centers
are continuing to send and receive DVDs so your order is in process as
usual. Our engineers are working hard to bring the site and ability to
watch instantly back up as soon as possible. We appreciate your patience
and, again, we apologize for any inconvenience this may cause. If you
need further assistance, please call us at 1-877-445-6064.

Thanks Paul, I keep getting busy signals trying to reach them after the
Sales office and directly.

Just needed to know it wasn't something on our side.

Thanks all for the feedback.

-Joe Blanchard

http://twitter.com/Netflixhelps/status/50326616840220672

Netflix is currently down

Where do you pick up their feeds from ? A lot of their content seems to
come from CDNs like Akamai in my neighbourhood (in this case, Torix).
That being said, I just tried to watch a movie and all I get is
"Checking for device activation".... My browser seems to be blabbing
back and forth with
208.75.79.32 on port 443
which I see 11647 6453 7922 2906.

  ---Mike

Greetings,

  I know this is way off topic, but is anyone else getting calls/tickets about Netflix access problems?
<<<<<SNIP>>>>>
-Joe Blanchard

Quite the contrary, Joe. It is actually a pleasure to read an operationally relevant thread on NANOG. If your customers are calling about accessibility issues then this is 200% relevant. The week-long diatribes about why, who, what, when, and whether sunspots caused it, after the fact, are not.

Robert D. Scott Robert@ufl.edu
Senior Network Engineer 352-273-0113 Phone
CNS - Network Services 352-392-2061 CNS Phone Tree
University of Florida 352-273-0743 FAX
Florida Lambda Rail 352-294-3571 FLR NOC
Gainesville, FL 32611 321-663-0421 Cell

What Paul missed was to credit this info.

http://www.netflix.com/ Rather simple to go look.

Robert D. Scott Robert@ufl.edu
Senior Network Engineer 352-273-0113 Phone
CNS - Network Services 352-392-2061 CNS Phone Tree
University of Florida 352-273-0743 FAX
Florida Lambda Rail 352-294-3571 FLR NOC
Gainesville, FL 32611 321-663-0421 Cell
                          3216630421@messaging.sprintpcs.com

Now getting "We're sorry, the Netflix website and the ability to
instantly watch movies are both temporarily unavailable." out of Charter.

Campus getting same routed via 1239 209 2906.

Jeff
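
For anyone comparing the AS paths posted so far, here is a minimal Python sketch that pulls the origin AS (the rightmost ASN) out of each path and checks whether they agree. The function name `origin_as` and the dictionary labels are only illustrative; the two paths are the ones quoted in this thread.

```python
def origin_as(as_path: str) -> int:
    """Return the origin AS, i.e. the last ASN in a space-separated AS path."""
    hops = [int(asn) for asn in as_path.split()]
    if not hops:
        raise ValueError("empty AS path")
    return hops[-1]


# AS paths as posted in this thread (labels are just for readability).
paths = {
    "Mike": "11647 6453 7922 2906",
    "Jeff (campus)": "1239 209 2906",
}

origins = {who: origin_as(path) for who, path in paths.items()}

# True when every reporter reaches the same origin AS.
same_origin = len(set(origins.values())) == 1
print(origins, same_origin)
```

Both paths end at the same origin over different transit providers, which fits the reading that the trouble is at the far end rather than in any one reporter's upstream.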

Guess that move to Amazon EC2 wasn't such a good idea. First reddit,
now netflix.
http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html

I suppose there's a reason you can't get an SLA with any teeth from
Amazon...

You're assuming that the outage was somehow related to the quality of
hosting (virtual server, instance management, etc).

In my experience with large website failures, some of them mine and
others heard about from people at conferences and elsewhere, I can't
recall one where server hardware performance or virtualization
management was the root cause (and only one that was intrinsically
hardware-based, which was a catastrophic storage failure and not a
server failure). Configuration management, inadequate testing of new
software, systems management errors, DBMS throughput capacity, and
emergent software / architecture failures are the usual culprits.

Guess that move to Amazon EC2 wasn't such a good idea. First reddit,
now netflix.
http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html

FWIW, at $DAYJOB we haven't been able to run a pool of a couple of
dozen EC2 instances for more than two weeks (since last June) without
at least one of them going down. The same number of hardware servers
we ran ourselves in Peer1 ran for a couple of years with no unplanned
outages.

Amortized over five years, Peer1 colo + hardware is also cheaper than
the equivalent EC2 cost.

Hey everyone! Join the cloud, and stand in the pissing rain.

--lyndon

Greetings,

   Just to be clear, I am only looking for the scope of the issue I am seeing;
it's not a direct assumption of fault or misconfiguration, more of a sanity
check if you will. Thanks much for all of the feedback; as I see it, it's not
just me. Thanks again.

-Joe Blanchard

Netflix was hard down for about an hour last night. This is strictly from an
end user perspective. Several of my buddies told me it was not even
responding to DNS.

-Hammer-

"I was a normal American nerd."
-Jack Herer
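
To tell "not even responding to DNS" apart from "resolves but won't accept connections", a rough Python sketch like the following can help. It uses only the standard `socket` module. The function name `classify_reachability` and its `resolve`/`connect` parameters are made up for illustration (they exist so the check can be exercised without touching the network), and the port and timeout defaults are assumptions.

```python
import socket
from typing import Callable


def classify_reachability(
    host: str,
    port: int = 443,
    timeout: float = 5.0,
    resolve: Callable = socket.getaddrinfo,
    connect: Callable = socket.create_connection,
) -> str:
    """Return 'dns-failure', 'connect-failure', or 'ok' for host:port."""
    # Step 1: name resolution. gaierror means the name could not be resolved.
    try:
        resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Step 2: TCP connect. Refusals and timeouts are both OSError subclasses.
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "connect-failure"

    conn.close()
    return "ok"
```

Calling `classify_reachability("www.netflix.com")` from a few vantage points would show whether reports differ at the DNS step or at the connection step. It says nothing about slow HTTP responses, which need a separate check.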

Interesting, because we run 120 with almost no issues whatsoever (3 failures over the past 12 months, none of which caused downtime). I've never had an EBS volume fail in the 18 months we've used them. IMHO, the "issues" with the cloud are almost always at a layer above the infrastructure.

--L

Reddit has routinely had EBS volumes either outright fail (2 major outages in the last month/month and a half, both caused by several EBSs vanishing), or show some not insignificant degradation in performance, and it seems barely a month goes by when I don't hear someone on twitter talking about similar with their infrastructures. Most of the problems I've heard about do seem to revolve around EBS, however, rather than their other services. It may be just the nature of people to pick on and shout about the biggest targets, but I'm reasonably sure almost all the problems I hear about relating to cloud services revolve around Amazon and rarely their competitors.

http://highscalability.com/blog/2010/12/20/netflix-use-less-chatty-protocols-in-the-cloud-plus-26-fixes.html
When it comes to other layers in the infrastructure, probably one of the most talked-about problems is network latency between instances. Netflix had to specifically re-engineer their platform because of it (and other major users talk of similar changes). There is almost certainly an argument to be made that the outcome of the forced re-engineering is a good thing, as it's generally boosting resilience, but that it's been forced on them in such a way should surely also be some cause for concern.
Reddit seem to be working hard to make their platform as resilient as possible to the routine problems caused by the infrastructure. One of their outgoing devs gave a pretty interesting read on the problems they'd experienced with Amazon: http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_down_for_6_of_the_last_24_hours/c1l6ykx

I absolutely do think cloud hosting / virtual servers have value and use and shouldn't be underestimated or written off as a fad, but I'm also not entirely convinced at the moment that Amazon is a vendor to particularly trust with such services. I'd probably also argue that anyone keeping their eggs in one basket and relying on a single vendor for such services is taking a significant risk. There are plenty of tools and libraries out there to help provide a standard API for rolling out servers on different platforms. It seems crazy not to take advantage of the flexibility the cloud offers to remove as many SPOFs as possible.

Paul