Issues with Gmail

<bleep> happens and services break. the internet is a wonderful
demonstration of building a reliable network out of unreliable components.

but what we have with google mail (and apps) is two scary problems

  o way too many users relying on a single point of failure. so it
    makes the nyt when it breaks because of the number of users
    affected, and

  o too many foolish people giving their private data to a data miner to
    whom they actually yield rights to those data and who seems to store
    them for a scary long time.

randy

Amazing what we will pay for free service.

-M<

There's a post-mortem on the gmail blog:

http://gmailblog.blogspot.com/2009/09/more-on-todays-gmail-issue.html

[....] the internet is a wonderful
demonstration of building a reliable network out of unreliable components.

but what we have with google mail (and apps) is two scary problems

o way too many users relying on a single point of failure. so it
makes the nyt when it breaks because of the number of users
affected, and

I won't assume which "single point of failure" Randy is referring
to. However, we can take confidence in the fact that Google's Gmail
service architecture is distributed, which of course does not mean
that there is no single point of failure somewhere within that
distribution. Perhaps, from a network operations point of view, the
point needs elaboration.

o too many foolish people giving their private data to a data miner to
whom they actually yield rights to those data and who seems to store
them for a scary long time.

Naturally, this is a separate issue, and indeed a very prickly one,
which is beyond the charter of NANOG. Therefore, I refrain from penning
any thoughts on it.

All the best,
Robert.

Long before we had widespread commercial internet, we still had to have a backup plan for when the single highly fault-tolerant entity on which we depended for a particular service went out.

Sometimes that plan is to wait for restoration, whether because the Bell System got a bit melty on the long distance, or because your regional utility managed to melt down the power grid, taking out both substations providing diverse feeds.

Systemic but temporarily localized failure has existed as long as the weather. One can move the failure around, but I think I can confidently assert that we'll never entirely eliminate it.

Michael Thomas wrote:

I think that Randy might be conflating single point of failure with
"resilience". Google, distributed on every level as it is, is still
just one operator and in this case the lemmings faithfully followed
each other into the sea. We've been on an anti-resilience binge for
quite some time, accelerated to warp speed by the advent of the
Internet itself. There's something to be said about not having all of
your police scanners, etc, etc on the internet from a resilience
standpoint, but the siren call is strong for good reasons too.

Mike

As I have mentioned to Randy separately, my interest was to understand
whether he had made the "single point of failure" reference colloquially,
or in a /critical infrastructure/ context.

Some treat, and relate to, the Internet as though it is a part of
/"critical infrastructures."/ I simply wished to better understand the
point of reference. A caveat: in stating the above, it is not my
intention to originate a vacuous and malodorous thread on NANOG
regarding the Internet's place in critical infrastructures. Surely,
that cannot be resolved here, in this community.

Regards,
Robert.

Long before we had widespread commercial internet, we still had to have a backup plan for when the single highly fault-tolerant entity on which we depended for a particular service went out.

Sometimes that plan is to wait for restoration, whether because the Bell System got a bit melty on the long distance, or because your regional utility managed to melt down the power grid, taking out both substations providing diverse feeds.

Systemic but temporarily localized failure has existed as long as the weather. One can move the failure around, but I think I can confidently assert that we'll never entirely eliminate it.

Right, but a cascading failure now with the internet is liable to be far more
serious than back in the good old days. The electrical grid is probably an
example of a system with relatively low resilience, but once it goes onto the
net its resilience is vastly lessened. So we're making this grand engineering
and economic trade-off of less resilience for better interconnection. Which has
a tendency to be a great trade-off when things are going right, and a terrible one
when things are going wrong :)

I've always wondered what is going to happen when we have our first catastrophic
cascading failure à la the blackout of 1965 or something similar but with the net
instead. The real miracle of the net is that we _haven't_ had such a thing yet,
but it really is only a matter of time unless somebody's willing to stand up and
say that such things have been safely engineered away :)

Mike