Going With The Flow: Google’s Secret Switch To The Next Wave Of Networking
By Steven Levy April 17, 2012 | 11:45 am |
Categories: Data Centers, Networking
In early 1999, an associate computer science professor at UC Santa Barbara
climbed the steps to the second floor headquarters of a small startup in Palo
Alto, and wound up surprising himself by accepting a job offer. Even so, Urs
Hölzle hedged his bet by not resigning from his university post, but taking a
He would never return. Hölzle became a fixture in the company — called
Google. As its czar of infrastructure, Hölzle oversaw the growth of its
network operations from a few cages in a San Jose co-location center to a
massive internet power; a 2010 study by Arbor Networks concluded that if
Google was an ISP it would be the second largest in the world (the largest is
Tier 3, which services over 2,700 major corporations in 450 markets over
100,000 fiber miles.) ‘You have all those multiple devices on a network but
you’re not really interested in the devices — you’re interested in the
fabric, and the functions the network performs for you,’ Hölzle says.
Google treats its infrastructure like a state secret, so Hölzle rarely speaks
about it in public. Today is one of those rare days: at the Open Networking
Summit in Santa Clara, California, Hölzle is announcing that Google
essentially has remade a major part of its massive internal network,
providing the company a bonanza in savings and efficiency. Google has done
this by brashly adopting a new and radical open-source technology called
Hölzle says that the idea behind this advance is the most significant change
in networking in the entire lifetime of Google.
In the course of his presentation Hölzle will also confirm for the first time
that Google — already famous for making its own servers — has been designing
and manufacturing much of its own networking equipment as well.
“It’s not hard to build networking hardware,” says Hölzle, in an advance
briefing provided exclusively to Wired. “What’s hard is to build the software
itself as well.”
In this case, Google has used its software expertise to overturn the current
If any company has potential to change the networking game, it is Google. The
company has essentially two huge networks: the one that connects users to
Google services (Search, Gmail, YouTube, etc.) and another that connects
Google data centers to each other. It makes sense to bifurcate the
information that way because the data flow in each case has different
characteristics and demand. The user network has a smooth flow, generally
adopting a diurnal pattern as users in a geographic region work and sleep.
The performance of the user network also has higher standards, as users will
get impatient (or leave!) if services are slow. In the user-facing network
you also need every packet to arrive intact — customers would be pretty
unhappy if a key sentence in a document or e-mail was dropped.
The internal backbone, in contrast, has wild swings in demand — it is
“bursty” rather than steady. Google is in control of scheduling internal
traffic, but it faces difficulties in traffic engineering. Often Google has
to move many petabytes of data (indexes of the entire web, millions of backup
copies of user Gmail) from one place to another. When Google updates or
creates a new service, it wants it available worldwide in a timely fashion —
and it wants to be able to predict accurately how quickly the process will
“There’s a lot of data center to data center traffic that has different
business priorities,” says Stephen Stuart, a Google distinguished engineer
who specializes in infrastructure. “Figuring out the right thing to move out
of the way so that more important traffic could go through was a challenge.”
But Google found an answer in OpenFlow, an open source system jointly devised
by scientists at Stanford and the University of California at Berkeley.
Adopting an approach known as Software Defined Networking (SDN), OpenFlow
gives network operators a dramatically increased level of control by
separating the two functions of networking equipment: packet switching and
management. OpenFlow moves the control functions to servers, allowing for
more complexity, efficiency and flexibility.
“We were already going down that path, working on an inferior way of doing
software-defined networking,” says Hölzle. “But once we looked at OpenFlow,
it was clear that this was the way to go. Why invent your own if you don’t
Google became one of several organizations to sign on to the Open Networking
Foundation, which is devoted to promoting OpenFlow. (Other members include
Yahoo, Microsoft, Facebook, Verizon and Deutsche Telekom, and an innovative
startup called Nicira.) But none of the partners so far have announced any
implementation as extensive as Google’s.
Why is OpenFlow so advantageous to a company like Google? In the traditional
model you can think of routers as akin to taxicabs getting passengers from
one place to another. If a street is blocked, the taxi driver takes another
route — but the detour may be time-consuming. If the weather is lousy, the
taxi driver has to go slower. In short, the taxi driver will get you there,
but you don’t want to bet the house on your exact arrival time.
With the software-defined network Google has implemented, the taxi situation
no longer resembles the decentralized model of drivers making their own
decisions. Instead you have a system like the one envisioned when all cars
are autonomous, and can report their whereabouts and plans to some central
repository which also knows of weather conditions and aggregate traffic
information. Such a system doesn’t need independent taxi drivers, because the
system knows where the quickest routes are and what streets are blocked, and
can set an ideal route from the outset. The system knows all the conditions
and can institute a more sophisticated set of rules that determines how the
taxis proceed, and even figure whether some taxis should stay in their
garages while fire trucks pass.
Therefore, operators can slate trips with confidence that everyone will get
to their destinations in the shortest times, and precisely on schedule.
Continue reading ‘Going With The Flow: Google’s Secret Switch To The Next
Wave Of Networking‘ …
Making Google’s entire internal network work with SDN thus provides all sorts
of advantages. In planning big data moves, Google can simulate everything
offline with pinpoint accuracy, without having to access a single networking
switch. Products can be rolled out more quickly. And since “the control
plane” is the element in routers that most often needs updating, networking
equipment is simpler and enduring, requiring less labor to service.
Most important, the move makes network management much easier. By early this
year, all of Google’s internal network was running on OpenFlow. ‘Soon we will
able to get very close to 100 percent utilization of our network,’ Hölzle
“You have all those multiple devices on a network but you’re not really
interested in the devices — you’re interested in the fabric, and the
functions the network performs for you,” says Hölzle. “Now we don’t have to
worry about those devices — we manage the network as an overall thing. The
network just sort of understands.”
The routers Google built to accommodate OpenFlow on what it is calling “the
G-Scale Network” probably did not mark not the company’s first effort in
making such devices. (One former Google employee has told Wired’s Cade Metz
that the company was designing its own equipment as early as 2005. Google
hasn’t confirmed this, but its job postings in the field over the past few
years have provided plenty of evidence of such activities.) With SDN, though,
Google absolutely had to go its own way in that regard.
“In 2010, when we were seriously starting the project, you could not buy any
piece of equipment that was even remotely suitable for this task,” says
Hotzle. “It was not an option.”
The process was conducted, naturally, with stealth — even the academics who
were Google’s closest collaborators in hammering out the OpenFlow standards
weren’t briefed on the extent of the implementation. In early 2010, Google
established its first SDN links, among its triangle of data centers in North
Carolina, South Carolina and Georgia. Then it began replacing the old
internal network with G-Scale machines and software — a tricky process since
everything had to be done without disrupting normal business operations.
As Hölzle explains in his speech, the method was to pre-deploy the equipment
at a site, take down half the site’s networking machines, and hook them up to
the new system. After testing to see if the upgrade worked, Google’s
engineers would then repeat the process for the remaining 50 percent of the
networking in the site. The process went briskly in Google’s data centers
around the world. By early this year, all of Google’s internal network was
running on OpenFlow.
Though Google says it’s too soon to get a measurement of the benefits, Hölzle
does confirm that they are considerable. “Soon we will able to get very close
to 100 percent utilization of our network,” he says. In other words, all the
lanes in Google’s humongous internal data highway can be occupied, with
information moving at top speed. The industry considers thirty or forty
percent utilization a reasonable payload — so this implementation is like
boosting network capacity two or three times. (This doesn’t apply to the
user-facing network, of course.)
Though Google has made a considerable investment in the transformation —
hundreds of engineers were involved, and the equipment itself (when design
and engineering expenses are considered) may cost more than buying vendor
equipment — Hölzle clearly thinks it’s worth it.
Hölzle doesn’t want people to make too big a deal of the confirmation that
Google is making its own networking switches — and he emphatically says that
it would be wrong to conclude that because of this announcement Google
intends to compete with Cisco and Juniper. “Our general philosophy is that
we’ll only build something ourselves if there’s an advantage to do it — which
means that we’re getting something we can’t get elsewhere.”
To Hölzle, this news is all about the new paradigm. He does acknowledge that
challenges still remain in the shift to SDN, but thinks they are all
surmountable. If SDN is widely adopted across the industry, that’s great for
Google, because virtually anything that happens to make the internet run more
efficiently is a boon for the company.
As for Cisco and Juniper, he hopes that as more big operations seek to adopt
OpenFlow, those networking manufacturers will design equipment that supports
it. If so, Hölzle says, Google will probably be a customer.
“That’s actually part of the reason for giving the talk and being open,” he
says. “To encourage the industry — hardware, software and ISP’s — to look
down this path and say, ‘I can benefit from this.’”
For proof, big players in networking can now look to Google. The search giant
claims that it’s already reaping benefits from its bet on the new revolution
in networking. Big time.
Steven Levy's deep dive into Google, In The Plex: How Google Thinks, Works
And Shapes Our Lives, was published in April, 2011. Steven also blogs at
StevenLevy.com. Check out Steve's Google+ Profile .
Read more by Steven Levy
Follow @StevenLevy on Twitter.