24x7 Support Strategies

Hi,

I'm wondering how different organisations structure their 24x7 network operations. We are undergoing some restructuring here, and it would be interesting for us to know how other large enterprises and service providers arrange this. We are particularly interested in service providers (currently we have an enterprise that is slowly morphing into more of a service provider setup). I'll summarise back to the list, after removing any identifying details.

These questions specifically refer to network staff, as opposed to any general Ops team.

Do you have 24x7 staff on site?
What level of technical ability do the on-site staff have?
What shift patterns do the 24x7 staff use?

Do you have a response time for on-call staff, by which time they must be VPN'ed into the network?
What level of technical ability do the first-line on-call staff have?
Do you have an official escalation system if the first-line on-call staff do not have the required technical ability?
Do the staff on on-call escalation have a required response time, by which time they must be VPN'ed into the network?
Do the staff on on-call escalation rota the on-call responsibilities?
Do the on-call staff receive additional benefits or compensation for being on-call?

All,

Thanks for the replies that have started rolling in. They've made me realise I should have added an additional question for clarity.

Does anyone have any CCIE (or equivalent technical ability) staff on a 24x7 shift? What about CCIE level staff on an on-call rota with a guaranteed response time? How about CCNP?

If people could also give an indication of the size of their organisation/network it would be useful.

Sam

Sam Stickland wrote:

This topic interests me very much, and I had a BOF about staff development
at the Montreal meeting in 1999. I remember some of the details, and, while
I am no longer generally doing course development, I have some pretty strong
ideas of what reasonably constitutes a proper training sandbox for a major
ISP.

If anyone would like to discuss this, please feel free to contact me offline.
If there's a use for a separate mailing list or summaries to NANOG, I'd be
happy to try to organize it.

Does anybody actually put any stock in the presence or absence of vendor certifications on a resume when judging the capabilities of an engineer?

There's no correlation between certification and capability, in my experience.

Joe

I doubt it; maybe training companies do!

A number of vendors have grades to meet, so X number of certified experts
means better deals from said vendor.

Regards,
Neil.

Joe Abley wrote:

Does anyone have any CCIE (or equivalent technical ability) staff on a 24x7 shift? What about CCIE level staff on an on-call rota with a guaranteed response time? How about CCNP?

Does anybody actually put any stock in the presence or absence of vendor certifications on a resume when judging the capabilities of an engineer?

There's no correlation between certification and capability, in my experience.

I fully agree with you, Joe, but I needed to quantify the level of technical expertise somehow. I think most people have some kind of a feel for what level we are talking about if we say "equivalent technical ability to CCIE", even if there are CCIEs out there who are useless :wink:

S

Even my recent experience says no here. I had a CCIE (written..!) in for an interview and, well, I am not sure how he managed to get CCIE written but he sure as hell didn't know much.

There are some useful qualifications for potential ISP employees that the LINX provides in conjunction with some training companies. These are good for NOC or junior engineers, but at the end of the routing table there's no substitute for getting people who have had a few years to screw up other people's networks and learn what not to do.

I find that within 15 mins of any interview I have accurately judged the technical competence of most candidates...

Sam Stickland wrote:

There is no such thing as a "CCIE (written)". That would be kind of
like PhD ABD (All But Dissertation), but without the coursework, too.

Daniel J McDonald wrote:

[Sorry, I have a hosed copy of #*@! Outlook, which crashes whenever I tell
it to prefix earlier comments with >]

A related area that might well be worth revisiting is
cooling. IIRC, it was someone from Google, at the Intel
developer conference, who said that their power and HVAC
costs were rapidly approaching the cost of their servers. He
laid down a challenge for chipmakers to be more energy-efficient.

And what about all those diesel generators? How many of them are capable
of running on vegetable oil rather than diesel oil? I regularly walk
past a building in London that reeks because of the diesel fuel tanks in
the basement. You have to wonder about the safety of storing large
amounts of petroleum oil in the centers of major cities when vegetable
oil is safer, and more carbon-friendly.

But back to chips and heat generation. Has anyone instrumented some of
these servers (and their software) to figure out how much heat various
functions generate? A few months ago, I walked away from my desk to get
a cup of tea, and stretch my legs for a few minutes. I was away about 25
minutes and when I came back, my laptop had a scorched smell coming out
of it (probably dust on the cpu chip). I closed all my apps, shut it
down, waited 5 minutes to let it cool and restarted it. After a while, I
noticed the smell again, shortly after I restarted an Oracle client
install that I hadn't completed during the earlier incident. This
install wasn't completing because I didn't have the right information to
identify which database servers to connect to and it was spinning its
wheels. This particular application was generating so much heat that the
dust on the CPU chip was being scorched. I closed the app, and the smell
faded.

So, how do we know that the heat generated by all this software on all
these servers is actually providing any value at all to either the data
center owners or the server owners? In order to know this, we need to
measure what is generating the heat, then change the software to remove
bad behaviors. Since a lot of these servers are running open source
software, this is not as hard to implement as you might think. I suspect
that you would need to make special modifications to the hardware of a
server to install temperature and current measuring devices in key
locations and feed all this data into a separate machine for analysis.
Also, I expect that the embedded system industry with their experience
of building low-power consumption devices, might be able to help out.

--Michael Dillon

Didn't we discuss the need for standard water connectors not so long
ago? Water over Ethernet?

The new Liebert GX high density cooling gear is pretty slick. It uses a liquid which turns to a gas if the line is breached. It provides the advantages of liquid cooling, but without the hazards of having water leaks inside the datacenter. We aren't using it yet, but we did investigate it to get better cooling with the last AC upgrade/addition cycle. We stayed with conventional air cooling due to equipment footprint restrictions. Our current datacenter heat load is about 125-150W/sq. ft. and it is only rising.

-Robert

Tellurian Networks - Global Hosting Solutions Since 1995
http://www.tellurian.com | 888-TELLURIAN | 973-300-9211
"Well done is better than well said." - Benjamin Franklin

People are asking me to post a summary back to the list, but as I'm still getting replies coming in I'm going to leave this until tomorrow.

S

Sam Stickland wrote:

The vast majority of modern machines have working temperature sensors;
some just one or two (e.g. cpu and case temperature), others have a huge
range.

If you're unlucky and use an OS which doesn't provide access and
monitoring of these as part of the standard installation, third-party
software is usually available.
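
On Linux, for instance, the kernel exposes many of these sensors through sysfs. The following is a minimal Python sketch, assuming the standard /sys/class/thermal layout (thermal_zoneN directories containing `type` and `temp` files, the latter in millidegrees Celsius). The function name read_thermal_zones is made up for illustration, and some machines expose their sensors elsewhere (e.g. via hwmon) or not at all.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"zone:type": degrees_C} for every readable thermal zone.

    `base` is a parameter only so the function can be pointed at a
    test directory; on a real Linux host the default is what you want.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone is missing a file or is not reporting
        readings[f"{zone.name}:{kind}"] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    for name, temp in read_thermal_zones().items():
        print(f"{name}: {temp:.1f} C")
```

Run periodically and fed into whatever graphing system is already in place, this gives a per-server temperature trend without any extra hardware.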

Non-CPU-bound servers can often benefit from enabling power-saving
features too (e.g. speedstep, powernow/cool'n'quiet). In some cases this
is very simple and offers significant reduction in power consumption and
heat generation.
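
To see whether frequency scaling is actually in use on a Linux box, one option is to look at the cpufreq governor for each CPU. This is a rough Python sketch, assuming the usual sysfs path /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor is present (it is absent when no cpufreq driver is loaded); governor_summary is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def governor_summary(base="/sys/devices/system/cpu"):
    """Count how many CPUs are using each cpufreq governor.

    Returns e.g. {"performance": 8}; an empty dict means no cpufreq
    information was found under `base`.
    """
    counts = Counter()
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in Path(base).glob(pattern):
        try:
            counts[gov_file.read_text().strip()] += 1
        except OSError:
            continue  # CPU offline or file unreadable
    return dict(counts)


if __name__ == "__main__":
    summary = governor_summary()
    if not summary:
        print("no cpufreq governors found")
    for governor, ncpus in sorted(summary.items()):
        print(f"{governor}: {ncpus} CPU(s)")
```

A lightly loaded server where every CPU reports a fixed full-speed governor is a candidate for trying a power-saving one.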

But back to chips and heat generation. Has anyone
instrumented some of these servers (and their software) to
figure out how much heat various functions generate?

It seems that someone has done just that. A list member sent me a
private reply pointing me to http://www.linuxpowertop.org/

I still think that special hardware that can work with profiling data to
target specific blocks of code inside the process would also be useful
to identify best programming practices. However, many programmers debug
their code quite effectively without profilers by means of print
statements, and this powertop tool can be leveraged by a programmer with
source code available. If nothing else, code two variations of a program
and compare powertop results.
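
As a toy illustration of the "code two variations and compare" idea, here is a Python sketch of two ways to wait for something: one spins its wheels, the other yields the CPU. It measures process CPU time, which is only a proxy for power and heat and is not what powertop itself reports; the function names are made up for the example.

```python
import time


def wait_busy(seconds):
    """Variation A: spin in a tight loop until the deadline passes."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def wait_sleeping(seconds):
    """Variation B: hand the CPU back to the OS until the deadline."""
    time.sleep(seconds)


def cpu_seconds(func, *args):
    """CPU time (user + system) this process consumed while func ran."""
    start = time.process_time()
    func(*args)
    return time.process_time() - start


if __name__ == "__main__":
    for variant in (wait_busy, wait_sleeping):
        used = cpu_seconds(variant, 0.5)
        print(f"{variant.__name__}: {used:.3f} CPU-seconds")
```

Both variations take the same wall-clock time, but the busy loop keeps a core fully occupied for the whole wait. Running each under powertop would be the next step to see the difference in wakeups and estimated power.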

--Michael Dillon

Does anybody actually put any stock in the presence or absence of
vendor certifications on a resume when judging the
capabilities of an engineer?

No stock whatsoever. In fact when we advertise open positions we stress that experience and capability are FAR more important to us than certification.

There's no correlation between certification and capability, in my experience.

Certification is merely a secondary revenue stream for said vendors, period.

--chuck

And what about all those diesel generators? How many of them are capable
of running on vegetable oil rather than diesel oil? I regularly walk
past a building in London that reeks because of the diesel fuel tanks in
the basement. You have to wonder about the safety of storing large
amounts of petroleum oil in the centers of major cities when vegetable
oil is safer, and more carbon-friendly.

This is something I know about, as I home-brew fuel for my vehicles, and I operate a datacenter with backup power. :wink:

The issue with straight vegetable oil (SVO) is that it must be pre-heated to >55°C to run efficiently in a Diesel engine without risk of clogging the injectors or injector pump. This is not exactly efficient for fail-over power generation. Either you would need to build dual-tank and heating systems, which means still storing SOME petro-Diesel AND losing X% of your power generation to heating your fuel in the process (a LOT of electricity, as most backup gensets have a LOT of fuel around to heat up; looking outside my office window I see two tanks, one 19,000 liters, the other 30,000 liters in capacity). Or you would need to mix that SVO with petroleum Diesel to thin it enough to run risk free, negating your desire to rid yourself of petrochemical risk.
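
To put a rough number on the heating cost, here is a back-of-the-envelope Python sketch using Q = m * c * dT. Only the 55°C target and the tank sizes come from the text above; the density (0.92 kg/L), specific heat (2.0 kJ/kg·K) and 10°C starting temperature are assumed round figures, and heat lost through the tank walls is ignored.

```python
# Assumed round-number properties of vegetable oil (not from the thread).
DENSITY_KG_PER_L = 0.92
SPECIFIC_HEAT_KJ_PER_KG_K = 2.0


def heating_energy_kwh(volume_l, start_c=10.0, target_c=55.0):
    """Energy to warm `volume_l` liters of oil once, with no losses.

    start_c is an assumed ambient temperature; target_c is the
    pre-heat threshold quoted above.
    """
    mass_kg = volume_l * DENSITY_KG_PER_L
    energy_kj = mass_kg * SPECIFIC_HEAT_KJ_PER_KG_K * (target_c - start_c)
    return energy_kj / 3600.0  # 1 kWh = 3600 kJ


if __name__ == "__main__":
    for liters in (19_000, 30_000):
        kwh = heating_energy_kwh(liters)
        print(f"{liters} L: about {kwh:.0f} kWh to reach 55 C")
```

Under these assumptions the 19,000-liter tank works out to roughly 437 kWh for a single warm-up, before counting the energy needed to hold it at temperature.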

This is moot in exceptionally warm climates or exceptionally cloud-free areas as passive solar could be used to store SVO at high temps. VO-based fuels are more likely to gel at lower temps than petro-fuels. Ask me how I know!

Finally, vegetable oil has long-term storage issues of the organic sort, where it can become contaminated with algae, bacteria, etc.

BioDiesel, that is, vegetable oil that has been chemically stripped of glycerines through transesterification, is a better suggestion to replace petro-Diesel in generators. It has cold weather and storage issues too, but generally performs better than SVO.

--chuck

Not to go too far off-topic, but it’s very true that the best thing to use is straight petrol diesel for your redundant power systems at a datacenter. No fun telling a client power went out for 12 hours because your fuel supply had gelled due to the low overnight temps (depending on location of course). As the price of petrol fuel supplies slowly moves upward due to demand from China and India, I foresee datacenters moving away from diesel generators as backup power sources towards fuel cells/generators that can burn natural gas and hydrogen.

With that said, climates such as Brazil’s would be perfect to use generators burning ethanol for backup power (also helped by the large ethanol distribution infrastructure in place there).

-brandon