Advanced Computing in the Age of AI | Tuesday, March 19, 2024

Study Probes Ways to Avoid Datacenter Outages 

As recent outages at Verizon Cloud and Facebook illustrate, datacenter operators can catch plenty of flak when their infrastructure goes offline, whether for a few hours or a weekend. A cottage industry actually tracks outages and ranks cloud service providers.

Hence, a lot of thought is being given to better ways of monitoring datacenter operations to avoid outages and improve availability. As the cost of building and operating datacenters grows, these analyses are also stressing efficiencies that can lower operating costs while maximizing the return on growing investments in datacenters and cloud infrastructure.

A white paper released by asset management and power monitoring specialist RF Code, Austin, Texas, argues that real-time monitoring, capacity planning and predictive analysis are among the best ways to keep datacenters up and running while squeezing maximum capacity out of IT infrastructure.

The study cites a datacenter monitoring program implemented by cloud and telecommunications giant CenturyLink, which operates 55 datacenters around the world. Those datacenters racked up an electricity bill of more than $80 million in 2011. A real-time monitoring pilot project allowed operators to safely bump up air temperatures without reducing datacenter availability. According to the RF Code white paper, the pilot study identified $2.9 million in "potential annual savings" across CenturyLink's datacenter business.

The study also touts an energy efficiency approach called "power proportional computing," that is, matching power supply to computing demand. Still, it found that few datacenter operators leverage dynamic provisioning technologies or the power capping features built into many servers. One reason is risk: "Without real-time monitoring and management, raising inlet air temperature increases the risk of equipment failure," the white paper notes.

"Without a detailed understanding of the relationship between compute demand and power dynamics in the datacenter, power capping increases the risk that the processing capacity won’t be available when required."

Adds the report: "Availability trumps savings. Availability trumps all."

The report also cites survey results from a recent Data Center User's Group in which 70 percent of the respondents reported that their average power density was between 2 and 8 kW per rack. Average density is expected to increase to between 4 and 16 kW per rack in the next two years. Higher density means more heat, higher cooling requirements and increased power requirements.

"The only way to maintain continuous availability in high density deployments is real-time monitoring and granular control of the physical infrastructure," the white paper argues.

Finally, the RF Code white paper makes the case that predictive analytics should be integrated into datacenter operations as a way to leverage other tools like real-time monitoring, asset management and capacity planning. Predictive analysis of data collected by real-time monitoring and asset management systems "enables more integrated, autonomous operation of the datacenter and informs decision making throughout the organization," the study contends.

RF Code counts among its customers large datacenter operators and enterprises like Bank of America, Dell, GE and Hewlett-Packard.


EnterpriseAI