Data Center Headaches: Monitoring Power and Cooling to Reduce Downtime
May 26, 2017

In data center management, downtime, power, and cooling consistently rank among the most common problems facing modern data center professionals. Why link these three issues? Because ineffective power management and inadequate environmental cooling are two of the biggest contributors to downtime.

Balancing power and environmental cooling in your data center is therefore critical to reducing downtime. Data Center Infrastructure Management (DCIM) software can help you achieve this goal by monitoring your environment and speeding up analysis of the data it collects, which leads to better, more efficient data center energy management.

Power and Environmental Cooling Risks in Your Data Center

Your data center is an enclosed environment where servers, intelligent rack PDUs, and other devices consume power and produce heat. The cooling system works as a loop: Computer Room Air Conditioning (CRAC) units chill air, which is typically pushed under the raised floor and drawn into the cabinets through their cold-air intakes. The equipment then expels hot air from the cabinets through exhaust vents, and that hot air returns to the CRACs for conditioning.
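Because nearly all the electrical power drawn by IT equipment ends up as heat, a cabinet’s power draw is a reasonable first estimate of the heat load the cooling system has to remove. The sketch below converts watts to BTU per hour using 1 W ≈ 3.412 BTU/h; the function name and the example wattages are hypothetical.

```python
# Conversion factor: 1 watt is approximately 3.412 BTU per hour.
WATTS_TO_BTU_PER_HR = 3.412


def heat_load_btu_per_hr(device_watts: list[float]) -> float:
    """Estimate a cabinet's heat output from the power drawn by its devices.

    Assumes essentially all power drawn is dissipated as heat, a common
    first approximation for IT equipment.
    """
    return sum(device_watts) * WATTS_TO_BTU_PER_HR


# Illustrative cabinet: three devices drawing 5,000 W in total.
cabinet_watts = [1200.0, 1800.0, 2000.0]
print(f"{heat_load_btu_per_hr(cabinet_watts):,.0f} BTU/h")  # 17,060 BTU/h
```

Summing these estimates across cabinets gives a rough picture of how much heat the CRAC units must handle as power loads grow.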

With so many variables involved, maintaining this delicate balance of power and cooling is a headache for data center managers. DCIM and data center monitoring tools help you track power and environmental changes in your data center, so you can keep your power loads under control while ensuring that your environmental readings stay within acceptable thresholds, such as manufacturer specifications or ASHRAE guidelines.
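A minimal sketch of checking readings against acceptable thresholds, assuming each metric has a simple low/high range. The metric names and the `check_reading` function are hypothetical. The temperature range reflects the commonly cited ASHRAE recommended inlet range of 18 to 27 °C, while the humidity band is only a placeholder to replace with your manufacturer’s or ASHRAE’s figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive acceptable range for one environmental metric."""
    low: float
    high: float


# Assumed example thresholds. The temperature range follows the commonly cited
# ASHRAE recommended inlet range (18-27 degrees C); the relative humidity band
# is a placeholder, not an ASHRAE or manufacturer figure.
THRESHOLDS = {
    "inlet_temp_c": Range(18.0, 27.0),
    "inlet_rh_pct": Range(20.0, 80.0),
}


def check_reading(metric: str, value: float) -> str:
    """Classify a reading as 'low', 'ok', or 'high' against its configured range."""
    limits = THRESHOLDS[metric]
    if value < limits.low:
        return "low"
    if value > limits.high:
        return "high"
    return "ok"


print(check_reading("inlet_temp_c", 29.5))  # high
print(check_reading("inlet_rh_pct", 45.0))  # ok
```

Keeping thresholds in data rather than code makes it easy to swap in manufacturer-specific limits per device.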

Power and Cooling Use Cases for Data Center Monitoring

The following use cases illustrate how data center monitoring can help you reduce downtime caused by power or cooling problems:

Managing Inlet Air Temperature and Humidity

Cabinet inlet air is a key concern because it is the air your equipment draws in to carry heat away. Air that is too hot won’t cool the equipment properly, air that is too humid can cause corrosion and damage equipment, and air that is too dry can lead to electrostatic discharge.

DCIM and data center monitoring software can collect temperature and humidity data for the air entering your cabinets and present it in easy-to-read visualizations so you can spot trends. Overlaying this information on a data center floor map can even help you identify existing hotspots and predict where new ones are forming.
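One simple way to flag hotspots from floor-map data is sketched below. It rests on assumptions that are not from the original: cabinets sit on a row/column grid, and a cabinet counts as a hotspot if its inlet temperature exceeds an absolute limit or runs more than a few degrees hotter than the average of its adjacent cabinets. The function name and default limits are illustrative.

```python
from statistics import mean

# (row, column) position of a cabinet on a hypothetical floor-map grid.
Position = tuple[int, int]


def find_hotspots(inlet_temps: dict[Position, float],
                  max_temp_c: float = 27.0,
                  neighbor_delta_c: float = 3.0) -> list[Position]:
    """Flag cabinets that exceed max_temp_c at the inlet, or that run more than
    neighbor_delta_c hotter than the mean of their adjacent cabinets."""
    hotspots = []
    for (row, col), temp in inlet_temps.items():
        adjacent = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
        neighbor_temps = [inlet_temps[p] for p in adjacent if p in inlet_temps]
        too_hot = temp > max_temp_c
        hotter_than_neighbors = (
            bool(neighbor_temps) and temp - mean(neighbor_temps) > neighbor_delta_c
        )
        if too_hot or hotter_than_neighbors:
            hotspots.append((row, col))
    return sorted(hotspots)


# Illustrative readings in degrees C for a 2 x 3 block of cabinets.
readings = {(0, 0): 21.0, (0, 1): 25.5, (0, 2): 21.0,
            (1, 0): 22.0, (1, 1): 23.0, (1, 2): 27.8}
print(find_hotspots(readings))  # [(0, 1), (1, 2)]
```

Cabinet (0, 1) is still within the absolute limit but is flagged because it runs noticeably warmer than its neighbors, which is the kind of early sign of hotspot formation a floor-map view helps surface.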

Increasing Data Center Temperature as a Measure of Efficiency

As the continued focus on Power Usage Effectiveness (PUE) shows, energy efficiency remains top of mind for data center managers. Running your data center at a comparatively high temperature can improve efficiency, but it also raises the risk of overheating.
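PUE is the ratio of total facility power (or energy over a period) to the power used by IT equipment, so a value of 1.0 would mean every watt goes to IT load. The helper below is a hypothetical illustration of that ratio, and the example figures are made up.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Illustrative figures: 1,500 kW drawn by the whole facility, 1,000 kW of it by IT gear.
print(pue(1500.0, 1000.0))  # 1.5
```

Lowering cooling energy, for example by safely raising the temperature setpoint, shrinks the numerator while leaving the IT load unchanged, which is why a warmer data center can show a lower PUE.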

DCIM and data center monitoring tools can send trap notifications (such as SNMP traps) that alert you when temperatures move outside thresholds, giving you the confidence to raise temperatures without putting your equipment at risk. Conversely, DCIM software can also help you avoid overcooling your data center, which helps you manage energy costs and increase savings.
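Threshold alerting can be as simple as comparing each reading to a limit, but a reading hovering near that limit would then raise and clear alerts over and over. The sketch below adds hysteresis, a common design choice that the original does not describe: the alarm is raised above `high_c` and cleared only once the temperature falls below `high_c - hysteresis_c`. The class name and return values are hypothetical, and delivering the alert (for example as an SNMP trap) is left out.

```python
from typing import Optional


class TemperatureAlarm:
    """High-temperature alarm with hysteresis (hypothetical sketch)."""

    def __init__(self, high_c: float, hysteresis_c: float = 1.0) -> None:
        self.high_c = high_c                    # raise the alarm above this
        self.clear_c = high_c - hysteresis_c    # clear it only below this
        self.active = False

    def update(self, temp_c: float) -> Optional[str]:
        """Feed one reading; return 'RAISE' or 'CLEAR' on a state change, else None."""
        if not self.active and temp_c > self.high_c:
            self.active = True
            return "RAISE"
        if self.active and temp_c < self.clear_c:
            self.active = False
            return "CLEAR"
        return None


alarm = TemperatureAlarm(high_c=27.0, hysteresis_c=1.0)
print([alarm.update(t) for t in [25.0, 27.5, 26.8, 27.2, 25.9]])
# [None, 'RAISE', None, None, 'CLEAR']
```

Without hysteresis, the readings 27.5, 26.8, and 27.2 would have produced a raise, a clear, and a second raise. A mirror-image low-temperature alarm could be used to catch overcooling.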

Note: Even something as routine as a firmware update can trigger power and environmental problems that lead to extended downtime, as Microsoft experienced in 2013 when a temperature spike caused a 16-hour outage. In cases like this, where the downtime may have been unavoidable, DCIM software could still have shortened it by speeding up troubleshooting and helping the data center team identify and address the problem more quickly.

Ensuring Power Redundancy to Proactively Maintain Uptime

With technology trends like AI and GPU computing hardware on the rise, cabinets in modern data centers are more densely packed with power-hungry hardware than ever before. As a result, data center teams are under pressure to deliver increasing amounts of power to these devices. If you’re not managing your power capacity and temperature, you could experience unexpected downtime from either overloading or overheating.

DCIM and data center monitoring tools provide dashboards and reports that help you trend and visualize capacity so you can forecast when you’re about to run out. In addition, a failover simulation report lets you identify at-risk cabinets and determine whether your equipment can keep running if one PDU goes down.
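Both analyses can be approximated with a little arithmetic. The sketch below makes assumptions the original does not state: capacity is forecast by fitting a straight line (ordinary least squares) to daily peak readings, and each cabinet is fed by an A/B pair of equally rated PDUs, so if one fails the survivor must carry the cabinet’s entire load. All names and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def days_until_capacity(daily_peak_kw: list[float], capacity_kw: float) -> Optional[float]:
    """Forecast days until the linear trend of daily peaks reaches capacity.

    Fits y = intercept + slope * day by ordinary least squares, where day 0 is
    the first reading. Returns None if the load is flat or falling.
    """
    n = len(daily_peak_kw)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(daily_peak_kw) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_peak_kw))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_full = (capacity_kw - intercept) / slope
    # Days remaining after the most recent reading (0 if already at or over capacity).
    return max(0.0, day_full - (n - 1))


@dataclass
class Cabinet:
    name: str
    pdu_a_kw: float       # measured load on the A-side PDU
    pdu_b_kw: float       # measured load on the B-side PDU
    pdu_rating_kw: float  # rated capacity of each PDU (assumed equal)


def failover_at_risk(cabinets: list[Cabinet]) -> list[str]:
    """Names of cabinets whose combined load would overload a single surviving PDU."""
    return [c.name for c in cabinets if c.pdu_a_kw + c.pdu_b_kw > c.pdu_rating_kw]


print(days_until_capacity([4.0, 4.5, 5.0, 5.5], capacity_kw=7.0))  # 3.0
print(failover_at_risk([Cabinet("A01", 3.0, 2.5, 5.0),
                        Cabinet("A02", 2.0, 2.0, 5.0)]))           # ['A01']
```

Cabinet A01 looks healthy in normal operation, since each PDU carries well under 5 kW, but its 5.5 kW total would overload either PDU on its own; that hidden dependency is what a failover check is meant to expose.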

By combining data center energy management with effective environmental monitoring, you can maintain the balance of power and cooling in your data center. Tracking power and cooling with DCIM software gives you data you can analyze and explore to better manage your data center environment. Armed with this information and a comprehensive DCIM solution, you’ll be able not only to address the critical power and environmental causes of downtime but also to limit instances of the biggest cause of downtime: human error.

Want to see for yourself how DCIM can do this and more? Take a test drive today.
