Data Center Monitoring: Effective Ways to Mitigate Human Error and Reduce Downtime
June 08, 2018

Downtime continues to be one of the biggest data center management headaches faced by modern data center professionals. For owner-operated and colocation data centers alike, an unplanned outage can wreak havoc on business-critical systems and applications and result in unhappy customers. Additionally, downtime can be expensive, with the average cost of an unplanned outage reaching over $740,000.

Avoiding unplanned downtime starts with addressing one of its chief causes: human error. According to the Ponemon Institute, human error accounted for 22 percent of unplanned data center outages. While data center training and certification programs can help educate staff and reduce that risk, how you manage the devices and environment in your facility is equally important.

That’s where data center monitoring can make a difference. Data Center Infrastructure Management (DCIM) software can help you track the power, environmental, and security status of the items in your data center and provide the information you need to decrease the chance of downtime due to human error.

Consider these four common use cases for data center monitoring:

1. Health polling

Ensuring that your intelligent PDUs, UPSs, and other devices in your data center are operational and accessible via your network is critical to maintaining the health of your data center. All too often, however, a PDU or other piece of equipment can go down without you and your team being aware of it. A technician or engineer may accidentally place a PDU into maintenance mode, forget to power on newly provisioned resources, or even connect equipment using incorrect cabling or ports.

DCIM software and other data center monitoring tools can limit the possibility of outages due to unreachable or malfunctioning hardware through health polling. The software polls intelligent PDUs and other equipment at user-configurable intervals to confirm that each device can be reached. If a device can’t be reached, the software sends an alert right away, so you’ll be the first to know about a potential issue in your data center.
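To make the polling idea concrete, here is a minimal sketch in Python. It is not how any particular DCIM product works: the function names `tcp_reachable` and `poll_once` are hypothetical, and the sketch assumes a plain TCP connection attempt is an acceptable reachability check, whereas a real tool would more likely poll over SNMP or a vendor API.

```python
import socket
from typing import Callable, Iterable


def tcp_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def poll_once(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = tcp_reachable,
) -> list[str]:
    """Probe each host once and return the hosts that did not respond."""
    return [host for host in hosts if not probe(host)]
```

Calling `poll_once` on a timer (for example, every 60 seconds) and sending a notification whenever the returned list is not empty gives the basic alert-on-unreachable behavior described above. The `probe` parameter is there so the check can be swapped for whatever protocol your devices actually support.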

2. Data center monitoring thresholds, alerts, and reports

Unexpected downtime from either overloading or overheating can occur if you’re not keeping an eye on your data center power management and environmental monitoring. It’s easy for data center staff to miscalculate or incorrectly set the budgeted power or the maximum temperature allotted for a cabinet, leading to inadequate cooling and inaccurate data center capacity management.

DCIM software provides thresholds, alerts, and data center business intelligence reporting capabilities to help you avoid overheating and overcapacity situations. Configuring temperature and power thresholds allows you to set a comfortable range for your data center, while alerts and notifications warn you immediately if you’re getting too close to the limit. Real-time load monitoring for intelligent PDUs can also boost the productivity of your data center team by helping you react quickly to overcapacity issues. Data center power management reports give you data that you can slice and dice to better understand and visualize your power and temperature trends for more reliable data center capacity planning.
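As a rough illustration of how warning and critical thresholds turn a sensor reading into an alert level, consider the sketch below. The `Threshold` class and `classify` function are hypothetical names, and the numbers in the usage note are illustrative only, not recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Upper limits for a reading such as inlet temperature or PDU load."""
    warning: float
    critical: float


def classify(reading: float, threshold: Threshold) -> str:
    """Return 'ok', 'warning', or 'critical' for a single reading."""
    if reading >= threshold.critical:
        return "critical"
    if reading >= threshold.warning:
        return "warning"
    return "ok"
```

With a hypothetical cabinet inlet threshold of `Threshold(warning=27.0, critical=32.0)` in degrees Celsius, a reading of 25.0 is classified as ok, 28.0 as a warning, and 32.0 as critical. The warning level is what gives your team time to act before the limit is actually reached.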

3. Power redundancy for failover situations

What happens in your data center when a PDU goes down or is over capacity? Many data center teams are so focused on making the most of their existing resources and delaying capital expenditures that they may not realize that they have overloaded their cabinets until it’s too late.

Power redundancy in case of equipment failure is a simple yet effective component of any downtime reduction strategy. A failover simulation report enables you to identify at-risk cabinets by determining whether the equipment in each cabinet could continue to run if one PDU went down and its load shifted to the remaining PDU. As a result, your team can rebalance the loads of these at-risk cabinets before they become problems.
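The arithmetic behind a failover check is simple enough to sketch. The example below is an assumption-laden illustration, not a description of any vendor's report: the function name `at_risk_cabinets` is hypothetical, it assumes that the full cabinet load shifts to the surviving PDUs when one fails, and it assumes an 80 percent derating factor on rated capacity, which you should replace with whatever margin your facility uses.

```python
def at_risk_cabinets(
    cabinets: dict[str, list[tuple[float, float]]],
    derating: float = 0.8,
) -> list[str]:
    """Return cabinets that could not carry their load if any one PDU failed.

    cabinets maps a cabinet name to a list of (load_amps, rated_amps) pairs,
    one pair per PDU in that cabinet.
    """
    risky = []
    for name, pdus in cabinets.items():
        total_load = sum(load for load, _ in pdus)
        for failed in range(len(pdus)):
            # Capacity left once the PDU at index `failed` is lost.
            surviving = sum(
                rated for i, (_, rated) in enumerate(pdus) if i != failed
            )
            if total_load > surviving * derating:
                risky.append(name)
                break
    return risky
```

For example, a cabinet with two 30 A PDUs carrying 10 A and 9 A passes, because the combined 19 A fits within 24 A of usable capacity on one PDU. A cabinet carrying 14 A and 13 A on the same PDUs is flagged, because 27 A exceeds that 24 A limit even though neither PDU looks overloaded on its own.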

4. Security monitoring

Although data center power monitoring and environment management are among the most common use cases for DCIM software, data center security is becoming more prominent as threats to physical assets gain prevalence. Unauthorized access accounts for up to 18 percent of data center breaches. Whether this access is malicious or accidental, being aware of who has access to your data center is critical to safeguarding both your data and physical resources.

DCIM software and other data center tools can help you track who goes in and out of your data center. Data center software can monitor the contact closure sensors and door locks on your cabinets. It can also be used to manage your RFID cards and to grant specific users permission to open specific doors. An automatic re-lock timer monitors how long a door has been unlocked and will re-lock it after a certain period, so you never need to worry about a technician forgetting to lock the door. Security and audit reports can show you who had access to different areas of your data center in case you need to conduct forensic analysis for an event.
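The re-lock timer logic can be sketched in a few lines. This is a simplified model under stated assumptions: `DoorLock` is a hypothetical class, time is passed in as plain seconds rather than read from a clock, and the actual lock hardware is not modeled at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoorLock:
    """Tracks one electronic door lock with an automatic re-lock timeout."""
    relock_after_s: float
    unlocked_at: Optional[float] = None  # None means the door is locked

    @property
    def locked(self) -> bool:
        return self.unlocked_at is None

    def unlock(self, now: float) -> None:
        """Record the time at which the door was unlocked."""
        self.unlocked_at = now

    def tick(self, now: float) -> bool:
        """Re-lock if the door has been unlocked too long.

        Returns True only when this call re-locked the door.
        """
        if self.unlocked_at is not None and now - self.unlocked_at >= self.relock_after_s:
            self.unlocked_at = None
            return True
        return False
```

A monitoring loop would call `tick` periodically for every door. With a 60-second timeout, a door unlocked at time 0 stays unlocked at 30 seconds and is re-locked on the first check at or after 60 seconds.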

Reducing the risk of downtime is key to keeping your data center up and running smoothly. When used in conjunction with intelligent PDUs, environmental sensors, and other instrumentation throughout your data center, data center monitoring can provide the checks and balances needed to mitigate the risks of human error and maintain uptime and availability.

Want to see for yourself how data center monitoring tools can reduce human error while increasing uptime? Test drive our industry-leading second-generation DCIM software today.
