Edge data centers—small, mission-critical facilities located near the customers they serve that connect to a larger central data center or multiple data centers—are being utilized in increasing numbers as organizations seek high-performance and cost-effective ways to reduce latency. Edge data centers have become critical in delivering anywhere, anytime access to applications, services, and data that today’s end users demand. Latency is no longer tolerable with big data, the Internet of Things (IoT), cloud and streaming services, and other technology trends.
Because edge sites are numerous, remote, and often unstaffed, data center managers frequently struggle to manage their edge infrastructure. The challenges include the complexity of managing many remote sites, a lack of on-site personnel, tracking equipment and configuration, site infrastructure monitoring, site and equipment security, and equipment maintenance.
Fortunately, with the right remote management tools, you can reduce latency, maintain availability and uptime, and achieve optimal performance while decreasing costs.
In our recent webinar titled “Implementing Edge Infrastructure Management – 5 Best Practices,” data center experts Michael Piers (Sr Manager DCIM/Tools, Comcast) and Michael Garito (Sr Data Center Program Manager, Akamai) joined James Cerwinski (VP Product Management, Sunbird Software) to discuss real-world use cases of how they leverage Sunbird’s DCIM solution to manage their edge sites.
5 Best Practices for Edge Infrastructure Management
1. Monitor power and environmental trends.
Use an enterprise polling engine that collects all the power and environmental data from your edge sites and retains that data for long periods of time. The engine should also turn the data into actionable information through easy-to-use, flexible charts, such as a chart tracking each location’s active power over time.
Akamai’s Michael Garito shared an example of how trending power data is valuable to him. Most of their equipment is custom-built by their internal hardware teams and they perform extensive benchmark tests to determine how it will perform in production. When they deployed Data Center Infrastructure Management (DCIM) software, Garito says, “It was eye opening to find that over the course of the day, even though we had numbers on what we believed we would use, the reality versus what we planned didn’t quite align.”
“In one data center in particular, over the course of the day our power demand from low-usage periods to high-usage periods could swing by as much as 200 kW over the course of the day,” Garito continues. “Being able to see this swing and use this information both to prevent potential problems like tripping breakers as well as identify areas where we’ve overbudgeted power so we can start to backfill where applicable is very valuable information and we look forward to pushing this out to more locations.”
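As a rough illustration of the kind of swing Garito describes, the following minimal sketch computes the gap between the lowest and highest active-power reading for each day. The readings, the function name `daily_swing_kw`, and the data layout are all assumptions made for this example; they are not taken from any DCIM product.

```python
from datetime import datetime

# Hypothetical active-power readings for one site: (timestamp, kW).
readings = [
    (datetime(2024, 1, 1, 3, 0), 410.0),   # overnight low
    (datetime(2024, 1, 1, 9, 0), 520.0),
    (datetime(2024, 1, 1, 15, 0), 610.0),  # afternoon peak
    (datetime(2024, 1, 1, 21, 0), 480.0),
]


def daily_swing_kw(samples):
    """Return {date: max_kw - min_kw} for each calendar day in samples."""
    by_day = {}
    for timestamp, kw in samples:
        by_day.setdefault(timestamp.date(), []).append(kw)
    return {day: max(values) - min(values) for day, values in by_day.items()}


swings = daily_swing_kw(readings)
for day, swing in swings.items():
    print(f"{day}: swing of {swing:.0f} kW")
```

Tracking this value per day makes it easier to see both how close the peak comes to breaker limits and how much budgeted power sits unused during quiet periods.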
Garito also monitors and trends temperature data to reduce Akamai’s environmental footprint and energy cost.
“Being able to monitor the interior temperature over time allows us to make better decisions on when to pull in outside air versus recirculate inside air,” Garito says. “Having all this data readily available and being able to track it over periods of time allows us to plan not just the day-to-day operations but also start to get into long-term planning so we can start planning over the course of seasons and allocate resources accordingly.”
2. Monitor power loads.
Deploying equipment at the edge is expensive, and you want to put as much equipment as you can in the space you lease from a third party. Therefore, monitoring power loads is important not just to understand the overall trend of your power consumption, but to know the exact load you need to budget for each compute device you deploy.
Comcast leverages the Auto Power Budget feature in second-generation DCIM, which collects large amounts of data for every compute device deployed and, using user-configurable policies, calculates a highly accurate budget value to use when deploying each make/model instance. Each instance gets its own budget, allowing you to deploy much more compute capacity into the space you rent.
“This is a great feature that we take advantage of,” says Michael Piers of Comcast. “In the past, what we were doing was a very manual process and we were looking at the nameplate of each power supply and taking some arbitrary number, say 60% of that... What we’ve found using this tool and actually getting the real-time numbers from Power IQ into dcTrack is that we’ve got some devices that, instead of this 60% we were using, might only be using 38%… and we are able to put a whole lot more devices out there in the space. Utilizing that stranded power is pretty massive when you’re already paying for it. For us, this is a really huge benefit of using the tool.”
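To make the arithmetic behind stranded power concrete, here is a minimal sketch comparing a nameplate-derated budget with a budget derived from measured draw. The policy used (peak measured draw plus a safety margin) is an assumption chosen for illustration and is not a description of how Auto Power Budget works; the nameplate value, readings, and cabinet capacity are also made up.

```python
def derated_budget_w(nameplate_w, derate_pct=60):
    """Budget as a fixed percentage of the power supply nameplate."""
    return nameplate_w * derate_pct / 100


def measured_budget_w(readings_w, margin_pct=10):
    """Assumed policy: peak observed draw plus a safety margin."""
    return max(readings_w) * (100 + margin_pct) / 100


def devices_that_fit(capacity_w, budget_w):
    """Whole number of devices whose budgets fit within the capacity."""
    return int(capacity_w // budget_w)


# Hypothetical device: 1000 W nameplate, measured draw well below it.
nameplate_w = 1000
observed_w = [310, 345, 330]
cabinet_capacity_w = 8000

old_budget = derated_budget_w(nameplate_w)
new_budget = measured_budget_w(observed_w)

print(devices_that_fit(cabinet_capacity_w, old_budget), "devices with nameplate derating")
print(devices_that_fit(cabinet_capacity_w, new_budget), "devices with measured budgets")
```

With these example numbers the measured budget is roughly 38% of nameplate, in line with the figure Piers mentions, and the same cabinet holds noticeably more devices.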
3. Monitor the health of your edge sites.
Edge data centers are mission critical, and you need to monitor their uptime to ensure that services continue to run. You need a holistic view, such as an enterprise health dashboard showing real-time power and environmental health and events for all edge sites in a single pane of glass, with the ability to drill down for granular, cabinet-level metrics. In such a dashboard, easy-to-understand red/yellow/green color-coding indicates the status of each site in terms of potential incidents or capacity constraints that can become problems, based on metrics such as critical events, hottest rack, number of recent additions and commissions, and the current power load.
“The simple color-coding of the tiles is one of my favorite features because red is bad, green is good, and yellow indicates that it can usually wait until after lunch,” says Akamai’s Garito.
“We had a rack recently that went into alarm for drawing 9.3 kW and it’s a 17 kW rack,” Garito continues. “The problem was that there’s two PDUs, 9.6 kW each, which is more than enough power. The issue is that the team onsite did not properly load balance the servers between the two rack PDUs. This is the kind of thing that goes through QA just fine because when it goes through QA it’s not drawing enough power to trip the alarm, but once it hits production and actually gets that traffic load, those kinds of alarms come to light. Being able to quickly identify the problem and quickly sort out all the loads on one PDU versus the other, those are the kinds of things that enable rapid response before problems escalate.”
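The load-balancing problem in this example can be sketched as a simple per-PDU check. The 9.6 kW rating and the 9.3 kW rack total come from the quote above; the per-PDU split, the 80% alarm threshold, and the function names are assumptions made for illustration only.

```python
PDU_RATING_KW = 9.6      # per-PDU rating from the example above
ALARM_FRACTION = 0.8     # assumed alarm threshold (80% of rating)


def pdu_alerts(loads_kw, rating_kw=PDU_RATING_KW, alarm_fraction=ALARM_FRACTION):
    """Return the names of PDUs whose load exceeds the alarm threshold."""
    limit_kw = rating_kw * alarm_fraction
    return [name for name, kw in loads_kw.items() if kw > limit_kw]


def imbalance_kw(loads_kw):
    """Difference between the most and least loaded PDU in a rack."""
    return max(loads_kw.values()) - min(loads_kw.values())


# Hypothetical split of the same 9.3 kW rack load across two PDUs.
unbalanced = {"pdu_a": 8.4, "pdu_b": 0.9}
balanced = {"pdu_a": 4.65, "pdu_b": 4.65}

print(pdu_alerts(unbalanced), f"imbalance {imbalance_kw(unbalanced):.1f} kW")
print(pdu_alerts(balanced), f"imbalance {imbalance_kw(balanced):.1f} kW")
```

The rack total is identical in both cases; only the uneven split pushes one PDU past its threshold, which is why the problem appears under production load rather than in QA.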
4. Monitor power and cooling capacity.
Increasing power capacity or cooling capacity often has a lead time, so you need to know exactly what your current capacity is and when you will run out. Leverage a DCIM solution with zero-configuration dashboard widgets such as capacity gauges by site for an accurate view of where you are. Configure your gauges with your own red/yellow/green thresholds to easily see if you have capacity at each site, need additional capacity, or have run out of capacity anywhere.
“Planning when you need to add power or need to add cooling, and the correlation between the two—one watt being 3.41 BTUs—is something we look at and are able to understand and better utilize our space and power, and more efficiently take advantage of those wonderful assets,” says Piers. “This tool definitely helps us understand where we are and where we’re going.”
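The power-to-cooling relationship Piers mentions, along with threshold-based capacity gauges, can be sketched in a few lines. The conversion factors are standard (1 W is about 3.412 BTU per hour, and one ton of cooling is 12,000 BTU per hour); the 70% and 85% thresholds and the function names are assumptions for this example and would be set to your own values.

```python
BTU_PER_HOUR_PER_WATT = 3.412    # 1 W of IT load is about 3.412 BTU/h of heat
BTU_PER_HOUR_PER_TON = 12_000    # one ton of cooling


def heat_load_btu_per_hour(it_load_w):
    """Heat that must be removed for a given IT power load."""
    return it_load_w * BTU_PER_HOUR_PER_WATT


def cooling_tons(it_load_w):
    """Cooling required, in tons, for a given IT power load."""
    return heat_load_btu_per_hour(it_load_w) / BTU_PER_HOUR_PER_TON


def gauge_color(used, capacity, yellow_at=0.70, red_at=0.85):
    """Classify utilization against assumed yellow/red thresholds."""
    utilization = used / capacity
    if utilization >= red_at:
        return "red"
    if utilization >= yellow_at:
        return "yellow"
    return "green"


# Hypothetical site: 10 kW of IT load against 12 kW of power capacity.
print(f"{heat_load_btu_per_hour(10_000):.0f} BTU/h, {cooling_tons(10_000):.2f} tons")
print(gauge_color(used=10_000, capacity=12_000))
```

Because every added watt also adds heat, evaluating the power and cooling gauges together helps avoid filling one capacity while quietly exhausting the other.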
5. Track all assets across all sites.
Most organizations with edge sites have complex deployments involving multiple locations and business applications. Maintaining an accurate inventory of all equipment across every site requires real-time views of cabinet contents, infrastructure devices, and cabling. With DCIM software, you can look at multiple cabinets from separate locations side-by-side and see high-fidelity front and back images of what’s in each cabinet. Automatically drawn rack elevations can be shared with coworkers so they see the same view as you, subject to role-based access control.
“Prior to starting on DCIM, everything goes into a database. But, the thing about databases is that databases are largely populated by people, which tend to be the weak link in keeping track of these things. With hundreds of thousands of assets, it’s a lot to keep track of,” says Garito. “In addition to that, a lot of our projects span multiple racks so you might have a cluster of equipment in adjacent racks or a row or two apart, and those clusters might feed back up to an aggregation layer in yet another rack. Being able to quickly and visually lay that information out dramatically improves both planning and troubleshooting.”
BONUS: Remotely visualize your edge sites.
Modern edge infrastructure management software will provide remote visualization that is better than being there. You can see any site and get a true floorplan view as if you were there, and you can also bring in information such as available RUs, front temperature, and measured amps to quickly see the health of a site and where you can deploy equipment. You can also isolate a row and overlay data like actual power load, budget load, and environmental sensor data. You can even look above the cabinets and under the floor.
With remote visualization, you can also turn data into easy-to-understand information with color-coding. For example, color-code your cabinets and contents by any asset attribute such as customer and see who owns which cabinets and which devices. Knowing what you have, who owns it, and the SLAs on the equipment is key to effectively managing your edge sites.
“For troubleshooting, being able to quickly identify everything visually is often key,” says Garito. “Because we have so many different devices, different server models, different variations of server models, and we track all this information, being able to rapidly identify, not just prepare a list, but identify where in space all the servers of a certain model that was deemed to have defective parts… and pull them out of production before an issue occurs.”
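The lookup Garito describes, finding where every server of a suspect model physically sits, might look something like the following sketch. The `Asset` record, its fields, the sample inventory, and `locate_model` are all hypothetical and are not modeled on any particular DCIM schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical inventory record for one device."""
    name: str
    model: str
    site: str
    cabinet: str
    ru: int  # rack unit position within the cabinet


def locate_model(assets, model):
    """Group devices of the given model by (site, cabinet), sorted by RU."""
    found = defaultdict(list)
    for asset in assets:
        if asset.model == model:
            found[(asset.site, asset.cabinet)].append((asset.ru, asset.name))
    return {location: sorted(items) for location, items in found.items()}


# Made-up inventory spanning two sites.
inventory = [
    Asset("web-01", "SRV-X1", "edge-nyc", "A01", 12),
    Asset("web-02", "SRV-X1", "edge-nyc", "A01", 10),
    Asset("db-01", "SRV-Z9", "edge-nyc", "A02", 20),
    Asset("web-03", "SRV-X1", "edge-chi", "B07", 5),
]

for (site, cabinet), devices in locate_model(inventory, "SRV-X1").items():
    print(site, cabinet, devices)
```

Grouping by site and cabinet rather than returning a flat list mirrors the point of the quote: the useful answer is where the affected devices are, so they can be pulled from production.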
Bringing It All Together
Edge infrastructure management can be a challenge due to a lack of on-site personnel and no visibility into what’s happening, but with DCIM software, you can simplify edge infrastructure management and improve uptime, efficiency, and productivity.
Whether you have just learned about DCIM software or are already a veteran user, these best practices and tips from the industry’s best edge data center professionals can help you overcome the challenges of managing edge sites. Your new knowledge, when paired with DCIM software, will drive smarter, more effective edge infrastructure management.