If you’re a data center provider, it goes without saying that your operations team has two fundamental responsibilities:
- To monitor and manage the physical data center infrastructure — servers, storage, and networking.
- To monitor and manage the building infrastructure, including power and cooling technologies.
The ultimate goal of both responsibilities is efficiency optimization in the broadest sense. After all, operations teams work to maximize cooling and rack space utilization, ensure uptime, enhance application performance, and improve all the other metrics you’d expect in the data center.
And once upon a time, they drove efficiencies using the principles and tools that fell under an umbrella called data center infrastructure management, or DCIM. Our friends at TechTarget say:
> Data center infrastructure management (DCIM) is the convergence of IT and building facilities functions within an organization. The goal of a DCIM initiative is to provide administrators with a holistic view of a data center’s performance so that energy, equipment, and floor space are used as efficiently as possible.
DCIM makes sense, intuitively: it centralizes insight and control over everything in the data center. Energy, equipment, and floor space all cost money, so using them efficiently is of paramount importance. By managing them carefully, organizations can reduce operational costs and, hopefully, keep tenants happy.
DCIM seems like it ought to be a universal concept and an ongoing area of innovation. But funnily enough, you don’t hear much about DCIM anymore; even the TechTarget article was last updated in 2013. In a sense, it feels like a defunct concept.
Why is that? After all, we still need data center performance, efficiency, and reliability enhancements.
So why has DCIM apparently gone away?
The reality in today’s data centers, especially colocation facilities, is that the facility is so complex, the IT infrastructure in it is so complex, and the workloads so variable that the old effort to unify command and control of both building functions and IT operations simply can’t provide compelling value. It’s just too hard for operations teams to handle the constantly shifting demands of today’s mix of legacy and leading-edge infrastructure from a single tool. Most organizations instead run a collection of tools, some provided by infrastructure manufacturers, others spanning multiple vendors, and the promise of DCIM has been left by the wayside.
Some software vendors have tried to help. They’ll tell you that their analytics-based management solutions give operations teams the tools they need to succeed. But at best, today’s management software (sometimes still called DCIM) gives you a look at the now. It isn’t predictive: it doesn’t offer a view of future problems, it doesn’t pick up on patterns, and it doesn’t make recommendations for future enhancements.
To make matters worse, as data centers grow and efficiency matters more and more, even the best analytics and dashboards can’t overcome the limits of human intervention. Human response times and capacity to multitask are finite, human error introduces substantial problems, and sourcing the right skill sets, in the right locations, at the right time, for the right cost is a challenge most organizations struggle to solve.
So we live in a world today where even the best infrastructure management tools are limited in their scope, their ability to provide future insight and reduce failures, and the value they bring to data center operations.
That’s why, at Nautilus, we see the need for something new. Whether you call it autonomous management, a new kind of DCIM, or simply the autonomous data center, the ground rules are the same. As a data center operator and innovator, we think there’s a real need for a next-generation approach to data center management, one that could be, and should be:
- Powered by a facility-wide AI/ML platform built on predictive analytics.
- Tied into everything, from virtual machines and containers to chillers and power distribution, through APIs and ubiquitous sensor data.
- Able to learn not only from sensor data but also from habitual user interventions.
- Gaining insight from other, similarly equipped data centers.
- Delivering command and control across all data center functions, from power and cooling to server performance and network bandwidth.
- Based on open source technologies to facilitate widespread adoption.
- Self-optimizing and self-healing (a minimal sketch of such a control loop follows this list).
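To make the idea concrete, here’s a minimal sketch, in Python, of the kind of inner loop such a platform might run: poll telemetry, forecast where a metric is heading, and act before a threshold is crossed. Everything here is an assumption for illustration; the sensor and control functions stand in for real facility and IT APIs, and the EWMA forecaster is a deliberately simple stand-in for a real predictive model.

```python
# A minimal sketch of an autonomous control loop. All sensor and control
# functions are hypothetical placeholders for real facility/IT APIs.
import random


class EwmaForecaster:
    """One-step-ahead forecast via an exponentially weighted moving average."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.level = None  # current smoothed estimate

    def update(self, value: float) -> float:
        # Blend the new reading into the running estimate; the estimate
        # doubles as the forecast for the next interval.
        self.level = value if self.level is None else (
            self.alpha * value + (1 - self.alpha) * self.level
        )
        return self.level


def read_rack_inlet_temp_c(rack_id: str) -> float:
    # Placeholder for a real sensor read (BMS, SNMP, or a vendor API).
    return 24.0 + random.gauss(0.0, 1.5)


def request_additional_cooling(rack_id: str) -> None:
    # Placeholder for a facility-control command; a real platform would
    # call the cooling plant's API and log the action for operators.
    print(f"[action] stepping up cooling near {rack_id}")


TEMP_LIMIT_C = 27.0  # illustrative inlet ceiling, per ASHRAE-style guidance


def control_loop(rack_id: str, cycles: int = 20) -> None:
    forecaster = EwmaForecaster()
    for _ in range(cycles):
        forecast = forecaster.update(read_rack_inlet_temp_c(rack_id))
        # Act on where the trend is heading, not just the latest reading.
        if forecast > TEMP_LIMIT_C:
            request_additional_cooling(rack_id)


if __name__ == "__main__":
    control_loop("rack-A12")
```

The specific model doesn’t matter; what matters is that the loop acts on forecasts rather than on current readings, which is precisely what today’s dashboards don’t do.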
Let’s imagine for a moment that data center providers had a centralized management platform based on these principles. To put it another way: with autonomous data centers, what could happen?
In today’s data centers, outages are most often caused by human intervention and error. With autonomous, self-healing technology, data centers can shorten mean time to repair, reducing downtime.
The autonomous data center could identify patterns before there’s a problem. It could recognize, for example, rising demand on servers from e-commerce applications during the holidays, and compensate by activating dormant servers, along with the chillers to cool them, to sustain application performance.
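To illustrate, here’s a hedged sketch of that kind of seasonal pattern-matching, assuming hourly load history, a weekly demand cycle, and a hypothetical out-of-band power-on call; every constant, threshold, and function name is illustrative rather than a real product API.

```python
# A sketch of predictive, seasonal capacity planning. All numbers,
# thresholds, and the power-on call are illustrative assumptions.
import math

HOURS_PER_WEEK = 7 * 24
SERVER_CAPACITY_RPS = 500.0  # assumed load one server can absorb
ACTIVE_SERVERS = 6           # assumed currently powered-on fleet


def seasonal_forecast(history: list[float], horizon_h: int) -> float:
    """Seasonal-naive forecast: the coming hour looks like the same hour
    last week, scaled by week-over-week growth (e.g., a holiday ramp).
    Assumes at least two weeks of history and horizon_h < HOURS_PER_WEEK."""
    last_week = history[-HOURS_PER_WEEK:]
    prior_week = history[-2 * HOURS_PER_WEEK:-HOURS_PER_WEEK]
    growth = (sum(last_week) / len(last_week)) / (sum(prior_week) / len(prior_week))
    same_hour_last_week = history[-HOURS_PER_WEEK + horizon_h]
    return same_hour_last_week * growth


def wake_dormant_servers(count: int) -> None:
    # Placeholder for an out-of-band power-on (e.g., via IPMI or Redfish),
    # which would also cue the cooling plant to the coming heat load.
    print(f"[action] powering on {count} dormant servers ahead of demand")


def plan_capacity(history: list[float], horizon_h: int = 24) -> None:
    forecast_rps = seasonal_forecast(history, horizon_h)
    servers_needed = math.ceil(forecast_rps / SERVER_CAPACITY_RPS)
    if servers_needed > ACTIVE_SERVERS:
        wake_dormant_servers(servers_needed - ACTIVE_SERVERS)


if __name__ == "__main__":
    # Two weeks of synthetic hourly load with a 15% week-over-week ramp.
    week1 = [3000.0 + 2000.0 * (h % 24 > 8) for h in range(HOURS_PER_WEEK)]
    week2 = [v * 1.15 for v in week1]
    plan_capacity(week1 + week2, horizon_h=36)
```

A production system would use a richer model and would check capacity actions against cooling and power headroom, but the shape of the decision (forecast, compare to capacity, pre-activate) stays the same.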
This capability opens up substantial opportunities for data center providers who wish to build data centers in areas that are difficult to service, in edge deployments, and in unusual environments like the Space Station.
It would also offer company-wide and society-wide advantages by reducing power consumption. Unnecessary infrastructure could be automatically powered down, systems could turn on and off in response to environmental changes, and new deployments could be automatically optimized before they ever go into a rack.
However, there’s a problem.
We know of NO organizations that are thinking along these lines. We’re familiar with the state of the art in the industry, and so far, no vendor can deliver on these concepts. That’s frustrating, because server vendors ARE doing this kind of work, and network vendors have built self-healing capabilities into their hardware and software, but nobody is applying that thinking at the data center level.
But the potential is there. The value is obvious. And the opportunity is one that’s hard to pass up.