Lessons from 400,000 Unit Hours of Data Center Liquid Cooling 

Don’t Wait to Learn the Hard Way 

The rise of AI has compressed years of infrastructure evolution into months—and cooling has become the bottleneck. Many operators are facing their first data center liquid cooling deployments under pressure, without the luxury of trial and error. At Nautilus, we’ve operated a 100% liquid-cooled AI facility for over four years. That’s more than 400,000 unit hours of hard-earned operational insight that has shaped our cooling products and the way we help customers implement and maintain cooling infrastructure. This isn’t a sales pitch. It’s a field guide. These are the patterns, oversights, and adjustments that only show up in real deployments—with real loads, tight SLAs, and customers who don’t care about your learning curve.

We’ve distilled what we’ve learned into five areas that matter most to operators. 

1. Operational Experience Isn’t Optional—It’s Architecture 

Most systems look fine until something goes wrong. What matters is how they behave under failure conditions, during unexpected workloads, or when someone makes a mistake. We’ve seen it all and design accordingly. 

Our first generation of CDUs taught us what not to do with buffer tanks under transient gas loads. We didn’t predict how often customers would introduce air into the system—sometimes accidentally, sometimes by skipping protocol. So we optimized the air removal system in our CDUs: not to perform in ideal conditions, but to hold up when the field throws the unexpected. Always expect the unexpected.

The biggest gap we see in the market isn’t just capacity; it’s also experience. And that gap shows up in the things no one thinks to ask until it’s too late.

2. Data Center Liquid Cooling Is Infrastructure. Treat It That Way.

A decade ago, cooling systems were designed as extensions of the IT rack. In-row CDUs, rear-door heat exchangers, modular units tucked into white space—these were logical choices for 5–10kW racks. But AI-scale loads have flipped that logic on its head. 

The moment you have 50kW+ racks, cooling stops being an IT accessory and becomes a core component of the mechanical plant. Yet too many operators still treat it as tenant-specified gear, subject to preference rather than integrated design. 

Here’s what we’ve learned: 

  • Liquid cooling should be centralized. Not just to recover space in the white floor, but to improve serviceability, redundancy, and flow balancing. 
  • You need to route your TCS (technology cooling system) the way you route your power: with clear points of demarcation, monitored reliability, and failover logic (a minimal supervision sketch follows this list). 
  • The shift from “in-rack” to “in-facility” requires more than a new product—it requires rethinking roles and processes. Your customer’s IT tech should be tightly integrated with your mechanical system. Are your SOPs ready for that? 
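
To make the “route it like power” point concrete, here is a minimal supervisory sketch for a single demarcation point. The sensor fields, alarm limits, and failover response are all illustrative assumptions, not values from our designs or any particular BMS.

```python
# Minimal sketch of supervisory logic at a TCS demarcation point.
# All field names, thresholds, and the failover response are hypothetical;
# real setpoints come from the design package and the CDU vendor.

from dataclasses import dataclass

# Hypothetical alarm limits, for illustration only.
SUPPLY_TEMP_MAX_C = 32.0   # facility supply temperature limit
DELTA_P_MIN_KPA = 50.0     # minimum differential pressure at the demarcation point
FLOW_MIN_LPM = 200.0       # minimum loop flow


@dataclass
class LoopReading:
    supply_temp_c: float
    delta_p_kpa: float
    flow_lpm: float


def evaluate(reading: LoopReading) -> list[str]:
    """Return alarm descriptions; an empty list means the loop is within limits."""
    alarms = []
    if reading.supply_temp_c > SUPPLY_TEMP_MAX_C:
        alarms.append("supply temperature above limit")
    if reading.delta_p_kpa < DELTA_P_MIN_KPA:
        alarms.append("differential pressure below limit")
    if reading.flow_lpm < FLOW_MIN_LPM:
        alarms.append("loop flow below limit")
    return alarms


def supervise(reading: LoopReading) -> str:
    """Stay on the primary loop, or signal a failover to the standby pump group."""
    alarms = evaluate(reading)
    if alarms:
        # A real plant would command the standby CDU/pump group via the BMS;
        # here we only report the decision.
        return "FAILOVER: " + "; ".join(alarms)
    return "OK: primary loop within limits"


if __name__ == "__main__":
    print(supervise(LoopReading(supply_temp_c=30.5, delta_p_kpa=75.0, flow_lpm=420.0)))
    print(supervise(LoopReading(supply_temp_c=33.2, delta_p_kpa=41.0, flow_lpm=420.0)))
```

The point isn’t the code. It’s that every limit like these has to be owned by someone, alarmed somewhere, and rehearsed in an SOP before the first rack lands.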

We didn’t get this right on the first try. But 400,000 unit hours of operations taught us what worked—and what caused chaos. That’s why, whether we’re talking about infrastructure or CDUs, we approach the cooling conversation differently: we’ve seen firsthand what works for efficiency, risk mitigation, and scalability.

3. You’re Not Deploying CDUs. You’re Deploying Systems. 

One of the biggest traps we see in this industry is the belief that a CDU is a standalone product. It’s not. It’s a node in a system. 

Operators need to think in loops, not boxes. Pressure differentials, flow balancing, thermal cycling, pump curve tuning, degassing logic—these aren’t afterthoughts. They’re core to stability. And too often, they’re left to last-minute commissioning. 
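
As one example of thinking in loops: a pump’s duty point isn’t a nameplate number. It’s the intersection of the pump curve and the loop’s resistance curve, and it moves whenever either curve changes. The sketch below solves that intersection with made-up coefficients; it is not a model of any real CDU or loop.

```python
# Illustrative duty-point calculation: where a pump curve meets a loop curve.
# The coefficients are placeholders; real curves come from the pump datasheet
# and from measured loop pressure drops during commissioning.

import math

H0 = 30.0        # hypothetical shutoff head, m
A = 0.002        # pump curve steepness, m per (m^3/h)^2 -> H_pump(Q) = H0 - A*Q^2
H_STATIC = 2.0   # static head, m
K = 0.004        # loop resistance, m per (m^3/h)^2 -> H_loop(Q) = H_STATIC + K*Q^2


def duty_point(h0: float, a: float, h_static: float, k: float) -> tuple[float, float]:
    """Solve H_pump(Q) = H_loop(Q) for flow (m^3/h) and head (m)."""
    q = math.sqrt((h0 - h_static) / (a + k))
    return q, h_static + k * q * q


q1, h1 = duty_point(H0, A, H_STATIC, K)
print(f"Full speed: {q1:.0f} m^3/h at {h1:.1f} m")

# Affinity laws: at speed ratio s, shutoff head scales with s^2 while the curve
# keeps its shape, so the reduced-speed pump curve becomes s^2*H0 - A*Q^2.
s = 0.9
q2, h2 = duty_point(s * s * H0, A, H_STATIC, K)
print(f"At {s:.0%} speed: {q2:.0f} m^3/h at {h2:.1f} m")
```

Degassing and thermal cycling shift the loop curve over time, which is exactly why these numbers belong in the operating plan rather than in last-minute commissioning.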

We’ve rewritten startup playbooks multiple times because our assumptions about field behavior didn’t match reality. We now train operators with engineers who’ve managed hundreds of startup hours—because good design still breaks without informed operation. 

4. Retrofit Complexity Is the Rule, Not the Exception 

No two facilities are alike. And very few are greenfield. 

We’ve seen operators try to “value-engineer” liquid cooling systems into air-optimized spaces—only to end up rebuilding them later. If retrofitting is the best path forward for your situation, it’s critical to understand the constraints and approach the work with experienced design discipline and a clear strategy.

At Start Campus, we installed CDUs into a facility with no mechanical alley, overhead routing only, and an aggressive timeline. We pulled it off because we’d failed before, in smaller ways, in our own facilities. Call it in-production R&D. Every layout now starts with the same questions: What happens if the raised floor is gone? What if the CDU is 100 feet from the load? What if the customer specs a pressure delta we didn’t plan for?

If your design doesn’t include these variables up front, you’re not ready for deployment. 
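
One of those questions, the 100-foot run between CDU and load, can at least be sanity-checked on day one with a rough hydraulic estimate. The pipe size, flow, friction factor, and pressure budget below are assumptions chosen only to show the arithmetic; a real retrofit needs the loop’s full hydraulic model.

```python
# Back-of-envelope check: does a long CDU-to-load run fit the pressure budget?
# Every number here is an illustrative assumption, not a design value.

import math

FLOW_M3_PER_H = 40.0       # assumed coolant flow to the row
PIPE_ID_M = 0.08           # assumed 80 mm inner diameter
ONE_WAY_M = 30.0           # ~100 ft from CDU to load
ROUND_TRIP_M = 2 * ONE_WAY_M
RHO = 1000.0               # water-like coolant density, kg/m^3
FRICTION_FACTOR = 0.02     # assumed Darcy friction factor (turbulent, smooth pipe)
FITTINGS_ALLOWANCE = 1.3   # crude multiplier for elbows, valves, quick disconnects
BUDGET_KPA = 80.0          # hypothetical pressure delta left for distribution piping

# Darcy-Weisbach: dP = f * (L/D) * (rho * v^2 / 2)
area_m2 = math.pi * (PIPE_ID_M / 2) ** 2
velocity = (FLOW_M3_PER_H / 3600.0) / area_m2
dp_kpa = FRICTION_FACTOR * (ROUND_TRIP_M / PIPE_ID_M) * RHO * velocity**2 / 2 / 1000.0
dp_kpa *= FITTINGS_ALLOWANCE

print(f"Velocity: {velocity:.2f} m/s, estimated piping loss: {dp_kpa:.1f} kPa")
print("Fits the budget" if dp_kpa <= BUDGET_KPA
      else "Over budget: revisit pipe size, routing, or pump head")
```

If an estimate like this lands anywhere near the budget, the routing question belongs in the design review, not in commissioning week.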

5. Support Can’t Be a Phone Number. It Has to Be a Framework. 

Everyone says they offer support. But what happens when a customer needs to hot-tap a live loop? When they introduce glycol unexpectedly? When an onsite tech opens the wrong valve? 

Our support model includes: 

  • Design engagement early in the project 
  • Onsite commissioning by engineers who’ve run our own data center 
  • Operational training built from actual failure scenarios 
  • L3 escalation paths staffed by engineers—not customer service reps 

We’re now building formal certifications, knowledge transfer programs, and third-party field training networks. Because the only way to scale expertise is to codify it. 

We’re just getting started. 

Because 400,000 hours wasn’t just time spent. It was time earned. It’s invaluable experience that has driven our EcoCore product line and service offerings. AI is rapidly changing how we cool data centers, and we’re here for it.
