Why AI and Liquid Cooling Go Hand in Hand


“Our next-generation data center design will support our current products while enabling future generations of AI hardware for both training and inference. This new data center will be an AI-optimized design, supporting liquid-cooled AI hardware” — Meta announcement, May 18, 2023

We’ve all heard about AI taking over the world, in a thousand different ways. Whether we’re talking about artificial narrow intelligence, machine learning, or the latest source of excitement, generative AI, it’s clear that we’re now living in a world where AI is becoming a prominent part of our day to day lives. 

  • An overwhelming majority (95%) of companies now integrate AI-powered predictive analytics into their marketing strategy, including 44% who say it is fully integrated.
  • 43% of college students say they have experience using AI tools like ChatGPT. 
  • Autonomous artificial intelligence is being widely used in manufacturing and transport to accelerate production and distribution.

As AI adoption accelerates and new use cases emerge, those of us who design, build, and operate the underlying digital infrastructure are presented with new challenges.

What is it about AI that demands a different infrastructure?

Two issues come to mind. The first is architecture.

AI training infrastructure is very unlike cloud infrastructure, because AI training is a synchronous job — all elements in the cluster are working together, in concert, at high intensity, on a single service. The size of an AI training cluster is enormous — at least 5,000 servers, and probably more than 10,000 servers, each server with several CPUs and GPUs.

Since training is synchronous, the backend network that connects all the servers in a cluster is critically important. Issues with backend bandwidth and latency, whether caused by distance or a component failure, can seriously degrade the performance of the whole cluster. So to maximize performance, AI infrastructure designers want to keep the servers as physically close to one another as possible, because long interconnections add latency and are more prone to failure.

In other words, they want density. They want as many servers as possible in a rack. However, our industry, built to support enterprise and cloud workloads, isn’t well suited to deliver the density they demand. The Uptime Institute reports that 75% of operators don’t support more than 20kW per rack.

AI training servers can consume up to 4kW…each. 

Five servers per rack = 20kW per rack. 10,000 servers / 5 servers per rack = 2,000 racks.
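
To make that arithmetic concrete, here is a minimal sketch of the rack math in Python, using the illustrative figures above (a 4kW server and a 20kW-per-rack limit); it’s a back-of-the-envelope check, not a sizing tool.

```python
# Rough rack-count math for an AI training cluster, using the figures above.
# Illustrative only: real sizing depends on server SKUs, networking, and overhead.

SERVER_POWER_KW = 4        # an AI training server can draw up to ~4 kW
RACK_LIMIT_KW = 20         # what ~75% of operators can support per rack today
CLUSTER_SERVERS = 10_000   # size of a large training cluster

servers_per_rack = RACK_LIMIT_KW // SERVER_POWER_KW   # 20 / 4 = 5 servers
racks_needed = CLUSTER_SERVERS // servers_per_rack    # 10,000 / 5 = 2,000 racks

print(f"Servers per rack: {servers_per_rack}")    # 5
print(f"Racks needed:     {racks_needed:,}")      # 2,000
```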

That’s too many racks, spread too far apart, and it isn’t good enough. Operators have to deliver greater densities to make AI work. At a bare minimum, designers and operators will need to handle more power distribution, higher utilization, and more heat generation and removal in each rack to cope with the densities required.

And that leads us to the second problem. Energy…and heat.


According to Digital Information World, “AI model training in data centers may consume up to three times the energy of regular cloud workloads, putting a strain on infrastructure.”

This isn’t a trivial problem. If AI consumes three times the power of cloud, it produces three times the heat — and today’s data centers, almost all air cooled, simply aren’t designed for the new levels of power distribution and heat removal we’ll need to support widespread AI adoption.


Right now, according to the stat above, at least 75% of operators are nowhere near ready to support the density, energy, and heat demands of AI. They simply cannot do the job with today’s designs. The industry is waking up to the possibility that thousands of data centers are essentially obsolete today.


But there are two other factors that will make matters even worse.

  1. Generative AI models are rapidly becoming multimodal, training on, using, and generating video, audio, speech, images, and text simultaneously. As we move past text-only generative AI to models that can absorb, integrate, and generate rich content (music, audio, and video), we’ll see an exponential increase in the need for more servers, more GPUs, more storage capacity, and more bandwidth.
  2. We know that tomorrow’s servers will run hotter than today’s. General-purpose processors will consume more power, each chassis will carry more specialized silicon, and we’ll see designs pushing 6kW, 8kW, and beyond (see the sketch after this list).
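
Extending the same back-of-the-envelope math, here’s a rough sketch of how quickly a 20kW rack cap breaks down as per-server power climbs; the 4kW, 6kW, and 8kW figures are the illustrative numbers from above, not vendor specifications.

```python
# How many servers fit under a 20 kW rack cap as per-server power climbs?
# Back-of-the-envelope only, using the illustrative figures from the list above.

RACK_LIMIT_KW = 20
CLUSTER_SERVERS = 10_000

for server_kw in (4, 6, 8):
    per_rack = RACK_LIMIT_KW // server_kw      # servers that fit under the cap
    racks = -(-CLUSTER_SERVERS // per_rack)    # ceiling division: racks for the whole cluster
    print(f"{server_kw} kW servers -> {per_rack} per rack, {racks:,} racks for 10,000 servers")
```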

So what we have is a perfect storm. More demand for AI, more demanding AI, more density, more power, more heat…and more problems.

And our data center operators, struggling to support 20kW per rack, are being approached by enterprises, governments, and hyperscalers, and asked “Can you support densities of 80kW per rack?”

If the answer is no…the operators lose out on multi-million dollar sales.

What can the industry do to cope?

Liquid cooling is the only way forward. And it’s not just Meta saying so.

The server manufacturers see the need for liquid cooling. Charles Liang, CEO of Supermicro, recently wrote that “We hope and anticipate that 20 percent or more of worldwide datacenters will need to and will move to liquid cooling in the next several years to efficiently cool datacenters that use the latest AI server technology.”

And leading-edge data center providers see the need too. Equinix is already using liquid cooling for production workloads, and specialist data center companies, including Nautilus, are showing the world what can be done.

Today’s liquid cooling technologies, whether direct-to-chip or immersion cooling, are the only feasible way to address the density, energy, heat, and growth demands of AI. Liquid cooling improves:

  1. Density. Today’s solutions can support as much as 80kW per rack, allowing AI infrastructure designers to support more servers per rack and shorter interconnects for each server in the cluster.
  2. Cooling system energy utilization. Water is roughly 23 times more efficient than air at transporting heat, which makes heat rejection far more energy efficient; the energy saved can be reallocated to additional servers or other hardware (a rough comparison follows this list).
  3. Reliability. Liquid cooling reduces or eliminates hot spots in data centers, a principal cause of hardware failure. Failing components are a serious problem in AI training, so reduced failure rates are a substantial advantage.
  4. Footprint. Instead of needing 2,000 racks for an AI cluster, liquid cooling might cut the footprint to 500 racks. That reduction in floor space has implications for site selection and overall data center utilization.
  5. Sustainability. Smaller sites, better energy efficiency, and reduced failure rates all contribute to sustainability.
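
Two of the claims above lend themselves to a quick sanity check. Below is a minimal sketch, assuming the 23x figure refers to the thermal conductivity of water versus air at roughly room temperature, and reusing the earlier cluster math; the property values are approximate textbook numbers.

```python
# Quick sanity checks on two of the claims above.
# Property values are approximate room-temperature textbook figures.

# 1. Heat transport: thermal conductivity of water vs. air.
K_WATER = 0.60   # W/(m·K), approximate
K_AIR = 0.026    # W/(m·K), approximate
print(f"Water vs. air thermal conductivity: ~{K_WATER / K_AIR:.0f}x")   # ~23x

# 2. Footprint: racks needed for a 10,000-server cluster of 4 kW servers.
CLUSTER_SERVERS = 10_000
SERVER_POWER_KW = 4
for rack_limit_kw in (20, 80):   # typical air-cooled cap vs. a liquid-cooled rack
    racks = CLUSTER_SERVERS // (rack_limit_kw // SERVER_POWER_KW)
    print(f"{rack_limit_kw} kW racks -> {racks:,} racks")   # 2,000 vs. 500
```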


It’s indisputable: liquid cooling is better for AI. And though companies like Meta, Supermicro, and others have the engineering expertise to design their own liquid-cooled data centers, not everyone does. Many enterprises and governments need a trusted partner to help them navigate the journey from air cooling to liquid cooling.

That’s where Nautilus comes in.

Our data center designs are liquid-cooled from the ground up. Our cooling system can use any source of water, freshwater or marine, to remove heat from the data center. We support all forms of liquid cooling inside the data center and have partnerships with a number of industry leaders. And because we can build liquid-cooled data centers around the world rapidly, and more efficiently than others, we can help organizations move faster into the world of AI.

