The AI factory data center age is here, and it brings new challenges and opportunities. The Intel x86-based server era for enterprise and cloud computing spanned roughly three decades with only modest changes in power draw. GPU-based accelerated compute in AI factories is only beginning, but one thing is certain: every year brings new challenges, because GPUs evolve on a one-year cycle and power draw roughly doubles with each new generation.
As thermal demands grow, liquid cooling is no longer a choice but a necessity. Liquid cooling unlocks performance and sustainability benefits while enabling the move to rapidly rising kilowatts per rack. However, GPUs are evolving so quickly, and drawing so much power, that they are straining today’s power and cooling solutions, and new innovations are needed to keep pace.
Why AI factory cooling is fundamentally different
Let’s start with the fundamental differences between a legacy cloud data center and an AI factory when it comes to cooling.
- AI factories process massive amounts of data simultaneously and output tokens, whereas cloud data centers handle general-purpose processing and storage plus transactional workloads such as ERP and CRM systems.
- AI factories are filled with GPU-based accelerated compute AI servers, whereas cloud data centers use general-purpose CPU servers.
- In a cloud data center, the white space for compute dominates the real estate; in an AI factory, the grey space (power and cooling infrastructure) dominates the physical footprint.
- Power densities in a cloud data center are around 5 to 20kW per IT rack, while the latest AI factory racks reach 227kW per IT rack, a figure that is roughly doubling every year and expected to exceed 1MW per rack in two to three years.
- Cloud data centers are primarily air cooled, while AI factories are becoming primarily liquid cooled, a shift under way since densities rose above 75kW per IT rack.
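To make the doubling trend concrete, here is a minimal Python sketch that projects rack density forward, assuming density keeps doubling each year from the 227kW figure. The helper `project_rack_density_kw` and its growth parameter are illustrative assumptions, not a vendor roadmap.

```python
# Hypothetical helper: projects rack density assuming a fixed yearly growth
# factor (2.0 = doubling, the article's rough trend). Not a vendor roadmap.


def project_rack_density_kw(start_kw: float, years: int,
                            growth_per_year: float = 2.0) -> list[float]:
    """Return projected kW per rack for year 0 through `years`."""
    return [start_kw * growth_per_year ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Start from the 227 kW per IT rack figure quoted above.
    for year, kw in enumerate(project_rack_density_kw(227.0, 3)):
        print(f"year +{year}: {kw:,.0f} kW per rack")
```

Two doublings give about 908kW per rack and three give about 1.8MW, bracketing the expectation of over 1MW within two to three years.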
Density rising every year makes future-proofing tough
In cloud data centers, most buildings’ facility-level power and cooling systems could accommodate three to five IT refresh cycles, with IT refreshed every three to seven years. To allow for this, the chillers and heat rejection may have had to be oversized, usually by 20 to 50 percent.
AI factories, on the other hand, would require roughly 100 percent upsizing of the chillers and heat rejection to accommodate just one IT refresh cycle. For example, Blackwell GB200 NVL72 racks deployed in 2025 run at 132kW per IT rack; upgrading them to Vera Rubin NVL72 would require up to 227kW per IT rack.
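The upsizing gap can be checked with a short calculation. This Python sketch compares the headroom needed for the GB200 NVL72 to Vera Rubin NVL72 step (132kW to 227kW per rack) against the 20 to 50 percent oversizing typical of cloud-era plants. The helper names are hypothetical, and it assumes heat rejection scales linearly with rack power.

```python
# Hypothetical helpers comparing the heat rejection headroom a rack upgrade
# needs against the oversizing that was typical for cloud-era plants.

LEGACY_OVERSIZING = (0.20, 0.50)  # 20-50 percent, per the text above


def required_headroom(current_kw: float, next_kw: float) -> float:
    """Fractional increase in heat rejection needed to go from current_kw to next_kw per rack."""
    return next_kw / current_kw - 1.0


def fits_in_plant(current_kw: float, next_kw: float, oversizing: float) -> bool:
    """True if a plant oversized by `oversizing` (0.5 = 50%) can absorb next_kw per rack."""
    return current_kw * (1.0 + oversizing) >= next_kw


if __name__ == "__main__":
    gb200_kw, vera_rubin_kw = 132.0, 227.0
    print(f"headroom needed: {required_headroom(gb200_kw, vera_rubin_kw):.0%}")
    for oversizing in LEGACY_OVERSIZING:
        ok = fits_in_plant(gb200_kw, vera_rubin_kw, oversizing)
        print(f"{oversizing:.0%} oversized plant supports upgrade: {ok}")
```

The step needs about 72 percent more heat rejection per rack, and even a plant oversized by 50 percent would cover only about 198kW per rack.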
What cooling is used in AI factories?
A liquid-cooled AI factory introduces a new thermal architecture compared to air-cooled environments. The main elements in liquid-cooled accelerated compute include:
- Liquid-cooled servers, mainly direct-to-chip cooled
- In-rack manifolds that distribute coolant evenly to multiple servers via the IT fluid loop
- Cooling Distribution Units (CDUs) – the main interface between the IT fluid loop and the facility cooling loop, providing temperature control, flow control, pressure control, fluid treatment, heat exchange, and isolation
- A facility heat rejection system that rejects the heat from the entire data center to the outside, often via chillers or free coolers with compression assist via the facility cooling loop.
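The coolant flow that the manifolds and CDUs must move follows from the sensible-heat balance Q = ṁ·cp·ΔT. This Python sketch assumes plain water properties (no glycol), treats an entire 227kW rack as heat to the IT fluid loop, and uses illustrative temperature rises of 5K and 10K; all of these are assumptions rather than reference design values.

```python
# Sensible-heat balance for a liquid loop: Q = m_dot * cp * dT.
# Assumptions (hypothetical, for illustration): plain water properties,
# no glycol, and the whole rack load going into the liquid loop.

WATER_CP_KJ_PER_KG_K = 4.186   # approximate specific heat of water
WATER_DENSITY_KG_PER_L = 1.0   # simplification; varies slightly with temperature


def coolant_flow_lpm(heat_kw: float, delta_t_k: float,
                     cp_kj_per_kg_k: float = WATER_CP_KJ_PER_KG_K,
                     density_kg_per_l: float = WATER_DENSITY_KG_PER_L) -> float:
    """Flow in liters per minute needed to carry heat_kw with a delta_t_k temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_kw / (cp_kj_per_kg_k * delta_t_k)  # kW / (kJ/kg/K * K) = kg/s
    return mass_flow_kg_s / density_kg_per_l * 60.0


if __name__ == "__main__":
    for dt in (5.0, 10.0):
        print(f"227 kW rack, dT={dt:.0f} K: {coolant_flow_lpm(227.0, dt):.0f} L/min")
```

Halving the temperature rise from 10K to 5K doubles the required flow (from roughly 325 to 651 L/min), which is one reason temperature and flow set points are tuned together at the CDU.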
Hybrid cooling reality: Why AI factories still need air and liquid cooling
Not everything in an accelerated compute AI cluster is direct-to-chip liquid cooled. In our Vera Rubin reference design 113, the facility cooling design features a dual-path piping system. For air cooling, a chilled water loop integrates chillers with free cooling capabilities to deliver 68°F (20°C) chilled water to giant fan walls in an N+1 configuration.
This lower-temperature water loop handles the air cooling needs of the data center. The data hall holds up to 64 liquid-cooled IT racks at 227kW each, 24 air-cooled networking racks at 20kW each, and ten air-cooled networking racks at 40kW each. The liquid-cooled racks remove 96 percent of their heat via liquid, while the remaining four percent requires air. Roughly a third of the IT racks in the AI cluster (34 of 98) require air cooling.
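The heat split in this data hall can be tallied directly from the rack counts above. Here is a minimal Python sketch, assuming each liquid-cooled rack sends exactly 96 percent of its 227kW to liquid and the networking racks are fully air cooled; `RackGroup` and `heat_split_kw` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackGroup:
    count: int
    kw_per_rack: float
    liquid_fraction: float  # share of each rack's heat removed by liquid


# Rack mix from the reference design described above.
DATA_HALL = [
    RackGroup(64, 227.0, 0.96),  # liquid-cooled GPU racks (4% of heat to air)
    RackGroup(24, 20.0, 0.0),    # air-cooled networking racks
    RackGroup(10, 40.0, 0.0),    # air-cooled networking racks
]


def heat_split_kw(groups: list[RackGroup]) -> tuple[float, float]:
    """Return (heat to the liquid loop, heat to the air loop) in kW."""
    total = sum(g.count * g.kw_per_rack for g in groups)
    liquid = sum(g.count * g.kw_per_rack * g.liquid_fraction for g in groups)
    return liquid, total - liquid


if __name__ == "__main__":
    liquid_kw, air_kw = heat_split_kw(DATA_HALL)
    air_only = sum(g.count for g in DATA_HALL if g.liquid_fraction == 0.0)
    racks = sum(g.count for g in DATA_HALL)
    print(f"liquid: {liquid_kw:,.0f} kW, air: {air_kw:,.0f} kW")
    print(f"air-only racks: {air_only} of {racks} ({air_only / racks:.0%})")
```

This gives roughly 13,947kW to the liquid loop and 1,461kW to the air loop. About 581kW of that air load (64 × 227kW × 4 percent) comes from the liquid-cooled racks themselves, so the fan walls cannot be sized for the 34 networking racks alone.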
Outdoor environment influences heat rejection choices for water and electricity use
Choosing the best way to expel an AI factory’s heat outdoors depends heavily on the local temperature and humidity and on how much water and electricity you are willing to use. Because cooling is an architecture, heat rejection can be highly tailored to each site.
- Cooling towers: These rely on the highly efficient principles of evaporation and water circulation (most water use, lowest electricity).
- Dry coolers (free coolers): These use outdoor air as the primary heat rejection from fluids (like water or glycol), saving energy and water. They operate via a closed loop, using fans to force ambient air over finned coils, keeping the fluid completely isolated from the outside environment (zero water use, very low electricity).
- Free cooler with adiabatic assist: When outside ambient temperatures rise, a fine mist of water is sprayed over the intake air to cool it before it hits the cooling coils (water and electricity use is dependent on the environment).
- Free cooler with compression assist: Compression assist (mechanical cooling) is only used when the ambient temperature rises (water and electricity use is again dependent on the environment).
- Standard low temperature chiller: This uses a vapor-compression refrigeration cycle to produce chilled water for air cooling (low-to-zero water use, high electricity).
- Standard high temperature chiller: Works on the same principle as the low temperature chiller but supplies warmer water for liquid cooling, which can make it more efficient (low-to-zero water use, electricity use varies with temperature set points).
- Turbocor chiller: This uses oil-free, magnetic bearing compressors with variable frequency drives (low-to-zero water use, more efficient than standard chiller, electricity use varies with temperature set points).
Again, liquid cooling can be configured as a closed-loop system that uses little or no water.
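One way to make the trade-offs in the heat rejection list explicit is to encode them as data and filter on a site’s water constraint. This minimal Python sketch transcribes only the qualitative ratings from that list; the `Water` categories and the `shortlist` helper are hypothetical, and a real selection would also need climate data and cost analysis.

```python
from enum import Enum


class Water(Enum):
    ZERO = "zero"
    LOW_TO_ZERO = "low-to-zero"
    SITE_DEPENDENT = "depends on environment"
    MOST = "most"


# (water use, electricity use), transcribed qualitatively from the list above.
HEAT_REJECTION = {
    "cooling tower": (Water.MOST, "lowest"),
    "dry cooler": (Water.ZERO, "very low"),
    "free cooler + adiabatic assist": (Water.SITE_DEPENDENT, "depends on environment"),
    "free cooler + compression assist": (Water.SITE_DEPENDENT, "depends on environment"),
    "low temperature chiller": (Water.LOW_TO_ZERO, "high"),
    "high temperature chiller": (Water.LOW_TO_ZERO, "varies with set points"),
    "Turbocor chiller": (Water.LOW_TO_ZERO, "varies with set points; below standard chiller"),
}


def shortlist(allowed_water: set[Water]) -> list[str]:
    """Options whose water rating is in allowed_water, in the order listed above."""
    return [name for name, (water, _electricity) in HEAT_REJECTION.items()
            if water in allowed_water]


if __name__ == "__main__":
    for name in shortlist({Water.ZERO, Water.LOW_TO_ZERO}):
        print(name, "-", HEAT_REJECTION[name][1], "electricity")
```

Options rated "depends on environment" drop out of a strict filter even though they may use little water at a cold site, which is why climate data belongs in any real selection.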
Water temperatures and chillers
Chillers, which are used to chill water, are widely used in both air-based and liquid-based cooling, but they require a large amount of electricity to operate because of their mechanical compression.
The good news is that Rubin could be cooled with water at 45°C (around 113°F). However, that does not mean it always will be in an AI factory. In reality, data center operators prefer to run their GPUs at lower temperatures because of SLAs (service level agreements), risk tolerance, and long-term reliability.
Data center operators are starting to raise water temperatures for GPUs. As more data becomes available, 45°C operation may become the standard, which could lead to ‘chiller optional’ data centers.
Most water-cooled chillers have built-in waterside economizers, which use a heat exchanger and cool outside air (via a cooling tower or dry cooler) to chill the water, bypassing the chiller’s compressor to provide free cooling and significant energy savings.
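The economizer behavior, and why higher water temperatures matter, can be sketched as a simple mode check: full free cooling is possible when the outdoor temperature plus the equipment’s approach is at or below the supply set point, and partial free cooling when it is still below the return temperature. This Python sketch is a simplification; the 5K approach, the 30°C ambient, the 30°C/40°C set points, and the 55°C return paired with a 45°C supply are hypothetical values, with only the 45°C supply taken from the Rubin figure above.

```python
# Simplified integrated waterside economizer logic. The approach value and
# set points used below are hypothetical; real plants also account for heat
# exchanger performance, part load, and controls hysteresis.


def c_to_f(celsius: float) -> float:
    return celsius * 9.0 / 5.0 + 32.0


def economizer_mode(ambient_c: float, supply_setpoint_c: float,
                    return_c: float, approach_k: float = 5.0) -> str:
    """Classify the operating mode.

    ambient_c: dry-bulb for a dry cooler, wet-bulb for a cooling tower.
    approach_k: combined approach of the heat rejection device and heat exchanger.
    """
    coldest_free_water_c = ambient_c + approach_k
    if coldest_free_water_c <= supply_setpoint_c:
        return "free cooling"          # compressor can be bypassed
    if coldest_free_water_c < return_c:
        return "partial free cooling"  # economizer precools, compressor trims
    return "mechanical cooling"        # outside air too warm to help


if __name__ == "__main__":
    print(f"45 C supply = {c_to_f(45):.0f} F")
    for supply_c, return_c in ((30.0, 40.0), (45.0, 55.0)):
        mode = economizer_mode(30.0, supply_c, return_c)
        print(f"ambient 30 C, supply {supply_c:.0f} C: {mode}")
```

On the same 30°C day, raising the supply set point from a hypothetical 30°C to 45°C moves the plant from partial to full free cooling, which is the mechanism behind ‘chiller optional’ designs.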
Therefore, the only reason to use dry coolers alone is to lower initial CapEx, at the cost of losing the ability to cool the GPUs on unseasonably warm days. Even in a very cold climate such as Norway or Finland, it is cold enough most of the time, but not all of the time.
Green Mountain Data Centers in Norway (Rjukan site) reports 330 days of free cooling annually by using cold air, taking advantage of low ambient temperatures for much of the year. However, that still leaves 35 days when the data center needs mechanical cooling assist from chillers.
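The Rjukan numbers reduce to simple arithmetic, and the same logic applies to any site’s weather record. This Python sketch restates the reported 330 free cooling days and adds a day-counting helper; the sample week of temperatures and the 25°C free cooling limit are hypothetical.

```python
def mechanical_assist_days(free_cooling_days: int, days_in_year: int = 365) -> int:
    """Days per year on which free cooling alone is not enough."""
    return days_in_year - free_cooling_days


def free_cooling_fraction(free_cooling_days: int, days_in_year: int = 365) -> float:
    """Share of the year covered by free cooling alone."""
    return free_cooling_days / days_in_year


def count_days_above(daily_peak_ambient_c: list[float], free_cooling_limit_c: float) -> int:
    """Count days whose peak ambient exceeds the free cooling limit (hypothetical threshold)."""
    return sum(1 for t in daily_peak_ambient_c if t > free_cooling_limit_c)


if __name__ == "__main__":
    # Reported figure for Green Mountain's Rjukan site: 330 free cooling days.
    print(mechanical_assist_days(330), f"{free_cooling_fraction(330):.1%}")
    # Hypothetical week of daily peaks against a hypothetical 25 C limit.
    week = [14.0, 19.5, 23.0, 26.5, 21.0, 18.0, 27.5]
    print(count_days_above(week, 25.0))
```

About 90 percent free cooling still leaves 35 days on which the full load depends on mechanical assist. Real analyses typically use hourly weather data, since a single warm afternoon can require assist.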
Optimization is critical
In today’s data centers, cooling systems are the third-largest capital expense after electrical systems and IT equipment, and they account for around 40 percent of electricity costs, the main variable operating expense (OpEx). Electricity for cooling should be minimized, both for profitability and to meet sustainability targets. Best practice is to deploy a dedicated heat rejection system sized and configured for your exact needs and to optimize the temperature, flow rates, and pressure through the CDU.
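Flow optimization at the CDU pays off because of the centrifugal pump affinity laws, under which shaft power scales roughly with the cube of speed, and therefore of flow. This Python sketch applies the ideal cube law to a hypothetical 30kW pump; real savings are smaller because of static head and motor and drive losses.

```python
# Ideal centrifugal pump affinity laws at constant impeller diameter:
#   flow ~ speed, head ~ speed**2, shaft power ~ speed**3.
# Real savings are smaller because of static head and motor/VFD losses.


def affinity_power_kw(rated_power_kw: float, flow_ratio: float) -> float:
    """Shaft power at `flow_ratio` of rated flow, per the ideal cube law."""
    if flow_ratio < 0:
        raise ValueError("flow_ratio must be non-negative")
    return rated_power_kw * flow_ratio ** 3


if __name__ == "__main__":
    rated_kw = 30.0  # hypothetical CDU pump rating
    for ratio in (1.0, 0.9, 0.8):
        print(f"{ratio:.0%} flow -> {affinity_power_kw(rated_kw, ratio):.1f} kW")
```

Under this idealization, trimming flow by 20 percent cuts pump power to about 51 percent of rated (roughly 15.4kW here), which is the main argument for variable-speed pumps.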
Backup of the CDUs and pumps
At the high densities AI factories run at, any hiccup in the cooling system will result in overheating and downtime within seconds. An integrated thermal storage system provides five minutes of continuous cooling backup in case of a power outage: a 30,000-gallon tank for the high temp loop (liquid cooling) and a 15,000-gallon tank for the low temp loop (air cooling). In addition, the CDUs and facility pumps run on their own dedicated UPS power with five minutes of backup.
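The ride-through time of a thermal storage tank follows from an energy balance: stored capacity equals water mass × specific heat × usable temperature rise. This Python sketch assumes plain water, a tank volume that is fully usable across the whole temperature rise (real stratified tanks only approximate this), and a hypothetical 10MW load with a 10K usable rise; none of these inputs come from the reference design.

```python
US_GALLON_L = 3.785411784        # liters per US gallon (exact definition)
WATER_CP_KJ_PER_KG_K = 4.186     # approximate specific heat of water
WATER_DENSITY_KG_PER_L = 1.0     # simplification


def ride_through_minutes(volume_gal: float, usable_delta_t_k: float,
                         heat_load_kw: float) -> float:
    """Minutes a tank can absorb heat_load_kw while warming by usable_delta_t_k."""
    mass_kg = volume_gal * US_GALLON_L * WATER_DENSITY_KG_PER_L
    stored_kj = mass_kg * WATER_CP_KJ_PER_KG_K * usable_delta_t_k
    return stored_kj / heat_load_kw / 60.0  # kJ / kW = seconds


def required_volume_gal(heat_load_kw: float, minutes: float,
                        usable_delta_t_k: float) -> float:
    """Tank volume needed to cover heat_load_kw for `minutes` with usable_delta_t_k."""
    stored_kj = heat_load_kw * minutes * 60.0
    mass_kg = stored_kj / (WATER_CP_KJ_PER_KG_K * usable_delta_t_k)
    return mass_kg / WATER_DENSITY_KG_PER_L / US_GALLON_L


if __name__ == "__main__":
    # Hypothetical inputs: 10 MW load, 10 K usable temperature rise.
    print(f"{ride_through_minutes(30_000, 10.0, 10_000):.1f} min")
    print(f"{required_volume_gal(10_000, 5.0, 10.0):,.0f} gal for 5 min")
```

With these hypothetical inputs, a 30,000-gallon tank rides through about 7.9 minutes, and a five-minute target at 10MW needs roughly 19,000 gallons. Halving the usable temperature rise halves the ride-through, so tank size depends as much on allowable temperature drift as on load.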
Opportunity for waste heat reuse in data centers
Cloud data centers do not consistently produce enough high-quality heat to enable productive reuse. AI factory liquid systems can return water at above 55°C (about 130°F), which qualifies as high-quality heat. The plumbing can also be readily adapted to various applications, including industrial processes, greenhouses, and district heating.
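The scale of that heat can be estimated from the reference design described earlier: 64 racks × 227kW × 96 percent is roughly 13.9MW to the liquid loop. This Python sketch converts that to annual energy; the 70 percent utilization is a hypothetical value, and the estimate ignores heat exchanger losses and whether a heat customer can take the heat year-round.

```python
HOURS_PER_YEAR = 8760


def annual_heat_mwh(heat_kw: float, utilization: float = 1.0) -> float:
    """Heat available per year in MWh at an average `utilization` of heat_kw."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return heat_kw * utilization * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    liquid_kw = 64 * 227.0 * 0.96  # liquid-side heat of the reference design hall
    print(f"upper bound: {annual_heat_mwh(liquid_kw):,.0f} MWh/yr")
    print(f"at 70% utilization (hypothetical): {annual_heat_mwh(liquid_kw, 0.7):,.0f} MWh/yr")
```

At continuous full load, the upper bound is about 122GWh of heat a year.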
Designing and deploying next-gen cooling
As AI factory accelerated compute evolves, designing cooling for each new Nvidia NVL system becomes laborious and often impossible, because very few people in the world have the necessary knowledge and experience.
Six months before each new Nvidia release, companies such as Schneider Electric, which partners very closely with Nvidia, release reference designs that include bills of materials (BOMs), performance specifications, layouts, and more. These can serve as starting points that are adapted to individual preferences and local regulations.
An even simpler approach is to leverage prefabricated modules, including white space PODs and packaged chillers. These facility-level cooling skids are designed to work seamlessly with the IT PODs, dramatically shorten design and deployment cycles, and provide confidence that they will support the accelerated compute.
Conclusion: Cooling is the constraint and the opportunity
The challenge of cooling AI factories has pushed the industry to develop liquid-cooling solutions that are more efficient and use less water than the legacy air-cooled systems of cloud data centers.
Today’s liquid cooling, in addition to supporting very high densities, can be configured and optimized for water and electricity use in AI factories by starting with the preferred heat rejection method, setting the optimum water temperatures, and using variable-speed pumps in the CDUs to regulate flow precisely. Reference designs are available well in advance, allowing data center operators to confidently deploy the latest generations of accelerated compute today and in the future.
Read the original article: https://www.datacenterdynamics.com/en/opinions/ai-factory-cooling-vs-cloud-data-centers-why-liquid-cooling-is-essential-for-high-density-ai-workloads/




