Is AI inferencing the greatest challenge the data center world has ever faced?
Perhaps. To understand why, let’s explore what inferencing entails and how to tackle its demands.
What is inferencing?
Inferencing is the practice of applying trained machine learning models to new, unseen data to generate predictions, decisions, and responses. In essence, it’s the application phase of machine learning, where the model uses the knowledge gained from the training data to generate outputs for new inputs.
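To make the definition concrete, here is a minimal sketch of that inference step in Python, using a tiny stand-in model built with PyTorch; the model, its weights, and the input are purely illustrative, not a production serving setup:

```python
# Minimal sketch of the inference phase, assuming a tiny PyTorch model as a stand-in
# for a trained production model (the weights and input here are illustrative only).
import torch
import torch.nn as nn

# Pretend this classifier was already trained; in practice you would load saved weights.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()  # switch to inference mode (disables dropout, batch norm updates, etc.)

new_input = torch.randn(1, 4)      # "unseen" data arriving at serving time
with torch.no_grad():              # no gradients: inference only runs the forward pass
    logits = model(new_input)
    prediction = logits.argmax(dim=1)

print(prediction.item())           # the model's decision for this new input
```

Training involves computing gradients and updating weights over many passes; inference, as above, is just the forward pass on new data, repeated for every request a service handles.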
We all use inferencing whenever we access Gemini, Meta AI, ChatGPT, or any of the other large language models that have emerged over the past few years. Every time we type in a prompt, inferencing generates the response. Meta, for example, uses inferencing to make real-time decisions about the content shown to users on Facebook and Instagram: trained AI models analyze user data and predict which posts, ads, or recommendations will be most relevant and engaging for each user, based on their previous interactions and behaviors.
Of course, the value of inferencing isn’t limited to individuals. Nearly every business is evaluating or deploying new generative AI applications on inferencing hardware. According to Salesforce, 86% of IT leaders expect generative AI to quickly play a prominent role in their organizations.
And that means that we’ll see a shift in organizational IT infrastructure priorities.
So far, training infrastructure, with its hot and dense cluster designs that are akin to HPC architectures, has soaked up most of the investment and attention because there’s been a race to build the best models.
However, as companies move from training models and testing generative AI applications to deploying them at full scale, demand for robust inferencing infrastructure will skyrocket, far surpassing the need for training. Current estimates suggest that the market will soon be split into 15% for training, 45% for data center inferencing, and 40% for edge inferencing. This shift highlights a fundamental challenge: today’s data center infrastructure is simply not designed to support the massive and evolving demands of inferencing.
Scaling Inferencing: The New Infrastructure Challenges
- Specialized, High-Density Hardware: AI hardware for inferencing isn’t standard; it relies on GPUs and TPUs with high-speed interconnects similar to HPC systems. The power demand for Nvidia’s next-gen GPUs is expected to reach 1,000W per card in 2024, with a single 8U AI server consuming over 10kW, far above typical rack limits (a rough power sketch follows this list). Uptime Institute’s Hardware for AI: What Makes It Different report estimates that by early 2025, IT infrastructure supporting Nvidia AI workloads will reach 3,000 MW globally, a surge driven by unprecedented demand for specialized inferencing setups.
- Exponential Demand and Limited Data Center Capacity: Inferencing demand is growing rapidly, and with it, the need for vast amounts of power and cooling. At the same time, we have a shortage of data center capacity that’s equipped for inferencing hardware.
- Extreme Power and Cooling Needs: GPUs in data center AI applications are increasingly power-intensive, with Nvidia chips using six times the memory they did in 2017. This surge in power and memory capacity strains existing electrical and cooling systems. In response, large-scale inferencing clusters will soon mirror supercomputer layouts, with racks requiring thousands of copper and optical connections to maintain high-speed data flow across GPUs.
- Latency Sensitivity: Many applications, such as real-time video analysis or autonomous driving, require low-latency inferencing, which can be challenging to achieve in large-scale data centers without specialized deployments, often at the edge. Low-latency multi-modal models (which accept and produce any media, not just text) will require scalable support across computing environments ranging from centralized cloud to edge and IoT devices. Much of our existing data center and edge capacity will become obsolete faster than expected, and new designs will be needed.
- Environmental Impact and Sustainability Goals: The computational power required for inferencing can lead to substantial energy consumption, making it a significant environmental concern. And it comes at a time when power generation and distribution are already straining against the limits of today’s electrical infrastructure. Managing inferencing at scale must align with sustainability goals, reducing water and energy use and minimizing strain on existing infrastructure.
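As a rough illustration of the density figures in the first bullet above, here is a back-of-the-envelope power sketch; the 1,000W per-GPU figure comes from the text, while the host overhead and the legacy rack budget are assumptions for illustration only:

```python
# Rough, back-of-the-envelope power math for the density claim in the first bullet.
# The 1,000W-per-GPU figure comes from the text above; the host overhead and the
# legacy rack budget are assumptions for illustration only.
GPU_POWER_W = 1_000          # assumed per-card power for a next-gen GPU
GPUS_PER_SERVER = 8          # a typical 8-GPU AI server configuration
HOST_OVERHEAD_W = 2_500      # assumed CPUs, memory, NICs, fans, PSU losses

server_w = GPUS_PER_SERVER * GPU_POWER_W + HOST_OVERHEAD_W
print(f"One 8-GPU server: ~{server_w / 1000:.1f} kW")          # ~10.5 kW

LEGACY_RACK_KW = 10          # assumed budget of a typical air-cooled enterprise rack
print(f"Fraction of a {LEGACY_RACK_KW} kW rack used by one server: "
      f"{server_w / (LEGACY_RACK_KW * 1000):.0%}")             # >100%: one server overflows it

# Packing four such servers into a single AI rack needs ~42 kW of power and cooling,
# which is why inferencing clusters start to look like supercomputer rows.
print(f"Four servers in one rack: ~{4 * server_w / 1000:.0f} kW")
```

Even under these rough assumptions, a single AI server overruns a conventional rack’s power budget, and a fully populated rack needs several times more power and cooling than legacy designs provide.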
Meeting these challenges sustainably is a monumental task, especially as organizations look to deploy latency-sensitive, multi-modal AI applications globally. Inferencing clusters require dense configurations, full performance under strict power limits, and minimal environmental impact—a tough balance to achieve with conventional infrastructure.
These are hard problems. Let’s take a step back for a moment and consider them in context.
Let’s say an organization wants to deploy a latency-sensitive, multi-modal generative AI application at scale. To do that, they need to deploy fifty clusters of inferencing hardware around the world.
What challenges will they face?
- Finding spaces to support service demands won’t be easy. They may have to put some of the hardware in data centers and the rest of it at the edge. Variations in environmental conditions are likely from one location to another.
- Access to physical space is restricted. They have to pack their computationally intensive hardware as densely as possible to fit limited space, even though it’s hot enough to melt itself if it isn’t cooled properly.
- They have to deliver the highest level of performance within a given power envelope. They must drive maximum efficiency and can’t afford to waste power anywhere, yet they can’t throttle or restrict hardware performance to get there (see the sketch after this list).
- They must be able to minimize the environmental impact of their deployment. They have to deploy and run the application, at full performance, in the most sustainable way.
- And of course, they have to support these needs for today’s hardware while supporting subsequent generations of inferencing hardware, which is likely to be hotter and denser than before.
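To make that power-envelope trade-off concrete, here is a minimal sketch, assuming a hypothetical site budget, per-server draw, and cooling-overhead fractions; all of the numbers are illustrative assumptions rather than measured figures:

```python
# A sketch of the fixed-power-envelope trade-off from the list above. Every number
# here is an assumption for illustration: the site budget, the per-server draw, and
# the share of facility power spent on cooling (roughly what PUE captures).
SITE_BUDGET_KW = 500         # assumed total power available at one edge or colo site
SERVER_KW = 10.5             # assumed draw of one 8-GPU inference server at full load

def servers_supported(cooling_overhead: float) -> int:
    """Servers that fit when a given fraction of the site budget goes to cooling."""
    it_budget_kw = SITE_BUDGET_KW * (1.0 - cooling_overhead)
    return int(it_budget_kw // SERVER_KW)

# High-density air cooling tends to consume a larger share of the budget than
# direct-to-chip liquid cooling; the exact fractions vary widely by site.
for label, overhead in [("air-cooled, assumed 30% overhead", 0.30),
                        ("liquid-cooled, assumed 10% overhead", 0.10)]:
    print(f"{label}: {servers_supported(overhead)} servers in a {SITE_BUDGET_KW} kW envelope")
```

Under these assumed numbers, the same envelope supports roughly a quarter more servers when less of the budget goes to moving air, which is the intuition behind the liquid cooling discussion that follows.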
How can organizations work through these challenges in a proven, mature, and cost-effective way?
They can use liquid cooling.

Direct-to-Chip Liquid Cooling: A Solution for Inferencing
Not that long ago, liquid cooling was a niche technology. Now it’s becoming mainstream, adopted by the largest data center providers as a critical component of AI inferencing infrastructure.
Liquid cooling, and especially direct-to-chip liquid cooling, is key to unlocking inference architectures at scale.
- Unlike conventional air cooling, direct-to-chip cooling can function in a variety of spaces, supporting demanding performance requirements in a very dense deployment.
- Liquid cooling cuts energy waste, boosting efficiency by reducing reliance on inefficient air cooling.
- It supports the highest levels of performance, keeping chips cool enough that they don’t throttle and aren’t degraded by thermal stress.
- Liquid cooling supports sustainability goals and reduces the environmental impact of dense AI deployments.
All of this sounds well and good, of course. But what are the advantages of liquid cooling in real-world situations?
Real-world Benefits of JetCool’s Direct-to-Chip Cooling
JetCool’s microconvective direct-to-chip liquid cooling technology is already used in some of the most demanding environments, and users are seeing compelling benefits. By aiming jets of cooling fluid at the hottest spots on a processor, JetCool’s technology can provide exceptional cooling without inefficient, expensive refrigeration cycles. JetCool:
- Supports the hottest and densest AI infrastructure
- Outperforms air cooling by 82%, delivering best-in-class thermal performance
- Reduces AI cluster power consumption by 15%
- Operates in all types of environments and with a wide range of coolant temperatures
- Keeps GPUs at safe temperatures, even in hot environments or with warm coolants
- Minimizes water utilization, reducing waste by as much as 92%
- Unlocks heat reuse opportunities with coolants above 50°C (see the sketch after this list)
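To give a rough sense of what the heat-reuse point can mean in practice, here is a simple sketch of the recoverable heat in a warm-water loop; the flow rate and temperatures are illustrative assumptions, not JetCool specifications:

```python
# Quick estimate of the heat-reuse point above: how much thermal energy a warm coolant
# loop could make available for reuse. The flow rate and temperatures are illustrative
# assumptions, not JetCool specifications.
FLOW_LPM = 60.0              # assumed coolant flow for one rack, litres per minute
CP_WATER = 4186.0            # specific heat of water, J/(kg*K)
KG_PER_LITRE = 1.0           # water-based coolant is roughly 1 kg per litre
T_SUPPLY_C = 45.0            # assumed supply temperature into the rack
T_RETURN_C = 55.0            # assumed return temperature (>50°C, warm enough to reuse)

mass_flow_kg_s = FLOW_LPM * KG_PER_LITRE / 60.0
recoverable_kw = mass_flow_kg_s * CP_WATER * (T_RETURN_C - T_SUPPLY_C) / 1000.0

# Q = m_dot * c_p * dT: with these assumed numbers, roughly 42 kW of heat per rack
# comes back warm enough to feed district heating or other heat-reuse schemes.
print(f"Recoverable heat per rack: ~{recoverable_kw:.0f} kW at {T_RETURN_C:.0f} °C return")
```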
Taken together, these benefits mean JetCool delivers best-in-class cooling for the most demanding AI infrastructure while cutting energy consumption, maintaining hardware performance, reducing water use, and even giving organizations an opportunity to utilize or sell waste heat.
To make matters even better, we’ve been able to demonstrate that our technology can cool processors of 5,000W and above, well beyond today’s limits, giving organizations a way to future-proof their infrastructure without costly cooling redesigns a few years down the line.
Customer Spotlight: University of Pisa’s Sustainable HPC Deployment
A Glimpse into Sustainable HPC with JetCool’s SmartPlate System
Recognizing the market’s challenges with limited data center space and the complexities of implementing liquid cooling, JetCool developed a self-contained liquid cooling solution using our advanced cold plates. The University of Pisa exemplifies how this innovative approach seamlessly integrates with high-performance computing (HPC) and AI inferencing to drive sustainability and efficiency.
A prestigious research institution, the University has been pioneering sustainable computing at its Green Data Center since 2016, using cutting-edge servers from Dell Technologies and high-performance processors from all the major chipmakers.
Facing escalating power and cooling demands, the University recently expanded its data center by adopting JetCool’s SmartPlate System. This innovative, self-contained liquid cooling solution enabled the University to achieve significant power savings and enhance cluster efficiency without major infrastructure overhauls. By retrofitting its Dell PowerEdge R760 servers, the University successfully optimized its HPC clusters to support intensive AI inferencing tasks without modifying existing computing infrastructure.
Discover the Full Story
Read more about the deployment at the University of Pisa and learn how JetCool can help your organization overcome similar challenges with our self-contained liquid cooling systems, enabling immediate adoption of liquid cooling for HPC and AI clusters without infrastructure modifications.







