What a Datacenter Actually Eats: The Hidden Supply Chain of an AI Query
Hunter · 14 min
A modern AI prompt does not just consume compute. It pulls on substations, cooling towers, transformers, diesel tanks, optical modules, and in some places, municipal water politics.
In Northern Virginia, permits for data centers add up to roughly 4,632 diesel generators with a combined capacity of about 11.1 gigawatts. That is backup power on the scale of a national grid, sitting mostly idle behind anonymous warehouse walls, waiting for the moment the utility blinks. The strange part is that those generators are part of the supply chain for a chatbot answer that appears on your screen in under two seconds.
The public story about AI has mostly been told in abstractions: models, parameters, benchmarks, scaling laws. The physical story is messier and more revealing. A prompt sent to a modern model does not travel through “the cloud” in any meaningful meteorological sense. It moves through a stack of very earthly things: copper, aluminum, silicon, transformer oil, cooling water, diesel fuel, fiberglass ducts, optical glass, and utility paperwork.
The bottleneck, increasingly, is not the model. It is whether the right patch of land can get another 300 megawatts, whether a transformer can arrive before 2027, whether a cooling design can survive August, and whether a town wants to spend its water on someone else’s tokens.
The old datacenter was not built for this
For most of the cloud era, data centers got denser slowly.
A rack full of conventional servers might draw 5 to 15 kilowatts. Operators obsessed over power usage effectiveness, or PUE, shaving a few points off overhead with better airflow, hotter aisles, smarter chillers. The architecture was mature enough that efficiency felt incremental. Add another hall, another row, another cluster.
Then generative AI arrived and quietly changed the geometry.
Training large models was already power-hungry, but inference turned out to matter too, because inference happens all day, for everyone, in production. The hardware profile shifted from CPUs and modest GPU clusters to racks packed with accelerators that each draw 700 watts to 1,000 watts or more. Nvidia’s H100 SXM is rated around 700 W TDP. Early Blackwell B200 systems push into the 1,000-watt class, and OEM designs now assume liquid-cooled racks around 140 kW. Meta showed one such 140 kW liquid-cooled AI rack at OCP 2024.
That is not a normal server room problem. That is an industrial heat-removal problem.
The International Energy Agency estimates data centers used about 1.5% of global electricity in 2024. It also warns that AI-driven demand could roughly double data center electricity consumption by 2026 to 2030 in major markets. Fatih Birol, the IEA’s executive director, put it plainly: “There is no AI without energy—specifically electricity.”
That sentence lands differently once you realize how many separate systems have to work, simultaneously, for one model response to exist.
What happens when you send a prompt
A prompt looks weightless because text is tiny. The hidden cost is not in the size of the input. It is in the machinery required to turn that input into the next token, and the next, and the next.
Start with the front end. Your request hits a web service that authenticates you, routes the request, and decides which model and cluster should handle it. That part looks familiar to anyone who has seen cloud software before.
Then the expensive path begins.
The prompt is broken into tokens and sent into a GPU cluster over high-speed interconnects, often 400 or 800 gigabits per second InfiniBand or Ethernet. The model’s weights live in high-bandwidth memory, or HBM, attached to the accelerators. During inference, each generated token requires repeated reads of those weights, matrix multiplications across many GPU cores, and synchronization traffic between devices.
A useful mental model is this: an LLM is less like a calculator and more like a factory line that has to consult an enormous reference library every few milliseconds while coordinating with neighboring machines. The arithmetic is fast. The movement of data is what makes the whole thing physically intense.
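One rough way to see why data movement dominates: if producing a token means streaming every weight out of HBM once, memory bandwidth alone sets a floor on per-token latency. The sketch below is a back-of-envelope estimate under that assumption, not a measurement. The 70-billion-parameter model, the 16-bit weights, the 3.35 TB/s bandwidth figure, and the function name are all illustrative inputs that do not come from the reporting here.

```python
def min_ms_per_token(params_billion: float, bytes_per_param: float,
                     hbm_tb_per_s: float) -> float:
    """Bandwidth-bound floor on decode latency for one token, in milliseconds.

    Assumes every weight is read from HBM exactly once per generated token
    and ignores compute time, KV-cache reads, and inter-device traffic.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    seconds = weight_bytes / (hbm_tb_per_s * 1e12)
    return seconds * 1e3


# Illustrative inputs (assumptions, not figures from the article).
single = min_ms_per_token(70, 2, 3.35)        # bandwidth of one device
sharded = min_ms_per_token(70, 2, 8 * 3.35)   # weights split evenly over 8
print(f"one device: {single:.1f} ms/token, eight devices: {sharded:.1f} ms/token")
```

A model of that size would not actually fit in a single accelerator's memory, which is one of the reasons sharding exists. The point of the estimate is only that the floor is set by bytes moved, not by arithmetic.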
Inside a single accelerator, HBM is the star. It feeds the compute units at extraordinary bandwidth. Across accelerators, technologies like NVLink or PCIe move activations and partial results. Across nodes, the network fabric keeps many machines aligned. The model may be sharded because it is too large for one GPU, or because the service needs enough parallelism to answer many users at once.
Every one of those transfers burns power.
And not just in the GPUs. A recent analysis in Nature notes that networking can approach about 10% of cluster power at AI scale. An 800G optical module typically draws around 11 to 14 watts. That sounds trivial until you multiply it by thousands of ports in a large fabric. Suddenly, “the network” is not a rounding error. It is megawatts.
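The multiplication is simple enough to write down. This sketch takes the 11 to 14 watt range quoted above and applies it to a hypothetical fabric of 100,000 modules; that module count is an assumption chosen to show where the total crosses into megawatts, not a reported figure for any real cluster.

```python
def optics_power_kw(num_modules: int, watts_per_module: float) -> float:
    """Total draw of the optical transceivers alone, in kilowatts."""
    return num_modules * watts_per_module / 1000.0


MODULES = 100_000  # hypothetical fabric size
low_kw = optics_power_kw(MODULES, 11)
high_kw = optics_power_kw(MODULES, 14)
print(f"{low_kw / 1000:.1f} to {high_kw / 1000:.1f} MW for optics alone")
```

Switch ASICs, NICs, and the cooling for all of them sit on top of this number, so it is a lower bound on what the fabric draws.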
Now follow the electricity.
Power may arrive from the grid at transmission voltage, often around 115 kV, then step down at a substation to medium voltage such as 34.5 kV or 13.2 kV, then again to 480 V on site. UPS systems smooth out disturbances and buy enough time for generators to start if the grid fails. In some modern designs, power is distributed inside the data hall via 48 V busbars and power shelves, reducing conversion losses closer to the rack.
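The reason for stepping voltage down in stages becomes clearer when you look at current. For a balanced three-phase feed, line current is power divided by the square root of three times line voltage. The sketch below applies that standard formula to a 300 MW site at each voltage level named above. It assumes unity power factor and a single feed carrying the whole load, both simplifications made purely for illustration.

```python
import math


def line_current_amps(power_watts: float, line_voltage: float,
                      power_factor: float = 1.0) -> float:
    """Balanced three-phase line current: I = P / (sqrt(3) * V_LL * pf)."""
    return power_watts / (math.sqrt(3) * line_voltage * power_factor)


SITE_WATTS = 300e6  # 300 MW, treated as one feed (a simplification)
LEVELS = [("115 kV", 115e3), ("34.5 kV", 34.5e3),
          ("13.2 kV", 13.2e3), ("480 V", 480.0)]

for label, volts in LEVELS:
    amps = line_current_amps(SITE_WATTS, volts)
    print(f"{label:>8}: {amps:,.0f} A")
```

No real facility carries the full site load on one low-voltage bus. The load is split across many transformers and feeders precisely because the current at 480 V would otherwise be unmanageable, which is also why each of those transformers becomes a procurement item.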
From the rack’s perspective, electricity arrives as a promise: every watt you draw must leave as heat.
That is physics, not accounting. GPUs do not “use” electricity in the way a battery charger stores it. Nearly all of it becomes heat that has to go somewhere.
So the prompt becomes warm water.
Or hot air.
Or a plume of evaporated water above a cooling tower.
The cooling system is the real second computer
The glamorous machine in AI is the GPU cluster. The necessary machine is the cooling plant.
High-density AI racks are pushing operators toward direct-to-chip liquid cooling and other hybrid designs because air alone struggles at 100 kW-plus rack densities. A conventional enterprise room can move a lot of air. It cannot casually remove the heat of a small furnace from every cabinet.
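A quick heat-balance estimate shows why. The flow needed to carry heat away is the heat load divided by the fluid's specific heat times its temperature rise. The sketch below applies that to a 140 kW rack for water and for air. The fluid properties are textbook approximations, and the temperature rises of 10 K for water and 15 K for air are assumptions picked for illustration, not design values from any vendor.

```python
# Approximate fluid properties near room temperature.
WATER = {"cp": 4186.0, "density": 997.0}  # J/(kg*K), kg/m^3
AIR = {"cp": 1005.0, "density": 1.2}      # J/(kg*K), kg/m^3


def coolant_flow_m3_per_s(heat_watts: float, delta_t_kelvin: float,
                          fluid: dict) -> float:
    """Volumetric flow needed to absorb heat_watts with a given temperature rise."""
    mass_flow_kg_s = heat_watts / (fluid["cp"] * delta_t_kelvin)
    return mass_flow_kg_s / fluid["density"]


RACK_WATTS = 140_000  # the 140 kW rack class mentioned earlier
water_flow = coolant_flow_m3_per_s(RACK_WATTS, 10.0, WATER)  # assumed 10 K rise
air_flow = coolant_flow_m3_per_s(RACK_WATTS, 15.0, AIR)      # assumed 15 K rise

print(f"water: {water_flow * 1000 * 60:.0f} L/min")
print(f"air:   {air_flow:.1f} m^3/s")
```

Under these assumptions the water loop needs a few liters per second while the air path needs several cubic meters per second through a single cabinet, a difference of more than three orders of magnitude in volume.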
Once heat is captured, it still has to be rejected outdoors. That can happen through chillers, dry coolers, or evaporative cooling towers. Each choice changes the resource footprint.
Cooling towers are efficient because evaporation carries away large amounts of heat. But they consume water. The makeup water for a tower is not just evaporation; it also includes blowdown to control mineral concentration and drift, the small droplets that escape with the exhaust air.
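That water balance can be sketched with two standard relationships. Evaporation is roughly the rejected heat divided by water's latent heat of vaporization, and a salt balance gives blowdown plus drift as evaporation divided by cycles of concentration minus one. The latent heat value, the assumption that all heat leaves by evaporation, and the choice of five cycles of concentration are illustrative inputs, and the function names are hypothetical.

```python
LATENT_HEAT_J_PER_KG = 2.45e6  # approximate, water near ambient temperature
JOULES_PER_KWH = 3.6e6


def evaporation_l_per_kwh(evaporative_fraction: float = 1.0) -> float:
    """Liters evaporated per kWh of heat rejected (about 1 kg per liter)."""
    return evaporative_fraction * JOULES_PER_KWH / LATENT_HEAT_J_PER_KG


def makeup_l_per_kwh(evaporation: float, cycles_of_concentration: float) -> float:
    """Makeup = evaporation + (blowdown + drift), from a salt balance."""
    if cycles_of_concentration <= 1:
        raise ValueError("cycles of concentration must exceed 1")
    blowdown_plus_drift = evaporation / (cycles_of_concentration - 1)
    return evaporation + blowdown_plus_drift


evap = evaporation_l_per_kwh()        # assumes all heat leaves by evaporation
makeup = makeup_l_per_kwh(evap, 5.0)  # assumed 5 cycles of concentration
print(f"evaporation: {evap:.2f} L/kWh, makeup: {makeup:.2f} L/kWh")
```

Real towers reject part of their heat as sensible heat and run at different cycles depending on water chemistry, so actual figures vary. The structure of the calculation is what matters: harder water means fewer cycles, more blowdown, and more makeup for the same heat load.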
This is where AI’s water story gets uncomfortably concrete.
Researchers led by Shaolei Ren estimated that training GPT-3 evaporated roughly 700,000 liters of freshwater. In the same line of research, they estimated that roughly 20 to 50 LLM prompts can indirectly consume about 500 milliliters of water, largely through cooling, depending on where and when the workload runs. That is a small bottle of water for a few dozen interactions, or roughly 10 to 25 milliliters per prompt.
Not every prompt costs the same. Water usage depends on climate, cooling design, time of day, and grid conditions. But the broader point is hard to unsee: digital services have local hydrology.
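To make the per-prompt range explicit, the sketch below divides the 500 mL estimate by the 20 and 50 prompt bounds and then scales the result to a hypothetical one million prompts. The one-million volume is an assumption for illustration only, and the result inherits all the location and timing caveats of the original estimate.

```python
def water_per_prompt_ml(total_ml: float = 500.0,
                        prompts_low: int = 20,
                        prompts_high: int = 50) -> tuple:
    """Return (min, max) milliliters per prompt implied by the estimate."""
    return total_ml / prompts_high, total_ml / prompts_low


def fleet_water_liters(num_prompts: int, ml_per_prompt: float) -> float:
    """Scale a per-prompt figure to a number of prompts, in liters."""
    return num_prompts * ml_per_prompt / 1000.0


low_ml, high_ml = water_per_prompt_ml()
PROMPTS = 1_000_000  # hypothetical volume
print(f"{low_ml:.0f} to {high_ml:.0f} mL per prompt")
print(f"{fleet_water_liters(PROMPTS, low_ml):,.0f} to "
      f"{fleet_water_liters(PROMPTS, high_ml):,.0f} L per million prompts")
```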
Industry water usage effectiveness, or WUE, can range from about 0.2 to 1.8 liters per kWh of IT load, depending on the facility and climate. Push toward evaporative cooling and electricity overhead can fall, but water use rises. Push toward dry cooling and water use falls, but electricity use usually rises because compressors and fans work harder.
PUE and WUE, in other words, can pull in opposite directions.
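A small comparison makes the tension visible. Total facility energy is IT energy times PUE, and site water is IT energy times WUE. The sketch below runs two made-up designs through those definitions for a 100 MW IT load over one year. The WUE values are the two ends of the range quoted above; the PUE values and the 100 MW load are hypothetical and chosen only to show the direction of the trade-off.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class CoolingDesign:
    name: str
    pue: float            # total facility energy / IT energy
    wue_l_per_kwh: float  # liters of water per kWh of IT energy


def annual_footprint(design: CoolingDesign, it_load_mw: float) -> dict:
    """Yearly facility energy (GWh) and water (megaliters) at constant IT load."""
    it_kwh = it_load_mw * 1000 * HOURS_PER_YEAR
    return {
        "facility_gwh": it_kwh * design.pue / 1e6,
        "water_megaliters": it_kwh * design.wue_l_per_kwh / 1e6,
    }


# PUE values are hypothetical; WUE values are the ends of the quoted range.
evaporative = CoolingDesign("evaporative", pue=1.15, wue_l_per_kwh=1.8)
dry = CoolingDesign("dry", pue=1.35, wue_l_per_kwh=0.2)

for design in (evaporative, dry):
    result = annual_footprint(design, it_load_mw=100)
    print(design.name, result)
```

With these inputs the dry design uses far less water but more electricity for the same IT work. Which one is better depends on whether the local constraint is the aquifer or the substation.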
That trade-off has escaped the data center and entered local politics. In The Dalles, Oregon, a 2022 settlement forced disclosure of Google’s water use, turning a technical operating metric into a civic issue about public records and municipal water rights. Once people can count the gallons, the cloud stops looking abstract.
Google reported about 6.1 billion gallons of data center water consumption in 2023. That figure covers a global fleet, not AI alone. But it is large enough to make the argument visible: the internet is not merely electric. It is thermodynamic.
The surprising bottleneck is not the GPU
The obvious bottleneck in AI is chips. H100 shortages were real. HBM supply is tightly concentrated. High-speed optics are hard to source at scale.
But the more surprising constraint now shows up in utility yards and procurement spreadsheets.
Transformers.
Medium-voltage switchgear.
Interconnection studies.
According to NREL, U.S. distribution transformer lead times stretched to roughly two years after 2022 and remain elevated. Wood Mackenzie projected a 2025 shortage of about 30% for power transformers and 10% for distribution units. MV switchgear lead times also blew past pre-2020 norms.
That means a company can have financing, land, chips, and customers, and still be stuck waiting for the metal box that lets electricity enter the site safely.
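The scheduling logic is a critical-path problem in miniature. If every long-lead item is ordered on the same day, the site comes online no earlier than the slowest delivery. The sketch below assumes exactly that. Only the 24-month transformer figure reflects the roughly two-year lead time cited above; every other number is a hypothetical placeholder.

```python
def gating_item(lead_times_months: dict) -> tuple:
    """Return (name, months) for the longest-lead item.

    Assumes all items are ordered at the same time and procured in parallel.
    """
    name = max(lead_times_months, key=lead_times_months.get)
    return name, lead_times_months[name]


lead_times = {
    "accelerators": 9,                # hypothetical
    "distribution transformer": 24,   # roughly two years, per the figure above
    "medium-voltage switchgear": 18,  # hypothetical
    "building shell": 14,             # hypothetical
}

item, months = gating_item(lead_times)
print(f"earliest energization is gated by the {item}: {months} months")
```

Real projects have dependencies between items, so the true critical path is a graph and not a simple maximum. The simplification still captures the point: shaving months off chip delivery changes nothing if the transformer arrives last.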
The queue extends beyond hardware. Lawrence Berkeley National Laboratory reports that U.S. interconnection queues held around 2,600 GW of generation and storage capacity waiting to connect as of 2024. Large new loads, including data centers, are now forcing grid operators to rethink procedures designed for a different era.
PJM, which runs the grid across a large part of the Mid-Atlantic and Midwest, launched a fast-track process for large-load interconnections. Dominion Energy Virginia proposed a large-load queue and disclosed roughly 25 GW of dated requests plus about 45 GW undated, largely tied to data centers.
These are not small numbers. They are big enough to make “Can we build another AI cluster?” sound less like a software question and more like a regional planning question.
Singapore and Dublin saw this earlier than most. Singapore paused new data centers in 2019, then reopened through a selective process with strict efficiency criteria. Dublin effectively halted most new data center grid connections from 2021 amid local constraints. Ireland later ended the Dublin-area freeze in December 2025, but with conditions.
This is what the physical internet looks like when it reaches local limits. Not a dramatic collapse. A queue.

The backup system is a hidden fossil layer
There is one more part of the AI query that usually stays out of sight: resilience.
Data centers are built on the assumption that the grid will fail eventually. So a facility might have N+1 or 2N redundancy in UPS systems, generators, fuel storage, and distribution paths. Emergency diesel generators are the standard answer because they are mature, powerful, and can start fast enough to matter.
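The redundancy shorthand translates directly into unit counts. N is the number of generators needed to carry the critical load, N+1 adds one spare, and 2N duplicates the whole set. The sketch below works that out for a hypothetical 100 MW critical load served by 2.5 MW units; both numbers are assumptions chosen for illustration.

```python
import math


def generators_required(critical_load_mw: float, unit_mw: float,
                        scheme: str) -> int:
    """Generator count for a redundancy scheme: 'N', 'N+1', or '2N'."""
    n = math.ceil(critical_load_mw / unit_mw)  # units needed to carry the load
    if scheme == "N":
        return n
    if scheme == "N+1":
        return n + 1
    if scheme == "2N":
        return 2 * n
    raise ValueError(f"unknown redundancy scheme: {scheme}")


LOAD_MW, UNIT_MW = 100, 2.5  # hypothetical site and unit size
for scheme in ("N", "N+1", "2N"):
    print(scheme, generators_required(LOAD_MW, UNIT_MW, scheme))
```

In practice N+1 is often applied per block of capacity and not once per site, so real counts tend to fall between the N+1 and 2N figures.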
That is why Northern Virginia’s permitted generator fleet is so striking. 4,632 diesel generators, 11.1 GW of capacity, all there to preserve continuity for digital services that market themselves as clean and intangible.
Under EPA rules, non-emergency operation of emergency diesel generators is generally capped at 100 hours per year. So these machines are not meant to run constantly. But they shape air-quality debates, noise complaints, fuel logistics, and local permitting. They are part of the hidden fossil layer under AI.
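Two quick numbers follow from the figures already given: the average size of a permitted unit, and the most energy the fleet could produce under a 100-hour cap. The second is a deliberately unrealistic ceiling, since it assumes every permitted unit exists and runs at full rating for every allowed hour. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def average_unit_mw(fleet_gw: float, units: int) -> float:
    """Mean nameplate capacity per generator, in megawatts."""
    return fleet_gw * 1000.0 / units


def capped_energy_gwh(fleet_gw: float, hours_per_year: float) -> float:
    """Upper bound on annual output if every unit ran at full rating."""
    return fleet_gw * hours_per_year


avg_mw = average_unit_mw(11.1, 4632)
ceiling_gwh = capped_energy_gwh(11.1, 100)
share_of_year = 100 / HOURS_PER_YEAR

print(f"average unit: {avg_mw:.1f} MW")
print(f"ceiling at 100 h/yr: {ceiling_gwh:,.0f} GWh")
print(f"cap as share of the year: {share_of_year:.1%}")
```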
The contradiction is not hypocrisy so much as engineering. A model serving millions of users cannot simply go dark because a feeder tripped. Reliability requirements pull operators toward backup systems that are available now, not the ones that might exist at scale later.
What people get wrong about an AI query
The common mental picture is that an AI prompt “uses electricity.”
That is true in the same way saying a city “uses roads” is true. It hides the actual system.
A modern AI response consumes:
- Grid capacity at a specific node, not generic electricity in the abstract
- Power delivery hardware such as transformers, switchgear, UPS systems, and busbars
- Compute silicon plus scarce HBM memory
- Network fabric and high-speed optics, which can take around a tenth of cluster power
- Cooling capacity, often in the form of water, refrigerant loops, pumps, and towers
- Backup infrastructure, including generators and fuel
- Permits and queue positions, which sound bureaucratic until they become the gating resource
The weirdest part is that some of the hardest constraints are low-tech.
A transformer is not glamorous. It does not appear in benchmark charts. But if transformer lead times are two years, then the transformer is more important to AI deployment than a five-point model improvement.
The same goes for water rights. A system can be “green-powered” on paper and still depend on evaporating large amounts of freshwater to keep rack temperatures in range. That does not make the accounting dishonest. It means the footprint is multi-dimensional.
Shaolei Ren has argued for more transparency around AI’s “secret water footprint,” including regionalized impacts from cooling and chip manufacturing. That regional part matters. A liter of water in a wet, cool region is not the same civic resource as a liter in a drought-prone basin. One megawatt in a grid-rich industrial corridor is not the same as one megawatt at the edge of a constrained substation.
The bottleneck is always somewhere specific.
The real map of AI is a map of infrastructure
Once you see the supply chain of a query, the geography of AI looks different.
The important places are not just San Francisco and model labs. They are Northern Virginia, where grid planners and county boards are negotiating the shape of a digital utility. They are Singapore, where the state treated data center approvals as a scarce-resource allocation problem. They are Dublin, where local grid constraints were strong enough to halt growth. They are towns like The Dalles, where cooling water became a public controversy.
The story of AI scaling is often told as a race for smarter algorithms and bigger chips. It is also a race for substations, cooling designs, optical efficiency, and delivery schedules for equipment that most users will never hear about.
That changes the emotional feel of the technology.
The cloud starts to look less like cyberspace and more like heavy industry with better UX.
And that may be the most useful correction. AI is not escaping the physical world as it gets more powerful. It is colliding with it more forcefully.
Every prompt is tiny. The system that answers it is not.
FAQ
How much electricity does a single AI query use?
There is no single number that works across models and deployments. It depends on model size, how many tokens are generated, how efficiently the cluster is utilized, and how much networking and cooling overhead sits around the compute. What the IEA makes clear is the aggregate picture: data centers used about 1.5% of global electricity in 2024, and AI is a major reason demand is rising.
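A hedged way to see how those factors combine: energy per query is roughly accelerator power times the number of accelerators times the time they are occupied, scaled up by facility overhead and divided by how many queries share the hardware at once. In the sketch below, the 700 W figure is the H100 rating mentioned earlier and the two-second duration echoes the response time in the introduction. The accelerator count, the concurrency, and the PUE are hypothetical, and the result is an illustration of the formula, not an estimate for any real service.

```python
def wh_per_query(accel_watts: float, num_accels: int, seconds: float,
                 concurrent_queries: int, pue: float) -> float:
    """Rough watt-hours attributable to one query.

    Assumes accelerators run at rated power for the whole duration and that
    the hardware is shared evenly among concurrent queries. CPU, memory, and
    network draw are only captured insofar as the PUE multiplier covers them.
    """
    joules = accel_watts * num_accels * seconds * pue
    return joules / 3600.0 / concurrent_queries


# Hypothetical serving setup: 8 accelerators, 16 queries in flight, PUE 1.2.
estimate = wh_per_query(accel_watts=700, num_accels=8, seconds=2.0,
                        concurrent_queries=16, pue=1.2)
print(f"{estimate:.2f} Wh per query under these assumptions")
```

Halving the concurrency doubles the figure, and a longer response scales it linearly, which is why published per-query numbers differ so widely.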
Why does AI use water at all if computers run on electricity?
Because nearly every watt consumed by compute turns into heat, and heat has to be removed. Many facilities use evaporative cooling towers because they reject heat efficiently. Researchers led by Shaolei Ren estimated that 20 to 50 prompts can indirectly consume about 500 mL of water in some scenarios, mostly through cooling.
Why are transformers and switchgear suddenly such a big deal?
AI clusters need very large amounts of power at specific sites, often on aggressive timelines. That makes grid connection equipment a gating resource. NREL documented distribution transformer lead times around two years after 2022, and Wood Mackenzie projected 2025 shortages in both power and distribution transformers. A data center cannot run on GPUs alone.
Why not just build data centers where electricity is cheap?
Cheap power is only part of the puzzle. Operators also need available transmission capacity, substations that can support large loads, water or alternative cooling options, permits, and acceptable latency to users. A site with inexpensive energy but no interconnection path can be less useful than a pricier site with existing infrastructure.
Are GPUs still the main bottleneck?
They are a major bottleneck, but no longer the only one, and often not the one that dictates schedule. HBM memory supply is tight, optics consume meaningful power and can be hard to source, and utility interconnection plus long-lead electrical gear increasingly determine when capacity can actually come online.