LLM Risk: What the Chat Box Hides
Large language models are powerful, but they are not free, neutral, or harmless by default.
A simple prompt can hide four things: expensive infrastructure, energy and cooling demand,
privacy and security exposure, and the risk of confident wrong answers. This page helps you
see the real process underneath: training builds the model, inference serves every prompt,
and responsible use means checking cost, data, accuracy, and dependence.
Training cost
Building frontier-scale models can require millions of GPU-hours before any user sends a prompt.
Inference cost
Every chat, summary, or code request uses serving hardware, memory, networking, and electricity.
Information risk
Private data, biased outputs, hallucinations, and over-trust can create real harm.
Good practice
Use LLMs where they add value, verify important outputs, and avoid unnecessary repeated calls.
Pricing checked against OpenAI API pricing on May 19, 2026. Some values below are labeled as derived estimates.
1. The Physical Cost
LLMs feel like software, but they run on physical infrastructure: GPUs, memory, storage, power, cooling, and networks. These examples show the scale.
Power draw per top-end GPU
Official
700 W
NVIDIA lists the H100 SXM at up to 700 W TDP with 80 GB of memory. A single 8-GPU server therefore draws 5.6 kW for the GPUs alone, before CPUs, networking, storage, and cooling.
Training scale
Model card
30.84M H100 GPU-hours
Meta reports that Llama 3.1 405B used 30.84 million H100 GPU-hours for training, on more than 15T pretraining tokens. At 0.7 kW per GPU, that is a derived estimate of about 21.6 GWh for GPU draw alone. This is industrial-scale compute, not desktop-scale computing.
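The arithmetic behind the power and GPU-hour figures above can be sketched in a few lines. This is a rough estimate that assumes every GPU draws its full 700 W TDP for the whole run and ignores CPUs, networking, cooling, and data-center overhead; the function and constant names are illustrative, not taken from any source.

```python
# Rough GPU-only estimate; assumes each GPU runs at full TDP the whole time.
H100_TDP_KW = 0.7                # 700 W, from the NVIDIA H100 SXM spec
GPUS_PER_SERVER = 8              # the 8-GPU server used as an example above
LLAMA_405B_GPU_HOURS = 30.84e6   # reported in the Llama 3.1 405B model card


def server_gpu_power_kw(gpus: int = GPUS_PER_SERVER, tdp_kw: float = H100_TDP_KW) -> float:
    """GPU-only power draw of one server, in kW."""
    return gpus * tdp_kw


def training_energy_gwh(gpu_hours: float, tdp_kw: float = H100_TDP_KW) -> float:
    """GPU-only training energy in GWh (1 GWh = 1,000,000 kWh)."""
    return gpu_hours * tdp_kw / 1e6


print(f"{server_gpu_power_kw():.1f} kW per 8-GPU server")
print(f"{training_energy_gwh(LLAMA_405B_GPU_HOURS):.1f} GWh of GPU energy for training")
```

Real facility energy would be higher, because this leaves out everything except the accelerators themselves.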
Cooling water
Research
700,000 L direct, 5.4M L total
Li et al. (2023) estimated that training GPT-3 in Microsoft's U.S. data centers could directly consume 700,000 liters of freshwater, and about 5.4 million liters in total once indirect water use is included.
Model storage
Derived
~810 GB weights
A 405B-parameter model stored at FP16 needs about 810 GB just for weights. Replicas, checkpoints, optimizer state, KV cache, and backup copies push real storage and memory requirements much higher.
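The weight-size estimate is parameters times bytes per parameter. The sketch below reproduces the FP16 figure; the FP32 and INT8 rows are added only for comparison and are not figures from this page. The GPU count is a lower bound that assumes 80 GB per H100 and counts weights only, with no room for activations or KV cache.

```python
import math

# Weight-only estimate; ignores optimizer state, KV cache, checkpoints, replicas.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
H100_MEMORY_GB = 80  # per-GPU memory from the NVIDIA H100 spec


def weight_size_gb(num_params: float, precision: str = "fp16") -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus_for_weights(size_gb: float, gpu_memory_gb: float = H100_MEMORY_GB) -> int:
    """Fewest GPUs whose combined memory can hold the weights alone."""
    return math.ceil(size_gb / gpu_memory_gb)


for precision in BYTES_PER_PARAM:
    size = weight_size_gb(405e9, precision)
    print(f"405B params at {precision}: {size:,.0f} GB, "
          f"at least {min_gpus_for_weights(size)} x 80 GB GPUs")
```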
Training budget
Epoch AI
$100M+
Epoch AI reports that the most advanced models now cost hundreds of millions of dollars to train, with about half of that spend on GPUs and the rest on other hardware and energy.
Inference can dominate
HotCarbon
25x training emissions
Chien et al. estimate that a ChatGPT-like service handling 11 million requests per hour could generate 12,800 metric tons of CO2 per year, about 25 times the emissions of training GPT-3 once.
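To get a feel for what these numbers mean per request, the quoted figures can be divided out. The results below are back-calculated from the three numbers cited above and are not values reported in the paper; the sketch assumes a constant request rate over a 365-day year.

```python
# Back-of-envelope figures derived from the numbers quoted above.
REQUESTS_PER_HOUR = 11e6      # request rate assumed in the cited estimate
ANNUAL_EMISSIONS_T = 12_800   # metric tons of CO2 per year
TRAINING_MULTIPLE = 25        # annual inference emissions / one training run

HOURS_PER_YEAR = 24 * 365
GRAMS_PER_TONNE = 1e6


def grams_co2_per_request(annual_t: float, requests_per_hour: float) -> float:
    """Average CO2 per request in grams, assuming a constant request rate."""
    return annual_t * GRAMS_PER_TONNE / (requests_per_hour * HOURS_PER_YEAR)


def implied_training_emissions_t(annual_t: float, multiple: float) -> float:
    """One-off training emissions implied by the quoted multiple, in tonnes."""
    return annual_t / multiple


per_request = grams_co2_per_request(ANNUAL_EMISSIONS_T, REQUESTS_PER_HOUR)
training = implied_training_emissions_t(ANNUAL_EMISSIONS_T, TRAINING_MULTIPLE)
print(f"{per_request:.2f} g CO2 per request")
print(f"{training:.0f} t CO2 implied for one training run")
```

Each request is tiny on its own; the total is large because the request count is in the tens of billions per year.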
Strong takeaway: the interface is simple, but the system behind it is large. Cost does not disappear because the prompt box is clean.
2. The Everyday Cost
Training is expensive, but repeated inference is where everyday usage becomes a recurring bill. That bill scales with requests per day, tokens per request, and the price tier of the model.
Calculator outputs: estimated API cost per day, per month, and per year.
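The recurring bill can be approximated as requests per day times tokens per request times price per token. The sketch below uses the per-million-token prices listed under Sources and Notes. The dictionary keys are plain labels rather than confirmed API model identifiers, and the workload (1,000 requests a day, 1,500 input and 500 output tokens each) and the 30-day month are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Prices as listed in Sources and Notes (checked May 19, 2026).
PRICES = {
    "gpt-5.5": Price(5.00, 30.00),
    "gpt-5.4": Price(2.50, 15.00),
    "gpt-5.4-mini": Price(0.75, 4.50),
}


def daily_cost(model: str, requests_per_day: int,
               input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost per day in USD for a fixed-size request."""
    price = PRICES[model]
    per_request = (input_tokens * price.input_per_m
                   + output_tokens * price.output_per_m) / 1e6
    return requests_per_day * per_request


# Hypothetical workload: 1,000 requests/day, 1,500 input + 500 output tokens.
for model in PRICES:
    day = daily_cost(model, 1_000, 1_500, 500)
    print(f"{model}: ${day:,.2f}/day, ${day * 30:,.2f}/month, ${day * 365:,.2f}/year")
```

With this workload the top tier costs several times what the mini tier does, which is why model choice matters as much as request volume.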
Parameter Estimate for These Three Tiers
- GPT-5.4 mini: exact parameter count is not publicly disclosed. A reasonable teaching estimate is tens of billions of parameters, roughly 20B-80B.
- GPT-5.4: exact parameter count is not publicly disclosed. A reasonable teaching estimate is hundreds of billions of parameters, or a mixture-of-experts (MoE) system with comparable effective scale, roughly 200B+.
- GPT-5.5: exact parameter count is not publicly disclosed. It is included here as the current expensive frontier tier, so treat its scale as larger or more compute-intensive than GPT-5.4, not as a known parameter count.
- These are inferred ranges, not official OpenAI numbers. They are included here to help readers connect model tier with likely memory, storage, and infrastructure scale.
Calculator indicators: visible inference bill, power and data-center pressure, lock-in and operating risk.
A moderate-volume workload already produces a real recurring bill. The hidden part is that the API bill is only one layer; the underlying power, cooling, storage, and capacity footprint is larger still.
3. Why the Risk Stays Hidden
What most users see
- A chat box and a quick answer.
- A simple subscription or token price.
- No direct view of GPU clusters or cooling systems.
- No obvious sign of storage replication, checkpointing, or traffic spikes.
What sits underneath
- High-end accelerator hardware with large power draw.
- Cooling water or equivalent cooling infrastructure.
- Large model weights, caches, checkpoints, and replicas.
- Recurring inference traffic that can outweigh one-time training impacts over time.
Main teaching point
- Training is expensive, but repeated inference at scale can be even more expensive over time.
- Water and power matter because LLM infrastructure is physical, not magical.
- Model size affects not just quality, but memory, storage, cooling, and cost.
- Best practice: use LLMs where they add high value, then reuse outputs locally when possible.
4. Use LLMs Deliberately
A strong LLM workflow is not "never use AI". It is: use it where it helps, protect sensitive data, verify important claims, and control repeated cost.
Check the data
Do not paste private, confidential, medical, financial, legal, or student-identifiable data unless the system is approved for that use.
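One practical habit is to strip obvious identifiers before a prompt leaves your machine. The sketch below is a minimal illustration that masks email addresses and long digit runs such as phone or ID numbers. The patterns and the `redact` helper are hypothetical, will miss names and many other identifiers, and are not a substitute for an approved system.

```python
import re

# Deliberately simple patterns; they catch only obvious cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least 7 digits, spaces, or hyphens, then a final digit.
LONG_NUMBER_RE = re.compile(r"\b\d[\d -]{7,}\d\b")


def redact(text: str) -> str:
    """Replace emails and long digit sequences with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return LONG_NUMBER_RE.sub("[NUMBER]", text)


prompt = "Contact jane.doe@example.edu or 555-123-4567 about ID 12345678901."
print(redact(prompt))
```

Treat a filter like this as a last line of defense; the safer default is still not to paste sensitive material at all.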
Check the answer
LLMs can sound certain when they are wrong. Verify facts, citations, calculations, and code before using the output.
Check the bias
Training data can contain stereotypes or gaps. Review outputs for unfair assumptions, missing perspectives, and cultural context.
Check the cost
Repeated prompts, long context, and long answers multiply token use. Cache, reuse, summarise, and batch where possible.
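Caching is the simplest of these to illustrate: if the same prompt comes in twice, the stored answer can be returned without a second paid call. This is a minimal in-memory sketch; `PromptCache` and `fake_model` are hypothetical names, and `fake_model` stands in for whatever client call your provider actually offers.

```python
import hashlib
from typing import Callable


class PromptCache:
    """Return stored answers for prompts that have already been asked."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def ask(self, prompt: str) -> str:
        # Hash the whitespace-trimmed prompt so keys stay small and uniform.
        key = hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._call_model(prompt)
        self._store[key] = answer
        return answer


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call; replace with your provider's client."""
    return f"answer to: {prompt}"


cache = PromptCache(fake_model)
cache.ask("Summarise chapter 1")
cache.ask("Summarise chapter 1")  # served from the cache, no second call
print(f"hits={cache.hits}, misses={cache.misses}")
```

An exact-match cache only helps with identical prompts, and cached answers can go stale, so it suits stable reference questions better than fast-changing ones.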
Check the dependence
If a workflow only works with one provider or one large model, there is lock-in risk. Keep exports, fallbacks, and human knowledge.
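A fallback can be as simple as trying providers in order until one answers. The sketch below treats each provider as a plain function from prompt to text; `primary` and `secondary` are hypothetical stand-ins, not real client libraries, and a production version would catch narrower error types than `Exception`.

```python
from typing import Callable, Sequence

Provider = Callable[[str], str]


def ask_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order and return the first successful answer."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # sketch only: real code should be narrower
            name = getattr(provider, "__name__", "provider")
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    """Hypothetical provider that is currently unavailable."""
    raise TimeoutError("primary provider unavailable")


def secondary(prompt: str) -> str:
    """Hypothetical backup provider."""
    return f"[secondary] {prompt}"


print(ask_with_fallback("Explain lock-in risk", [primary, secondary]))
```

Keeping the provider behind a small interface like this is what makes it possible to swap it later.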
Check the value
Use the smallest capable model and the shortest useful prompt. The best prompt is not always the biggest prompt.
Sources and Notes
- NVIDIA H100 specs: up to 700 W TDP and 80 GB memory. nvidia.com
- Meta Llama 3.1 405B model card: 30.84M H100 GPU-hours, 15T+ pretraining tokens, 700 W hardware reference. build.nvidia.com
- Epoch AI, June 19 2024: the most advanced models now cost hundreds of millions of dollars to train. epoch.ai
- Li et al., 2023, "Making AI Less 'Thirsty'": GPT-3 training estimated at 700,000 L direct water and ~5.4M L total water footprint; about 500 mL of water for 10-50 prompts depending on where and when inference runs. arxiv.org
- Chien et al., HotCarbon 2023: a ChatGPT-like service at 11M requests/hour estimated at 12.8k metric tons CO2/year and about 25x the emissions of training GPT-3 once. hotcarbon.org
- OpenAI API pricing checked May 19, 2026: GPT-5.5 input $5.00/M tokens, output $30/M; GPT-5.4 input $2.50/M, output $15/M; GPT-5.4 mini input $0.75/M, output $4.50/M. openai.com/api/pricing
- Derived estimates on this page: 21.6 GWh of GPU energy for Llama 3.1 405B training = 30.84M GPU-hours x 0.7 kW; 810 GB of model storage = 405B parameters x 2 bytes per parameter at FP16.
- Parameter estimates for GPT-5.4 mini, GPT-5.4, and GPT-5.5 are not official. They are inferred ranges for teaching purposes, based on public pricing tiers, capability tiering, and current frontier-model scale patterns.