AI feels cheap right now.
You can get state-of-the-art models for a few dollars per million tokens. Even subscriptions are "reasonable" in human terms, somewhere in the $8 to $200 per month range.
So the obvious thought hits.
What if I just ran this on my own?
Not as a research lab. Just as a person, or a small team. Either I buy a GPU, or I rent one from these neo cloud providers and call it a day.
Let's do the math, slowly, like normal humans.
Step 1: The subscription trap (it feels too good)
Let's take a heavy subscription — $200 per month.
Over 6 years, your total cost is:
200 × 12 × 6 = $14,400

So that's $14,400 over 6 years.
Now compare that to buying a data center grade GPU like an NVIDIA H100.
An H100 can be around tens of thousands of dollars depending on the market and availability. People throw around numbers like $30k for a single card in many discussions, but treat that as a "ballpark street price", not a sticker price promise.
So yeah, if you're alone, it's hard to beat the subscription.
But then the second thought hits.
What if I don't buy it alone?
Step 2: The "split it with friends" idea
If 4 people pay that same subscription for 6 years:
4 × 14,400 = $57,600

Now you're at $57,600.
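As a quick sketch of the subscription math above (the $200/month figure is this article's example price, not a quote from any provider):

```python
# Subscription cost over 6 years, solo vs. split among four friends.
# The $200/month price is an assumption from the article's example.
MONTHLY_SUB = 200
YEARS = 6

solo_total = MONTHLY_SUB * 12 * YEARS   # one person's total bill
group_total = 4 * solo_total            # four people paying the same sub

print(f"Solo over {YEARS} years:  ${solo_total:,}")   # $14,400
print(f"Four people combined: ${group_total:,}")      # $57,600
```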
At this point, buying hardware starts to look tempting. Not because it's cheap, but because shared cost changes the game.
So let's test the simplest scenario. One GPU, shared by 4 people.
Step 3: Total Cost of Ownership (TCO) is not just the GPU price
Let's assume you buy an H100 and you already have the rest of the setup (host machine, PSU, etc.).
Now electricity enters the chat.
The H100 PCIe power draw is commonly listed around 350W (0.35 kW).
So if it runs 24/7 for 6 years, first compute hours:
6 × 365 × 24 = 52,560 hours

Energy used:

0.35 kW × 52,560 hours = 18,396 kWh

Now multiply by your electricity rate. Instead of hardcoding one state, here's the clean way:

Electricity Cost = 18,396 × r

Where r is your price per kWh. U.S. Energy Information Administration data puts Michigan's residential electricity price at around 19.66 cents per kWh for July 2024.
So plug r = 0.1966:
18,396 × 0.1966 ≈ $3,617

So electricity alone is roughly $3.6k over 6 years in that example. That's not insane.
But we still forgot something. Cooling.
Step 4: Cooling is the silent multiplier (PUE)
Cooling is why data centers talk about PUE.
PUE basically means: if your GPUs consume 1 unit of power, the facility consumes PUE units total (compute + cooling + overhead).
Total Power Cost = Compute Power Cost × PUE

If you pick a conservative "home or small server" PUE assumption like 1.5:

3,617 × 1.5 ≈ $5,425

So now you're around $5.4k for power + cooling overhead over 6 years (using that Michigan price example).
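Putting Steps 3 and 4 together: a minimal power-cost sketch with the electricity rate as a parameter. The 350 W draw, the Michigan rate, and the 1.5 PUE are the assumptions used above, not measurements:

```python
# Six-year power + cooling cost for one GPU running 24/7.
# power_kw, rate_per_kwh, and pue are assumptions from the worked example.
def six_year_power_cost(power_kw=0.35, rate_per_kwh=0.1966, pue=1.5, years=6):
    hours = years * 365 * 24                  # 52,560 hours
    energy_kwh = power_kw * hours             # 18,396 kWh at the GPU
    electricity = energy_kwh * rate_per_kwh   # cost of compute power alone
    return electricity * pue                  # facility total, cooling included

print(f"${six_year_power_cost():,.0f}")       # roughly $5.4k with these inputs
```

Set `pue=1.0` to recover the electricity-only number from Step 3.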
GPU cost is huge upfront. Electricity is not the main killer for a single card.
But now comes the real killer. User experience.
Step 5: Can one GPU actually run the models you want?
Here's the part people skip.
Running "a model" is not the same as running a frontier grade experience.
Big models need VRAM for weights (the model itself), KV cache (context window memory), and activations and overhead.
So even if a model is a Mixture of Experts and only activates a small slice of its weights per token, the full set of expert weights typically still has to sit in memory, because different tokens route to different experts.
That's why you'll see cases where a truly massive model needs multiple H100s just to exist in memory.
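A back-of-the-envelope VRAM estimate makes the point. The model size and precision below are hypothetical illustrations, not any specific model's specs:

```python
import math

H100_VRAM_GB = 80  # per-card memory on an H100

def cards_for_weights(params_billions, bytes_per_param=2):
    """Minimum H100s needed just to hold the weights.

    Billions of params × bytes per param ≈ gigabytes of weights,
    ignoring KV cache and activation overhead, which only push
    the real requirement higher.
    """
    weights_gb = params_billions * bytes_per_param
    return weights_gb, math.ceil(weights_gb / H100_VRAM_GB)

# hypothetical 405B-parameter model at 16-bit precision (2 bytes/param)
gb, cards = cards_for_weights(405)
print(f"{gb} GB of weights -> at least {cards} H100s, before KV cache")
```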
And once you go from "one GPU" to "eight GPUs", you're no longer doing a hobby project. You're doing a mini data center.
Step 6: The DGX temptation and the "median house" moment
NVIDIA DGX H100 systems bundle 8x H100 class GPUs into one box.
People often cite DGX class power requirements in the multi kilowatt range for 8 GPU systems, and summaries commonly land around 8 to 10 kW for typical 8 GPU deployments.
So even before electricity, you're in "do we even have the infrastructure to run this safely?" territory.
And price wise, DGX H100 class systems are typically quoted in the hundreds of thousands of dollars by vendors.
So if the plan is "Let's just buy what the big labs have" — you quickly end up at: this costs like a house.
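For scale, apply the same power math from earlier to an 8-GPU box. The 9 kW figure is the midpoint of the range cited above, and the rate and PUE are carried over from the Michigan example:

```python
# Six-year power + cooling bill for an 8-GPU, DGX-class system,
# reusing the earlier assumptions (Michigan rate, PUE of 1.5).
SYSTEM_KW = 9.0          # midpoint of the 8-10 kW range cited above
RATE = 0.1966            # $/kWh, Michigan residential example
PUE = 1.5
HOURS = 6 * 365 * 24     # 52,560 hours

total = SYSTEM_KW * HOURS * RATE * PUE
print(f"${total:,.0f} in power + cooling over 6 years")  # six figures
```

So the electricity alone lands in six-figure territory over 6 years, before you've paid for the box itself.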
Which brings us to the uncomfortable truth.
Step 7: So why does the subscription model work at all?
Because frontier labs don't run one GPU for one user.
They run massive fleets with:

- batching (many users packed together efficiently)
- utilization optimization (keeping GPUs busy)
- custom kernels
- quantization
- speculative decoding
- datacenter cooling optimized at scale
- negotiated power rates and long-term contracts
- constant infra tuning
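A toy model makes the batching point concrete. The GPU costs the same per hour whether it serves one user or fifty, so throughput divides directly into cost per token. All numbers here are made-up illustrations, not benchmarks:

```python
# Toy cost-per-token model: fixed hourly GPU cost divided by throughput.
# The $2.50/hour figure and both throughput numbers are hypothetical.
GPU_HOURLY_COST = 2.50   # illustrative rental-equivalent $/hour

def cost_per_million_tokens(tokens_per_second):
    tokens_per_hour = tokens_per_second * 3600
    return GPU_HOURLY_COST / tokens_per_hour * 1_000_000

print(f"1 user  @ 50 tok/s:   ${cost_per_million_tokens(50):.2f} per M tokens")
print(f"Batched @ 2000 tok/s: ${cost_per_million_tokens(2000):.2f} per M tokens")
```

Same GPU, same electricity: 40× the throughput means 1/40th the price per token. That's the lever the labs pull.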
Basically, they're not selling you "a GPU".
They're selling you a well-oiled machine that turns electricity into text at industrial efficiency.
That's why pricing can look like magic. Not because the physics vanished. Because scale bends the economics.
Bigger picture: "Cheap AI" is still a power and hardware story
AI looks cheap at the surface because the interface price is low.
But under the hood, the bill exists somewhere: power plants, grids, cooling loops, GPU supply chains, data centers, networking.
So the real question is not: "Is AI cheap?"
The real question is: who is paying the hidden bill, and what happens when demand keeps rising?
And that's why this whole topic matters.
Because the future of AI is not just about better models. It's also about energy, infrastructure, and who can scale both at the same time.
