Published in AI & Technology
Why Flat-Rate AI Was Always Going to Break (Part 2 of 3)
The Math, Briefly
In Part 1 of this series I walked through what happened to flat-rate AI subscriptions in March and April. GitHub Copilot is moving to credit-based billing on June 1. Anthropic spent six weeks adjusting Claude limits, blocking third-party harnesses, briefly dropping Claude Code from the Pro plan, and then publishing a postmortem after Max users burned through weekly quotas in two days. Pricing emails started landing.
This piece is about why none of that should be a surprise.
The temptation is to read the events of April as a series of operational decisions. A throttle here, a limit there, a clumsy A/B test on the pricing page. Those are the things you can see. The thing you cannot see in any single month, but which is doing all of the work, is the gap between what frontier AI inference costs to deliver and what flat-rate subscribers pay for it. That gap is not a small one. It is one of the biggest gaps in any consumer product market today.
Once you put a few numbers next to each other, the surprise stops being that flat-rate plans are cracking. The surprise is that they held together this long.
The Spreadsheet Does Not Close
Let me start with the most cited single line item. According to the Wikipedia entry on the AI bubble (yes, there is an entry!), OpenAI's inference spend on Microsoft Azure was about $3.76 billion across the whole of 2024. In the first half of 2025 alone, it was $5.02 billion. Inference, not training, not capex, not salaries. Just the cost of answering the prompts that paying users typed into ChatGPT.
If you straight-line that, OpenAI's 2025 inference bill is north of ten billion dollars. The same source notes that Sora, OpenAI's flagship video model, was discontinued in late March 2026, six months after launch. The reason was not that nobody used it. The reason was that the unit economics did not work. The cost per video did not converge with what users were willing to pay, and the company decided to stop subsidising it.
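The straight-line projection is simple enough to check. A quick sketch, using only the figures quoted above (the variable names are mine, for illustration):

```python
# Straight-line projection of OpenAI's 2025 inference spend,
# using the figures quoted above (billions of USD).
h1_2025_inference = 5.02            # reported H1 2025 Azure inference spend
full_year_estimate = h1_2025_inference * 2  # naive doubling of the first half

print(f"Projected 2025 inference bill: ${full_year_estimate:.2f}B")
# Even the naive doubling already clears ten billion dollars.
```

The real number could be higher still if usage kept growing through the second half, which is the direction every other figure in this piece points.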
Now widen the frame. Derek Thompson's piece on how the AI bubble pops (and his interview with Paul Kedrosky) puts the projected US AI capital expenditure across 2026 and 2027 at over $500 billion combined. That is data centres, chips, power, fibre, cooling. Capital that has to earn a return at some point. On the other side of the ledger, total US consumer spending on AI products is somewhere around $12 billion a year. The two numbers are not in the same room. They are barely on the same continent.
Half a trillion dollars of capex on one side. Twelve billion dollars of recurring consumer spending on the other. That is the gap. Anybody can do the arithmetic. There are basically two ways for those two numbers to meet. Either consumer spending grows by something like 40x in two years, which is not a thing that happens to consumer markets, or capital owners eventually demand a price-per-unit that lets the spreadsheet close. The repricing of flat-rate subscriptions is the second option, in slow motion.
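The gap arithmetic is equally short. A minimal sketch of the first option, using the round figures from this section:

```python
# The two sides of the ledger, in billions of USD, using the round
# figures quoted in this section.
projected_capex = 500    # US AI capex, 2026 and 2027 combined
consumer_spend = 12      # annual US consumer spending on AI products

# Option 1: consumer spending grows until it matches the capex side.
required_multiple = projected_capex / consumer_spend

print(f"Consumer spending would need roughly a {required_multiple:.0f}x increase")
# That is the "something like 40x in two years" in the text.
```

No consumer market has ever grown at that rate on that timescale, which is why option two, repricing per unit, is the one actually playing out.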
This is also where the historical precedent that I already wrote about, DeepSeek's January 2025 release, still earns its keep. DeepSeek's R1 dropped training cost by something like 95% relative to the Western frontier and the market noticed instantly. NVIDIA fell 17% in a day. The lesson from that day was that the conviction "you have to spend tens of billions to train a competitive model" had a counterexample, and the moment one counterexample existed the whole capex narrative had a question mark on it. That question mark has not gone away. It has only grown larger.
The Cost Curve Is Moving The Wrong Way
There is a second piece of this that is less obvious from the outside but very obvious from inside the developer experience. The capabilities everybody now wants from AI, the ones the marketing decks promise, are the most expensive ones to deliver per query.
Reinforcement learning from human feedback was already raising training cost. Then came reasoning models, the ones that "think" before answering. Now agentic loops, where the model takes tens or hundreds of steps, calls tools, reads files, and re-reads its own outputs. Then "extended thinking," which is exactly what it sounds like. Each of these things makes the model better. Each of them generates more tokens per single user request than the prior generation did, in some cases by an order of magnitude.
In Part 1 I noted that Anthropic's Opus 4.7 makes Claude Code sessions roughly three times longer than Opus 4.6 did. That is not a bug. That is the product working as intended. The model is allowed to think more, so it does, which allows it to produce a better answer, and that answer costs more to produce. The user, on a flat plan, is paying the same. The user, in fact, is happier, because the answer is better. From the user's seat, this is the platform improving. From the provider's seat, every monthly invoice from the cloud is bigger than the one before, while the subscription line item is unchanged.
It is worth pausing here on the capability ceiling argument. Yann LeCun has been arguing for some time that current LLM scaling is hitting fundamental limits. Whether you agree with him or not, the consequence of his thesis on AI economics is the part that matters for this piece. If each marginal capability gain is going to cost disproportionately more compute, the spreadsheet just gets worse, not better. The "we'll fix margins by getting more efficient" answer becomes a harder one to sell to a CFO. Some of the efficiency wins are real. Distillation, smaller specialised models, mixture-of-experts. None of them are big enough by themselves to close the half-trillion-dollar gap.
So the cost curve is moving the wrong way at the same time as the demand curve is doing exactly what the providers wanted it to do. Both can be true and are true.
The Honest Counter-Argument
I want to be fair to the people pushing back on bubble framing, because their argument is not silly. The clearest version I have seen comes from Vlad Galabov at Omdia.
His point, summarised in the Broadband Breakfast piece on the AI compute shortage, is that the bubble narrative confuses stock-market behaviour with real demand. Token usage went from roughly 6 million tokens per minute in October 2025 to roughly 15 billion per minute by March 2026. GPU prices are up 48%, not down, despite the supposed "shortage of buyers" that bubble theory predicts.
Galabov's reading is that demand is real and the constraint is supply, not appetite. I think he is right about the first half of that.
What I would add is that being right about real demand does not refute the bubble thesis. It strengthens the part of it that matters most for this series. The argument was never that nobody wants AI. The argument is that flat-rate consumer pricing was set at a number that does not reflect the actual cost of serving real demand at scale. If demand is real, and supply is constrained, and GPU prices are climbing, then the providers carrying the cost of that demand on a fixed monthly fee are in a worse position, not a better one. Galabov's data does not save flat-rate. It explains why flat-rate is being repriced.
The other piece worth holding onto is the consumer-side number. As I noted in my June 2024 writeup of how few people actually use AI tools daily, the gap between "real demand from a small set of heavy users" and "broad consumer adoption" is still wide. The 15 billion tokens per minute number is real, but it is concentrated. Heavy power users on Claude Code, agentic coding workflows, large enterprise pilots. The sustained $20-a-month-from-everyone wave that would close the half-trillion-dollar capex gap is not where the curve is right now.
Loss Leaders, Quietly Repriced
So let's try to put the pieces together.
Capex on one side dwarfs consumer revenue on the other by more than an order of magnitude. The cost-per-query curve is moving up, not down, because the features users actually want are the expensive ones. Real demand exists, but it is concentrated in heavy users whose usage destroys the assumptions behind flat-rate pricing. The historical precedent of DeepSeek shows that capex-heavy strategies are not safe even when demand is real. And the entire flat-rate AI subscription category was, transparently, a user-acquisition loss leader rather than a margin-positive product.
That is the math the events of March and April were the surface of. Not a series of separate operational decisions, but one repricing, in instalments, of a category of products that was always going to need repricing.
I do not think this is sad, nor do I think it is even surprising. I think it is the part of any technology cycle where the price tag finally catches up to the workload, and where the people who built around the temporary price tag have to redesign their stack around the permanent one. The companies that priced flat-rate frontier subscriptions at $20 or $200 a month did everyone a favour by pricing them too low for two years. They got a generation of developers fluent in agentic workflows. Now they need that generation to pay something closer to what those workflows actually cost.
The interesting question, the one I want to spend the third post on, is not whether this is a bubble or not. It is what small businesses, the audience I intend to write for, should actually do about it. The answer is more optimistic than the framing of this post suggests, because in parallel with all of the above something genuinely good has happened. A real open-weight tier matured underneath the frontier in 2025 and the first months of 2026. It is now mature enough that the routing decision, not the subscription decision, is the one that matters.
That is the next post.
Until then I would just say this. If you are running a small business that is building on top of AI tooling, do not panic at the events of April. They are not the end of anything. They are the beginning of usage-based billing showing up in your invoices, and the math behind that change is older and quieter than any of the headlines. The companies announcing the changes are responding to the same arithmetic anybody could do with a calculator. Plan accordingly. The floor is not falling out. The floor is being repriced.
I am still on the ground. The view from down here is more interesting than the one from orbit, and the bubble framing I used three weeks ago holds up better in the pricing email than in the satellite filing. That is fine. It just means the work is closer to the desk than I expected.