Your AI Bill Is Not a Tool Problem

Table of Contents

I keep hearing a similar complaint: “Our AI tools bill went from $1k to $10k in a month. We don’t know where it’s coming from.“

Most of them respond the same way. They start shopping for a different tool. They evaluate Cursor vs Claude Code vs Copilot. They build comparison spreadsheets. They run pilots.

They say “Cursor is too expensive” or “Copilot is 10x cheaper.”

They are looking at the wrong problem.

The 97/3 Math

Every time I pull the actual billing data, the same thing comes back: the model is 97% of the bill.

The platform (Cursor, Claude Code, whichever wrapper you picked) is just 3%.

On a typical coding session, the platform surcharge is about 3 cents. The model itself -Opus at $5 in/$25 out per million tokens- runs about a dollar.

Switch tools, and you shave the three cents. The dollar stays.

Your AI bill isn’t decided by the tool you pick. It’s decided by which model your developers default to, and whether anyone is watching when that default drifts.

How $1,000 Becomes $10,000

I watched this play out with a multi-seat team. They scaled from a $1k/month subscription to a $10k bill in one month. Nobody made a bad decision.

They started on Sonnet. Calls cost about 33 cents. Predictable.

Then someone tried Opus for a hard refactor. The result was better. Word spread. “Use Opus for tricky stuff.” Then, for “just to be sure.”

Within a month, the whole team was on Opus – about three times the per-call cost. Most of it exceeded the included quota and was billed on demand. The month closed at $10k.

(For a more in-depth analysis, see The Evidence-Based AI Stack for Large Codebases.)

The Model Drift Tax

This is what I call the Model Drift Tax: what you pay when developers self-select for the best model, and nobody locks the default.

It’s not a bad decision problem. Every individual choice was rational. Of course, you’d reach for the best model on a hard problem. The drift isn’t a bad call. It’s the absence of a default.

As I wrote here, effort never decreases – it migrates.

A team without a model policy has every developer making a model decision for every prompt. That cognitive load doesn’t disappear. It accumulates on the invoice.

The choice of default model should be strategic.

What I’d Actually Change

The first thing I’d change is the model your team starts on.

Set Sonnet (or your stack’s equivalent) as the default. The SWE-bench quality gap between Sonnet 4.6 and Opus 4.5 is 1.3 percentage points – invisible on feature work, integrations, tests, and reviews.

Opus stays available. It just becomes a deliberate escalation, not a passive habit.

The point isn’t “Sonnet is good enough.” It’s the starting model that does most of the work. If you open on Opus, every routine call costs three times what it should.

The second thing is billing. Move Opus-level work to a flat rate.

Anthropic’s Claude Max plans are $100/month, flat. The same Opus usage at per-token rates inside Cursor blows past the $70 API budget on Pro Plus by month-end.

Flat-rate isn’t just a cap – it’s a growth subsidy.

Anthropic ran the Uber playbook: price below cost, drive developer adoption, build lock-in. The team running long agentic pipelines no longer gets the scary email. For now, at least.

Tools Aren’t the Game

The tools are commodity wrappers around the same frontier models: Anthropic’s, OpenAI’s, Google’s.

The differentiator isn’t which IDE you picked. It’s the billing model you bought, and what your team is instructed to default to.

The team I watched eventually capped costs by flipping the default back to Sonnet and moving Opus to Claude Max.

Same models. Same quality. Roughly half the spend.

The change wasn’t technological. It was governance.

You don’t need to switch tools. You need to govern the one you have.

Author
Recent Posts

Leo Celis

Founder & CEO at InTheValley

I build AI systems that run startups and write about what I find.