Frontier inference at $1.50/$9: the IDP reset

The number that moved — and the curve it sits on

The headline is easy to misread as one model launch. The curve underneath is the story. Frontier-grade intelligence (the tier that clears the production bar for structured extraction, reasoning over long context and multi-pass validation) sat at roughly $3.50 per million input tokens and $15–18 per million output tokens through most of 2025. In one week of May 2026, the same tier landed at $1.50 input and $9 output on the leading new release, with a latency profile four times faster than its peers. Two of the other frontier providers had already moved to within 10–25% of that price before the announcement; the rest followed within ten days. The sequence is what matters: this is not one vendor cutting price; it is the floor under the category recalibrating.

The compute-side reasons are covered in the compute economics piece: funding cycles, post-IPO unit-pricing pressure on the labs, open-weight alternatives that are now production-ready, and the inference-stack efficiency gains that compounded across 2025. This post is not about why prices fell. It is about what an IDP vendor and an IDP buyer do in the 90 days after they fall, before the next tier re-prices and the cycle repeats. The first mistake to avoid is treating the May 2026 number as the new constant. It is only the latest step in a stairway whose next step is 6–12 months away.

Inference is no longer a line item. It is a moving floor under your unit economics. The vendor whose pricing engineering reacts at the cadence of the provider's price card is the vendor whose margin survives the next two cuts.

The math at the document level

The unit a CFO reads is cost per document processed, not cost per million tokens. The translation between the two is where vendors win or lose without telling anyone. A typical structured-extraction call on the 2025 frontier stack carried roughly:

Input — 8–14k tokens (the document text or a vision-pre-processed serialization, plus the schema, the system prompt and the few-shot examples).
Output — 1.2–2.5k tokens (the structured JSON, plus a short rationale per field where the agent had to reason).
Auxiliary calls — 1–2 small validator calls and an occasional re-prompt on low-confidence fields.
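Because the translation is plain arithmetic, a buyer can reproduce it from the public price card. A minimal Python sketch, assuming list prices with no caching or batch discount; `TierPrice`, `CallProfile` and `document_cost` are illustrative names, and the 10k-input / 2k-output call is a hypothetical point inside the ranges above, not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """Public list price for one model tier, in USD per million tokens."""
    input_per_m: float
    output_per_m: float


@dataclass(frozen=True)
class CallProfile:
    """Token counts for a single model call."""
    input_tokens: int
    output_tokens: int


def call_cost(call: CallProfile, price: TierPrice) -> float:
    """USD cost of one call at list price (no cache or batch discount)."""
    return (call.input_tokens * price.input_per_m
            + call.output_tokens * price.output_per_m) / 1_000_000


def document_cost(calls: list[CallProfile], price: TierPrice) -> float:
    """Sum the extraction call plus any validator or re-prompt calls."""
    return sum(call_cost(c, price) for c in calls)


TIER_2025 = TierPrice(input_per_m=3.50, output_per_m=15.00)
TIER_2026 = TierPrice(input_per_m=1.50, output_per_m=9.00)

# Hypothetical extraction call, inside the input/output ranges listed above.
main_call = CallProfile(input_tokens=10_000, output_tokens=2_000)

before = document_cost([main_call], TIER_2025)
after = document_cost([main_call], TIER_2026)
print(f"2025: ${before:.3f}  May 2026: ${after:.3f}")
```

For this one call at list price the cost moves from about $0.065 to $0.033. Validator calls and re-prompts enter as extra `CallProfile` entries in the list passed to `document_cost`.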

At the 2025 tier, that call profile maps to roughly $0.16–0.22 per document on inference alone, depending on document length and validator policy. The same call at the May 2026 tier maps to $0.04–0.06, a reduction of roughly 65–75% in the inference line, with no change to schema, prompt or validator policy. Three secondary effects compound below that headline:

Lever	2025 stack	2026 stack	Net effect on cost per document
Base tier price	$3.50 / $15 per 1M tokens (input / output)	$1.50 / $9 per 1M tokens	−65 to −75%
Prompt caching	Partial discounts on the shared prefix.	Up to 90% off the cached prefix; cached prefixes now reach 60–80% of the call.	Another −15 to −25%
Batch / async tier	~50% off list, with high-volume eligibility.	Same ~50% off, applied to the new lower list — and async windows shortened to a few hours, which expands the eligible workload.	Another −20 to −35% on eligible volume
Smaller-model fallback	Eligible only for the simplest fields.	Smaller siblings of the new tier now clear the bar on roughly half the field set without quality regression.	Another −10 to −20% on the routed share

The stacked effect, applied honestly, lands a typical production IDP pipeline at one-quarter to one-third of its 2025 inference cost per document, without any change in schema accuracy or human-review rate. The vendor who passes none of that through is the vendor whose customer reads the public price card and opens the renewal at a 25–40% asking discount — and is right to.
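The stacking in the table can be checked in a few lines. A minimal sketch, assuming the levers compose multiplicatively (each one acts on the cost the previous ones left) and that a lever limited to part of the volume cuts cost in proportion to that share; the 40% batch-eligible and 50% routed shares are hypothetical placeholders, not figures from this piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lever:
    name: str
    cut: float    # fractional cost reduction where the lever applies (0.20 means -20%)
    share: float  # fraction of cost-per-document the lever applies to (1.0 means all)


def remaining_fraction(levers: list[Lever]) -> float:
    """Fraction of the 2025 cost-per-document left after applying levers in sequence."""
    remaining = 1.0
    for lever in levers:
        # A lever on part of the volume gives a blended cut of cut * share.
        remaining *= 1.0 - lever.cut * lever.share
    return remaining


# Low end of each range in the table; the two shares below 1.0 are hypothetical.
low_end = [
    Lever("base tier price", cut=0.65, share=1.0),
    Lever("prompt caching", cut=0.15, share=1.0),
    Lever("batch / async", cut=0.20, share=0.4),
    Lever("smaller-model fallback", cut=0.10, share=0.5),
]
print(f"remaining: {remaining_fraction(low_end):.0%} of the 2025 cost per document")
```

With the low end of every range this lands at about 26% of the 2025 baseline, roughly the one-quarter mark; the high end of every range, with the same hypothetical shares, lands near 15%.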

The three vendor effects that hit in the first 90 days

The reset does not arrive at the vendor's door as a memo. It arrives as three separate conversations, all uncomfortable, all on the customer's side first.

1. The sophisticated buyer reopens pricing at renewal

Within a quarter of the public price cut, the buyers who run their own FinOps on AI infrastructure (banks, large accounting firms, insurance carriers, mid-cap SaaS that embeds IDP) open the renewal conversation with the same sentence. The vendor's backbone cost fell by 65–75%; why is the per-page line on the renewal proposal up 12% on inflation and renewal-uplift clauses? The vendor who has not prepared an answer reads that sentence as procurement theatre. It is not theatre. The buyer is reading the same provider's public price card the vendor is reading, and the buyer has a spreadsheet that maps tokens-per-document to dollars-per-document with their actual volume in the denominator.

The vendor who survives the conversation has done three things before the renewal opens: refactored the call sizing, published a new internal cost-per-document baseline, and chosen a pricing posture — pass-through, partial pass-through with a margin band, or held price with a quality-and-feature story. Each posture is defensible; none of them is "no change". The vendor who arrives at renewal with the old price and no posture loses the deal or loses the margin, and most of the time loses both.

2. Premium features become commodity in six months

A category of feature that was defensible in 2025 because its inference cost was prohibitive becomes baseline in 2026 because the same inference is one-quarter the price. Three classes of feature sit squarely in this window:

Long-reasoning extraction — multi-step rationale per field, chain-of-thought on contract clauses, multi-page coreference resolution. Priced as a premium SKU at $0.40–0.80 per document in 2025; lands at $0.10–0.20 in 2026 and is offered at parity by competitors who never could ship it before.
Multi-pass auto-validation — running a second model over the first model's output, cross-checking against a rules engine, re-prompting on disagreement. Premium SKU at $0.25–0.40; lands at $0.06–0.12 and stops being a SKU at all.
Complex structured generation — write-back to e-forms, regulator-grade XML, deeply nested schemas with conditional fields. Premium SKU at $0.30–0.60; lands at $0.08–0.18 and joins the standard tier.

The product strategy implication is uncomfortable. A vendor's roadmap that bet on "premium reasoning SKU as a margin engine" has roughly two quarters either to absorb the SKU into the standard tier (and find new margin elsewhere) or to watch a competitor who refactored ship the same feature at the standard price and take the deal. The reasoning-models piece described the same dynamic last year; the 2026 update is that the cost gap between reasoning and standard collapsed faster than most product roadmaps assumed.

3. FinOps stops monitoring and starts re-routing

The third effect is internal. The FinOps function that spent 2024–2025 watching the inference bill, building dashboards and writing quarterly cost-attribution reports becomes a different function in 2026. It still does the reporting; the new job is to renegotiate the model mix the moment a new tier ships. The decision the function used to make once a year — which provider, which model, which contract — is now a decision it makes every time the provider's price card updates, which is roughly every 3–4 months. The provider due-diligence piece covered the contractual side of running a multi-vendor portfolio; the FinOps side is the operational layer that turns the portfolio into a routing decision the system makes by itself, not a procurement decision the team makes by hand.

The router that reacts in hours, not sprints

The technical artefact that separates a vendor who captured the May 2026 drop from a vendor who watched it happen is the model router. Not the router in the sense of "a config file with the model name". The router in the sense of a small service inside the pipeline that decides, per call, which model the call goes to, with policy that updates without a deploy. The shape that holds:

The policy is data, not code

The mapping from a call's signature — document class, field, expected reasoning depth, tenant SLA — to a model choice lives in a policy table, not in if-statements buried in the inference layer. When a new tier ships, the team updates a row in the table and the routing changes on the next call. The vendor whose routing lives in code has to ship a release to capture a price drop. The vendor whose routing lives in data captures it in an afternoon.
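A minimal sketch of "policy as data", assuming a JSON policy keyed on document class and field only (reasoning depth and tenant SLA would be extra columns matched the same way). The model names, evaluation-set IDs and thresholds are hypothetical, and most-specific-match-wins is one possible precedence rule, not a prescribed one.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    doc_class: str          # "*" matches any document class
    field: str              # "*" matches any field
    model: str
    eval_set: str
    quality_threshold: float


# Hypothetical policy: lives in data (a table, a manifest, here a JSON string).
POLICY_JSON = """
[
  {"doc_class": "invoice", "field": "*", "model": "small-sibling-2026",
   "eval_set": "invoice-150", "quality_threshold": 0.97},
  {"doc_class": "invoice", "field": "line_items", "model": "frontier-2026",
   "eval_set": "invoice-lines-80", "quality_threshold": 0.95},
  {"doc_class": "*", "field": "*", "model": "frontier-2026",
   "eval_set": "generic-200", "quality_threshold": 0.95}
]
"""


def load_policy(raw: str) -> list[Route]:
    return [Route(**row) for row in json.loads(raw)]


def specificity(route: Route) -> int:
    # Exact matches beat wildcards; an exact doc_class outranks an exact field.
    return 2 * (route.doc_class != "*") + (route.field != "*")


def choose_route(policy: list[Route], doc_class: str, field: str) -> Route:
    """Return the most specific route matching this call's signature."""
    matches = [r for r in policy
               if r.doc_class in (doc_class, "*") and r.field in (field, "*")]
    if not matches:
        raise LookupError(f"no route for {doc_class}/{field}")
    return max(matches, key=specificity)


policy = load_policy(POLICY_JSON)
print(choose_route(policy, "invoice", "line_items").model)  # frontier-2026
print(choose_route(policy, "invoice", "total").model)       # small-sibling-2026
print(choose_route(policy, "contract", "parties").model)    # frontier-2026
```

Moving `invoice/*` to a new tier is an edit to one row of `POLICY_JSON`; `choose_route` does not change.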

Each route carries a quality SLO

A routing change without an evaluation hook is a quality regression waiting to be discovered by a customer. Every route in the table names a small evaluation set — 50–200 held-out documents, scored by the existing harness — and a quality threshold the route must clear before it goes live. The router runs the evaluation on every policy change, in the same pipeline, with the same scoring function the production system uses. A route that drops below the threshold is rolled back automatically. The vendor pays the evaluation cost — small, predictable — and gets the right to migrate to a cheaper tier within a day of the tier shipping.
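The gate can be sketched as two functions: one that refuses a policy change below threshold, and one that re-scores the live route and reverts it. A minimal sketch, assuming the route's SLO is a mean score over its held-out set; `score` stands in for the existing production harness, and the accuracies in `FAKE_ACCURACY` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# score(model, document) -> quality in [0, 1]; a stand-in for the production harness.
Scorer = Callable[[str, dict], float]


@dataclass
class RouteState:
    active_model: str
    previous_model: Optional[str] = None


def mean_score(model: str, eval_docs: Sequence[dict], score: Scorer) -> float:
    return sum(score(model, doc) for doc in eval_docs) / len(eval_docs)


def gated_policy_change(state: RouteState, candidate: str, eval_docs: Sequence[dict],
                        score: Scorer, threshold: float) -> bool:
    """Promote `candidate` only if it clears the route's quality threshold."""
    if mean_score(candidate, eval_docs, score) < threshold:
        return False  # the route stays on its current model
    state.previous_model, state.active_model = state.active_model, candidate
    return True


def check_live_route(state: RouteState, eval_docs: Sequence[dict],
                     score: Scorer, threshold: float) -> bool:
    """Re-score the live model and roll back automatically if it fell below threshold."""
    if mean_score(state.active_model, eval_docs, score) >= threshold:
        return True
    if state.previous_model is not None:
        state.active_model, state.previous_model = state.previous_model, None
    return False


# Invented accuracies standing in for real evaluation results.
FAKE_ACCURACY = {"frontier-2025": 0.97, "small-sibling-2026": 0.96, "tiny-2026": 0.88}


def fake_score(model: str, doc: dict) -> float:
    return FAKE_ACCURACY[model]


held_out = [{"doc_id": i} for i in range(100)]
route = RouteState(active_model="frontier-2025")
print(gated_policy_change(route, "tiny-2026", held_out, fake_score, 0.95))           # False
print(gated_policy_change(route, "small-sibling-2026", held_out, fake_score, 0.95))  # True
print(route.active_model)  # small-sibling-2026
```

Here the cheapest candidate is rejected and the small sibling is promoted; if a later `check_live_route` run scored it below 0.95, the route would revert to `frontier-2025`.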

Cost telemetry per route, not per service

The dashboards the FinOps function used in 2025 reported cost per service or cost per tenant. The 2026 shape adds a third axis: cost per route. The same tenant on the same service can run on three or four routes simultaneously, and the unit margin of the contract depends on the mix. The dashboard that shows route mix and the dollar-per-document consequence is what lets the FinOps function answer the next question — which is "show me the unit margin on the top ten contracts after the price card moved" — without a week of spreadsheet work.
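A minimal sketch of per-route telemetry, assuming each call is logged with tenant, route and document ID; the record shape and the route naming are hypothetical. Cost here is inference only, so the margin is an inference-only gross margin.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    tenant: str
    route: str        # hypothetical naming: "<doc_class>/<field>-><model>"
    document_id: str
    cost_usd: float   # inference cost of this call


def route_view(records: list[CallRecord]) -> dict:
    """Per (tenant, route): share of the tenant's spend and cost per document touched."""
    cost, docs, tenant_cost = defaultdict(float), defaultdict(set), defaultdict(float)
    for r in records:
        cost[(r.tenant, r.route)] += r.cost_usd
        docs[(r.tenant, r.route)].add(r.document_id)
        tenant_cost[r.tenant] += r.cost_usd
    return {key: {"spend_share": cost[key] / tenant_cost[key[0]],
                  "cost_per_doc": cost[key] / len(docs[key])}
            for key in cost}


def tenant_unit_margin(records: list[CallRecord], price_per_doc: dict) -> dict:
    """Inference-only gross margin per tenant: (price - cost per document) / price."""
    cost, docs = defaultdict(float), defaultdict(set)
    for r in records:
        cost[r.tenant] += r.cost_usd
        docs[r.tenant].add(r.document_id)
    return {t: (price_per_doc[t] - cost[t] / len(docs[t])) / price_per_doc[t] for t in cost}


records = [
    CallRecord("bank-a", "invoice/*->small-sibling-2026", "d1", 0.006),
    CallRecord("bank-a", "invoice/line_items->frontier-2026", "d1", 0.030),
    CallRecord("bank-a", "invoice/*->small-sibling-2026", "d2", 0.006),
    CallRecord("bank-a", "invoice/line_items->frontier-2026", "d2", 0.030),
]
for (tenant, route), row in route_view(records).items():
    print(tenant, route, f"spend {row['spend_share']:.0%}", f"${row['cost_per_doc']:.3f}/doc")
margin = tenant_unit_margin(records, {"bank-a": 0.10})["bank-a"]
print(f"bank-a unit margin: {margin:.0%}")
```

In this toy log the frontier route carries about five-sixths of the tenant's spend, which is where the next migration would pay off.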

A migration playbook that does not require a meeting

When the next tier ships, the migration is a four-step run: add the new route to the table, run the evaluation harness, promote the route on a percentage of traffic for 24 hours, then promote the rest. None of the four steps requires a sprint planning session. The vendor whose migration requires a sprint planning session is the vendor whose unit margin lags the market by a quarter, every quarter, by construction. The tracing infrastructure that makes the migration safe sits in the tracing piece; the routing change is the surface, and the trace is what lets a human read what changed if a customer asks.
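For the percentage-of-traffic step, each document needs a stable assignment to the old or the new route. A minimal sketch, assuming hash-based bucketing on document ID (a common canary technique, not one this piece prescribes); the route and model names are hypothetical.

```python
import hashlib


def in_canary(document_id: str, route_name: str, percent: float) -> bool:
    """Deterministically place a document in (or out of) a route's canary slice.

    Hashing route and document together keeps each document on the same side for
    the whole 24-hour window and gives every route an independent split.
    """
    digest = hashlib.sha256(f"{route_name}:{document_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, in basis points
    return bucket < round(percent * 100)


def pick_model(document_id: str, route_name: str, current: str,
               candidate: str, percent: float) -> str:
    """Send roughly `percent`% of the route's documents to the candidate model."""
    return candidate if in_canary(document_id, route_name, percent) else current


# Canary at 10%; promoting the rest is the same call with percent=100.
sample = [pick_model(f"doc-{i}", "invoice/total", "frontier-2025", "small-sibling-2026", 10)
          for i in range(10_000)]
print(f"canary share: {sample.count('small-sibling-2026') / len(sample):.1%}")
```

Hashing on the document rather than sampling per request means a re-processed document lands on the same model, which keeps the 24-hour comparison clean.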

The contract clause that survives the next cut

The conversation the vendor most wants to avoid is the one where a customer reads the public price card mid-contract and demands a re-quote. The conversation is unavoidable if the contract does not name the dynamic. The clause that survives diligence on both sides is short, and we now see it in roughly one in three enterprise IDP contracts above $250k ARR:

Price pass-through with a margin band. The contract names a reference model tier, a per-document baseline price at that tier, a defined margin band the vendor keeps, and a re-quote cadence (usually quarterly) that triggers when the underlying tier price moves by more than a stated threshold. The buyer accepts the band as the vendor's right to a fair margin; the vendor accepts the cadence as the cost of keeping the customer past the next price card. The clause has three operational consequences:

The baseline tier is named, not implied. Both parties can read the public price card and run the same math. There is no place for "we are using a different mix" in the renewal conversation, because the mix is in the contract.
The margin band is honest. A 20–35% margin band is defensible to both procurement and the vendor's CFO; a 60% band is the kind of clause that gets renegotiated by a competitor's proposal six months later.
The re-quote is scheduled. When the next tier ships, the calendar already has the conversation on it. The vendor does not surprise the customer; the customer does not surprise the vendor.
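The clause reduces to two checks: has the tier moved past the threshold, and what price range keeps the vendor inside the band. A minimal sketch, assuming the band is a gross margin over total delivery cost and that only the inference part moves with the tier; `other_cost` and all figures are hypothetical, with the inference costs taken from the midpoints of the per-document ranges earlier in this piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassThroughClause:
    reference_inference_cost: float   # USD per document at the named tier, at signing
    other_cost: float                 # hypothetical non-inference cost per document
    margin_band: tuple[float, float]  # gross margin the vendor keeps, e.g. (0.20, 0.35)
    trigger: float = 0.10             # re-quote on a move larger than 10% either way

    def needs_requote(self, new_inference_cost: float) -> bool:
        # With token counts unchanged, per-document inference cost tracks the tier price.
        move = abs(new_inference_cost - self.reference_inference_cost)
        return move / self.reference_inference_cost > self.trigger

    def price_range(self, new_inference_cost: float) -> tuple[float, float]:
        """Per-document prices that keep gross margin inside the band.

        Margin m = (price - cost) / price, so price = cost / (1 - m).
        """
        cost = new_inference_cost + self.other_cost
        low_margin, high_margin = self.margin_band
        return cost / (1 - low_margin), cost / (1 - high_margin)


clause = PassThroughClause(reference_inference_cost=0.19, other_cost=0.05,
                           margin_band=(0.20, 0.35))
new_cost = 0.05
if clause.needs_requote(new_cost):
    floor, ceiling = clause.price_range(new_cost)
    print(f"re-quote between ${floor:.3f} and ${ceiling:.3f} per document")
```

The floor comes from the low end of the band and the ceiling from the high end; a move inside the 10% trigger leaves the current price standing until the next scheduled re-quote.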

The vendors who refuse the clause are the vendors whose proposals lose to competitors who accept it. The buyers who do not ask for the clause are the buyers whose unit economics drift away from the market for the term of the contract. The clause is not generous. It is the contractual shape that lets both sides keep the relationship through a price card that moves every quarter.

Four anti-patterns that decouple unit margin from the market

The vendors who lose the most in a price-reset cycle do not lose because they failed a technical bar. They lose because four habits from the SaaS-stable era survived into the inference-floating era. The cost of each is silent for a quarter, then visible for the rest of the contract.

Pricing IDP in cents per page with no peg to a model tier. The flat per-page price was a defensible 2024 posture, because the model behind the page was stable and the margin was the vendor's job to manage. The same posture in 2026 means the vendor has signed a 12–24 month commitment to a per-page price that is detached from the floor under it. When the floor drops 30%, the vendor's margin widens for a quarter and then is reset down 40% at renewal, because the customer has the math. The fix is the clause above; the absence of the clause is the bug.

Treating inference as a 12-month planning constant. A vendor whose annual plan assumes a stable inference price has a P&L that misses every quarterly re-pricing event in either direction. The right cadence in 2026 is to refresh the inference assumption every quarter — at minimum — and to rebuild the per-product unit margin at every price card update. The work is small when the data lives in the route table; it is intractable when the data lives in spreadsheets and the routing lives in code.

No governance to force a tier migration when quality is unchanged. The hardest internal habit to break is the engineering preference for "the model we know works". A model that worked in 2025 is not the cheapest model that works in 2026; the cheapest model that works in 2026 is a small sibling of the new tier, which an evaluation run on the held-out set would have promoted in an afternoon. The vendor that has no internal rule — "if a tier 80% cheaper clears the evaluation set, the route migrates within N days" — is the vendor whose inference bill stays at the 2025 baseline for no technical reason. The governance piece covers the audit side; the FinOps governance is the same shape with a different output.
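The migration rule can run as a scheduled check rather than a meeting. A minimal sketch, assuming per-route cost figures and an evaluation date are already tracked; `max_days=14` is a hypothetical value for the N in the rule, and the routes are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class MigrationCandidate:
    route: str
    current_cost: float             # USD per document on the live model
    candidate_cost: float           # USD per document on the cheaper tier
    passed_eval_on: Optional[date]  # when the candidate cleared the route's eval set


def overdue_migrations(candidates: list[MigrationCandidate], today: date,
                       min_saving: float = 0.80, max_days: int = 14) -> list[str]:
    """Routes where a much cheaper tier passed evaluation but was not promoted in time.

    min_saving mirrors the "80% cheaper" rule in the text; max_days is the
    hypothetical N, which each vendor sets for itself.
    """
    overdue = []
    for c in candidates:
        if c.passed_eval_on is None:
            continue  # no passing evaluation yet: nothing to enforce
        saving = 1 - c.candidate_cost / c.current_cost
        if saving >= min_saving and today - c.passed_eval_on > timedelta(days=max_days):
            overdue.append(c.route)
    return overdue


candidates = [
    MigrationCandidate("invoice/total", 0.020, 0.003, date(2026, 6, 1)),     # 85% cheaper
    MigrationCandidate("contract/clauses", 0.050, 0.030, date(2026, 6, 1)),  # 40% cheaper
    MigrationCandidate("invoice/line_items", 0.030, 0.005, None),            # not evaluated
]
print(overdue_migrations(candidates, today=date(2026, 7, 1)))  # ['invoice/total']
```

Only the first route is flagged: the second is not cheap enough to fall under the rule, and the third has no passing evaluation yet.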

A P&L that does not show margin per tier. The vendor's finance function reads gross margin for the product and stops. In 2026 the right view is gross margin per model tier, with the routing share next to it. The margin on the call that runs on the new tier is materially different from the margin on the call that still runs on the 2025 tier; a P&L that averages the two hides the routing decision that would have moved 20 points of margin in a quarter. The CFO who asks the question "what is our margin if we migrate the remaining 30% of traffic by end-of-quarter" should get the answer in a minute, from a dashboard the FinOps function maintains. The CFO who waits a week for the answer is the CFO whose vendor is one renegotiation behind the market.
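The per-tier view and the CFO's what-if are a few lines once routing share and cost per tier are tracked. A minimal sketch; the $0.20 price, the 70/30 split and the per-tier costs are hypothetical, and the margin is over inference cost only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSlice:
    tier: str
    traffic_share: float  # fraction of documents routed to this tier
    cost_per_doc: float   # inference cost in USD per document on this tier


def margin_by_tier(price: float, slices: list[TierSlice]) -> dict:
    return {s.tier: (price - s.cost_per_doc) / price for s in slices}


def blended_margin(price: float, slices: list[TierSlice]) -> float:
    cost = sum(s.traffic_share * s.cost_per_doc for s in slices)
    return (price - cost) / price


def migrate(slices: list[TierSlice], src: str, dst: str, fraction: float) -> list[TierSlice]:
    """What-if: move `fraction` of src's traffic to dst; returns new slices."""
    moved = next(s.traffic_share for s in slices if s.tier == src) * fraction
    out = []
    for s in slices:
        delta = -moved if s.tier == src else (moved if s.tier == dst else 0.0)
        out.append(TierSlice(s.tier, s.traffic_share + delta, s.cost_per_doc))
    return out


# Hypothetical product priced at $0.20 per document, 30% of traffic still on the 2025 tier.
mix = [TierSlice("2026-tier", 0.70, 0.05), TierSlice("2025-tier", 0.30, 0.19)]
after_mix = migrate(mix, "2025-tier", "2026-tier", fraction=1.0)
print({tier: f"{m:.0%}" for tier, m in margin_by_tier(0.20, mix).items()})
print(f"blended today: {blended_margin(0.20, mix):.0%}")
print(f"after migrating the remaining 30%: {blended_margin(0.20, after_mix):.0%}")
```

In this made-up mix, migrating the remaining 30% lifts blended inference margin from 54% to 75%, the kind of swing an averaged P&L hides.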

The customer-churn trade-off the pass-through invites

The honest case against the pass-through clause is that it raises customer-churn risk in a falling-price market. A customer who reads a 65% drop in inference price and a 20–35% drop in per-document price on the renewal asks the next question — "is the new entrant offering full pass-through?" — and a vendor who said no in the previous cycle is the vendor on the wrong side of that conversation. The trade-off does not disappear with a clause; the clause makes it survivable.

The shape that holds through a price-down cycle has three elements next to the clause. A roadmap of features, built on the new tier, that the entrant has not absorbed yet: long-context reasoning over a corpus, multi-document coreference, agentic write-back. A service envelope the entrant cannot match in the first 12 months: named SLAs, an evidence package, the audit trail covered in the non-deterministic audit-trail piece, and the regulator-side posture from the regulator-in-the-loop piece. And a deployment shape (air-gapped, dedicated tenancy, sovereign data path, covered in the air-gapped piece) that the entrant's price card cannot reach, because the entrant is shipping on a single public-cloud SKU.

The vendor who pairs the pass-through clause with that envelope retains the customer at a healthy margin band through two or three price-down cycles. The vendor who offers only the clause and not the envelope retains the customer for one cycle and loses them on the next. The vendor who offers neither loses them at the first cycle and spends the rest of the year explaining the churn line on the board deck.

A 90-day plan — vendor side and buyer side

The work to convert the May 2026 reset into a defensible posture is one quarter on each side. The deliverables are documents, a routing change and a contract template, not a re-platforming.

For the IDP vendor

Days 1–30: the cost-per-document baseline and the route table. Compute, with real tenant data, the cost-per-document at the current model mix and at the target mix on the new tier. Build the route table — one row per document-class-and-field — with the chosen model, the held-out evaluation set, the quality threshold and the cost-per-route. Move the table out of code and into the deployment manifest. The output is the new internal baseline that informs every renewal conversation for the next quarter.

Days 31–60: the router refactor and the FinOps dashboard. Move the per-call routing decision out of the inference layer and into a small service whose policy is the route table. Wire the evaluation harness to the router so a policy change runs the eval before promotion. Build the cost-per-route dashboard with the routing share next to the unit margin per tenant. By the end of day 60 the team is migrating routes in an afternoon, not in a sprint, and the CFO can answer the margin question in a minute.

Days 61–90: the contract template and the renewal conversation. Rewrite the standard contract to include the named reference tier, the per-document baseline price, the margin band and the quarterly re-quote cadence. Apply the new template to the next three enterprise renewals to test the buyer reaction; calibrate the band to what holds. Brief the sales team on the framing — the clause is not a concession, it is the right of both sides to keep the relationship past the next tier launch. The output is the renewal posture that survives the rest of the year.

For the regulated buyer

Days 1–30: the cost-attribution view. Take the current IDP contract — and the related provider contracts behind it — and build a one-page view of the per-document price, the implied inference share and the margin the vendor is keeping at the current tier. The view does not need vendor cooperation; the public price card and the contract volumes are enough. The output is the version of the math both sides will see in the renewal.
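A minimal sketch of one line of that buyer-side view, assuming the buyer estimates tokens per document itself; the $0.25 price and the 12k / 2k token counts are hypothetical, and `PublicTier` and `attribution_row` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicTier:
    name: str
    input_per_m: float   # USD per million input tokens, from the public price card
    output_per_m: float  # USD per million output tokens


def attribution_row(price_per_doc: float, input_tokens: int, output_tokens: int,
                    tier: PublicTier) -> dict:
    """One line of the buyer's view: what share of the per-document price is inference."""
    inference = (input_tokens * tier.input_per_m
                 + output_tokens * tier.output_per_m) / 1_000_000
    return {
        "tier": tier.name,
        "implied_inference_cost": round(inference, 4),
        "inference_share": round(inference / price_per_doc, 3),
        # Everything else: the vendor's non-inference costs plus its margin.
        "non_inference_share": round(1 - inference / price_per_doc, 3),
    }


# Hypothetical contract: $0.25 per document, ~12k input and ~2k output tokens per document.
for tier in (PublicTier("2025 frontier", 3.50, 15.00),
             PublicTier("May 2026 frontier", 1.50, 9.00)):
    print(attribution_row(0.25, 12_000, 2_000, tier))
```

In this example the inference share of an unchanged $0.25 price falls from 28.8% to 14.4%, which is the gap the renewal conversation is about.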

Days 31–60: the contract refresh. Add the pass-through clause to the standard procurement template. Name the reference tier, define an acceptable margin band, set the re-quote cadence quarterly, and write the breakpoint at which the clause triggers — typically a 10% move in either direction. Apply the refreshed template to one active renewal. The result is the calibrated clause the team uses on every subsequent enterprise IDP renewal.

Days 61–90: the multi-vendor split. The buyer with one IDP vendor reads the price card and asks for a re-quote. The buyer with two IDP vendors on a defined split — covered in the provider due-diligence piece — asks for a re-quote and means it, because the alternative is real. Move 10–20% of the volume to a second vendor on a stable workload, instrument both sides on the same dashboard, and run the renewal conversation with the split as the leverage. The buyer whose vendor knows the buyer can move 30% of the volume in a quarter is the buyer whose vendor signs the pass-through clause without theatre.

What this means for document AI specifically

Document AI sits at the centre of the reset for three reasons. The workload is high-volume and steady, so the elasticity of cost-per-document maps directly to contract margin. The schemas are stable enough that a routing change can be evaluated against a held-out set in an afternoon. And the buyers are sophisticated enough — banks, accounting firms, insurance carriers, regulators — that the renewal conversation will be informed by the public price card whether the vendor wants it or not.

The cents-per-page price loses information. A vendor who quotes a flat cents-per-page number in 2026 is a vendor whose proposal is read as either generous, in which case the buyer signs and the vendor regrets it, or stingy, in which case the buyer asks for a re-quote at the first price card update. The shape that holds is the per-document baseline at a named tier with a margin band, not the flat number.

The route table is the product. A document AI platform that does not expose its routing decisions (at least to its FinOps team, ideally to the customer's) is a platform whose unit margin is invisible to the people who manage it. The route table is the artefact that turns "we use a frontier model" into "we use these specific tiers on these specific document classes with this specific quality threshold". The extraction-to-decision piece covers the decision side; the route table is the cost side of the same surface.

Evidence outlasts the price card. The single thing that does not get cheaper at the next price cut is the evidence layer (the structured trace, the signed bundle, the regression report, the audit export). A vendor whose pricing rests only on the inference cost has no defensible margin when the inference cost halves. A vendor whose pricing rests on the inference cost plus a named evidence envelope has a margin the price card cannot compress. The audit-evidence piece is the operational version of the same observation: the evidence is the part of the product the next price drop does not touch.

Closing thought

The May 2026 cut is not the last cut. The next tier is on a roadmap a few quarters out, and the one after that is on the slide a lab will publish at their next event. The vendor that survives the cycle is not the vendor with the cleverest pricing engineering today. It is the vendor whose routing, FinOps and contract templates assume the cycle is the operating condition, not the surprise. The buyer that captures the cycle is the buyer whose procurement template has the pass-through clause in it and the multi-vendor split behind it. Everyone else pays a quiet tax for the rest of the contract.

At Cogneris we treat the route table, the evaluation harness and the pass-through clause as part of the product — because the inference floor moves, and the vendor whose unit economics survive the move is the vendor whose customers stay through the move. If you are sizing the cost-per-document side of your document AI programme, or rewriting the procurement template before the next renewal, see our product page, the pricing page, or talk to our team. The model is what the headline writes about; the route table is what the renewal turns on.

Frontier inference at commodity prices.