AI operating model: six artifacts, 90 days

The 50-point jump that did not close the gap

The CAIO numbers are by now well-rehearsed: a year ago, 26% of large enterprises had a named executive accountable for AI; today, 76% do. That shift also changed the document AI buyer. The story we keep hearing in the conversations that follow is the one this post is about: the buyer hired the title, the title is in seat, and the RFP they responded to last week still failed at the operating-model question.

The pattern is consistent enough to name. A CAIO joins; in the first 60 days they publish an AI policy, stand up a steering committee, and commission a use-case inventory. The artifacts feel like an operating model. They are not. Then comes the first audit, the first regulator question, or the first vendor RFP that asks "how do you enforce this?", and the artifacts come apart at the seams, because they were a strategy slide in operating-model clothing.

None of that is the CAIO's fault. The role is new, the analyst templates conflict, and the first quarter is consumed by stakeholder rounds. The operating model is the second quarter's job, and the artifact list below is what the deals we close ask for.

What the operating model actually contains

We have now seen the operating-model section of enough RFPs to triangulate. Six artifacts show up in almost every one, sometimes under different names but with the same content. A CAIO who can drop each of these into a board pack, an auditor's binder and a customer diligence pack, without rewriting them three times, has the operating model the title was hired to install.

1. A risk taxonomy mapped to use cases

Not a policy paragraph that says "we classify risk." A table. Columns: use case, risk class, framework citation (NIST AI RMF / ISO/IEC 42001 / EU AI Act Annex III), accountable owner, allowed deployment contexts. Rows: every production and pilot AI workflow in the org. The point of the artifact is to make "where does this use case live?" a five-second lookup for the CAIO, internal audit and the customer. The most common failure is a risk taxonomy that exists only as the auditor's spreadsheet, refreshed once a year. The operating-model version lives in the same registry the deployment platform reads, so a workflow cannot ship at a risk class higher than its policy gate.
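
A minimal sketch of the table described above, assuming an in-memory registry. The four-level `RiskClass` scale, the use-case names, the owners and the helper names `lookup` and `allowed_in` are all hypothetical, chosen only to show the five columns and the five-second lookup.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskClass(IntEnum):
    """Hypothetical four-level scale; substitute the org's own classes."""
    MINIMAL = 1
    LIMITED = 2
    HIGH = 3
    UNACCEPTABLE = 4


@dataclass(frozen=True)
class UseCaseRow:
    use_case: str
    risk_class: RiskClass
    framework_citation: str
    accountable_owner: str
    allowed_contexts: frozenset


# Placeholder rows: the use cases, owners and citations are illustrative only.
_ROWS = [
    UseCaseRow("claims-intake-extraction", RiskClass.LIMITED,
               "NIST AI RMF", "jane.doe", frozenset({"pilot", "production"})),
    UseCaseRow("underwriting-assist", RiskClass.HIGH,
               "EU AI Act Annex III", "sam.lee", frozenset({"pilot"})),
]
TAXONOMY = {row.use_case: row for row in _ROWS}


def lookup(use_case: str) -> UseCaseRow:
    """Answer 'where does this use case live?' with a single dictionary read."""
    return TAXONOMY[use_case]


def allowed_in(use_case: str, context: str) -> bool:
    """True if the use case may run in the given deployment context."""
    return context in lookup(use_case).allowed_contexts
```

Keeping the rows as data rather than prose is what lets the same table feed the board pack, the auditor's binder and the deployment platform without a rewrite.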

2. A model and prompt registry with lineage

Every model, prompt, schema and tool definition the org uses in production — versioned, with lineage. The registry answers three questions on demand: which version is in production right now, which versions ran the last 10,000 cases, and what changed at each bump. Without it, every "why did the output change?" question turns into an archaeology exercise; with it, the answer is one query. Document AI programs that built a registry early treat the audit trail as a runtime artifact rather than a forensic reconstruction. Programs that skipped it routinely discover, six months in, that the case the regulator is asking about ran on a prompt that no longer exists.
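
The three questions can be sketched against a toy registry. This is an illustration under stated assumptions, not a reference design: the `Version` and `Registry` classes, their method names and the in-memory storage are all hypothetical, and a real registry would sit on a durable store.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Version:
    artifact: str            # e.g. "prompt:claims-extract" (hypothetical name)
    version: str
    parent: Optional[str]    # previous version, None for the first one
    change_note: str         # what changed at this bump


@dataclass
class Registry:
    versions: dict = field(default_factory=dict)    # (artifact, version) -> Version
    production: dict = field(default_factory=dict)  # artifact -> live version
    case_log: list = field(default_factory=list)    # (case_id, artifact, version)

    def register(self, v: Version, promote: bool = False) -> None:
        self.versions[(v.artifact, v.version)] = v
        if promote:
            self.production[v.artifact] = v.version

    def record_case(self, case_id: str, artifact: str) -> None:
        """Log which live version handled a case, at the moment it runs."""
        self.case_log.append((case_id, artifact, self.production[artifact]))

    def in_production(self, artifact: str) -> str:
        """Question 1: which version is live right now?"""
        return self.production[artifact]

    def versions_for_recent_cases(self, artifact: str, n: int = 10_000) -> Counter:
        """Question 2: which versions ran the last n cases?"""
        recent = [ver for _, art, ver in self.case_log if art == artifact][-n:]
        return Counter(recent)

    def changelog(self, artifact: str, version: str) -> list:
        """Question 3: walk parent links to list what changed at each bump."""
        notes, current = [], version
        while current is not None:
            v = self.versions[(artifact, current)]
            notes.append((v.version, v.change_note))
            current = v.parent
        return notes
```

Because old versions are never deleted from `versions`, a case that ran on a retired prompt can still be traced to the exact text that produced it.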

3. A governance committee with a cross-functional mandate

Two patterns deliver. One: a single AI risk committee chaired by the CAIO with named seats for Risk, Security, Legal/Privacy, Audit, the LOB owner and Procurement, meeting on a fixed cadence with a documented decision log. Two: a federated model where the CAIO owns the framework and platform, and each LOB owns delivery against it, with a steering body that handles cross-LOB conflicts. Both work. What does not work is two parallel committees, for example a security AI committee that says no to a vendor the AI committee already approved. The cross-functional mandate is what keeps the decision from being relitigated in every quarterly review.

4. A vendor evaluation policy that is actually graded

The artifact is a one-page rubric, not a procurement checklist. Sections include framework alignment (NIST AI RMF mapping, ISO/IEC 42001 readiness, EU AI Act class per use case), sub-processor topology, per-tenant isolation, audit-trail retrieval, kill-switch protocol, and unit economics per business event. Each vendor is scored; the score lands in the registry next to the use case. The anti-pattern is a vendor policy that exists as a PDF nobody scores against — every deal is then a fresh debate, and the CAIO ends up adjudicating the same vendor twice in the same year.
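
A graded rubric can be as small as the sketch below. The section names come from the list above; the 0 to 4 scale, the equal weighting and the function name `grade_vendor` are assumptions made for illustration, and a real rubric may well weight sections differently.

```python
RUBRIC_SECTIONS = (
    "framework_alignment",
    "sub_processor_topology",
    "per_tenant_isolation",
    "audit_trail_retrieval",
    "kill_switch_protocol",
    "unit_economics",
)
MAX_POINTS = 4  # assumed 0-4 scale per section


def grade_vendor(scores: dict) -> float:
    """Return the vendor's score as a fraction of the maximum (0.0 to 1.0).

    Refuses to grade if any section is missing, so a vendor cannot be
    approved on a partially filled rubric.
    """
    missing = [s for s in RUBRIC_SECTIONS if s not in scores]
    if missing:
        raise ValueError(f"ungraded sections: {missing}")
    for section in RUBRIC_SECTIONS:
        if not 0 <= scores[section] <= MAX_POINTS:
            raise ValueError(f"{section} must be between 0 and {MAX_POINTS}")
    total = sum(scores[s] for s in RUBRIC_SECTIONS)
    return total / (MAX_POINTS * len(RUBRIC_SECTIONS))
```

Rejecting incomplete scorecards is the design choice that matters here: it is what separates a graded policy from a PDF nobody scores against.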

5. Adoption and outcome metrics per business line

Two numerators, one denominator. The numerators are the ones the board now asks for: cost per business event (cost of "process one claim", "underwrite one application", "triage one ticket") and outcome quality (case-level accuracy and blast-radius envelope per use case). The denominator is the business unit. The artifact is a quarterly scorecard, per LOB, with the two ratios, the trend, and the named accountable owner. Vanity metrics — number of employees using a chatbot, number of prompts run — do not survive a second board cycle. The CFO and the CRO both want the same number, and they want it tied to a unit the rest of the business already understands.
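
The two ratios and the trend reduce to simple arithmetic. The sketch below assumes outcome quality is measured as case-level accuracy only and leaves the blast-radius envelope out; the `ScorecardRow` fields and the `trend` helper are hypothetical names, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorecardRow:
    lob: str
    business_event: str   # e.g. "process one claim"
    owner: str            # named accountable owner
    total_cost: float     # all-in cost for the quarter
    events: int           # business events completed in the quarter
    correct_cases: int    # events judged correct at case level

    def __post_init__(self):
        if self.events <= 0:
            raise ValueError("a scorecard row needs at least one business event")

    @property
    def cost_per_event(self) -> float:
        return self.total_cost / self.events

    @property
    def case_accuracy(self) -> float:
        return self.correct_cases / self.events


def trend(current: float, previous: float) -> float:
    """Quarter-over-quarter change as a fraction of the previous value."""
    return (current - previous) / previous
```

For example, a line of business that spent 1,500 on 3,000 claims one quarter and 1,200 on 4,000 claims the next moved from 0.50 to 0.30 per claim, a 40% reduction in cost per business event.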

6. A kill switch and an incident protocol

Not a slide. A documented, tested procedure: who has authority to pause a workflow, how fast it pauses (and the measured time on the last drill), what the fallback path is, what the customer communication template is, and how the post-incident review feeds back into the registry. The kill switch is the single artifact regulators ask to see executed in a tabletop. We routinely watch AI programs pass every other section of a diligence pack and fail the kill-switch demo, because the procedure was written but never rehearsed. The operating-model version is rehearsed quarterly.
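
The elements of the procedure (authority, measured pause time, fallback path, a drill record) can be sketched as follows. Everything here is an assumption for illustration: the `KillSwitch` class, the route names and the in-memory flag stand in for whatever the deployment platform actually exposes.

```python
import time
from dataclasses import dataclass, field


@dataclass
class KillSwitch:
    authorized: frozenset                      # names with authority to pause
    paused: set = field(default_factory=set)
    drill_log: list = field(default_factory=list)

    def pause(self, workflow: str, requested_by: str) -> float:
        """Pause a workflow and return the measured time to pause, in seconds."""
        if requested_by not in self.authorized:
            raise PermissionError(f"{requested_by} may not pause {workflow}")
        started = time.monotonic()
        # A real system would stop routing at the platform here; this sketch
        # only flips an in-memory flag.
        self.paused.add(workflow)
        elapsed = time.monotonic() - started
        self.drill_log.append((workflow, requested_by, elapsed))
        return elapsed

    def route(self, workflow: str) -> str:
        """Send cases to the fallback path while the workflow is paused."""
        return "manual-review-queue" if workflow in self.paused else "ai-pipeline"
```

The drill log is the part a tabletop exercise cares about: it records who paused what and how long it took, which is the evidence that the procedure was rehearsed and not only written.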

The operating model is six artifacts. The first quarter of the CAIO role usually delivers four of them in slide form. The deals that close are won by the quarter that turns the four slides into runtime artifacts and ships the missing two.

Three reference architectures, and where they agree

The major advisory shops have each published their own version of the operating-model picture. They argue about boxes; they agree about content. A brief tour of where they converge, because the CAIO will be asked to map to at least one of them, and the answer is usually "all three, with the same artifacts underneath."

IBM — AI value creation, operationalised

IBM's operating-model picture frames AI as a function with four layers: strategy, governance, platform and people. The substance the architecture demands maps cleanly to the six artifacts: risk taxonomy and adoption metrics in the strategy layer, registry and kill switch in the platform layer, committee in governance, accountable owners in people. The useful thing IBM publishes is the platform-side detail: model lifecycle controls, lineage, and the runtime hooks that turn policy into enforcement. The less useful thing is the way the picture undersells how much of the work is procurement: half the operating model is the vendor rubric, and IBM's picture treats vendor decisions as inputs to a finished platform.

PwC — AI governance and the controls inventory

PwC's framing leans on controls. They publish a long control catalogue that maps to NIST AI RMF, ISO/IEC 42001 and the EU AI Act, with the controls organised by lifecycle stage. The artifact a CAIO will recognise is the controls inventory — every policy, every check, every monitoring point — and that inventory is what an internal audit team can run against. PwC's version is the easiest to defend in a regulated industry because it speaks audit-language natively. The trade-off is that the controls framing can flatten the business-outcome side; a controls inventory with no adoption metric is a compliance program with no program.

Deloitte — the AI operating model triangle

Deloitte publishes the triangle most often quoted in board decks: strategy, governance and execution, with a value-realisation loop. Their useful contribution is the operating-rhythm side (committee cadence, board reporting cadence, escalation paths): the bits of operating-model practice that are not artifacts but habits. CAIOs who borrow the cadence pattern from Deloitte, the controls discipline from PwC and the platform detail from IBM end up with something that looks like all three and is none of them, and that is the right outcome. The frameworks are scaffolding; the artifacts are the building.

The convergence is the point. Whatever picture the CAIO sponsors internally, the six artifacts must sit in the artifact list underneath it, or the picture is a poster.

Three anti-patterns we keep seeing

The operating model fails in predictable ways. Three patterns cover most of the failures we have watched from the vendor side of the table.

Parallel governance

The CAIO stands up an AI committee. The CISO already runs a security committee. Legal runs a privacy committee. Procurement runs a vendor committee. Nobody redraws the boundary. The result: the same vendor gets three different answers in the same quarter, the LOB owner stops bringing things to any of them, and shadow deployments grow under the radar. The fix is unglamorous: a single chartered AI risk forum with named seats from each adjacent function, a documented decision rule when the seats disagree, and a sunset clause for the parallel committees so the boundary actually moves. The most common failure mode is leaving the parallel committees in place "for now"; "for now" lasts until the first incident.

Policy without enforcement

The AI policy is published on the intranet. It says use cases above a certain risk class require committee approval. The deployment platform has no idea the policy exists. Engineers ship what they ship; the policy is found, post-incident, by the regulator. The fix is to put the policy in the same place the deployment platform reads from — the registry — and to make the platform refuse to deploy a workflow whose risk class exceeds its approval state. This is the boring half of the operating model. It is also the half the audit will check first, because a policy that is not enforced is, at best, a marketing document.
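
The fix described above, a platform that refuses to deploy past the approval state, can be sketched as a pre-deploy check. The ordered `RiskClass` scale, the `approved_up_to` field and the `check_deploy` function are hypothetical names; the only idea taken from the text is the comparison between a workflow's risk class and its approval state.

```python
from enum import IntEnum


class RiskClass(IntEnum):
    """Hypothetical ordered scale; higher means riskier."""
    MINIMAL = 1
    LIMITED = 2
    HIGH = 3


class DeploymentBlocked(Exception):
    """Raised when the registry does not authorize a deployment."""


def check_deploy(workflow: str, registry: dict) -> None:
    """Raise DeploymentBlocked unless the registry approves this workflow.

    `registry` maps workflow name to a dict with `risk_class` and
    `approved_up_to`, the highest risk class the committee has signed off.
    """
    entry = registry.get(workflow)
    if entry is None:
        # Unregistered workflows are blocked by default.
        raise DeploymentBlocked(f"{workflow} is not in the registry")
    if entry["risk_class"] > entry["approved_up_to"]:
        raise DeploymentBlocked(
            f"{workflow} is {entry['risk_class'].name} but approved only "
            f"up to {entry['approved_up_to'].name}"
        )
```

Blocking unregistered workflows by default is a deliberate choice in this sketch: it closes the path by which engineers ship what they ship without the policy ever being consulted.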

Vanity metrics

The first quarterly AI report goes to the board and is full of the wrong numbers. Number of employees using a chatbot. Number of prompts run. Tokens processed. None of these answer the question the board is asking, which is whether the program is paying back and whether the next incident is survivable. The fix is the scorecard we described under artifact 5: cost per business event and outcome quality, blast-radius envelope included. The pattern we see at programs that get re-funded is that they report the boring per-business-event numbers from quarter one, even though the absolute numbers are small, because changing the scorecard later is harder than landing on the right one early.

A 90-day build

This is the plan that has worked, in roughly this order, for the programs that landed the operating model inside a quarter. None of the steps are heroic; the discipline is doing them in order and stopping when each artifact is shipped, not when it feels finished.

Days 1–30 — Inventory and accountability

Two outputs by day 30. First, the use-case inventory: every production and pilot AI workflow in the org, with the LOB owner, the risk class (provisional is fine), the framework citation and the data classes touched. Second, the named accountable owner per use case — one human name, with authority to pause. The mechanics are interview-based: the CAIO and a small team sit with each LOB for an hour, build the row, and move on. The output is rough; the point is that no production workflow leaves day 30 without an owner. Trying to perfect the taxonomy before naming owners is the classic month-one trap.

Days 31–60 — Registry, vendor rubric, kill switch

Three artifacts ship by day 60. The model and prompt registry, even if it only covers the top quartile of workflows by volume. The vendor evaluation rubric, applied to the existing vendor list, with the scores written down (and the disagreements written down too — they will be useful in month three). The kill-switch protocol and the first tabletop drill, for the two highest-risk production workflows. The temptation in this block is to widen scope — to register every prompt, to score every vendor. Resist it. The discipline is to ship the artifacts at the smallest useful scope and let the registry grow as the platform integrations catch up.

Days 61–90 — Enforcement, metrics, board pack

Three outputs by day 90. The deployment platform is wired to the registry, and a workflow whose risk class exceeds its approval state cannot ship. The quarterly scorecard, per LOB, with the two ratios and the named owners, is in production. The first board pack lands — six artifacts, three anti-patterns addressed, one tabletop drill on the record, and a plan for the next quarter. The objective for the board pack is not to impress; it is to set the cadence that survives the second and third pack, when the novelty is gone and the discipline is what is left.

Programs that hit this plan tend to share two traits. The CAIO has direct authority over the registry and the deployment gate — not "input into," not "consulted on." And the board has agreed in advance to one cadence for AI reporting, so the second quarter is not a redesign of the scorecard but a refinement of the same one.

What the RFP filter now looks like

From the vendor side of the conversation, the operating-model question has reshaped the RFP. The questions below are not new in their wording — most of them appeared in scattered form a year ago — but the ordering and the disqualifiers have hardened. These are the ones that, in our recent experience, sit in the first ten pages of the document and decide whether the vendor gets the demo slot.

Section	What the CAIO is checking	What disqualifies
Framework mapping	Use-case-by-use-case NIST AI RMF / ISO/IEC 42001 / EU AI Act mapping, not a one-line slogan.	"We follow the frameworks" with no table.
Audit-trail retrieval	Per-case reconstruction (page hash, model version, prompt version, schema version, output, latency, cost, decision path), retrievable on demand for the retention window.	Logs at the aggregate level only. No per-case reconstruction.
Registry fit	The vendor's models, prompts and schemas can be reflected in the customer's registry — exportable, queryable, versioned.	Opaque pipelines whose version state lives only inside the vendor.
Per-tenant isolation	Data, prompts, models and audit logs separated per tenant. Tenant-specific policy can override platform defaults.	Shared models with no tenant boundary in the audit trail.
Kill switch on the customer side	The customer's CAIO can pause the workflow without opening a vendor support ticket; the SLA on the pause is measurable.	Pause requires vendor intervention. No documented SLA on the pause.
Sub-processor topology	Every model provider, observability vendor and infrastructure dependency, with zero-retention flags and DPAs to back them. We covered the shape of a defensible answer in our DPA piece.	Ambiguous list that does not survive vendor risk review.
Unit economics	Cost per business event and outcome quality per use case — with the routing pattern that produced them.	A single accuracy number with no use-case context.

The interesting shift is the ordering. A year ago the first question was accuracy, the second was cost per page. Today, those questions live in the middle of the document; the first pages are about the operating model. Vendors that built their platform around an audit-first architecture tend to find this RFP shorter to answer than the previous one — the artifacts are already emitted by the system. Vendors that wrapped a single model end up answering "we plan to align" to four of the seven sections, which is the line that gets them filtered before the demo.
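
The per-case reconstruction named in the audit-trail row of the table above can be sketched as a record type plus a lookup. The field list follows that row; the `CaseRecord` and `AuditTrail` names, the SHA-256 page hash, the millisecond latency unit and the in-memory store are assumptions for illustration only.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CaseRecord:
    case_id: str
    page_hash: str
    model_version: str
    prompt_version: str
    schema_version: str
    output: dict
    latency_ms: float      # assumed unit
    cost: float            # currency left to the deployment
    decision_path: tuple   # ordered steps the pipeline took


def hash_page(page_bytes: bytes) -> str:
    """Content hash of the input page; SHA-256 is an assumed choice."""
    return hashlib.sha256(page_bytes).hexdigest()


class AuditTrail:
    def __init__(self):
        self._records = {}

    def append(self, record: CaseRecord) -> None:
        # Records are write-once: a second write for the same case is an error.
        if record.case_id in self._records:
            raise ValueError(f"case {record.case_id} already recorded")
        self._records[record.case_id] = record

    def reconstruct(self, case_id: str) -> dict:
        """Return everything needed to explain one case, on demand."""
        return asdict(self._records[case_id])
```

Aggregate logs cannot answer `reconstruct("case-123")`; a per-case record written at run time can, which is the distinction the disqualifier column draws.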

The honest part: where the model is still half-built

Two things keep coming up in conversations with CAIOs who are running this plan now. Neither has a clean answer; both shape how the operating model evolves over the next year.

The reporting line keeps moving. CAIOs report to the CEO, the CIO, the COO or the Chief Risk Officer, in roughly equal shares across the industry. The line decides what the operating model emphasises: a CAIO under the CEO drives program-level investment; under the CIO, platform consolidation; under the CRO, governance maturity before scale. None of these is wrong, but the operating model has to be portable across reporting lines, because the line will probably change before the model is mature.

The role might be transitional. A non-trivial number of analysts argue the CAIO is a 3–5 year function: long enough to install the operating model, after which it is folded back into CIO, COO or CRO. Others argue the function is permanent. The operating model has to be defensible either way. The six artifacts we listed are the ones that outlast the title. If the seat is dissolved in 2029, the registry, the rubric and the kill-switch protocol still run; what changes is who chairs the committee. CAIOs who build for that durability tend to keep the budget through reorganisations; those who build a structure that only works while the title exists tend to see it relitigated the moment the org chart moves.

Closing thought

The operating model is not a new genre of corporate document. It is the same operating model the CISO already runs for security, the CFO already runs for capital, the COO already runs for delivery — applied to AI, with the same insistence on artifacts over slogans. The CAIO title is one year old in most large enterprises; the artifacts are decades old in their adjacent functions. Borrowing the shape that already works is the path that closes the quarter.

At Cogneris we built document AI as a per-case auditable pipeline because that was the right shape for regulated extraction long before "AI operating model" was the phrase. The shift makes the conversation easier: a CAIO building the six artifacts above can treat the platform as one of the inputs to the registry rather than another vendor whose claims have to be verified by hand. If you are mid-build on the operating model and want to compare your in-flight artifacts against what a vendor can ship into them, see our product page or talk to our team. We would rather have the artifact conversation in the first 10 minutes than at the end of the deal.

A title alone is not an operating model.