The trigger moved from a click to an event in the cloud
For most of the last decade the back-office workflow had a shape that survived three waves of automation. A human spotted something — a new invoice in a shared mailbox, a flagged transaction on a dashboard, a regulator notice in a watchlist — and opened a ticket. The ticket carried the context. Another human, or a script that a human kicked off, worked the ticket. The system of record updated. The case closed. Everything downstream was paced by the moment a person decided to look.
The shape that consolidated in 2026 is different in one load-bearing way: the agent does not wait for a person to look. The agent runs continuously, subscribed to the sources the work actually originates in — the mailbox, the ERP event bus, the supplier portal webhook, the sanctions feed, the regulator's RSS, the cloud-storage drop folder. When an event lands, the agent decides what intent it corresponds to, drives the workflow to completion against a contract it can read, and writes back. Only the exception — the case the agent's own confidence model rejects, or the policy says a human must sign — reaches a queue a person will open later. The back-office that still operates on the old shape is paying its operators to do the easy 90% so nobody is available for the 10% the agent could not finish on its own.
The shift is not a new framework. It is the second-order consequence of three boring things that all became cheap enough at the same time in early 2026: continuous inference priced for steady-state load rather than for spikes; event sources that finally publish webhooks the way the protocol intended; and tool-use plumbing that lets an agent call a system of record with the same authorisation an operator would have used. Where any one of the three is missing, "always-on" is a label, not an architecture. Where all three are present, the unit of economic work in the back-office stops being the ticket and becomes the event.
Why the back-office is the first place the shift lands
The back-office is where the math is most obviously on the agent's side. The work is high-volume, the rules are documented, the systems of record are reachable through an API, and the marginal value of finishing a case in 4 minutes instead of 4 hours is large and easy to count. Five workflow families absorbed the always-on pattern first, in roughly the order the procurement conversation reached them.
Accounts payable. Invoices arrive continuously, in the formats the supplier chose, against purchase orders that already live in the ERP. An always-on agent watches the inbox and the EDI feed, extracts the invoice, runs the 3-way match against the order and the receipt, posts the journal entry when the match holds, and routes the discrepancy to the operator with the evidence pre-loaded. The shape is covered in detail in the extraction-to-execution piece. AP is the first place the CFO sees the number: time-to-pay drops, the exception queue shrinks, and the operator's day reorganises around the cases that need judgement.
Vendor lifecycle and continuous monitoring. KYB used to be a one-time check at onboarding and an annual review. The always-on shape subscribes to sanctions feeds, adverse-media services, beneficial-owner registries and bank-account-change events, re-scores the supplier continuously and opens a case only when the score crosses a threshold or a hard signal lands. The operator stops reading the watchlist; the agent does. The team's operating model needs to absorb the change; see the operating-model piece for the artefacts the CAIO has to install for that absorption to stick.
Bank reconciliation and treasury exceptions. The ledger and the bank statement do not agree on a small share of lines every day. An always-on agent pulls the statement on settlement, matches what matches, attempts a bounded set of repair moves (FX rounding, fee allocation, timing differences) and queues the residue. The treasury analyst stops opening the reconciliation file at 9 a.m. and starts opening only the cases the agent flagged.
Regulatory monitoring. A regulator publishes a circular, a tax authority changes a schema, a privacy authority opens a consultation. The always-on agent watches the relevant feeds, classifies the document against the matters the firm cares about, drafts the diff against the playbook the firm already operates, and opens a structured task for the compliance owner with the section, the change and the suggested update pre-attached. The compliance lead's calendar stops being filled by reading; it gets filled by deciding.
Continuous due diligence. A counterparty's public filing changes, a credit signal shifts, a litigation record updates. The always-on agent re-runs the diligence the deal team did at signing and surfaces the deltas against the underwriting memo. The conversation in the Monday meeting moves from "let's pull a fresh report" to "the agent flagged three deltas; here is the one that matters."
In each of the five, the savings are not the headline. The headline is that the latency on the exception path collapses — the case the agent could not finish on its own reaches a human within minutes of the event, with the evidence already loaded, instead of within hours or days of the next time a human chose to look. Organisations that still operate "open ticket, someone responds" report 60–80% higher latency on exception flow than peers who put an always-on agent on the same queue. The latency is the differentiator the procurement conversation moved to in the first half of 2026.
The four design patterns that consolidated in 2026
What consolidated in 2026 is not the always-on idea — that one had been circulating since 2024 — but the engineering contract that makes it survivable in production. Four patterns covered most of what we shipped and most of what the buyers' security reviews approved.
1. Subscription over polling, polling over scheduled full scans, idle by default
The first decision is how the agent learns there is work. Where the source supports a webhook or a streaming subscription, the agent subscribes; the source pays the cost of telling the agent something happened, the agent wakes only for real events, and the cost curve is a function of volume, not of patience. Where the source only exposes a list-and-cursor API, the agent polls — on a cadence the source allows, with an exponential back-off when the response is empty, and with a ceiling that does not let the polling cost dominate the inference cost. The anti-pattern, common in teams that ship the always-on label without the engineering, is to schedule a recurring full scan of the source every five minutes regardless of whether anything changed: the bill is constant, the latency is worst-case, and the source's rate-limit budget is consumed for nothing. The default state of an always-on agent is idle. The trick is to be idle in a way that wakes fast.
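A minimal sketch of the polling fallback described above, for a source that only exposes a list-and-cursor API. The class name, the default intervals and the doubling factor are illustrative assumptions, not values from any particular source's rate-limit policy.

```python
from dataclasses import dataclass, field


@dataclass
class PollSchedule:
    """Back off while polls come back empty, reset as soon as work appears."""
    base_seconds: float = 5.0       # assumed: shortest cadence the source allows
    ceiling_seconds: float = 300.0  # assumed: longest wait between polls
    factor: float = 2.0             # assumed: exponential growth factor
    _current: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._current = self.base_seconds

    def next_delay(self, events_found: int) -> float:
        """Return the wait before the next poll, given the last poll's result."""
        if events_found > 0:
            self._current = self.base_seconds
        else:
            self._current = min(self._current * self.factor, self.ceiling_seconds)
        return self._current

    def silent_calls_per_hour(self) -> float:
        """Steady-state source calls per hour once the back-off hits the ceiling."""
        return 3600.0 / self.ceiling_seconds
```

The ceiling is what bounds the cost of a silent source: once the back-off saturates, the call rate is fixed by `ceiling_seconds` regardless of how long the source stays quiet.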
2. Per-agent budget, not per-tenant budget
The cost model that survived the year is per-agent, priced against the work the agent actually closes. Per-tenant budgets ("this customer can spend X per month on agentic actions") fail because the customer cannot reason about the unit. Per-agent budgets ("the AP agent can spend Y per invoice cycle, with a hard ceiling and a soft alert at 80%") give the customer a number they can map to the work the agent is paid for. Each always-on agent gets its own budget envelope, with three knobs the operator can move: a token ceiling per intent, a per-hour run-rate cap, and a per-day spend cap. When any of the three trips, the agent stops, files a structured budget event, and waits for a human to extend the envelope or to investigate why the work-rate moved. The economics are the same conversation the compute-moat piece ran for inference margin: the always-on loop is where that margin gets spent if the envelope is loose.
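One way to sketch the three-knob envelope, assuming the platform resets the hourly and daily counters elsewhere (the reset logic is omitted here). All names are hypothetical; the 80% soft alert is taken from the example above and applied to the daily spend cap as a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetState(Enum):
    OK = "ok"
    SOFT_ALERT = "soft_alert"
    TRIPPED = "tripped"


@dataclass
class BudgetEnvelope:
    token_ceiling_per_intent: int
    runs_per_hour_cap: int
    spend_per_day_cap: float
    soft_alert_ratio: float = 0.8   # soft alert at 80% of the daily cap
    runs_this_hour: int = 0         # assumed to be reset hourly by the platform
    spend_today: float = 0.0        # assumed to be reset daily by the platform

    def record(self, tokens: int, cost: float) -> BudgetState:
        """Record one completed intent and report the envelope's state."""
        self.runs_this_hour += 1
        self.spend_today += cost
        if (tokens > self.token_ceiling_per_intent
                or self.runs_this_hour > self.runs_per_hour_cap
                or self.spend_today > self.spend_per_day_cap):
            return BudgetState.TRIPPED      # caller stops the agent and files an event
        if self.spend_today >= self.soft_alert_ratio * self.spend_per_day_cap:
            return BudgetState.SOFT_ALERT
        return BudgetState.OK
```

On `TRIPPED`, the caller is expected to stop the agent and emit the structured budget event; this sketch only reports the state.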
3. Kill switch and dead-man's switch as first-class artefacts
An always-on agent that the operator cannot stop without filing a ticket will eventually give that operator their worst hour. Two switches need to be on the dashboard, named, and tested in production at least once a quarter. The kill switch is the operator's explicit stop: one action, propagated within seconds, after which the agent acknowledges the stop in the audit log, drops the lease on its work queue, and returns to idle without taking new events. The dead-man's switch is the agent's own stop: a heartbeat the agent publishes on every loop, a watchdog the platform runs on the heartbeat, and an automatic stop when the heartbeat falls below threshold, because an agent that has stopped reasoning correctly about whether to act is more dangerous than an agent that has stopped acting. Both switches need a documented rollback procedure. A buyer's security review will ask for both by name in the first procurement call; the vendor that ships the switches without the rehearsal loses the deal in the second.
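The two switches can be sketched as a single small state holder. This is an illustration under assumptions: the names are hypothetical, time is passed in explicitly so the logic is testable, and the dead-man's switch is simplified to "no heartbeat within a timeout" rather than a heartbeat-rate threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentSwitches:
    heartbeat_timeout_s: float = 30.0          # assumed watchdog threshold
    last_heartbeat: Optional[float] = None
    stopped_reason: Optional[str] = None
    audit_log: List[str] = field(default_factory=list)

    def heartbeat(self, now: float) -> None:
        """Called by the agent on every loop iteration."""
        self.last_heartbeat = now

    def kill(self, operator: str, now: float) -> None:
        """Kill switch: the operator's explicit one-action stop."""
        self._stop(f"kill_switch by {operator}", now)

    def watchdog_tick(self, now: float) -> None:
        """Dead-man's switch: the platform stops the agent if heartbeats go missing."""
        if (self.stopped_reason is None
                and self.last_heartbeat is not None
                and now - self.last_heartbeat > self.heartbeat_timeout_s):
            self._stop("dead_mans_switch: heartbeat missing", now)

    def may_take_event(self) -> bool:
        """The agent checks this before leasing any new work."""
        return self.stopped_reason is None

    def _stop(self, reason: str, now: float) -> None:
        if self.stopped_reason is None:        # first stop wins; later stops are no-ops
            self.stopped_reason = reason
            self.audit_log.append(f"{now:.0f} STOP {reason}")
```

Both paths write the same audit-log shape, which keeps the quarterly rehearsal and the real incident comparable.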
4. Continuous observability with a ground-truth-of-when-to-stop signal
The classic web-service observability stack — latency histograms, error rates, throughput, queue depth — is necessary but not sufficient. The signal an always-on agent needs that a web service does not is a ground-truth-of-when-to-stop: per intent, per case, an explicit machine-readable acknowledgement from the downstream system that the work is finished. Without it, the agent can keep retrying a case that the downstream already accepted, double-paying invoices, double-opening tickets and double-charging itself for inference. The observability surface that survived the year publishes four things per agent: events consumed, intents fired, terminal acknowledgements received, and budget spent — with a per-tenant audit export that matches the trace shape covered in the tracing piece. Anything less is a dashboard that looks busy and answers the wrong question.
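As a rough illustration, the four published numbers fit in one small record. The field names and the derived `intents_awaiting_ack` figure are assumptions added for this sketch; the derived figure is simply intents fired minus terminal acknowledgements received.

```python
from dataclasses import asdict, dataclass


@dataclass
class AgentCounters:
    events_consumed: int = 0
    intents_fired: int = 0
    terminal_acks: int = 0
    budget_spent: float = 0.0

    def intents_awaiting_ack(self) -> int:
        """Work the agent started that the downstream has not yet confirmed."""
        return self.intents_fired - self.terminal_acks

    def export(self) -> dict:
        """One row of a hypothetical per-agent audit export."""
        row = asdict(self)
        row["intents_awaiting_ack"] = self.intents_awaiting_ack()
        return row
```

A value of `intents_awaiting_ack` that keeps growing is the early sign of the retry and double-processing failure described above.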
The always-on agent is not the team's new intern. It is the team's new background process — and background processes that no operator can stop, observe or budget become incident reports the operator did not write.
The four artefacts on one page
The conversation with security and platform engineering is shorter when the four patterns above are presented as a contract on a single page. The shape that survived ten reviews:
| Pattern | What ships | What the security review asks |
|---|---|---|
| Subscription / polling / idle | Per-source declaration of how the agent learns there is work, with cadence, back-off ceiling and a wake-on-event hook where the source supports it. | "How does this agent know it has work, and what is the worst-case rate of source calls per hour if the source goes silent?" |
| Per-agent budget | Token ceiling per intent, per-hour run-rate cap, per-day spend cap, with a structured budget-trip event that pages an owner. | "What happens at 100% of budget, who is paged and how long does the agent stay stopped?" |
| Kill switch + dead-man's switch | One-action operator stop with a documented rollback; heartbeat + watchdog + automatic stop on missing heartbeat; both rehearsed quarterly. | "Show me the runbook, the last rehearsal date and the audit-log entry the stop produces." |
| Ground-truth-of-when-to-stop | Per intent, a machine-readable terminal ack from the downstream system, surfaced in the observability stack and in the per-tenant audit export. | "Prove to me the agent will not double-process a case if the downstream is slow to acknowledge." |
The anti-patterns that quietly retire an always-on agent
Failure here is rarely loud. The agent does not crash; it drifts, double-fires or silently spends budget the owner did not know was open. Three patterns cover most of what we have seen retire an always-on deployment.
Always-on without a rate limiter on the downstream
The agent wakes on a source event and calls the ERP. The source fires a hundred events in a minute because a batch loader upstream ran. The agent fires a hundred ERP calls. The ERP's rate limit kicks in, the calls fail, the agent retries with the same payload — and the ERP's operator pages the agent's owner asking why the integration is hammering the system. The fix is boring and load-bearing: every always-on agent ships with a token bucket per downstream, sized to the downstream's published or negotiated rate, with a queue for the overflow and an explicit policy for what the agent does when the queue drains slower than it fills. The boring fix is also the reason the always-on agent does not become an outage.
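A minimal token-bucket sketch with a bounded overflow queue. The names are hypothetical, and rejecting calls once the queue is full is just one possible overflow policy, chosen here for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate_per_s: float              # the downstream's published or negotiated rate
    capacity: float                # largest burst the downstream tolerates
    updated_at: float = 0.0
    tokens: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity

    def try_acquire(self, now: float) -> bool:
        """Refill by elapsed time, then take one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


@dataclass
class ThrottledDownstream:
    bucket: TokenBucket
    max_queue: int
    queue: deque = field(default_factory=deque)

    def submit(self, call_id: str, now: float) -> str:
        """Return 'sent', 'queued', or 'rejected' for one downstream call."""
        if not self.queue and self.bucket.try_acquire(now):
            return "sent"
        if len(self.queue) < self.max_queue:
            self.queue.append(call_id)
            return "queued"
        return "rejected"          # assumed policy: surface to the operator

    def drain(self, now: float) -> list:
        """Send queued calls in order for as long as tokens are available."""
        sent = []
        while self.queue and self.bucket.try_acquire(now):
            sent.append(self.queue.popleft())
        return sent
```

With this in place, a burst of a hundred source events becomes a paced stream of ERP calls plus an explicit count of what was queued or rejected.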
No circuit breaker on the external integration
The downstream is degraded — the ERP returns 5xx, the sanctions feed returns malformed JSON, the supplier portal is down for maintenance. The agent retries, the retries cost tokens, the tokens cost money, the work-rate stays flat and the budget burns. A circuit breaker that opens on a sustained failure rate, holds open for a back-off window and probes on a half-open schedule turns the degraded downstream into a paused agent instead of a spending one. The same pattern the API team has run on synchronous calls for a decade is the pattern the agent team has to run on agentic loops in 2026.
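A compact circuit-breaker sketch. As a simplification it opens on a run of consecutive failures rather than a measured failure rate, and it allows probes once the back-off window has passed; the names and default values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreaker:
    failure_threshold: int = 5       # assumed: consecutive failures before opening
    open_seconds: float = 60.0       # assumed: back-off window while open
    consecutive_failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Closed: always allow. Open: allow only a probe after the window."""
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.open_seconds

    def record_success(self) -> None:
        """A successful call (including a probe) closes the breaker."""
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        """Open on sustained failure, or re-open after a failed probe."""
        self.consecutive_failures += 1
        if (self.opened_at is not None
                or self.consecutive_failures >= self.failure_threshold):
            self.opened_at = now
```

The agent calls `allow` before spending any inference on a case bound for that downstream, so a degraded ERP pauses the work instead of burning the budget.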
No ground truth for "the work is done"
The agent extracts an invoice, posts the journal entry, and considers the case closed. The ERP, on its own schedule, rejects the entry for a reason the agent did not check — the period is closed, the tax code is wrong, the cost centre was reorganised yesterday. The agent moves on; the case rots in a half-finished state nobody looks at, because the dashboard reads "closed" on the agent's side and "not posted" on the ERP's side. The fix is the ground-truth-of-when-to-stop signal from the pattern above, treated as load-bearing — the agent's case state is not closed until the downstream's terminal acknowledgement arrives, and the audit export shows both sides of the handshake. Vendors that treat the acknowledgement as nice-to-have ship a pipeline that looks healthy on the dashboard and looks broken in the month-end close.
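The handshake can be sketched as a small state machine in which a write-back alone never closes a case. The state names and the `Case` class are hypothetical.

```python
from enum import Enum
from typing import Optional


class CaseState(Enum):
    OPEN = "open"
    AWAITING_ACK = "awaiting_ack"
    CLOSED = "closed"
    NEEDS_HUMAN = "needs_human"


class Case:
    def __init__(self, case_id: str) -> None:
        self.case_id = case_id
        self.state = CaseState.OPEN
        self.downstream_reason: Optional[str] = None

    def mark_written(self) -> None:
        """The agent posted to the downstream; the case is NOT closed yet."""
        if self.state is not CaseState.OPEN:
            raise ValueError(f"cannot write back from {self.state.value}")
        self.state = CaseState.AWAITING_ACK

    def on_terminal_ack(self, accepted: bool, reason: Optional[str] = None) -> None:
        """Only the downstream's terminal acknowledgement settles the case."""
        if self.state is not CaseState.AWAITING_ACK:
            raise ValueError(f"unexpected ack in {self.state.value}")
        self.downstream_reason = reason
        self.state = CaseState.CLOSED if accepted else CaseState.NEEDS_HUMAN
```

A rejected entry, such as a closed period, lands in `NEEDS_HUMAN` with the downstream's reason attached, so both sides of the handshake are visible in one record.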
The cost trade-off: continuous inference vs trigger-based
The honest answer to "is always-on cheaper or more expensive?" is that it depends on what the agent's idle looks like. The math has three terms, and any pitch that skips one is a pitch that will fail a CFO's diligence.
The first term is the inference cost of the idle state itself. An always-on agent that wakes on a subscription and returns to idle without spending inference tokens costs the platform a heartbeat and a watchdog (cents per agent per day). An always-on agent that polls every minute regardless of activity spends inference on empty work, and the bill is constant. The first design decision (subscription over polling, idle by default) is the difference between a per-event cost curve and a per-minute cost curve, and the gap compounds quickly.
The second term is the cost per event processed versus the cost per ticket processed under the old shape. The always-on cost includes the full agent loop (extract, decide, write-back, observe); the old cost included an operator's time. Where the operator's fully loaded hourly cost is high and the per-event token spend is bounded, the always-on cost is lower per event in the steady state. Where the per-event spend is unbounded, because the agent retries on a poorly defined terminal condition or the prompt grows because nobody capped it, the savings reverse. The cap is the work.
The third term is the cost of the exception path. Always-on agents that finish 90% of cases on their own concentrate the human time on the 10%, and the 10% is harder than the average ticket the operator used to see. The operator's role moves from "process the easy 90%" to "decide the hard 10%" — and the per-case time goes up. The total team cost still drops because the 90% costs almost nothing, but the per-case cost of the exception is not the per-case cost of the old ticket. A plan that does not budget the harder exception path is a plan that will show "agent saved 70% of operator time" on the slide and "operator still working overtime on exceptions" in the team's standup.
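The three terms can be written down as a small calculator. This is only a sketch of the structure: the function, its parameter names and any numbers fed into it are placeholders for a team's own figures, not measurements.

```python
def monthly_cost(agents: int,
                 idle_cost_per_agent_day: float,
                 events: int,
                 cost_per_event: float,
                 exception_rate: float,
                 exception_minutes: float,
                 operator_hourly: float,
                 days: int = 30) -> dict:
    """Break a month's always-on bill into the three terms discussed above."""
    # Term 1: heartbeat and watchdog cost while the agents sit idle.
    idle = agents * idle_cost_per_agent_day * days
    # Term 2: the full agent loop, paid on every event the agent picks up.
    processing = events * cost_per_event
    # Term 3: operator time on the cases the agent could not finish.
    exceptions = events * exception_rate * (exception_minutes / 60) * operator_hourly
    return {
        "idle": idle,
        "processing": processing,
        "exceptions": exceptions,
        "total": idle + processing + exceptions,
    }
```

Raising `exception_minutes` while lowering `exception_rate` is the quickest way to see how a plan that only models the first two terms understates the operator's workload.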
The same shape has held across all five workflow families described above: 60–80% lower cost per event in steady state, a higher per-case cost on the exception path, a flat overall bill in the first 60 days while the calibration settles, and a step change down in months 3–6 once the agent's confidence model is calibrated against the downstream's actual error modes. Vendors that promise the step change in month 1 are pricing on a future the buyer pays for; vendors that publish the three-term build-up upfront close the deal in the first procurement call.
What this looks like for document AI specifically
Document AI sits at a particular point in the always-on shape. Documents are the artefact most always-on back-office workflows are watching for: the invoice in the mailbox, the bank statement in the drop folder, the contract change in the supplier portal, the regulator notice in the watchlist. Three design choices change for a document pipeline once the requester is an always-on agent rather than an operator clicking "process".
Streaming intake, not batch intake. The classic IDP shape is a batch — the operator drops a folder, the pipeline runs, the JSON comes back. The always-on shape is a stream — the source publishes one document at a time, the pipeline accepts it the moment it lands, the agent receives a per-document webhook on completion. The pipeline that still batches makes the always-on agent wait, and the always-on agent that waits is just a poller in disguise. The streaming intake contract is the one we ship by default for agent-mode clients; the architecture is summarised in the VLM pipeline piece.
Idempotency keys are not optional. Always-on agents retry. The pipeline that does not give the agent an idempotency key on every endpoint creates duplicate cases, duplicate extractions and duplicate write-backs, and the bill is paid twice. The contract is the same as for the synchronous agent-mode endpoint: same key, same payload, same result, no side effect. The idempotency window has to be long enough to cover the agent's retry budget — minutes, not seconds — and the replay has to be cheap, because the agent will replay.
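A minimal in-memory sketch of the "same key, same payload, same result, no side effect" contract. The class, the 15-minute default window and the choice to raise on a key reused with a different payload are all assumptions for illustration; a real endpoint would persist the keys.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """The same key was replayed with a different payload."""


class IdempotentEndpoint:
    def __init__(self, window_s: float = 900.0) -> None:
        self.window_s = window_s      # assumed: minutes, to cover the retry budget
        self.side_effects = 0         # counts real (non-replayed) executions
        self._seen = {}               # key -> (payload digest, result, stored_at)

    def handle(self, key: str, payload: dict, now: float) -> dict:
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        entry = self._seen.get(key)
        if entry is not None and now - entry[2] <= self.window_s:
            if entry[0] != digest:
                raise IdempotencyConflict(key)
            return entry[1]           # cheap replay: cached result, no side effect
        self.side_effects += 1
        result = {"case_id": f"case-{self.side_effects}", "status": "accepted"}
        self._seen[key] = (digest, result, now)
        return result
```

The replay path does no work beyond a hash and a lookup, which is what keeps the agent's retries from showing up on the bill.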
Terminal acknowledgement on the agent's cadence. The pipeline's "done" signal has to match the agent's loop. A webhook on completion with a signed payload, a structured outcome (success, partial, rejected, needs-human), the page hashes that produced the outcome and the model version that ran is what the agent's ground-truth-of-when-to-stop signal consumes. Pipelines that publish a polling endpoint instead of a webhook force the agent into the cost curve the subscription pattern was designed to avoid; pipelines that publish a webhook without a signed payload force the agent to trust an unauthenticated source. The audit-trail shape covered in the audit-trail piece is the one the agent's observability stack expects to receive.
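A sketch of how the receiving agent might verify such a webhook before trusting it. The text above only says the payload is signed; HMAC-SHA256 over the raw body with a shared secret is an assumption here, as are the JSON field names. The outcome values are the four listed above.

```python
import hashlib
import hmac
import json

ALLOWED_OUTCOMES = {"success", "partial", "rejected", "needs-human"}


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed signature scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def parse_terminal_ack(secret: bytes, body: bytes, signature_hex: str) -> dict:
    """Verify the signature first, then validate the structured outcome."""
    expected = sign(secret, body)
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("signature mismatch")
    ack = json.loads(body)
    if ack.get("outcome") not in ALLOWED_OUTCOMES:
        raise ValueError("unknown outcome")
    return ack
```

Verifying against the raw bytes, before any JSON parsing, means an unauthenticated sender never gets the agent to interpret its payload.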
Where the contract is still half-built
Two pieces of the always-on contract are still moving. Any team adopting the patterns above in the next two quarters should know which assumptions are durable and which will be relitigated.
Agent identity for the source. The downstream system can authenticate the agent today: the agent presents a token, the downstream verifies it. The source system, in many cases, cannot. A webhook publisher does not know whether the receiver is the agent the operator approved or a service account that was re-used. Signed agent attestations are emerging, and the identity layer covered in the capability-contracts piece is the one that will settle for the receiver side as well, but the standard is still half a standard. The practical defence is to scope the source's webhook to a single endpoint per agent, rotate the endpoint's secret on the same cadence as any other production credential, and log the source-to-agent path in the audit export.
Cross-agent coordination. One always-on agent on one queue is a solved problem. Two always-on agents on overlapping queues (the AP agent and the vendor-monitoring agent both reacting to the same supplier event) is not. The coordination layer that lets agents declare what they intend to act on, and lets the platform stop a second agent from acting on the same case without an explicit handover, is the next contested piece. The multi-agent piece covers the orchestration shape that survived our first deployment; we expect the contract to tighten over the next two quarters.
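Since this contract is still moving, the following is only a toy illustration of the declare-and-refuse idea, not a description of any shipped coordination layer. The registry, its method names and the agent names are hypothetical, and it is in-memory and single-process.

```python
from typing import Dict, Optional


class ClaimRegistry:
    """First agent to claim a case owns it until an explicit handover."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def claim(self, case_id: str, agent: str) -> bool:
        """Return True if the agent owns the case after this call."""
        owner = self._owner.setdefault(case_id, agent)
        return owner == agent

    def handover(self, case_id: str, from_agent: str, to_agent: str) -> bool:
        """Transfer ownership; only the current owner may hand a case over."""
        if self._owner.get(case_id) != from_agent:
            return False
        self._owner[case_id] = to_agent
        return True

    def owner(self, case_id: str) -> Optional[str]:
        return self._owner.get(case_id)
```

In the supplier-event example, the second agent's `claim` returns `False`, and it can act only after the first agent calls `handover`.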
Closing thought
The always-on shape is not a feature to ship; it is a unit of work to redesign around. The teams that treated 2026 as the year to move their back-office workflows from human-opened tickets to agent-processed events answer the same volume with a smaller queue, a shorter exception path and a budget envelope the CFO can read. The teams that kept the ticket as the unit of work spent the year explaining, on the renewal call, why the latency on their exception path is worse than the peer who installed an always-on agent on the same source. The patterns above are not sufficient to win the year on their own, but no team we have seen ship an always-on deployment to production has skipped them and kept it there.
At Cogneris we built our agent-mode endpoints, streaming intake, idempotency contract and signed terminal acknowledgements for exactly this shape — the always-on agent that calls a document pipeline as part of its loop, not the operator who opens a portal and clicks "process". If you are mapping your back-office workflows against the four patterns above and want to compare what the document layer should look like in an always-on deployment, see our product page, the agents documentation, or talk to our team. We would rather have the rate-limiter and kill-switch conversation in the first 10 minutes than read about it in next quarter's post-mortem.