Audit Trail in Document AI — Definition

Audit trail · Document AI requirement · Last updated 2026-05-07

Definition

An audit trail is an immutable, time-ordered record of every operation performed on a document. In a document AI workflow, the audit trail captures who did what, when, with which model and prompt, against which input, and what was returned. It's the artifact regulators, internal audit, and disputing customers all eventually ask for. Without it, your AI workflow is a black box and you can't defend a decision after the fact.

"The model decided X" is not an audit trail. "Model version 2026.04.18, prompt hash abc123, response payload 0xDE.., reviewer Sarah Chen confirmed at 2026-04-22 14:22:09 UTC, source page 3 region [120, 240, 580, 320]" is an audit trail.

What an AI audit trail must capture

Document lineage — when received, who uploaded it, hash of the file bytes, original format and size, downstream destinations.
Pipeline stages traversed — classifier output, layout regions identified, extraction calls made.
Model and version metadata — for every LLM call: provider, model name, version, deployment region, prompt hash (not necessarily the prompt content), response payload, latency.
Field-level outputs with citations — extracted value, confidence score, source page and region, model version that produced it.
Validation results — which rules ran, which passed, which failed, with what magnitudes.
Human actions — reviewer identity, timestamp, before-value and after-value if changed, free-text reasoning if entered.
Downstream events — webhook fired, ERP record created, document deleted on retention expiry.
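
As a rough illustration, the categories above could be flattened into one structured record per event. This is a minimal sketch, not Cogneris's actual schema: the `AuditEvent` class, its field names, the `sha256_hex` helper, and the sample values (such as `example-provider` and `service:extractor`) are assumptions made for this example. It stores a hash of the prompt rather than the prompt text, as described above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used here for both file bytes and prompt text."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class AuditEvent:
    # Hypothetical schema; field names are illustrative only.
    event_type: str                    # "extraction", "review", "export", ...
    document_sha256: str               # document lineage: hash of file bytes
    actor: str                         # user or service that acted
    occurred_at: str                   # ISO 8601 timestamp with UTC offset
    model_provider: Optional[str] = None
    model_version: Optional[str] = None
    prompt_sha256: Optional[str] = None
    field_name: Optional[str] = None
    value: Optional[str] = None
    confidence: Optional[float] = None
    source_page: Optional[int] = None
    source_region: Optional[tuple] = None   # bounding box on the page
    before_value: Optional[str] = None      # set when a reviewer edits
    after_value: Optional[str] = None

    def to_json_line(self) -> str:
        """Serialize deterministically as one JSON-Lines row."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


event = AuditEvent(
    event_type="extraction",
    document_sha256=sha256_hex(b"%PDF-1.7 example bytes"),
    actor="service:extractor",
    occurred_at="2026-04-22T14:22:09+00:00",
    model_provider="example-provider",
    model_version="2026.04.18",
    prompt_sha256=sha256_hex("Extract the invoice total.".encode("utf-8")),
    field_name="invoice_total",
    value="1842.50",
    confidence=0.97,
    source_page=3,
    source_region=(120, 240, 580, 320),
)
line = event.to_json_line()
```

A human review of the same field would be a second record with `event_type="review"`, the reviewer as `actor`, and `before_value` / `after_value` filled in, rather than an update to the first record.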

Why this matters now

Three forces converged. SOC 2 Type II auditors look for evidence that access controls and change controls operated throughout the audit period, and they test that evidence against sampled operations. GDPR Article 22 gives data subjects the right to contest decisions based solely on automated processing; you can't respond to a challenge without a complete record. EU AI Act Article 12 mandates record-keeping (automatic event logging) for high-risk AI systems. None of these is satisfied by application logs that get deleted after 30 days.

Retention defaults that buyers should expect

Seven years is the right default for audit metadata in regulated workflows. It covers the longest practical statute of limitations for financial-services litigation, the IRS audit window in the US, and many EU-jurisdiction record-keeping rules. If you process protected health information under HIPAA, the federal retention requirement for compliance documentation, which is where audit records typically fall, is shorter (6 years), but retention of the medical-record content itself is set by state law and is often longer.

The audit trail's retention can — and usually should — be longer than the document-content retention. Customers can delete documents on schedule (often 30–90 days for ephemeral workflows) while keeping the audit metadata for the full 7 years. The metadata proves the work was done; the content is no longer needed.
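
The split between content retention and audit-metadata retention can be sketched as two independent deadlines computed from the same receipt date. This is an illustration under stated assumptions: the `RetentionPolicy` class, the `retention_deadlines` function, the 90-day content default, and the choice to start both clocks at the receipt date are all hypothetical. A real policy might start the audit clock at the last recorded event instead.

```python
from dataclasses import dataclass
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Add calendar years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass(frozen=True)
class RetentionPolicy:
    # Assumed defaults for illustration: short-lived content, 7-year metadata.
    content_days: int = 90
    audit_years: int = 7


def retention_deadlines(received: date, policy: RetentionPolicy) -> dict:
    """Return the dates after which content and audit metadata may be purged."""
    return {
        "content_expires": received + timedelta(days=policy.content_days),
        "audit_expires": add_years(received, policy.audit_years),
    }


deadlines = retention_deadlines(date(2026, 5, 7), RetentionPolicy())
# content_expires is 2026-08-05; audit_expires is 2033-05-07
```

When the content deadline passes, the deletion itself should be written to the audit trail as a downstream event, so the trail still shows what happened to the document.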

What auditors actually look for

From running enterprise security questionnaires and SOC 2 audit walkthroughs:

Immutability — can the audit trail be modified or deleted by a privileged user? Append-only is the right answer.
Completeness — pull a random transaction and trace it end-to-end. Every step should have a record. Gaps mean the trail is fiction.
Timestamp accuracy — clock-synchronized to a trusted source, monotonic, with timezone clearly recorded.
Replay capability — can you reproduce the model output from the recorded inputs and version metadata? If not, the model version field isn't doing its job.
Export — can the customer export their audit trail in a structured format (JSON-Lines, CSV) for their own retention?
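
One common way to make an append-only trail checkable is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is a generic illustration of that technique, not a description of how Cogneris stores its trail; the `AuditLog` class, `verify_jsonl` function, and entry layout are assumptions. Note that a hash chain makes tampering detectable rather than impossible: a privileged user could still rewrite the whole chain unless the storage is write-once or the latest hash is anchored somewhere outside their control.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """In-memory append-only log; exposes no update or delete method."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, payload: dict) -> dict:
        # Copy via JSON so later changes to the caller's dict cannot alter the log.
        payload = json.loads(json.dumps(payload))
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {
            "seq": len(self._entries),
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": _digest(prev_hash, payload),
        }
        self._entries.append(entry)
        return entry

    def export_jsonl(self) -> str:
        """Export the trail as JSON-Lines, one entry per line."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self._entries)


def verify_jsonl(text: str) -> bool:
    """Recompute the chain from an export; False on any edit, gap, or reorder."""
    prev_hash = GENESIS
    for seq, row in enumerate(text.splitlines()):
        entry = json.loads(row)
        if entry["seq"] != seq or entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = AuditLog()
log.append({"event": "extraction", "model_version": "2026.04.18"})
log.append({"event": "review", "reviewer": "Sarah Chen",
            "at": "2026-04-22T14:22:09+00:00"})
exported = log.export_jsonl()
tampered = exported.replace("Sarah Chen", "Someone Else")
truncated = "\n".join(exported.splitlines()[1:])  # first entry deleted
```

Here `verify_jsonl(exported)` succeeds, while both the edited and the truncated copies fail verification. This is the kind of check a customer can run on their own exported copy.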

Cogneris captures every extraction, review, and export with the metadata above. Audit retention defaults to 7 years; export is available via API. See the Security page for details, and the blog post "Audit trails for non-deterministic outputs" for how we handle the LLM-specific challenges.
