Definition
Human-in-the-loop — HITL — is a workflow pattern where uncertain or low-confidence machine outputs are routed to a human reviewer before downstream use. In document AI, the human sees the document, the extracted values, the confidence scores, and a highlighted view of the source clause; they confirm or correct, and the corrected output ships downstream. The corrections also feed active learning to improve the model on similar documents.
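The routing step can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `ExtractedField` type, the `route` function, the 0.90 threshold, and the rule "review the document if any field is uncertain" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]


def needs_review(fields, threshold=0.90):
    """Return the fields whose confidence falls below the threshold."""
    return [f for f in fields if f.confidence < threshold]


def route(fields, threshold=0.90):
    """Send the document to a human if any field is uncertain (assumed policy)."""
    return "human_review" if needs_review(fields, threshold) else "auto_accept"


# Hypothetical extraction result for one document.
fields = [
    ExtractedField("invoice_total", "1250.00", 0.98),
    ExtractedField("due_date", "2024-07-01", 0.71),
]
decision = route(fields)
flagged = [f.name for f in needs_review(fields)]
```

Here the document goes to review because `due_date` scores below the threshold; the reviewer would see only the flagged field highlighted, alongside the source.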
Why HITL matters
Document AI is statistical. Even 99% field-level accuracy means 1 in 100 fields is wrong, and on a 12-field document that's roughly a 1-in-9 chance of at least one error (1 − 0.99^12 ≈ 11%, assuming field errors are independent). For most real workflows (lending, claims, KYC, contract review) that error rate is unacceptable to ship blindly. HITL is how you reach 99.9%+ effective accuracy: the model handles the easy 90% or so, the human handles the 10% the model wasn't sure about, and you get accuracy that exceeds what either could do alone.
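The document-level figure follows from compounding the per-field accuracy. A small sketch of that arithmetic, assuming errors on different fields are independent (real documents often violate this, so treat the result as a rough estimate); the function name is made up for the example:

```python
def p_at_least_one_error(field_accuracy: float, n_fields: int) -> float:
    """Probability that a document contains at least one wrong field,
    assuming each field is wrong independently of the others."""
    return 1.0 - field_accuracy ** n_fields


p = p_at_least_one_error(0.99, 12)
print(f"{p:.3f}")  # prints 0.114, i.e. roughly 1 in 9
```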
Done well, HITL is a force multiplier. Done poorly, HITL is a euphemism for "we ship 80% accuracy and call the queue a feature."
Common pitfalls
Uncalibrated confidence scores. The whole HITL routing depends on confidence. A calibrated 0.95 means that about 95% of the outputs scored 0.95 are actually correct. If one model's 0.95 means something different from another model's 0.95, you're routing the wrong documents to the queue. Calibration is a per-tenant, per-document-type engineering problem, not a free input from the LLM.
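One common way to check calibration is a reliability table: bin predictions by confidence and compare the mean confidence in each bin with the observed accuracy on a labeled sample. The sketch below uses equal-width bins and an expected calibration error summary; the function names and the sample data are illustrative assumptions, and in practice you would run this per tenant and per document type.

```python
def reliability_table(confidences, correct, n_bins=10):
    """Return (mean_confidence, accuracy, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # a score of 1.0 goes in the top bin
        bins[idx].append((c, ok))
    rows = []
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        rows.append((mean_conf, accuracy, len(items)))
    return rows


def expected_calibration_error(confidences, correct, n_bins=10):
    """Count-weighted average gap between confidence and accuracy."""
    total = len(confidences)
    if total == 0:
        return 0.0
    rows = reliability_table(confidences, correct, n_bins)
    return sum(n / total * abs(conf - acc) for conf, acc, n in rows)


# Hypothetical sample: 20 fields scored 0.95, of which only 15 were right.
confs = [0.95] * 20
labels = [True] * 15 + [False] * 5
rows = reliability_table(confs, labels)
ece = expected_calibration_error(confs, labels)
```

In this made-up sample the model claims 0.95 but delivers 75% accuracy, a gap of about 0.20, so a threshold of 0.95 would auto-accept far more errors than the score suggests.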
Reviewer fatigue. Send too many low-stakes items to the queue and reviewers click through without reading. Now you've manufactured the appearance of human oversight without the substance. Audit your queue: what fraction of human reviews actually changed the output? If it's under 5%, your routing is too aggressive: the threshold is sending reviewers items the model already gets right.
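The queue audit reduces to one ratio. A minimal sketch, assuming each review is logged as a (value before, value after) pair; the function name and sample values are hypothetical:

```python
def correction_rate(reviews):
    """Fraction of reviewed items where the reviewer changed the value."""
    if not reviews:
        return 0.0
    changed = sum(1 for before, after in reviews if before != after)
    return changed / len(reviews)


# Hypothetical review log: one correction out of four reviews.
reviews = [
    ("$1,200", "$1,250"),
    ("ACME Ltd", "ACME Ltd"),
    ("2024-07-01", "2024-07-01"),
    ("NET30", "NET30"),
]
rate = correction_rate(reviews)
```

A low rate can also mean reviewers are rubber-stamping real errors, so it is worth pairing this number with a spot check of confirmed items against ground truth.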
No corrective feedback loop. If reviewer corrections don't flow back into model improvement, the same documents keep landing in the queue forever. Active learning closes the loop.
Missing the audit trail. "Human reviewed at 14:22 UTC, corrected field X from $1,200 to $1,250" is auditable. "Human approved" is not. Reviewer identity, timestamp, before/after values, and reasoning notes belong in the audit trail.
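As an illustration, an audit record carrying those four elements might look like the sketch below. The `ReviewEvent` type, its field names, the identifiers, the date, and the note text are all placeholders invented for the example; only the 14:22 UTC time and the $1,200 to $1,250 correction come from the text above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewEvent:
    document_id: str
    field_name: str
    reviewer_id: str
    reviewed_at: str  # ISO 8601 timestamp in UTC
    value_before: str
    value_after: str
    note: str = ""

    @property
    def changed(self) -> bool:
        """True when the reviewer corrected rather than confirmed."""
        return self.value_before != self.value_after


event = ReviewEvent(
    document_id="doc-001",        # placeholder
    field_name="invoice_total",   # placeholder for "field X"
    reviewer_id="reviewer-17",    # placeholder
    reviewed_at=datetime(2024, 1, 1, 14, 22, tzinfo=timezone.utc).isoformat(),
    value_before="$1,200",
    value_after="$1,250",
    note="Total on the source page includes a line item the model missed.",
)

# One JSON object per line is a simple append-only log format.
line = json.dumps(asdict(event))
```

Storing the before and after values on every event, including plain confirmations, also gives you the raw data for auditing how often reviews change anything.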
How to size the queue
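Three knobs: confidence threshold, document throughput, and reviewer time per document. The threshold sets the fraction of documents routed to review; multiplying that fraction by throughput and by review time per document gives the reviewer hours required, and dividing by the productive hours each reviewer works gives headcount. A typical setup with calibrated confidence routes 10–30% of documents to review, and a trained reviewer handles 60–120 documents per hour for invoice-class workflows. Numbers vary by document type: a multi-page MSA review is closer to 4–8 documents per hour.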
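A back-of-the-envelope version of the sizing arithmetic, treating the confidence threshold as the fraction of documents it routes to review and reviewer time per document as the inverse of documents per reviewer-hour. The daily volumes and the 6 productive hours per reviewer per day are assumptions chosen for illustration; the review rate and reviewer speeds fall inside the ranges quoted above.

```python
import math


def reviewers_needed(docs_per_day, review_rate, docs_per_reviewer_hour,
                     hours_per_reviewer_day=6.0):
    """Estimate headcount from volume, routing fraction, and reviewer speed."""
    review_docs = docs_per_day * review_rate
    review_hours = review_docs / docs_per_reviewer_hour
    return math.ceil(review_hours / hours_per_reviewer_day)


# Invoice-class workflow: 10,000 docs/day (assumed), 20% routed, 90 docs/hour.
invoice_team = reviewers_needed(10_000, 0.20, 90)

# Multi-page MSA review: 500 docs/day (assumed), 20% routed, 6 docs/hour.
msa_team = reviewers_needed(500, 0.20, 6)
```

With these inputs the invoice queue needs 4 reviewers and the MSA queue needs 3, even though the MSA volume is twenty times smaller, which shows how strongly reviewer speed drives headcount.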