Definition
KYC (Know Your Customer) is the regulatory requirement for financial institutions, fintechs, marketplaces, crypto platforms, and other regulated entities to verify the identity of their customers and assess the risk they pose. The requirement comes from anti-money-laundering (AML) and counter-terrorism-financing (CTF) rules, with the specifics varying by jurisdiction (the BSA in the US, the AMLDs in the EU, MLR 2017 in the UK, the PMLA in India, and comparable regimes elsewhere).
In practice, KYC means: collect identity documents, verify they're authentic, confirm the person presenting them is the rightful holder, and run sanctions and PEP screening. Document AI sits at the document-verification stage.
Document types in a typical KYC pipeline
- Government-issued photo ID — passport, national ID card, driver license. These are the primary identity proofs and contain the highest-value extraction targets (name, DOB, document number, MRZ data, expiry, issuing country).
- Proof of address — utility bill, bank statement, lease. Verifies residential address; less standardized format than ID.
- Proof of source of funds (enhanced KYC) — payslips, tax returns, employer letters, bank statements showing income.
- Selfie + liveness — biometric face match against the photo ID. Document AI doesn't do this directly, but the document extraction feeds it the reference image.
- Beneficial-ownership documents — for corporate KYC, the registration certificate, articles of association, and UBO declarations.
What "good" looks like in 2026
A KYC-grade document AI system hits four bars:
Accuracy on damaged inputs. KYC photos arrive crumpled, washed out by glare, rotated, or partially obscured. Field-level accuracy on clean documents is meaningless if it collapses on the inputs customers actually submit.
Fraud signal extraction. Beyond reading the fields, the system should flag fraud signals: signs of image manipulation, implausible expiry dates, MRZ check-digit failures, and missing security features (such as the ICAO-standard contactless chip in chip-equipped passports). A pure extraction system without fraud signals is incomplete for KYC.
Audit-grade outputs. Every extraction needs a citation (page, region, bounding box) and an immutable record. KYC is one of the workflows where regulators actually pull the audit trail. AML record retention typically runs five to seven years, depending on jurisdiction.
Regulatory posture. Look for encryption at rest with per-tenant keys, a GDPR data processing agreement (DPA) with EU standard contractual clauses (SCCs), configurable data residency, and a HIPAA business associate agreement (BAA) on Enterprise (for cases that touch healthcare KYC). Vendors without that posture are off the table.
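The MRZ checksum verification named under fraud signal extraction can be illustrated with a short sketch. It assumes the check-digit scheme from ICAO Doc 9303 (repeating weights 7, 3, 1; digits at face value, A to Z as 10 to 35, the filler "<" as 0; sum modulo 10) and the 44-character second line of a TD3 passport MRZ. The function names are hypothetical and do not describe any particular vendor's API; the sample line is the specimen widely reproduced from the ICAO documentation.

```python
"""Sketch: ICAO Doc 9303 check digits for line 2 of a TD3 (passport) MRZ."""

WEIGHTS = (7, 3, 1)  # repeating weight pattern used by the check-digit scheme


def char_value(ch: str) -> int:
    """Map an MRZ character to its numeric value: 0-9, A=10..Z=35, '<'=0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of the field's character values, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_ok(field: str, check_char: str) -> bool:
    """True when check_char is the correct check digit for field."""
    try:
        return "0" <= check_char <= "9" and int(check_char) == check_digit(field)
    except ValueError:  # characters outside the MRZ alphabet, e.g. OCR noise
        return False


def verify_td3_line2(line: str) -> dict[str, bool]:
    """Check each check digit in a 44-character TD3 second line (hypothetical helper)."""
    if len(line) != 44:
        raise ValueError("TD3 line 2 must be exactly 44 characters")
    personal = line[28:42]
    # An unused personal-number field may carry '<' instead of a check digit.
    personal_ok = (line[42] == "<" and set(personal) == {"<"}) or field_ok(
        personal, line[42]
    )
    # The composite digit covers the document number, dates, and personal
    # number together with their own check digits.
    composite_input = line[0:10] + line[13:20] + line[21:43]
    return {
        "document_number": field_ok(line[0:9], line[9]),
        "birth_date": field_ok(line[13:19], line[19]),
        "expiry_date": field_ok(line[21:27], line[27]),
        "personal_number": personal_ok,
        "composite": field_ok(composite_input, line[43]),
    }


if __name__ == "__main__":
    specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
    print(verify_td3_line2(specimen))
```

A failed check digit is best treated as a signal rather than proof of fraud, since an OCR misread on a damaged photo produces the same failure; a reasonable design routes such cases to re-capture or manual review.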
What to ask vendors
- Which ID document types and issuing countries do you cover, and what's the accuracy on each?
- Do you do MRZ checksum verification and security-feature detection, or just text extraction?
- How do you handle face-match and liveness — in-house, or pass-through to a partner?
- What's your retention default, and is it configurable down to 0 days?
- Can you support our SAR/STR audit-trail format for the AML reporting authority?
Cogneris covers passport, national ID, and driver license extraction with MRZ verification and document-level fraud signals. See KYC document extraction for the full posture.