Document AI Products

Supported types

40+ document types, ready to use.

Each one calibrated for the formats, languages, currencies and local rules your customers actually use.

Document type	Category
Payslip	HR
Bank statement	Credit
Tax return	Tax
Tax invoice	Tax
Articles of incorporation	Legal
Driver's license	Onboarding
National ID	Onboarding
Proof of address	Onboarding
Bill / invoice	Finance
Service invoice	Tax
Receipt	Finance
Purchase order	Finance

Developer experience

Three lines of code. One clean JSON response.

Official SDKs for C#, Python, Node and Go. Webhooks, async jobs and batches of up to 10,000 documents per request.

Start with the document extraction API, parse layout and sections with the document parsing API, or use the OCR API when scans and photos need to become validated data.

extract.cs

// Cogneris SDK — C#
var client = new CognerisClient("fx_live_...");

var result = await client.Documents.ExtractAsync(new {
  Type       = "payslip",
  File       = "payslip.pdf",
  FraudCheck = true
});

// { salary: 12480, taxId: "...", fraud_score: 0.03 }

What ships in the box

Six capabilities, one orchestration.

The full document AI stack runs as one orchestrated pipeline. Every page goes through the same agentic flow — classify, extract, validate, score, audit — and lands as structured JSON ready for your downstream system. For API-first teams, these capabilities map to the document extraction API, document parsing API, and OCR API.

Capability	What it does
Ingestion	PDF (digital and scanned), JPEG, PNG, TIFF, DOCX, and email attachments. Multi-page documents stitched automatically. Inbox monitoring, webhook intake, and direct API upload.
Classification	Auto-detection across 40+ document types out of the box. Custom types ship from sample documents in 1 business day on Enterprise. Returns confidence scores and routing decisions.
Extraction	ReAct-architected agents pull structured fields with per-field confidence. Cross-field math, date, and entity validation runs in the same pass. Span pointers preserve provenance to the source paragraph.
Validation	Schema validation, cross-document reconciliation, and configurable business rules. Low-confidence fields route to human-in-the-loop (HITL) review queues; reviewer corrections feed active learning.
Fraud signals	Synthetic-document detection, template tampering, font and metadata anomalies, cross-document consistency checks, and configurable fraud scoring per workflow.
Audit trail	Every extraction logs model version, prompt hash, response, reviewer ID, and per-field confidence. 7-year retention configurable per workflow. Exportable for SOC 2, GLBA, HIPAA, and GDPR examinations.
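The routing and audit steps in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Cogneris implementation. The `ExtractedField` type, the 0.85 confidence threshold, and the audit-record keys are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ExtractedField:
    """Hypothetical shape of one extracted field (not the Cogneris response schema)."""
    name: str
    value: object
    confidence: float  # 0.0 to 1.0

def route_fields(fields, threshold=0.85):
    """Split fields into auto-accepted ones and ones sent to HITL review.

    The 0.85 threshold is an assumed value for illustration only.
    """
    accepted, review = [], []
    for field in fields:
        (accepted if field.confidence >= threshold else review).append(field)
    return accepted, review

def audit_record(model_version, prompt, response, fields, reviewer_id=None):
    """Build one audit entry holding the items the table lists (keys are hypothetical)."""
    return {
        "model_version": model_version,
        # SHA-256 hex digest identifying the exact prompt text that was sent
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response": response,
        "reviewer_id": reviewer_id,  # None until a human reviews the document
        "field_confidence": {f.name: f.confidence for f in fields},
    }

fields = [
    ExtractedField("net_pay", 3120.55, 0.98),
    ExtractedField("employer_tax_id", "12-3456789", 0.61),
]
accepted, review = route_fields(fields)
record = audit_record("extractor-v1", "Extract payslip fields", {"net_pay": 3120.55}, fields)
print([f.name for f in review])  # ['employer_tax_id']
```

A single global threshold keeps the sketch short. A real workflow would more likely tune thresholds per field or per document type.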

By document type

Pre-trained extractors for the documents you already process.

Each extractor ships with the right schema, validators, and downstream connectors. Drop in a sample document, get structured JSON back in seconds.

Finance / AP

Invoice extraction

Vendor, line items, GL-code hints, totals. 3-way match ready for NetSuite, SAP, Oracle, Workday, QuickBooks, Xero.
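A 3-way match compares each invoice line against two other documents. The purchase order records what was ordered and at what price, and the goods receipt records what actually arrived. Below is a minimal Python sketch that assumes simple dictionaries keyed by SKU and a one-cent price tolerance. The data shapes and the `three_way_match` name are hypothetical and do not reflect any particular ERP's schema.

```python
from decimal import Decimal

def three_way_match(po_lines, receipt_lines, invoice_lines, price_tolerance=Decimal("0.01")):
    """Return a list of exception strings; an empty list means the invoice matches.

    po_lines:      {sku: (quantity_ordered, unit_price)}
    receipt_lines: {sku: quantity_received}
    invoice_lines: {sku: (quantity_billed, unit_price)}
    """
    exceptions = []
    for sku, (qty_billed, inv_price) in invoice_lines.items():
        if sku not in po_lines:
            exceptions.append(f"{sku}: not on purchase order")
            continue
        qty_ordered, po_price = po_lines[sku]
        qty_received = receipt_lines.get(sku, 0)
        # Never pay for more than arrived or more than was ordered
        if qty_billed > qty_received:
            exceptions.append(f"{sku}: billed {qty_billed}, received {qty_received}")
        if qty_billed > qty_ordered:
            exceptions.append(f"{sku}: billed {qty_billed}, ordered {qty_ordered}")
        # Unit price must agree with the PO within the tolerance
        if abs(inv_price - po_price) > price_tolerance:
            exceptions.append(f"{sku}: price {inv_price} vs PO {po_price}")
    return exceptions

po = {"SKU-1": (10, Decimal("4.50")), "SKU-2": (5, Decimal("12.00"))}
received = {"SKU-1": 10, "SKU-2": 3}
invoice = {"SKU-1": (10, Decimal("4.50")), "SKU-2": (5, Decimal("12.40"))}
for problem in three_way_match(po, received, invoice):
    print(problem)
# SKU-2: billed 5, received 3
# SKU-2: price 12.40 vs PO 12.00
```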

Banking / lending

Bank statement extraction

Transactions, balances, recurring deposits. Sub-3-second parsing on US and Canadian statements for cash-flow underwriting.
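Balance reconciliation is a basic statement sanity check: the opening balance plus every transaction should equal the closing balance. The minimal sketch below assumes credits are positive, debits are negative, and amounts are already parsed into `Decimal`. `reconcile_statement` is a hypothetical helper.

```python
from decimal import Decimal

def reconcile_statement(opening, transactions, closing):
    """Return closing minus (opening + sum of signed transactions).

    Decimal avoids binary floating-point rounding on currency amounts.
    A result of zero means the statement reconciles.
    """
    computed = opening + sum(transactions, Decimal("0"))
    return closing - computed

opening = Decimal("1520.00")
txns = [Decimal("2400.00"), Decimal("-1350.75"), Decimal("-89.99")]
closing = Decimal("2479.26")
print(reconcile_statement(opening, txns, closing))  # 0.00
```

A non-zero result usually points to a missed or misread transaction line, which is a natural candidate for review routing.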

Legal / CLM

Contract extraction

Parties, term, governing law, liability caps, named clauses. Span pointers to source paragraph for legal review.

Onboarding / CIP

KYC document extraction

Government IDs, proof of address, beneficial-ownership certifications. MRZ parsing and sanctions/PEP screening hooks.
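MRZ parsing includes verifying check digits. ICAO Doc 9303 defines each check digit as a weighted sum modulo 10, using repeating weights 7, 3, 1. In that sum, digits keep their value, letters A to Z count as 10 to 35, and the `<` filler counts as 0. Below is a minimal Python sketch of that published algorithm (not Cogneris code), using fields from the ICAO specimen passport (document number L898902C3).

```python
def mrz_char_value(ch: str) -> int:
    """ICAO Doc 9303 character values: digits as-is, A-Z = 10-35, filler '<' = 0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")

def mrz_check_digit(field: str) -> int:
    """Weighted sum of character values with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10

def mrz_field_valid(field: str, check: str) -> bool:
    """True if the printed check digit matches the computed one."""
    return check.isdigit() and mrz_check_digit(field) == int(check)

print(mrz_check_digit("L898902C3"))  # 6 (document number)
print(mrz_check_digit("740812"))     # 2 (date of birth)
print(mrz_check_digit("120415"))     # 9 (date of expiry)
```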

HR / lending

Payroll extraction

Pay stubs, W-2s, 1099s. Gross, deductions, net pay, and YTD reconciliation for income verification.
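Income verification from a pay stub comes down to a few arithmetic checks. Gross minus deductions should equal net for both the current period and year to date. The period gross can also be annualized by pay frequency. Below is a minimal sketch; the function names, the one-cent tolerance, and the sample figures are hypothetical.

```python
from decimal import Decimal

# Standard pay periods per year for common US pay frequencies.
PERIODS_PER_YEAR = {"weekly": 52, "biweekly": 26, "semimonthly": 24, "monthly": 12}

def check_pay_stub(gross, deductions, net, ytd_gross, ytd_deductions, ytd_net,
                   tolerance=Decimal("0.01")):
    """Return a list of arithmetic problems on a pay stub (empty if consistent)."""
    problems = []
    if abs(gross - sum(deductions, Decimal("0")) - net) > tolerance:
        problems.append("current period: gross - deductions != net")
    if abs(ytd_gross - ytd_deductions - ytd_net) > tolerance:
        problems.append("YTD: gross - deductions != net")
    if ytd_gross < gross:
        problems.append("YTD gross is smaller than current-period gross")
    return problems

def annualized_income(gross, frequency):
    """Scale one period's gross pay to a yearly figure."""
    return gross * PERIODS_PER_YEAR[frequency]

stub = dict(
    gross=Decimal("3200.00"),
    deductions=[Decimal("384.00"), Decimal("198.40"), Decimal("46.40")],
    net=Decimal("2571.20"),
    ytd_gross=Decimal("19200.00"),
    ytd_deductions=Decimal("3772.80"),
    ytd_net=Decimal("15427.20"),
)
print(check_pay_stub(**stub))                        # []
print(annualized_income(stub["gross"], "biweekly"))  # 83200.00
```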

Tax / lending

Tax return extraction

1040s, K-1s, schedules. AGI, deductions, credits, refund or amount owed — with cross-schedule math validation.

T&E / expense

Receipt extraction

Itemized receipts with merchant, line items, tax, and tip handling. Routing for Expensify, Concur, Brex, Ramp.

Logistics

Bill of lading OCR

Carriers, consignees, shipment IDs, weights, charges, and freight line items for logistics workflows.

Payments

Check OCR API

Payer, payee, amount, date, MICR context, and remittance fields for payment operations.
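The MICR line on a US check carries the 9-digit ABA routing number, which has a public checksum. The sum 3 × (d1 + d4 + d7) + 7 × (d2 + d5 + d8) + (d3 + d6 + d9) must be divisible by 10. Below is a minimal Python sketch of that standard check (not the Cogneris API).

```python
def aba_routing_valid(routing: str) -> bool:
    """Validate a US ABA routing number with the 3-7-1 weighted checksum."""
    if len(routing) != 9 or not routing.isascii() or not routing.isdigit():
        return False
    d = [int(c) for c in routing]
    total = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return total % 10 == 0

print(aba_routing_valid("021000021"))  # True
print(aba_routing_valid("021000022"))  # False
```

A failed checksum is a cheap signal that the MICR read was wrong, before any lookup against a routing directory.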

Insurance

ACORD forms extraction

Policy, applicant, vehicle, property, coverage, premium, and loss data from insurance packets.

Insurance

Insurance claim extraction

FNOL, police reports, medical bills, repair estimates. Cross-document reconciliation for claim adjudication.

Integrations

Lives where your systems already run.

Pre-built connectors push extracted data directly into the system of record. No intermediate data-entry step, no manual reconciliation, no brittle CSV middleware.

ERP and finance: NetSuite, SAP S/4HANA, Oracle Cloud ERP, Workday Financials, QuickBooks Online, Xero, Sage Intacct.
CRM and revenue: Salesforce, Salesforce Financial Services Cloud, HubSpot. Customer records, opportunities, and applicants land as structured objects.
Lending and core banking: nCino, Encompass, Mambu, Alloy, Q2, Jack Henry, Fiserv, FIS, Plaid, MX.
CLM and legal: Ironclad, Agiloft, ContractWorks, Concord. Clauses, parties, and term data flow with span pointers preserved.
HR and payroll: Workday HCM, ADP, The Work Number, Gusto, Justworks. Wage and YTD data flows into HRIS and verification workflows.
T&E and expense: Expensify, Concur, Brex, Ramp, Pleo. Receipts and reimbursements route to the right policy and ledger code.
Onboarding and KYC: Persona, Alloy, Sumsub, ComplyAdvantage. ID verification and CIP/CDD packets with sanctions screening triggers.
Custom integrations: REST API and webhook events. SDKs for C# / .NET, Python, Node.js, and Go. Enterprise contracts include scoped custom-connector engineering.

Security & compliance

We treat your documents the way you would.

Privacy by design

Configurable retention, automatic PII anonymization, and data-processing agreements (DPAs) out of the box.

Encryption in transit and at rest

TLS 1.3 in transit. AES-256 at rest. Keys managed by a dedicated KMS instance per customer.

Dedicated VPC

For regulated companies, we offer isolated deployment in your cloud or on-premises.

FAQ

Common questions.

Can I run Cogneris on document types you don't list?

Yes. Custom document types ship from sample documents in 1 business day on Enterprise — send 10 to 50 examples, get a calibrated extractor with the right schema, validators, and confidence thresholds. The same agentic pipeline handles them; the only thing that changes is the schema you target.

What's the synchronous vs. async limit?

Synchronous extraction handles documents up to 10 pages. Above that, async mode with a webhook callback is recommended — same per-page rate, just a different request shape. Batch endpoints accept up to 10,000 documents per request for back-office processing.
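The sync/async threshold and the batch limit translate into a simple client-side decision. Below is a minimal sketch using the two limits stated in this answer. The mode names and helper functions are hypothetical, not SDK calls.

```python
from typing import Iterator, List, Sequence

SYNC_PAGE_LIMIT = 10   # synchronous extraction handles up to 10 pages
BATCH_LIMIT = 10_000   # batch endpoints accept up to 10,000 documents per request

def choose_mode(page_count: int) -> str:
    """Pick a request mode for one document (mode names are illustrative)."""
    return "sync" if page_count <= SYNC_PAGE_LIMIT else "async_with_webhook"

def batches(doc_ids: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Split a large job into batch requests no larger than the per-request limit."""
    for start in range(0, len(doc_ids), size):
        yield list(doc_ids[start:start + size])

print(choose_mode(3), choose_mode(42))  # sync async_with_webhook
sizes = [len(b) for b in batches([f"doc-{i}" for i in range(25_000)])]
print(sizes)  # [10000, 10000, 5000]
```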

How does Cogneris handle multi-language documents?

Thirty-plus languages are supported with field-level localization. Currency conversion, date format normalization (DD/MM vs. MM/DD), and address parsing follow the source-document locale. Output JSON normalizes to ISO formats (ISO 4217 for currency, ISO 8601 for dates, ISO 3166 for countries) so downstream systems get consistent data regardless of source language.
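Locale-aware date normalization shows why the source locale matters: `03/04/2024` is March 4 in the US and April 3 in the UK. Below is a minimal Python sketch using the standard library's `datetime.strptime`. The locale-to-pattern table is a hypothetical, deliberately tiny illustration.

```python
from datetime import datetime

# Hypothetical locale-to-pattern table; a real system would cover many more locales.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",  # MM/DD/YYYY
    "en_GB": "%d/%m/%Y",  # DD/MM/YYYY
    "de_DE": "%d.%m.%Y",  # DD.MM.YYYY
}

def to_iso_date(raw: str, locale: str) -> str:
    """Parse a date using the source document's locale and emit ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw.strip(), DATE_PATTERNS[locale]).date().isoformat()

print(to_iso_date("03/04/2024", "en_US"))  # 2024-03-04
print(to_iso_date("03/04/2024", "en_GB"))  # 2024-04-03
print(to_iso_date("31.12.2023", "de_DE"))  # 2023-12-31
```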

What happens to documents that fail validation?

Low-confidence fields or schema/business-rule failures route to a configurable HITL queue. Reviewers see the original document, the extracted value, the confidence score, and the validation error — they correct in-line, and corrections feed active learning to lift confidence on the next similar document. Full audit trail (model version, prompt hash, response, reviewer ID) is preserved.

Can Cogneris run in our cloud or on-premises?

Yes. VPC-deployed processing is available on Enterprise, with a customer-managed control plane in your AWS, GCP, or Azure tenant. The data plane never leaves your network boundary; you bring your own KMS keys (BYOK) for envelope encryption. PrivateLink and Private Service Connect are supported for the managed-tenant deployment so traffic stays off the public internet.

How fast is the integration?

Three lines of code to extract your first document via the REST API. Most teams ship a first extractor into production within 1 week and reach steady-state accuracy on their document mix in 30 days. For pre-built ERP, CRM, and CLM connectors, integration is configuration rather than engineering work. See the docs or the API reference.

The entire document processing stack. In one API.


Ready for your first extraction?