The product

The entire document processing stack. In one API.

From ingestion to structured data — classification, contextual OCR, extraction, validation and fraud detection. Run it as SaaS or in a dedicated VPC.

Supported types

40+ documents ready to use.

Each one calibrated for the formats, languages, currencies and local rules your customers actually use.

  • Payslip (HR)
  • Bank statement (Credit)
  • Tax return (Tax)
  • Tax invoice (Tax)
  • Articles of incorporation (Legal)
  • Driver's license (Onboarding)
  • National ID (Onboarding)
  • Proof of address (Onboarding)
  • Bill / invoice (Finance)
  • Service invoice (Tax)
  • Receipt (Finance)
  • Purchase order (Finance)
Developer experience

Three lines of code. One clean JSON.

Official SDKs for C# / .NET, Python, Node.js and Go. Webhooks, async jobs and batches of up to 10,000 documents per request.

Start with the document extraction API, parse layout and sections with the document parsing API, or use the OCR API when scans and photos need to become validated data.

extract.cs
// Cogneris SDK — C#
var client = new CognerisClient("fx_live_...");

var result = await client.Documents.ExtractAsync(new {
  Type       = "payslip",
  File       = "payslip.pdf",
  FraudCheck = true
});

// { salary: 12480, taxId: "...", fraud_score: 0.03 }
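
The request limits stated on this page (synchronous extraction up to 10 pages, async with a webhook callback above that, and batches of up to 10,000 documents per request) can be sketched as a small planning helper. This is a minimal illustration in Python, not part of the Cogneris SDK: the `Doc` class and the `request_mode` and `batches` functions are hypothetical names introduced here, and only the two numeric limits come from the page.

```python
from __future__ import annotations

from dataclasses import dataclass

SYNC_MAX_PAGES = 10      # from this page: synchronous extraction handles up to 10 pages
BATCH_MAX_DOCS = 10_000  # from this page: up to 10,000 documents per batch request


@dataclass
class Doc:
    """Hypothetical local description of a document waiting to be sent."""
    name: str
    pages: int


def request_mode(doc: Doc) -> str:
    """Return 'sync' for short documents, 'async' (webhook callback) otherwise."""
    return "sync" if doc.pages <= SYNC_MAX_PAGES else "async"


def batches(docs: list[Doc], size: int = BATCH_MAX_DOCS) -> list[list[Doc]]:
    """Split a backlog into consecutive chunks no larger than the batch limit."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]
```

A caller would check `request_mode` for interactive uploads and use `batches` when draining a back-office backlog, then hand each chunk to whichever SDK or REST call the API reference documents.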
What ships in the box

Six capabilities, one orchestration.

The full document AI stack runs as one orchestrated pipeline. Every page goes through the same agentic flow — classify, extract, validate, score, audit — and lands as structured JSON ready for your downstream system. For API-first teams, these capabilities map to the document extraction API, document parsing API, and OCR API.

Capability: What it does

  • Ingestion: PDF (digital and scanned), JPEG, PNG, TIFF, DOCX, and email attachments. Multi-page documents stitched automatically. Inbox monitoring, webhook intake, and direct API upload.
  • Classification: Auto-detection across 40+ document types out of the box. Custom types ship from sample documents in 1 business day on Enterprise. Returns confidence scores and routing decisions.
  • Extraction: ReAct-architected agents pull structured fields with per-field confidence. Cross-field math, date, and entity validation runs in the same pass. Span pointers preserve provenance to the source paragraph.
  • Validation: Schema validation, cross-document reconciliation, and configurable business rules. Low-confidence fields route to HITL queues; reviewer corrections feed active learning.
  • Fraud signals: Synthetic-document detection, template tampering, font and metadata anomalies, cross-document consistency checks, and configurable fraud scoring per workflow.
  • Audit trail: Every extraction logs model version, prompt hash, response, reviewer ID, and per-field confidence. 7-year retention configurable per workflow. Exportable for SOC 2, GLBA, HIPAA, and GDPR examinations.
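
The routing of low-confidence fields to a HITL queue can be pictured with a short Python sketch. It assumes each extracted field arrives as a value plus a per-field confidence; that payload shape, the `split_for_review` function, and the 0.85 threshold are all hypothetical and chosen only for illustration, since the page does not publish the response schema or default thresholds.

```python
from __future__ import annotations

REVIEW_THRESHOLD = 0.85  # hypothetical value; real thresholds are configured per workflow


def split_for_review(
    fields: dict[str, dict], threshold: float = REVIEW_THRESHOLD
) -> tuple[dict, dict]:
    """Split extracted fields into (accepted, needs_review) by per-field confidence."""
    accepted: dict = {}
    needs_review: dict = {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["value"]
    return accepted, needs_review


# Assumed payload shape: {"field": {"value": ..., "confidence": ...}}
extraction = {
    "salary": {"value": 12480, "confidence": 0.98},
    "taxId": {"value": "...", "confidence": 0.61},
}
accepted, needs_review = split_for_review(extraction)
```

Here `salary` would pass straight through and `taxId` would wait for a reviewer; in the pipeline described above, the reviewer's correction would then feed active learning.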
By document type

Pre-trained extractors for the documents you already process.

Each extractor ships with the right schema, validators, and downstream connectors. Drop in a sample document, get structured JSON back in seconds.

Finance / AP
Invoice extraction
Vendor, line items, GL-code hints, totals. 3-way match ready for NetSuite, SAP, Oracle, Workday, QuickBooks, Xero.
Banking / lending
Bank statement extraction
Transactions, balances, recurring deposits. Sub-3-second parsing on US and Canadian statements for cash-flow underwriting.
Legal / CLM
Contract extraction
Parties, term, governing law, liability caps, named clauses. Span pointers to source paragraph for legal review.
Onboarding / CIP
KYC document extraction
Government IDs, proof of address, beneficial-ownership certifications. MRZ parsing and sanctions/PEP screening hooks.
HR / lending
Payroll extraction
Pay stubs, W-2s, 1099s. Gross, deductions, net pay, and YTD reconciliation for income verification.
Tax / lending
Tax return extraction
1040s, K-1s, schedules. AGI, deductions, credits, refund or amount owed — with cross-schedule math validation.
T&E / expense
Receipt extraction
Itemized receipts with merchant, line items, tax, and tip handling. Routing for Expensify, Concur, Brex, Ramp.
Logistics
Bill of lading OCR
Carriers, consignees, shipment IDs, weights, charges, and freight line items for logistics workflows.
Payments
Check OCR API
Payer, payee, amount, date, MICR context, and remittance fields for payment operations.
Insurance
ACORD forms extraction
Policy, applicant, vehicle, property, coverage, premium, and loss data from insurance packets.
Insurance
Insurance claim extraction
FNOL, police reports, medical bills, repair estimates. Cross-document reconciliation for claim adjudication.
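
The cross-field math checks mentioned for payroll and tax extraction come down to simple arithmetic identities. As a minimal sketch in Python, assuming amounts are available as decimal strings, a pay-stub check might confirm that gross pay minus deductions equals net pay. The `net_pay_matches` function, its parameters, and the one-cent tolerance are hypothetical and are not taken from the Cogneris validators.

```python
from __future__ import annotations

from decimal import Decimal


def net_pay_matches(
    gross: str, deductions: list[str], net: str, tolerance: str = "0.01"
) -> bool:
    """Check that gross minus the sum of deductions equals net, within a tolerance."""
    total_deductions = sum((Decimal(d) for d in deductions), Decimal("0"))
    expected_net = Decimal(gross) - total_deductions
    return abs(expected_net - Decimal(net)) <= Decimal(tolerance)
```

`Decimal` is used instead of `float` so that currency amounts compare exactly; a failed check is the kind of result that would plausibly send a document to review rather than straight through.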
Integrations

Lives where your systems already run.

Pre-built connectors push extracted data directly into the system of record. No intermediate data-entry step, no manual reconciliation, no brittle CSV middleware.

  • ERP and finance: NetSuite, SAP S/4HANA, Oracle Cloud ERP, Workday Financials, QuickBooks Online, Xero, Sage Intacct.
  • CRM and revenue: Salesforce, Salesforce Financial Services Cloud, HubSpot. Customer records, opportunities, and applicants land as structured objects.
  • Lending and core banking: nCino, Encompass, Mambu, Alloy, Q2, Jack Henry, Fiserv, FIS, Plaid, MX.
  • CLM and legal: Ironclad, Agiloft, ContractWorks, Concord. Clauses, parties, and term data flow with span pointers preserved.
  • HR and payroll: Workday HCM, ADP, The Work Number, Gusto, Justworks. Wage and YTD data flows into HRIS and verification workflows.
  • T&E and expense: Expensify, Concur, Brex, Ramp, Pleo. Receipts and reimbursements route to the right policy and ledger code.
  • Onboarding and KYC: Persona, Alloy, Sumsub, ComplyAdvantage. ID verification and CIP/CDD packets with sanctions screening triggers.
  • Custom integrations: REST API and webhook events. SDKs for C# / .NET, Python, Node.js, and Go. Enterprise contracts include scoped custom-connector engineering.
Security & compliance

We treat your documents the way you would.

Privacy by design

Configurable retention, automatic PII anonymization and proper data-processing agreements out of the box.

End-to-end encryption

TLS 1.3 in transit. AES-256 at rest. Keys managed by a dedicated KMS instance per customer.

Dedicated VPC

For regulated companies, we offer isolated deployment in your cloud or on-premises.

FAQ

Common questions.

Can I run Cogneris on document types you don't list?
Yes. Custom document types ship from sample documents in 1 business day on Enterprise — send 10 to 50 examples, get a calibrated extractor with the right schema, validators, and confidence thresholds. The same agentic pipeline handles them; the only thing that changes is the schema you target.
What's the synchronous vs. async limit?
Synchronous extraction handles documents up to 10 pages. Above that, async mode with a webhook callback is recommended — same per-page rate, just a different request shape. Batch endpoints accept up to 10,000 documents per request for back-office processing.
How does Cogneris handle multi-language documents?
Thirty-plus languages are supported with field-level localization. Currency conversion, date format normalization (DD/MM vs. MM/DD), and address parsing follow the source-document locale. Output JSON normalizes to ISO formats (ISO 4217 for currency, ISO 8601 for dates, ISO 3166 for countries) so downstream systems get consistent data regardless of source language.
What happens to documents that fail validation?
Low-confidence fields or schema/business-rule failures route to a configurable HITL queue. Reviewers see the original document, the extracted value, the confidence score, and the validation error — they correct in-line, and corrections feed active learning to lift confidence on the next similar document. Full audit trail (model version, prompt hash, response, reviewer ID) is preserved.
Can Cogneris run in our cloud or on-premises?
Yes. VPC-deployed processing is available on Enterprise, with a customer-managed control plane in your AWS, GCP, or Azure tenant. The data plane never leaves your network boundary; you bring your own KMS keys (BYOK) for envelope encryption. PrivateLink and Private Service Connect are supported for the managed-tenant deployment so traffic stays off the public internet.
How fast is the integration?
Three lines of code to extract your first document via the REST API. Most teams ship a first extractor into production within 1 week and reach steady-state accuracy on their document mix in 30 days. For pre-built ERP, CRM, and CLM connectors, integration is configuration rather than engineering work. See the docs or the API reference.
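
The date normalization described in the multi-language answer above (DD/MM vs. MM/DD resolved by source locale, output in ISO 8601) can be illustrated with a few lines of Python. This is a simplified sketch, not the product's logic: the `to_iso_date` function and the small `DAY_FIRST_LOCALES` set are hypothetical, and every locale outside that set is treated as month-first purely to keep the example short.

```python
from datetime import datetime

# Illustrative subset only; a real mapping would cover every supported locale.
DAY_FIRST_LOCALES = {"en-GB", "pt-BR", "fr-FR"}


def to_iso_date(raw: str, locale: str) -> str:
    """Parse a slash-separated date using the locale's field order; return ISO 8601."""
    fmt = "%d/%m/%Y" if locale in DAY_FIRST_LOCALES else "%m/%d/%Y"
    return datetime.strptime(raw, fmt).date().isoformat()
```

The same string, `03/04/2025`, becomes `2025-04-03` for an `en-GB` document and `2025-03-04` for an `en-US` one, which is why the locale has to travel with the document rather than be guessed downstream.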
Let's talk

Ready for your first extraction?

In 30 minutes we'll show you the platform running on your own documents.