Extract the fields your workflow actually needs
Raw document text is rarely the target. Your system needs invoice totals, bank-statement transactions, KYC identity fields, contract clauses, claim amounts, policy numbers, and the evidence behind each value. Cogneris lets you request a shipped template or pass your own JSON schema so the response matches the object your application expects.
{
  "document_type": "invoice",
  "invoice_number": { "value": "INV-2048", "confidence": 0.99 },
  "line_items": [{ "description": "Platform usage", "amount": 1480.00 }],
  "validation": { "status": "passed" }
}
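On the consuming side, a payload like the one above can be checked at runtime before your application trusts it. The sketch below is a minimal illustration in TypeScript; the interface names and the `isInvoiceExtraction` guard are hypothetical, inferred only from the sample payload, and are not the official SDK types.

```typescript
// Hypothetical types inferred from the sample payload; not official SDK types.
interface ExtractedValue<T> {
  value: T;
  confidence: number;
}

interface InvoiceExtraction {
  document_type: "invoice";
  invoice_number: ExtractedValue<string>;
  line_items: { description: string; amount: number }[];
  validation: { status: string };
}

function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null;
}

// Runtime guard: returns true only if the payload has the sample's shape.
function isInvoiceExtraction(payload: unknown): payload is InvoiceExtraction {
  if (!isRecord(payload)) return false;
  const num = payload.invoice_number;
  const items = payload.line_items;
  const validation = payload.validation;
  return (
    payload.document_type === "invoice" &&
    isRecord(num) &&
    typeof num.value === "string" &&
    typeof num.confidence === "number" &&
    Array.isArray(items) &&
    items.every(
      (it: unknown) =>
        isRecord(it) &&
        typeof it.description === "string" &&
        typeof it.amount === "number"
    ) &&
    isRecord(validation) &&
    typeof validation.status === "string"
  );
}

const samplePayload: unknown = JSON.parse(`{
  "document_type": "invoice",
  "invoice_number": { "value": "INV-2048", "confidence": 0.99 },
  "line_items": [{ "description": "Platform usage", "amount": 1480.00 }],
  "validation": { "status": "passed" }
}`);

if (isInvoiceExtraction(samplePayload)) {
  // Inside this branch the payload is fully typed.
  console.log(samplePayload.invoice_number.value, samplePayload.line_items.length);
}
```

A guard like this keeps schema drift from reaching downstream code silently: a payload that does not match is rejected at the boundary instead of failing later on a missing property.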
One API for PDFs, scans, images, and document packets
Use the same endpoint for native PDFs, scanned PDFs, mobile photos, email attachments, and multi-document packets. Cogneris classifies the document, chooses the right extractor, stitches multi-page files, splits bundled packets, and returns one normalized payload per document.
Schema-based output instead of raw OCR text
The extraction response is typed JSON: values, confidence, page references, bounding boxes, validation status, and audit metadata. That makes the output ready for underwriting systems, AP automation, onboarding flows, claims platforms, data warehouses, and AI agents.
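Because each field carries its own confidence and page reference, the response can be flattened into rows for a warehouse or audit table. The following is a rough sketch, assuming a per-field shape with `value`, `confidence`, and `citations[].page` (as in the SDK snippet on this page); the `bbox` property, the `WarehouseRow` type, and the `toRows` helper are illustrative names, not part of the documented response.

```typescript
// Assumed field shape; `bbox` is a hypothetical name for the bounding box.
interface Citation {
  page: number;
  bbox?: [number, number, number, number];
}

interface EvidenceField {
  value: string | number | null;
  confidence: number;
  citations: Citation[];
}

// Hypothetical flat row for a warehouse or audit table.
interface WarehouseRow {
  field: string;
  value: string | number | null;
  confidence: number;
  page: number | null;
}

// One row per field; the first citation's page is used as the source page.
function toRows(fields: Record<string, EvidenceField>): WarehouseRow[] {
  return Object.entries(fields).map(([name, f]) => ({
    field: name,
    value: f.value,
    confidence: f.confidence,
    page: f.citations[0]?.page ?? null,
  }));
}

const exampleRows = toRows({
  total: { value: 1480, confidence: 0.97, citations: [{ page: 2 }] },
  po_number: { value: null, confidence: 0.41, citations: [] },
});
console.log(exampleRows);
```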
Financial documents
Invoices, receipts, payroll, tax forms, bank statements, and reconciliation packets.
Identity and onboarding
KYC IDs, passports, proof of address, beneficial ownership forms, and supporting evidence.
Contracts and claims
Clauses, obligations, FNOL packets, policy details, repair estimates, and medical bills.
Confidence, citations, and human review
Every field carries confidence and source evidence. High-confidence values can move straight through. Low-confidence or high-risk fields can route to human review without blocking the full document.
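The routing described above can be sketched as a small partition step in your own application. This is an illustration only: the `ReviewPolicy` shape, the 0.9 threshold, and the `routeFields` helper are assumptions for the example, not Cogneris settings.

```typescript
interface ScoredField {
  value: unknown;
  confidence: number;
}

// Hypothetical policy: a confidence floor plus fields that always need a human.
interface ReviewPolicy {
  minConfidence: number;
  alwaysReview: string[];
}

interface RoutingResult {
  approved: string[];
  needsReview: string[];
}

// Splits field names into straight-through and human-review buckets.
function routeFields(
  fields: Record<string, ScoredField>,
  policy: ReviewPolicy
): RoutingResult {
  const result: RoutingResult = { approved: [], needsReview: [] };
  for (const [name, field] of Object.entries(fields)) {
    const highRisk = policy.alwaysReview.includes(name);
    if (highRisk || field.confidence < policy.minConfidence) {
      result.needsReview.push(name);
    } else {
      result.approved.push(name);
    }
  }
  return result;
}

const routing = routeFields(
  {
    invoice_number: { value: "INV-2048", confidence: 0.99 },
    total: { value: 1480, confidence: 0.72 },
    bank_account: { value: "XXXX-0000", confidence: 0.98 },
  },
  { minConfidence: 0.9, alwaysReview: ["bank_account"] }
);
console.log(routing);
```

Routing per field rather than per document means one uncertain value sends only that value to a reviewer, while the rest of the document continues downstream.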
Async jobs and webhooks for production volume
Small files can run synchronously. Long packets, batch uploads, and high-volume workflows use async jobs with webhook callbacks, retry semantics, and signed payloads. Start in the API reference or go deeper in the extraction docs.
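If you handle webhooks without the SDK, signed payloads need to be verified before they are trusted. The sketch below assumes an HMAC-SHA256 signature over the raw request body, sent as a hex string; that scheme, the secret, and the `verifySignature` helper are assumptions for illustration, so check the API reference for the actual signing method and header name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signature is hex-encoded HMAC-SHA256 of the raw body.
function verifySignature(
  rawBody: string,
  signatureHex: string,
  secret: string
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Example with a locally computed signature (hypothetical payload and secret).
const webhookBody = '{"job_id":"job_123","status":"completed"}';
const webhookSecret = "example-secret";
const validSignature = createHmac("sha256", webhookSecret)
  .update(webhookBody, "utf8")
  .digest("hex");

console.log(verifySignature(webhookBody, validSignature, webhookSecret));
```

Verify against the raw body bytes rather than a re-serialized JSON object, since re-serialization can change whitespace or key order and invalidate the signature.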
SDK snippet: extract a document
Use the SDK when your app needs retries, signed webhooks, and typed responses. The REST endpoint is the same underneath, so teams can start with cURL and move to Node or Python without changing their workflow contract.
import { Cogneris } from '@cogneris/sdk';
const client = new Cogneris({ apiKey: process.env.COGNERIS_API_KEY });
const result = await client.extractions.create({
  file: './invoice.pdf',
  template: 'invoice',
  webhookUrl: 'https://app.example.com/webhooks/document-ai'
});
console.log(result.data.fields.total.value);
console.log(result.data.fields.total.citations[0].page);
Validation before data reaches your system
Cogneris validates extracted fields with totals reconciliation, date checks, regex rules, cross-document consistency, required-field checks, and tenant-specific business rules before the data is approved.
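To make the rule types concrete, here is a minimal sketch of three of the checks named above (a required-field check, a regex rule, and totals reconciliation) written as client-side TypeScript. It illustrates the idea and is not Cogneris's implementation; the `InvoiceFields` shape, the rule names, and the `INV-` pattern are assumptions for the example.

```typescript
// Hypothetical flattened invoice; field names are assumptions.
interface InvoiceFields {
  invoice_number: string;
  total: number;
  line_items: { description: string; amount: number }[];
}

interface ValidationIssue {
  rule: string;
  message: string;
}

// Hypothetical tenant-specific rule for invoice numbers such as "INV-2048".
const INVOICE_NUMBER_PATTERN = /^INV-\d+$/;

function validateInvoice(inv: InvoiceFields): ValidationIssue[] {
  const issues: ValidationIssue[] = [];

  // Required-field check: an invoice needs at least one line item.
  if (inv.line_items.length === 0) {
    issues.push({ rule: "required_field", message: "line_items is empty" });
  }

  // Regex rule on the invoice number.
  if (!INVOICE_NUMBER_PATTERN.test(inv.invoice_number)) {
    issues.push({ rule: "regex", message: "invoice_number has an unexpected format" });
  }

  // Totals reconciliation, compared in cents to avoid floating-point drift.
  const lineCents = inv.line_items.reduce(
    (sum, item) => sum + Math.round(item.amount * 100),
    0
  );
  const totalCents = Math.round(inv.total * 100);
  if (lineCents !== totalCents) {
    issues.push({
      rule: "totals_reconciliation",
      message: `line items sum to ${lineCents / 100}, total is ${inv.total}`,
    });
  }

  return issues;
}

const validationIssues = validateInvoice({
  invoice_number: "INV-2048",
  total: 1500,
  line_items: [{ description: "Platform usage", amount: 1480.0 }],
});
console.log(validationIssues);
```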
Common document extraction API use cases
Use Cogneris for invoice extraction, bank statement extraction, contract extraction, KYC onboarding, insurance claims, payroll verification, and document-heavy workflow automation.
Related document extraction pages
PDF data extraction API
Extract fields, tables, line items, and citations from native PDFs and scanned PDFs.
Convert documents to JSON API
Turn document packets into typed JSON with schemas, confidence, validation, and webhooks.
Extract tables from PDF API
Return line items, transactions, rows, columns, totals, and table-level evidence.
Document extraction API pricing
Compare pricing drivers across page volume, complexity, validation, review, and audit needs.
Document extraction benchmark
Score vendors on field accuracy, citations, table fidelity, schema stability, review load, and webhook behavior.
Python SDK for document extraction
Upload documents, run async jobs, verify webhooks, parse JSON, and route low-confidence fields to review.