Document extraction API. PDFs in, structured JSON out.

Cogneris extracts fields, tables, line items, signatures, clauses, and decisions from PDFs, scans, and images. Send a document with a template or inline schema; get validated JSON with confidence scores, citations, review routing, and webhook delivery.

Extract the fields your workflow actually needs

Raw document text is rarely the target. Your system needs invoice totals, bank-statement transactions, KYC identity fields, contract clauses, claim amounts, policy numbers, and the evidence behind each value. Cogneris lets you request a shipped template or pass your own JSON schema so the response matches the object your application expects.

{
  "document_type": "invoice",
  "invoice_number": { "value": "INV-2048", "confidence": 0.99 },
  "line_items": [{ "description": "Platform usage", "amount": 1480.00 }],
  "validation": { "status": "passed" }
}
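To make the inline-schema option concrete, here is a minimal sketch of what such a schema could look like, written as standard JSON Schema and modelled on the sample response above. The schema dialect Cogneris accepts and the way the schema is passed in a request are not specified on this page, so treat both as assumptions. `missingRequired` is a hypothetical local helper, not part of the SDK.

```javascript
// ASSUMPTION: the inline schema is expressed as standard JSON Schema.
// Field names mirror the sample response; nothing here is a confirmed Cogneris contract.
const invoiceSchema = {
  type: 'object',
  required: ['document_type', 'invoice_number', 'line_items'],
  properties: {
    document_type: { type: 'string' },
    invoice_number: {
      type: 'object',
      required: ['value'],
      properties: {
        value: { type: 'string' },
        confidence: { type: 'number', minimum: 0, maximum: 1 }
      }
    },
    line_items: {
      type: 'array',
      items: {
        type: 'object',
        required: ['description', 'amount'],
        properties: {
          description: { type: 'string' },
          amount: { type: 'number' }
        }
      }
    }
  }
};

// Hypothetical helper: list top-level required keys that a payload lacks.
function missingRequired(schema, payload) {
  return (schema.required || []).filter((key) => !(key in payload));
}

// The sample response shown above, as a JavaScript object.
const sample = {
  document_type: 'invoice',
  invoice_number: { value: 'INV-2048', confidence: 0.99 },
  line_items: [{ description: 'Platform usage', amount: 1480.0 }],
  validation: { status: 'passed' }
};

console.log(missingRequired(invoiceSchema, sample)); // []
```

A check like this only covers top-level keys. A full JSON Schema validator would be the natural choice for nested rules.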

One API for PDFs, scans, images, and document packets

Use the same endpoint for native PDFs, scanned PDFs, mobile photos, email attachments, and multi-document packets. Cogneris classifies the document, chooses the right extractor, stitches multi-page files, splits bundled packets, and returns one normalized payload per document.

Schema-based output instead of raw OCR text

The extraction response is typed JSON: values, confidence, page references, bounding boxes, validation status, and audit metadata. That makes the output ready for underwriting systems, AP automation, onboarding flows, claims platforms, data warehouses, and AI agents.
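As a rough illustration of consuming that typed output, the sketch below flattens a field map into one row per field, the kind of shape a data warehouse table might take. The field layout (`value`, `confidence`, `citations` with `page` and `bbox`) is an assumption based on the properties listed above, and `toRows` is a hypothetical helper. The sample values are made up.

```javascript
// ASSUMPTION: each field looks like { value, confidence, citations: [{ page, bbox }] }.
// Check the API reference for the real response layout.
function toRows(documentId, fields) {
  return Object.entries(fields).map(([name, field]) => {
    // Use the first citation, if any, as the field's primary evidence.
    const first = (field.citations || [])[0] || {};
    return {
      document_id: documentId,
      field: name,
      value: field.value,
      confidence: field.confidence ?? null,
      page: first.page ?? null,
      bbox: first.bbox ?? null
    };
  });
}

// Made-up sample data for illustration only.
const rows = toRows('doc_001', {
  total: { value: 1480.0, confidence: 0.98, citations: [{ page: 2, bbox: [72, 540, 188, 556] }] },
  invoice_number: { value: 'INV-2048', confidence: 0.99 }
});

console.log(rows[0].page); // 2
console.log(rows[1].page); // null
```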

Financial documents

Invoices, receipts, payroll, tax forms, bank statements, and reconciliation packets.

Identity and onboarding

KYC IDs, passports, proof of address, beneficial ownership forms, and supporting evidence.

Contracts and claims

Clauses, obligations, FNOL packets, policy details, repair estimates, and medical bills.

Confidence, citations, and human review

Every field carries a confidence score and source evidence. High-confidence values can pass straight through to automation. Low-confidence or high-risk fields can route to human review without blocking the rest of the document.
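The routing idea can be sketched on the client side as a simple partition of fields. This is an illustration only: the field shape, the 0.9 threshold, and the `highRisk` list are assumptions rather than Cogneris defaults, and `routeFields` is a hypothetical helper. The sample values are made up.

```javascript
// ASSUMPTION: each field is { value, confidence }. Threshold and highRisk are illustrative.
function routeFields(fields, { threshold = 0.9, highRisk = [] } = {}) {
  const approved = {};
  const review = {};
  for (const [name, field] of Object.entries(fields)) {
    // A missing or low confidence score counts as uncertain.
    const uncertain = typeof field.confidence !== 'number' || field.confidence < threshold;
    if (uncertain || highRisk.includes(name)) {
      review[name] = field;
    } else {
      approved[name] = field;
    }
  }
  return { approved, review };
}

// Made-up sample data for illustration only.
const fields = {
  invoice_number: { value: 'INV-2048', confidence: 0.99 },
  total: { value: 1480.0, confidence: 0.72 },
  bank_account: { value: 'XX00-0000', confidence: 0.97 }
};

const { approved, review } = routeFields(fields, { threshold: 0.9, highRisk: ['bank_account'] });
console.log(Object.keys(approved)); // [ 'invoice_number' ]
console.log(Object.keys(review)); // [ 'total', 'bank_account' ]
```

Only the uncertain or high-risk fields land in `review`, so the rest of the document can continue without waiting on a reviewer.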

Async jobs and webhooks for production volume

Small files can run synchronously. Long packets, batch uploads, and high-volume workflows use async jobs with webhook callbacks, retry semantics, and signed payloads. Start in the API reference or go deeper in the extraction docs.
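For signed payloads, a receiver typically recomputes the signature over the raw request body and compares it in constant time. The sketch below assumes an HMAC-SHA256 signature delivered as a hex string. The actual signing scheme, header name, and secret format are not documented on this page, so every such detail here is an assumption to confirm in the API reference. `verifySignature` is a hypothetical helper.

```javascript
const crypto = require('node:crypto');

// ASSUMPTION: the payload is signed with HMAC-SHA256 over the raw request body
// and the signature arrives as a hex string. Confirm the real scheme in the docs.
function verifySignature(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

// Placeholder secret and payload for illustration only.
const secret = 'whsec_example';
const body = JSON.stringify({ job_id: 'job_123', status: 'completed' });
const goodSignature = crypto.createHmac('sha256', secret).update(body).digest('hex');

console.log(verifySignature(body, goodSignature, secret)); // true
console.log(verifySignature(body + ' ', goodSignature, secret)); // false
```

Verify against the raw bytes of the request, before any JSON parsing, because re-serializing the body can change whitespace or key order and invalidate the signature.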

SDK snippet: extract a document

Use the SDK when your app needs retries, signed webhooks, and typed responses. The REST endpoint is the same underneath, so teams can start with cURL and move to Node or Python without changing their workflow contract.

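Node.js
// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json).
import { Cogneris } from '@cogneris/sdk';

// Read the API key from the environment instead of hard-coding it.
const client = new Cogneris({ apiKey: process.env.COGNERIS_API_KEY });

// Submit the file with the invoice template and a webhook URL for delivery.
const result = await client.extractions.create({
  file: './invoice.pdf',
  template: 'invoice',
  webhookUrl: 'https://app.example.com/webhooks/document-ai'
});

// Print the extracted total and the page of its first citation.
console.log(result.data.fields.total.value);
console.log(result.data.fields.total.citations[0].page);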

Validation before data reaches your system

Cogneris validates extracted fields with totals reconciliation, date checks, regex rules, cross-document consistency, required-field checks, and tenant-specific business rules before the data is approved.
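To show what a few of these rule types look like in practice, here is a small local sketch covering totals reconciliation, a date check, and a regex rule. It is not Cogneris's validation engine: the field names, the invoice-number pattern, and the error codes are all assumptions chosen for illustration, and the sample values are made up.

```javascript
// Illustrative local checks that mirror some of the rule types named above.
// ASSUMPTION: field names, the INV-#### pattern, and error codes are hypothetical.
function validateInvoice(invoice) {
  const errors = [];

  // Totals reconciliation: line items should sum to the stated total.
  // Compare in integer cents to avoid floating-point drift.
  const toCents = (amount) => Math.round(amount * 100);
  const sum = invoice.line_items.reduce((acc, item) => acc + toCents(item.amount), 0);
  if (sum !== toCents(invoice.total)) errors.push('total_mismatch');

  // Date check: both dates must parse, and the due date must not precede the issue date.
  const issued = Date.parse(invoice.issue_date);
  const due = Date.parse(invoice.due_date);
  if (Number.isNaN(issued) || Number.isNaN(due)) {
    errors.push('invalid_date');
  } else if (due < issued) {
    errors.push('due_before_issue');
  }

  // Regex rule: hypothetical invoice-number format.
  if (!/^INV-\d{4,}$/.test(invoice.invoice_number)) errors.push('bad_invoice_number');

  return { status: errors.length === 0 ? 'passed' : 'failed', errors };
}

// Made-up sample data for illustration only.
const invoice = {
  invoice_number: 'INV-2048',
  issue_date: '2024-03-01',
  due_date: '2024-03-31',
  total: 1500.5,
  line_items: [
    { description: 'Platform usage', amount: 1480.0 },
    { description: 'Support', amount: 20.5 }
  ]
};

console.log(validateInvoice(invoice).status); // passed
console.log(validateInvoice({ ...invoice, total: 1500 }).errors); // [ 'total_mismatch' ]
```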

Common document extraction API use cases

Use Cogneris for invoice extraction, bank statement extraction, contract extraction, KYC onboarding, insurance claims, payroll verification, and document-heavy workflow automation.

Related document extraction pages

PDF data extraction API

Extract fields, tables, line items, and citations from native PDFs and scanned PDFs.

Convert documents to JSON API

Turn document packets into typed JSON with schemas, confidence, validation, and webhooks.

Extract tables from PDF API

Return line items, transactions, rows, columns, totals, and table-level evidence.

Document extraction API pricing

Compare pricing drivers across page volume, complexity, validation, review, and audit needs.

Document extraction benchmark

Score vendors on field accuracy, citations, table fidelity, schema stability, review load, and webhook behavior.

Python SDK for document extraction

Upload documents, run async jobs, verify webhooks, parse JSON, and route low-confidence fields to review.

Document extraction API FAQ

What is a document extraction API?
A document extraction API accepts documents such as PDFs, scans, and images, then returns structured fields, tables, confidence scores, and metadata that software can use directly.

Can Cogneris extract tables and line items?
Yes. Cogneris extracts line items, table rows, totals, dates, parties, clauses, identifiers, and other schema-defined fields with page evidence and confidence scores.

Can I pass my own JSON schema?
Yes. You can use shipped templates or pass an inline schema so the response matches the object your application expects.

How is this different from OCR-only APIs?
OCR returns recognized text. Cogneris turns recognized document content into typed JSON, validates it, attaches evidence, and routes uncertain fields to review.

How is document extraction API pricing calculated?
Cogneris pricing is page-based, with workflow cost affected by document volume, complexity, validation rules, review routing, and support requirements.

What file types does the document extraction API support?
Cogneris supports native PDFs, scanned PDFs, common image formats, and document packets used in workflows such as invoices, KYC, claims, contracts, and lending.

How fast is document extraction?
Small documents can run synchronously, while longer packets and batches run asynchronously. Actual latency depends on page count, document quality, schema complexity, and validation steps.

Does the API support webhooks for long documents?
Yes. Long documents, batches, and high-volume workflows can run as asynchronous jobs with signed webhook callbacks.

Can extracted fields include source citations?
Yes. Extracted fields can include source citations, page references, bounding boxes, confidence scores, and validation state for reviewable output.

Can low-confidence fields route to human review?
Yes. Confidence thresholds and business rules can route only the uncertain fields to review while high-confidence fields continue through automation.

Which high-intent keywords match this API?
High-intent searches include document extraction API pricing, PDF data extraction API, invoice extraction API, convert documents to JSON API, document extraction API for developers, and comparison queries such as AWS Textract alternative.