Built for native PDFs, scanned PDFs, and packets
A PDF data extraction API has to handle embedded text, OCR-only scans, rotated pages, multi-page packets, and mixed document types. Cogneris classifies each file, applies OCR when needed, extracts against a schema, validates the result, and keeps page evidence attached to the output.
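The flow described above (classify, OCR when needed, extract against a schema, validate, keep page evidence) can be sketched in a few lines. This is an illustrative toy, not the Cogneris implementation: the `Page`, `FieldResult`, `readPage`, and `extract` names are hypothetical, the schema is reduced to regular expressions, and `scanText` stands in for whatever an OCR engine would return.

```typescript
// Illustrative only: every type and function here is a hypothetical stand-in.
type PageSource = "embedded_text" | "ocr";

interface Page {
  number: number;
  embeddedText: string | null; // null for scan-only pages
  scanText: string;            // stand-in for OCR engine output
}

interface FieldResult {
  value: string;
  page: number;       // page evidence stays attached to the value
  source: PageSource; // how the text on that page was obtained
}

interface ExtractionResult {
  fields: Record<string, FieldResult>;
  errors: string[];
}

// Classify the page: use embedded text when present, fall back to OCR otherwise.
function readPage(page: Page): { text: string; source: PageSource } {
  if (page.embeddedText !== null && page.embeddedText.trim() !== "") {
    return { text: page.embeddedText, source: "embedded_text" };
  }
  return { text: page.scanText, source: "ocr" };
}

// Extract each schema field from the first page that matches, then
// validate by reporting any schema field that was not found.
function extract(
  pages: Page[],
  schema: Array<{ name: string; pattern: RegExp }>,
): ExtractionResult {
  const fields: Record<string, FieldResult> = {};
  const errors: string[] = [];
  for (const { name, pattern } of schema) {
    for (const page of pages) {
      const { text, source } = readPage(page);
      const match = pattern.exec(text);
      if (match) {
        fields[name] = { value: match[1] ?? "", page: page.number, source };
        break;
      }
    }
    if (!(name in fields)) {
      errors.push(`missing field: ${name}`);
    }
  }
  return { fields, errors };
}

const result = extract(
  [
    { number: 1, embeddedText: "Borrower: Jane Doe", scanText: "" },
    { number: 2, embeddedText: null, scanText: "Ending balance: 1,204.50" },
  ],
  [
    { name: "borrower_name", pattern: /Borrower:\s*(.+)/ },
    { name: "ending_balance", pattern: /Ending balance:\s*([\d,.]+)/ },
    { name: "policy_number", pattern: /Policy:\s*(\S+)/ },
  ],
);
```

In this sketch the borrower name comes from embedded text on page 1, the balance comes from the OCR path on page 2, and the absent policy number surfaces as a validation error instead of a silent gap.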
Fields
Names, dates, totals, IDs, addresses, clauses, balances, policy numbers, and custom schema fields.
Tables
Line items, transactions, row values, columns, subtotals, and table-level confidence.
Evidence
Page references, citations, confidence scores, validation status, and review metadata.
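To make the three output groups concrete, here is one way such a result could be typed and used to route values to review. The shape is an assumption made for illustration (`ExtractedField`, `ExtractedTable`, `ExtractionOutput`, and `fieldsNeedingReview` are hypothetical names), not the documented Cogneris response format.

```typescript
// Hypothetical response shape; property names are assumptions, not the real API.
interface Citation {
  page: number; // page reference
  text: string; // snippet the value was read from
}

interface ExtractedField<T> {
  value: T;
  confidence: number; // 0 to 1
  citations: Citation[];
  validation: "valid" | "invalid" | "needs_review";
}

interface ExtractedTable {
  columns: string[];
  rows: Array<Record<string, string | number>>;
  confidence: number; // table-level confidence
}

interface ExtractionOutput {
  fields: Record<string, ExtractedField<string | number>>;
  tables: ExtractedTable[];
}

// Return the names of fields a reviewer should check: anything that is
// not marked valid, or whose confidence falls below the threshold.
function fieldsNeedingReview(output: ExtractionOutput, minConfidence: number): string[] {
  return Object.keys(output.fields).filter((name) => {
    const field = output.fields[name];
    return (
      field !== undefined &&
      (field.validation !== "valid" || field.confidence < minConfidence)
    );
  });
}

const sample: ExtractionOutput = {
  fields: {
    borrower_name: {
      value: "Jane Doe",
      confidence: 0.98,
      citations: [{ page: 1, text: "Borrower: Jane Doe" }],
      validation: "valid",
    },
    ending_balance: {
      value: 1204.5,
      confidence: 0.71,
      citations: [{ page: 2, text: "Ending balance: 1,204.50" }],
      validation: "valid",
    },
    policy_number: {
      value: "PN-0001",
      confidence: 0.95,
      citations: [{ page: 3, text: "Policy: PN-0001" }],
      validation: "needs_review",
    },
  },
  tables: [
    {
      columns: ["description", "amount"],
      rows: [{ description: "Deposit", amount: 500 }],
      confidence: 0.9,
    },
  ],
};

const flagged = fieldsNeedingReview(sample, 0.9);
```

With a 0.9 threshold, `flagged` contains `ending_balance` (low confidence) and `policy_number` (validation status), while `borrower_name` passes straight through.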
When PDF data extraction is stronger than OCR alone
OCR gives you text. PDF data extraction gives you typed fields, nested arrays, normalized values, validation errors, and workflow state. That difference matters when the data feeds underwriting systems, ERPs, CRMs, compliance workflows, or agent tools.
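A small sketch of the gap between raw OCR text and typed, normalized values with validation errors. The normalizers below are hypothetical helpers written for this example; they are deliberately narrow (US-style amounts and MM/DD/YYYY dates only) and do not represent how Cogneris normalizes values.

```typescript
// Hypothetical normalizers for illustration; not part of any SDK.
interface Normalized<T> {
  value: T | null;
  error: string | null; // validation error when the raw text cannot be typed
}

// Turn OCR text such as "$1,204.50" into a number, or report why it failed.
function normalizeCurrency(raw: string): Normalized<number> {
  const cleaned = raw.replace(/[$,\s]/g, "");
  if (!/^-?\d+(\.\d{1,2})?$/.test(cleaned)) {
    return { value: null, error: `not a currency amount: "${raw}"` };
  }
  return { value: Number(cleaned), error: null };
}

// Turn an MM/DD/YYYY string into an ISO 8601 date (YYYY-MM-DD).
function normalizeDate(raw: string): Normalized<string> {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(raw.trim());
  if (!m) {
    return { value: null, error: `not an MM/DD/YYYY date: "${raw}"` };
  }
  const month = Number(m[1]);
  const day = Number(m[2]);
  if (month < 1 || month > 12 || day < 1 || day > 31) {
    return { value: null, error: `date out of range: "${raw}"` };
  }
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return { value: `${m[3]}-${pad(month)}-${pad(day)}`, error: null };
}

const balance = normalizeCurrency("$1,204.50");
const misread = normalizeCurrency("l,2O4.5O"); // OCR confused 1/l and 0/O
const periodEnd = normalizeDate("03/31/2024");
```

The clean amount becomes a number a downstream system can sum, the misread amount becomes an explicit validation error rather than a bad string, and the date arrives in one consistent format.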
SDK snippet: PDF to JSON
For PDFs with known fields, pass a template or inline schema and request citations so reviewers can trace every value.
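// Assumes `client` is an already-initialized Cogneris SDK client.
const extraction = await client.extractions.create({
  file: './loan-packet.pdf',
  schema: {
    borrower_name: 'string',
    statement_period: 'date_range',
    ending_balance: 'currency'
  },
  includeCitations: true // request citations so reviewers can trace each value
});

// Read one extracted field from the response.
console.log(extraction.data.fields.ending_balance);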