US tax return extraction. Line-level fidelity.

Lending, mortgage, audit, and high-net-worth underwriting all depend on US tax returns being parsed accurately, line by line. Cogneris extracts every numeric line of a 1040, every Form 1099, and the standard schedules (C, E, K-1) into a normalized schema in under five seconds, with a confidence score per field and a complete audit trail.

Why US tax returns are a pain

A complete US tax return is rarely one form. It's a 1040 plus an unpredictable mix of schedules (A, B, C, D, E, K-1) and 1099s (NEC, MISC, INT, DIV, B, R), sometimes 30+ pages of forms whose field codes humans read at a glance and OCR systems consistently misread. Lenders and underwriters need every line accurate: one misread Schedule C net profit can turn an approval into a denial. Manual data entry takes hours per return, and legacy OCR systems reach acceptable accuracy on the 1040 itself but break down on the schedules.

How Cogneris does it

Cogneris extracts US tax forms into a flat normalized schema keyed by IRS line number, so total_income_line_9 on a 1040 is the same field whether the customer used TurboTax, H&R Block, or filed on paper. Every line is returned with a confidence score, a bounding box reference back to the source page, and a validation result against IRS arithmetic rules. Schedule attachments are parsed and linked to their parent form automatically.
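To make the per-line shape concrete, here is a minimal Python sketch of how a consumer might model one extracted field. The class and attribute names (ExtractedField, BoundingBox, validation) and the bounding-box coordinates are illustrative assumptions, not the actual Cogneris response format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Location of a value on the source page (hypothetical layout)."""
    page: int
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class ExtractedField:
    """One extracted line; every attribute name here is an assumption."""
    value: str
    confidence: float           # 0.0 to 1.0
    bbox: BoundingBox           # pointer back to the source page
    validation: Optional[str]   # e.g. "pass", "fail", or None if unchecked


# A flat mapping keyed by field name, so a lookup does not depend on
# which tax software produced the return.
extraction = {
    "agi_line_11": ExtractedField(
        value="287420.00",
        confidence=0.97,
        bbox=BoundingBox(page=1, x=0.81, y=0.62, width=0.12, height=0.02),
        validation="pass",
    ),
}

field = extraction["agi_line_11"]
print(field.value, field.confidence, field.bbox.page)
```

Keeping the mapping flat means downstream code can read a line by key without walking a nested form structure.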

Sample extraction output

doc_type: Form 1040 (2024)
filing_status: Married filing jointly
primary_taxpayer: Jordan T. Hall
ssn: ***-**-1234
agi_line_11: US$ 287,420.00
taxable_income_line_15: US$ 256,140.00
total_tax_line_24: US$ 51,228.00
federal_tax_withheld_line_25a: US$ 48,150.00
refund_line_34: US$ 0.00
amount_owed_line_37: US$ 3,078.00
schedules_attached: Schedule A, Schedule B, Schedule D, Schedule E
forms_1099: 1099-INT, 1099-DIV, 1099-B (×2)
arithmetic_check: ✓ all lines balance
confidence: 0.97 → auto-approved
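The last two rows of the sample suggest a routing decision based on the arithmetic check and the confidence score. The sketch below shows one way a consumer could implement such a rule in Python. The 0.95 threshold, the function name route, and the "manual-review" label are assumptions for illustration, not documented Cogneris behavior.

```python
AUTO_APPROVE_THRESHOLD = 0.95  # hypothetical cutoff, not a documented default


def route(confidence: float, arithmetic_ok: bool,
          threshold: float = AUTO_APPROVE_THRESHOLD) -> str:
    """Return "auto-approved" or "manual-review" for one extracted return."""
    if arithmetic_ok and confidence >= threshold:
        return "auto-approved"
    return "manual-review"


print(route(0.97, arithmetic_ok=True))    # high confidence, lines balance
print(route(0.97, arithmetic_ok=False))   # lines do not balance
print(route(0.80, arithmetic_ok=True))    # below the assumed threshold
```

Requiring both conditions means a return with a high confidence score still goes to a person when its lines do not balance.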

What you get out of the box

Every line, every schedule

Full 1040, Schedules A/B/C/D/E/K-1, and the 1099 family. Results are returned as flat JSON keyed by IRS line number.

Arithmetic validation

IRS line arithmetic rules are checked automatically. Inconsistencies are flagged before the document reaches your model.
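As a rough illustration of what a line-arithmetic check involves, the Python sketch below verifies the balance shown in the sample output on this page. It is deliberately simplified: it assumes withholding is the only payment, whereas a real return also carries estimated payments and refundable credits. The helper names parse_amount and check_balance are hypothetical.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a display string such as 'US$ 287,420.00' to a Decimal."""
    return Decimal(text.replace("US$", "").replace(",", "").strip())


def check_balance(total_tax: Decimal, withheld: Decimal,
                  refund: Decimal, owed: Decimal) -> bool:
    """Simplified rule: total tax minus withholding equals owed minus refund."""
    return total_tax - withheld == owed - refund


ok = check_balance(
    total_tax=parse_amount("US$ 51,228.00"),
    withheld=parse_amount("US$ 48,150.00"),
    refund=parse_amount("US$ 0.00"),
    owed=parse_amount("US$ 3,078.00"),
)
print(ok)  # True: 51,228.00 - 48,150.00 = 3,078.00
```

Decimal is used instead of float so that currency comparisons are exact.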

Multi-year stitching

Process 2 or 3 years of returns at once and get a stitched view ready for underwriting (income trend, AGI history, deduction patterns).
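One of the simplest stitched metrics is the year-over-year change in AGI. The Python sketch below computes it from a per-year mapping. The function name agi_trend is hypothetical, and only the 2024 figure comes from the sample output earlier on this page; the 2022 and 2023 values are invented for illustration.

```python
from decimal import Decimal


def agi_trend(agi_by_year: dict[int, Decimal]) -> list[tuple[int, Decimal]]:
    """Return (year, percent change vs. prior year) for each consecutive pair."""
    years = sorted(agi_by_year)
    changes = []
    for prev, curr in zip(years, years[1:]):
        base = agi_by_year[prev]
        if base == 0:
            continue  # percent change is undefined from a zero base
        pct = (agi_by_year[curr] - base) / base * 100
        changes.append((curr, pct.quantize(Decimal("0.1"))))
    return changes


# 2022 and 2023 are invented values; 2024 matches the sample return.
history = {
    2022: Decimal("250000.00"),
    2023: Decimal("270000.00"),
    2024: Decimal("287420.00"),
}
print(agi_trend(history))  # 2023: +8.0%, 2024: +6.5%
```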

PII-aware audit trail

SSNs and dependent identifiers are redacted in metadata by default. Retention is configurable from 0 days to 7 years.
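For teams that also log extraction results on their own side, a comparable masking step is easy to apply before anything is written to a log. This Python sketch reproduces the ***-**-1234 format shown in the sample output. The pattern and the function name mask_ssn are assumptions for illustration and say nothing about how Cogneris performs redaction internally.

```python
import re

# Matches dashed SSNs only, e.g. 000-12-1234; undashed digits are left alone.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Replace each dashed SSN with ***-**-NNNN, keeping the last four digits."""
    return SSN_PATTERN.sub(lambda m: f"***-**-{m.group(1)}", text)


print(mask_ssn("SSN 000-12-1234 on file"))  # SSN ***-**-1234 on file
```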

Integration patterns

For mortgage lenders and underwriters who need fast tax-return parsing, Cogneris's async mode handles 50- to 100-page returns in under 30 seconds end-to-end. Direct integrations exist for Encompass, Blend, and lender CRMs. The REST API can also drop structured JSON straight into your underwriting model.

Compliance & trust

Tax returns are highly sensitive. Cogneris stores them encrypted at rest with per-tenant keys and applies configurable retention (default 90 days, can be set to 0). PII is redacted in audit metadata by default. A HIPAA BAA is available on Enterprise for healthcare-adjacent workflows. See our trust page for the full posture: encryption, tenant isolation, sub-processors, GDPR DPA, CCPA, and SOC 2 Type II (in progress).

Get started

Pay-per-page pricing means you can start an evaluation today without an annual commit. Most teams ship their first tax-return extraction into production within a week.

Related extractors

Cogneris extracts dozens of structured document types. The closest neighbors to tax-return extraction:

  • Payroll extraction — W-2s, 1099s, and pay stubs that reconcile against 1040 and schedule data.
  • Bank statement extraction — transaction-level parsing for interest income, dividend, and cash-flow validation.
  • KYC document extraction — government IDs and proof of address used alongside tax returns for full client onboarding.

For broader context, see the IDP buyer's guide, the 2026 State of Document AI report, or estimate ROI at your volume.