claritize — documents in. data out.

how it works

a pdf goes in. typed json comes out.

hand us a pdf

web upload for people; one mcp call for agents.

the managed pipeline parses it

layout, tables across page breaks, six languages — you never see the pipeline.

typed json comes back

a download for you; a presigned url for your agent, so its context stays clean.

q3-financials.pdf parsed

schema=revenue_report

period="2026 Q3"

revenue=48_213_400.00

line_items=12 rows · 4 tables

citations=38 spans · 34 pages

confidence=0.942

two audiences, both first-class

one portal for people.
one mcp server for agents.

for people

a folder portal. upload pdfs, watch them parse, download json. rename, move, delete — a file manager, not a developer tool. some customers never write a line of code.

for agents

two mcp tools: claritize does the work, claritize_check watches it. async by design — submit a long document, keep working, harvest the result when it lands. results arrive as a presigned url, never inline.
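a minimal sketch of that submit-then-harvest loop. only the tool names claritize and claritize_check come from this page; the in-memory stand-in, the argument names, the reply fields (status, result_url), and the storage.example url are assumptions made up for illustration, not the real server's interface.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeClaritize:
    """In-memory stand-in for the two tools. Hypothetical, not the real server."""

    polls_until_done: int = 2
    jobs: dict = field(default_factory=dict, init=False)
    ids: itertools.count = field(
        default_factory=lambda: itertools.count(1), init=False
    )

    def claritize(self, filename: str) -> str:
        # Submit a document and return immediately with a job id (assumed shape).
        job_id = f"job-{next(self.ids)}"
        self.jobs[job_id] = {"filename": filename, "polls": 0}
        return job_id

    def claritize_check(self, job_id: str) -> dict:
        # Report progress; hand back only a URL, never the JSON itself.
        job = self.jobs[job_id]
        job["polls"] += 1
        if job["polls"] < self.polls_until_done:
            return {"status": "pending"}
        return {
            "status": "done",
            "result_url": f"https://storage.example/{job_id}.json",
        }


def harvest(client, job_id, max_polls=10, do_other_work=lambda: None):
    """Poll until the job finishes, doing other work between checks."""
    for _ in range(max_polls):
        reply = client.claritize_check(job_id)
        if reply["status"] == "done":
            return reply["result_url"]
        do_other_work()  # the agent keeps working while the parse runs
    raise TimeoutError(f"{job_id} not finished after {max_polls} checks")


client = FakeClaritize()
job_id = client.claritize("contract.pdf")
result_url = harvest(client, job_id)
print(result_url)
```

the point of the shape: the agent's context only ever holds a job id and a url. the parsed json stays in storage until something outside the context window fetches it.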

a 200-page contract in. a 50-token answer out.

a 200-page contract, read inline ≈ 80,000 tokens · per question

the same contract via claritize ≈ 50 tokens · one presigned url

illustrative arithmetic — the json waits in storage; your agent reads it where tokens are free.
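the same arithmetic, spelled out. the two token figures are the illustrative ones above; treating the url hand-off as a one-time cost regardless of how many questions follow is an assumption of this sketch, and it counts only what lands in the agent's context, not anything the agent later chooses to pull back in.

```python
# Illustrative figures from the comparison above, not measurements.
INLINE_TOKENS_PER_QUESTION = 80_000  # whole contract re-read for each question
URL_HANDOFF_TOKENS = 50              # one presigned URL in context


def inline_tokens(questions: int) -> int:
    """Context tokens spent when the contract is read inline every time."""
    return INLINE_TOKENS_PER_QUESTION * questions


def handoff_tokens() -> int:
    """Context tokens spent on the URL hand-off (assumed to be paid once)."""
    return URL_HANDOFF_TOKENS


def savings_ratio(questions: int) -> float:
    """How many times more context the inline route consumes."""
    return inline_tokens(questions) / handoff_tokens()


for q in (1, 5):
    print(q, inline_tokens(q), handoff_tokens(), savings_ratio(q))
```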

what it's exceptionally good at

receipts. tables. six languages.

receipts

merchant, date, line items, tax broken out by jurisdiction, tip, total, last 4 of the card. restaurant, retail, gas, online — every flavor.

receipt/v1 · out of the box

tables

multi-page tables stitched across page breaks. header rows become field names; merged cells flatten intelligently. tables come back as arrays of objects, ready for whatever's downstream.

where generic ocr visibly fails
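a toy sketch of what "stitched" and "arrays of objects" can mean in practice. this is one simple way to do it, not a description of the managed pipeline: it assumes each page fragment is already a list of rows, that the first fragment starts with the header row, and that a continuation page may or may not repeat that header. the function name and sample data are made up.

```python
def stitch_table(fragments):
    """Merge per-page table fragments into one list of row objects.

    fragments: list of pages, each a list of rows (lists of cell strings).
    The first row of the first fragment is taken as the header.
    """
    header = fragments[0][0]
    rows = []
    for fragment in fragments:
        # Drop the header when a page starts with it (always true for page 1,
        # sometimes true for continuation pages that repeat it).
        if fragment and fragment[0] == header:
            fragment = fragment[1:]
        rows.extend(fragment)
    # Header cells become field names; each row becomes one object.
    return [dict(zip(header, row)) for row in rows]


page_1 = [["item", "qty"], ["widget", "2"], ["gasket", "5"]]
page_2 = [["item", "qty"], ["flange", "1"]]  # header repeated after the break
page_3 = [["washer", "9"]]                   # header not repeated

table = stitch_table([page_1, page_2, page_3])
print(table)
```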

multi-language

english, spanish, french, german, italian, and portuguese — native, with mixed-language documents handled per page. need another? more on request. we extract in the source language — translation stays yours.

en · es · fr · de · it · pt

and by extension: forms, thousand-page documents, bad scans — same pipeline, no special code path.

trust

zero data retention.

claritize never trains on, sells, shares, or analyzes your document content. we process documents to return structured data — and that is the end of our relationship with the content.

retention="zero" · on every parsing call

encryption=customer-managed keys · rotate or revoke anytime

training=none · no training pipeline ever touches your content

metadata=aggregate only · page counts and costs, never content

pricing

coming soon.

per-page pricing, no surprise bills.

documents in. data out.