Frontier vision + OCR · agent-installed

Any file.
Any field.
Agent-wired.

Docex is one API call over whatever frontier vision model is winning this week. We constantly hunt for the cheapest, fastest, most reliable, and most accurate orchestration so you don't have to. Your agent runs docex setup, you click one approval link, and the key lands in your .env. The human part is about 90 seconds.

pdf · jpeg · png · heic | $5 = ~100 pages | No plan · No sales call
/ Paste this into your coding agent

One paste.
Five minutes. Back to work.

If your agent has the Docex skill, it picks this up natively. If not, the prompt is explicit enough that any decent agent figures out the CLI. Either way, you don't touch the docs.

prompt.txt
Set up Docex in this project for [describe what you want to extract —
e.g., "company details from trade licenses during onboarding"].

Take me through the GitHub approval and the $5 wallet top-up, wire the
API key into my env, scaffold a server-side extraction endpoint for my
stack, and run a smoke test when you're done.
Minute | What's happening | Who
0:00–0:30 | Agent sniffs your stack (Next.js, Express, plain Node), installs docex, runs docex setup. | AGENT
0:30–1:30 | You get one link. Sign in with GitHub. Pick $5. That's about 100 pages at current rates. | YOU
1:30–2:30 | Stripe checkout. One card. No contract, no plan to pick, no procurement cycle. | YOU
2:30–3:30 | Agent receives the scoped API key, writes DOCEX_API_KEY into your env, scaffolds the extraction endpoint, wires the client wrapper. | AGENT
3:30–4:30 | Smoke extraction runs against a fixture or a file you point at. JSON comes back. Agent declares victory. | AGENT
4:30 | You're live. Ship the feature. Go get coffee. | DONE
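For a sense of what the scaffolded server-side endpoint might look like, here is a minimal, framework-agnostic sketch. Only the extract({ file, prompt }) call shape and the DOCEX_API_KEY variable name come from this page. The Extractor type, requireApiKey, handleExtract, and the status codes are hypothetical names and choices made for illustration, not the code Docex actually generates.

```typescript
// Hypothetical shapes: only `extract({ file, prompt })` appears in the page's sample.
type ExtractRequest = { file: string; prompt: string };
type Extractor = {
  extract(req: ExtractRequest): Promise<Record<string, unknown>>;
};
type HttpResult = { status: number; body: Record<string, unknown> };

// Fail fast when setup has not written the key into the environment yet.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.DOCEX_API_KEY;
  if (!key) {
    throw new Error("DOCEX_API_KEY is not set; run docex setup first");
  }
  return key;
}

// Validate the input, call the extractor, and map failures to an HTTP-style result.
async function handleExtract(
  extractor: Extractor,
  input: unknown
): Promise<HttpResult> {
  const { file, prompt } = (input ?? {}) as Partial<ExtractRequest>;
  if (
    typeof file !== "string" || file === "" ||
    typeof prompt !== "string" || prompt === ""
  ) {
    return { status: 400, body: { error: "file and prompt are required strings" } };
  }
  try {
    return { status: 200, body: await extractor.extract({ file, prompt }) };
  } catch (err) {
    // Assumption: any extractor failure is reported as a bad-gateway style error.
    const message = err instanceof Error ? err.message : "extraction failed";
    return { status: 502, body: { error: message } };
  }
}
```

In a Next.js or Express route you would call requireApiKey with your process environment at startup and pass the parsed request body to handleExtract. Keeping the handler free of framework types is what lets one sketch cover all three stacks.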
/ The path most teams take

Three ways to read
a file.

The first two assume a developer sits down and reads docs. Fine if that's your workflow. Painfully slow if you've been letting an agent do the wiring, and you still have to track which vision model is winning this week.

A · DIY

Roll your own with an LLM SDK

Fastest to start, slowest to maintain. You own model selection (and re-selection every week), prompt tuning, HEIC → JPEG, orientation fix, OCR fallback, page chunking, retries, schema validation, and the Tuesday 3pm "why did this invoice break" ticket.

B · Doc AI Platform

Plug in Textract, Document AI, or Azure DI

OCR is fine. Layout parsing is fine. But most predate the multimodal jump, so you're stitching them to an LLM yourself — and reading IAM docs before you read a file.

C · Docex

Let the agent install it. You approve.

Shaped around the agent doing the work. One approval link, a wallet, a skill in your repo. The engine underneath is whatever frontier vision model is currently winning. You don't track the frontier. We do.

/ What's actually different

What's actually
different.

01

The agent runs setup, not you.

One approval link. GitHub sign-in. $5 wallet top-up. The scoped key returns to your agent and lands in your .env. Your job in the loop: click approve. That's the whole human part.

02

Pay-as-you-go from page one.

No annual commit, no per-seat fee, no "talk to sales." Five dollars is about a hundred pages at current rates. Run out, top up, keep going. Like a coffee card, but for document extraction.

03

Frontier vision you don't have to track.

Provider abstraction over the current best multimodal models — Anthropic by default, more behind feature flags. When a better model ships next week, your extractions get better with zero code change. We constantly hunt for the cheapest, fastest, most reliable orchestration so you don't have to.

04

One envelope, every input.

PDF, JPEG, PNG, HEIC. Trade licenses, invoices, IDs, receipts photographed at a weird angle in a car at 6pm. Same SDK call, same response shape. Describe the fields; get them back. No preprocessing pipeline to maintain.

/ 01 — Approval

Approve once,
key in your repo.

The agent receives the scoped API key on the other side and writes it into your project. No portal tour. No "where do I find my key" thread. No key rotation dance every 90 days.

  • Skill: First-party install for Claude Code, Cursor, Codex, Aider, Continue. Your agent knows what to do.
  • Approval: One link · GitHub sign-in · wallet top-up · done. 90 seconds of your life.
  • Surface: CLI · Node SDK · HTTP. Same envelope back from each. Pick your surface.
approve.docex.dev/s/8Hk2-ax9R
Approval · session 8Hk2-ax9R
Approve onboard-trade-licenses for Acme Inc: extract company name, license number, expiry from uploaded files.
Agent: claude-code · v0.42
Wallet top-up: $5.00 · ~100 pages
Use case: trade-license-onboarding
Provider: anthropic · sonnet-4.5
Key returns to your agent. Never shown in browser.
/ 02 — One envelope

A phone photo,
read like a form.

Crooked iPhone HEIC, blurry scan, 12-page PDF — same call. Docex handles orientation, glare, OCR fallback, schema validation, retries, and provenance. You describe the fields. We handle the rest.

  • Inputs: pdf, jpeg, jpg, png, heic. Multi-page or single shot. We don't care.
  • Engine: Frontier multimodal, Anthropic by default. We track the model frontier, not you. When Sonnet 5 ships, you get it automatically.
  • Tests: Mock provider for CI. Zero credits burned. Zero network calls in your test suite.
request · extract.ts
// Same call for any supported input; the prompt names the fields you want back.
await docex.extract({
  file: "./uploaded-license.heic",
  prompt: "Pull legal name, license no., expiry.",
});

// → 200 OK · 2.4s · $0.05
{
  "legal_name": "ACME LOGISTICS L.L.C",
  "license_no": "1019388",
  "expires_on": "2026-03-13"
}
Trade License · Mainland
Legal: ACME LOGISTICS L.L.C
License: 1019388
Issued: 14·03·2024
Expires: 13·03·2026
Activity: Freight forwarding agent
Manager: R. K. Menon
IMG_4128.HEIC · iPhone 15 · 4032×3024
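Docex says it validates the schema on its side, but you may still want a typed guard at your own boundary before the fields reach your database. The sketch below checks the three fields from the sample response. The field names come from the page; parseTradeLicense, isRealIsoDate, isExpired, and the validation rules themselves are illustrative assumptions.

```typescript
// Field names match the sample response; the rules below are assumptions.
type TradeLicense = {
  legal_name: string;
  license_no: string;
  expires_on: string; // ISO date, e.g. "2026-03-13"
};

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// True only for YYYY-MM-DD strings that survive a round trip through Date.
function isRealIsoDate(s: string): boolean {
  if (!ISO_DATE.test(s)) return false;
  const t = Date.parse(`${s}T00:00:00Z`);
  return !Number.isNaN(t) && new Date(t).toISOString().slice(0, 10) === s;
}

// Narrow an untyped JSON value to TradeLicense, or throw with the reason.
function parseTradeLicense(value: unknown): TradeLicense {
  if (typeof value !== "object" || value === null) {
    throw new Error("expected a JSON object");
  }
  const record = value as Record<string, unknown>;
  const read = (field: string): string => {
    const v = record[field];
    if (typeof v !== "string" || v.trim() === "") {
      throw new Error(`missing or empty field: ${field}`);
    }
    return v;
  };
  const license: TradeLicense = {
    legal_name: read("legal_name"),
    license_no: read("license_no"),
    expires_on: read("expires_on"),
  };
  if (!isRealIsoDate(license.expires_on)) {
    throw new Error(`expires_on is not an ISO date: ${license.expires_on}`);
  }
  return license;
}

// ISO dates in the same format compare correctly as plain strings.
function isExpired(license: TradeLicense, todayIso: string): boolean {
  return license.expires_on < todayIso;
}
```

Passing today's date in as a string, rather than reading the clock inside isExpired, keeps the check deterministic in tests.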
/ 03 — Pay-as-you-go

A wallet,
not a contract.

Drop in five dollars. About a hundred pages at current upstream rates. Run out, top up, keep going. No card on file until you want one there. No invoice. No procurement cycle.

  • Model: Billed at 2× upstream provider cost. That's it. No platform fee, no egress charge, no "enterprise tier" gate.
  • Floor: $5 minimum top-up. Per-page billing at 2× upstream cost. Transparent.
  • Mock: $0.00 for fixtures and CI. The mock provider returns realistic JSON without burning a single credit.
Wallet · acme-prod · ● Active
$3.18 ≈ 64 pages left
$5.00 · last top-up · 2 days ago
$1.82 · spent this week
upstream cost
extract · trade-license.heic · 1 page · −$0.02
extract · invoice-Q3.pdf · 11 pages · −$0.22
extract · receipt-photo.jpg · 1 page · mock · −$0.00
top-up · GitHub @rkmenon · approved · +$5.00
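The wallet arithmetic is simple enough to sketch. The 2× markup is stated on this page. The upstream cost of 2.5 cents per page is an assumption inferred from "$5 = ~100 pages" and is not a published rate, so treat it as a placeholder. Amounts are kept in integer mills (tenths of a cent) to avoid floating-point drift.

```typescript
// Stated on the page: billing is 2x the upstream provider cost.
const MARKUP = 2;

// Assumption: 25 mills (2.5 cents) upstream per page, inferred from "$5 = ~100 pages".
const ASSUMED_UPSTREAM_MILLS_PER_PAGE = 25;

// Convert dollars to integer mills (1 mill = $0.001).
const dollarsToMills = (dollars: number): number => Math.round(dollars * 1000);

// What one page costs the wallet, in mills.
function billedMillsPerPage(upstreamMills: number): number {
  return upstreamMills * MARKUP;
}

// Approximate pages a balance buys, rounded to the nearest page.
function estimatePages(
  balanceDollars: number,
  upstreamMills: number = ASSUMED_UPSTREAM_MILLS_PER_PAGE
): number {
  const perPage = billedMillsPerPage(upstreamMills);
  return Math.round(dollarsToMills(balanceDollars) / perPage);
}
```

Under that assumption, estimatePages(5) gives 100 and estimatePages(3.18) gives 64, which lines up with the wallet figures above. If upstream rates move, only the assumed constant changes.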
/ 04 — Dashboard

To debug,
not to upsell.

See what your agent extracted, the prompt it ran, the credits it burned, the failures and why. No demo banner. No upsell modal. No "schedule a call to unlock this view."

  • Trace: Per-call timeline · prompt · spend · provenance. Debug in minutes, not hours.
  • Fixtures: Promote any extraction to a CI fixture in one click. Regression testing for free.
  • Replay: Re-run a failed file against a different prompt or model. Iterate without rewriting code.
acme-prod / extractions · last 24h · ● live
Extractions · Prompts · Fixtures · Spend · Keys
Pages today: 412 (+38 vs yesterday)
P50 latency: 2.4s (−0.3s wk)
Failure rate: 0.7% (+2 this hour)
ID | File · use case | Fields | Cost | Status
8841 | heic · trade-license.heic / onboarding | 3 / 3 | $0.05 | ok
8840 | pdf · invoice-Q3-summary.pdf / ap-ingest | 14 / 14 | $0.55 | ok
8839 | jpg · noc-letter-blurry.jpg / kyc | 2 / 4 | $0.05 | retry
8838 | pdf · contract-msa-acme.pdf / legal | 28 / 28 | $1.40 | ok
8837 | heic · passport-front.heic / kyc | 5 / 5 | $0.05 | ok
8836 | jpg · salary-cert.jpg / kyc | - / 3 | $0.00 | running
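A promoted fixture is only useful if something compares against it. Here is a rough sketch of that regression check, with a stand-in mock extractor so the test makes no network calls. The real mock provider's interface isn't documented on this page, so createMockExtractor, diffFields, and the Fields type are hypothetical names; only the extract({ file, prompt }) shape is taken from the sample.

```typescript
type Fields = Record<string, string>;
type Extractor = {
  extract(req: { file: string; prompt: string }): Promise<Fields>;
};

// Hypothetical mock: returns canned fields per file path and never touches the network.
function createMockExtractor(fixtures: Record<string, Fields>): Extractor {
  return {
    async extract({ file }) {
      const fields = fixtures[file];
      if (!fields) throw new Error(`no fixture for ${file}`);
      return { ...fields }; // copy so tests cannot mutate the fixture
    },
  };
}

// Names of fields that are missing, extra, or different between fixture and result.
function diffFields(expected: Fields, actual: Fields): string[] {
  const extraKeys = Object.keys(actual).filter((k) => !(k in expected));
  return Object.keys(expected)
    .concat(extraKeys)
    .filter((k) => expected[k] !== actual[k])
    .sort();
}
```

An empty array from diffFields means the extraction still matches the fixture. A non-empty one tells you which fields regressed, which is usually the first thing you want to know.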
/ Who Docex is built for

Solo builders
and small teams.

People shipping extraction as a feature, not a research project. Already coding with an agent in the loop. Onboarding, KYC, claims, expense capture, invoice ingestion, contract review. If you'd rather ship than integrate, you're the target.

You'll love it if

  • You're a solo builder or small team. Extraction is a feature, not the whole product.
  • You're already coding with an agent: Claude Code, Cursor, Codex, Aider, Continue. The agent does the boilerplate; you do the thinking.
  • You're building "user uploads a file or snaps a photo, app reads the fields." That's literally the whole use case.
  • You're done stitching OCR + image preprocessing + model selection + retries + schema validation. Life's too short.

You probably won't

  • You need on-prem or VPC deployment. We're multi-tenant cloud. That's the tradeoff.
  • You need a signed BAA before you can evaluate. We're not there yet.
  • You want a sales rep on speed-dial and a quarterly QBR. We don't have sales reps. We have docs.
  • "Approve once, key in repo" sounds reckless rather than refreshing. We get it. Not for everyone.
/ Why now

Two shifts
in the last year.

01

Agents got good enough to own integration work.

The bottleneck moved from "can the agent write the code" to "can the agent get through your signup, billing, and API key UI." Most dev tools weren't designed for an agent doing the clicking. Docex was.

02

Vision models lapped the OCR pipeline industry.

Crooked phone photos, scanned 40-page leases, and hand-annotated invoices are read better today than they were by the six-month pipeline you'd have built two years ago. The hard part isn't reading the file anymore. It's constantly hunting for the cheapest, fastest, most reliable, and most accurate orchestration. That's the layer we own.

/ How it fits in your stack

Two commands.
That's the whole shape.

step 01 — your agent runs this. you approve once.
$ docex setup \
  --use-case "extract trade licenses" \
  --top-up 5

 opening approval link…
 wallet funded · $5.00
 key written to .env.local
 skill installed for claude-code
step 02 — pdf, photo, scan, whatever your user uploaded.
$ docex extract ./uploaded-license.heic \
  --prompt "Pull legal name, no., expiry"

 1 page · anthropic · sonnet-4.5
 200 OK · 2.41s · $0.05
{ "legal_name": "ACME LOGISTICS L.L.C",
  "license_no": "1019388",
  "expires_on": "2026-03-13" }

Fire up your
coding agent.
Paste the prompt.

Approve the link, drop in five bucks, watch the smoke test pass. Frontier vision, agent-installed, pay-as-you-go. Five minutes from paste to live.

$ npx docex setup --use-case "<your workflow>"