Frontier vision + OCR · agent-installed

Any file.
Any field.
Under five minutes.

Frontier vision + OCR, wrapped in an extraction API your coding agent installs for you. You sign in once and approve a $5 top-up. That's the whole human part.

pdf · jpeg · png · heic $5 minimum top-up SOC 2 Type II
~/your-app · agent session
One approval link · key written to .env CycleTAB
/ Paste this into your coding agent

One prompt.
Five minutes to live.

If your agent has the first-party Docex skill installed it picks this up natively. If not, the prompt is explicit enough that any decent coding agent figures out the CLI path.

prompt.txt — copy & paste ⧉ copy
Set up Docex in this project for [describe what you want to extract —
e.g., "company details from trade licenses during onboarding"].

Take me through the GitHub approval and the $5 wallet top-up, wire the
API key into my env, scaffold a server-side extraction endpoint for my
stack, and run a smoke test when you're done.
MinuteWhat's happeningWho
0:00–0:30 Agent detects your stack (Next.js, Express, plain Node), installs the docex package, runs docex setup. ▸ AGENT
0:30–1:30 You get one approval link. Sign in with GitHub. Pick $5 — that's about 100 pages. ▸ YOU
1:30–2:30 Stripe checkout. One card, no contract, no plan to pick. ▸ YOU
2:30–3:30 Agent receives the scoped API key, writes DOCEX_API_KEY into your env, scaffolds the extraction endpoint, wires the client wrapper. ▸ AGENT
3:30–4:30 Smoke extraction runs against a fixture or a file you point at. JSON comes back. ▸ AGENT
4:30 You're live. Ship the feature. — DONE
/ The path most teams take

Three ways to read
a file.

All three are designed for a developer to sit down, read the docs, and stand up an account. Fine if that's how you work. Slow if you've been letting an agent do the wiring lately — and you still have to track the model frontier yourself.

A · DIY

Roll your own with an LLM SDK

Fastest to start, slowest to finish. You own model selection (and re-selection every quarter), prompt tuning, HEIC → JPEG, orientation fix, OCR fallback, page chunking, retries, and the Tuesday-invoice ticket.

B · Doc AI Platform

Plug in Textract, Document AI, or Azure DI

OCR is fine. Layout parsing is fine. But most predate the multimodal jump, so you're stitching them to an LLM yourself — and reading IAM docs before you read a file.

C · Docex

Let the agent install it. You approve.

Shaped around the agent doing the work. One approval link, a wallet, a skill in your repo. The engine underneath is whatever frontier vision model is currently winning.

/ What's actually different

What's actually
different.

01

The agent runs setup, not you.

One approval link. GitHub sign-in. Wallet top-up. The key returns to your agent and lands in your .env. Your job in the loop: click approve.

02

Pay-as-you-go from page one.

No annual commit, no per-seat fee, no “talk to sales.” Five dollars in the wallet is around a hundred pages. Run out, top up, keep going.

03

Frontier vision you don't have to track.

A provider abstraction over the current best multimodal models — Anthropic by default, more behind feature flags. When a better model ships next quarter, your extractions get better with no code change.

04

One envelope, every input.

PDF, JPEG, PNG, HEIC. Trade licenses, invoices, IDs, receipts photographed at a weird angle. Same SDK call, same response shape. Describe the fields; get them back.

/ 01 — Approval

Approve once,
key in your repo.

The agent receives the API key on the other side and writes it into your project. No portal tour. No "where do I find my key" thread.

  • SkillFirst-party install for Claude Code, Cursor, Codex, Aider, Continue.
  • ApprovalOne link · GitHub sign-in · wallet top-up · done.
  • SurfaceCLI · Node SDK · HTTP. Same envelope back from each.
approve.docex.dev/s/8Hk2-ax9R
Approval · session 8Hk2-ax9R
Approve onboard-trade-licenses for Acme Inc — extract company name, license number, expiry from uploaded files.
Agent
claude-code · v0.42
Wallet top-up
$5.00 · ~100 pages
Use case
trade-license-onboarding
Provider
anthropic · sonnet-4.5
Key returns to your agent. Never shown in browser.
/ 02 — One envelope

A phone photo,
read like a form.

Crooked iPhone HEIC, blurry scan, 12-page PDF — same call. Docex handles orientation, glare, OCR fallback, schema validation, retries, and provenance.

  • Inputspdf, jpeg, jpg, png, heic. Multi-page or single shot.
  • EngineFrontier multimodal — Anthropic default. We track the model frontier, not you.
  • TestsMock provider so CI never burns credits or hits the network.
request · extract.ts
await docex.extract({
  file: "./uploaded-license.heic",
  prompt: "Pull legal name,
           license no., expiry.",
});

// → 200 OK · 2.4s · $0.05
{
  "legal_name": "ACME LOGISTICS L.L.C",
  "license_no": "1019388",
  "expires_on": "2026-03-13"
}
Trade License · Mainland
LegalACME LOGISTICS L.L.C
License1019388
Issued14·03·2024
Expires13·03·2026
ActivityFreight forwarding agent
ManagerR. K. Menon
IMG_4128.HEICiPhone 15 · 4032×3024
/ 03 — Pay-as-you-go

A wallet,
not a contract.

Drop in five dollars. About a hundred pages. Run out, top up, keep going. No card on file until you want one there.

  • Floor$5 minimum top-up · ~$0.05 per page extracted.
  • Mock$0.00 charged for fixtures and CI runs.
  • LimitsPer-key daily ceiling · auto-pause · Slack webhook.
Wallet · acme-prod ● Active
$3.18 ≈ 64 pages left
$5.00last top-up · 2 days ago
$1.82spent this week
$0.05avg per page
extract · trade-license.heic1 page−$0.05
extract · invoice-Q3.pdf11 pages−$0.55
extract · receipt-photo.jpg1 page · mock−$0.00
top-up · GitHub @rkmenonapproved+$5.00
/ 04 — Dashboard

To debug,
not to upsell.

See what your agent extracted, the prompt it ran, the credits it burned, the failures and why. No demo banner. No upsell modal.

  • TracePer-call timeline · prompt · spend · provenance.
  • FixturesPromote any extraction to a CI fixture in one click.
  • ReplayRe-run a failed file against a different prompt or model.
acme-prod / extractions
last 24h ▾
● live
Extractions Prompts Fixtures Spend Keys
Pages today
412
+38 vs yesterday
P50 latency
2.4s
−0.3s wk
Failure rate
0.7%
+2 this hour
IDFILE · USE-CASEFIELDSCOSTSTATUS
8841heictrade-license.heic / onboarding3 / 3$0.05ok
8840pdfinvoice-Q3-summary.pdf / ap-ingest14 / 14$0.55ok
8839jpgnoc-letter-blurry.jpg / kyc2 / 4$0.05retry
8838pdfcontract-msa-acme.pdf / legal28 / 28$1.40ok
8837heicpassport-front.heic / kyc5 / 5$0.05ok
8836jpgsalary-cert.jpg / kyc— / 3$0.00running
/ Who Docex is built for

Solo builders
and small teams.

People shipping extraction as a feature, not a research project. Already coding with an agent in the loop. Onboarding, KYC, claims, expense capture, invoice ingestion, contract review.

You'll love it if

  • +You're a solo builder or small team. Extraction is a feature, not the whole product.
  • +You're already coding with an agent — Claude Code, Cursor, Codex, Aider, Continue.
  • +You're building "user uploads a file or snaps a photo, app reads the fields."
  • +You're done stitching OCR + image preprocessing + model selection + retries + schema validation.

You probably won't

  • If you need on-prem or VPC deployment.
  • If you need a signed BAA before you can evaluate.
  • If you want a sales rep on speed-dial and a quarterly QBR.
  • If "approve once, key in repo" sounds reckless rather than refreshing.
/ Why now

Two shifts
in the last year.

01

Agents got good enough to own integration work.

The bottleneck moved from "can the agent write the code" to "can the agent get through your signup, billing, and API key UI." Most dev tools weren't designed for that. Docex was.

02

Vision models lapped the OCR pipeline industry.

Crooked phone photos, scanned 40-page leases, hand-annotated invoices — read better than the stack you'd have spent six months assembling two years ago. The hard part isn't reading the file. It's staying on the current model. That's the layer Docex owns.

/ How it fits in your stack

Two commands.
That's the whole shape.

step 01 — your agent runs this. you approve once.
$ docex setup \
  --use-case "extract trade licenses" \
  --top-up 5

 opening approval link…
 wallet funded · $5.00
 key written to .env.local
 skill installed for claude-code
step 02 — pdf, photo, scan, whatever your user uploaded.
$ docex extract ./uploaded-license.heic \
  --prompt "Pull legal name, no., expiry"

 1 page · anthropic · sonnet-4.5
 200 OK · 2.41s · $0.05
{ "legal_name": "ACME LOGISTICS L.L.C",
  "license_no": "1019388",
  "expires_on": "2026-03-13" }

Open your
coding agent.
Paste the prompt.

Approve the link, top up $5, watch the smoke test pass. Frontier vision, agent-installed, pay-as-you-go. Five minutes from paste to live.

$ npx docex setup --use-case "<your workflow>"