Docex is one API call over whatever frontier vision model is winning this week. We constantly hunt for the cheapest, fastest, most reliable, and accurate orchestration so you don't have to. Your agent runs docex setup, you click one approval link, and the key lands in your .env. The human part is about 90 seconds.
If your agent has the Docex skill, it picks this up natively. If not, the prompt is explicit enough that any decent agent figures out the CLI. Either way, you don't touch the docs.
Set up Docex in this project for [describe what you want to extract —
e.g., "company details from trade licenses during onboarding"].
Take me through the GitHub approval and the $5 wallet top-up, wire the
API key into my env, scaffold a server-side extraction endpoint for my
stack, and run a smoke test when you're done.
All three assume a developer sits down and reads docs. Fine if that's your workflow. Painfully slow if you've been letting an agent do the wiring — and you still have to track which vision model is winning this week.
Fastest to start, slowest to maintain. You own model selection (and re-selection every week), prompt tuning, HEIC → JPEG, orientation fix, OCR fallback, page chunking, retries, schema validation, and the Tuesday 3pm "why did this invoice break" ticket.
OCR is fine. Layout parsing is fine. But most predate the multimodal jump, so you're stitching them to an LLM yourself — and reading IAM docs before you read a file.
Shaped around the agent doing the work. One approval link, a wallet, a skill in your repo. The engine underneath is whatever frontier vision model is currently winning. You don't track the frontier. We do.
One approval link. GitHub sign-in. $5 wallet top-up. The scoped key returns to your agent and lands in your .env. Your job in the loop: click approve. That's the whole human part.
No annual commit, no per-seat fee, no "talk to sales." Five dollars is about a hundred pages at current rates. Run out, top up, keep going. Like a coffee card, but for document extraction.
Provider abstraction over the current best multimodal models — Anthropic by default, more behind feature flags. When a better model ships next week, your extractions get better with zero code change. We constantly hunt for the cheapest, fastest, most reliable orchestration so you don't have to.
PDF, JPEG, PNG, HEIC. Trade licenses, invoices, IDs, receipts photographed at a weird angle in a car at 6pm. Same SDK call, same response shape. Describe the fields; get them back. No preprocessing pipeline to maintain.
The agent receives the scoped API key on the other side and writes it into your project. No portal tour. No "where do I find my key" thread. No key rotation dance every 90 days.
Crooked iPhone HEIC, blurry scan, 12-page PDF — same call. Docex handles orientation, glare, OCR fallback, schema validation, retries, and provenance. You describe the fields. We handle the rest.
await docex.extract({ file: "./uploaded-license.heic", prompt: "Pull legal name, license no., expiry.", }); // → 200 OK · 2.4s · $0.05 { "legal_name": "ACME LOGISTICS L.L.C", "license_no": "1019388", "expires_on": "2026-03-13" }
Drop in five dollars. About a hundred pages at current upstream rates. Run out, top up, keep going. No card on file until you want one there. No invoice. No procurement cycle.
See what your agent extracted, the prompt it ran, the credits it burned, the failures and why. No demo banner. No upsell modal. No "schedule a call to unlock this view."
People shipping extraction as a feature, not a research project. Already coding with an agent in the loop. Onboarding, KYC, claims, expense capture, invoice ingestion, contract review. If you'd rather ship than integrate, you're the target.
The bottleneck moved from "can the agent write the code" to "can the agent get through your signup, billing, and API key UI." Most dev tools weren't designed for an agent doing the clicking. Docex was.
Crooked phone photos, scanned 40-page leases, hand-annotated invoices — read better today than the six-month pipeline you'd have built two years ago. The hard part isn't reading the file anymore. It's constantly hunting for the cheapest, fastest, most reliable, and accurate orchestration. That's the layer we own.
$ docex setup \ --use-case "extract trade licenses" \ --top-up 5 → opening approval link… ✓ wallet funded · $5.00 ✓ key written to .env.local ✓ skill installed for claude-code
$ docex extract ./uploaded-license.heic \ --prompt "Pull legal name, no., expiry" → 1 page · anthropic · sonnet-4.5 ✓ 200 OK · 2.41s · $0.05 { "legal_name": "ACME LOGISTICS L.L.C", "license_no": "1019388", "expires_on": "2026-03-13" }
Approve the link, drop in five bucks, watch the smoke test pass. Frontier vision, agent-installed, pay-as-you-go. Five minutes from paste to live.