Overview

Your app can now see.
Understand anything.
Ship faster.

Your users upload files — blurry scans, crumpled receipts, low-res screenshots, multi-page PDFs. You need structured data. Docex is the vision layer that handles the models, the OCR, the retries, and the schema validation. One API call. No pipeline to maintain.


01 · setup

02 · approve

03 · connect

04 · analyze

Infrastructure

You shouldn't need
a vision team.

Building image-to-data pipelines means evaluating models, handling OCR fallbacks, writing retry logic, and watching costs spiral. Or you lock into one provider and hope it works on every input. Docex is infrastructure — an orchestration engine with an expanding model library that routes each job to the right pipeline, handles failure automatically, and keeps you honest on cost.

GROCERY MART: Milk $3.49 · Bread $2.99 · Eggs $4.29 · TOTAL $10.77

J. DOE · ID-8842-91

INVOICE #1042: Cloud Services Inc. · $2,450.00 USD · Due: 14 days

Dynamic model selection · Automatic fallback · Structured output · 2× upstream billing · Schema validation · Provider abstraction

Engine

The right model.
Every time.

Docex maintains a catalog of vision and OCR providers. For each job, it selects the optimal pipeline based on your input, budget, and latency requirements. If a provider fails or returns low confidence, it falls back automatically. You describe what you need in plain text. Docex handles the rest.
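To make the routing idea concrete, here is a minimal TypeScript sketch of budget-aware selection with automatic fallback. It illustrates the general technique only, not Docex's internal engine: every type, function, and provider name below is hypothetical, the cost and latency figures are invented, and the provider calls are synchronous purely to keep the sketch short.

```typescript
// Hypothetical types for illustration; none of these names come from Docex.
interface Provider {
  name: string;
  costUsd: number;   // estimated cost per job (made-up figures below)
  latencyMs: number; // typical latency (made-up figures below)
  // Synchronous for brevity; a real provider call would be async.
  run: (input: string) => { data: string; confidence: number };
}

interface JobLimits {
  maxCostUsd: number;
  maxLatencyMs: number;
  minConfidence: number;
}

// Keep providers that fit the budget and latency limits, cheapest first.
function candidates(catalog: Provider[], limits: JobLimits): Provider[] {
  return catalog
    .filter((p) => p.costUsd <= limits.maxCostUsd && p.latencyMs <= limits.maxLatencyMs)
    .sort((a, b) => a.costUsd - b.costUsd);
}

// Try each candidate in order; fall back when one throws or is not confident enough.
function runWithFallback(catalog: Provider[], limits: JobLimits, input: string) {
  const tried: string[] = [];
  for (const provider of candidates(catalog, limits)) {
    tried.push(provider.name);
    try {
      const result = provider.run(input);
      if (result.confidence >= limits.minConfidence) {
        return { provider: provider.name, data: result.data, tried };
      }
    } catch (err) {
      // Provider failed: continue with the next candidate.
    }
  }
  throw new Error("All providers failed or returned low confidence: " + tried.join(", "));
}

const catalog: Provider[] = [
  { name: "vision-large", costUsd: 0.02, latencyMs: 2500, run: () => ({ data: "{}", confidence: 0.97 }) },
  { name: "ocr-small", costUsd: 0.002, latencyMs: 400, run: () => ({ data: "{}", confidence: 0.4 }) },
  { name: "ocr-flaky", costUsd: 0.005, latencyMs: 600, run: () => { throw new Error("timeout"); } },
];

const outcome = runWithFallback(
  catalog,
  { maxCostUsd: 0.05, maxLatencyMs: 3000, minConfidence: 0.8 },
  "receipt.jpg"
);
// outcome.provider is "vision-large"
// outcome.tried is ["ocr-small", "ocr-flaky", "vision-large"]
```

In this sketch the cheapest provider is tried first, so an easy input never reaches the expensive model. The low-confidence result and the thrown error both trigger the same fallback path.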

Input → Router → Claude Vision / GPT-4o OCR / Mistral Fallback → JSON

Trade License · Mainland (IMG_4128.HEIC, iPhone 15)
Legal: ACME LOGISTICS L.L.C
License: 1019388
Issued: 14·03·2024
Expires: 13·03·2026
Activity: Freight forwarding

BISTRO 42 (receipt.jpg, Pixel 7)
Truffle Pasta $24.00
Sparkling $8.50
Tiramisu $12.00
Total $44.50
Thu 09·05·24 · Table 7

await docex.run({
  file: "./uploaded-license.heic",
  prompt: "company name, number, expiry",
});

// → 200 OK · 2.4s · ~$0.03
{
  "legal_name": "ACME LOGISTICS L.L.C",
  "license_no": "1019388",
  "expires_on": "2026-03-13"
}
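If you want to double-check the response shape on your side before storing it, a small type guard is enough. The sketch below assumes the three field names from the example response above. The interface name, the function name, and the format rules (digits only for the license number, YYYY-MM-DD for the date) are illustrative assumptions, not a documented Docex schema.

```typescript
// Shape of the example response above. The interface name is hypothetical.
interface LicenseFields {
  legal_name: string;
  license_no: string;
  expires_on: string; // assumed ISO date, YYYY-MM-DD
}

// Type guard: true only when every expected field is present and well-formed.
function isLicenseFields(value: unknown): value is LicenseFields {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  const name = v["legal_name"];
  const licenseNo = v["license_no"];
  const expires = v["expires_on"];
  return (
    typeof name === "string" && name.length > 0 &&
    typeof licenseNo === "string" && /^\d+$/.test(licenseNo) &&
    typeof expires === "string" && /^\d{4}-\d{2}-\d{2}$/.test(expires)
  );
}

const raw = '{"legal_name":"ACME LOGISTICS L.L.C","license_no":"1019388","expires_on":"2026-03-13"}';
const parsed: unknown = JSON.parse(raw);

// After the guard passes, TypeScript treats `parsed` as LicenseFields.
const summary = isLicenseFields(parsed)
  ? parsed.legal_name + " expires " + parsed.expires_on
  : "invalid response";
// summary is "ACME LOGISTICS L.L.C expires 2026-03-13"
```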

Use Cases

Any image.
Any task.

Standard document extraction is where most tools stop. Docex starts there and keeps going. Same API. Same structured output. Any input you throw at it.

KYC · KYC & Onboarding: IDs, licenses, bank statements, proof-of-address

SEC · Security & Compliance: Email attachments, suspicious screenshots, scan reports

FIN · Finance & Expenses: Invoices, receipts, purchase orders (crumpled, low-light, any angle)

MED · Media & Video: Low-res frame grabs, compressed thumbnails, screen captures

LGL · Legal & Contracts: Parties, clauses, signature detection, term extraction

OPS · Logistics & Operations: Shipping labels, waybills, manifests, damage photos

HLT · Healthcare & Forms: Medical records, insurance claims, lab results, handwritten notes

ANY · Your Workflow: The categories above are what we've tested. Docex adapts to whatever other image-to-data task you have in store.

Pricing

Predictable cost.
No surprises.

We charge 2× what the upstream provider charges us. No markup games, no opaque credits. For most inputs, that means pennies per request. Dynamic routing helps — when a cheaper OCR model handles the job, you pay less. Drop in five dollars to start. Cancel anytime. No annual contract, no sales call.

Billed at 2× upstream cost
$5 minimum top-up
Mock provider for CI — $0
Cancel anytime, no lock-in
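The 2× rule is simple enough to budget with. Here is a rough TypeScript sketch of the arithmetic, assuming amounts are tracked in integer micro-dollars to avoid floating-point drift. The function names and the $0.015 upstream cost are hypothetical and chosen only for illustration.

```typescript
// "Billed at 2× upstream cost"
const MARKUP = 2;

// Amounts are integer micro-dollars (1 USD = 1,000,000) in this sketch.
function billedMicros(upstreamMicros: number): number {
  return upstreamMicros * MARKUP;
}

// How many whole requests a balance covers at a given upstream cost per request.
function requestsCovered(balanceMicros: number, upstreamMicrosPerRequest: number): number {
  return Math.floor(balanceMicros / billedMicros(upstreamMicrosPerRequest));
}

// Hypothetical numbers: an upstream cost of $0.015 is billed at $0.03,
// so a $5.00 top-up covers 166 such requests.
const billed = billedMicros(15000);              // 30000 micro-dollars = $0.03
const covered = requestsCovered(5000000, 15000); // 166
```

Because routing can pick a cheaper model per job, the real number of requests per top-up will vary with your inputs.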


Wallet · acme-prod · Active
Balance: $3.18 (≈ 64 requests)
Last top-up: $5.00
Spent this week: $1.82
Rate: 2× upstream cost

Deploy

Ship vision.
In five minutes.

Paste this into your coding agent. It wires Docex into your product, scaffolds the endpoint for your stack, and runs a smoke test. You approve one link. Production-ready vision analysis without the production-ready team.

prompt.txt (copy and paste)

Wire Docex into this project as the vision analysis layer for
[describe the use case — e.g., "reading trade licenses during user onboarding"].

Take me through the GitHub approval and the $5 wallet top-up, store the
API key in my env, scaffold a server-side analysis endpoint for my
stack, and run a smoke test to confirm the integration works end-to-end.
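For the "store the API key in my env" step, a fail-fast config loader is one common pattern your agent might scaffold. This is a sketch under assumptions: the variable name DOCEX_API_KEY and the loader itself are hypothetical, not documented Docex settings.

```typescript
// Hypothetical config loader: DOCEX_API_KEY is an assumed variable name.
interface DocexConfig {
  apiKey: string;
}

// Accepts any env-like object so it can be exercised without a real environment.
function loadDocexConfig(env: { [name: string]: string | undefined }): DocexConfig {
  const apiKey = (env["DOCEX_API_KEY"] || "").trim();
  if (apiKey.length === 0) {
    // Fail at startup rather than on the first user upload.
    throw new Error("DOCEX_API_KEY is not set; add it to the server environment");
  }
  return { apiKey };
}

// On a Node server you would pass process.env; a literal keeps this sketch self-contained.
const config = loadDocexConfig({ DOCEX_API_KEY: "test-key-123" });
```

Keeping the key on the server side, as the prompt asks, means it never ships in client bundles.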

