You're using hardparse in its public beta. The API is free and keyless — no card, no trial timer. When we exit beta, pricing follows CodeSOTA's three-tier framework: you pick the trade-off, we run the inference.
Every request to POST /v1/parse goes through our GPU pipeline on open-weight VLMs. No signup. No auth. No credit card. Just ship.
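Calling the endpoint is a single keyless POST. A minimal sketch, assuming the host `api.hardparse.com` and a JSON body with a `url` field (both are assumptions; only `POST /v1/parse`, the missing auth, and the `"balanced"` default tier come from this page):

```python
import json
import urllib.request

# Assumed host; only the /v1/parse path is documented.
API_URL = "https://api.hardparse.com/v1/parse"

def build_parse_request(document_url: str, tier: str = "balanced") -> urllib.request.Request:
    """Build a keyless POST /v1/parse request -- no API key header in beta.

    The `url` and `tier` body fields are illustrative; `"balanced"` is the
    documented default routing tier.
    """
    payload = json.dumps({"url": document_url, "tier": tier}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it is one more call:
# with urllib.request.urlopen(build_parse_request("https://example.com/doc.pdf")) as resp:
#     print(resp.read().decode())
```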
Three tiers, mapping onto the benchmarks we publish at codesota.com/ocr. `/v1/parse` routes to `tier: "balanced"` unless you opt into something else. Prices are approximate; we'll finalize them when the Stripe dial is flipped.
- Best open-weight VLM sitting within a few points of the frontier at a tiny fraction of the cost: GLM-OCR / PaddleOCR-VL on our GPUs. This is the tier you want 99% of the time: production workloads, RAG pipelines, document ingestion jobs.
- Frontier quality for the last couple of points: compliance runs, eval suites, audit trails. Routes to whichever model tops the CodeSOTA OCR leaderboard. Most workloads don't need this; when they do, one flag flips.
- Smallest model that still clears the quality bar (3–8B open checkpoints, distilled variants). When you're doing a million calls, money wins: background OCR, bulk archive work, pre-processing before LLM calls.
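Switching tiers is the one-flag flip mentioned above. A sketch of what that looks like in a request body, assuming the body carries a `tier` field; only the `"balanced"` default is documented here, and the other tier names below are illustrative placeholders, not confirmed identifiers:

```python
import json

def parse_payload(document_url: str, tier: str = "balanced") -> bytes:
    """JSON body for POST /v1/parse; omit `tier` to get the balanced default.

    Field names are assumptions -- only the "balanced" default routing
    appears in the docs.
    """
    return json.dumps({"url": document_url, "tier": tier}).encode()

default_body = parse_payload("https://example.com/doc.pdf")
# One flag flips to frontier quality (tier name is a placeholder):
frontier_body = parse_payload("https://example.com/doc.pdf", tier="frontier")
```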
Vendor APIs charge 10–150× what open-weight VLMs actually cost to run. We pass the open-model economics through, keep a fair margin, and publish the benchmarks so you can check our work.
| Vendor | Per 1K pages | Quality class | Open weights |
|---|---|---|---|
| Hardparse ● live | $0 today · ~$0.10 after | balanced · 95% of SOTA | yes |
| Google Document AI | $100–$1,500 | Good on printed docs | no |
| Azure AI Document Intelligence | $50–$1,500 | Tiers by feature | no |
| Mistral OCR | $1,000 | Closed, fast | no |
| Tesseract (self-host) | $0 + ops | Dated · loses structure | yes |
When the pipeline is stable enough that we'd be comfortable billing someone for it. No hard date — we'd rather leave it free a month too long than charge for something that's still rough. We'll email everyone who's used the API before flipping the switch, and keep a grandfathered tier for early adopters.
Yes — there will always be an anonymous free tier rate-limited by IP. We expect that limit to drop (from 100/day to maybe 25/day) when paid tiers activate, but the keyless "try it now" workflow is permanent.
Because the cost gap between the best open VLM and the best frontier model is roughly 150×, and forcing every user onto the frontier wastes money 99% of the time. This is the CodeSOTA thesis: three tiers, you pick the trade-off, we run the rest.
No: the numbers above are targets, not commitments. We'll finalize pricing when we have enough throughput data from beta to price without guessing. If the final number surprises you, we'll grandfather beta users at the target prices shown here.
Not today. The models are open-weight — GLM-OCR and PaddleOCR-VL-1.5 are on Hugging Face — so nothing stops you from running the pipeline yourself. We're working on a published Docker image for teams that need on-prem for compliance reasons. Email hi@hardparse.com if that's you.
Email hi@hardparse.com with your expected volume and any compliance constraints. Volume discounts and dedicated capacity are available today in beta for teams willing to commit.
We publish the numbers behind each model at codesota.com. If a frontier API beats the open-weight tier for your task, we'll say so, and so will the leaderboard.