Pricing · public beta · CodeSOTA tier framework

Free while we're in beta.
Reasonable after.

You're using hardparse in its public beta. The API is free and keyless — no card, no trial timer. When we exit beta, pricing follows CodeSOTA's three-tier framework: you pick the trade-off, we run the inference.

You are here · beta

Anonymous, keyless, free.

Every request to POST /v1/parse goes through our GPU pipeline on open-weight VLMs. No signup. No auth. No credit card. Just ship.

  • 100 requests / 24h per IP · rolling window
  • PDF, images, scans up to MB
  • Tables → Markdown tables · formulas → LaTeX · layout preserved
  • Python + TypeScript SDKs, zero deps
$0 while in beta · no signup-based cap, only the per-IP rate limit
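If you are scripting against the beta, it can help to track the 100 requests / 24h rolling window on the client side so a batch job does not run into the limit halfway through. The sketch below is only an illustration: the class name and its methods are made up for this example, and it assumes the server counts requests in a plain sliding window, which this page does not spell out.

```python
import time
from collections import deque


class RollingWindowBudget:
    """Client-side tracker for a rolling-window request limit.

    Hypothetical helper: it mirrors the published limit (100 requests per
    24 hours) but knows nothing about how the server actually counts.
    """

    def __init__(self, limit=100, window_seconds=24 * 60 * 60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def _evict_expired(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def remaining(self):
        """Number of requests that can still be sent right now."""
        self._evict_expired(self.clock())
        return self.limit - len(self._sent)

    def try_acquire(self):
        """Record one request if budget is left; return whether it was allowed."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True

    def seconds_until_next_slot(self):
        """How long to wait before the oldest request leaves the window."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window_seconds - (now - self._sent[0])
```

Call `try_acquire()` before each request and sleep for `seconds_until_next_slot()` when it returns `False`. Because the real limit is per IP, several machines behind one NAT would share a budget that this local counter cannot see.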
§ after beta

Balanced is the default.
Everything else is opt-in.

The three tiers map onto the benchmarks we publish at codesota.com/ocr. /v1/parse routes to tier: "balanced" unless you opt into something else. Prices are approximate; we'll finalize them when billing through Stripe goes live.
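The default-unless-you-opt-in rule can be sketched in a few lines. Only the tier names and the "balanced" default come from this page; the function names and the idea of sending the tier as a field in a request options dictionary are assumptions made for illustration, not the documented request format.

```python
# Tier names as they appear on this page.
VALID_TIERS = ("balanced", "sota", "cheap")
DEFAULT_TIER = "balanced"


def resolve_tier(requested=None):
    """Return the tier a request would use: the default unless one is named."""
    if requested is None:
        return DEFAULT_TIER
    if requested not in VALID_TIERS:
        raise ValueError(
            f"unknown tier {requested!r}; expected one of {', '.join(VALID_TIERS)}"
        )
    return requested


def build_parse_options(tier=None):
    """Hypothetical options payload; the real /v1/parse schema may differ."""
    return {"tier": resolve_tier(tier)}
```

With this sketch, `build_parse_options()` yields `{"tier": "balanced"}` and `build_parse_options("sota")` opts into the frontier tier. Validating the name locally surfaces a typo before any request is sent.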

tier: "balanced" default

95% quality.
~20× cheaper.

Best open-weight VLM sitting within a few points of the frontier at a tiny fraction of the cost. GLM-OCR / PaddleOCR-VL on our GPUs. This is the tier you want 99% of the time — production workloads, RAG pipelines, document ingestion jobs.

  • Target price: ~$0.10 / 1K pages
  • Quality: ≥ 95% of SOTA
  • Typical use: Production
  • Latency: Fast (GPU pool)

tier: "sota"

The best — when it matters.

Frontier quality for the last couple of points: compliance runs, eval suites, audit trails. Routes to whichever model is top of the CodeSOTA OCR leaderboard. Most workloads don't need this. When they do, one flag flips.

  • Target price: ~$1–$15 / 1K pages
  • Quality: 100% of SOTA
  • Typical use: Compliance, eval
  • Latency: Medium

tier: "cheap"

For scale, batch, edge.

Smallest model that still clears the quality bar — 3–8B open checkpoints, distilled variants. When you're doing a million calls, money wins. Background OCR, bulk archive work, pre-processing before LLM calls.

  • Target price: ~$0.03 / 1K pages
  • Quality: ≥ 85% of SOTA
  • Typical use: Scale, batch
  • Latency: Instant

Vs. the vendors

Why "reasonable" matters.

Vendor APIs charge 10–150× what open-weight VLMs actually cost to run. We pass the open-model economics through, keep a fair margin, and publish the benchmarks so you can check our work.

Vendor | Per 1K pages | Quality class | Open weights
Hardparse ● live | $0 today · ~$0.10 after | balanced · 95% of SOTA | yes
Google Document AI | $100–$1,500 | Good on printed docs | no
Azure AI Document Intelligence | $50–$1,500 | Tiers by feature | no
Mistral OCR | $1,000 | Closed, fast | no
Tesseract (self-host) | $0 + ops | Dated · loses structure | yes

FAQ

Fair questions.

When does beta end?

When the pipeline is stable enough that we'd be comfortable billing someone for it. No hard date — we'd rather leave it free a month too long than charge for something that's still rough. We'll email everyone who's used the API before flipping the switch, and keep a grandfathered tier for early adopters.

Will the current free tier stay free after beta?

Yes — there will always be an anonymous free tier rate-limited by IP. We expect that limit to drop (from 100/day to maybe 25/day) when paid tiers activate, but the keyless "try it now" workflow is permanent.

Why three tiers instead of a flat subscription?

Because the cost gap between the best open VLM and the best frontier model is roughly 150×, and forcing every user onto the frontier wastes money 99% of the time. This is the CodeSOTA thesis: three tiers, you pick the trade-off, we run the rest.
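To see what that gap means for a real job, here is a rough estimator built from the target prices on this page. The prices are the approximate, non-final targets listed in the tier cards; the function name and the (low, high) range representation are choices made for this sketch.

```python
# Target prices in USD per 1,000 pages, copied from the tier cards on this page.
# Each entry is a (low, high) range; these are targets, not final prices.
TARGET_USD_PER_1K = {
    "cheap": (0.03, 0.03),
    "balanced": (0.10, 0.10),
    "sota": (1.00, 15.00),
}


def estimate_cost(pages, tier="balanced"):
    """Return the (low, high) estimated cost in USD for a number of pages."""
    low, high = TARGET_USD_PER_1K[tier]
    thousands = pages / 1000
    return (round(low * thousands, 2), round(high * thousands, 2))


if __name__ == "__main__":
    for tier in ("cheap", "balanced", "sota"):
        low, high = estimate_cost(1_000_000, tier)
        print(f"{tier}: ${low:,.2f} to ${high:,.2f} for 1M pages")
```

At the target prices, a million pages comes to about $30 on cheap, $100 on balanced, and $1,000 to $15,000 on sota. The top of the sota range is 150 times the balanced figure, which is the gap this answer refers to.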

Are the prices locked in?

No. The numbers above are targets, not commitments. We'll finalize them once we have enough throughput data from beta to price without guessing. If the final prices come in higher, we'll grandfather beta users at the target prices shown here.

Can I self-host instead?

Not as a packaged product today. The models are open-weight (GLM-OCR and PaddleOCR-VL-1.5 are on Hugging Face), so nothing stops you from running the models yourself. We're working on a published Docker image for teams that need on-prem for compliance reasons. Email hi@hardparse.com if that's you.

What about volume / enterprise?

Email hi@hardparse.com with your expected volume and any compliance constraints. Volume discounts and dedicated capacity are available today in beta for teams willing to commit.

Part of CodeSOTA
Pricing transparency comes from benchmarks.

We publish the numbers behind each model at codesota.com. If a frontier API beats the open-source tier for your task, we'll say so — and so will the leaderboard.

Browse OCR benchmarks