LLM API Cost Calculator

Estimate monthly API cost for any LLM workload across Anthropic and OpenAI models. Includes prompt-cache and batch-API math so you see what you actually pay — not just the headline rate.

Domain: Programming | Version: v1.0.0 | Added: 2026-05-17

Enter your tokens per call, calls per month, and the fraction of your input that is cacheable. The calculator shows the monthly bill, the per-call cost, and how much prompt caching and the Batch API save compared with the naive rate.

Inputs

Model

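Provider and model identifier, for example anthropic/claude-sonnet-4.6. Rates per million tokens (Mtok) are built in; check pricing_as_of in the result to see how fresh they are.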

Input Tokens / Call (tok)

Average prompt size: system + user + retrieved context combined.

Output Tokens / Call (tok)

Calls / Month

Cached Input Fraction

Share of input tokens served from the prompt cache (0 = no cache, 1 = fully cached). Anthropic and GPT-5.x models get roughly 90% off cache reads; GPT-4o-class models get 50% off.

Use Batch API

Both providers offer 50% off the input and output rates for asynchronous batch jobs.

Result


POST /v1/programming-dev/llm-api-cost-calculator

curl -X POST https://toolsamurai.com/api/v1/programming-dev/llm-api-cost-calculator \
  -H "Authorization: Bearer sk_live_•••••••••••••••" \
  -H "Content-Type: application/json" \
  -d '{
     "model": "anthropic/claude-sonnet-4.6",
     "input_tokens_per_call": 2000,
     "output_tokens_per_call": 500,
     "calls_per_month": 100000,
     "cached_input_fraction": 0,
     "use_batch_api": false
  }'
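For callers who prefer Python to curl, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: the helper name build_request and the placeholder key are hypothetical, and the response schema is not documented on this page, so the example only builds the request and leaves sending it to the caller.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://toolsamurai.com/api/v1/programming-dev/llm-api-cost-calculator"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper for illustration; not part of any published SDK.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "anthropic/claude-sonnet-4.6",
    "input_tokens_per_call": 2000,
    "output_tokens_per_call": 500,
    "calls_per_month": 100000,
    "cached_input_fraction": 0,
    "use_batch_api": False,
}

# "sk_live_your_key_here" is a placeholder; substitute a real key.
request = build_request("sk_live_your_key_here", payload)

# To send it: urllib.request.urlopen(request). The response body is assumed
# to be JSON, which this page does not confirm.
```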

Tags: llm, openai, anthropic, cost, api-pricing, tokens, prompt-cache, batch-api, gpt, claude

How it works

The method behind the numbers

LLM pricing is quoted per million tokens and billed separately for input and output, with output usually several times more expensive than input. Monthly cost is (input tokens per call × input rate + output tokens per call × output rate) × calls per month ÷ 1,000,000, where both rates are per million tokens.

Two optimisations change the real number. Prompt caching bills repeated input (a long system prompt, a fixed context) at a steep discount on cache reads — so the cached fraction of your input is charged at the cache-read rate instead of the full input rate. The Batch API applies a flat discount (commonly 50%) to both input and output for asynchronous workloads. The tool computes a naive baseline with neither optimisation, then shows the cache savings and batch savings separately so you can see which lever matters for your workload.
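The method described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's actual implementation: the names Rates, cost_per_call and monthly_cost are hypothetical, the rates are placeholders supplied by the caller, the batch discount is assumed to stack multiplicatively with the cache-read discount, and any premium a provider charges for writing to the cache is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token (Mtok) rates supplied by the caller."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_discount: float = 0.9  # roughly 90% off cache reads (assumed)
    batch_discount: float = 0.5       # 50% off both rates for batch jobs


def cost_per_call(rates, input_tokens, output_tokens,
                  cached_input_fraction=0.0, use_batch_api=False):
    """Cost of one call, with the cached share billed at the cache-read rate."""
    if not 0.0 <= cached_input_fraction <= 1.0:
        raise ValueError("cached_input_fraction must be between 0 and 1")
    cached = input_tokens * cached_input_fraction
    uncached = input_tokens - cached
    cache_read_rate = rates.input_per_mtok * (1.0 - rates.cache_read_discount)
    cost = (uncached * rates.input_per_mtok
            + cached * cache_read_rate
            + output_tokens * rates.output_per_mtok) / 1_000_000
    if use_batch_api:
        # Assumption: the batch discount applies on top of caching.
        cost *= 1.0 - rates.batch_discount
    return cost


def monthly_cost(rates, input_tokens, output_tokens, calls_per_month, **options):
    """Per-call cost scaled by the number of calls in a month."""
    return cost_per_call(rates, input_tokens, output_tokens, **options) * calls_per_month


# Placeholder rates for illustration only, not the calculator's built-in prices.
example = Rates(input_per_mtok=3.00, output_per_mtok=15.00)
naive = monthly_cost(example, 2000, 500, 100_000)
optimised = monthly_cost(example, 2000, 500, 100_000,
                         cached_input_fraction=0.7, use_batch_api=True)
print(f"naive: {naive:.2f}  optimised: {optimised:.2f}")
```

Keeping the rates in a separate object makes it easy to swap in current prices from the providers' pricing pages without touching the arithmetic.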

Worked examples

See it in practice

A chat feature on Claude Sonnet

2k input / 500 output tokens, 100k calls a month, no caching yet.

model: anthropic/claude-sonnet-4.6
input_tokens_per_call: 2000
output_tokens_per_call: 500
calls_per_month: 100000
cached_input_fraction: 0
use_batch_api: false
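To make the arithmetic for this example concrete, here is a short sketch that works it through by hand. The two rates are placeholder values chosen for illustration; they are an assumption, not the prices the calculator has built in for this model, so the totals will differ from the tool's result if its rates differ.

```python
# Placeholder per-Mtok rates (assumed for illustration only).
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

input_tokens_per_call = 2000
output_tokens_per_call = 500
calls_per_month = 100_000

# Monthly token volume, expressed in millions of tokens (Mtok).
input_mtok = input_tokens_per_call * calls_per_month / 1_000_000    # 200 Mtok
output_mtok = output_tokens_per_call * calls_per_month / 1_000_000  # 50 Mtok

input_cost = input_mtok * INPUT_RATE
output_cost = output_mtok * OUTPUT_RATE
monthly_total = input_cost + output_cost
per_call = monthly_total / calls_per_month

print(f"input:    {input_cost:.2f}")     # input:    600.00
print(f"output:   {output_cost:.2f}")    # output:   750.00
print(f"monthly:  {monthly_total:.2f}")  # monthly:  1350.00
print(f"per call: {per_call:.4f}")       # per call: 0.0135
```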

The same workload with caching + batch

70% of input cached and run through the Batch API.

model: anthropic/claude-sonnet-4.6
input_tokens_per_call: 2000
output_tokens_per_call: 500
calls_per_month: 100000
cached_input_fraction: 0.7
use_batch_api: true
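The sketch below compares this configuration with the naive baseline and with each optimisation on its own. It rests on several assumptions: the per-Mtok rates are placeholders rather than the calculator's built-in prices, cache reads are billed at 10% of the input rate, the batch discount is 50% and stacks with caching, and cache-write charges are ignored. How the tool itself attributes savings to each lever is not specified here, so the side-by-side comparison is one possible reading.

```python
# Placeholder rates and discounts (assumptions for illustration only).
INPUT_RATE = 3.00          # per Mtok
OUTPUT_RATE = 15.00        # per Mtok
CACHE_READ_DISCOUNT = 0.9  # cache reads billed at 10% of the input rate
BATCH_DISCOUNT = 0.5       # 50% off input and output

INPUT_MTOK = 2000 * 100_000 / 1_000_000   # 200 Mtok of input per month
OUTPUT_MTOK = 500 * 100_000 / 1_000_000   # 50 Mtok of output per month


def monthly(cached_fraction: float, use_batch: bool) -> float:
    """Monthly cost for the workload above under the stated assumptions."""
    cache_read_rate = INPUT_RATE * (1 - CACHE_READ_DISCOUNT)
    blended_input_rate = ((1 - cached_fraction) * INPUT_RATE
                          + cached_fraction * cache_read_rate)
    total = INPUT_MTOK * blended_input_rate + OUTPUT_MTOK * OUTPUT_RATE
    return total * (1 - BATCH_DISCOUNT) if use_batch else total


scenarios = {
    "naive": monthly(0.0, False),
    "cache only": monthly(0.7, False),
    "batch only": monthly(0.0, True),
    "cache + batch": monthly(0.7, True),
}

for name, cost in scenarios.items():
    saved = scenarios["naive"] - cost
    print(f"{name:<14} {cost:>8.2f}  (saves {saved:.2f})")
```

With these placeholder rates the four scenarios come to 1350.00, 972.00, 675.00 and 486.00 per month. Because output dominates the bill for this workload, the batch discount saves more than caching 70% of the input does.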

FAQ

Frequently asked questions

Why is output so much more expensive than input?

Generating tokens is more compute-intensive than reading them, so providers price output well above input — often 3–5×. That's why workloads that emit long responses cost more than their input size suggests; trimming output length is frequently the biggest saving.
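A quick way to see this effect is to compare output's share of the tokens with its share of the cost. The sketch below uses placeholder rates with a 5× output-to-input ratio; both rates are assumptions for illustration, and the helper saving_from_trimming is hypothetical.

```python
# Placeholder per-Mtok rates (assumed; output priced at 5x input).
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

input_tokens, output_tokens, calls = 2000, 500, 100_000

input_cost = input_tokens * calls / 1_000_000 * INPUT_RATE
output_cost = output_tokens * calls / 1_000_000 * OUTPUT_RATE

output_token_share = output_tokens / (input_tokens + output_tokens)
output_cost_share = output_cost / (input_cost + output_cost)
print(f"output is {output_token_share:.0%} of tokens "
      f"but {output_cost_share:.0%} of cost")


def saving_from_trimming(tokens_removed: int, rate_per_mtok: float) -> float:
    """Monthly saving from removing tokens from every call at a given rate."""
    return tokens_removed * calls / 1_000_000 * rate_per_mtok


trim_output = saving_from_trimming(100, OUTPUT_RATE)  # 100 fewer output tokens
trim_input = saving_from_trimming(100, INPUT_RATE)    # 100 fewer input tokens
print(f"trim 100 output tokens: {trim_output:.2f} saved per month")
print(f"trim 100 input tokens:  {trim_input:.2f} saved per month")
```

Under these assumed rates, removing 100 output tokens per call saves five times as much as removing 100 input tokens.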

How much does prompt caching save?

It depends on how much of your input repeats. A long, fixed system prompt or context reused across calls can be billed at a fraction of the input rate on cache reads. Set the cached-input fraction to match the share of input that's identical call-to-call to see the effect.
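One way to estimate the cached-input fraction is to divide the tokens that repeat on every call by the total input tokens. The sketch below is a rough estimate under assumptions: the function name and its cache_hit_rate parameter are hypothetical, and real hit rates depend on provider-specific cache behaviour that this page does not describe.

```python
def cached_input_fraction(fixed_prefix_tokens: int,
                          variable_tokens: int,
                          cache_hit_rate: float = 1.0) -> float:
    """Estimate the share of input tokens served from the prompt cache.

    fixed_prefix_tokens: tokens identical on every call (system prompt,
        fixed context).
    variable_tokens: tokens that change per call (user message, retrieved
        context).
    cache_hit_rate: assumed share of calls that actually hit the cache.
    """
    total = fixed_prefix_tokens + variable_tokens
    if total <= 0:
        raise ValueError("total input tokens must be positive")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * fixed_prefix_tokens / total


# A 1,400-token fixed system prompt plus 600 tokens that vary per call.
always_hit = cached_input_fraction(1400, 600)
partial_hit = cached_input_fraction(1400, 600, cache_hit_rate=0.8)
print(f"every call hits the cache: {always_hit:.2f}")
print(f"80% of calls hit:          {partial_hit:.2f}")
```

The result is the value to enter as the cached input fraction; a lower hit rate is worth modelling when traffic is sparse enough that cached prefixes may expire between calls.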

When can I use the Batch API?

When responses don't need to be real-time — bulk classification, evals, offline generation. It typically halves both input and output cost in exchange for asynchronous processing. It won't help an interactive chat feature that needs an immediate reply.

Are the prices current?

Rates are baked in from public provider pricing as of the date shown in the result (pricing_as_of). Providers change prices and release models, so confirm against the official pricing pages before committing a budget.
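If you consume the result programmatically, a small check on pricing_as_of can flag rates that are due for a manual review. This sketch assumes the field is an ISO date string (YYYY-MM-DD), which this page does not confirm; the function names, the 30-day threshold and the sample dates are all hypothetical.

```python
from datetime import date


def pricing_age_days(pricing_as_of: str, today: date) -> int:
    """Days elapsed since the pricing snapshot (assumes an ISO date string)."""
    return (today - date.fromisoformat(pricing_as_of)).days


def is_stale(pricing_as_of: str, today: date, max_age_days: int = 30) -> bool:
    """True when the snapshot is older than the chosen threshold."""
    return pricing_age_days(pricing_as_of, today) > max_age_days


# Made-up sample values for illustration.
age = pricing_age_days("2026-05-01", date(2026, 6, 15))
print(f"pricing is {age} days old; stale: "
      f"{is_stale('2026-05-01', date(2026, 6, 15))}")
```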

