LLM API Cost Calculator
Estimate what an LLM feature will actually cost per month across Anthropic Claude and OpenAI GPT models — not just the headline per-token rate. Enter your tokens per call, calls per month, and how much of your input is cacheable, and the calculator shows the monthly bill, the per-call cost, and how much prompt caching and the Batch API save versus the naive rate.
```shell
curl -X POST https://toolsamurai.com/api/v1/programming-dev/llm-api-cost-calculator \
  -H "Authorization: Bearer sk_live_•••••••••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.6",
    "input_tokens_per_call": 2000,
    "output_tokens_per_call": 500,
    "calls_per_month": 100000,
    "cached_input_fraction": 0,
    "use_batch_api": false
  }'
```
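The same request can be built in Python using only the standard library. This is a minimal sketch. It reuses the endpoint and JSON fields from the curl call, but reading the key from a `TOOLSAMURAI_API_KEY` environment variable is an assumption (a hypothetical name). The response format is not covered here.

```python
import json
import os
import urllib.request

ENDPOINT = "https://toolsamurai.com/api/v1/programming-dev/llm-api-cost-calculator"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Same inputs as the curl example.
payload = {
    "model": "anthropic/claude-sonnet-4.6",
    "input_tokens_per_call": 2000,
    "output_tokens_per_call": 500,
    "calls_per_month": 100000,
    "cached_input_fraction": 0,
    "use_batch_api": False,
}

# TOOLSAMURAI_API_KEY is a hypothetical variable name; store the key however you prefer.
req = build_request(payload, os.environ.get("TOOLSAMURAI_API_KEY", "sk_live_placeholder"))
```

To send it, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`.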
The method behind the numbers
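LLM pricing is quoted per million tokens and billed separately for input and output, with output usually several times more expensive than input. Monthly cost is calls per month × (input tokens per call × input rate + output tokens per call × output rate) ÷ 1,000,000, because the rates are per million tokens. Dividing the monthly cost by calls per month gives the per-call cost.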
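The naive formula can be sketched in Python as follows. The rates are passed in as parameters. The $3 / $15 per million tokens in the usage line are placeholders for illustration only, not quoted prices for any model.

```python
def naive_monthly_cost(
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    calls_per_month: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> tuple[float, float]:
    """Return (monthly_cost, per_call_cost) with no caching or batching.

    Rates are dollars per million tokens, so token totals are divided
    by 1,000,000 before the rates are applied.
    """
    input_cost = calls_per_month * input_tokens_per_call * input_rate_per_mtok / 1_000_000
    output_cost = calls_per_month * output_tokens_per_call * output_rate_per_mtok / 1_000_000
    monthly = input_cost + output_cost
    return monthly, monthly / calls_per_month


# Placeholder rates ($3 in / $15 out per MTok); look up the real ones for your model.
monthly, per_call = naive_monthly_cost(2000, 500, 100_000, 3.0, 15.0)
print(monthly, per_call)  # 1350.0 0.0135
```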
Two optimisations change the real number. Prompt caching bills repeated input (a long system prompt, a fixed context) at a steep discount on cache reads — so the cached fraction of your input is charged at the cache-read rate instead of the full input rate. The Batch API applies a flat discount (commonly 50%) to both input and output for asynchronous workloads. The tool computes a naive baseline with neither optimisation, then shows the cache savings and batch savings separately so you can see which lever matters for your workload.
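The following sketch decomposes the cost the same way: a naive baseline, then cache savings, then batch savings. It rests on several assumptions. Cache-read, input and output rates are supplied by the caller. Only cache reads are modelled, so any cache-write premium a provider charges is ignored. The batch discount defaults to the common 50% and is applied on top of the cached price. The placeholder rates in the usage lines are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    naive: float          # no caching, no batching
    with_cache: float     # caching applied, no batching
    final: float          # caching applied, plus batching if enabled
    cache_savings: float
    batch_savings: float


def monthly_cost_breakdown(
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    calls_per_month: int,
    cached_input_fraction: float,
    use_batch_api: bool,
    input_rate: float,            # $ per million uncached input tokens
    output_rate: float,           # $ per million output tokens
    cache_read_rate: float,       # $ per million cached input tokens
    batch_discount: float = 0.5,  # assumed: the common 50% batch discount
) -> CostBreakdown:
    if not 0.0 <= cached_input_fraction <= 1.0:
        raise ValueError("cached_input_fraction must be between 0 and 1")

    input_mtok = calls_per_month * input_tokens_per_call / 1_000_000
    output_mtok = calls_per_month * output_tokens_per_call / 1_000_000

    naive = input_mtok * input_rate + output_mtok * output_rate

    # Cached share of input billed at the cache-read rate, the rest at the
    # full input rate. Cache-write charges are not modelled.
    cached_mtok = input_mtok * cached_input_fraction
    uncached_mtok = input_mtok - cached_mtok
    with_cache = (
        uncached_mtok * input_rate
        + cached_mtok * cache_read_rate
        + output_mtok * output_rate
    )

    # Assumption: the batch discount applies to everything, cache reads included.
    final = with_cache * (1 - batch_discount) if use_batch_api else with_cache

    return CostBreakdown(
        naive=naive,
        with_cache=with_cache,
        final=final,
        cache_savings=naive - with_cache,
        batch_savings=with_cache - final,
    )


# Placeholder rates: $3 input, $15 output, $0.30 cache read per MTok.
b = monthly_cost_breakdown(2000, 500, 100_000, 0.7, True,
                           input_rate=3.0, output_rate=15.0, cache_read_rate=0.30)
print(round(b.naive, 2), round(b.cache_savings, 2), round(b.batch_savings, 2))  # 1350.0 378.0 486.0
```

Because this sketch applies batching after caching, the batch saving is measured against the already-cached price. Applying the two in the other order gives the same final bill but splits the total saving differently between the two levers.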
See it in practice
2,000 input / 500 output tokens per call, 100,000 calls a month, no caching or batching.
- model: anthropic/claude-sonnet-4.6
- input_tokens_per_call: 2000
- output_tokens_per_call: 500
- calls_per_month: 100000
- cached_input_fraction: 0
- use_batch_api: false
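Before any rates are applied, the first example works out to these monthly token volumes. The per-million-token rates multiply these volumes:

$$
\begin{aligned}
\text{input} &= 100{,}000 \times 2{,}000 = 2 \times 10^{8}\ \text{tokens} = 200\ \text{MTok} \\
\text{output} &= 100{,}000 \times 500 = 5 \times 10^{7}\ \text{tokens} = 50\ \text{MTok}
\end{aligned}
$$

The naive monthly cost is therefore 200 × the input rate plus 50 × the output rate, with both rates in dollars per million tokens.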
The same workload with 70% of input cached and run through the Batch API.
- model: anthropic/claude-sonnet-4.6
- input_tokens_per_call: 2000
- output_tokens_per_call: 500
- calls_per_month: 100000
- cached_input_fraction: 0.7
- use_batch_api: true
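In the second example the 200 MTok of monthly input splits by the cached fraction, and the Batch API then scales the bill. The symbols used here are introduced for this illustration only: $r_{\text{in}}$, $r_{\text{read}}$ and $r_{\text{out}}$ are the input, cache-read and output rates per million tokens, and $d$ is the batch discount (commonly 0.5). The calculation assumes the batch discount also applies to cache reads:

$$
\begin{aligned}
\text{cached input} &= 0.7 \times 200 = 140\ \text{MTok} \\
\text{uncached input} &= 0.3 \times 200 = 60\ \text{MTok} \\
C_{\text{month}} &= (1 - d)\,\bigl(60\,r_{\text{in}} + 140\,r_{\text{read}} + 50\,r_{\text{out}}\bigr)
\end{aligned}
$$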
Frequently asked questions
Why is output so much more expensive than input?
Generating tokens is more compute-intensive than reading them, so providers price output well above input — often 3–5×. That's why workloads that emit long responses cost more than their input size suggests; trimming output length is frequently the biggest saving.
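To see why, take the 2,000-in / 500-out example workload, ignore caching and batching, and let $k$ be the ratio of the output rate to the input rate. Output is only 20% of the tokens, but across the 3–5× range it is a much larger share of the bill:

$$
\text{output share} = \frac{T_{\text{out}}\,k}{T_{\text{in}} + T_{\text{out}}\,k},
\qquad
k = 3:\ \frac{1500}{3500} \approx 42.9\%,
\qquad
k = 5:\ \frac{2500}{4500} \approx 55.6\%
$$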
How much does prompt caching save?
It depends on how much of your input repeats. A long, fixed system prompt or context reused across calls can be billed at a fraction of the input rate on cache reads. Set the cached-input fraction to match the share of input that's identical call-to-call to see the effect.
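One way to pick the value assumes prefix-based caching, where only the identical leading part of the prompt can be reused. Under that assumption, divide the tokens in the shared prefix by the input tokens per call. The cached share then blends into an effective input rate. Here $r_{\text{in}}$ and $r_{\text{read}}$ are the full input and cache-read rates per million tokens, introduced for illustration:

$$
f = \frac{T_{\text{prefix}}}{T_{\text{in}}},
\qquad
r_{\text{eff}} = (1 - f)\,r_{\text{in}} + f\,r_{\text{read}}
$$

For example, a fixed 1,400-token system prompt inside a 2,000-token input gives $f = 0.7$, the value used in the second example.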
When can I use the Batch API?
When responses don't need to be real-time — bulk classification, evals, offline generation. It typically halves both input and output cost in exchange for asynchronous processing. It won't help an interactive chat feature that needs an immediate reply.
Are the prices current?
Rates are baked in from public provider pricing as of the date shown in the result (pricing_as_of). Providers change prices and release models, so confirm against the official pricing pages before committing a budget.
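Because rates drift, it can help to flag a result whose `pricing_as_of` date is old before it goes into a budget. This is a minimal sketch with two assumptions. First, `pricing_as_of` is assumed to arrive as an ISO `YYYY-MM-DD` string in the JSON result, since the exact format isn't documented here. Second, the 90-day threshold is a hypothetical choice.

```python
from datetime import date

MAX_AGE_DAYS = 90  # hypothetical threshold; pick what suits your budgeting cycle


def pricing_is_stale(result: dict, today: date, max_age_days: int = MAX_AGE_DAYS) -> bool:
    """Return True if result["pricing_as_of"] is more than max_age_days old.

    Assumes pricing_as_of is an ISO 8601 date string such as "2025-01-31".
    """
    as_of = date.fromisoformat(result["pricing_as_of"])
    return (today - as_of).days > max_age_days


# Hypothetical result; only the pricing_as_of field is used here.
result = {"pricing_as_of": "2025-01-31"}
if pricing_is_stale(result, date.today()):
    print(f"Pricing is older than {MAX_AGE_DAYS} days; re-check the provider pricing pages.")
```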