LLM Cost Calculator
Calculate token spend across 16 models. Prices last updated 2026-05-01. Verify against provider docs before relying on a quote.
$2.50/M input · $10.00/M output · $1.25/M cached
Input: $2.50
Output: $2.50
Cached: $0.00
Total: $5.00 ($5.00 / request)
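As a rough check, the figures above can be reproduced from the listed rates with the per-million formula. The token counts used here (1,000,000 input, 250,000 output, 0 cached, 1 request) are an assumption inferred from the displayed rates and totals, not values shown on the page, and `token_cost` is a hypothetical helper name.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a token count at a per-million-token price."""
    return tokens * price_per_million / 1_000_000

# Rates from the header line above, in dollars per million tokens.
INPUT_RATE, OUTPUT_RATE, CACHED_RATE = 2.50, 10.00, 1.25

# Assumed workload (inferred from the totals, not shown on the page).
input_tokens, output_tokens, cached_tokens, requests = 1_000_000, 250_000, 0, 1

input_cost = token_cost(input_tokens, INPUT_RATE)
output_cost = token_cost(output_tokens, OUTPUT_RATE)
cached_cost = token_cost(cached_tokens, CACHED_RATE)
total = input_cost + output_cost + cached_cost
per_request = total / requests

print(f"Input ${input_cost:.2f}  Output ${output_cost:.2f}  Cached ${cached_cost:.2f}")
print(f"Total ${total:.2f}  (${per_request:.2f} / request)")
```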
| Cheapest models for this workload | Total |
|---|---|
| Gemini 2.0 Flash-Lite (Google) | $0.1500 |
| Gemini 2.0 Flash (Google) | $0.2000 |
| GPT-4o mini (OpenAI) | $0.3000 |
| DeepSeek V3 (DeepSeek) | $0.5450 |
| Llama 3.3 70B (Meta) | $0.7875 |
| GPT-4.1 mini (OpenAI) | $0.8000 |
| o3-mini (OpenAI) | $2.20 |
| Claude Haiku 4 (Anthropic) | $2.25 |
About LLM Cost Calculator
Calculate the token spend for any combination of input, output, and cached tokens across 16 models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and xAI. The comparison table ranks the models by total cost for your specific workload and lists the eight cheapest, so you can compare providers on the numbers that apply to you.
What this tool does
- Per-model breakdown — input, output, and cached input costs shown separately.
- Per-request average — divide total cost by request count to see unit economics.
- Cheapest-first leaderboard — top 8 models ranked for your exact token mix.
- Cached input support — models with published cache rates use them; others fall back to the standard input rate.
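The cached-rate fallback and the cheapest-first ranking described above could be sketched roughly as follows. This is not the tool's actual code: `ModelPrice`, `workload_cost`, and `cheapest` are hypothetical names, the model names and prices are placeholders rather than real provider rates, and the sketch assumes cached tokens are counted separately from regular input tokens.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ModelPrice:
    name: str
    input_per_m: float                    # dollars per million input tokens
    output_per_m: float                   # dollars per million output tokens
    cached_per_m: Optional[float] = None  # None means no published cache rate

def workload_cost(p: ModelPrice, input_tokens: int, output_tokens: int,
                  cached_tokens: int) -> float:
    """Total dollars for one workload; cached tokens fall back to the input rate."""
    cached_rate = p.cached_per_m if p.cached_per_m is not None else p.input_per_m
    return (input_tokens * p.input_per_m
            + output_tokens * p.output_per_m
            + cached_tokens * cached_rate) / 1_000_000

def cheapest(models: List[ModelPrice], input_tokens: int, output_tokens: int,
             cached_tokens: int, top_n: int = 8) -> List[Tuple[str, float]]:
    """Return up to top_n (name, cost) pairs, cheapest first."""
    costs = [(m.name, workload_cost(m, input_tokens, output_tokens, cached_tokens))
             for m in models]
    return sorted(costs, key=lambda pair: pair[1])[:top_n]

# Placeholder prices for illustration only.
MODELS = [
    ModelPrice("model-a", 2.50, 10.00, 1.25),
    ModelPrice("model-b", 0.50, 1.50),         # no cache rate: uses the input rate
    ModelPrice("model-c", 3.00, 15.00, 0.30),
]

leaderboard = cheapest(MODELS, input_tokens=500_000, output_tokens=100_000,
                       cached_tokens=500_000)
for name, cost in leaderboard:
    print(f"{name}: ${cost:.4f}")
```

Because the ranking depends on the token mix, a model with a steep cache discount can move up the list as the cached share of the workload grows.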
Pipeline
- LLM Token Counter — count tokens in your prompt text, then pipe the count here.
- Prompt Template Tester — render your template with real variables, then estimate cost.
Frequently asked
- How are token costs calculated?
- Cost = (tokens × price per million) ÷ 1,000,000. Input and output tokens are priced separately because output generation is more compute-intensive. Cached input tokens (where supported) are priced at a discount — typically 50–90% off the standard input rate.
- What is prompt caching and how does it affect cost?
- Prompt caching lets you reuse a previously processed prefix (system prompt, documents, few-shot examples) across multiple requests. The provider stores the KV cache and charges a reduced rate for cache hits. Anthropic charges ~10% of the input rate for cache reads, with a surcharge on the request that writes the cache; OpenAI charges ~50% on GPT-4o-class models. If your system prompt is large and reused across many calls, caching can cut the cost of the reused portion by roughly that discount.
- How current are the prices?
- Prices are hardcoded with a last-updated date shown on the page. LLM pricing changes frequently — always verify against the provider's official pricing page before committing to a budget. The tool is designed for quick estimates, not billing-accurate quotes.
- Why does the cheapest model not always win?
- Cost is only one dimension. Cheaper models may require more tokens to produce the same quality output (more retries, longer prompts for few-shot examples), which can erase the per-token savings. The comparison table shows raw token cost; factor in quality and retry rate for a true total-cost-of-ownership comparison.
- How do I estimate tokens without running the model?
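- Use the LLM Token Counter tool on this site to get an approximate count for your prompt text. For exact counts, use the provider's own tooling: tiktoken for OpenAI, the Anthropic API's token counting endpoint, the Gemini API's countTokens method, or the model's published tokenizer for Llama (Llama 2 uses SentencePiece; Llama 3 uses a tiktoken-style BPE tokenizer).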
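Two of the answers above reduce to quick arithmetic: how much a cached prefix saves, and how retries change the comparison between a cheap and a pricier model. The sketch below is illustrative only. The function names are hypothetical, the example numbers are made up, every request is assumed to be a cache hit (cache-write surcharges and expiry are ignored), and retries are modelled as independent attempts with a fixed success rate, so the expected number of attempts is 1 / success rate.

```python
def input_cost_with_cache(prefix_tokens: int, fresh_tokens: int, requests: int,
                          input_per_m: float, cached_fraction: float) -> float:
    """Input dollars when a shared prefix is billed at a fraction of the input rate.

    cached_fraction=1.0 means no discount (no caching);
    0.5 and 0.1 correspond to the ~50% and ~10% rates mentioned above.
    """
    cached_rate = input_per_m * cached_fraction
    per_request = (prefix_tokens * cached_rate + fresh_tokens * input_per_m) / 1_000_000
    return per_request * requests

def retry_adjusted_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected dollars per acceptable result, assuming independent attempts."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate

# Made-up workload: 10,000-token shared prefix, 500 fresh tokens, 1,000 requests.
no_cache = input_cost_with_cache(10_000, 500, 1_000, 2.50, cached_fraction=1.0)
half_rate = input_cost_with_cache(10_000, 500, 1_000, 2.50, cached_fraction=0.5)
tenth_rate = input_cost_with_cache(10_000, 500, 1_000, 2.50, cached_fraction=0.1)
print(f"no cache ${no_cache:.2f}, 50% rate ${half_rate:.2f}, 10% rate ${tenth_rate:.2f}")

# Made-up comparison: a $0.30 model that succeeds 25% of the time
# versus a $0.80 model that succeeds every time.
cheap_effective = retry_adjusted_cost(0.30, success_rate=0.25)
pricier_effective = retry_adjusted_cost(0.80, success_rate=1.0)
print(f"cheap ${cheap_effective:.2f} vs pricier ${pricier_effective:.2f} per good result")
```

Under these assumptions the nominally cheaper model ends up costing more per usable result, which is the effect the comparison table cannot show on its own.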