LLM Cost Calculator — GPT, Claude, Gemini, Llama Pricing

LLM Cost Calculator

Calculate token spend across 16 models. Prices last updated 2026-05-01. Verify against provider docs before relying on a quote.

$2.50/M input · $10.00/M output · $1.25/M cached

  • Input: $2.50
  • Output: $2.50
  • Cached: $0.00
  • Total: $5.00 ($5.00 / request)

Cheapest models for this workload

Model                    Provider     Total
Gemini 2.0 Flash-Lite    Google       $0.1500
Gemini 2.0 Flash         Google       $0.2000
GPT-4o mini              OpenAI       $0.3000
DeepSeek V3              DeepSeek     $0.5450
Llama 3.3 70B            Meta         $0.7875
GPT-4.1 mini             OpenAI       $0.8000
o3-mini                  OpenAI       $2.20
Claude Haiku 4           Anthropic    $2.25

About LLM Cost Calculator

Calculate the token spend for any combination of input, output, and cached tokens across 16 models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and xAI. The comparison table ranks every model by total cost for your specific workload so you can make an informed provider choice.

What this tool does

  • Per-model breakdown — input, output, and cached input costs shown separately.
  • Per-request average — divide total cost by request count to see unit economics.
  • Cheapest-first leaderboard — top 8 models ranked for your exact token mix.
  • Cached input support — models with published cache rates use them; others fall back to the standard input rate.
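The ranking and cache-rate fallback described in the list above can be sketched roughly as follows. This is an illustration, not the tool's actual code: the model names and prices are hypothetical placeholders, and `Model`, `total_cost`, `leaderboard`, and `per_request` are names invented for this example.

```python
from typing import NamedTuple, Optional

class Model(NamedTuple):
    name: str
    input_rate: float             # USD per million input tokens
    output_rate: float            # USD per million output tokens
    cached_rate: Optional[float]  # None when no cache rate is published

def total_cost(m: Model, input_tokens: int, output_tokens: int, cached_tokens: int) -> float:
    # Fall back to the standard input rate when no cache rate exists.
    cached_rate = m.cached_rate if m.cached_rate is not None else m.input_rate
    return (
        input_tokens * m.input_rate
        + output_tokens * m.output_rate
        + cached_tokens * cached_rate
    ) / 1_000_000

def leaderboard(models, input_tokens, output_tokens, cached_tokens=0, top_n=8):
    # Cheapest first, truncated to the top N entries.
    ranked = sorted(
        (total_cost(m, input_tokens, output_tokens, cached_tokens), m.name)
        for m in models
    )
    return [(name, cost) for cost, name in ranked[:top_n]]

def per_request(total: float, request_count: int) -> float:
    if request_count < 1:
        raise ValueError("request_count must be at least 1")
    return total / request_count

# Hypothetical prices, for illustration only.
models = [
    Model("model-a", 1.00, 4.00, 0.25),
    Model("model-b", 0.50, 6.00, None),  # no cache rate: uses input rate
    Model("model-c", 3.00, 2.00, 1.50),
]
ranking = leaderboard(models, input_tokens=1_000_000, output_tokens=250_000, cached_tokens=500_000)
for name, cost in ranking:
    print(f"{name}: ${cost:.4f} total, ${per_request(cost, 100):.6f} / request")
```

Because the ranking depends on the token mix, a model with a low input rate but a high output rate can move down the list as the share of output tokens grows.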

Frequently asked

How are token costs calculated?
Cost = (tokens × price per million) ÷ 1,000,000. Input and output tokens are priced separately because output generation is more compute-intensive. Cached input tokens (where supported) are priced at a discount — typically 50–90% off the standard input rate.
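As a minimal sketch of the formula above, the snippet below reproduces the $5.00 total shown at the top of the page. The rates are the ones listed there ($2.50/M input, $10.00/M output, $1.25/M cached); the token counts (1,000,000 input, 250,000 output, 0 cached) are an assumption chosen to match the displayed breakdown, and the function names are illustrative.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000

@dataclass(frozen=True)
class Rates:
    """Prices in USD per million tokens."""
    input: float
    output: float
    cached: float

def token_cost(tokens: int, price_per_million: float) -> float:
    # Cost = (tokens x price per million) / 1,000,000
    return tokens * price_per_million / TOKENS_PER_MILLION

def workload_cost(rates: Rates, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> dict:
    parts = {
        "input": token_cost(input_tokens, rates.input),
        "output": token_cost(output_tokens, rates.output),
        "cached": token_cost(cached_tokens, rates.cached),
    }
    parts["total"] = sum(parts.values())  # summed before "total" is added
    return parts

# Rates from the page header; token counts are assumed for illustration.
example_rates = Rates(input=2.50, output=10.00, cached=1.25)
breakdown = workload_cost(example_rates, input_tokens=1_000_000, output_tokens=250_000)
print(breakdown)
```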
What is prompt caching and how does it affect cost?
Prompt caching lets you reuse a previously processed prefix (system prompt, documents, few-shot examples) across multiple requests. The provider stores the KV cache and charges a reduced rate for cache hits. Anthropic charges ~10% of the input rate for cached tokens; OpenAI charges ~50%. If your system prompt is large and reused across many calls, the cached prefix is billed at that reduced rate on every hit, so the savings grow with the share of each prompt that the prefix makes up.
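To make the effect concrete, here is a rough sketch of input-side cost with and without a cached prefix. It is a simplification under stated assumptions: the first request is a cache miss billed at the full rate, every later request is a hit, and any cache-write surcharge or cache expiry is ignored. The token counts and the $2.50/M rate are illustrative, and `input_cost` is a hypothetical helper.

```python
def input_cost(prefix_tokens: int, fresh_tokens: int, requests: int,
               rate_per_million: float, cached_multiplier: float = 1.0) -> float:
    """Input-side cost in USD for `requests` calls sharing one prefix.

    cached_multiplier is the fraction of the input rate charged on a
    cache hit (1.0 means no caching discount).
    """
    if requests <= 0:
        return 0.0
    miss = prefix_tokens + fresh_tokens                     # first call: full price
    hit = prefix_tokens * cached_multiplier + fresh_tokens  # later calls: discounted prefix
    billable = miss + (requests - 1) * hit
    return billable * rate_per_million / 1_000_000

# Assumed workload: 10,000-token shared prefix, 500 fresh tokens, 1,000 calls.
no_cache = input_cost(10_000, 500, 1_000, rate_per_million=2.50)
half_rate = input_cost(10_000, 500, 1_000, rate_per_million=2.50, cached_multiplier=0.5)
tenth_rate = input_cost(10_000, 500, 1_000, rate_per_million=2.50, cached_multiplier=0.1)
print(f"no cache: ${no_cache:.4f}, ~50% rate: ${half_rate:.4f}, ~10% rate: ${tenth_rate:.4f}")
```

Output tokens are unaffected by caching, so the overall saving on a request is smaller than the input-side saving whenever output makes up a large part of the bill.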
How current are the prices?
Prices are hardcoded with a last-updated date shown on the page. LLM pricing changes frequently — always verify against the provider's official pricing page before committing to a budget. The tool is designed for quick estimates, not billing-accurate quotes.
Why does the cheapest model not always win?
Cost is only one dimension. Cheaper models may require more tokens to produce the same quality output (more retries, longer prompts for few-shot examples), which can erase the per-token savings. The comparison table shows raw token cost; factor in quality and retry rate for a true total-cost-of-ownership comparison.
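One rough way to fold retry rate into the comparison is to divide raw cost by the probability that an attempt yields an acceptable result. The sketch below assumes attempts are independent, so the expected number of attempts is 1 / success rate. The raw costs are the totals for two leaderboard rows on this page; the success rates are hypothetical numbers chosen only to show how the ranking can flip, and `cost_per_success` is an illustrative name.

```python
def cost_per_success(raw_cost: float, success_rate: float, token_overhead: float = 1.0) -> float:
    """Expected cost of one acceptable output.

    token_overhead scales the raw cost for longer prompts
    (for example, extra few-shot examples); 1.0 means none.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1.0 / success_rate  # mean of a geometric distribution
    return raw_cost * token_overhead * expected_attempts

# Raw costs from the leaderboard; success rates are hypothetical.
cheap = cost_per_success(0.15, success_rate=0.40)
pricier = cost_per_success(0.30, success_rate=0.90)
print(f"cheap model: ${cheap:.4f} per success, pricier model: ${pricier:.4f} per success")
```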
How do I estimate tokens without running the model?
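Use the LLM Token Counter tool on this site to get an approximate count for your prompt text. For exact counts, use the provider's tokenizer: tiktoken for OpenAI, the token counting endpoint in the Anthropic API (available through the SDK), or sentencepiece for Gemini and older Llama models. Llama 3 and later ship their own BPE tokenizer with the model files, so load that tokenizer instead of sentencepiece.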
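A minimal sketch of both approaches for OpenAI-style text follows. The heuristic of roughly four characters per token is a common rule of thumb for English and is only an approximation; the exact path uses the third-party tiktoken package and is imported lazily so the heuristic runs without it. The function names and the default model string are assumptions for illustration.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / chars_per_token)

def count_tokens_openai(text: str, model: str = "gpt-4o") -> int:
    """Exact count for OpenAI models; requires `pip install tiktoken`."""
    import tiktoken  # lazy import: only needed for exact counts
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt = "Summarize the following support ticket in two sentences."
print(estimate_tokens(prompt))
```

The estimate drifts for code, non-English text, and long numbers, so treat it as a budgeting aid and switch to the exact tokenizer before committing to a quote.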