LLM Token Counter — GPT, Claude, Gemini, Llama

LLM Token Counter

Approximate token counts across the four major tokenizer families. All counts are estimates — use the provider's tokenizer for exact billing.

Family | Chars | Words | Tokens (approx)
OpenAI (GPT-4 / o1 / o3): cl100k_base (GPT-4) or o200k_base (o1 / o3) BPE family, English prose ≈ 4 chars/token | 0 | 0 | 0
Anthropic (Claude 4 family): Anthropic tokenizer, slightly tighter than GPT | 0 | 0 | 0
Google (Gemini 1.5 / 2.0): SentencePiece, close to GPT baseline | 0 | 0 | 0
Meta (Llama 3 / 3.3): Tiktoken-derived BPE with code-friendly merges | 0 | 0 | 0
Approximate only. No browser-side library reproduces every provider's exact tokenizer. Numbers here are character-ratio heuristics adjusted for multi-byte text. They're typically within 10% of the real count for English prose, but can drift more on code or CJK input.

About LLM Token Counter

Approximate token counts for the four major tokenizer families — OpenAI (GPT-4/o-series), Anthropic (Claude 4), Google (Gemini 2.0), and Meta (Llama 3.3) — without shipping a multi-megabyte WASM tokenizer. Useful for quick budget checks and prompt sizing before you hit the API.

Accuracy note

Counts are character-ratio heuristics with a multi-byte adjustment. They are typically within 10% for English prose, but can drift more on code-heavy or CJK input. The table labels every count as approximate — never use these numbers for billing estimates without verifying against the provider's tokenizer.
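
As a rough illustration of what a character-ratio heuristic with a multi-byte adjustment might look like, here is a minimal sketch. It is not this tool's actual code: the function name `estimate_tokens`, the default of 4 characters per token, and the rule that each non-ASCII character costs about one token are all assumptions chosen for illustration.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from character counts alone (hypothetical helper)."""
    if not text:
        return 0
    # ASCII characters are divided by the assumed chars-per-token ratio.
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    # Assumption: every multi-byte character (CJK, emoji, accented letters)
    # costs roughly one token. Real tokenizers vary.
    multibyte_chars = len(text) - ascii_chars
    estimate = ascii_chars / chars_per_token + multibyte_chars
    # Any non-empty input is at least one token.
    return max(1, math.ceil(estimate))


print(estimate_tokens("Hello, world!"))  # 13 ASCII chars at 4 chars/token
print(estimate_tokens("日本語"))          # 3 multi-byte chars
```

Passing a smaller `chars_per_token` (for example 2.5 for code-heavy input) raises the estimate accordingly.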

Frequently asked

Why are the counts approximate?
Each provider uses a different tokenizer. OpenAI uses BPE encodings (cl100k_base for GPT-4, o200k_base for GPT-4o and the o-series); Anthropic uses a proprietary tokenizer; Google uses SentencePiece; Meta uses a tiktoken-derived BPE. Shipping all four as WASM bundles would add ~4 MB to the page. Instead, this tool uses character-ratio heuristics (adjusted for multi-byte text) that are typically within 10% for English prose. For exact counts, use the provider's SDK.
Why does code or CJK text have more tokens per character?
BPE tokenizers learn merges from training data. Common English words become single tokens; rare words, code symbols, and CJK characters are split into smaller pieces. A Python function with many operators and identifiers can tokenize at 2–3 chars/token instead of the ~4 chars/token baseline for English prose.
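
To make the merge idea concrete, here is a toy character-level BPE sketch. It is a deliberate simplification, not any provider's tokenizer: production tokenizers work on bytes, pre-split the text, and learn tens of thousands of merges. The function names and the tiny corpus are hypothetical.

```python
from collections import Counter


def apply_merge(symbols: tuple, pair: tuple) -> tuple:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_merges(corpus: list, num_merges: int) -> list:
    """Learn up to `num_merges` merges by repeatedly fusing the most frequent pair."""
    # Each distinct word starts as a tuple of single characters.
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        updated = Counter()
        for symbols, freq in words.items():
            updated[apply_merge(symbols, best)] += freq
        words = updated
    return merges


def tokenize(word: str, merges: list) -> list:
    """Split a word into characters, then apply the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Toy corpus: "the" is common, so its characters end up merged together.
merges = learn_merges(["the"] * 10 + ["then"] * 2, num_merges=2)
print(tokenize("the", merges))   # the frequent word collapses to one token
print(tokenize("x+=1", merges))  # unseen symbols stay one character each
```

In this toy setup the frequent word costs one token for three characters, while the operator-heavy string costs one token per character, which is the same effect the answer above describes at a much larger scale.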
What is the context window and why does it matter?
The context window is the maximum number of tokens a model can process in a single call, input and output combined. If your prompt plus the expected output exceeds the window, the request may be rejected with an error, the input may be truncated, or the output may be cut short, depending on the provider and client. Always leave headroom for the output.
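
A budget check along these lines can be sketched as simple arithmetic. The helper name `output_headroom`, the 10% padding on the estimated prompt size, and the 128,000-token window in the usage lines are illustrative assumptions; check your model's documented limits.

```python
def output_headroom(prompt_tokens: int, max_output_tokens: int,
                    context_window: int, margin_percent: int = 10) -> int:
    """Tokens left over after the padded prompt and the reserved output.

    A negative result means the request is unlikely to fit.
    """
    # Pad the prompt estimate (rounded up) because heuristic counts can run low.
    padding = (prompt_tokens * margin_percent + 99) // 100
    padded_prompt = prompt_tokens + padding
    return context_window - max_output_tokens - padded_prompt


# Hypothetical numbers: a 128,000-token window with 4,096 tokens reserved for output.
print(output_headroom(100_000, 4_096, 128_000))  # positive: fits with room to spare
print(output_headroom(125_000, 4_096, 128_000))  # negative: trim the prompt first
```

Integer arithmetic is used for the padding so the result does not depend on floating-point rounding.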
How do I get exact token counts?
For OpenAI: use the tiktoken Python library or the JS port. For Anthropic: call the token counting API endpoint (beta). For Gemini: use the countTokens method in the Google AI SDK. For Llama: use the tokenizer.json from the model repository with the Hugging Face tokenizers library.
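
For the OpenAI case, a minimal sketch of an exact count with tiktoken, plus a helper for comparing a heuristic estimate against it, might look like this. It assumes `tiktoken` is installed (`pip install tiktoken`) and that it can load the encoding file on first use; `count_openai_tokens` and `relative_error` are hypothetical helper names, and the default encoding should be changed to match your model.

```python
def count_openai_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Exact token count for a named tiktoken encoding."""
    # Imported lazily so relative_error() below works without tiktoken installed.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


def relative_error(estimate: int, exact: int) -> float:
    """Fractional difference between a heuristic estimate and the exact count."""
    if exact == 0:
        return 0.0 if estimate == 0 else float("inf")
    return abs(estimate - exact) / exact
```

For example, `relative_error(estimate, count_openai_tokens(text))` gives a quick check of whether an estimate falls inside the roughly 10% band quoted for English prose.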