Token Counter — Estimate LLM Tokens for GPT, Claude, Gemini & Llama
When you send a message to an AI model like GPT-4, Claude, or Gemini, your text is first broken into small units called tokens before the model processes it. Tokens are not words — they are subword fragments produced by an algorithm called Byte-Pair Encoding (BPE). The number of tokens in your prompt directly determines what you pay, how much context the model can hold, and whether you hit rate limits. Because each model uses its own tokenizer, a 1,000-word document can produce different token counts in GPT-4 versus Gemini. This tool gives you a fast, model-aware estimate without requiring you to install any libraries. Paste your text below, select your model family, and see the count and cost table instantly.
What Is a Token?
A token is the atomic unit of text that a large language model reads and writes. Modern LLMs do not process characters or whole words — they use a compression scheme called Byte-Pair Encoding (BPE) to split text into frequently occurring subword chunks. Common short words like "the", "is", "a" are each one token. Longer or rarer words are split: "tokenization" might become ["token", "ization"] (2 tokens), and an unusual proper noun might split into 3 or 4 fragments.
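To make the merge idea concrete, here is a toy sketch of BPE in Python. It is an illustration only, not the tokenizer any production model ships: real tokenizers work on bytes, use vocabularies of tens of thousands of entries, and apply extra pre-splitting rules. The function names (`learn_merges`, `merge_pair`, `tokenize`) are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` into a single symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Start with every word split into single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word, merges):
    """Apply learned merges, in order, to split a word into subword tokens."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Frequent character sequences end up as single tokens, while rare words stay split into several fragments, which is the behavior described above.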
Punctuation typically adds tokens too. A period, comma, or colon is usually a separate token from the word it follows. This is why code and JSON — which are dense with brackets, colons, and quotation marks — produce more tokens per character than plain English prose.
Every model has a context window — a maximum number of tokens it can process in one request. GPT-4o supports 128K tokens. Claude 3 Opus supports 200K. When your prompt plus the expected response approaches this limit, you need to truncate or summarize the input. Knowing your token count upfront prevents surprise context-length errors.
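A pre-flight check along these lines can catch oversized requests before they are sent. This is a minimal sketch: the dictionary keys are hypothetical labels, and "128K" and "200K" are assumed to mean 128,000 and 200,000 tokens.

```python
# Context window sizes quoted above; treating "K" as 1,000 is an assumption.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3-opus": 200_000,
}


def fits_context(prompt_tokens, max_output_tokens, model):
    """Return True if prompt plus expected response fits in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```

If the check fails, truncate or summarize the input before sending the request.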
How Token Counts Differ by Model
Different models use different tokenizers trained on different vocabularies. Most major LLMs use BPE or a closely related subword scheme, but the specific vocabulary size and training data vary. This produces measurable differences in token efficiency:
- GPT-4 / GPT-3.5 (OpenAI cl100k_base tokenizer): The baseline. OpenAI's public guidance is roughly 4 characters per token for English. You can verify exact counts using the open-source tiktoken library.
- Claude (Anthropic): Very similar to GPT-4. In practice, the difference is usually under 2%. Anthropic provides a token-counting API endpoint for exact counts.
- Gemini (Google): Generally the most efficient for English — approximately 15% fewer tokens for the same text compared to GPT-4. For non-English languages, efficiency varies more.
- Llama 3 (Meta): Slightly less efficient than GPT-4, producing roughly 10% more tokens for the same English text.
These multipliers matter at scale. If you send 10 million prompts per month, a 15% difference in token count translates directly to a 15% difference in your API bill.
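The per-family adjustments listed above can be expressed as a lookup table. This is a rough sketch: the multipliers are the approximate figures from the list (Claude is treated as 1.00 because the stated difference is under 2%), and the family labels and function name are hypothetical.

```python
# Approximate token multipliers relative to the GPT-4 baseline,
# taken from the percentages quoted in the list above.
MODEL_MULTIPLIER = {
    "gpt": 1.00,
    "claude": 1.00,   # stated difference is usually under 2%
    "gemini": 0.85,   # roughly 15% fewer tokens
    "llama": 1.10,    # roughly 10% more tokens
}


def scale_tokens(base_tokens, family):
    """Adjust a GPT-4-baseline token estimate for another model family."""
    return round(base_tokens * MODEL_MULTIPLIER[family])
```

For a 1,000-token baseline this gives 850 tokens for Gemini and 1,100 for Llama 3, so the gap compounds with every request you send.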
Rule of Thumb — 4 Characters per Token for English
OpenAI's own documentation states that, on average, 1 token corresponds to roughly 4 characters of English text, or about 0.75 words. This means a 1,000-word document is approximately 1,333 tokens.
When this rule works well: Clean English prose, blog posts, emails, chat messages, documentation. The 4-char rule typically produces estimates within 5-10% of the true count.
When this rule breaks down:
- Code and JSON: Punctuation characters (braces, colons, semicolons) each consume tokens, pushing the ratio to about 3.3 chars/token — 20-25% more tokens per character.
- Non-English text: Chinese, Japanese, Korean, Arabic, Hindi, and Thai text takes multiple bytes per character in UTF-8, which increases token consumption significantly: often around 1 token per character or more, instead of 1 token per 4 characters (see the non-English section below).
- Numeric data: Long numbers are split at unusual boundaries. "123456789" may become 3-4 tokens rather than what intuition suggests.
- Unusual whitespace: Multiple spaces, tabs, and unusual Unicode whitespace characters can each become separate tokens.
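The prose and code ratios above can be folded into a small estimator. This is a minimal sketch, not this tool's actual implementation: the function names and the `kind` labels are hypothetical, and only the two ratios stated in the text (4.0 for prose, 3.3 for code) are included.

```python
import math

# Characters per token for each content type, as quoted in the text.
CHARS_PER_TOKEN = {"prose": 4.0, "code": 3.3}


def estimate_tokens(text, kind="prose"):
    """Estimate tokens from character count, ignoring trailing whitespace."""
    return math.ceil(len(text.rstrip()) / CHARS_PER_TOKEN[kind])


def words_to_tokens(word_count):
    """Estimate tokens from a word count using 0.75 words per token."""
    return round(word_count / 0.75)
```

`words_to_tokens(1000)` returns 1333, matching the 1,000-word figure above. Rounding up in `estimate_tokens` is a deliberate choice here: when budgeting, a slightly high estimate is safer than a slightly low one.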
Worked Example: Counting Tokens in a 1,000-Word Email
Suppose you have a 1,000-word plain-English email you want to summarize with GPT-4o. Here is the step-by-step estimate:
Step 1: Count characters (excluding trailing whitespace)
  1,000 words × 5.1 avg chars/word = ~5,100 characters
Step 2: Detect content type
  Plain English prose → use 4.0 chars per token ratio
Step 3: Compute base tokens
  5,100 chars ÷ 4.0 = 1,275 tokens (base estimate)
Step 4: Apply model multiplier (GPT-4 = 1.0×)
  1,275 × 1.0 = ~1,275 tokens
Step 5: Compute cost at $2.50 / 1M input tokens (GPT-4o)
  1,275 ÷ 1,000,000 × $2.50 = $0.0031875 per prompt
  100 prompts → $0.32
  1,000 prompts → $3.19
Real-world comparison: tiktoken (GPT-4 cl100k_base) actual count for 1,000 English words typically falls between 1,200 and 1,400 tokens, so this estimate is within the expected ±5-10% accuracy range.
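The same arithmetic can be written as a short function. This is a sketch under the assumptions of the worked example (5.1 characters per word, 4.0 characters per token, a 1.0× multiplier, and $2.50 per million input tokens); the function name and parameters are hypothetical.

```python
def estimate_email_cost(words, chars_per_word=5.1, chars_per_token=4.0,
                        multiplier=1.0, price_per_million=2.50):
    """Return (estimated_tokens, cost_per_prompt_in_dollars)."""
    chars = round(words * chars_per_word)          # Step 1: character count
    base_tokens = chars / chars_per_token          # Steps 2-3: prose ratio
    tokens = base_tokens * multiplier              # Step 4: model multiplier
    cost = tokens / 1_000_000 * price_per_million  # Step 5: input cost
    return tokens, cost


tokens, cost = estimate_email_cost(1_000)  # 1,275 tokens with the defaults
cost_per_1000_prompts = cost * 1_000
```

Changing `multiplier` or `price_per_million` lets you rerun the estimate for another model without redoing the steps by hand.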
Code Tokenizes Higher
JSON and source code produce significantly more tokens per character than English prose. Consider this minimal JSON API schema:
{"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"}}}
Characters: 82
Estimated tokens (prose rule, 4 chars/token): ~21 tokens
Estimated tokens (code rule, 3.3 chars/token): ~25 tokens ← roughly 20% more
Why? Punctuation such as { } [ ] : , " rarely merges with the surrounding words, so it adds tokens of its own.
The colons, braces, and quoted keys dominate the token budget. This matters most when you are sending function-calling schemas, tool definitions, or structured output instructions as part of a system prompt. A detailed OpenAI function schema for 10 functions might consume 500-800 tokens before you have written a single word of user query. Keep schemas concise to reduce costs.
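One inexpensive way to keep schemas concise is to send them without indentation or padding. The sketch below compares a pretty-printed and a compact serialization of the schema above using the 3.3 chars/token heuristic; `code_tokens` is a hypothetical helper, and the result is an estimate rather than an exact tokenizer count.

```python
import json
import math

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}


def code_tokens(text, chars_per_token=3.3):
    """Heuristic token estimate for code or JSON."""
    return math.ceil(len(text) / chars_per_token)


pretty = json.dumps(schema, indent=2)                # human-friendly layout
compact = json.dumps(schema, separators=(",", ":"))  # no spaces or newlines

pretty_estimate = code_tokens(pretty)
compact_estimate = code_tokens(compact)
```

Because this heuristic counts characters, it likely overstates the saving: real tokenizers often fold runs of whitespace into a single token. Measure with the official tokenizer before relying on the difference.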
Non-English Languages Use More Tokens
CJK scripts (Chinese, Japanese, Korean) and other non-Latin scripts tokenize very differently from English. A single visible Chinese character typically encodes as 3 UTF-8 bytes (4 for rarer characters), and the tokenizer often maps each character to its own token. This means:
- A 100-character Chinese sentence uses roughly 100-130 tokens, not the ~25 tokens you would expect from the 4-char English rule.
- Arabic script, which uses a right-to-left alphabet with joined letters, produces similar inflation — roughly 1.5-2 tokens per visible character.
- Hindi (Devanagari script) and Thai (no spaces between words) also tokenize at 2-3× the English rate.
If you are building a multilingual product, budget 2-3× more tokens per visible character for CJK or Indic content versus English. This can make non-English API costs 2-3× higher for the same perceived content length.
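The gap between visible characters and UTF-8 bytes is easy to see directly, and the 2-3× budgeting guidance can be applied as a simple factor. This is a sketch: `budget_tokens` and its `script_factor` parameter are hypothetical, and the factor is a planning allowance from the text, not a measured value.

```python
import math


def char_and_byte_counts(text):
    """Return (visible code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


def budget_tokens(text, script_factor=1.0):
    """English 4-chars-per-token estimate, scaled by a per-script factor."""
    return math.ceil(len(text) / 4.0 * script_factor)


english = char_and_byte_counts("hello")    # (5, 5): one byte per character
chinese = char_and_byte_counts("你好世界")  # (4, 12): three bytes per character
```

For a 100-character passage, `budget_tokens(text, 2.0)` reserves 50 tokens and `budget_tokens(text, 3.0)` reserves 75, compared with 25 for English.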
Why Precise Count Matters (and When It Doesn't)
For one-off or low-volume prompts, a 10% estimation error is irrelevant. If you send 10 prompts a day, the difference between 1,200 and 1,320 tokens costs fractions of a cent.
For production workloads, precision matters for three reasons:
- Billing accuracy: At 10 million requests per month, underestimating token counts by 10% in your budget planning can translate to thousands of dollars of unplanned spend.
- Context window management: If your code assumes 1,200 tokens but the actual count is 1,380, prompts near the context limit will be rejected with context-length errors or have their input truncated.
- Rate limit compliance: Most API providers impose tokens-per-minute limits. Precise counts let you implement accurate request throttling.
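For the rate-limit case, client-side throttling can be sketched as a sliding 60-second window over token usage. This is an illustrative design, not any provider's official algorithm; the class name and the injectable `clock` parameter are hypothetical, and providers may measure their limits differently.

```python
import time
from collections import deque


class TokenRateLimiter:
    """Tracks tokens sent in the last 60 seconds against a per-minute limit."""

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for recent requests
        self.used = 0

    def _expire(self, now):
        # Drop requests that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits, else return False."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True
```

Call `try_acquire(estimated_tokens)` before each request and delay or queue the request when it returns False. The tighter your token estimate, the less headroom you need to leave under the limit.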
For production systems, use the model's official tokenizer: tiktoken for OpenAI models, Anthropic's token counting API, or the Vertex AI tokenizer for Gemini. Compare costs across models using our LLM API Cost Calculator.
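For OpenAI models, an exact count with tiktoken might look like the sketch below. It assumes the third-party `tiktoken` package is installed (`pip install tiktoken`) and that its vocabulary file can be loaded; when either assumption fails, this hypothetical `count_tokens` helper falls back to the 4-chars-per-token estimate.

```python
import math

try:
    import tiktoken  # third-party package, not part of the standard library
except ImportError:
    tiktoken = None


def count_tokens(text, encoding_name="cl100k_base"):
    """Exact count via tiktoken when available, else a heuristic estimate."""
    if tiktoken is not None:
        try:
            encoding = tiktoken.get_encoding(encoding_name)
            return len(encoding.encode(text))
        except Exception:
            # For example, the vocabulary file could not be downloaded.
            pass
    return math.ceil(len(text) / 4)
```

The silent fallback is convenient for a demo, but in production you would probably want to log it or fail loudly, since a heuristic count defeats the purpose of using the official tokenizer.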
Frequently Asked Questions
How many tokens is 1 word?
On average, one English word equals about 1.33 tokens in GPT-4 and Claude. Short, common words like "the", "is", "a" are typically 1 token each. Longer or rarer words (e.g., "cryptocurrency", "photosynthesis") often split into 2-4 subword tokens. The rough rule of thumb is 4 characters per token, which equates to about 0.75 words per token or 1.33 tokens per word.
Why does the same text cost different amounts in different models?
Each model uses its own tokenizer with its own vocabulary. Gemini's tokenizer is generally more efficient for English, producing roughly 15% fewer tokens for the same text. Llama 3's tokenizer is slightly less efficient. Beyond tokenizer differences, each provider charges a different price per million tokens: Claude Opus 4 at $15/M costs 200× as much as Gemini 1.5 Flash at $0.075/M for the same text.
Can I reduce token count by rewording my prompts?
Yes. Shorter, clearer prompts use fewer tokens. Removing filler phrases ("Please", "I would like you to", "Could you please"), cutting redundant context, and using abbreviations in system prompts all reduce token counts. For production workloads, compressing prompts by 10-20% can yield meaningful cost savings at scale. However, shorter prompts sometimes reduce output quality — test before optimizing aggressively.
Does JSON cost more tokens than plain text?
Yes, significantly. JSON and code contain dense punctuation — curly braces, colons, commas, quotation marks — that each count as one or more tokens. Where English prose averages about 4 characters per token, JSON and code average around 3.3 characters per token, meaning JSON is roughly 20-25% more expensive per character than equivalent plain text. This matters when building function-calling or tool-use prompts that include large JSON schemas.
What is the difference between input tokens and output tokens in pricing?
Input tokens are the tokens in your prompt (system prompt + user message + conversation history). Output tokens are the tokens in the model's response. Output tokens are priced 3-5× higher than input tokens for most models — Claude Opus 4 charges $15/M input but $75/M output. For tasks like summarization where the output is much shorter than the input, total cost is dominated by input pricing. For code generation or creative writing, output tokens drive the bill.
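The split between input and output pricing can be sketched as a single function. The prices are the Claude Opus 4 figures quoted above; the function name and the example token counts are hypothetical.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Summarization: long input, short output, so input pricing dominates.
summarize = request_cost(10_000, 500, 15, 75)

# Code generation: short input, long output, so output pricing dominates.
generate = request_cost(500, 4_000, 15, 75)
```

In the summarization case, $0.15 of the total comes from input and about $0.04 from output. In the generation case, output accounts for $0.30 of roughly $0.31.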
Is this counter exact?
No. It is an approximation. Exact token counts require running the same tokenizer that the model itself uses. For GPT-4, OpenAI provides the open-source tiktoken library. Anthropic provides a token-counting API endpoint. This tool uses the widely cited "4 chars per token" heuristic (with adjustments for code and non-English text) and is typically accurate to within 5-15%. For one-off prompts this is fine; for production billing estimates, use the official tokenizers.