What Is a Token?
A token is a chunk of text — roughly 3–4 characters or about 0.75 words in English. Tokenization is not simply splitting on spaces; punctuation, subwords, and whitespace each contribute their own tokens.| Text | Approximate tokens |
|---|---|
"Hello, world!" | 4 |
"Explain quantum entanglement." | 5 |
| A typical paragraph (100 words) | ~130 |
| A full A4 page of text | ~500–700 |
| A 10,000-word document | ~13,000 |
Token counts vary slightly by model because each provider uses a different tokenizer. The figures above are approximate. The
usage field in every API response gives you the exact counts for that call.Token Types in a Response
Theusage object returned with every completion breaks down token usage:
| Field | What it counts |
|---|---|
prompt_tokens | All tokens in your messages array, including system prompts |
completion_tokens | Tokens generated by the model in this response |
total_tokens | Sum of the above |
completion_tokens includes internal reasoning tokens. Some responses include a completion_tokens_details breakdown.
Context Windows
A model’s context window is the maximum number of tokens it can process in a single request — the sum ofprompt_tokens and completion_tokens combined.
| Model | Context window |
|---|---|
gpt-4o-mini | 128,000 tokens |
gpt-4o | 128,000 tokens |
o3, o4-mini | 200,000 tokens |
claude-opus-4-5 | 200,000 tokens |
gemini-2.5-pro | 1,000,000 tokens |
400 error. Always leave headroom for the model’s output — if the window is 128K and your prompt is 127K tokens, the model has almost no room to respond.
Controlling Output Length
Usemax_tokens to cap how many tokens the model generates. This prevents runaway costs and enforces response length for your use case.
Managing Long Conversations
Because the fullmessages array is sent on every request, conversation costs grow with each turn. For long sessions, use one of these strategies:
Sliding window
Sliding window
Keep only the last N turns in the messages array, always preserving the
system message at the start.Summarization
Summarization
When history gets long, ask the model to summarize it, then replace the old messages with the summary.
Prompt caching
Prompt caching
If your system prompt or context is large and stable across many requests, enable Prompt Caching. The first request processes and caches the prefix; subsequent requests with the same prefix pay a fraction of the cost.
Estimating Costs Before Sending
You can estimate token usage before making a request by counting tokens locally. Thetiktoken library implements OpenAI’s tokenizer:
Token Cost Summary
| Token type | Billing |
|---|---|
| Prompt tokens | Charged per model’s input rate |
| Completion tokens | Charged per model’s output rate |
| Cached prompt tokens | Discounted (typically 50–75% off input rate) |
| Reasoning tokens | Billed as output tokens on most models |