Lesson 2.2: Tokens and Tokenization
Duration: 50 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Define what tokens are and why they matter
- Understand how text is tokenized by different models
- Count tokens for cost estimation
- Optimize prompts to reduce token usage
- Handle tokenization edge cases in your applications
Introduction
When you send text to an LLM, it does not see words or characters - it sees tokens. Understanding tokenization is essential for:
- Estimating costs: You pay per token, not per word
- Managing context: Context windows are measured in tokens
- Writing better prompts: Some phrases tokenize more efficiently than others
- Debugging issues: Strange behavior often relates to tokenization
Let us explore how LLMs break down text.
What is a Token?
A token is the basic unit of text that an LLM processes. It is not quite a word, not quite a character - it is something in between.
Common Tokenization Patterns
| Text | Tokens | Count |
|---|---|---|
| "Hello" | ["Hello"] | 1 |
| "hello" | ["hello"] | 1 |
| "Hello!" | ["Hello", "!"] | 2 |
| "don't" | ["don", "'t"] | 2 |
| "ChatGPT" | ["Chat", "G", "PT"] | 3 |
| "TypeScript" | ["Type", "Script"] | 2 |
| " spaces " | [" ", "spaces", " "] | 3 |
| "2024" | ["202", "4"] or ["2024"] | 1-2 |
Rules of Thumb
- 1 token is approximately 4 characters in English
- 1 token is approximately 0.75 words in English
- 100 tokens is approximately 75 words
- Other languages often use more tokens per word
// Rough estimation function
function estimateTokens(text: string): number {
// This is an approximation - actual count may vary
return Math.ceil(text.length / 4);
}
const text = 'Hello, how are you today?';
console.log(`Characters: ${text.length}`); // 25
console.log(`Estimated tokens: ${estimateTokens(text)}`); // ~7
// Actual tokens with GPT: 7
How Tokenization Works
LLMs use algorithms like Byte Pair Encoding (BPE) to build their vocabulary. Here is a simplified explanation:
Building a Vocabulary
- Start with individual characters
- Find the most frequent adjacent pair of tokens
- Merge that pair into a new token
- Repeat until the vocabulary is large enough (typically 50,000-100,000 tokens)
Step 1: Start with characters
"hello hello" → ["h","e","l","l","o"," ","h","e","l","l","o"]
Step 2: "l" + "l" is common, merge to "ll"
"hello hello" → ["h","e","ll","o"," ","h","e","ll","o"]
Step 3: "h" + "e" is common, merge to "he"
"hello hello" → ["he","ll","o"," ","he","ll","o"]
Step 4: "llo" is common... and so on
Eventually: ["hello"," ","hello"] or ["hello"," hello"]
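To make the merge loop concrete, here is a toy BPE trainer in TypeScript. It is a sketch for illustration only: real tokenizers operate on bytes, apply pre-tokenization rules, and learn tens of thousands of merges.
// Toy BPE: repeatedly merge the most frequent adjacent pair of tokens.
function toyBPE(text: string, merges: number): string[] {
  let tokens = Array.from(text); // Step 1: start with individual characters
  for (let i = 0; i < merges; i++) {
    // Count every adjacent pair (a null byte separates the two halves)
    const counts = new Map<string, number>();
    for (let j = 0; j < tokens.length - 1; j++) {
      const pair = tokens[j] + '\u0000' + tokens[j + 1];
      counts.set(pair, (counts.get(pair) ?? 0) + 1);
    }
    // Find the most frequent pair; stop if nothing repeats
    let best: string | null = null;
    let bestCount = 1;
    for (const [pair, count] of counts) {
      if (count > bestCount) {
        best = pair;
        bestCount = count;
      }
    }
    if (!best) break;
    const [a, b] = best.split('\u0000');
    // Merge every occurrence of the winning pair into a single token
    const merged: string[] = [];
    for (let j = 0; j < tokens.length; j++) {
      if (j < tokens.length - 1 && tokens[j] === a && tokens[j + 1] === b) {
        merged.push(a + b);
        j++; // skip the second half of the merged pair
      } else {
        merged.push(tokens[j]);
      }
    }
    tokens = merged;
  }
  return tokens;
}
console.log(toyBPE('hello hello', 4)); // ["hello", " ", "hello"]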
Why Not Just Use Words?
Using whole words would create problems:
- Vocabulary size: English has hundreds of thousands of words
- New words: How would the model handle "COVID-19" or "TikTok"?
- Typos: "teh" would be unknown
- Other languages: Some languages do not have clear word boundaries
Subword tokenization handles all of these gracefully.
Different Tokenizers
Each model family uses its own tokenizer. The same text produces different tokens:
GPT Tokenizer (tiktoken)
// Using OpenAI's tiktoken library
import { encoding_for_model } from 'tiktoken';
const encoder = encoding_for_model('gpt-4');
const text = 'Hello, TypeScript developers!';
const tokens = encoder.encode(text);
console.log('Token count:', tokens.length); // 5
console.log('Token IDs:', tokens); // [9906, 11, 88557, 13324, 0]
// Decode back to text (decode() returns UTF-8 bytes in the JS port)
const decoded = new TextDecoder().decode(encoder.decode(tokens));
console.log('Decoded:', decoded); // "Hello, TypeScript developers!"
encoder.free(); // Clean up
Claude Tokenizer
Anthropic uses a different tokenizer. While the tokenizer itself is not published, you can estimate:
// Claude uses roughly similar tokenization,
// but exact counts may differ from GPT.
// Anthropic reports token counts in API responses:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 100,
messages: [{ role: 'user', content: 'Hello, world!' }],
});
console.log('Input tokens:', response.usage.input_tokens);
console.log('Output tokens:', response.usage.output_tokens);
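Recent versions of the Anthropic TypeScript SDK also expose a token-counting endpoint, so you can count input tokens before sending a request. A sketch, assuming a current @anthropic-ai/sdk version:
// Count tokens without generating a response
// (requires a recent SDK version that includes countTokens)
const count = await anthropic.messages.countTokens({
  model: 'claude-sonnet-4-20250514',
  messages: [{ role: 'user', content: 'Hello, world!' }],
});
console.log('Input tokens:', count.input_tokens);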
Comparison
| Text | GPT-4 Tokens | Claude Tokens (approx) |
|---|---|---|
| "Hello" | 1 | 1 |
| "artificial intelligence" | 2 | 2 |
| "TypeScript" | 2 | 1-2 |
| "supercalifragilisticexpialidocious" | 9 | ~8-10 |
Token Counting in Practice
Using OpenAI's Tokenizer
import { encoding_for_model, type TiktokenModel } from 'tiktoken';
function countTokens(text: string, model: TiktokenModel = 'gpt-4'): number {
  const encoder = encoding_for_model(model);
const tokens = encoder.encode(text);
encoder.free();
return tokens.length;
}
// Count tokens in a conversation
function countConversationTokens(messages: Array<{ role: string; content: string }>): number {
let totalTokens = 0;
for (const message of messages) {
// Each message has overhead for role and formatting
totalTokens += 4; // Approximate overhead per message
totalTokens += countTokens(message.content);
}
totalTokens += 2; // Conversation overhead
return totalTokens;
}
const conversation = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is TypeScript?' },
];
console.log('Total tokens:', countConversationTokens(conversation));
Cost Estimation
interface PricingTier {
inputPer1M: number; // Cost per 1M input tokens
outputPer1M: number; // Cost per 1M output tokens
}
const pricing: Record<string, PricingTier> = {
'gpt-4o': { inputPer1M: 2.5, outputPer1M: 10.0 },
'gpt-4o-mini': { inputPer1M: 0.15, outputPer1M: 0.6 },
'claude-3-5-sonnet': { inputPer1M: 3.0, outputPer1M: 15.0 },
'claude-3-haiku': { inputPer1M: 0.25, outputPer1M: 1.25 },
};
function estimateCost(inputTokens: number, outputTokens: number, model: string): number {
const tier = pricing[model];
if (!tier) throw new Error(`Unknown model: ${model}`);
const inputCost = (inputTokens / 1_000_000) * tier.inputPer1M;
const outputCost = (outputTokens / 1_000_000) * tier.outputPer1M;
return inputCost + outputCost;
}
// Example: 1000 input tokens, 500 output tokens
const cost = estimateCost(1000, 500, 'gpt-4o');
console.log(`Estimated cost: $${cost.toFixed(4)}`); // $0.0075
Tokenization Gotchas
Numbers and Special Characters
Numbers often tokenize unpredictably:
"2024" → ["202", "4"] or ["2024"] (varies by model)
"3.14159" → ["3", ".", "14", "159"] (4 tokens)
"$100" → ["$", "100"] (2 tokens)
"100USD" → ["100", "USD"] (2 tokens)
Whitespace Matters
"Hello World" → ["Hello", " World"] (2 tokens)
"Hello World" → ["Hello", " ", "World"] (3 tokens)
"HelloWorld" → ["Hello", "World"] (2 tokens)
Leading spaces are often part of tokens:
// These are different tokens!
const token1 = ' hello'; // includes leading space
const token2 = 'hello'; // no leading space
// This is why prompt formatting matters
Non-English Languages
Other languages typically require more tokens:
English: "Hello" → 1 token
Spanish: "Hola" → 1 token
Chinese: "你好" → 2 tokens (or more)
Japanese: "こんにちは" → 5+ tokens
Arabic: "مرحبا" → 2+ tokens
This affects cost - the same message in different languages has different token counts.
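You can measure this yourself with the countTokens helper from earlier in the lesson (counts will vary by tokenizer version):
// Compare token counts for the same greeting across languages
const greetings = ['Hello', 'Hola', '你好', 'こんにちは', 'مرحبا'];
for (const greeting of greetings) {
  console.log(greeting, '→', countTokens(greeting), 'tokens');
}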
Code and Technical Content
Code often tokenizes efficiently because common patterns are in the vocabulary:
"function" → 1 token
"const" → 1 token
"async function" → 2 tokens
"console.log" → 3 tokens (console, ., log)
"======" → 1-2 tokens (common separator)
Optimizing Token Usage
Technique 1: Be Concise
// Verbose (more tokens, higher cost)
const verbosePrompt = `
I would really appreciate it if you could please help me
understand what TypeScript is and why I should consider
using it in my projects. Could you explain this to me?
`;
// Concise (fewer tokens, lower cost)
const concisePrompt = 'Explain TypeScript and its benefits.';
// Both get similar quality responses
// But concise version uses ~80% fewer tokens
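You can verify the saving with the countTokens helper defined earlier (exact counts will vary slightly):
console.log(countTokens(verbosePrompt)); // ~40 tokens
console.log(countTokens(concisePrompt)); // ~7 tokens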
Technique 2: Use Efficient Formats
// Inefficient: Natural language description
const inefficientData = `
The user's name is John Smith. Their email address is
john.smith@email.com. They are 30 years old and live
in New York City.
`;
// Efficient: Structured format
const efficientData = `
Name: John Smith
Email: john.smith@email.com
Age: 30
City: New York City
`;
// Even more efficient: JSON (if model handles it well)
const jsonData = JSON.stringify({
name: 'John Smith',
email: 'john.smith@email.com',
age: 30,
city: 'New York City',
});
Technique 3: Avoid Repetition
// Instead of repeating instructions in every message
const messages = [
{ role: 'system', content: 'You are a helpful assistant. Always be concise.' },
{ role: 'user', content: 'What is TypeScript? Be concise.' }, // Redundant!
];
// Put instructions once in the system message
const betterMessages = [
{ role: 'system', content: 'You are a helpful assistant. Always be concise.' },
{ role: 'user', content: 'What is TypeScript?' },
];
Technique 4: Truncate Intelligently
function truncateToTokenLimit(text: string, maxTokens: number): string {
// Rough estimation: 4 chars per token
const maxChars = maxTokens * 4;
if (text.length <= maxChars) {
return text;
}
// Truncate at a word boundary, falling back to a hard cut if there is none
const truncated = text.slice(0, maxChars);
const lastSpace = truncated.lastIndexOf(' ');
return (lastSpace > 0 ? truncated.slice(0, lastSpace) : truncated) + '...';
}
const longText = 'Very long document...'.repeat(1000);
const truncated = truncateToTokenLimit(longText, 1000);
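Character-based truncation is only approximate. If you need an exact cut, you can truncate in token space instead. A sketch using tiktoken (decode() returns bytes in the JS port, and cutting mid-token can drop a partial character at the boundary):
import { encoding_for_model } from 'tiktoken';

// Token-exact truncation: encode, slice the token array, decode
function truncateToTokensExact(text: string, maxTokens: number): string {
  const encoder = encoding_for_model('gpt-4');
  const tokens = encoder.encode(text);
  const withinLimit = tokens.length <= maxTokens;
  const result = new TextDecoder().decode(encoder.decode(tokens.slice(0, maxTokens)));
  encoder.free();
  return withinLimit ? text : result + '...';
}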
Practical Example: Token-Aware Chat Application
import OpenAI from 'openai';
import { encoding_for_model } from 'tiktoken';
const openai = new OpenAI();
interface Message {
role: 'system' | 'user' | 'assistant';
content: string;
}
class TokenAwareChat {
private messages: Message[] = [];
private maxContextTokens = 4000; // Leave room for response
  private encoder = encoding_for_model('gpt-4'); // cl100k_base - an approximation; gpt-4o models use o200k_base
constructor(systemPrompt: string) {
this.messages.push({ role: 'system', content: systemPrompt });
}
private countTokens(text: string): number {
return this.encoder.encode(text).length;
}
private getTotalTokens(): number {
let total = 0;
for (const msg of this.messages) {
total += this.countTokens(msg.content) + 4; // Message overhead
}
return total + 2; // Conversation overhead
}
private trimHistory(): void {
// Remove old messages if we exceed limit
while (
this.getTotalTokens() > this.maxContextTokens &&
this.messages.length > 2 // Keep system + at least one message
) {
// Remove the second message (first user message)
this.messages.splice(1, 1);
}
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
this.trimHistory();
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: this.messages,
});
const assistantMessage = response.choices[0].message.content || '';
this.messages.push({ role: 'assistant', content: assistantMessage });
console.log(`Total tokens used: ${response.usage?.total_tokens}`);
console.log(`Estimated cost: $${this.estimateCost(response.usage)}`);
return assistantMessage;
}
private estimateCost(usage: any): string {
    if (!usage) return 'unknown';
    // gpt-4o-mini rates: $0.15/1M input, $0.60/1M output
    const inputCost = (usage.prompt_tokens / 1_000_000) * 0.15;
    const outputCost = (usage.completion_tokens / 1_000_000) * 0.6;
return (inputCost + outputCost).toFixed(6);
}
cleanup(): void {
this.encoder.free();
}
}
// Usage
const chat = new TokenAwareChat('You are a helpful TypeScript tutor.');
const response1 = await chat.chat('What are generics?');
console.log(response1);
const response2 = await chat.chat('Can you give me an example?');
console.log(response2);
chat.cleanup();
Exercises
Exercise 1: Estimate Token Count
Without using a tokenizer, estimate the token count for these strings:
- "Hello, world!"
- "The quick brown fox jumps over the lazy dog."
- "TypeScript is a typed superset of JavaScript."
Solution
Using the 4-characters-per-token rule:
- "Hello, world!" = 13 characters ≈ 4 tokens (actual: 4)
- "The quick brown fox..." = 44 characters ≈ 11 tokens (actual: 9)
- "TypeScript is a typed..." = 45 characters ≈ 11 tokens (actual: 9)
The rule gives rough estimates. Actual counts depend on the tokenizer and how common the words are.
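To check your estimates, run the strings through the countTokens helper from earlier in this lesson:
for (const s of [
  'Hello, world!',
  'The quick brown fox jumps over the lazy dog.',
  'TypeScript is a typed superset of JavaScript.',
]) {
  console.log(`"${s}" → ${countTokens(s)} tokens`);
}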
Exercise 2: Optimize This Prompt
Rewrite this prompt to use fewer tokens while maintaining the same intent:
I would be very grateful if you could please take a moment
to explain to me in great detail what the difference is
between var, let, and const in JavaScript programming
language. Please make sure to include examples.
Solution
Explain the difference between var, let, and const in JavaScript with examples.
Original: ~50 tokens. Optimized: ~15 tokens.
The optimized version is 70% shorter but gets the same quality response.
Exercise 3: Calculate Cost
You are building a customer service bot. Each conversation averages:
- System prompt: 200 tokens
- Customer message: 50 tokens
- Bot response: 150 tokens
- Average 5 exchanges per conversation
Using GPT-4o-mini pricing ($0.15/1M input, $0.60/1M output), calculate:
- Tokens per conversation
- Cost per conversation
- Cost for 10,000 conversations/month
Solution
- Tokens per conversation:
  - Input: 200 (system) + 5 × 50 (user) + 5 × 150 (prior assistant) = 200 + 250 + 750 = 1,200 input tokens
  - Output: 5 × 150 = 750 output tokens
- Cost per conversation:
  - Input: (1,200 / 1,000,000) × $0.15 = $0.00018
  - Output: (750 / 1,000,000) × $0.60 = $0.00045
  - Total: $0.00063 per conversation
- Cost for 10,000 conversations:
  - $0.00063 × 10,000 = $6.30/month
Note: This is a simplified calculation. Real conversations vary in length.
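You can sanity-check these numbers with the estimateCost helper from earlier in the lesson:
const perConversation = estimateCost(1200, 750, 'gpt-4o-mini');
console.log(perConversation.toFixed(5)); // 0.00063
console.log((perConversation * 10_000).toFixed(2)); // 6.30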
Key Takeaways
- Tokens are the basic unit LLMs use to process text - roughly 4 characters or 0.75 words
- Different models use different tokenizers - exact counts vary
- You pay per token - understanding tokenization helps manage costs
- Non-English text uses more tokens - plan accordingly for multilingual apps
- Concise prompts save money without sacrificing quality
- Track token usage in your applications to manage costs and context limits
Resources
| Resource | Type | Description |
|---|---|---|
| OpenAI Tokenizer | Tool | Interactive token counter |
| tiktoken Library | Library | Official OpenAI tokenizer for Python |
| tiktoken for JS | Library | JavaScript port of tiktoken |
| OpenAI Pricing | Documentation | Current API pricing |
| Anthropic Pricing | Documentation | Claude API pricing |
Next Lesson
Now that you understand tokens, let us explore how many tokens you can use at once - the context window.