Lesson 2.2: Tokens and Tokenization
Duration: 50 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Define what tokens are and why they matter
- Understand how text is tokenized by different models
- Count tokens for cost estimation
- Optimize prompts to reduce token usage
- Handle tokenization edge cases in your applications
Introduction
When you send text to an LLM, it does not see words or characters - it sees tokens. Understanding tokenization is essential for:
- Estimating costs: You pay per token, not per word
- Managing context: Context windows are measured in tokens
- Writing better prompts: Some phrases tokenize more efficiently than others
- Debugging issues: Strange behavior often relates to tokenization
Let us explore how LLMs break down text.
What is a Token?
A token is the basic unit of text that an LLM processes. It is not quite a word, not quite a character - it is something in between.
Common Tokenization Patterns
| Text | Tokens | Count |
|---|---|---|
| "Hello" | ["Hello"] | 1 |
| "hello" | ["hello"] | 1 |
| "Hello!" | ["Hello", "!"] | 2 |
| "don't" | ["don", "'t"] | 2 |
| "ChatGPT" | ["Chat", "G", "PT"] | 3 |
| "TypeScript" | ["Type", "Script"] | 2 |
| " spaces " | [" ", "spaces", " "] | 3 |
| "2024" | ["202", "4"] or ["2024"] | 1-2 |
Rules of Thumb
- 1 token is approximately 4 characters in English
- 1 token is approximately 0.75 words in English
- 100 tokens is approximately 75 words
- Other languages often use more tokens per word
// Rough estimation function
function estimateTokens(text: string): number {
// This is an approximation - actual count may vary
return Math.ceil(text.length / 4);
}
const text = 'Hello, how are you today?';
console.log(`Characters: ${text.length}`); // 25
console.log(`Estimated tokens: ${estimateTokens(text)}`); // ~7
// Actual tokens with GPT: 7
How Tokenization Works
LLMs use algorithms like Byte Pair Encoding (BPE) to build their vocabulary. Here is a simplified explanation:
Building a Vocabulary
- Start with individual characters
- Find the most frequent adjacent pair of tokens
- Merge that pair into a new token
- Repeat until the vocabulary is large enough (typically 50,000-100,000 tokens)
Step 1: Start with characters
"hello hello" → ["h","e","l","l","o"," ","h","e","l","l","o"]
Step 2: "l" + "l" is common, merge to "ll"
"hello hello" → ["h","e","ll","o"," ","h","e","ll","o"]
Step 3: "h" + "e" is common, merge to "he"
"hello hello" → ["he","ll","o"," ","he","ll","o"]
Step 4: "llo" is common... and so on
Eventually: ["hello"," ","hello"] or ["hello"," hello"]
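To make the merge loop concrete, here is a toy BPE trainer in TypeScript. It is a sketch for illustration only: real tokenizers operate on bytes, apply pre-tokenization rules, and learn tens of thousands of merges.
// Toy BPE: repeatedly merge the most frequent adjacent pair of tokens.
function toyBPE(text: string, merges: number): string[] {
  let tokens = Array.from(text); // Step 1: start with individual characters
  for (let i = 0; i < merges; i++) {
    // Count every adjacent pair (a null byte separates the two halves)
    const counts = new Map<string, number>();
    for (let j = 0; j < tokens.length - 1; j++) {
      const pair = tokens[j] + '\u0000' + tokens[j + 1];
      counts.set(pair, (counts.get(pair) ?? 0) + 1);
    }
    // Find the most frequent pair; stop if nothing repeats
    let best: string | null = null;
    let bestCount = 1;
    for (const [pair, count] of counts) {
      if (count > bestCount) {
        best = pair;
        bestCount = count;
      }
    }
    if (!best) break;
    const [a, b] = best.split('\u0000');
    // Merge every occurrence of the winning pair into a single token
    const merged: string[] = [];
    for (let j = 0; j < tokens.length; j++) {
      if (j < tokens.length - 1 && tokens[j] === a && tokens[j + 1] === b) {
        merged.push(a + b);
        j++; // skip the second half of the merged pair
      } else {
        merged.push(tokens[j]);
      }
    }
    tokens = merged;
  }
  return tokens;
}
console.log(toyBPE('hello hello', 4)); // ["hello", " ", "hello"]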
Why Not Just Use Words?
Using whole words would create problems:
- Vocabulary size: English has hundreds of thousands of words
- New words: How would the model handle "COVID-19" or "TikTok"?
- Typos: "teh" would be unknown
- Other languages: Some languages do not have clear word boundaries
Subword tokenization handles all of these gracefully.
Different Tokenizers
Each model family uses its own tokenizer. The same text produces different tokens:
GPT Tokenizer (tiktoken)
// Using OpenAI's tiktoken library
import { encoding_for_model } from 'tiktoken';
const encoder = encoding_for_model('gpt-4');
const text = 'Hello, TypeScript developers!';
const tokens = encoder.encode(text);
console.log('Token count:', tokens.length); // 5
console.log('Token IDs:', tokens); // [9906, 11, 88557, 13324, 0]
// Decode back to text (decode() returns UTF-8 bytes in the JS port)
const decoded = new TextDecoder().decode(encoder.decode(tokens));
console.log('Decoded:', decoded); // "Hello, TypeScript developers!"
encoder.free(); // Clean up
Claude Tokenizer
Anthropic uses a different tokenizer. While the tokenizer itself is not published, you can estimate:
// Claude uses roughly similar tokenization,
// but exact counts may differ from GPT.
// Anthropic reports token counts in API responses:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 100,
messages: [{ role: 'user', content: 'Hello, world!' }],
});
console.log('Input tokens:', response.usage.input_tokens);
console.log('Output tokens:', response.usage.output_tokens);
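Recent versions of the Anthropic TypeScript SDK also expose a token-counting endpoint, so you can count input tokens before sending a request. A sketch, assuming a current @anthropic-ai/sdk version:
// Count tokens without generating a response
// (requires a recent SDK version that includes countTokens)
const count = await anthropic.messages.countTokens({
  model: 'claude-sonnet-4-20250514',
  messages: [{ role: 'user', content: 'Hello, world!' }],
});
console.log('Input tokens:', count.input_tokens);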
Comparison
| Text | GPT-4 Tokens | Claude Tokens (approx) |
|---|---|---|
| "Hello" | 1 | 1 |
| "artificial intelligence" | 2 | 2 |
| "TypeScript" | 2 | 1-2 |
| "supercalifragilisticexpialidocious" | 9 | ~8-10 |
Token Counting in Practice
Using OpenAI's Tokenizer
import { encoding_for_model, type TiktokenModel } from 'tiktoken';
function countTokens(text: string, model: TiktokenModel = 'gpt-4'): number {
  const encoder = encoding_for_model(model);
const tokens = encoder.encode(text);
encoder.free();
return tokens.length;
}
// Count tokens in a conversation
function countConversationTokens(messages: Array<{ role: string; content: string }>): number {
let totalTokens = 0;
for (const message of messages) {
// Each message has overhead for role and formatting
totalTokens += 4; // Approximate overhead per message
totalTokens += countTokens(message.content);
}
totalTokens += 2; // Conversation overhead
return totalTokens;
}
const conversation = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is TypeScript?' },
];
console.log('Total tokens:', countConversationTokens(conversation));
Cost Estimation
interface PricingTier {
inputPer1M: number; // Cost per 1M input tokens
outputPer1M: number; // Cost per 1M output tokens
}
const pricing: Record<string, PricingTier> = {
'gpt-4o': { inputPer1M: 2.5, outputPer1M: 10.0 },
'gpt-4o-mini': { inputPer1M: 0.15, outputPer1M: 0.6 },
'claude-3-5-sonnet': { inputPer1M: 3.0, outputPer1M: 15.0 },
'claude-3-haiku': { inputPer1M: 0.25, outputPer1M: 1.25 },
};
function estimateCost(inputTokens: number, outputTokens: number, model: string): number {
const tier = pricing[model];
if (!tier) throw new Error(`Unknown model: ${model}`);
const inputCost = (inputTokens / 1_000_000) * tier.inputPer1M;
const outputCost = (outputTokens / 1_000_000) * tier.outputPer1M;
return inputCost + outputCost;
}
// Example: 1000 input tokens, 500 output tokens
const cost = estimateCost(1000, 500, 'gpt-4o');
console.log(`Estimated cost: $${cost.toFixed(4)}`); // $0.0075
Tokenization Gotchas
Numbers and Special Characters
Numbers often tokenize unpredictably:
"2024" → ["202", "4"] or ["2024"] (varies by model)
"3.14159" → ["3", ".", "14", "159"] (4 tokens)
"$100" → ["$", "100"] (2 tokens)
"100USD" → ["100", "USD"] (2 tokens)
Whitespace Matters
"Hello World" → ["Hello", " World"] (2 tokens)
"Hello World" → ["Hello", " ", "World"] (3 tokens)
"HelloWorld" → ["Hello", "World"] (2 tokens)
Leading spaces are often part of tokens:
// These are different tokens!
const token1 = ' hello'; // includes leading space
const token2 = 'hello'; // no leading space
// This is why prompt formatting matters
Non-English Languages
Other languages typically require more tokens:
English: "Hello" → 1 token
Spanish: "Hola" → 1 token
Chinese: "你好" → 2 tokens (or more)
Japanese: "こんにちは" → 5+ tokens
Arabic: "مرحبا" → 2+ tokens
This affects cost - the same message in different languages has different token counts.
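You can measure this yourself with the countTokens helper from earlier in the lesson (counts will vary by tokenizer version):
// Compare token counts for the same greeting across languages
const greetings = ['Hello', 'Hola', '你好', 'こんにちは', 'مرحبا'];
for (const greeting of greetings) {
  console.log(greeting, '→', countTokens(greeting), 'tokens');
}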
Code and Technical Content
Code often tokenizes efficiently because common patterns are in the vocabulary:
"function" → 1 token
"const" → 1 token
"async function" → 2 tokens
"console.log" → 3 tokens (console, ., log)
"======" → 1-2 tokens (common separator)
Optimizing Token Usage
Technique 1: Be Concise
// Verbose (more tokens, higher cost)
const verbosePrompt = `
I would really appreciate it if you could please help me
understand what TypeScript is and why I should consider
using it in my projects. Could you explain this to me?
`;
// Concise (fewer tokens, lower cost)
const concisePrompt = 'Explain TypeScript and its benefits.';
// Both get similar quality responses
// But concise version uses ~80% fewer tokens
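You can verify the saving with the countTokens helper defined earlier (exact counts will vary slightly):
console.log(countTokens(verbosePrompt)); // ~40 tokens
console.log(countTokens(concisePrompt)); // ~7 tokens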
Technique 2: Use Efficient Formats
// Inefficient: Natural language description
const inefficientData = `
The user's name is John Smith. Their email address is
john.smith@email.com. They are 30 years old and live
in New York City.
`;
// Efficient: Structured format
const efficientData = `
Name: John Smith
Email: john.smith@email.com
Age: 30
City: New York City
`;
// Even more efficient: JSON (if model handles it well)
const jsonData = JSON.stringify({
name: 'John Smith',
email: 'john.smith@email.com',
age: 30,
city: 'New York City',
});
Technique 3: Avoid Repetition
// Instead of repeating instructions in every message
const messages = [
{ role: 'system', content: 'You are a helpful assistant. Always be concise.' },
{ role: 'user', content: 'What is TypeScript? Be concise.' }, // Redundant!
];
// Put instructions once in the system message
const betterMessages = [
{ role: 'system', content: 'You are a helpful assistant. Always be concise.' },
{ role: 'user', content: 'What is TypeScript?' },
];
Technique 4: Truncate Intelligently
function truncateToTokenLimit(text: string, maxTokens: number): string {
// Rough estimation: 4 chars per token
const maxChars = maxTokens * 4;
if (text.length <= maxChars) {
return text;
}
// Truncate at a word boundary, falling back to a hard cut if there is none
const truncated = text.slice(0, maxChars);
const lastSpace = truncated.lastIndexOf(' ');
return (lastSpace > 0 ? truncated.slice(0, lastSpace) : truncated) + '...';
}
const longText = 'Very long document...'.repeat(1000);
const truncated = truncateToTokenLimit(longText, 1000);
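Character-based truncation is only approximate. If you need an exact cut, you can truncate in token space instead. A sketch using tiktoken (decode() returns bytes in the JS port, and cutting mid-token can drop a partial character at the boundary):
import { encoding_for_model } from 'tiktoken';

// Token-exact truncation: encode, slice the token array, decode
function truncateToTokensExact(text: string, maxTokens: number): string {
  const encoder = encoding_for_model('gpt-4');
  const tokens = encoder.encode(text);
  const withinLimit = tokens.length <= maxTokens;
  const result = new TextDecoder().decode(encoder.decode(tokens.slice(0, maxTokens)));
  encoder.free();
  return withinLimit ? text : result + '...';
}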
Practical Example: Token-Aware Chat Application
import OpenAI from 'openai';
import { encoding_for_model } from 'tiktoken';
const openai = new OpenAI();
interface Message {
role: 'system' | 'user' | 'assistant';
content: string;
}
class TokenAwareChat {
private messages: Message[] = [];
private maxContextTokens = 4000; // Leave room for response
  private encoder = encoding_for_model('gpt-4'); // cl100k_base - an approximation; gpt-4o models use o200k_base
constructor(systemPrompt: string) {
this.messages.push({ role: 'system', content: systemPrompt });
}
private countTokens(text: string): number {
return this.encoder.encode(text).length;
}
private getTotalTokens(): number {
let total = 0;
for (const msg of this.messages) {
total += this.countTokens(msg.content) + 4; // Message overhead
}
return total + 2; // Conversation overhead
}
private trimHistory(): void {
// Remove old messages if we exceed limit
while (
this.getTotalTokens() > this.maxContextTokens &&
this.messages.length > 2 // Keep system + at least one message
) {
// Remove the second message (first user message)
this.messages.splice(1, 1);
}
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
this.trimHistory();
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: this.messages,
});
const assistantMessage = response.choices[0].message.content || '';
this.messages.push({ role: 'assistant', content: assistantMessage });
console.log(`Total tokens used: ${response.usage?.total_tokens}`);
console.log(`Estimated cost: $${this.estimateCost(response.usage)}`);
return assistantMessage;
}
private estimateCost(usage: any): string {
    if (!usage) return 'unknown';
    // gpt-4o-mini rates: $0.15/1M input, $0.60/1M output
    const inputCost = (usage.prompt_tokens / 1_000_000) * 0.15;
    const outputCost = (usage.completion_tokens / 1_000_000) * 0.6;
return (inputCost + outputCost).toFixed(6);
}
cleanup(): void {
this.encoder.free();
}
}
// Usage
const chat = new TokenAwareChat('You are a helpful TypeScript tutor.');
const response1 = await chat.chat('What are generics?');
console.log(response1);
const response2 = await chat.chat('Can you give me an example?');
console.log(response2);
chat.cleanup();
Exercises
Exercise 1: Estimate Token Count
Without using a tokenizer, estimate the token count for these strings:
- "Hello, world!"
- "The quick brown fox jumps over the lazy dog."
- "TypeScript is a typed superset of JavaScript."
Solution
Using the 4-characters-per-token rule:
- "Hello, world!" = 13 characters ≈ 4 tokens (actual: 4)
- "The quick brown fox..." = 44 characters ≈ 11 tokens (actual: 9)
- "TypeScript is a typed..." = 45 characters ≈ 11 tokens (actual: 9)
The rule gives rough estimates. Actual counts depend on the tokenizer and how common the words are.
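To check your estimates, run the strings through the countTokens helper from earlier in this lesson:
for (const s of [
  'Hello, world!',
  'The quick brown fox jumps over the lazy dog.',
  'TypeScript is a typed superset of JavaScript.',
]) {
  console.log(`"${s}" → ${countTokens(s)} tokens`);
}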
Exercise 2: Optimize This Prompt
Rewrite this prompt to use fewer tokens while maintaining the same intent:
I would be very grateful if you could please take a moment
to explain to me in great detail what the difference is
between var, let, and const in JavaScript programming
language. Please make sure to include examples.
Solution
Explain the difference between var, let, and const in JavaScript with examples.
Original: ~50 tokens. Optimized: ~15 tokens.
The optimized version is 70% shorter but gets the same quality response.
Exercise 3: Calculate Cost
You are building a customer service bot. Each conversation averages:
- System prompt: 200 tokens
- Customer message: 50 tokens
- Bot response: 150 tokens
- Average 5 exchanges per conversation
Using GPT-4o-mini pricing ($0.15/1M input, $0.60/1M output), calculate:
- Tokens per conversation
- Cost per conversation
- Cost for 10,000 conversations/month
Solution
- Tokens per conversation:
  - Input: 200 (system) + 5 × 50 (user) + 5 × 150 (prior assistant) = 200 + 250 + 750 = 1,200 input tokens
  - Output: 5 × 150 = 750 output tokens
- Cost per conversation:
  - Input: (1,200 / 1,000,000) × $0.15 = $0.00018
  - Output: (750 / 1,000,000) × $0.60 = $0.00045
  - Total: $0.00063 per conversation
- Cost for 10,000 conversations:
  - $0.00063 × 10,000 = $6.30/month
Note: This is a simplified calculation. Real conversations vary in length.
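You can sanity-check these numbers with the estimateCost helper from earlier in the lesson:
const perConversation = estimateCost(1200, 750, 'gpt-4o-mini');
console.log(perConversation.toFixed(5)); // 0.00063
console.log((perConversation * 10_000).toFixed(2)); // 6.30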
Key Takeaways
- Tokens are the basic unit LLMs use to process text - roughly 4 characters or 0.75 words
- Different models use different tokenizers - exact counts vary
- You pay per token - understanding tokenization helps manage costs
- Non-English text uses more tokens - plan accordingly for multilingual apps
- Concise prompts save money without sacrificing quality
- Track token usage in your applications to manage costs and context limits
Resources
| Resource | Type | Description |
|---|---|---|
| OpenAI Tokenizer | Tool | Interactive token counter |
| tiktoken Library | Library | Official OpenAI tokenizer for Python |
| tiktoken for JS | Library | JavaScript port of tiktoken |
| OpenAI Pricing | Documentation | Current API pricing |
| Anthropic Pricing | Documentation | Claude API pricing |
Next Lesson
Now that you understand tokens, let us explore how many tokens you can use at once - the context window.