From Zero to AI

Lesson 1.2: Managing Message History

Duration: 60 minutes

Learning Objectives

By the end of this lesson, you will be able to:

  • Understand why message history management is critical
  • Implement conversation state tracking
  • Handle context window limits effectively
  • Apply different strategies for managing long conversations
  • Build a robust conversation manager class

Introduction

Every message you send to an AI costs tokens. Every message in your history is sent again with each request. This means a long conversation can quickly become expensive and eventually hit the model's context limit.

Context window limits are real constraints:

Model               Context Window   Approximate Words
GPT-4o-mini         128K tokens      ~96,000 words
GPT-4o              128K tokens      ~96,000 words
Claude 3.5 Sonnet   200K tokens      ~150,000 words

These seem large, but conversations grow faster than you might expect. A back-and-forth of 50 messages can easily consume 10,000+ tokens. Managing this is essential for production chatbots.
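Worse, because the entire history is resent with each request, the input tokens you are billed for grow roughly quadratically with conversation length. A back-of-the-envelope sketch (the 50 tokens per message is an illustrative assumption):

// Each request resends the entire history as input tokens
const tokensPerMessage = 50; // illustrative average

let historyLength = 0; // messages accumulated so far
let totalInputTokens = 0; // cumulative input tokens billed

for (let turn = 1; turn <= 50; turn++) {
  historyLength += 1; // new user message
  totalInputTokens += historyLength * tokensPerMessage; // whole history sent
  historyLength += 1; // assistant reply appended afterwards
}

console.log(totalInputTokens); // 125,000 billed, though 100 messages hold only ~5,000 tokens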


The Problem with Unlimited History

Consider what happens when you never trim history:

// Naive approach - keep everything
class NaiveChatbot {
  private openai = new OpenAI();
  private messages: Message[] = [];

  async chat(userMessage: string): Promise<string> {
    this.messages.push({ role: 'user', content: userMessage });

    // Every request sends ALL previous messages
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: this.messages, // This keeps growing!
    });

    const content = response.choices[0].message.content || '';
    this.messages.push({ role: 'assistant', content });
    return content;
  }
}

Problems with this approach:

  1. Cost increases: Each request sends more tokens
  2. Latency increases: More tokens = slower responses
  3. Context limit: Eventually you hit the wall and the API rejects the request (see the error-handling sketch after this list)
  4. Irrelevant context: Old messages may confuse the AI
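
When you do hit the wall, the request fails outright, so production code should catch the failure and trim before retrying. A minimal defensive sketch, assuming the openai Node SDK v4, where this failure surfaces as an APIError whose code is (to the best of my knowledge) 'context_length_exceeded':

import OpenAI from 'openai';

const openai = new OpenAI();

// Sketch: returns null when the history no longer fits, signalling the
// caller to trim or summarize before retrying.
async function safeChat(
  messages: OpenAI.Chat.ChatCompletionMessageParam[]
): Promise<string | null> {
  try {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
    });
    return response.choices[0].message.content ?? '';
  } catch (error) {
    if (error instanceof OpenAI.APIError && error.code === 'context_length_exceeded') {
      console.warn('Context window exceeded; trim history and retry');
      return null;
    }
    throw error; // unrelated errors propagate
  }
}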

Token Counting Basics

Before managing history, you need to understand token counting. While exact counts require the model's tokenizer, we can estimate:

// Rough estimation: 1 token ≈ 4 characters (for English)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// More accurate for messages (includes role overhead)
function estimateMessageTokens(message: Message): number {
  // Each message has overhead for role and formatting (~4 tokens)
  const overhead = 4;
  return overhead + estimateTokens(message.content);
}

function estimateTotalTokens(messages: Message[]): number {
  // Base overhead for the request (~3 tokens)
  let total = 3;
  for (const message of messages) {
    total += estimateMessageTokens(message);
  }
  return total;
}

For production, use a proper tokenizer:

npm install tiktoken

import { TiktokenModel, encoding_for_model } from 'tiktoken';

function countTokens(text: string, model: TiktokenModel = 'gpt-4o-mini'): number {
  const encoder = encoding_for_model(model);
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}
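
A quick comparison of the heuristic against the real tokenizer (the exact tiktoken count will vary slightly by model and text):

const sample = 'Managing message history is essential for production chatbots.';

console.log(estimateTokens(sample)); // heuristic: ~16
console.log(countTokens(sample));    // tiktoken: close, but not identical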

Strategy 1: Sliding Window

Keep only the most recent N messages:

class SlidingWindowChat {
  private openai = new OpenAI();
  private messages: Message[] = [];
  private maxMessages: number;
  private systemPrompt: Message;

  constructor(systemPrompt: string, maxMessages: number = 20) {
    this.maxMessages = maxMessages;
    this.systemPrompt = { role: 'system', content: systemPrompt };
    this.messages.push(this.systemPrompt);
  }

  async chat(userMessage: string): Promise<string> {
    this.messages.push({ role: 'user', content: userMessage });

    // Trim if too many messages (keep system prompt)
    while (this.messages.length > this.maxMessages + 1) {
      // Remove oldest non-system message
      this.messages.splice(1, 1);
    }

    const response = await this.sendToApi(this.messages);
    this.messages.push({ role: 'assistant', content: response });

    return response;
  }

  private async sendToApi(messages: Message[]): Promise<string> {
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
    });
    return response.choices[0].message.content || '';
  }
}
┌─────────────────────────────────────────────────────────────────┐
│                    Sliding Window Strategy                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Before trimming (maxMessages = 4):                              │
│  ┌────────┬────────┬────────┬────────┬────────┬────────┐        │
│  │ System │ User 1 │ Asst 1 │ User 2 │ Asst 2 │ User 3 │        │
│  └────────┴────────┴────────┴────────┴────────┴────────┘        │
│                                                                  │
│  After trimming:                                                 │
│  ┌────────┬────────┬────────┬────────┬────────┐                 │
│  │ System │ User 2 │ Asst 2 │ User 3 │ (new)  │                 │
│  └────────┴────────┴────────┴────────┴────────┘                 │
│             ▲                                                    │
│             │ Oldest messages removed                            │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pros: Simple, predictable memory usage
Cons: Loses important early context
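
A quick usage sketch (assuming OPENAI_API_KEY is set in the environment): with maxMessages = 6, each request carries at most the system prompt plus the six most recent messages.

async function demo() {
  const chat = new SlidingWindowChat('You are a concise assistant.', 6);

  for (const question of ['What is a closure?', 'Give an example', 'And in TypeScript?']) {
    // Once more than 6 non-system messages accumulate, the oldest are dropped
    console.log(await chat.chat(question));
  }
}

demo().catch(console.error);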


Strategy 2: Token Budget

Keep messages until you hit a token limit:

class TokenBudgetChat {
  private openai = new OpenAI();
  private messages: Message[] = [];
  private maxTokens: number;
  private systemPrompt: Message;

  constructor(systemPrompt: string, maxTokens: number = 4000) {
    this.maxTokens = maxTokens;
    this.systemPrompt = { role: 'system', content: systemPrompt };
    this.messages.push(this.systemPrompt);
  }

  async chat(userMessage: string): Promise<string> {
    this.messages.push({ role: 'user', content: userMessage });

    // Trim until under budget
    this.trimToTokenBudget();

    const response = await this.sendToApi(this.messages);
    this.messages.push({ role: 'assistant', content: response });

    // Trim again after adding response
    this.trimToTokenBudget();

    return response;
  }

  private trimToTokenBudget(): void {
    while (this.messages.length > 1 && this.estimateTotalTokens() > this.maxTokens) {
      // Remove oldest non-system message
      this.messages.splice(1, 1);
    }
  }

  private estimateTotalTokens(): number {
    return this.messages.reduce((total, msg) => {
      return total + 4 + Math.ceil(msg.content.length / 4);
    }, 3);
  }

  // Same sendToApi helper as in SlidingWindowChat
  private async sendToApi(messages: Message[]): Promise<string> {
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
    });
    return response.choices[0].message.content || '';
  }
}

Pros: More control over costs, adapts to message length
Cons: Still loses early context


Strategy 3: Summarization

Summarize old messages instead of deleting them:

class SummarizingChat {
  private messages: Message[] = [];
  private summaryThreshold: number;
  private openai: OpenAI;

  constructor(systemPrompt: string, summaryThreshold: number = 10) {
    this.summaryThreshold = summaryThreshold;
    this.openai = new OpenAI();
    this.messages.push({ role: 'system', content: systemPrompt });
  }

  async chat(userMessage: string): Promise<string> {
    this.messages.push({ role: 'user', content: userMessage });

    // Check if we need to summarize
    if (this.messages.length > this.summaryThreshold) {
      await this.summarizeOldMessages();
    }

    const response = await this.sendToApi(this.messages);
    this.messages.push({ role: 'assistant', content: response });

    return response;
  }

  private async summarizeOldMessages(): Promise<void> {
    // Keep system prompt and last 4 messages
    const systemPrompt = this.messages[0];
    const recentMessages = this.messages.slice(-4);
    const oldMessages = this.messages.slice(1, -4);

    if (oldMessages.length === 0) return;

    // Create a summary of old messages
    const summaryResponse = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        {
          role: 'system',
          content:
            'Summarize this conversation concisely. Focus on key facts, decisions, and context that would be important for continuing the conversation.',
        },
        {
          role: 'user',
          content: oldMessages.map((m) => `${m.role}: ${m.content}`).join('\n'),
        },
      ],
      max_tokens: 500,
    });

    const summary = summaryResponse.choices[0].message.content || '';

    // Rebuild messages with summary
    this.messages = [
      systemPrompt,
      {
        role: 'system',
        content: `Previous conversation summary: ${summary}`,
      },
      ...recentMessages,
    ];
  }

  // Same sendToApi helper as in the earlier strategies
  private async sendToApi(messages: Message[]): Promise<string> {
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
    });
    return response.choices[0].message.content || '';
  }
}
┌─────────────────────────────────────────────────────────────────┐
│                    Summarization Strategy                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Before summarization:                                           │
│  ┌────────┬────────┬────────┬────────┬────────┬────────┐        │
│  │ System │ User 1 │ Asst 1 │ User 2 │ Asst 2 │ User 3 │        │
│  └────────┴────────┴────────┴────────┴────────┴────────┘        │
│                                                                  │
│  After summarization:                                            │
│  ┌────────┬─────────────┬────────┬────────┐                     │
│  │ System │  Summary    │ Asst 2 │ User 3 │                     │
│  │        │ (User1+A1+  │        │        │                     │
│  │        │  User2)     │        │        │                     │
│  └────────┴─────────────┴────────┴────────┘                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pros: Preserves context, smarter trimming
Cons: Extra API calls, summary may miss details


Strategy 4: Important Message Pinning

Mark important messages to never delete:

interface PinnedMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  pinned: boolean;
}

class PinningChat {
  private openai = new OpenAI();
  private messages: PinnedMessage[] = [];
  private maxMessages: number;

  constructor(systemPrompt: string, maxMessages: number = 20) {
    this.maxMessages = maxMessages;
    this.messages.push({
      role: 'system',
      content: systemPrompt,
      pinned: true, // System prompt is always pinned
    });
  }

  async chat(userMessage: string, pinThis: boolean = false): Promise<string> {
    this.messages.push({
      role: 'user',
      content: userMessage,
      pinned: pinThis,
    });

    this.trimMessages();

    const response = await this.sendToApi(this.messages);

    this.messages.push({
      role: 'assistant',
      content: response,
      pinned: pinThis, // Pin response if user message was pinned
    });

    return response;
  }

  private trimMessages(): void {
    while (this.messages.length > this.maxMessages) {
      // Find oldest unpinned message
      const unpinnedIndex = this.messages.findIndex((m, i) => i > 0 && !m.pinned);

      if (unpinnedIndex === -1) {
        // All messages are pinned, cannot trim
        console.warn('Cannot trim: all messages are pinned');
        break;
      }

      this.messages.splice(unpinnedIndex, 1);
    }
  }

  pinMessage(index: number): void {
    if (index >= 0 && index < this.messages.length) {
      this.messages[index].pinned = true;
    }
  }

  // The API only accepts role/content, so strip the pinned flag before sending
  private async sendToApi(messages: PinnedMessage[]): Promise<string> {
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: messages.map((m) => ({ role: m.role, content: m.content })),
    });
    return response.choices[0].message.content || '';
  }
}

Pros: Preserves critical information
Cons: Requires knowing what is important


Building a Complete Conversation Manager

Let us create a production-ready conversation manager that combines these strategies:

import 'dotenv/config';
import OpenAI from 'openai';

// Types
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  timestamp: Date;
  pinned: boolean;
}

interface ConversationConfig {
  systemPrompt: string;
  model?: string;
  maxTokens?: number;
  maxMessages?: number;
  autoSummarize?: boolean;
}

interface ConversationState {
  id: string;
  messages: Message[];
  summary: string | null;
  createdAt: Date;
  updatedAt: Date;
  totalTokensUsed: number;
}

// Main class
export class ConversationManager {
  private openai: OpenAI;
  private config: Required<ConversationConfig>;
  private state: ConversationState;

  constructor(config: ConversationConfig) {
    this.openai = new OpenAI();

    // Set defaults
    this.config = {
      systemPrompt: config.systemPrompt,
      model: config.model ?? 'gpt-4o-mini',
      maxTokens: config.maxTokens ?? 4000,
      maxMessages: config.maxMessages ?? 50,
      autoSummarize: config.autoSummarize ?? true,
    };

    // Initialize state
    this.state = {
      id: this.generateId(),
      messages: [
        {
          role: 'system',
          content: config.systemPrompt,
          timestamp: new Date(),
          pinned: true,
        },
      ],
      summary: null,
      createdAt: new Date(),
      updatedAt: new Date(),
      totalTokensUsed: 0,
    };
  }

  async sendMessage(content: string, options: { pin?: boolean } = {}): Promise<string> {
    // Add user message
    this.state.messages.push({
      role: 'user',
      content,
      timestamp: new Date(),
      pinned: options.pin ?? false,
    });

    // Manage context before sending
    await this.manageContext();

    // Build messages for API
    const apiMessages = this.buildApiMessages();

    // Send to API
    const response = await this.openai.chat.completions.create({
      model: this.config.model,
      messages: apiMessages,
    });

    const assistantContent = response.choices[0].message.content || '';

    // Track tokens
    this.state.totalTokensUsed += response.usage?.total_tokens || 0;

    // Add assistant response
    this.state.messages.push({
      role: 'assistant',
      content: assistantContent,
      timestamp: new Date(),
      pinned: options.pin ?? false,
    });

    this.state.updatedAt = new Date();

    return assistantContent;
  }

  private async manageContext(): Promise<void> {
    const estimatedTokens = this.estimateTokens();

    // If under limits, no action needed
    if (
      estimatedTokens < this.config.maxTokens &&
      this.state.messages.length < this.config.maxMessages
    ) {
      return;
    }

    // Try summarization first if enabled
    if (this.config.autoSummarize) {
      await this.summarizeIfNeeded();
    }

    // Then apply sliding window to remaining
    this.applySlidingWindow();
  }

  private async summarizeIfNeeded(): Promise<void> {
    // Only summarize if we have enough messages
    const nonSystemMessages = this.state.messages.filter((m) => m.role !== 'system');
    if (nonSystemMessages.length < 10) return;

    // Get messages to summarize (oldest half, excluding pinned)
    const halfPoint = Math.floor(nonSystemMessages.length / 2);
    const toSummarize = this.state.messages.slice(1, halfPoint + 1).filter((m) => !m.pinned);

    if (toSummarize.length < 4) return;

    // Generate summary
    const summaryResponse = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        {
          role: 'system',
          content: `Summarize this conversation excerpt. Include:
- Key topics discussed
- Important facts or decisions
- Any user preferences mentioned
Keep it concise but informative.`,
        },
        {
          role: 'user',
          content: toSummarize.map((m) => `${m.role}: ${m.content}`).join('\n\n'),
        },
      ],
      max_tokens: 300,
    });

    const newSummary = summaryResponse.choices[0].message.content || '';

    // Update or append to existing summary
    if (this.state.summary) {
      this.state.summary = `${this.state.summary}\n\nLater: ${newSummary}`;
    } else {
      this.state.summary = newSummary;
    }

    // Remove summarized messages (except pinned)
    const pinnedFromSummarized = this.state.messages
      .slice(1, halfPoint + 1)
      .filter((m) => m.pinned);

    this.state.messages = [
      this.state.messages[0], // System prompt
      ...pinnedFromSummarized,
      ...this.state.messages.slice(halfPoint + 1),
    ];
  }

  private applySlidingWindow(): void {
    // Keep system prompt + summary + recent messages
    const maxToKeep = Math.min(this.config.maxMessages, 30);

    while (this.state.messages.length > maxToKeep) {
      // Find first unpinned, non-system message
      const indexToRemove = this.state.messages.findIndex((m, i) => i > 0 && !m.pinned);

      if (indexToRemove === -1) break;
      this.state.messages.splice(indexToRemove, 1);
    }
  }

  private buildApiMessages(): Array<{ role: 'system' | 'user' | 'assistant'; content: string }> {
    const messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }> = [];

    // Add system prompt with summary if available
    let systemContent = this.config.systemPrompt;
    if (this.state.summary) {
      systemContent += `\n\nPrevious conversation context:\n${this.state.summary}`;
    }
    messages.push({ role: 'system', content: systemContent });

    // Add conversation messages (skip original system message)
    for (const msg of this.state.messages.slice(1)) {
      messages.push({ role: msg.role, content: msg.content });
    }

    return messages;
  }

  private estimateTokens(): number {
    let total = 3; // Base overhead
    for (const msg of this.state.messages) {
      total += 4 + Math.ceil(msg.content.length / 4);
    }
    if (this.state.summary) {
      total += Math.ceil(this.state.summary.length / 4);
    }
    return total;
  }

  private generateId(): string {
    return `conv_${Date.now()}_${Math.random().toString(36).substring(2, 9)}`;
  }

  // Public getters
  getState(): ConversationState {
    return { ...this.state };
  }

  getMessageCount(): number {
    return this.state.messages.length;
  }

  getTotalTokensUsed(): number {
    return this.state.totalTokensUsed;
  }

  // Reset conversation
  reset(): void {
    this.state = {
      id: this.generateId(),
      messages: [
        {
          role: 'system',
          content: this.config.systemPrompt,
          timestamp: new Date(),
          pinned: true,
        },
      ],
      summary: null,
      createdAt: new Date(),
      updatedAt: new Date(),
      totalTokensUsed: 0,
    };
  }
}

Using the Conversation Manager

Here is how to use the manager in practice:

import { ConversationManager } from './conversation-manager';

async function main() {
  const manager = new ConversationManager({
    systemPrompt: 'You are a helpful programming tutor.',
    maxTokens: 4000,
    maxMessages: 20,
    autoSummarize: true,
  });

  console.log('Chat started. This conversation will be managed automatically.\n');

  // Simulate a conversation
  const exchanges = [
    'What is TypeScript?',
    'How do I define types?',
    'Show me an interface example',
    'What about generics?',
    'Can you explain type inference?',
    // Pin an important message
    { content: 'Remember: I prefer concise code examples', pin: true },
    'Show me a generic function',
    'What are utility types?',
    'Explain Partial<T>',
    'What about Required<T>?',
  ];

  for (const exchange of exchanges) {
    const isObject = typeof exchange === 'object';
    const content = isObject ? exchange.content : exchange;
    const options = isObject ? { pin: exchange.pin } : {};

    console.log(`You: ${content}`);
    const response = await manager.sendMessage(content, options);
    console.log(`Assistant: ${response.substring(0, 200)}...\n`);
  }

  // Check state
  const state = manager.getState();
  console.log('\n--- Conversation State ---');
  console.log(`Messages: ${state.messages.length}`);
  console.log(`Total tokens used: ${state.totalTokensUsed}`);
  console.log(`Has summary: ${state.summary !== null}`);
  if (state.summary) {
    console.log(`Summary: ${state.summary.substring(0, 200)}...`);
  }
}

main().catch(console.error);

Best Practices

1. Choose the Right Strategy

Scenario                          Recommended Strategy
Short conversations (<20 turns)   Sliding window
Long customer support chats       Summarization
Technical discussions             Token budget + pinning
Casual chatbots                   Simple sliding window
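
As a concrete illustration, here is how those scenarios might map onto ConversationManager configurations (the numbers are illustrative starting points, not tuned values):

// Casual chatbot / short conversations: small window, no summarization
const casual = new ConversationManager({
  systemPrompt: 'You are a friendly chatbot.',
  maxMessages: 20,
  autoSummarize: false,
});

// Long customer support chat: generous token budget plus summarization
const support = new ConversationManager({
  systemPrompt: 'You are a support agent.',
  maxTokens: 8000,
  autoSummarize: true,
});

// Technical discussion: token budget, pin key requirements as they arise
const technical = new ConversationManager({
  systemPrompt: 'You are a senior engineer.',
  maxTokens: 6000,
  autoSummarize: false,
});
// await technical.sendMessage('We target Node 20 and strict TypeScript.', { pin: true });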

2. Monitor Token Usage

Always track tokens to control costs:

class MonitoredChat {
  private tokensUsed: number = 0;

  constructor(private tokenBudget: number) {}

  async chat(message: string): Promise<string | null> {
    if (this.tokensUsed >= this.tokenBudget) {
      console.warn('Token budget exhausted');
      return null;
    }
    // ... send message and track usage
  }
}

3. Handle Edge Cases

// Empty messages
if (!userMessage.trim()) {
  return 'Please enter a message.';
}

// Very long messages
if (userMessage.length > 10000) {
  return 'Your message is too long. Please shorten it.';
}

// Conversation too long to summarize
if (this.messages.length > 100 && !this.canSummarize()) {
  this.forceReset();
}

4. Preserve User Intent

When summarizing, ensure key user preferences are maintained:

const summaryPrompt = `Summarize this conversation. 
IMPORTANT: Preserve any user preferences, requirements, or constraints they mentioned.
For example: preferred language, coding style, level of detail, etc.`;

Exercises

Exercise 1: Implement Token Counting

Use the tiktoken library for accurate token counting:

// Your implementation here
class AccurateTokenChat {
  countTokens(text: string): number {
    // TODO: Use tiktoken for accurate counting
  }
}
Solution
import 'dotenv/config';
import OpenAI from 'openai';
import { TiktokenModel, encoding_for_model } from 'tiktoken';

interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

class AccurateTokenChat {
  private openai: OpenAI;
  private messages: Message[] = [];
  private model: TiktokenModel = 'gpt-4o-mini';
  private maxTokens: number;

  constructor(systemPrompt: string, maxTokens: number = 4000) {
    this.openai = new OpenAI();
    this.maxTokens = maxTokens;
    this.messages.push({ role: 'system', content: systemPrompt });
  }

  countTokens(text: string): number {
    const encoder = encoding_for_model(this.model);
    const tokens = encoder.encode(text);
    const count = tokens.length;
    encoder.free();
    return count;
  }

  countMessageTokens(messages: Message[]): number {
    let total = 3; // Base overhead

    for (const message of messages) {
      total += 4; // Message overhead
      total += this.countTokens(message.content);
    }

    return total;
  }

  async chat(userMessage: string): Promise<string> {
    this.messages.push({ role: 'user', content: userMessage });

    // Trim if over budget
    while (this.messages.length > 1 && this.countMessageTokens(this.messages) > this.maxTokens) {
      this.messages.splice(1, 1);
    }

    const currentTokens = this.countMessageTokens(this.messages);
    console.log(`Current context: ${currentTokens} tokens`);

    const response = await this.openai.chat.completions.create({
      model: this.model,
      messages: this.messages,
    });

    const content = response.choices[0].message.content || '';
    this.messages.push({ role: 'assistant', content });

    return content;
  }
}

// Test
async function main() {
  const chat = new AccurateTokenChat('You are a helpful assistant.', 2000);

  console.log(await chat.chat('What is JavaScript?'));
  console.log(await chat.chat('And TypeScript?'));
}

main();

Exercise 2: Conversation Export/Import

Implement saving and loading conversations:

// Your implementation here
class PersistentChat {
  exportConversation(): string {
    // TODO: Return JSON string of conversation
  }

  importConversation(data: string): void {
    // TODO: Load conversation from JSON
  }
}
Solution
import 'dotenv/config';
import * as fs from 'fs';
import OpenAI from 'openai';

interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  timestamp: string;
}

interface ConversationData {
  id: string;
  messages: Message[];
  createdAt: string;
  exportedAt: string;
}

class PersistentChat {
  private openai: OpenAI;
  private messages: Message[] = [];
  private id: string;
  private createdAt: Date;

  constructor(systemPrompt: string) {
    this.openai = new OpenAI();
    this.id = `conv_${Date.now()}`;
    this.createdAt = new Date();
    this.messages.push({
      role: 'system',
      content: systemPrompt,
      timestamp: new Date().toISOString(),
    });
  }

  async chat(userMessage: string): Promise<string> {
    this.messages.push({
      role: 'user',
      content: userMessage,
      timestamp: new Date().toISOString(),
    });

    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: this.messages.map((m) => ({ role: m.role, content: m.content })),
    });

    const content = response.choices[0].message.content || '';

    this.messages.push({
      role: 'assistant',
      content,
      timestamp: new Date().toISOString(),
    });

    return content;
  }

  exportConversation(): string {
    const data: ConversationData = {
      id: this.id,
      messages: this.messages,
      createdAt: this.createdAt.toISOString(),
      exportedAt: new Date().toISOString(),
    };
    return JSON.stringify(data, null, 2);
  }

  importConversation(jsonData: string): void {
    const data: ConversationData = JSON.parse(jsonData);
    this.id = data.id;
    this.messages = data.messages;
    this.createdAt = new Date(data.createdAt);
  }

  saveToFile(filepath: string): void {
    fs.writeFileSync(filepath, this.exportConversation());
  }

  loadFromFile(filepath: string): void {
    const data = fs.readFileSync(filepath, 'utf-8');
    this.importConversation(data);
  }

  getMessageCount(): number {
    return this.messages.length;
  }
}

// Test
async function main() {
  const chat = new PersistentChat('You are helpful.');

  await chat.chat('Hello!');
  await chat.chat('What is TypeScript?');

  // Export
  const exported = chat.exportConversation();
  console.log('Exported:', exported);

  // Save to file
  chat.saveToFile('conversation.json');
  console.log('Saved to file');

  // Create new chat and import
  const newChat = new PersistentChat('Placeholder');
  newChat.loadFromFile('conversation.json');
  console.log(`Loaded ${newChat.getMessageCount()} messages`);

  // Continue conversation
  const response = await newChat.chat('Can you summarize our conversation?');
  console.log('Response:', response);
}

main();

Exercise 3: Smart Context Window

Create a context manager that prioritizes recent and relevant messages:

// Your implementation here
class SmartContextChat {
  // TODO: Implement a chat that keeps messages based on:
  // 1. Recency (newer = higher priority)
  // 2. Relevance to current topic
  // 3. Pinned status
}
Solution
import 'dotenv/config';
import OpenAI from 'openai';

interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  timestamp: Date;
  pinned: boolean;
  relevanceScore?: number;
}

class SmartContextChat {
  private openai: OpenAI;
  private messages: Message[] = [];
  private maxTokens: number;

  constructor(systemPrompt: string, maxTokens: number = 4000) {
    this.openai = new OpenAI();
    this.maxTokens = maxTokens;
    this.messages.push({
      role: 'system',
      content: systemPrompt,
      timestamp: new Date(),
      pinned: true,
    });
  }

  async chat(userMessage: string): Promise<string> {
    this.messages.push({
      role: 'user',
      content: userMessage,
      timestamp: new Date(),
      pinned: false,
    });

    // Score messages for relevance
    await this.scoreRelevance(userMessage);

    // Build context with smart selection
    const contextMessages = this.buildSmartContext();

    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: contextMessages.map((m) => ({ role: m.role, content: m.content })),
    });

    const content = response.choices[0].message.content || '';

    this.messages.push({
      role: 'assistant',
      content,
      timestamp: new Date(),
      pinned: false,
    });

    return content;
  }

  private async scoreRelevance(currentMessage: string): Promise<void> {
    // Simple relevance scoring based on keyword overlap
    const currentWords = new Set(currentMessage.toLowerCase().split(/\s+/));

    for (const msg of this.messages) {
      if (msg.role === 'system') {
        msg.relevanceScore = 1; // System always relevant
        continue;
      }

      const msgWords = msg.content.toLowerCase().split(/\s+/);
      const overlap = msgWords.filter((w) => currentWords.has(w)).length;
      const recencyBonus = this.getRecencyScore(msg.timestamp);

      msg.relevanceScore = (overlap / msgWords.length) * 0.7 + recencyBonus * 0.3;
    }
  }

  private getRecencyScore(timestamp: Date): number {
    const ageMs = Date.now() - timestamp.getTime();
    const ageMinutes = ageMs / (1000 * 60);
    // Score decreases with age, but never below 0.1
    return Math.max(0.1, 1 - ageMinutes / 60);
  }

  private buildSmartContext(): Message[] {
    // Always include system prompt
    const result: Message[] = [this.messages[0]];

    // Get non-system messages
    const candidates = this.messages.slice(1);

    // Sort by priority: pinned first, then by combined score
    const sorted = [...candidates].sort((a, b) => {
      if (a.pinned !== b.pinned) return a.pinned ? -1 : 1;
      return (b.relevanceScore || 0) - (a.relevanceScore || 0);
    });

    // Add messages until token budget
    let tokenCount = this.estimateTokens(result[0]);

    for (const msg of sorted) {
      const msgTokens = this.estimateTokens(msg);
      if (tokenCount + msgTokens > this.maxTokens) break;

      result.push(msg);
      tokenCount += msgTokens;
    }

    // Re-sort by timestamp for proper conversation flow
    result.sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());

    return result;
  }

  private estimateTokens(msg: Message): number {
    return 4 + Math.ceil(msg.content.length / 4);
  }

  pinMessage(index: number): void {
    if (index >= 0 && index < this.messages.length) {
      this.messages[index].pinned = true;
    }
  }
}

// Test
async function main() {
  const chat = new SmartContextChat('You are a programming tutor.', 2000);

  // Have a conversation about different topics
  await chat.chat('What is TypeScript?');
  await chat.chat('What is Python?');
  await chat.chat('What is JavaScript?');
  await chat.chat('Tell me more about TypeScript generics');
  // This should prioritize TypeScript messages due to relevance

  console.log('Smart context chat completed');
}

main();

Key Takeaways

  1. History grows quickly: Every message adds to your token count and costs
  2. Multiple strategies exist: Sliding window, token budget, summarization, pinning
  3. Choose based on use case: Short chats need simple solutions, long ones need summarization
  4. Always monitor tokens: Track usage to prevent surprises
  5. Preserve important context: Use pinning or smart selection for key information
  6. Test edge cases: Very long messages, rapid conversations, context limits

Resources

Resource                        Type            Description
OpenAI Tokenizer                Tool            Visualize token counts
tiktoken                        Library         Official token counting
Managing Conversation History   Documentation   Official guide
Context Window Best Practices   Tutorial        Token counting examples

Next Lesson

You now know how to manage conversation history effectively. In the next lesson, you will learn how to use system prompts to create personalized AI assistants with distinct personalities and capabilities.

Continue to Lesson 1.3: System Prompts for Personalization