Lesson 1.2: Managing Message History
Duration: 60 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Understand why message history management is critical
- Implement conversation state tracking
- Handle context window limits effectively
- Apply different strategies for managing long conversations
- Build a robust conversation manager class
Introduction
Every message you send to an AI costs tokens. Every message in your history is sent again with each request. This means a long conversation can quickly become expensive and eventually hit the model's context limit.
Context window limits are real constraints:
| Model | Context Window | Approximate Words |
|---|---|---|
| GPT-4o-mini | 128K tokens | ~96,000 words |
| GPT-4o | 128K tokens | ~96,000 words |
| Claude 3.5 Sonnet | 200K tokens | ~150,000 words |
These seem large, but conversations grow faster than you might expect. A back-and-forth of 50 messages can easily consume 10,000+ tokens. Managing this is essential for production chatbots.
The Problem with Unlimited History
Consider what happens when you never trim history:
// Naive approach - keep everything
class NaiveChatbot {
  private openai = new OpenAI();
  private messages: Message[] = [];
  async chat(userMessage: string): Promise<string> {
    this.messages.push({ role: 'user', content: userMessage });
    // Every request sends ALL previous messages
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: this.messages, // This keeps growing!
    });
    const content = response.choices[0].message.content ?? '';
    this.messages.push({ role: 'assistant', content });
    return content;
  }
}
Problems with this approach:
- Cost increases: Each request sends more tokens
- Latency increases: More tokens = slower responses
- Context limit: Eventually you hit the wall and get errors
- Irrelevant context: Old messages may confuse the AI
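To see how fast the cost compounds: because each turn resends the entire history, total input tokens grow roughly quadratically with the number of turns. Here is a rough sketch of the effect (the 60-tokens-per-message figure is an illustrative assumption):

// Cumulative input tokens billed when every turn resends the full history.
// Assumes ~60 tokens per message, with one user and one assistant message per turn.
const avgTokensPerMessage = 60;
let cumulativeBilled = 0;
for (let turn = 1; turn <= 50; turn++) {
  // At turn t the request carries (t - 1) user + (t - 1) assistant messages
  // plus the new user message: 2t - 1 messages in total.
  cumulativeBilled += (2 * turn - 1) * avgTokensPerMessage;
}
console.log(`~${cumulativeBilled} input tokens over 50 turns`);
// ≈ 150,000 tokens — roughly 50x more than the ~3,000 you would bill
// if each user message were sent on its own.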
Token Counting Basics
Before managing history, you need to understand token counting. While exact counts require the model's tokenizer, we can estimate:
// Rough estimation: 1 token ≈ 4 characters (for English)
function estimateTokens(text: string): number {
return Math.ceil(text.length / 4);
}
// More accurate for messages (includes role overhead)
function estimateMessageTokens(message: Message): number {
// Each message has overhead for role and formatting (~4 tokens)
const overhead = 4;
return overhead + estimateTokens(message.content);
}
function estimateTotalTokens(messages: Message[]): number {
// Base overhead for the request (~3 tokens)
let total = 3;
for (const message of messages) {
total += estimateMessageTokens(message);
}
return total;
}
For production, use a proper tokenizer:
npm install tiktoken
import { TiktokenModel, encoding_for_model } from 'tiktoken';
function countTokens(text: string, model: TiktokenModel = 'gpt-4o-mini'): number {
  const encoder = encoding_for_model(model);
  const tokens = encoder.encode(text);
  encoder.free(); // Release the encoder's native memory when done
  return tokens.length;
}
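For intuition, you can compare the character-based estimate against the real tokenizer (this assumes your installed tiktoken version recognizes the model name, as the exercise solution later in this lesson also does):

const sample = 'Managing message history is essential for production chatbots.';
console.log('Estimated:', estimateTokens(sample)); // heuristic: length / 4
console.log('Actual:', countTokens(sample)); // exact count from the tokenizer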
Strategy 1: Sliding Window
Keep only the most recent N messages:
class SlidingWindowChat {
  private openai = new OpenAI();
  private messages: Message[] = [];
  private maxMessages: number;
  private systemPrompt: Message;
constructor(systemPrompt: string, maxMessages: number = 20) {
this.maxMessages = maxMessages;
this.systemPrompt = { role: 'system', content: systemPrompt };
this.messages.push(this.systemPrompt);
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
// Trim if too many messages (keep system prompt)
while (this.messages.length > this.maxMessages + 1) {
// Remove oldest non-system message
this.messages.splice(1, 1);
}
const response = await this.sendToApi(this.messages);
this.messages.push({ role: 'assistant', content: response });
return response;
}
private async sendToApi(messages: Message[]): Promise<string> {
const response = await this.openai.chat.completions.create({
model: 'gpt-4o-mini',
messages,
});
return response.choices[0].message.content || '';
}
}
┌─────────────────────────────────────────────────────────────────┐
│ Sliding Window Strategy │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Before trimming (maxMessages = 4): │
│ ┌────────┬────────┬────────┬────────┬────────┬────────┐ │
│ │ System │ User 1 │ Asst 1 │ User 2 │ Asst 2 │ User 3 │ │
│ └────────┴────────┴────────┴────────┴────────┴────────┘ │
│ │
│ After trimming: │
│ ┌────────┬────────┬────────┬────────┬────────┐ │
│ │ System │ User 2 │ Asst 2 │ User 3 │ (new) │ │
│ └────────┴────────┴────────┴────────┴────────┘ │
│ ▲ │
│ │ Oldest messages removed │
│ │
└─────────────────────────────────────────────────────────────────┘
Pros: Simple, predictable memory usage.
Cons: Loses important early context.
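A quick usage sketch (run inside an async function): with a window of six, after ten turns each request carries only the system prompt plus the six most recent messages.

const chat = new SlidingWindowChat('You are a helpful assistant.', 6);
for (let i = 1; i <= 10; i++) {
  await chat.chat(`Question number ${i}`);
}
// Older turns have been dropped; only the system prompt and the
// 6 most recent messages are sent with each new request.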
Strategy 2: Token Budget
Keep messages until you hit a token limit:
class TokenBudgetChat {
  private openai = new OpenAI();
  private messages: Message[] = [];
  private maxTokens: number;
  private systemPrompt: Message;
constructor(systemPrompt: string, maxTokens: number = 4000) {
this.maxTokens = maxTokens;
this.systemPrompt = { role: 'system', content: systemPrompt };
this.messages.push(this.systemPrompt);
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
// Trim until under budget
this.trimToTokenBudget();
const response = await this.sendToApi(this.messages);
this.messages.push({ role: 'assistant', content: response });
// Trim again after adding response
this.trimToTokenBudget();
return response;
}
private trimToTokenBudget(): void {
while (this.messages.length > 1 && this.estimateTotalTokens() > this.maxTokens) {
// Remove oldest non-system message
this.messages.splice(1, 1);
}
}
private estimateTotalTokens(): number {
return this.messages.reduce((total, msg) => {
return total + 4 + Math.ceil(msg.content.length / 4);
}, 3);
}
  // sendToApi(messages) is the same helper shown in SlidingWindowChat
}
Pros: More control over costs, adapts to message length.
Cons: Still loses early context.
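When picking the budget, remember that the model's reply must also fit inside the context window, so keep the history budget well below the window size. A sizing sketch (all numbers here are illustrative assumptions, not recommendations):

// Illustrative sizing - adjust for your model and workload
const CONTEXT_WINDOW = 128_000; // e.g. gpt-4o-mini
const MAX_OUTPUT = 4_000; // room reserved for the model's reply
const SAFETY_MARGIN = 1_000; // cushion for estimation error
const historyBudget = Math.min(8_000, CONTEXT_WINDOW - MAX_OUTPUT - SAFETY_MARGIN);
const chat = new TokenBudgetChat('You are a helpful assistant.', historyBudget);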
Strategy 3: Summarization
Summarize old messages instead of deleting them:
class SummarizingChat {
private messages: Message[] = [];
private summaryThreshold: number;
private openai: OpenAI;
constructor(systemPrompt: string, summaryThreshold: number = 10) {
this.summaryThreshold = summaryThreshold;
this.openai = new OpenAI();
this.messages.push({ role: 'system', content: systemPrompt });
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
// Check if we need to summarize
if (this.messages.length > this.summaryThreshold) {
await this.summarizeOldMessages();
}
const response = await this.sendToApi(this.messages);
this.messages.push({ role: 'assistant', content: response });
return response;
}
private async summarizeOldMessages(): Promise<void> {
// Keep system prompt and last 4 messages
const systemPrompt = this.messages[0];
const recentMessages = this.messages.slice(-4);
const oldMessages = this.messages.slice(1, -4);
if (oldMessages.length === 0) return;
// Create a summary of old messages
const summaryResponse = await this.openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content:
'Summarize this conversation concisely. Focus on key facts, decisions, and context that would be important for continuing the conversation.',
},
{
role: 'user',
content: oldMessages.map((m) => `${m.role}: ${m.content}`).join('\n'),
},
],
max_tokens: 500,
});
const summary = summaryResponse.choices[0].message.content || '';
// Rebuild messages with summary
this.messages = [
systemPrompt,
{
role: 'system',
content: `Previous conversation summary: ${summary}`,
},
...recentMessages,
];
}
  // sendToApi(messages) is the same helper shown in SlidingWindowChat
}
┌─────────────────────────────────────────────────────────────────┐
│ Summarization Strategy │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Before summarization: │
│ ┌────────┬────────┬────────┬────────┬────────┬────────┐ │
│ │ System │ User 1 │ Asst 1 │ User 2 │ Asst 2 │ User 3 │ │
│ └────────┴────────┴────────┴────────┴────────┴────────┘ │
│ │
│ After summarization: │
│ ┌────────┬─────────────┬────────┬────────┐ │
│ │ System │ Summary │ Asst 2 │ User 3 │ │
│ │ │ (User1+A1+ │ │ │ │
│ │ │ User2) │ │ │ │
│ └────────┴─────────────┴────────┴────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Pros: Preserves context, smarter trimming.
Cons: Extra API calls, summary may miss details.
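A usage sketch (run inside an async function): with a threshold of 10, the history is compressed as soon as it exceeds 10 messages (system prompt included), at the cost of one extra small API call per summarization pass.

const tutor = new SummarizingChat('You are a patient programming tutor.', 10);
for (let i = 1; i <= 12; i++) {
  await tutor.chat(`Question ${i}: tell me something new about TypeScript.`);
}
// By now the oldest exchanges have been replaced by a short summary,
// so each request stays small while keeping the gist of the conversation.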
Strategy 4: Important Message Pinning
Mark important messages to never delete:
interface PinnedMessage {
role: 'system' | 'user' | 'assistant';
content: string;
pinned: boolean;
}
class PinningChat {
  private openai = new OpenAI();
  private messages: PinnedMessage[] = [];
  private maxMessages: number;
constructor(systemPrompt: string, maxMessages: number = 20) {
this.maxMessages = maxMessages;
this.messages.push({
role: 'system',
content: systemPrompt,
pinned: true, // System prompt is always pinned
});
}
async chat(userMessage: string, pinThis: boolean = false): Promise<string> {
this.messages.push({
role: 'user',
content: userMessage,
pinned: pinThis,
});
this.trimMessages();
const response = await this.sendToApi(this.messages);
this.messages.push({
role: 'assistant',
content: response,
pinned: pinThis, // Pin response if user message was pinned
});
return response;
}
private trimMessages(): void {
while (this.messages.length > this.maxMessages) {
// Find oldest unpinned message
const unpinnedIndex = this.messages.findIndex((m, i) => i > 0 && !m.pinned);
if (unpinnedIndex === -1) {
// All messages are pinned, cannot trim
console.warn('Cannot trim: all messages are pinned');
break;
}
this.messages.splice(unpinnedIndex, 1);
}
}
pinMessage(index: number): void {
if (index >= 0 && index < this.messages.length) {
this.messages[index].pinned = true;
}
}
  private async sendToApi(messages: PinnedMessage[]): Promise<string> {
    // Strip the pinned flag before sending - the API accepts only role/content
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: messages.map((m) => ({ role: m.role, content: m.content })),
    });
    return response.choices[0].message.content ?? '';
  }
}
Pros: Preserves critical information.
Cons: Requires knowing what is important.
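A usage sketch (run inside an async function): pin hard constraints the assistant must never lose, and let routine exchanges be trimmed as usual.

const chat = new PinningChat('You are a travel planning assistant.', 12);
// Pin a hard constraint so trimming never removes it
await chat.chat('My total budget is $2,000 - never suggest anything over that.', true);
await chat.chat('Suggest a one-week itinerary for Portugal.');
// ...many turns later, the pinned budget message (and its reply) remain
// in context while older unpinned messages have been trimmed away.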
Building a Complete Conversation Manager
Let us create a production-ready conversation manager that combines these strategies:
import 'dotenv/config';
import OpenAI from 'openai';
// Types
interface Message {
role: 'system' | 'user' | 'assistant';
content: string;
timestamp: Date;
pinned: boolean;
}
interface ConversationConfig {
systemPrompt: string;
model?: string;
maxTokens?: number;
maxMessages?: number;
autoSummarize?: boolean;
}
interface ConversationState {
id: string;
messages: Message[];
summary: string | null;
createdAt: Date;
updatedAt: Date;
totalTokensUsed: number;
}
// Main class
export class ConversationManager {
private openai: OpenAI;
private config: Required<ConversationConfig>;
private state: ConversationState;
constructor(config: ConversationConfig) {
this.openai = new OpenAI();
// Set defaults
this.config = {
systemPrompt: config.systemPrompt,
model: config.model ?? 'gpt-4o-mini',
maxTokens: config.maxTokens ?? 4000,
maxMessages: config.maxMessages ?? 50,
autoSummarize: config.autoSummarize ?? true,
};
// Initialize state
this.state = {
id: this.generateId(),
messages: [
{
role: 'system',
content: config.systemPrompt,
timestamp: new Date(),
pinned: true,
},
],
summary: null,
createdAt: new Date(),
updatedAt: new Date(),
totalTokensUsed: 0,
};
}
async sendMessage(content: string, options: { pin?: boolean } = {}): Promise<string> {
// Add user message
this.state.messages.push({
role: 'user',
content,
timestamp: new Date(),
pinned: options.pin ?? false,
});
// Manage context before sending
await this.manageContext();
// Build messages for API
const apiMessages = this.buildApiMessages();
// Send to API
const response = await this.openai.chat.completions.create({
model: this.config.model,
messages: apiMessages,
});
const assistantContent = response.choices[0].message.content || '';
// Track tokens
this.state.totalTokensUsed += response.usage?.total_tokens || 0;
// Add assistant response
this.state.messages.push({
role: 'assistant',
content: assistantContent,
timestamp: new Date(),
pinned: options.pin ?? false,
});
this.state.updatedAt = new Date();
return assistantContent;
}
private async manageContext(): Promise<void> {
const estimatedTokens = this.estimateTokens();
// If under limits, no action needed
if (
estimatedTokens < this.config.maxTokens &&
this.state.messages.length < this.config.maxMessages
) {
return;
}
// Try summarization first if enabled
if (this.config.autoSummarize) {
await this.summarizeIfNeeded();
}
// Then apply sliding window to remaining
this.applySlidingWindow();
}
private async summarizeIfNeeded(): Promise<void> {
// Only summarize if we have enough messages
const nonSystemMessages = this.state.messages.filter((m) => m.role !== 'system');
if (nonSystemMessages.length < 10) return;
// Get messages to summarize (oldest half, excluding pinned)
const halfPoint = Math.floor(nonSystemMessages.length / 2);
const toSummarize = this.state.messages.slice(1, halfPoint + 1).filter((m) => !m.pinned);
if (toSummarize.length < 4) return;
// Generate summary
const summaryResponse = await this.openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `Summarize this conversation excerpt. Include:
- Key topics discussed
- Important facts or decisions
- Any user preferences mentioned
Keep it concise but informative.`,
},
{
role: 'user',
content: toSummarize.map((m) => `${m.role}: ${m.content}`).join('\n\n'),
},
],
max_tokens: 300,
});
const newSummary = summaryResponse.choices[0].message.content || '';
// Update or append to existing summary
if (this.state.summary) {
this.state.summary = `${this.state.summary}\n\nLater: ${newSummary}`;
} else {
this.state.summary = newSummary;
}
// Remove summarized messages (except pinned)
const pinnedFromSummarized = this.state.messages
.slice(1, halfPoint + 1)
.filter((m) => m.pinned);
this.state.messages = [
this.state.messages[0], // System prompt
...pinnedFromSummarized,
...this.state.messages.slice(halfPoint + 1),
];
}
private applySlidingWindow(): void {
// Keep the system prompt and the most recent messages (the summary lives in state.summary)
const maxToKeep = Math.min(this.config.maxMessages, 30);
while (this.state.messages.length > maxToKeep) {
// Find first unpinned, non-system message
const indexToRemove = this.state.messages.findIndex((m, i) => i > 0 && !m.pinned);
if (indexToRemove === -1) break;
this.state.messages.splice(indexToRemove, 1);
}
}
private buildApiMessages(): Array<{ role: 'system' | 'user' | 'assistant'; content: string }> {
const messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }> = [];
// Add system prompt with summary if available
let systemContent = this.config.systemPrompt;
if (this.state.summary) {
systemContent += `\n\nPrevious conversation context:\n${this.state.summary}`;
}
messages.push({ role: 'system', content: systemContent });
// Add conversation messages (skip original system message)
for (const msg of this.state.messages.slice(1)) {
messages.push({ role: msg.role, content: msg.content });
}
return messages;
}
private estimateTokens(): number {
let total = 3; // Base overhead
for (const msg of this.state.messages) {
total += 4 + Math.ceil(msg.content.length / 4);
}
if (this.state.summary) {
total += Math.ceil(this.state.summary.length / 4);
}
return total;
}
private generateId(): string {
return `conv_${Date.now()}_${Math.random().toString(36).substring(2, 9)}`;
}
// Public getters
getState(): ConversationState {
return { ...this.state };
}
getMessageCount(): number {
return this.state.messages.length;
}
getTotalTokensUsed(): number {
return this.state.totalTokensUsed;
}
// Reset conversation
reset(): void {
this.state = {
id: this.generateId(),
messages: [
{
role: 'system',
content: this.config.systemPrompt,
timestamp: new Date(),
pinned: true,
},
],
summary: null,
createdAt: new Date(),
updatedAt: new Date(),
totalTokensUsed: 0,
};
}
}
Using the Conversation Manager
Here is how to use the manager in practice:
import { ConversationManager } from './conversation-manager';
async function main() {
const manager = new ConversationManager({
systemPrompt: 'You are a helpful programming tutor.',
maxTokens: 4000,
maxMessages: 20,
autoSummarize: true,
});
console.log('Chat started. This conversation will be managed automatically.\n');
// Simulate a conversation
const exchanges = [
'What is TypeScript?',
'How do I define types?',
'Show me an interface example',
'What about generics?',
'Can you explain type inference?',
// Pin an important message
{ content: 'Remember: I prefer concise code examples', pin: true },
'Show me a generic function',
'What are utility types?',
'Explain Partial<T>',
'What about Required<T>?',
];
for (const exchange of exchanges) {
const isObject = typeof exchange === 'object';
const content = isObject ? exchange.content : exchange;
const options = isObject ? { pin: exchange.pin } : {};
console.log(`You: ${content}`);
const response = await manager.sendMessage(content, options);
console.log(`Assistant: ${response.substring(0, 200)}...\n`);
}
// Check state
const state = manager.getState();
console.log('\n--- Conversation State ---');
console.log(`Messages: ${state.messages.length}`);
console.log(`Total tokens used: ${state.totalTokensUsed}`);
console.log(`Has summary: ${state.summary !== null}`);
if (state.summary) {
console.log(`Summary: ${state.summary.substring(0, 200)}...`);
}
}
main().catch(console.error);
Best Practices
1. Choose the Right Strategy
| Scenario | Recommended Strategy |
|---|---|
| Short conversations (<20 turns) | Sliding window |
| Long customer support chats | Summarization |
| Technical discussions | Token budget + pinning |
| Casual chatbots | Simple sliding window |
2. Monitor Token Usage
Always track tokens to control costs:
class MonitoredChat {
  private openai = new OpenAI();
  private tokenBudget: number;
  private tokensUsed: number = 0;
  constructor(tokenBudget: number) {
    this.tokenBudget = tokenBudget;
  }
  async chat(message: string): Promise<string | null> {
    if (this.tokensUsed >= this.tokenBudget) {
      console.warn('Token budget exhausted');
      return null;
    }
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: message }],
    });
    // Track the actual usage reported by the API
    this.tokensUsed += response.usage?.total_tokens ?? 0;
    return response.choices[0].message.content ?? '';
  }
}
3. Handle Edge Cases
// Empty messages
if (!userMessage.trim()) {
return 'Please enter a message.';
}
// Very long messages
if (userMessage.length > 10000) {
return 'Your message is too long. Please shorten it.';
}
// Conversation too long to summarize (canSummarize and forceReset are
// app-specific helpers you would implement yourself)
if (this.messages.length > 100 && !this.canSummarize()) {
  this.forceReset();
}
4. Preserve User Intent
When summarizing, ensure key user preferences are maintained:
const summaryPrompt = `Summarize this conversation.
IMPORTANT: Preserve any user preferences, requirements, or constraints they mentioned.
For example: preferred language, coding style, level of detail, etc.`;
Exercises
Exercise 1: Implement Token Counting
Use the tiktoken library for accurate token counting:
// Your implementation here
class AccurateTokenChat {
countTokens(text: string): number {
// TODO: Use tiktoken for accurate counting
}
}
Solution
import 'dotenv/config';
import OpenAI from 'openai';
import { TiktokenModel, encoding_for_model } from 'tiktoken';
interface Message {
role: 'system' | 'user' | 'assistant';
content: string;
}
class AccurateTokenChat {
private openai: OpenAI;
private messages: Message[] = [];
private model: TiktokenModel = 'gpt-4o-mini';
private maxTokens: number;
constructor(systemPrompt: string, maxTokens: number = 4000) {
this.openai = new OpenAI();
this.maxTokens = maxTokens;
this.messages.push({ role: 'system', content: systemPrompt });
}
countTokens(text: string): number {
const encoder = encoding_for_model(this.model);
const tokens = encoder.encode(text);
const count = tokens.length;
encoder.free();
return count;
}
countMessageTokens(messages: Message[]): number {
let total = 3; // Base overhead
for (const message of messages) {
total += 4; // Message overhead
total += this.countTokens(message.content);
}
return total;
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
// Trim if over budget
while (this.messages.length > 1 && this.countMessageTokens(this.messages) > this.maxTokens) {
this.messages.splice(1, 1);
}
const currentTokens = this.countMessageTokens(this.messages);
console.log(`Current context: ${currentTokens} tokens`);
const response = await this.openai.chat.completions.create({
model: this.model,
messages: this.messages,
});
const content = response.choices[0].message.content || '';
this.messages.push({ role: 'assistant', content });
return content;
}
}
// Test
async function main() {
const chat = new AccurateTokenChat('You are a helpful assistant.', 2000);
console.log(await chat.chat('What is JavaScript?'));
console.log(await chat.chat('And TypeScript?'));
}
main().catch(console.error);
Exercise 2: Conversation Export/Import
Implement saving and loading conversations:
// Your implementation here
class PersistentChat {
exportConversation(): string {
// TODO: Return JSON string of conversation
}
importConversation(data: string): void {
// TODO: Load conversation from JSON
}
}
Solution
import 'dotenv/config';
import * as fs from 'fs';
import OpenAI from 'openai';
interface Message {
role: 'system' | 'user' | 'assistant';
content: string;
timestamp: string;
}
interface ConversationData {
id: string;
messages: Message[];
createdAt: string;
exportedAt: string;
}
class PersistentChat {
private openai: OpenAI;
private messages: Message[] = [];
private id: string;
private createdAt: Date;
constructor(systemPrompt: string) {
this.openai = new OpenAI();
this.id = `conv_${Date.now()}`;
this.createdAt = new Date();
this.messages.push({
role: 'system',
content: systemPrompt,
timestamp: new Date().toISOString(),
});
}
async chat(userMessage: string): Promise<string> {
this.messages.push({
role: 'user',
content: userMessage,
timestamp: new Date().toISOString(),
});
const response = await this.openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: this.messages.map((m) => ({ role: m.role, content: m.content })),
});
const content = response.choices[0].message.content || '';
this.messages.push({
role: 'assistant',
content,
timestamp: new Date().toISOString(),
});
return content;
}
exportConversation(): string {
const data: ConversationData = {
id: this.id,
messages: this.messages,
createdAt: this.createdAt.toISOString(),
exportedAt: new Date().toISOString(),
};
return JSON.stringify(data, null, 2);
}
importConversation(jsonData: string): void {
const data: ConversationData = JSON.parse(jsonData);
this.id = data.id;
this.messages = data.messages;
this.createdAt = new Date(data.createdAt);
}
saveToFile(filepath: string): void {
fs.writeFileSync(filepath, this.exportConversation());
}
loadFromFile(filepath: string): void {
const data = fs.readFileSync(filepath, 'utf-8');
this.importConversation(data);
}
getMessageCount(): number {
return this.messages.length;
}
}
// Test
async function main() {
const chat = new PersistentChat('You are helpful.');
await chat.chat('Hello!');
await chat.chat('What is TypeScript?');
// Export
const exported = chat.exportConversation();
console.log('Exported:', exported);
// Save to file
chat.saveToFile('conversation.json');
console.log('Saved to file');
// Create new chat and import
const newChat = new PersistentChat('Placeholder');
newChat.loadFromFile('conversation.json');
console.log(`Loaded ${newChat.getMessageCount()} messages`);
// Continue conversation
const response = await newChat.chat('Can you summarize our conversation?');
console.log('Response:', response);
}
main().catch(console.error);
Exercise 3: Smart Context Window
Create a context manager that prioritizes recent and relevant messages:
// Your implementation here
class SmartContextChat {
// TODO: Implement a chat that keeps messages based on:
// 1. Recency (newer = higher priority)
// 2. Relevance to current topic
// 3. Pinned status
}
Solution
import 'dotenv/config';
import OpenAI from 'openai';
interface Message {
role: 'system' | 'user' | 'assistant';
content: string;
timestamp: Date;
pinned: boolean;
relevanceScore?: number;
}
class SmartContextChat {
private openai: OpenAI;
private messages: Message[] = [];
private maxTokens: number;
constructor(systemPrompt: string, maxTokens: number = 4000) {
this.openai = new OpenAI();
this.maxTokens = maxTokens;
this.messages.push({
role: 'system',
content: systemPrompt,
timestamp: new Date(),
pinned: true,
});
}
async chat(userMessage: string): Promise<string> {
this.messages.push({
role: 'user',
content: userMessage,
timestamp: new Date(),
pinned: false,
});
// Score messages for relevance
await this.scoreRelevance(userMessage);
// Build context with smart selection
const contextMessages = this.buildSmartContext();
const response = await this.openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: contextMessages.map((m) => ({ role: m.role, content: m.content })),
});
const content = response.choices[0].message.content || '';
this.messages.push({
role: 'assistant',
content,
timestamp: new Date(),
pinned: false,
});
return content;
}
private async scoreRelevance(currentMessage: string): Promise<void> {
// Simple relevance scoring based on keyword overlap
const currentWords = new Set(currentMessage.toLowerCase().split(/\s+/));
for (const msg of this.messages) {
if (msg.role === 'system') {
msg.relevanceScore = 1; // System always relevant
continue;
}
const msgWords = msg.content.toLowerCase().split(/\s+/);
const overlap = msgWords.filter((w) => currentWords.has(w)).length;
const recencyBonus = this.getRecencyScore(msg.timestamp);
msg.relevanceScore = (overlap / msgWords.length) * 0.7 + recencyBonus * 0.3;
}
}
private getRecencyScore(timestamp: Date): number {
const ageMs = Date.now() - timestamp.getTime();
const ageMinutes = ageMs / (1000 * 60);
// Score decreases with age, but never below 0.1
return Math.max(0.1, 1 - ageMinutes / 60);
}
private buildSmartContext(): Message[] {
// Always include system prompt
const result: Message[] = [this.messages[0]];
// Get non-system messages
const candidates = this.messages.slice(1);
// Sort by priority: pinned first, then by combined score
const sorted = [...candidates].sort((a, b) => {
if (a.pinned !== b.pinned) return a.pinned ? -1 : 1;
return (b.relevanceScore || 0) - (a.relevanceScore || 0);
});
// Add messages until token budget
let tokenCount = this.estimateTokens(result[0]);
for (const msg of sorted) {
const msgTokens = this.estimateTokens(msg);
if (tokenCount + msgTokens > this.maxTokens) break;
result.push(msg);
tokenCount += msgTokens;
}
// Re-sort by timestamp for proper conversation flow
result.sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
return result;
}
private estimateTokens(msg: Message): number {
return 4 + Math.ceil(msg.content.length / 4);
}
pinMessage(index: number): void {
if (index >= 0 && index < this.messages.length) {
this.messages[index].pinned = true;
}
}
}
// Test
async function main() {
const chat = new SmartContextChat('You are a programming tutor.', 2000);
// Have a conversation about different topics
await chat.chat('What is TypeScript?');
await chat.chat('What is Python?');
await chat.chat('What is JavaScript?');
await chat.chat('Tell me more about TypeScript generics');
// This should prioritize TypeScript messages due to relevance
console.log('Smart context chat completed');
}
main().catch(console.error);
Key Takeaways
- History grows quickly: Every message adds to your token count and costs
- Multiple strategies exist: Sliding window, token budget, summarization, pinning
- Choose based on use case: Short chats need simple solutions, long ones need summarization
- Always monitor tokens: Track usage to prevent surprises
- Preserve important context: Use pinning or smart selection for key information
- Test edge cases: Very long messages, rapid conversations, context limits
Resources
| Resource | Type | Description |
|---|---|---|
| OpenAI Tokenizer | Tool | Visualize token counts |
| tiktoken | Library | Official token counting |
| Managing Conversation History | Documentation | Official guide |
| Context Window Best Practices | Tutorial | Token counting examples |
Next Lesson
You now know how to manage conversation history effectively. In the next lesson, you will learn how to use system prompts to create personalized AI assistants with distinct personalities and capabilities.