Lesson 6.3: Adding RAG

Duration: 90 minutes

Learning Objectives

By the end of this lesson, you will be able to:

Load and chunk documents for embedding
Generate and store vector embeddings
Implement semantic search for relevant context
Integrate RAG with the assistant for knowledge-grounded responses
Handle edge cases like no relevant documents found

Introduction

RAG (Retrieval-Augmented Generation) transforms your assistant from a general-purpose AI into a knowledge expert on your specific documents. In Module 4, you learned the concepts. Now you will implement a production-ready RAG system.

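The RAG pipeline consists of five stages, built in order in this lesson: document loading, text chunking, embedding generation, vector storage, and retrieval.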

Document Loading

First, create a loader that reads documents from a directory.

Create src/rag/loader.ts:

import { readFile, readdir, stat } from 'fs/promises';
import { extname, join } from 'path';

import type { Document } from '../core/types.js';
import { createLogger } from '../utils/logger.js';

const logger = createLogger('DocumentLoader');

export interface LoaderOptions {
  extensions?: string[];
  recursive?: boolean;
}

const DEFAULT_EXTENSIONS = ['.txt', '.md', '.markdown'];

export async function loadDocuments(
  directory: string,
  options: LoaderOptions = {}
): Promise<Document[]> {
  const extensions = options.extensions ?? DEFAULT_EXTENSIONS;
  const recursive = options.recursive ?? true;

  logger.info(`Loading documents from ${directory}`);

  const documents: Document[] = [];
  await loadFromDirectory(directory, documents, extensions, recursive);

  logger.info(`Loaded ${documents.length} documents`);
  return documents;
}

async function loadFromDirectory(
  directory: string,
  documents: Document[],
  extensions: string[],
  recursive: boolean
): Promise<void> {
  let entries;

  try {
    entries = await readdir(directory);
  } catch (error) {
    logger.warn(`Could not read directory: ${directory}`);
    return;
  }

  for (const entry of entries) {
    const fullPath = join(directory, entry);

    try {
      const stats = await stat(fullPath);

      if (stats.isDirectory() && recursive) {
        await loadFromDirectory(fullPath, documents, extensions, recursive);
      } else if (stats.isFile()) {
        const ext = extname(entry).toLowerCase();

        if (extensions.includes(ext)) {
          const content = await readFile(fullPath, 'utf-8');

          documents.push({
            id: fullPath,
            content,
            metadata: {
              source: fullPath,
              filename: entry,
              extension: ext,
              size: stats.size,
              modified: stats.mtime.toISOString(),
            },
          });

          logger.debug(`Loaded: ${entry}`);
        }
      }
    } catch (error) {
      logger.warn(`Error processing ${fullPath}: ${error}`);
    }
  }
}

// Load a single document
export async function loadDocument(filePath: string): Promise<Document> {
  const content = await readFile(filePath, 'utf-8');
  const stats = await stat(filePath);
  // Split on both separators so Windows-style paths work too
  const filename = filePath.split(/[\\/]/).pop() ?? filePath;

  return {
    id: filePath,
    content,
    metadata: {
      source: filePath,
      filename,
      extension: extname(filename).toLowerCase(),
      size: stats.size,
      modified: stats.mtime.toISOString(),
    },
  };
}
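
The `Document` type is imported from `../core/types.js` and is not shown in this lesson. Judging from the fields the loader fills in, it presumably looks something like the sketch below; the name `LoadedDocument` and the exact field types are assumptions. The hypothetical helper `matchesExtension` mirrors the extension filter inside `loadFromDirectory` without touching the file system.

```typescript
// Assumed stand-in for Document, inferred from the fields the loader sets.
interface LoadedDocument {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper mirroring the loader's extension check:
// take the text from the last dot, lower-case it, and compare.
function matchesExtension(filename: string, extensions: string[]): boolean {
  const dot = filename.lastIndexOf('.');
  if (dot <= 0) return false; // no extension, or a dotfile such as ".env"
  return extensions.includes(filename.slice(dot).toLowerCase());
}

const sampleDoc: LoadedDocument = {
  id: 'documents/notes.md',
  content: '# Notes',
  metadata: { source: 'documents/notes.md', filename: 'notes.md', extension: '.md' },
};

const allowedExtensions = ['.txt', '.md', '.markdown'];
const filterResults = ['README.MD', 'photo.png', '.env', 'guide.markdown'].map((f) =>
  matchesExtension(f, allowedExtensions)
);

console.log(sampleDoc.id); // documents/notes.md
console.log(filterResults); // [ true, false, false, true ]
```

Because the extension is lower-cased before the comparison, `README.MD` is accepted even though the default list only contains `.md`.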

Text Chunking

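Large documents must be split into smaller chunks for effective retrieval: embedding models accept a limited amount of input, and a small, focused chunk tends to match a query more precisely than a whole file. Overlapping neighboring chunks preserves context that would otherwise be cut at a chunk boundary.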

Create src/rag/chunker.ts:

import { config } from '../core/config.js';
import type { Document } from '../core/types.js';
import { createLogger } from '../utils/logger.js';

const logger = createLogger('Chunker');

export interface ChunkOptions {
  chunkSize?: number;
  chunkOverlap?: number;
  separators?: string[];
}

export interface Chunk {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
}

const DEFAULT_SEPARATORS = ['\n\n', '\n', '. ', ' ', ''];

export function chunkDocuments(documents: Document[], options: ChunkOptions = {}): Chunk[] {
  const chunkSize = options.chunkSize ?? config.chunkSize;
  const chunkOverlap = options.chunkOverlap ?? config.chunkOverlap;
  const separators = options.separators ?? DEFAULT_SEPARATORS;

  logger.debug(`Chunking ${documents.length} documents`);
  logger.debug(`Chunk size: ${chunkSize}, overlap: ${chunkOverlap}`);

  const allChunks: Chunk[] = [];

  for (const doc of documents) {
    const chunks = splitText(doc.content, chunkSize, chunkOverlap, separators);

    for (let i = 0; i < chunks.length; i++) {
      allChunks.push({
        id: `${doc.id}#chunk${i}`,
        content: chunks[i],
        metadata: {
          ...doc.metadata,
          chunkIndex: i,
          totalChunks: chunks.length,
        },
      });
    }
  }

  logger.info(`Created ${allChunks.length} chunks from ${documents.length} documents`);
  return allChunks;
}

function splitText(
  text: string,
  chunkSize: number,
  overlap: number,
  separators: string[]
): string[] {
  // Try each separator in order, from coarsest to finest
  for (const separator of separators) {
    if (separator === '') {
      // Character-level splitting as last resort
      return splitByCharacter(text, chunkSize, overlap);
    }

    const splits = text.split(separator);

    // If this separator creates reasonable chunks, use it
    if (splits.length > 1 && splits.some((s) => s.length > chunkSize / 4)) {
      return mergeSplits(splits, separator, chunkSize, overlap);
    }
  }

  // Fallback to character splitting
  return splitByCharacter(text, chunkSize, overlap);
}

function mergeSplits(
  splits: string[],
  separator: string,
  chunkSize: number,
  overlap: number
): string[] {
  const chunks: string[] = [];
  let currentChunk: string[] = [];
  let currentLength = 0;

  for (const split of splits) {
    const splitLength = split.length + separator.length;

    if (currentLength + splitLength > chunkSize && currentChunk.length > 0) {
      // Save current chunk
      chunks.push(currentChunk.join(separator).trim());

      // Start new chunk with overlap
      const overlapTarget = overlap;
      let overlapLength = 0;
      const overlapChunks: string[] = [];

      for (let i = currentChunk.length - 1; i >= 0 && overlapLength < overlapTarget; i--) {
        overlapChunks.unshift(currentChunk[i]);
        overlapLength += currentChunk[i].length + separator.length;
      }

      currentChunk = overlapChunks;
      currentLength = overlapLength;
    }

    currentChunk.push(split);
    currentLength += splitLength;
  }

  // Don't forget the last chunk
  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(separator).trim());
  }

  return chunks.filter((c) => c.length > 0);
}

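function splitByCharacter(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  // Always advance by at least one character, even if overlap >= chunkSize
  // (otherwise the window would never move and the loop would not terminate)
  const step = Math.max(1, chunkSize - overlap);

  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end).trim());

    // Stop once a chunk has reached the end of the text
    if (end === text.length) break;
  }

  return chunks.filter((c) => c.length > 0);
}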

// Utility to estimate tokens (rough approximation)
export function estimateTokens(text: string): number {
  // Rough estimate: 1 token ~= 4 characters for English
  return Math.ceil(text.length / 4);
}
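
To make the effect of `chunkSize` and `chunkOverlap` concrete, here is a small standalone sketch of the character-level fallback on a ten-character string. The function name `slidingWindows` is made up for this illustration; it is not part of the chunker above.

```typescript
// Fixed-size windows with overlap, as in the character-level fallback.
// Each window starts (chunkSize - overlap) characters after the previous one.
function slidingWindows(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return pieces;
}

const pieces = slidingWindows('abcdefghij', 4, 1);
console.log(pieces); // [ 'abcd', 'defg', 'ghij' ]

// Same 4-characters-per-token rule of thumb as estimateTokens.
const approxTokens = Math.ceil('abcdefghij'.length / 4);
console.log(approxTokens); // 3
```

Each chunk repeats the last character of the previous one (`d`, then `g`), which is the overlap of 1 at work. With real documents the same idea applies at a larger scale, for example a few hundred characters of overlap on chunks of a thousand or more.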

Embedding Generation

Generate vector embeddings using OpenAI's embedding API.

Create src/rag/embeddings.ts:

import OpenAI from 'openai';

import { config } from '../core/config.js';
import { RagError } from '../utils/errors.js';
import { createLogger } from '../utils/logger.js';

const logger = createLogger('Embeddings');

export class EmbeddingService {
  private client: OpenAI;
  private model: string;
  private batchSize: number;

  constructor() {
    this.client = new OpenAI({
      apiKey: config.openaiApiKey,
    });
    this.model = config.embeddingModel;
    this.batchSize = 100; // OpenAI allows up to 2048, but smaller batches are safer
  }

  async embed(text: string): Promise<number[]> {
    try {
      const response = await this.client.embeddings.create({
        model: this.model,
        input: text,
      });

      return response.data[0].embedding;
    } catch (error) {
      logger.error('Embedding failed', error);
      throw new RagError(
        `Failed to generate embedding: ${error instanceof Error ? error.message : 'Unknown error'}`,
        error instanceof Error ? error : undefined
      );
    }
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    logger.debug(`Embedding ${texts.length} texts`);

    const allEmbeddings: number[][] = [];

    // Process in batches
    for (let i = 0; i < texts.length; i += this.batchSize) {
      const batch = texts.slice(i, i + this.batchSize);

      try {
        const response = await this.client.embeddings.create({
          model: this.model,
          input: batch,
        });

        // Sort a copy by index to maintain input order without mutating the response
        const sorted = [...response.data].sort((a, b) => a.index - b.index);
        allEmbeddings.push(...sorted.map((d) => d.embedding));

        logger.debug(`Processed batch ${Math.floor(i / this.batchSize) + 1}`);
      } catch (error) {
        logger.error(`Batch embedding failed at index ${i}`, error);
        throw new RagError(
          `Failed to generate embeddings: ${error instanceof Error ? error.message : 'Unknown error'}`,
          error instanceof Error ? error : undefined
        );
      }
    }

    logger.info(`Generated ${allEmbeddings.length} embeddings`);
    return allEmbeddings;
  }

  // Get embedding dimension for the current model
  getDimension(): number {
    // OpenAI embedding dimensions
    const dimensions: Record<string, number> = {
      'text-embedding-3-small': 1536,
      'text-embedding-3-large': 3072,
      'text-embedding-ada-002': 1536,
    };

    return dimensions[this.model] ?? 1536;
  }
}
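
The batching and ordering logic in `embedBatch` can be tried out without calling the API. The sketch below is only an illustration: `toBatches` is a hypothetical helper, and the `shuffled` array is made-up data shaped like the `index` and `embedding` fields the code above reads from the response.

```typescript
// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 250 placeholder texts with a batch size of 100, as in EmbeddingService.
const texts = Array.from({ length: 250 }, (_, i) => `chunk ${i}`);
const batchSizes = toBatches(texts, 100).map((b) => b.length);
console.log(batchSizes); // [ 100, 100, 50 ]

// Made-up response items; sorting a copy by `index` restores the input order.
const shuffled = [
  { index: 2, embedding: [0.3] },
  { index: 0, embedding: [0.1] },
  { index: 1, embedding: [0.2] },
];
const ordered = [...shuffled].sort((a, b) => a.index - b.index).map((d) => d.embedding[0]);
console.log(ordered); // [ 0.1, 0.2, 0.3 ]
```

Keeping embeddings in the same order as the input texts matters because `addBatch` later pairs `chunks[i]` with `embeddings[i]` purely by position.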

Vector Store

Implement a simple in-memory vector store with cosine similarity search.
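
Before reading the store itself, it may help to see cosine similarity on vectors small enough to check by hand. This is a standalone sketch with invented two-dimensional "embeddings"; real embeddings have hundreds or thousands of dimensions, but the ranking logic is the same.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
  return magnitude === 0 ? 0 : dot / magnitude;
}

// Invented toy vectors, chosen so the scores are easy to verify.
const queryVector = [1, 0];
const candidates = [
  { id: 'same-direction', embedding: [2, 0] },
  { id: 'orthogonal', embedding: [0, 3] },
  { id: 'opposite', embedding: [-1, 0] },
  { id: 'diagonal', embedding: [1, 1] },
];

// Score every candidate, then sort highest first.
const ranked = candidates
  .map((c) => ({ id: c.id, score: cosine(queryVector, c.embedding) }))
  .sort((x, y) => y.score - x.score);

for (const r of ranked) {
  console.log(r.id, r.score.toFixed(3));
}
// same-direction 1.000
// diagonal 0.707
// orthogonal 0.000
// opposite -1.000
```

Note that `[2, 0]` scores exactly 1 against `[1, 0]`: cosine similarity depends only on direction, not on vector length.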

Create src/rag/vector-store.ts:

import { createLogger } from '../utils/logger.js';
import type { Chunk } from './chunker.js';

const logger = createLogger('VectorStore');

export interface VectorEntry {
  id: string;
  embedding: number[];
  content: string;
  metadata: Record<string, unknown>;
}

export interface SearchResult {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
  score: number;
}

export class VectorStore {
  private entries: VectorEntry[] = [];
  private dimension: number;

  constructor(dimension: number) {
    this.dimension = dimension;
  }

  add(
    id: string,
    embedding: number[],
    content: string,
    metadata: Record<string, unknown> = {}
  ): void {
    if (embedding.length !== this.dimension) {
      throw new Error(
        `Embedding dimension mismatch: expected ${this.dimension}, got ${embedding.length}`
      );
    }

    this.entries.push({ id, embedding, content, metadata });
  }

  addBatch(chunks: Chunk[], embeddings: number[][]): void {
    if (chunks.length !== embeddings.length) {
      throw new Error('Chunks and embeddings arrays must have same length');
    }

    for (let i = 0; i < chunks.length; i++) {
      this.add(chunks[i].id, embeddings[i], chunks[i].content, chunks[i].metadata);
    }

    logger.info(`Added ${chunks.length} entries to vector store`);
  }

  search(queryEmbedding: number[], topK: number = 3): SearchResult[] {
    if (queryEmbedding.length !== this.dimension) {
      throw new Error(
        `Query embedding dimension mismatch: expected ${this.dimension}, got ${queryEmbedding.length}`
      );
    }

    if (this.entries.length === 0) {
      logger.warn('Vector store is empty');
      return [];
    }

    // Calculate similarity scores
    const scored = this.entries.map((entry) => ({
      ...entry,
      score: this.cosineSimilarity(queryEmbedding, entry.embedding),
    }));

    // Sort by score (highest first) and take top K
    scored.sort((a, b) => b.score - a.score);

    const results = scored.slice(0, topK).map(({ id, content, metadata, score }) => ({
      id,
      content,
      metadata,
      score,
    }));

    logger.debug(`Search returned ${results.length} results`);
    return results;
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;

    for (let i = 0; i < a.length; i++) {
      dotProduct += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }

    const magnitude = Math.sqrt(normA) * Math.sqrt(normB);

    if (magnitude === 0) return 0;

    return dotProduct / magnitude;
  }

  size(): number {
    return this.entries.length;
  }

  clear(): void {
    this.entries = [];
    logger.info('Vector store cleared');
  }

  // Export for persistence
  export(): VectorEntry[] {
    return [...this.entries];
  }

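  // Import from persistence
  import(entries: VectorEntry[]): void {
    // Reject data produced with a different embedding dimension
    const mismatch = entries.find((e) => e.embedding.length !== this.dimension);
    if (mismatch) {
      throw new Error(`Embedding dimension mismatch in imported entry ${mismatch.id}`);
    }

    this.entries = [...entries];
    logger.info(`Imported ${entries.length} entries`);
  }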
}
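
The `export()` and `import()` methods are described as hooks for persistence, but the lesson does not show how entries would be saved. One possible approach is plain JSON. The sketch below does the round trip in memory only; `serializeEntries` and `deserializeEntries` are hypothetical helpers, and writing the string to disk is left out.

```typescript
interface VectorEntry {
  id: string;
  embedding: number[];
  content: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: turn exported entries into JSON text.
function serializeEntries(entries: VectorEntry[]): string {
  return JSON.stringify(entries);
}

// Hypothetical helper: parse JSON text back into entries, checking the dimension.
function deserializeEntries(json: string, dimension: number): VectorEntry[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error('Expected a JSON array of vector entries');
  }
  return parsed.map((raw) => {
    const entry = raw as VectorEntry;
    // Guard against data written with a different embedding model.
    if (!Array.isArray(entry.embedding) || entry.embedding.length !== dimension) {
      throw new Error(`Entry ${String(entry.id)} does not have dimension ${dimension}`);
    }
    return entry;
  });
}

const exported: VectorEntry[] = [
  { id: 'a.md#chunk0', embedding: [0.1, 0.2, 0.3], content: 'hello', metadata: { filename: 'a.md' } },
];

const restored = deserializeEntries(serializeEntries(exported), 3);
console.log(restored[0].id, restored[0].embedding.length); // a.md#chunk0 3
```

Checking the dimension on load is worthwhile because switching `embeddingModel` in the configuration can change the vector length, and vectors from different models are not comparable anyway.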

RAG Retriever

Combine all components into a retriever that can be used by the assistant.

Create src/rag/retriever.ts:

import { config } from '../core/config.js';
import { RagError } from '../utils/errors.js';
import { createLogger } from '../utils/logger.js';
import { type Chunk, chunkDocuments } from './chunker.js';
import { EmbeddingService } from './embeddings.js';
import { loadDocuments } from './loader.js';
import { type SearchResult, VectorStore } from './vector-store.js';

const logger = createLogger('Retriever');

export interface RetrieverOptions {
  documentsPath?: string;
  topK?: number;
  minScore?: number;
}

export class Retriever {
  private embeddingService: EmbeddingService;
  private vectorStore: VectorStore;
  private topK: number;
  private minScore: number;
  private initialized = false;

  constructor(options: RetrieverOptions = {}) {
    this.embeddingService = new EmbeddingService();
    this.vectorStore = new VectorStore(this.embeddingService.getDimension());
    this.topK = options.topK ?? config.retrievalTopK;
    this.minScore = options.minScore ?? 0.7;
  }

  async initialize(documentsPath?: string): Promise<void> {
    const path = documentsPath ?? config.documentsPath;

    logger.info(`Initializing retriever with documents from: ${path}`);

    try {
      // Load documents
      const documents = await loadDocuments(path);

      if (documents.length === 0) {
        logger.warn('No documents found to index');
        this.initialized = true;
        return;
      }

      // Chunk documents
      const chunks = chunkDocuments(documents);

      if (chunks.length === 0) {
        logger.warn('No chunks created from documents');
        this.initialized = true;
        return;
      }

      // Generate embeddings
      const contents = chunks.map((c) => c.content);
      const embeddings = await this.embeddingService.embedBatch(contents);

      // Store in vector store
      this.vectorStore.addBatch(chunks, embeddings);

      this.initialized = true;
      logger.info(`Retriever initialized with ${this.vectorStore.size()} chunks`);
    } catch (error) {
      logger.error('Failed to initialize retriever', error);
      throw new RagError(
        `Retriever initialization failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
        error instanceof Error ? error : undefined
      );
    }
  }

  async retrieve(query: string): Promise<string> {
    if (!this.initialized) {
      throw new RagError('Retriever not initialized. Call initialize() first.');
    }

    if (this.vectorStore.size() === 0) {
      logger.debug('No documents indexed, returning empty context');
      return '';
    }

    logger.debug(`Retrieving context for: ${query.substring(0, 50)}...`);

    try {
      // Embed the query
      const queryEmbedding = await this.embeddingService.embed(query);

      // Search for relevant chunks
      const results = this.vectorStore.search(queryEmbedding, this.topK);

      // Filter by minimum score
      const relevant = results.filter((r) => r.score >= this.minScore);

      if (relevant.length === 0) {
        logger.debug('No relevant documents found above threshold');
        return '';
      }

      // Format results as context
      const context = this.formatContext(relevant);

      logger.debug(`Retrieved ${relevant.length} relevant chunks`);
      return context;
    } catch (error) {
      logger.error('Retrieval failed', error);
      throw new RagError(
        `Retrieval failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
        error instanceof Error ? error : undefined
      );
    }
  }

  private formatContext(results: SearchResult[]): string {
    return results
      .map((result, index) => {
        const source = result.metadata.filename ?? result.metadata.source ?? 'Unknown';
        return `[Document ${index + 1}] (Source: ${source}, Relevance: ${(result.score * 100).toFixed(1)}%)
${result.content}`;
      })
      .join('\n\n---\n\n');
  }

  // Add documents at runtime
  async addDocument(content: string, metadata: Record<string, unknown> = {}): Promise<void> {
    const chunks = chunkDocuments([
      {
        id: `runtime-${Date.now()}`,
        content,
        metadata,
      },
    ]);

    const embeddings = await this.embeddingService.embedBatch(chunks.map((c) => c.content));

    this.vectorStore.addBatch(chunks, embeddings);
    logger.info(`Added document with ${chunks.length} chunks`);
  }

  // Get statistics (documentCount is the number of indexed chunks, not source files)
  getStats(): { documentCount: number; initialized: boolean } {
    return {
      documentCount: this.vectorStore.size(),
      initialized: this.initialized,
    };
  }

  // Clear all documents
  clear(): void {
    this.vectorStore.clear();
    logger.info('Retriever cleared');
  }
}
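
To see how `minScore` and `formatContext` shape what the model eventually receives, here is a standalone sketch of the filter-then-format step. The function `buildContext` and the scores are invented for illustration; the output format follows `formatContext` above.

```typescript
interface SearchResult {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
  score: number;
}

// Keep results at or above the threshold, then render them like formatContext does.
function buildContext(results: SearchResult[], minScore: number): string {
  return results
    .filter((r) => r.score >= minScore)
    .map((r, index) => {
      const source = r.metadata.filename ?? r.metadata.source ?? 'Unknown';
      const relevance = (r.score * 100).toFixed(1);
      return `[Document ${index + 1}] (Source: ${String(source)}, Relevance: ${relevance}%)\n${r.content}`;
    })
    .join('\n\n---\n\n');
}

// Invented scores for illustration only.
const hits: SearchResult[] = [
  { id: 'guide#chunk0', content: 'let is block-scoped.', metadata: { filename: 'typescript-guide.md' }, score: 0.82 },
  { id: 'guide#chunk3', content: 'Generics allow reuse.', metadata: { filename: 'typescript-guide.md' }, score: 0.41 },
];

console.log(buildContext(hits, 0.7));
// [Document 1] (Source: typescript-guide.md, Relevance: 82.0%)
// let is block-scoped.

console.log(buildContext(hits, 0.9) === ''); // true
```

An empty string is the retriever's way of saying "nothing relevant was found". If queries that should match keep returning nothing, the `minScore` threshold is the first setting worth experimenting with, since typical similarity scores vary between embedding models.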

// Create a retriever function for the assistant
export function createRetriever(options?: RetrieverOptions): (query: string) => Promise<string> {
  const retriever = new Retriever(options);
  let initPromise: Promise<void> | null = null;

  return async (query: string): Promise<string> => {
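    // Lazy initialization; a failed attempt is forgotten so the next query can retry
    if (!initPromise) {
      initPromise = retriever.initialize().catch((error) => {
        initPromise = null;
        throw error;
      });
    }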

    await initPromise;
    return retriever.retrieve(query);
  };
}

Integrating RAG with the Assistant

Update your assistant to use RAG. Modify src/index.ts:

import * as readline from 'readline';

import { Assistant } from './core/assistant.js';
import { createRetriever } from './rag/retriever.js';
import { createLogger } from './utils/logger.js';

const logger = createLogger('Main');

async function main() {
  console.log('AI Knowledge Assistant');
  console.log('======================');
  console.log('Type your message and press Enter.');
  console.log('Commands: /clear (clear history), /status (show stats), /exit (quit)');
  console.log('');

  // Create assistant with RAG
  const assistant = new Assistant({
    enableRag: true,
  });

  // Configure RAG retriever
  const retriever = createRetriever();
  assistant.setRagRetriever(retriever);

  console.log('Initializing knowledge base...');

  // Trigger lazy initialization with a throwaway query; the returned context is discarded
  try {
    await retriever('initialize');
    console.log('Knowledge base ready!\n');
  } catch (error) {
    console.log('Warning: Could not load documents. Continuing without RAG.\n');
  }

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  const prompt = () => {
    rl.question('You: ', async (input) => {
      const trimmed = input.trim();

      if (!trimmed) {
        prompt();
        return;
      }

      // Handle commands
      if (trimmed === '/exit') {
        console.log('Goodbye!');
        rl.close();
        process.exit(0);
      }

      if (trimmed === '/clear') {
        assistant.clearHistory();
        console.log('Conversation history cleared.\n');
        prompt();
        return;
      }

      if (trimmed === '/status') {
        const history = assistant.getHistory();
        console.log(`Messages in history: ${history.length}`);
        console.log('');
        prompt();
        return;
      }

      // Process message with streaming
      process.stdout.write('Assistant: ');

      try {
        for await (const chunk of assistant.chatStream(trimmed)) {
          if (chunk.type === 'text' && chunk.content) {
            process.stdout.write(chunk.content);
          } else if (chunk.type === 'tool_call' && chunk.toolCall) {
            process.stdout.write(`\n[Using tool: ${chunk.toolCall.name}]\n`);
          }
        }
        console.log('\n');
      } catch (error) {
        console.error(`\nError: ${error instanceof Error ? error.message : error}\n`);
      }

      prompt();
    });
  };

  prompt();
}

main().catch((error) => {
  logger.error('Fatal error', error);
  process.exit(1);
});
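
The `Assistant` class comes from an earlier lesson, so how it uses the retriever is not visible here. One plausible approach, shown purely as an assumption, is to append the retrieved context to the system prompt and to handle the empty-context case explicitly. The helper `buildSystemPrompt`, the wording of the instructions, and the `<documents>` delimiters are all hypothetical.

```typescript
// Hypothetical helper: merge retrieved context into a system prompt.
// An empty context string means the retriever found nothing relevant.
function buildSystemPrompt(basePrompt: string, context: string): string {
  if (context.trim() === '') {
    return `${basePrompt}\n\nNo relevant documents were found for this question. Say so if the answer depends on them.`;
  }
  return [
    basePrompt,
    'Use the following documents to answer. If they do not contain the answer, say so.',
    '<documents>',
    context,
    '</documents>',
  ].join('\n\n');
}

const basePrompt = 'You are a helpful assistant.';
const withContext = buildSystemPrompt(
  basePrompt,
  '[Document 1] (Source: notes.md, Relevance: 81.0%)\nMeeting at 3pm.'
);
const withoutContext = buildSystemPrompt(basePrompt, '');

console.log(withContext.includes('Meeting at 3pm.')); // true
console.log(withoutContext.includes('<documents>')); // false
```

Treating "no relevant documents" as its own branch keeps the model from being handed an empty documents section, and gives it an explicit instruction for that case instead.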

Creating Sample Documents

Add some documents to test RAG. Create documents/typescript-guide.md:

# TypeScript Quick Reference

## Variables

TypeScript supports three ways to declare variables:

- `let` - Block-scoped, can be reassigned
- `const` - Block-scoped, cannot be reassigned
- `var` - Function-scoped (avoid using)

Example:

```typescript
let name = 'Alice';
const age = 30;
name = 'Bob'; // OK
// age = 31; // Error!
```

## Types

### Basic Types

- `string` - Text values
- `number` - All numbers (integer and floating point)
- `boolean` - true or false
- `null` and `undefined` - Absence of value
- `any` - Opt out of type checking (avoid when possible)

### Arrays

```typescript
const numbers: number[] = [1, 2, 3];
const names: Array<string> = ['Alice', 'Bob'];
```

### Objects and Interfaces

```typescript
interface User {
  name: string;
  age: number;
  email?: string; // Optional
}

const user: User = {
  name: 'Alice',
  age: 30,
};
```

## Functions

### Function Types

```typescript
function add(a: number, b: number): number {
  return a + b;
}

const multiply = (a: number, b: number): number => a * b;
```

### Optional Parameters

```typescript
function greet(name: string, greeting?: string): string {
  return `${greeting ?? 'Hello'}, ${name}!`;
}
```

## Generics

Generics allow creating reusable components:

```typescript
function identity<T>(value: T): T {
  return value;
}

const num = identity(42); // number
const str = identity('hello'); // string
```

## Common Patterns

### Type Guards

```typescript
function isString(value: unknown): value is string {
  return typeof value === 'string';
}
```

### Utility Types

- `Partial<T>` - Makes all properties optional
- `Required<T>` - Makes all properties required
- `Pick<T, K>` - Select specific properties
- `Omit<T, K>` - Remove specific properties


Create `documents/ai-assistant-features.md`:

```markdown
# AI Assistant Features

## Overview

This AI assistant provides intelligent help with various tasks. It combines the power of large language models with custom tools and knowledge retrieval.

## Core Capabilities

### 1. Conversational Memory

The assistant remembers your conversation history within a session. You can:
- Ask follow-up questions without repeating context
- Reference previous topics
- Build on earlier responses

### 2. Knowledge Base

The assistant can search through loaded documents to find relevant information. This is useful for:
- Company documentation
- Technical references
- Personal notes
- Any text-based knowledge

### 3. Tool Usage

The assistant can use various tools:

**Calculator**: Perform mathematical operations
- Basic arithmetic: addition, subtraction, multiplication, division
- Complex expressions: powers, roots, percentages

**Weather**: Get current weather information
- Requires location (city name)
- Returns temperature and conditions

**Notes**: Save and retrieve information
- Persistent storage across conversations
- Useful for remembering important details

## Best Practices

### Asking Questions

1. **Be specific**: "What is the syntax for TypeScript generics?" is better than "Tell me about TypeScript"

2. **Provide context**: If your question relates to something specific, mention it

3. **One topic at a time**: For complex questions, break them into parts

### Using Tools

The assistant automatically decides when to use tools. You can also explicitly request them:
- "Calculate 15% of 250"
- "What's the weather in Tokyo?"
- "Save this note: Meeting at 3pm tomorrow"

## Limitations

- Knowledge is limited to loaded documents
- Cannot browse the internet in real-time
- Weather requires API key configuration
- Conversation history is not persisted between sessions
```

Testing RAG

Run your assistant and test RAG:

npm start

Example interaction:

AI Knowledge Assistant
======================
Type your message and press Enter.
Commands: /clear (clear history), /status (show stats), /exit (quit)

Initializing knowledge base...
Knowledge base ready!

You: What types of variables does TypeScript support?
Assistant: Based on the documentation, TypeScript supports three ways to declare variables:

1. **`let`** - Block-scoped and can be reassigned
2. **`const`** - Block-scoped and cannot be reassigned
3. **`var`** - Function-scoped (though it's recommended to avoid using this)

Here's an example:
```typescript
let name = "Alice";
const age = 30;
name = "Bob"; // This works
// age = 31; // This would cause an error!
```

The key difference is that `let` allows reassignment while `const` creates an immutable binding.

You: What tools does the assistant have?
Assistant: According to the knowledge base, the assistant has three main tools:

1. **Calculator** - For mathematical operations including:
   - Basic arithmetic (addition, subtraction, multiplication, division)
   - Complex expressions (powers, roots, percentages)
2. **Weather** - Gets current weather information:
   - Requires a location (city name)
   - Returns temperature and conditions
3. **Notes** - For saving and retrieving information:
   - Provides persistent storage across conversations
   - Useful for remembering important details

The assistant automatically decides when to use these tools based on your questions, but you can also explicitly request them.


---

## Key Takeaways

1. **Document loading** reads files from a directory into a standard format
2. **Chunking** splits large documents into retrievable pieces
3. **Embeddings** convert text into vectors for semantic search
4. **Vector stores** enable fast similarity search
5. **The retriever** combines all components for end-to-end retrieval
6. **Integration** with the assistant augments the system prompt with context

---

## Practice Exercise

1. Add support for PDF documents (using a PDF parsing library)
2. Implement document metadata filtering (e.g., search only recent documents)
3. Add a `/docs` command to list indexed documents
4. Implement hybrid search combining keyword and semantic matching
5. Add relevance feedback to improve retrieval over time
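
As a possible starting point for exercises 2 and 3, the sketch below works only on the metadata the loader already records (`filename`, `source`, `modified`). The helpers `summarizeSources` and `modifiedSince` and the `IndexedChunk` type are hypothetical; wiring them into `VectorStore`, `Retriever`, and the command loop is left as part of the exercise.

```typescript
interface IndexedChunk {
  id: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper for a /docs command: count chunks per source file.
function summarizeSources(chunks: IndexedChunk[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const chunk of chunks) {
    const source = String(chunk.metadata.filename ?? chunk.metadata.source ?? 'Unknown');
    counts.set(source, (counts.get(source) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical metadata filter: keep chunks whose file was modified on or after the cutoff.
function modifiedSince(chunks: IndexedChunk[], cutoff: Date): IndexedChunk[] {
  return chunks.filter((chunk) => {
    const modified = chunk.metadata.modified;
    return typeof modified === 'string' && Date.parse(modified) >= cutoff.getTime();
  });
}

// Made-up index contents for illustration.
const indexed: IndexedChunk[] = [
  { id: 'a.md#chunk0', metadata: { filename: 'a.md', modified: '2024-01-10T00:00:00.000Z' } },
  { id: 'a.md#chunk1', metadata: { filename: 'a.md', modified: '2024-01-10T00:00:00.000Z' } },
  { id: 'b.md#chunk0', metadata: { filename: 'b.md', modified: '2024-06-01T00:00:00.000Z' } },
];

const sourceCounts = Array.from(summarizeSources(indexed));
console.log(sourceCounts); // [ [ 'a.md', 2 ], [ 'b.md', 1 ] ]

const recentIds = modifiedSince(indexed, new Date('2024-03-01T00:00:00.000Z')).map((c) => c.id);
console.log(recentIds); // [ 'b.md#chunk0' ]
```

Filtering on metadata before scoring keeps the similarity search itself unchanged, which makes it a low-risk first extension.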

---

## Next Steps

Your assistant now has knowledge! In the next lesson, you will add tools to give it the ability to take actions.

[Continue to Lesson 6.4: Integrating Tools](./04-integrating-tools.md)