# Lesson 6.3: Adding RAG
Duration: 90 minutes
## Learning Objectives
By the end of this lesson, you will be able to:
- Load and chunk documents for embedding
- Generate and store vector embeddings
- Implement semantic search for relevant context
- Integrate RAG with the assistant for knowledge-grounded responses
- Handle edge cases like no relevant documents found
## Introduction
RAG (Retrieval-Augmented Generation) turns your assistant from a general-purpose AI into one that can answer from your own documents. In Module 4, you learned the concepts. Now you will implement a working RAG pipeline backed by an in-memory vector store.
The RAG pipeline has two phases, indexing and querying:
┌─────────────────────────────────────────────────────────────────────┐
│                            RAG PIPELINE                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  INDEXING PHASE (Offline)                                           │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │   Load   │───▶│  Chunk   │───▶│  Embed   │───▶│  Store   │       │
│  │Documents │    │   Text   │    │  Chunks  │    │ Vectors  │       │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘       │
│                                                                     │
│  QUERY PHASE (Online)                                               │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │   User   │───▶│  Embed   │───▶│  Search  │───▶│ Augment  │       │
│  │  Query   │    │  Query   │    │ Vectors  │    │ Context  │       │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
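The two phases can be sketched end to end with a toy embedder. This is only an illustration of the pipeline's shape, not the implementation built in this lesson: `VOCAB`, `toyEmbed`, `indexDocs`, and `query` are hypothetical names, and the word-count vector stands in for a real embedding model.

```typescript
// Toy stand-in for an embedding model: counts occurrences of a fixed vocabulary.
// (Hypothetical; a real system calls an embedding API instead.)
const VOCAB = ['cat', 'dog', 'fish', 'code', 'type'];

function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

interface IndexedChunk {
  content: string;
  vector: number[];
}

// INDEXING PHASE: load -> chunk -> embed -> store
function indexDocs(docs: string[]): IndexedChunk[] {
  return docs
    .flatMap((doc) => doc.split('\n\n')) // naive chunking on blank lines
    .filter((chunk) => chunk.trim().length > 0)
    .map((content) => ({ content, vector: toyEmbed(content) }));
}

// QUERY PHASE: embed query -> search vectors -> build context
function query(index: IndexedChunk[], question: string, topK = 1): string {
  const q = toyEmbed(question);
  const dot = (a: number[], b: number[]): number =>
    a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  return [...index]
    .sort((a, b) => dot(q, b.vector) - dot(q, a.vector))
    .slice(0, topK)
    .map((chunk) => chunk.content)
    .join('\n---\n');
}

const toyIndex = indexDocs(['The cat chased the dog.\n\nTypeScript adds a type system to code.']);
console.log(query(toyIndex, 'How does the type checker treat my code?'));
// → TypeScript adds a type system to code.
```

The rest of the lesson replaces each toy piece with a real one: a file loader, a separator-aware chunker, an embedding API call, and a cosine-similarity vector store.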
## Document Loading
First, create a loader that reads documents from a directory.
Create src/rag/loader.ts:
import { readFile, readdir, stat } from 'fs/promises';
import { extname, join } from 'path';
import type { Document } from '../core/types.js';
import { createLogger } from '../utils/logger.js';
const logger = createLogger('DocumentLoader');
export interface LoaderOptions {
extensions?: string[];
recursive?: boolean;
}
const DEFAULT_EXTENSIONS = ['.txt', '.md', '.markdown'];
export async function loadDocuments(
directory: string,
options: LoaderOptions = {}
): Promise<Document[]> {
const extensions = options.extensions ?? DEFAULT_EXTENSIONS;
const recursive = options.recursive ?? true;
logger.info(`Loading documents from ${directory}`);
const documents: Document[] = [];
await loadFromDirectory(directory, documents, extensions, recursive);
logger.info(`Loaded ${documents.length} documents`);
return documents;
}
async function loadFromDirectory(
directory: string,
documents: Document[],
extensions: string[],
recursive: boolean
): Promise<void> {
let entries: string[];
try {
entries = await readdir(directory);
} catch (error) {
logger.warn(`Could not read directory ${directory}: ${error}`);
return;
}
for (const entry of entries) {
const fullPath = join(directory, entry);
try {
const stats = await stat(fullPath);
if (stats.isDirectory() && recursive) {
await loadFromDirectory(fullPath, documents, extensions, recursive);
} else if (stats.isFile()) {
const ext = extname(entry).toLowerCase();
if (extensions.includes(ext)) {
const content = await readFile(fullPath, 'utf-8');
documents.push({
id: fullPath,
content,
metadata: {
source: fullPath,
filename: entry,
extension: ext,
size: stats.size,
modified: stats.mtime.toISOString(),
},
});
logger.debug(`Loaded: ${entry}`);
}
}
} catch (error) {
logger.warn(`Error processing ${fullPath}: ${error}`);
}
}
}
// Load a single document
export async function loadDocument(filePath: string): Promise<Document> {
const content = await readFile(filePath, 'utf-8');
const stats = await stat(filePath);
// Split on both separators so Windows-style paths also yield the bare filename
const filename = filePath.split(/[\\/]/).pop() ?? filePath;
return {
id: filePath,
content,
metadata: {
source: filePath,
filename,
extension: extname(filename).toLowerCase(),
size: stats.size,
modified: stats.mtime.toISOString(),
},
};
}
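## Text Chunking
Large documents must be split into smaller chunks for effective retrieval: embedding models accept a limited input length, and small, focused chunks match a query more precisely than a whole document does.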
Create src/rag/chunker.ts:
import { config } from '../core/config.js';
import type { Document } from '../core/types.js';
import { createLogger } from '../utils/logger.js';
const logger = createLogger('Chunker');
export interface ChunkOptions {
chunkSize?: number;
chunkOverlap?: number;
separators?: string[];
}
export interface Chunk {
id: string;
content: string;
metadata: Record<string, unknown>;
}
const DEFAULT_SEPARATORS = ['\n\n', '\n', '. ', ' ', ''];
export function chunkDocuments(documents: Document[], options: ChunkOptions = {}): Chunk[] {
const chunkSize = options.chunkSize ?? config.chunkSize;
const chunkOverlap = options.chunkOverlap ?? config.chunkOverlap;
const separators = options.separators ?? DEFAULT_SEPARATORS;
logger.debug(`Chunking ${documents.length} documents`);
logger.debug(`Chunk size: ${chunkSize}, overlap: ${chunkOverlap}`);
const allChunks: Chunk[] = [];
for (const doc of documents) {
const chunks = splitText(doc.content, chunkSize, chunkOverlap, separators);
for (let i = 0; i < chunks.length; i++) {
allChunks.push({
id: `${doc.id}#chunk${i}`,
content: chunks[i],
metadata: {
...doc.metadata,
chunkIndex: i,
totalChunks: chunks.length,
},
});
}
}
logger.info(`Created ${allChunks.length} chunks from ${documents.length} documents`);
return allChunks;
}
function splitText(
text: string,
chunkSize: number,
overlap: number,
separators: string[]
): string[] {
// Try each separator in order, from coarsest (paragraphs) to finest (characters)
for (const separator of separators) {
if (separator === '') {
// Character-level splitting as last resort
return splitByCharacter(text, chunkSize, overlap);
}
const splits = text.split(separator);
// If this separator creates reasonable chunks, use it
if (splits.length > 1 && splits.some((s) => s.length > chunkSize / 4)) {
return mergeSplits(splits, separator, chunkSize, overlap);
}
}
// Fallback to character splitting
return splitByCharacter(text, chunkSize, overlap);
}
function mergeSplits(
splits: string[],
separator: string,
chunkSize: number,
overlap: number
): string[] {
const chunks: string[] = [];
let currentChunk: string[] = [];
let currentLength = 0;
for (const split of splits) {
const splitLength = split.length + separator.length;
if (currentLength + splitLength > chunkSize && currentChunk.length > 0) {
// Save current chunk
chunks.push(currentChunk.join(separator).trim());
// Start new chunk with overlap
const overlapTarget = overlap;
let overlapLength = 0;
const overlapChunks: string[] = [];
for (let i = currentChunk.length - 1; i >= 0 && overlapLength < overlapTarget; i--) {
overlapChunks.unshift(currentChunk[i]);
overlapLength += currentChunk[i].length + separator.length;
}
currentChunk = overlapChunks;
currentLength = overlapLength;
}
currentChunk.push(split);
currentLength += splitLength;
}
// Don't forget the last chunk
if (currentChunk.length > 0) {
chunks.push(currentChunk.join(separator).trim());
}
return chunks.filter((c) => c.length > 0);
}
function splitByCharacter(text: string, chunkSize: number, overlap: number): string[] {
const chunks: string[] = [];
// Advance by at least one character so an overlap >= chunkSize cannot loop forever
const step = Math.max(1, chunkSize - overlap);
for (let start = 0; start < text.length; start += step) {
const end = Math.min(start + chunkSize, text.length);
chunks.push(text.slice(start, end).trim());
// Stop once a chunk reaches the end of the text
if (end === text.length) break;
}
return chunks.filter((c) => c.length > 0);
}
// Utility to estimate tokens (rough approximation)
export function estimateTokens(text: string): number {
// Rough estimate: 1 token ~= 4 characters for English
return Math.ceil(text.length / 4);
}
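To see how `chunkSize` and `chunkOverlap` interact, the sketch below applies a plain sliding window to a short string. `slidingWindow` is a hypothetical helper written for this illustration; it mirrors the character-level fallback in spirit but skips the trimming and separator logic of the chunker.

```typescript
// Hypothetical helper: fixed-size windows that each repeat the last `overlap`
// characters of the previous window.
function slidingWindow(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  // Each window starts `chunkSize - overlap` characters after the previous one.
  // Clamp to 1 so an overlap >= chunkSize cannot stall the loop.
  const step = Math.max(1, chunkSize - overlap);
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return pieces;
}

console.log(JSON.stringify(slidingWindow('abcdefghij', 4, 1)));
// → ["abcd","defg","ghij"]
```

With a chunk size of 4 and an overlap of 1, each window starts 3 characters after the previous one, so `d` and `g` appear in two chunks each. The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.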
## Embedding Generation
Generate vector embeddings using OpenAI's embedding API.
Create src/rag/embeddings.ts:
import OpenAI from 'openai';
import { config } from '../core/config.js';
import { RagError } from '../utils/errors.js';
import { createLogger } from '../utils/logger.js';
const logger = createLogger('Embeddings');
export class EmbeddingService {
private client: OpenAI;
private model: string;
private batchSize: number;
constructor() {
this.client = new OpenAI({
apiKey: config.openaiApiKey,
});
this.model = config.embeddingModel;
this.batchSize = 100; // OpenAI allows up to 2048, but smaller batches are safer
}
async embed(text: string): Promise<number[]> {
try {
const response = await this.client.embeddings.create({
model: this.model,
input: text,
});
return response.data[0].embedding;
} catch (error) {
logger.error('Embedding failed', error);
throw new RagError(
`Failed to generate embedding: ${error instanceof Error ? error.message : 'Unknown error'}`,
error instanceof Error ? error : undefined
);
}
}
async embedBatch(texts: string[]): Promise<number[][]> {
logger.debug(`Embedding ${texts.length} texts`);
const allEmbeddings: number[][] = [];
// Process in batches
for (let i = 0; i < texts.length; i += this.batchSize) {
const batch = texts.slice(i, i + this.batchSize);
try {
const response = await this.client.embeddings.create({
model: this.model,
input: batch,
});
// Sort by index to maintain order
const sorted = response.data.sort((a, b) => a.index - b.index);
allEmbeddings.push(...sorted.map((d) => d.embedding));
logger.debug(`Processed batch ${Math.floor(i / this.batchSize) + 1}`);
} catch (error) {
logger.error(`Batch embedding failed at index ${i}`, error);
throw new RagError(
`Failed to generate embeddings: ${error instanceof Error ? error.message : 'Unknown error'}`,
error instanceof Error ? error : undefined
);
}
}
logger.info(`Generated ${allEmbeddings.length} embeddings`);
return allEmbeddings;
}
// Get embedding dimension for the current model
getDimension(): number {
// OpenAI embedding dimensions
const dimensions: Record<string, number> = {
'text-embedding-3-small': 1536,
'text-embedding-3-large': 3072,
'text-embedding-ada-002': 1536,
};
return dimensions[this.model] ?? 1536;
}
}
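The batching loop in `embedBatch` can be looked at in isolation. The sketch below uses a hypothetical `toBatches` helper and a fake `embedMany` function (no API call is made) to show that processing batches in sequence and concatenating the results preserves the input order.

```typescript
// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Fake embedder standing in for the API call: a one-dimensional "embedding"
// that is just the text length, so the order is easy to verify by eye.
function embedMany(batch: string[]): number[][] {
  return batch.map((t) => [t.length]);
}

const sampleTexts = ['a', 'bb', 'ccc', 'dddd', 'eeeee'];
const sampleEmbeddings = toBatches(sampleTexts, 2).flatMap((batch) => embedMany(batch));
console.log(JSON.stringify(sampleEmbeddings));
// → [[1],[2],[3],[4],[5]]
```

Five texts with a batch size of 2 produce three requests (2, 2, and 1 items). Keeping the order stable matters because the vector store later pairs `embeddings[i]` with `chunks[i]` by position.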
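## Vector Store
Implement a simple in-memory vector store with cosine similarity search. It compares the query against every stored entry on each search, which is fine for small document sets.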
Create src/rag/vector-store.ts:
import { createLogger } from '../utils/logger.js';
import type { Chunk } from './chunker.js';
const logger = createLogger('VectorStore');
export interface VectorEntry {
id: string;
embedding: number[];
content: string;
metadata: Record<string, unknown>;
}
export interface SearchResult {
id: string;
content: string;
metadata: Record<string, unknown>;
score: number;
}
export class VectorStore {
private entries: VectorEntry[] = [];
private dimension: number;
constructor(dimension: number) {
this.dimension = dimension;
}
add(
id: string,
embedding: number[],
content: string,
metadata: Record<string, unknown> = {}
): void {
if (embedding.length !== this.dimension) {
throw new Error(
`Embedding dimension mismatch: expected ${this.dimension}, got ${embedding.length}`
);
}
this.entries.push({ id, embedding, content, metadata });
}
addBatch(chunks: Chunk[], embeddings: number[][]): void {
if (chunks.length !== embeddings.length) {
throw new Error('Chunks and embeddings arrays must have same length');
}
for (let i = 0; i < chunks.length; i++) {
this.add(chunks[i].id, embeddings[i], chunks[i].content, chunks[i].metadata);
}
logger.info(`Added ${chunks.length} entries to vector store`);
}
search(queryEmbedding: number[], topK: number = 3): SearchResult[] {
if (queryEmbedding.length !== this.dimension) {
throw new Error(
`Query embedding dimension mismatch: expected ${this.dimension}, got ${queryEmbedding.length}`
);
}
if (this.entries.length === 0) {
logger.warn('Vector store is empty');
return [];
}
// Calculate similarity scores
const scored = this.entries.map((entry) => ({
...entry,
score: this.cosineSimilarity(queryEmbedding, entry.embedding),
}));
// Sort by score (highest first) and take top K
scored.sort((a, b) => b.score - a.score);
const results = scored.slice(0, topK).map(({ id, content, metadata, score }) => ({
id,
content,
metadata,
score,
}));
logger.debug(`Search returned ${results.length} results`);
return results;
}
private cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
if (magnitude === 0) return 0;
return dotProduct / magnitude;
}
size(): number {
return this.entries.length;
}
clear(): void {
this.entries = [];
logger.info('Vector store cleared');
}
// Export for persistence
export(): VectorEntry[] {
return [...this.entries];
}
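// Import from persistence (entries must match this store's dimension)
import(entries: VectorEntry[]): void {
const mismatched = entries.find((e) => e.embedding.length !== this.dimension);
if (mismatched) {
throw new Error(
`Imported entry ${mismatched.id} has dimension ${mismatched.embedding.length}, expected ${this.dimension}`
);
}
this.entries = [...entries];
logger.info(`Imported ${entries.length} entries`);
}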
}
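A few hand-checkable vectors make the behavior of cosine similarity concrete. The sketch below re-implements the same formula as a free function named `cosine` (a name chosen for this illustration) so that it can run outside the class.

```typescript
// Standalone cosine similarity: dot(a, b) / (|a| * |b|), with a zero-vector guard.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
  return magnitude === 0 ? 0 : dot / magnitude;
}

console.log(cosine([1, 0], [2, 0])); // → 1 (same direction, length ignored)
console.log(cosine([1, 0], [0, 3])); // → 0 (orthogonal)
console.log(cosine([1, 0], [-1, 0])); // → -1 (opposite direction)
console.log(cosine([3, 4], [4, 3]).toFixed(2)); // → 0.96
console.log(cosine([0, 0], [1, 1])); // → 0 (zero vector guarded)
```

The score depends only on the angle between the vectors, not on their lengths, and ranges from -1 to 1. That is why `search` can sort entries by score directly and why a minimum-score cutoff is a meaningful filter.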
## RAG Retriever
Combine all components into a retriever that can be used by the assistant.
Create src/rag/retriever.ts:
import { config } from '../core/config.js';
import { RagError } from '../utils/errors.js';
import { createLogger } from '../utils/logger.js';
import { type Chunk, chunkDocuments } from './chunker.js';
import { EmbeddingService } from './embeddings.js';
import { loadDocuments } from './loader.js';
import { type SearchResult, VectorStore } from './vector-store.js';
const logger = createLogger('Retriever');
export interface RetrieverOptions {
documentsPath?: string;
topK?: number;
minScore?: number;
}
export class Retriever {
private embeddingService: EmbeddingService;
private vectorStore: VectorStore;
private topK: number;
private minScore: number;
private initialized = false;
constructor(options: RetrieverOptions = {}) {
this.embeddingService = new EmbeddingService();
this.vectorStore = new VectorStore(this.embeddingService.getDimension());
this.topK = options.topK ?? config.retrievalTopK;
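// Similarity scores vary by embedding model, so treat 0.7 as a starting point and tune it on your own queries
this.minScore = options.minScore ?? 0.7;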
}
async initialize(documentsPath?: string): Promise<void> {
const path = documentsPath ?? config.documentsPath;
logger.info(`Initializing retriever with documents from: ${path}`);
try {
// Load documents
const documents = await loadDocuments(path);
if (documents.length === 0) {
logger.warn('No documents found to index');
this.initialized = true;
return;
}
// Chunk documents
const chunks = chunkDocuments(documents);
if (chunks.length === 0) {
logger.warn('No chunks created from documents');
this.initialized = true;
return;
}
// Generate embeddings
const contents = chunks.map((c) => c.content);
const embeddings = await this.embeddingService.embedBatch(contents);
// Store in vector store
this.vectorStore.addBatch(chunks, embeddings);
this.initialized = true;
logger.info(`Retriever initialized with ${this.vectorStore.size()} chunks`);
} catch (error) {
logger.error('Failed to initialize retriever', error);
throw new RagError(
`Retriever initialization failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
error instanceof Error ? error : undefined
);
}
}
async retrieve(query: string): Promise<string> {
if (!this.initialized) {
throw new RagError('Retriever not initialized. Call initialize() first.');
}
if (this.vectorStore.size() === 0) {
logger.debug('No documents indexed, returning empty context');
return '';
}
logger.debug(`Retrieving context for: ${query.substring(0, 50)}...`);
try {
// Embed the query
const queryEmbedding = await this.embeddingService.embed(query);
// Search for relevant chunks
const results = this.vectorStore.search(queryEmbedding, this.topK);
// Filter by minimum score
const relevant = results.filter((r) => r.score >= this.minScore);
if (relevant.length === 0) {
logger.debug('No relevant documents found above threshold');
return '';
}
// Format results as context
const context = this.formatContext(relevant);
logger.debug(`Retrieved ${relevant.length} relevant chunks`);
return context;
} catch (error) {
logger.error('Retrieval failed', error);
throw new RagError(
`Retrieval failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
error instanceof Error ? error : undefined
);
}
}
private formatContext(results: SearchResult[]): string {
return results
.map((result, index) => {
const source = result.metadata.filename ?? result.metadata.source ?? 'Unknown';
return `[Document ${index + 1}] (Source: ${source}, Relevance: ${(result.score * 100).toFixed(1)}%)
${result.content}`;
})
.join('\n\n---\n\n');
}
// Add documents at runtime
async addDocument(content: string, metadata: Record<string, unknown> = {}): Promise<void> {
const chunks = chunkDocuments([
{
id: `runtime-${Date.now()}`,
content,
metadata,
},
]);
const embeddings = await this.embeddingService.embedBatch(chunks.map((c) => c.content));
this.vectorStore.addBatch(chunks, embeddings);
logger.info(`Added document with ${chunks.length} chunks`);
}
// Get statistics (note: documentCount is the number of indexed chunks, not source files)
getStats(): { documentCount: number; initialized: boolean } {
return {
documentCount: this.vectorStore.size(),
initialized: this.initialized,
};
}
// Clear all documents
clear(): void {
this.vectorStore.clear();
logger.info('Retriever cleared');
}
}
// Create a retriever function for the assistant
export function createRetriever(options?: RetrieverOptions): (query: string) => Promise<string> {
const retriever = new Retriever(options);
let initPromise: Promise<void> | null = null;
return async (query: string): Promise<string> => {
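// Lazy initialization; reset on failure so a later query can retry
if (!initPromise) {
initPromise = retriever.initialize().catch((error) => {
initPromise = null;
throw error;
});
}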
await initPromise;
return retriever.retrieve(query);
};
}
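The lazy-initialization pattern in `createRetriever` is worth seeing on its own, because it also protects against overlapping calls. The sketch below is a generic version with hypothetical names (`lazily`, `initCalls`, `lookup`) and fake async functions in place of the real retriever.

```typescript
// Hypothetical generic form of the pattern: run `init` once, then `run` per call.
function lazily<T>(
  init: () => Promise<void>,
  run: (input: string) => Promise<T>
): (input: string) => Promise<T> {
  let initPromise: Promise<void> | null = null;
  return async (input: string): Promise<T> => {
    if (!initPromise) {
      // The first caller starts initialization; later callers reuse the same promise.
      initPromise = init();
    }
    await initPromise;
    return run(input);
  };
}

let initCalls = 0;
const lookup = lazily(
  async () => {
    initCalls += 1; // stands in for loading, chunking, and embedding documents
  },
  async (q) => `context for: ${q}`
);

// Two overlapping calls still trigger only one initialization.
void Promise.all([lookup('first'), lookup('second')]).then((results) => {
  console.log(initCalls, results);
});
```

Storing the promise rather than a boolean flag is the key design choice: a second query that arrives while indexing is still in progress waits on the same promise instead of starting a second, duplicate indexing run.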
## Integrating RAG with the Assistant
Update your assistant to use RAG. Modify src/index.ts:
import * as readline from 'readline';
import { Assistant } from './core/assistant.js';
import { createRetriever } from './rag/retriever.js';
import { createLogger } from './utils/logger.js';
const logger = createLogger('Main');
async function main() {
console.log('AI Knowledge Assistant');
console.log('======================');
console.log('Type your message and press Enter.');
console.log('Commands: /clear (clear history), /status (show stats), /exit (quit)');
console.log('');
// Create assistant with RAG
const assistant = new Assistant({
enableRag: true,
});
// Configure RAG retriever
const retriever = createRetriever();
assistant.setRagRetriever(retriever);
console.log('Initializing knowledge base...');
// Trigger initialization up front with a placeholder query so the first real question is not delayed
try {
await retriever('initialize');
console.log('Knowledge base ready!\n');
} catch (error) {
console.log('Warning: Could not load documents. Continuing without RAG.\n');
}
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const prompt = () => {
rl.question('You: ', async (input) => {
const trimmed = input.trim();
if (!trimmed) {
prompt();
return;
}
// Handle commands
if (trimmed === '/exit') {
console.log('Goodbye!');
rl.close();
process.exit(0);
}
if (trimmed === '/clear') {
assistant.clearHistory();
console.log('Conversation history cleared.\n');
prompt();
return;
}
if (trimmed === '/status') {
const history = assistant.getHistory();
console.log(`Messages in history: ${history.length}`);
console.log('');
prompt();
return;
}
// Process message with streaming
process.stdout.write('Assistant: ');
try {
for await (const chunk of assistant.chatStream(trimmed)) {
if (chunk.type === 'text' && chunk.content) {
process.stdout.write(chunk.content);
} else if (chunk.type === 'tool_call' && chunk.toolCall) {
process.stdout.write(`\n[Using tool: ${chunk.toolCall.name}]\n`);
}
}
console.log('\n');
} catch (error) {
console.error(`\nError: ${error instanceof Error ? error.message : error}\n`);
}
prompt();
});
};
prompt();
}
main().catch((error) => {
logger.error('Fatal error', error);
process.exit(1);
});
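How the retrieved context reaches the model depends on the `Assistant` class from the earlier lessons, which is not shown here. As a rough sketch of one common approach, the hypothetical `buildSystemPrompt` helper below appends the context to a base prompt and falls back to the base prompt alone when the retriever returns an empty string. The prompt wording and the `<context>` delimiters are assumptions made for this example.

```typescript
// Hypothetical helper; the real Assistant may assemble its prompt differently.
function buildSystemPrompt(basePrompt: string, context: string): string {
  // The retriever returns '' when nothing scores above the threshold,
  // so skip the context section entirely in that case.
  if (context.trim() === '') {
    return basePrompt;
  }
  return [
    basePrompt,
    '',
    'Use the following context to answer. If it does not contain the answer, say so.',
    '',
    '<context>',
    context,
    '</context>',
  ].join('\n');
}

const basePrompt = 'You are a helpful assistant.';
console.log(buildSystemPrompt(basePrompt, ''));
// → You are a helpful assistant.
console.log(buildSystemPrompt(basePrompt, '[Document 1] (Source: guide.md)\nlet is block-scoped.'));
```

Handling the empty case explicitly avoids sending the model an empty context section, which could otherwise nudge it toward claiming that the documents say nothing when no search result passed the threshold.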
## Creating Sample Documents
Add some documents to test RAG. Create documents/typescript-guide.md:
# TypeScript Quick Reference
## Variables
TypeScript supports three ways to declare variables:
- `let` - Block-scoped, can be reassigned
- `const` - Block-scoped, cannot be reassigned
- `var` - Function-scoped (avoid using)
Example:
```typescript
let name = 'Alice';
const age = 30;
name = 'Bob'; // OK
// age = 31; // Error!
```
## Types
### Basic Types
- `string` - Text values
- `number` - All numbers (integer and floating point)
- `boolean` - true or false
- `null` and `undefined` - Absence of value
- `any` - Opt out of type checking (avoid when possible)
### Arrays
```typescript
const numbers: number[] = [1, 2, 3];
const names: Array<string> = ['Alice', 'Bob'];
```
### Objects and Interfaces
```typescript
interface User {
  name: string;
  age: number;
  email?: string; // Optional
}
const user: User = {
  name: 'Alice',
  age: 30,
};
```
## Functions
### Function Types
```typescript
function add(a: number, b: number): number {
  return a + b;
}
const multiply = (a: number, b: number): number => a * b;
```
### Optional Parameters
```typescript
function greet(name: string, greeting?: string): string {
  return `${greeting ?? 'Hello'}, ${name}!`;
}
```
## Generics
Generics allow creating reusable components:
```typescript
function identity<T>(value: T): T {
  return value;
}
const num = identity(42); // number
const str = identity('hello'); // string
```
## Common Patterns
### Type Guards
```typescript
function isString(value: unknown): value is string {
  return typeof value === 'string';
}
```
### Utility Types
- `Partial<T>` - Makes all properties optional
- `Required<T>` - Makes all properties required
- `Pick<T, K>` - Select specific properties
- `Omit<T, K>` - Remove specific properties
Create `documents/ai-assistant-features.md`:
```markdown
# AI Assistant Features
## Overview
This AI assistant provides intelligent help with various tasks. It combines the power of large language models with custom tools and knowledge retrieval.
## Core Capabilities
### 1. Conversational Memory
The assistant remembers your conversation history within a session. You can:
- Ask follow-up questions without repeating context
- Reference previous topics
- Build on earlier responses
### 2. Knowledge Base
The assistant can search through loaded documents to find relevant information. This is useful for:
- Company documentation
- Technical references
- Personal notes
- Any text-based knowledge
### 3. Tool Usage
The assistant can use various tools:
**Calculator**: Perform mathematical operations
- Basic arithmetic: addition, subtraction, multiplication, division
- Complex expressions: powers, roots, percentages
**Weather**: Get current weather information
- Requires location (city name)
- Returns temperature and conditions
**Notes**: Save and retrieve information
- Persistent storage across conversations
- Useful for remembering important details
## Best Practices
### Asking Questions
1. **Be specific**: "What is the syntax for TypeScript generics?" is better than "Tell me about TypeScript"
2. **Provide context**: If your question relates to something specific, mention it
3. **One topic at a time**: For complex questions, break them into parts
### Using Tools
The assistant automatically decides when to use tools. You can also explicitly request them:
- "Calculate 15% of 250"
- "What's the weather in Tokyo?"
- "Save this note: Meeting at 3pm tomorrow"
## Limitations
- Knowledge is limited to loaded documents
- Cannot browse the internet in real-time
- Weather requires API key configuration
- Conversation history is not persisted between sessions
```
## Testing RAG
Run your assistant and test RAG:
npm start
Example interaction:
AI Knowledge Assistant
======================
Type your message and press Enter.
Commands: /clear (clear history), /status (show stats), /exit (quit)
Initializing knowledge base...
Knowledge base ready!
You: What types of variables does TypeScript support?
Assistant: Based on the documentation, TypeScript supports three ways to declare variables:
1. **`let`** - Block-scoped and can be reassigned
2. **`const`** - Block-scoped and cannot be reassigned
3. **`var`** - Function-scoped (though it's recommended to avoid using this)
Here's an example:
```typescript
let name = "Alice";
const age = 30;
name = "Bob"; // This works
// age = 31; // This would cause an error!
```
The key difference is that `let` allows reassignment while `const` creates an immutable binding.
You: What tools does the assistant have?
Assistant: According to the knowledge base, the assistant has three main tools:
- Calculator - For mathematical operations including:
  - Basic arithmetic (addition, subtraction, multiplication, division)
  - Complex expressions (powers, roots, percentages)
- Weather - Gets current weather information:
  - Requires a location (city name)
  - Returns temperature and conditions
- Notes - For saving and retrieving information:
  - Provides persistent storage across conversations
  - Useful for remembering important details
The assistant automatically decides when to use these tools based on your questions, but you can also explicitly request them.
---
## Key Takeaways
1. **Document loading** reads files from a directory into a standard format
2. **Chunking** splits large documents into retrievable pieces
3. **Embeddings** convert text into vectors for semantic search
4. **Vector stores** hold embeddings and rank them by similarity to the query (here, a brute-force cosine scan in memory)
5. **The retriever** combines all components for end-to-end retrieval
6. **Integration** with the assistant augments the system prompt with context
---
## Practice Exercise
1. Add support for PDF documents (using a PDF parsing library)
2. Implement document metadata filtering (e.g., search only recent documents)
3. Add a `/docs` command to list indexed documents
4. Implement hybrid search combining keyword and semantic matching
5. Add relevance feedback to improve retrieval over time
---
## Next Steps
Your assistant now has knowledge! In the next lesson, you will add tools to give it the ability to take actions.
[Continue to Lesson 6.4: Integrating Tools](./04-integrating-tools.md)