Lesson 4.2: Embeddings - Vector Representations
Duration: 60 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Explain what embeddings are and how they capture meaning
- Understand how similarity search works with vectors
- Use embedding APIs from OpenAI and other providers
- Compare different embedding models and their trade-offs
What Are Embeddings?
Embeddings are numerical representations of text that capture semantic meaning. They convert words, sentences, or documents into arrays of numbers (vectors) where similar meanings result in similar numbers.
Think of embeddings as coordinates in a meaning space:
"happy" → [0.8, 0.2, 0.1, ...]
"joyful" → [0.75, 0.25, 0.12, ...] (close to "happy")
"sad" → [-0.7, 0.1, 0.3, ...] (far from "happy")
The key insight is that mathematical operations on these vectors correspond to semantic operations on the text:
- Distance between vectors = semantic difference
- Similar vectors = similar meanings
- Search = find vectors closest to query vector
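A quick sketch using the toy vectors above makes this concrete (the values are illustrative, not real model output, and real embeddings have hundreds or thousands of dimensions):
// Toy 3-D vectors from the example above
const happy = [0.8, 0.2, 0.1];
const joyful = [0.75, 0.25, 0.12];
const sad = [-0.7, 0.1, 0.3];
// Euclidean distance: smaller means more similar
function distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}
console.log(distance(happy, joyful)); // ~0.07 (close)
console.log(distance(happy, sad)); // ~1.52 (far)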
How Embeddings Work
Embedding models are neural networks trained to place semantically similar text close together in vector space:
Training Process:
1. Feed millions of text examples to the model
2. Model learns relationships between words and concepts
3. Model produces consistent vector representations
Result:
- "dog" and "puppy" get similar vectors
- "dog" and "automobile" get different vectors
- "king - man + woman ≈ queen" (vector arithmetic works!)
Modern embedding models produce vectors with hundreds or thousands of dimensions. OpenAI's text-embedding-3-small produces 1536-dimensional vectors by default.
Similarity Measurement
To find relevant documents, we measure how similar two vectors are. The most common metric is cosine similarity:
Cosine Similarity = (A · B) / (|A| × |B|)
Where:
- A · B is the dot product of vectors A and B
- |A| and |B| are the magnitudes of the vectors
Result ranges from -1 to 1:
- 1 = identical direction (most similar)
- 0 = perpendicular (unrelated)
- -1 = opposite direction (most different)
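For example, with two small 2-D vectors:
A = [1, 0], B = [0.8, 0.6]
A · B = (1 × 0.8) + (0 × 0.6) = 0.8
|A| = 1, |B| = √(0.8² + 0.6²) = 1
Cosine Similarity = 0.8 / (1 × 1) = 0.8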
In practice, you rarely calculate this yourself. Vector databases handle similarity search efficiently.
Using OpenAI Embeddings
Here is how to generate embeddings with OpenAI:
import OpenAI from 'openai';
const openai = new OpenAI();
async function getEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
// Example usage
const embedding = await getEmbedding('How do I reset my password?');
console.log(`Dimensions: ${embedding.length}`); // 1536
console.log(`First 5 values: ${embedding.slice(0, 5)}`);
You can embed multiple texts in a single request:
async function getEmbeddings(texts: string[]): Promise<number[][]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts,
});
// Sort by index to maintain order
return response.data.sort((a, b) => a.index - b.index).map((item) => item.embedding);
}
// Embed multiple documents at once
const documents = [
'Password reset instructions',
'Account security settings',
'Two-factor authentication guide',
];
const embeddings = await getEmbeddings(documents);
console.log(`Generated ${embeddings.length} embeddings`);
Calculating Similarity
Here is how to calculate cosine similarity between two vectors:
function cosineSimilarity(a: number[], b: number[]): number {
if (a.length !== b.length) {
throw new Error('Vectors must have the same length');
}
let dotProduct = 0;
let magnitudeA = 0;
let magnitudeB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
magnitudeA += a[i] * a[i];
magnitudeB += b[i] * b[i];
}
magnitudeA = Math.sqrt(magnitudeA);
magnitudeB = Math.sqrt(magnitudeB);
if (magnitudeA === 0 || magnitudeB === 0) {
return 0;
}
return dotProduct / (magnitudeA * magnitudeB);
}
// Example: Find most similar document
async function findMostSimilar(
query: string,
documents: string[]
): Promise<{ document: string; similarity: number }[]> {
const queryEmbedding = await getEmbedding(query);
const docEmbeddings = await getEmbeddings(documents);
const results = documents.map((doc, index) => ({
document: doc,
similarity: cosineSimilarity(queryEmbedding, docEmbeddings[index]),
}));
return results.sort((a, b) => b.similarity - a.similarity);
}
Complete Example: Simple Semantic Search
Here is a complete example of semantic search without a vector database:
import OpenAI from 'openai';
const openai = new OpenAI();
// Simple in-memory document store
interface Document {
id: string;
content: string;
embedding?: number[];
}
class SimpleVectorStore {
private documents: Document[] = [];
async addDocuments(docs: { id: string; content: string }[]): Promise<void> {
const contents = docs.map((d) => d.content);
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: contents,
});
for (let i = 0; i < docs.length; i++) {
this.documents.push({
id: docs[i].id,
content: docs[i].content,
embedding: response.data[i].embedding,
});
}
console.log(`Added ${docs.length} documents to store`);
}
async search(
query: string,
topK: number = 3
): Promise<{ id: string; content: string; score: number }[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
const queryEmbedding = response.data[0].embedding;
const results = this.documents.map((doc) => ({
id: doc.id,
content: doc.content,
score: this.cosineSimilarity(queryEmbedding, doc.embedding!),
}));
return results.sort((a, b) => b.score - a.score).slice(0, topK);
}
private cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let magnitudeA = 0;
let magnitudeB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
magnitudeA += a[i] * a[i];
magnitudeB += b[i] * b[i];
}
if (magnitudeA === 0 || magnitudeB === 0) return 0;
return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
}
// Usage
async function main() {
const store = new SimpleVectorStore();
// Add some documents
await store.addDocuments([
{
id: 'doc1',
content: 'To reset your password, go to Settings > Security > Reset Password',
},
{
id: 'doc2',
content: 'Enable two-factor authentication for additional account security',
},
{
id: 'doc3',
content: 'Contact support@example.com for billing inquiries',
},
{
id: 'doc4',
content: 'Our office hours are Monday to Friday, 9 AM to 5 PM EST',
},
{
id: 'doc5',
content: 'To change your email address, visit Account Settings > Profile',
},
]);
// Search for relevant documents
const results = await store.search('How do I change my password?');
console.log('\nSearch Results:');
for (const result of results) {
console.log(`\n[${result.id}] Score: ${result.score.toFixed(4)}`);
console.log(`Content: ${result.content}`);
}
}
main();
Expected output (your exact scores will vary):
Added 5 documents to store
Search Results:
[doc1] Score: 0.8234
Content: To reset your password, go to Settings > Security > Reset Password
[doc5] Score: 0.7156
Content: To change your email address, visit Account Settings > Profile
[doc2] Score: 0.6823
Content: Enable two-factor authentication for additional account security
Notice how the password reset document scores highest, even though the query used "change" and the document uses "reset".
OpenAI Embedding Models
OpenAI offers several embedding models:
| Model | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | Cost-effective, general use |
| text-embedding-3-large | 3072 | 8191 | Higher accuracy, more storage |
| text-embedding-ada-002 | 1536 | 8191 | Legacy model |
Choosing Dimensions
The newer models support dimension reduction:
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Hello world',
dimensions: 512, // Reduce from 1536 to 512
});
Lower dimensions mean:
- Less storage space required
- Faster similarity calculations
- Slightly reduced accuracy
For most applications, the default dimensions work well.
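As a rough storage estimate (assuming each value is stored as a 4-byte float32, which is typical but depends on your vector database):
// Approximate storage per vector at 4 bytes per dimension
const bytesPerVector = (dims: number) => dims * 4;
console.log(bytesPerVector(1536)); // 6144 bytes (~6 KB)
console.log(bytesPerVector(512)); // 2048 bytes (~2 KB)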
Alternative Embedding Providers
Cohere Embeddings
import { CohereClient } from 'cohere-ai';
const cohere = new CohereClient({
token: process.env.COHERE_API_KEY,
});
async function getCohereEmbedding(text: string): Promise<number[]> {
const response = await cohere.embed({
texts: [text],
model: 'embed-english-v3.0',
inputType: 'search_query',
});
return response.embeddings[0];
}
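Cohere's v3 embedding models distinguish between queries and documents via inputType. When indexing documents rather than embedding queries, use search_document; here is a sketch reusing the same client:
async function getCohereDocEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await cohere.embed({
    texts,
    model: 'embed-english-v3.0',
    inputType: 'search_document', // 'search_query' for queries, 'search_document' for indexed content
  });
  return response.embeddings;
}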
Local Embeddings with Transformers.js
For privacy-sensitive applications, you can run embeddings locally:
import { pipeline } from '@xenova/transformers';
// Load model once
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
async function getLocalEmbedding(text: string): Promise<number[]> {
const output = await embedder(text, {
pooling: 'mean',
normalize: true,
});
return Array.from(output.data);
}
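The all-MiniLM-L6-v2 model produces 384-dimensional vectors, which you can compare with the same cosineSimilarity helper defined earlier; a quick usage sketch:
const local1 = await getLocalEmbedding('How do I reset my password?');
const local2 = await getLocalEmbedding('Password reset instructions');
console.log(`Dimensions: ${local1.length}`); // 384
console.log(`Similarity: ${cosineSimilarity(local1, local2).toFixed(4)}`);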
Embedding Best Practices
1. Consistent Model Usage
Always use the same embedding model for indexing and querying. In the snippets below, embed(text, model) stands in for a helper like getEmbedding with a configurable model parameter:
// WRONG: different models produce incompatible embeddings
const docEmbedding = await embed(doc, 'text-embedding-3-small');
const queryEmbedding = await embed(query, 'text-embedding-3-large');
// CORRECT: the same model for both
const docEmbedding = await embed(doc, 'text-embedding-3-small');
const queryEmbedding = await embed(query, 'text-embedding-3-small');
2. Text Preprocessing
Clean your text before embedding:
function preprocessText(text: string): string {
return text
.toLowerCase()
.replace(/\s+/g, ' ') // Normalize whitespace
.trim();
}
3. Batch Processing
Embed multiple texts in one request to reduce latency:
// SLOW: One request per document
for (const doc of documents) {
const embedding = await getEmbedding(doc);
}
// FAST: Single request for all documents
const embeddings = await getEmbeddings(documents);
4. Caching
Cache embeddings to avoid redundant API calls:
const embeddingCache = new Map<string, number[]>();
async function getCachedEmbedding(text: string): Promise<number[]> {
const cacheKey = text.toLowerCase().trim();
if (embeddingCache.has(cacheKey)) {
return embeddingCache.get(cacheKey)!;
}
const embedding = await getEmbedding(text);
embeddingCache.set(cacheKey, embedding);
return embedding;
}
Cost Considerations
Embedding API calls have costs:
| Model | Price per 1M tokens |
|---|---|
| text-embedding-3-small | $0.02 |
| text-embedding-3-large | $0.13 |
For a typical RAG application with 10,000 documents averaging 500 tokens each:
- Indexing cost: 5M tokens = $0.10 (small) or $0.65 (large)
- Query cost: Minimal (queries are short)
Embeddings are stored in your vector database, so you pay to embed each document only once.
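As a quick sanity check on these numbers, here is a small helper using the per-million-token prices from the table above (verify current pricing before budgeting):
// Estimate one-time indexing cost from document count, average length, and price per 1M tokens
function estimateIndexingCost(numDocs: number, avgTokensPerDoc: number, pricePerMillionTokens: number): number {
  return ((numDocs * avgTokensPerDoc) / 1_000_000) * pricePerMillionTokens;
}
console.log(estimateIndexingCost(10_000, 500, 0.02)); // 0.1 (text-embedding-3-small)
console.log(estimateIndexingCost(10_000, 500, 0.13)); // 0.65 (text-embedding-3-large)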
Key Takeaways
- Embeddings convert text to vectors that capture semantic meaning
- Similar meanings produce similar vectors, enabling semantic search
- Cosine similarity measures how related two vectors are
- Use the same model for indexing documents and embedding queries
- Batch embedding requests to improve performance and reduce costs
- OpenAI text-embedding-3-small is cost-effective for most applications
Resources
| Resource | Type | Level |
|---|---|---|
| OpenAI Embeddings Guide | Documentation | Beginner |
| What are Embeddings - Vicki Boykis | Article | Intermediate |
| Cohere Embed Documentation | Documentation | Beginner |
Next Lesson
In the next lesson, you will learn about vector databases - specialized databases designed to store and search embeddings efficiently at scale.