Lesson 4.2: Embeddings - Vector Representations
Duration: 60 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Explain what embeddings are and how they capture meaning
- Understand how similarity search works with vectors
- Use embedding APIs from OpenAI and other providers
- Compare different embedding models and their trade-offs
What Are Embeddings?
Embeddings are numerical representations of text that capture semantic meaning. They convert words, sentences, or documents into arrays of numbers (vectors) where similar meanings result in similar numbers.
Think of embeddings as coordinates in a meaning space:
"happy" → [0.8, 0.2, 0.1, ...]
"joyful" → [0.75, 0.25, 0.12, ...] (close to "happy")
"sad" → [-0.7, 0.1, 0.3, ...] (far from "happy")
The key insight is that mathematical operations on these vectors correspond to semantic operations on the text:
- Distance between vectors = semantic difference
- Similar vectors = similar meanings
- Search = find vectors closest to query vector
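A quick sketch using the toy vectors above makes this concrete (the values are illustrative, not real model output, and real embeddings have hundreds or thousands of dimensions):
// Toy 3-D vectors from the example above
const happy = [0.8, 0.2, 0.1];
const joyful = [0.75, 0.25, 0.12];
const sad = [-0.7, 0.1, 0.3];
// Euclidean distance: smaller means more similar
function distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}
console.log(distance(happy, joyful)); // ~0.07 (close)
console.log(distance(happy, sad)); // ~1.52 (far)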
How Embeddings Work
Embedding models are neural networks trained to place semantically similar text close together in vector space:
Training Process:
1. Feed millions of text examples to the model
2. Model learns relationships between words and concepts
3. Model produces consistent vector representations
Result:
- "dog" and "puppy" get similar vectors
- "dog" and "automobile" get different vectors
- "king - man + woman ≈ queen" (vector arithmetic works!)
Modern embedding models produce vectors with hundreds or thousands of dimensions. OpenAI's text-embedding-3-small produces 1536-dimensional vectors by default.
Similarity Measurement
To find relevant documents, we measure how similar two vectors are. The most common metric is cosine similarity:
Cosine Similarity = (A · B) / (|A| × |B|)
Where:
- A · B is the dot product of vectors A and B
- |A| and |B| are the magnitudes of the vectors
Result ranges from -1 to 1:
- 1 = identical direction (most similar)
- 0 = perpendicular (unrelated)
- -1 = opposite direction (most different)
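For example, with two small 2-D vectors:
A = [1, 0], B = [0.8, 0.6]
A · B = (1 × 0.8) + (0 × 0.6) = 0.8
|A| = 1, |B| = √(0.8² + 0.6²) = 1
Cosine Similarity = 0.8 / (1 × 1) = 0.8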
In practice, you rarely calculate this yourself. Vector databases handle similarity search efficiently.
Using OpenAI Embeddings
Here is how to generate embeddings with OpenAI:
import OpenAI from 'openai';
const openai = new OpenAI();
async function getEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
// Example usage
const embedding = await getEmbedding('How do I reset my password?');
console.log(`Dimensions: ${embedding.length}`); // 1536
console.log(`First 5 values: ${embedding.slice(0, 5)}`);
You can embed multiple texts in a single request:
async function getEmbeddings(texts: string[]): Promise<number[][]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts,
});
// Sort by index to maintain order
return response.data.sort((a, b) => a.index - b.index).map((item) => item.embedding);
}
// Embed multiple documents at once
const documents = [
'Password reset instructions',
'Account security settings',
'Two-factor authentication guide',
];
const embeddings = await getEmbeddings(documents);
console.log(`Generated ${embeddings.length} embeddings`);
Calculating Similarity
Here is how to calculate cosine similarity between two vectors:
function cosineSimilarity(a: number[], b: number[]): number {
if (a.length !== b.length) {
throw new Error('Vectors must have the same length');
}
let dotProduct = 0;
let magnitudeA = 0;
let magnitudeB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
magnitudeA += a[i] * a[i];
magnitudeB += b[i] * b[i];
}
magnitudeA = Math.sqrt(magnitudeA);
magnitudeB = Math.sqrt(magnitudeB);
if (magnitudeA === 0 || magnitudeB === 0) {
return 0;
}
return dotProduct / (magnitudeA * magnitudeB);
}
// Example: Find most similar document
async function findMostSimilar(
query: string,
documents: string[]
): Promise<{ document: string; similarity: number }[]> {
const queryEmbedding = await getEmbedding(query);
const docEmbeddings = await getEmbeddings(documents);
const results = documents.map((doc, index) => ({
document: doc,
similarity: cosineSimilarity(queryEmbedding, docEmbeddings[index]),
}));
return results.sort((a, b) => b.similarity - a.similarity);
}
Complete Example: Simple Semantic Search
Here is a complete example of semantic search without a vector database:
import OpenAI from 'openai';
const openai = new OpenAI();
// Simple in-memory document store
interface Document {
id: string;
content: string;
embedding?: number[];
}
class SimpleVectorStore {
private documents: Document[] = [];
async addDocuments(docs: { id: string; content: string }[]): Promise<void> {
const contents = docs.map((d) => d.content);
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: contents,
});
for (let i = 0; i < docs.length; i++) {
this.documents.push({
id: docs[i].id,
content: docs[i].content,
embedding: response.data[i].embedding,
});
}
console.log(`Added ${docs.length} documents to store`);
}
async search(
query: string,
topK: number = 3
): Promise<{ id: string; content: string; score: number }[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
const queryEmbedding = response.data[0].embedding;
const results = this.documents.map((doc) => ({
id: doc.id,
content: doc.content,
score: this.cosineSimilarity(queryEmbedding, doc.embedding!),
}));
return results.sort((a, b) => b.score - a.score).slice(0, topK);
}
private cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let magnitudeA = 0;
let magnitudeB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
magnitudeA += a[i] * a[i];
magnitudeB += b[i] * b[i];
}
if (magnitudeA === 0 || magnitudeB === 0) return 0;
return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
}
// Usage
async function main() {
const store = new SimpleVectorStore();
// Add some documents
await store.addDocuments([
{
id: 'doc1',
content: 'To reset your password, go to Settings > Security > Reset Password',
},
{
id: 'doc2',
content: 'Enable two-factor authentication for additional account security',
},
{
id: 'doc3',
content: 'Contact support@example.com for billing inquiries',
},
{
id: 'doc4',
content: 'Our office hours are Monday to Friday, 9 AM to 5 PM EST',
},
{
id: 'doc5',
content: 'To change your email address, visit Account Settings > Profile',
},
]);
// Search for relevant documents
const results = await store.search('How do I change my password?');
console.log('\nSearch Results:');
for (const result of results) {
console.log(`\n[${result.id}] Score: ${result.score.toFixed(4)}`);
console.log(`Content: ${result.content}`);
}
}
main();
Expected output (your exact scores will vary):
Added 5 documents to store
Search Results:
[doc1] Score: 0.8234
Content: To reset your password, go to Settings > Security > Reset Password
[doc5] Score: 0.7156
Content: To change your email address, visit Account Settings > Profile
[doc2] Score: 0.6823
Content: Enable two-factor authentication for additional account security
Notice how the password reset document scores highest, even though the query used "change" and the document uses "reset".
OpenAI Embedding Models
OpenAI offers several embedding models:
| Model | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | Cost-effective, general use |
| text-embedding-3-large | 3072 | 8191 | Higher accuracy, more storage |
| text-embedding-ada-002 | 1536 | 8191 | Legacy model |
Choosing Dimensions
The newer models support dimension reduction:
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Hello world',
dimensions: 512, // Reduce from 1536 to 512
});
Lower dimensions mean:
- Less storage space required
- Faster similarity calculations
- Slightly reduced accuracy
For most applications, the default dimensions work well.
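As a rough storage estimate (assuming each value is stored as a 4-byte float32, which is typical but depends on your vector database):
// Approximate storage per vector at 4 bytes per dimension
const bytesPerVector = (dims: number) => dims * 4;
console.log(bytesPerVector(1536)); // 6144 bytes (~6 KB)
console.log(bytesPerVector(512)); // 2048 bytes (~2 KB)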
Alternative Embedding Providers
Cohere Embeddings
import { CohereClient } from 'cohere-ai';
const cohere = new CohereClient({
token: process.env.COHERE_API_KEY,
});
async function getCohereEmbedding(text: string): Promise<number[]> {
const response = await cohere.embed({
texts: [text],
model: 'embed-english-v3.0',
inputType: 'search_query',
});
return response.embeddings[0];
}
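Cohere's v3 embedding models distinguish between queries and documents via inputType. When indexing documents rather than embedding queries, use search_document; here is a sketch reusing the same client:
async function getCohereDocEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await cohere.embed({
    texts,
    model: 'embed-english-v3.0',
    inputType: 'search_document', // 'search_query' for queries, 'search_document' for indexed content
  });
  return response.embeddings;
}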
Local Embeddings with Transformers.js
For privacy-sensitive applications, you can run embeddings locally:
import { pipeline } from '@xenova/transformers';
// Load model once
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
async function getLocalEmbedding(text: string): Promise<number[]> {
const output = await embedder(text, {
pooling: 'mean',
normalize: true,
});
return Array.from(output.data);
}
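The all-MiniLM-L6-v2 model produces 384-dimensional vectors, which you can compare with the same cosineSimilarity helper defined earlier; a quick usage sketch:
const local1 = await getLocalEmbedding('How do I reset my password?');
const local2 = await getLocalEmbedding('Password reset instructions');
console.log(`Dimensions: ${local1.length}`); // 384
console.log(`Similarity: ${cosineSimilarity(local1, local2).toFixed(4)}`);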
Embedding Best Practices
1. Consistent Model Usage
Always use the same embedding model for indexing and querying. In the snippets below, embed(text, model) stands in for a helper like getEmbedding with a configurable model parameter:
// WRONG: different models produce incompatible embeddings
const docEmbedding = await embed(doc, 'text-embedding-3-small');
const queryEmbedding = await embed(query, 'text-embedding-3-large');
// CORRECT: the same model for both
const docEmbedding = await embed(doc, 'text-embedding-3-small');
const queryEmbedding = await embed(query, 'text-embedding-3-small');
2. Text Preprocessing
Clean your text before embedding:
function preprocessText(text: string): string {
return text
.toLowerCase()
.replace(/\s+/g, ' ') // Normalize whitespace
.trim();
}
3. Batch Processing
Embed multiple texts in one request to reduce latency:
// SLOW: One request per document
for (const doc of documents) {
const embedding = await getEmbedding(doc);
}
// FAST: Single request for all documents
const embeddings = await getEmbeddings(documents);
4. Caching
Cache embeddings to avoid redundant API calls:
const embeddingCache = new Map<string, number[]>();
async function getCachedEmbedding(text: string): Promise<number[]> {
const cacheKey = text.toLowerCase().trim();
if (embeddingCache.has(cacheKey)) {
return embeddingCache.get(cacheKey)!;
}
const embedding = await getEmbedding(text);
embeddingCache.set(cacheKey, embedding);
return embedding;
}
Cost Considerations
Embedding API calls have costs:
| Model | Price per 1M tokens |
|---|---|
| text-embedding-3-small | $0.02 |
| text-embedding-3-large | $0.13 |
For a typical RAG application with 10,000 documents averaging 500 tokens each:
- Indexing cost: 5M tokens = $0.10 (small) or $0.65 (large)
- Query cost: Minimal (queries are short)
Embeddings are stored in your vector database, so you pay to embed each document only once.
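As a quick sanity check on these numbers, here is a small helper using the per-million-token prices from the table above (verify current pricing before budgeting):
// Estimate one-time indexing cost from document count, average length, and price per 1M tokens
function estimateIndexingCost(numDocs: number, avgTokensPerDoc: number, pricePerMillionTokens: number): number {
  return ((numDocs * avgTokensPerDoc) / 1_000_000) * pricePerMillionTokens;
}
console.log(estimateIndexingCost(10_000, 500, 0.02)); // 0.1 (text-embedding-3-small)
console.log(estimateIndexingCost(10_000, 500, 0.13)); // 0.65 (text-embedding-3-large)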
Key Takeaways
- Embeddings convert text to vectors that capture semantic meaning
- Similar meanings produce similar vectors, enabling semantic search
- Cosine similarity measures how related two vectors are
- Use the same model for indexing documents and embedding queries
- Batch embedding requests to improve performance and reduce costs
- OpenAI text-embedding-3-small is cost-effective for most applications
Resources
| Resource | Type | Level |
|---|---|---|
| OpenAI Embeddings Guide | Documentation | Beginner |
| What are Embeddings - Vicki Boykis | Article | Intermediate |
| Cohere Embed Documentation | Documentation | Beginner |
Next Lesson
In the next lesson, you will learn about vector databases - specialized databases designed to store and search embeddings efficiently at scale.