From Zero to AI

Lesson 1.2: How Models Are Trained

Duration: 60 minutes

Learning Objectives

By the end of this lesson, you will be able to:

  • Understand the concept of training data and why it matters
  • Explain the basic training loop (predict, compare, adjust)
  • Describe what "weights" and "parameters" mean
  • Recognize overfitting and underfitting
  • Appreciate why good data is crucial for good models

The Core Idea: Learning from Examples

Machine learning is fundamentally about learning patterns from examples. Think of how you learned to recognize dogs:

  1. You saw many dogs as a child
  2. Adults told you "that's a dog"
  3. Eventually, you could recognize dogs you had never seen before

ML works the same way:

  1. Show the model many examples
  2. Tell it the correct answer for each
  3. Eventually, it can handle new examples it has never seen

Training Data: The Foundation

Training data is the collection of examples used to teach a model. Each example typically has:

  • Input: The data the model sees (image, text, numbers)
  • Label: The correct answer (what we want the model to predict)

// Example: Training data for sentiment analysis
type TrainingExample = {
  input: string; // The text
  label: 'positive' | 'negative'; // The correct sentiment
};

const trainingData: TrainingExample[] = [
  { input: 'I love this product!', label: 'positive' },
  { input: 'Terrible experience, avoid.', label: 'negative' },
  { input: 'Best purchase I ever made.', label: 'positive' },
  { input: 'Complete waste of money.', label: 'negative' },
  { input: 'Exceeded my expectations!', label: 'positive' },
  // ... thousands more examples
];

Quality Over Quantity

Good training data must be:

  1. Accurate: Labels must be correct
  2. Representative: Cover the range of real-world cases
  3. Balanced: Roughly equal examples of each category
  4. Clean: No duplicates, errors, or irrelevant data

// Bad training data - all positive examples!
const badData: TrainingExample[] = [
  { input: 'Great!', label: 'positive' },
  { input: 'Excellent!', label: 'positive' },
  { input: 'Amazing!', label: 'positive' },
  // Model will think everything is positive
];

// Good training data - balanced and diverse
const goodData: TrainingExample[] = [
  { input: 'Great product, works perfectly.', label: 'positive' },
  { input: 'Broke after one day.', label: 'negative' },
  { input: 'Does exactly what it says.', label: 'positive' },
  { input: 'Poor quality, disappointed.', label: 'negative' },
  // Mix of positive and negative
];

The Training Loop

Training happens in cycles. Each cycle follows the same pattern:

┌─────────────────────────────────────────────────────────┐
│                    Training Loop                         │
│                                                         │
│   1. PREDICT                                            │
│      Model makes a guess                                │
│            │                                            │
│            ▼                                            │
│   2. COMPARE                                            │
│      Check guess vs correct answer                      │
│            │                                            │
│            ▼                                            │
│   3. ADJUST                                             │
│      Update model to reduce error                       │
│            │                                            │
│            ▼                                            │
│      Repeat thousands/millions of times                 │
│                                                         │
└─────────────────────────────────────────────────────────┘

Step 1: Predict

The model sees an input and makes a prediction:

// Model sees: "This movie was boring"
// Model predicts: 60% positive, 40% negative
// (The model is not very good yet!)

type Prediction = {
  positive: number; // Probability 0-1
  negative: number;
};

const modelPrediction: Prediction = {
  positive: 0.6,
  negative: 0.4,
};

Step 2: Compare

We compare the prediction to the correct answer using a loss function:

// Correct answer: negative
// Model said: 60% positive

// Loss measures how wrong the model was
// Higher loss = more wrong
const correctLabel = 'negative';
const predictedProbability = modelPrediction.negative; // 0.4

// Simple loss: how far from correct?
// Should have predicted 1.0 for negative, got 0.4
const loss = 1.0 - predictedProbability; // 0.6 - significant error!
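
The subtraction above is a simplification. Real classifiers typically use cross-entropy loss, which heavily penalizes confident wrong answers; a minimal sketch using the same prediction:

// Cross-entropy loss: -log(probability assigned to the correct class)
const crossEntropyLoss = -Math.log(predictedProbability); // -log(0.4) ≈ 0.92
// A confident correct prediction would give a small loss: -log(0.95) ≈ 0.05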

Step 3: Adjust

The model adjusts its internal parameters to reduce the error. The algorithm that works out how much to change each weight is called backpropagation - you do not need to understand its math to work with AI models.

The key idea: if the model was wrong, adjust it to be less wrong next time.

// After adjustment:
// Model sees: "This movie was boring"
// Model predicts: 45% positive, 55% negative
// Better! But still not great.

// After more training:
// Model predicts: 15% positive, 85% negative
// Much better!

Repeat Many Times

Training involves millions of these cycles:

Epoch 1:    Loss = 0.82  (model is mostly guessing)
Epoch 10:   Loss = 0.54  (model is learning)
Epoch 100:  Loss = 0.21  (model is getting good)
Epoch 1000: Loss = 0.08  (model is quite accurate)

An epoch is one complete pass through all training data.
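
Putting the three steps together: here is a runnable toy loop with a single-weight "model" trained toward a target of 1.0. This is a minimal sketch of the predict-compare-adjust cycle, not real machine learning:

// Toy training loop: one weight, one target, repeated adjustment
let weight = 0.1;         // the model's single parameter (random start)
const target = 1.0;       // the "correct answer" we want it to predict
const learningRate = 0.5; // how big each adjustment is

for (let step = 1; step <= 5; step++) {
  const prediction = weight;        // 1. PREDICT
  const loss = target - prediction; // 2. COMPARE
  weight += learningRate * loss;    // 3. ADJUST
  console.log(`Step ${step}: loss = ${loss.toFixed(2)}`);
}
// Loss shrinks each step: 0.90, 0.45, 0.23, 0.11, 0.06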


Weights and Parameters

A model is essentially a mathematical function with adjustable numbers called weights or parameters.

A Simple Analogy

Imagine a basic spam filter with weights:

// Simplified spam scorer
type Email = { sender: string; body: string };

type SpamWeights = {
  freeWord: number; // How much "free" suggests spam
  exclamationMarks: number; // How much !! suggests spam
  knownSender: number; // How much a known sender reduces the score
};

// Initial random weights (untrained)
let weights: SpamWeights = {
  freeWord: 0.1,
  exclamationMarks: 0.2,
  knownSender: -0.1,
};

// Simple helpers so the example is self-contained
const knownSenders = new Set(['friend@example.com']);
const countExclamations = (text: string): number => (text.match(/!/g) ?? []).length;
const isKnownSender = (sender: string): boolean => knownSenders.has(sender);

function spamScore(email: Email): number {
  let score = 0;
  if (email.body.includes('free')) score += weights.freeWord;
  score += countExclamations(email.body) * weights.exclamationMarks;
  if (isKnownSender(email.sender)) score += weights.knownSender;
  return score;
}

Training Adjusts the Weights

Through training, the weights get better:

// After training on 10,000 emails:
weights = {
  freeWord: 0.45, // "free" is moderately spammy
  exclamationMarks: 0.72, // Multiple !! is very spammy
  knownSender: -0.89, // Known senders are rarely spam
};

The model "learned" that:

  • "Free" is somewhat indicative of spam
  • Multiple exclamation marks are very suspicious
  • Known senders are almost never spam
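
Under the hood, each weight is nudged in whichever direction reduces the loss. Here is a toy sketch of a single update; the gradient value is made up, and real training computes it with backpropagation:

// One simplified weight update, gradient-descent style
const learningRate = 0.01; // how big each nudge is
const gradient = -0.5; // made-up value: "raising freeWord would lower the loss"
weights.freeWord -= learningRate * gradient; // e.g. 0.45 -> 0.455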

Real Models Have Billions of Parameters

Modern LLMs have billions of parameters:

Model           Parameters
GPT-2 (2019)    1.5 billion
GPT-3 (2020)    175 billion
GPT-4 (2023)    ~1.7 trillion (estimated)
Claude 3 Opus   Not disclosed, likely 100B+

These parameters encode everything the model "knows" about language.
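
A rough back-of-the-envelope calculation shows why that scale matters (assuming 16-bit weights, 2 bytes each):

// Memory needed just to store GPT-3's 175B weights at 16-bit precision
const parameters = 175e9; // 175 billion
const bytesPerWeight = 2; // 16-bit floating point
const gigabytes = (parameters * bytesPerWeight) / 1e9; // 350 GB
// 350 GB of weights - far beyond a single consumer GPU's memory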


Training vs Inference

Two distinct phases:

Training: Teaching the model (adjusting weights)

  • Expensive (time, compute, data)
  • Done once or periodically
  • Requires labeled data

Inference: Using the trained model

  • Fast
  • Done many times
  • Works on new, unseen data

// Conceptual sketch - train() and model.predict() are not real APIs
// Training phase (done by OpenAI/Anthropic)
const model = await train({
  data: trillionsOfTextExamples,
  epochs: manyIterations,
  computeResources: 'thousands of GPUs',
});
// This took months and millions of dollars

// Inference phase (what you do via API)
const response = await model.predict('Explain quantum physics');
// This takes seconds and costs fractions of a cent

As a developer using AI APIs, you only do inference. The training was done for you.


Overfitting and Underfitting

Two common problems in training:

Overfitting: Memorizing Instead of Learning

The model learns the training data too well, including its quirks and noise.

// Training data
const data = [
  { text: "John's review: Great product!", sentiment: 'positive' },
  { text: "Mary's review: Terrible!", sentiment: 'negative' },
];

// Overfitted model might learn:
// "If text contains 'John', it's positive"
// "If text contains 'Mary', it's negative"

// This fails on new data:
// "John's review: Worst purchase ever!" -> wrongly predicts positive

Signs of overfitting:

  • Great accuracy on training data
  • Poor accuracy on new data
  • Model seems to "memorize" rather than generalize

Underfitting: Not Learning Enough

The model is too simple to capture the patterns in the data.

// Model is too simple - just checks for "good" or "bad"
function oversimplifiedSentiment(text: string): string {
  if (text.includes('good')) return 'positive';
  if (text.includes('bad')) return 'negative';
  return 'neutral';
}

// Fails on:
// "Not good at all" -> wrongly predicts positive
// "I had a bad feeling but it turned out great" -> wrongly predicts negative

Signs of underfitting:

  • Poor accuracy on training data
  • Poor accuracy on new data
  • Model misses obvious patterns

The Goal: Just Right

┌─────────────────────────────────────────────────────────┐
│                                                         │
│  Underfitting          Just Right          Overfitting  │
│       │                    │                    │       │
│       ▼                    ▼                    ▼       │
│   Too simple           Good balance        Too complex  │
│   Misses patterns      Generalizes well    Memorizes    │
│   Poor everywhere      Good on new data    Good only on │
│                                            training     │
│                                                         │
└─────────────────────────────────────────────────────────┘

Validation and Test Sets

To detect overfitting, we split data into three parts:

// Split data: 70% training, 15% validation, 15% test
// (shuffle the data first so each split is representative)
const allData = loadData(); // 10,000 examples (loadData is a stand-in for your own loader)

const trainingData = allData.slice(0, 7000); // Train on this
const validationData = allData.slice(7000, 8500); // Tune on this
const testData = allData.slice(8500); // Final evaluation

Why Three Sets?

  1. Training Set: Model learns from this
  2. Validation Set: Check progress during training, tune hyperparameters
  3. Test Set: Final evaluation on truly unseen data

Training accuracy:    98%  (model saw this data)
Validation accuracy:  92%  (model tuned on this)
Test accuracy:        91%  (truly unseen - this is the real measure!)

If training accuracy is much higher than test accuracy, you have overfitting.
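
You can turn that rule of thumb into a check. The 10-point gap threshold below is an illustrative choice, not an industry standard:

// Flag a suspicious train/test accuracy gap
function looksOverfit(trainAccuracy: number, testAccuracy: number): boolean {
  return trainAccuracy - testAccuracy > 0.1; // large gap suggests memorization
}

looksOverfit(0.98, 0.91); // false - a small gap is normal
looksOverfit(0.99, 0.6); // true - classic overfitting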


How LLMs Are Trained

Large Language Models follow the same principles but at massive scale.

Phase 1: Pre-training

The model learns from vast amounts of text:

// Simplified: predict the next word
const trainingExample = {
  context: 'The cat sat on the',
  nextWord: 'mat', // Model learns to predict this
};

// Trained on trillions of such examples from:
// - Books
// - Websites
// - Wikipedia
// - Code repositories
// - Scientific papers

This is self-supervised learning - no human labels are needed, because the next word itself is the label.
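
Because the label comes for free, any document can be turned into training pairs. A simplified sketch (real models split text into tokens, not whitespace-separated words):

// Turn raw text into (context, nextWord) training pairs
function makeExamples(text: string): { context: string; nextWord: string }[] {
  const words = text.split(' ');
  return words.slice(1).map((nextWord, i) => ({
    context: words.slice(0, i + 1).join(' '),
    nextWord,
  }));
}

makeExamples('The cat sat on the mat');
// [{ context: 'The', nextWord: 'cat' },
//  { context: 'The cat', nextWord: 'sat' }, ...]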

Phase 2: Fine-tuning

The model is refined for specific tasks:

// Fine-tuning for conversation
const conversationExample = {
  prompt: 'What is the capital of France?',
  idealResponse: 'The capital of France is Paris.',
};

// Human trainers create thousands of ideal responses

Phase 3: RLHF (Reinforcement Learning from Human Feedback)

Humans rate model outputs, and the model learns from these ratings:

// Model generates two responses
const response1 = 'Paris is the capital of France.';
const response2 = 'France, capital, Paris, yes.';

// Human rates: response1 is better
// Model adjusts to produce more responses like response1
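
Those ratings are commonly stored as preference pairs that the model is then tuned on; a minimal sketch of the data shape (field names are illustrative):

type PreferencePair = {
  prompt: string;
  chosen: string; // the response the human preferred
  rejected: string; // the response the human ranked lower
};

const pair: PreferencePair = {
  prompt: 'What is the capital of France?',
  chosen: response1,
  rejected: response2,
};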

This is why ChatGPT and Claude sound helpful and polished - they were trained on human preferences.


Why Training Takes So Long

Training modern AI models requires enormous resources:

Resource   Requirement
Data       Trillions of tokens (word pieces)
Compute    Thousands of high-end GPUs
Time       Weeks to months
Cost       Millions of dollars
Energy     Comparable to a small town's consumption

This is why only large companies train foundation models. Everyone else uses their APIs.


What This Means for Developers

As a developer, you do not train models. But understanding training helps you:

  1. Understand limitations: Models only know what they were trained on
  2. Write better prompts: You are working with the model's learned patterns
  3. Evaluate outputs: Knowing about biases and overfitting helps spot issues
  4. Choose the right model: Different training = different strengths
// The model was trained on data up to a cutoff date
// It does not know recent events!
const response = await llm.complete('Who won the 2024 Olympics 100m sprint?');
// Model might not know - this is after its training data

// Better approach: provide context
const response2 = await llm.complete(`
  Based on this information: [recent news article],
  who won the 2024 Olympics 100m sprint?
`);

Exercises

Exercise 1: Identify the Problem

For each scenario, identify whether it is overfitting, underfitting, or well-trained:

  1. A spam filter that gets 99% accuracy on training data but only 60% on new emails
  2. A sentiment model that classifies everything as "neutral"
  3. A translation model that works well on both training texts and user inputs
  4. A model that perfectly memorizes all training examples but fails on slightly different inputs

Solution
  1. Overfitting: High training accuracy, low real-world accuracy
  2. Underfitting: Model is too simple, defaults to one answer
  3. Well-trained: Good performance on both training and new data
  4. Overfitting: Memorization instead of learning patterns

Exercise 2: Design Training Data

You want to train a model to classify customer support emails into categories: billing, technical, shipping, general.

What characteristics should your training data have?

Solution

Good training data should have:

  1. Balance: Roughly equal examples of each category (or at least representative proportions)
  2. Variety: Different writing styles, lengths, levels of detail
  3. Edge cases: Emails that could fit multiple categories
  4. Clean labels: Each email correctly categorized
  5. Real examples: Actual customer emails, not fabricated ones
  6. Diversity: Different products, issues, customer types
  7. Sufficient quantity: At least hundreds per category, ideally thousands

Example distribution:

  • Billing: 2,500 emails
  • Technical: 2,500 emails
  • Shipping: 2,500 emails
  • General: 2,500 emails

Exercise 3: Training Loop Trace

Given this simplified training process, trace what happens:

// Initial model: always predicts 50% positive, 50% negative
// Training example: "I hate this!" - correct label: negative

// Round 1:
// Prediction: 50% positive, 50% negative
// Correct: negative (should be 0% positive, 100% negative)
// Error: model was 50% off on the negative prediction
// Adjustment: increase weight toward negative

// What do you expect after 5 rounds of training on similar examples?

Solution

After 5 rounds of training on similar negative examples:

The model would likely predict:

  • ~15-25% positive
  • ~75-85% negative

Why not 0%/100%? Because:

  1. Learning is gradual - small adjustments each round
  2. The model needs to generalize, not memorize
  3. Some uncertainty is healthy (prevents overconfidence)

After many more rounds with mixed examples:

  • The model learns to recognize negative words ("hate", "terrible", etc.)
  • It learns context matters ("I hate to admit I love this" is positive)
  • Confidence increases but is not absolute
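
You can mimic that gradual convergence with a toy simulation (the 0.2 learning rate is an arbitrary illustrative value):

// Nudge the prediction a fraction of the way toward the target each round
let pNegative = 0.5; // initial 50/50 guess
const learningRate = 0.2;
for (let round = 1; round <= 5; round++) {
  pNegative += learningRate * (1.0 - pNegative);
  console.log(`Round ${round}: ${(pNegative * 100).toFixed(0)}% negative`);
}
// Round 1: 60% ... Round 5: 84% - gradual, never jumping straight to 100%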

Key Takeaways

  1. Training data is the foundation - garbage in, garbage out
  2. Training loop: predict → compare → adjust → repeat millions of times
  3. Weights/parameters are the numbers the model adjusts during learning
  4. Overfitting = memorizing; underfitting = not learning enough
  5. Validation/test sets help detect problems before deployment
  6. LLM training involves pre-training, fine-tuning, and human feedback
  7. As a developer, you use pre-trained models via APIs

Resources

Resource                           Type      Description
Google ML Crash Course: Training   Tutorial  Interactive training concepts
3Blue1Brown: Gradient Descent      Video     Visual explanation of how learning works
OpenAI: GPT-4 Technical Report     Paper     How a modern LLM was trained

Next Lesson

Now that you understand how models learn, let us explore the different types of problems machine learning can solve.

Continue to Lesson 1.3: Types of ML Tasks