From Zero to AI

Lesson 1.3: Types of ML Tasks

Duration: 60 minutes

Learning Objectives

By the end of this lesson, you will be able to:

  • Identify the main categories of machine learning tasks
  • Understand when to use classification, regression, or generation
  • Recognize supervised vs unsupervised learning
  • Match real-world problems to appropriate ML approaches
  • Understand what types of tasks LLMs can handle

The Three Main Categories

Machine learning tasks fall into three broad categories based on what you want the model to output:

┌─────────────────────────────────────────────────────────┐
│                    ML Task Types                         │
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │Classification│  │ Regression  │  │ Generation  │     │
│  │             │  │             │  │             │     │
│  │ Which       │  │ How much?   │  │ Create      │     │
│  │ category?   │  │ What value? │  │ something   │     │
│  │             │  │             │  │ new         │     │
│  └─────────────┘  └─────────────┘  └─────────────┘     │
│                                                         │
│  Examples:        Examples:        Examples:           │
│  - Spam/not spam  - House price   - Write text        │
│  - Cat/dog        - Temperature   - Create images     │
│  - Sentiment      - Stock price   - Generate code     │
│                                                         │
└─────────────────────────────────────────────────────────┘

Let us explore each one.


Classification: Which Category?

Classification assigns inputs to predefined categories. The model answers "which one?"

Binary Classification

Two possible outcomes:

// Spam detection (`model` and `Email` are placeholders for your own model client and types)
type SpamPrediction = 'spam' | 'not_spam';

async function classifyEmail(email: Email): Promise<SpamPrediction> {
  // Model returns one of two categories
  return await model.predict(email);
}

// Examples:
// "You won $1000!" -> "spam"
// "Meeting at 3pm" -> "not_spam"

Common binary classification tasks:

  • Spam detection (spam / not spam)
  • Fraud detection (fraudulent / legitimate)
  • Medical diagnosis (disease present / absent)
  • Sentiment (positive / negative)

Multi-class Classification

Multiple possible categories (pick one):

// Document categorization
type DocumentCategory = 'sports' | 'politics' | 'technology' | 'entertainment' | 'business';

async function categorizeArticle(article: string): Promise<DocumentCategory> {
  return await model.predict(article);
}

// Examples:
// "The team won the championship..." -> "sports"
// "New AI chip announced..." -> "technology"

Common multi-class tasks:

  • Email routing (billing / support / sales / other)
  • Language detection (English / Spanish / French / ...)
  • Image recognition (cat / dog / bird / ...)
  • Intent detection (book flight / check status / cancel / ...)

Multi-label Classification

Multiple categories can apply simultaneously:

// Movie genre classification
type Genre = 'action' | 'comedy' | 'drama' | 'horror' | 'romance';

async function classifyMovie(movie: Movie): Promise<Genre[]> {
  // Returns multiple applicable genres
  return await model.predict(movie);
}

// Examples:
// "Romantic Comedy" -> ["comedy", "romance"]
// "Action Horror" -> ["action", "horror"]

Common multi-label tasks:

  • Content tagging (blog post topics)
  • Product categorization (multiple categories)
  • Skill detection (multiple skills in a resume)
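
The difference between picking one label and picking several can be made concrete. The sketch below assumes a hypothetical model that returns one score between 0 and 1 per genre. The `GenreScores` type, the sample scores, and the 0.5 threshold are illustrative assumptions, not part of any particular library.

```typescript
// Hypothetical per-label scores, as a multi-label model might return them.
type GenreScores = { [genre: string]: number };

// Multi-class: exactly one label, the one with the highest score.
function pickSingleLabel(scores: GenreScores): string {
  let best = '';
  let bestScore = -Infinity;
  for (const genre of Object.keys(scores)) {
    if (scores[genre] > bestScore) {
      best = genre;
      bestScore = scores[genre];
    }
  }
  return best;
}

// Multi-label: every label whose score clears the threshold.
function pickAllLabels(scores: GenreScores, threshold: number = 0.5): string[] {
  return Object.keys(scores).filter((genre) => scores[genre] >= threshold);
}

// Made-up scores for a romantic comedy.
const romComScores: GenreScores = {
  action: 0.1,
  comedy: 0.9,
  drama: 0.3,
  horror: 0.05,
  romance: 0.8,
};

console.log(pickSingleLabel(romComScores)); // comedy
console.log(pickAllLabels(romComScores).join(', ')); // comedy, romance
```

The same scores produce different answers depending on the task: multi-class keeps only the top label, while multi-label keeps every label that is likely enough.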

Classification in Code

Here is how you might use classification via an API:

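// Using an LLM for classification (OpenAI Node SDK, v4 or later)
import OpenAI from 'openai';

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

async function classifySentiment(text: string): Promise<string> {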
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `Classify the sentiment of the text as exactly one of: 
                  positive, negative, or neutral. 
                  Respond with only the classification.`,
      },
      {
        role: 'user',
        content: text,
      },
    ],
  });

  // content can be null, so fall back to an empty string before normalizing
  return (response.choices[0].message.content ?? '').trim().toLowerCase();
}

// Usage
const sentiment = await classifySentiment('I love this product!');
console.log(sentiment); // "positive"
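
An LLM returns free text, so the reply may not be exactly one of the expected labels (for example "Positive." with a capital letter and a period). A small validation step keeps the rest of the application type-safe. The following is a minimal sketch; the helper name `parseSentiment` and the fallback to `neutral` are assumptions made for illustration.

```typescript
type Sentiment = 'positive' | 'negative' | 'neutral';

const SENTIMENTS: Sentiment[] = ['positive', 'negative', 'neutral'];

// Normalize a raw LLM reply to one of the allowed labels.
// Falls back to 'neutral' when the reply is not recognized (an assumption;
// throwing an error or retrying the request are equally valid choices).
function parseSentiment(raw: string | null): Sentiment {
  const cleaned = (raw || '').trim().toLowerCase().replace(/[^a-z]/g, '');
  for (const label of SENTIMENTS) {
    if (cleaned === label) {
      return label;
    }
  }
  return 'neutral';
}

console.log(parseSentiment(' Positive.\n')); // positive
console.log(parseSentiment('I think it is good')); // neutral (unrecognized reply)
```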

Regression: How Much?

Regression predicts a continuous numerical value. The model answers "how much?" or "what number?"

Examples of Regression

// House price prediction
type HouseFeatures = {
  squareFeet: number;
  bedrooms: number;
  bathrooms: number;
  yearBuilt: number;
  zipCode: string;
};

async function predictPrice(house: HouseFeatures): Promise<number> {
  // Returns a specific dollar amount
  return await model.predict(house);
}

// Example:
// { squareFeet: 2000, bedrooms: 3, ... } -> $425,000
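
To make "predict a number" concrete, here is a minimal sketch of the simplest regression model: a straight line fitted by least squares to a single feature. The sample data is made up for illustration, and a real price model would use many features and a library rather than hand-written math.

```typescript
type Sample = { squareFeet: number; price: number };

// Fit price = slope * squareFeet + intercept by ordinary least squares.
function fitLine(samples: Sample[]): { slope: number; intercept: number } {
  const n = samples.length;
  const meanX = samples.reduce((sum, s) => sum + s.squareFeet, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.price, 0) / n;
  let covariance = 0;
  let variance = 0;
  for (const s of samples) {
    covariance += (s.squareFeet - meanX) * (s.price - meanY);
    variance += (s.squareFeet - meanX) * (s.squareFeet - meanX);
  }
  const slope = covariance / variance;
  return { slope: slope, intercept: meanY - slope * meanX };
}

// Made-up training data that happens to lie exactly on a line.
const sales: Sample[] = [
  { squareFeet: 1000, price: 250000 },
  { squareFeet: 1500, price: 350000 },
  { squareFeet: 2000, price: 450000 },
];

const fitted = fitLine(sales);
const predictedPrice = fitted.slope * 1800 + fitted.intercept;
console.log(predictedPrice); // 410000
```

The output is a point on a continuous scale, not a choice from a fixed list, which is what separates regression from classification.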

Classification vs Regression

The key difference is the output type:

// Classification: discrete categories
type CreditRiskCategory = 'low' | 'medium' | 'high';
async function classifyCreditRisk(applicant: Applicant): Promise<CreditRiskCategory> {
  return await model.predict(applicant); // Returns a category
}

// Regression: continuous number
async function predictDefaultProbability(applicant: Applicant): Promise<number> {
  return await model.predict(applicant); // Returns 0.0 to 1.0
}

// Classification: "This applicant is high risk"
// Regression: "This applicant has 73.2% chance of default"
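
The two outputs are related: a continuous score can be turned into a category by applying thresholds. The sketch below shows that step. The cut-off values 0.2 and 0.6 are arbitrary assumptions chosen for illustration, not industry standards.

```typescript
type CreditRiskCategory = 'low' | 'medium' | 'high';

// Convert a regression-style output (default probability between 0 and 1)
// into a classification-style output by bucketing it.
function toRiskCategory(defaultProbability: number): CreditRiskCategory {
  if (defaultProbability < 0.2) return 'low';
  if (defaultProbability < 0.6) return 'medium';
  return 'high';
}

console.log(toRiskCategory(0.05)); // low
console.log(toRiskCategory(0.35)); // medium
console.log(toRiskCategory(0.732)); // high
```

Going the other way is not possible: once only the category is kept, the exact probability is lost.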

Common Regression Tasks

Task                     | Input          | Output
-------------------------|----------------|---------------
House price prediction   | House features | Dollar amount
Temperature forecasting  | Weather data   | Degrees
Stock price prediction   | Market data    | Price
Delivery time estimation | Order details  | Minutes
Age estimation           | Photo          | Years
Energy consumption       | Building data  | Kilowatt-hours

Regression with LLMs

LLMs can do regression-style estimation, though they are not specialized for it. The answer arrives as text, so it has to be parsed into a number:

async function estimateReadingTime(article: string): Promise<number> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `Estimate the reading time in minutes for the given article. 
                  Consider complexity and length. 
                  Respond with only a number.`,
      },
      {
        role: 'user',
        content: article,
      },
    ],
  });

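  // The reply is text: parse it (allowing fractions) and fail loudly if it is not a number
  const minutes = Number.parseFloat(response.choices[0].message.content ?? '');
  if (Number.isNaN(minutes)) {
    throw new Error('Model did not return a number');
  }
  return minutes;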
}
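
Numeric replies from an LLM can also include extra words or implausible values. One possible way to guard against both is sketched below. The helper name `parseEstimate`, the regular expression, and the bounds passed in are assumptions made for illustration.

```typescript
// Extract the first number from an LLM reply and clamp it to a plausible range.
// Returns null when no number is found, so the caller can retry or fall back.
function parseEstimate(raw: string | null, min: number, max: number): number | null {
  const match = (raw || '').match(/-?\d+(\.\d+)?/);
  if (match === null) {
    return null;
  }
  const value = parseFloat(match[0]);
  return Math.min(max, Math.max(min, value));
}

console.log(parseEstimate('About 7.5 minutes', 0, 600)); // 7.5
console.log(parseEstimate('12', 0, 600)); // 12
console.log(parseEstimate('It depends on the reader.', 0, 600)); // null
console.log(parseEstimate('100000', 0, 600)); // 600 (clamped to the upper bound)
```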

Generation: Create Something New

Generation creates new content that did not exist before. This is where LLMs truly shine.

Text Generation

// Generate a product description
async function generateDescription(product: Product): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'Write compelling product descriptions for an e-commerce site.',
      },
      {
        role: 'user',
        content: `Product: ${product.name}
                  Features: ${product.features.join(', ')}
                  Target audience: ${product.audience}`,
      },
    ],
  });

  return response.choices[0].message.content ?? ''; // content can be null
}

// Input: { name: "Wireless Earbuds", features: ["noise canceling", "24hr battery"], ... }
// Output: "Experience pure audio freedom with our premium Wireless Earbuds..."

Code Generation

// Generate code from description
async function generateFunction(description: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'Generate TypeScript code based on the description. Include types.',
      },
      {
        role: 'user',
        content: description,
      },
    ],
  });

  return response.choices[0].message.content ?? ''; // content can be null
}

// Input: "A function that filters an array to keep only even numbers"
// Output: "function filterEven(numbers: number[]): number[] { ... }"

Other Generation Tasks

Task               | Input                | Output
-------------------|----------------------|-----------------
Summarization      | Long text            | Short summary
Translation        | Text in one language | Text in another
Question answering | Question + context   | Answer
Chatbot response   | Conversation history | Reply
Image generation   | Text description     | Image
Music generation   | Style/mood           | Audio

Why LLMs Excel at Generation

LLMs were trained to predict "what comes next" in text. This makes them natural generators:

// LLM training objective: predict next token
// Input: "The capital of France is"
// LLM learns: "Paris" is likely next

// This same ability enables generation:
// Input: "Write a haiku about programming"
// LLM generates token by token:
// "Code" -> "flows" -> "like" -> "water" -> ...
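
The token-by-token loop can be illustrated with a toy model. The sketch below replaces the neural network with a hypothetical hand-written lookup table that maps each word to its single most likely successor. A real LLM instead computes a probability distribution over a large vocabulary of tokens and conditions on the whole preceding context, not just the last word.

```typescript
// Toy "model": for each word, the most likely next word (made-up table).
const nextWord: { [word: string]: string } = {
  code: 'flows',
  flows: 'like',
  like: 'water',
  water: '<end>',
};

// Greedy generation: repeatedly append the most likely next word
// until the model emits the end marker or the length limit is reached.
function generate(start: string, maxWords: number): string[] {
  const words = [start];
  while (words.length < maxWords) {
    const candidate = nextWord[words[words.length - 1]];
    if (candidate === undefined || candidate === '<end>') {
      break;
    }
    words.push(candidate);
  }
  return words;
}

console.log(generate('code', 10).join(' ')); // code flows like water
```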

Supervised vs Unsupervised Learning

Another way to categorize ML is by how models learn:

Supervised Learning

Model learns from labeled examples (input + correct answer):

// Supervised: we tell the model the right answer
const labeledData = [
  { email: 'You won!', label: 'spam' },
  { email: 'Meeting at 3', label: 'not_spam' },
  // Model learns the pattern from these examples
];

Most classification and regression tasks use supervised learning.
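
What "learning from labeled examples" means can be shown with a deliberately tiny classifier. The sketch below counts how often each word appears under each label, then predicts the label whose training words match a new email most often. It is a toy for illustration only: the data is made up, and real spam filters use far more robust models.

```typescript
type LabeledEmail = { email: string; label: 'spam' | 'not_spam' };
type WordCounts = { [label: string]: { [word: string]: number } };

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// "Training": count word occurrences per label.
function train(examples: LabeledEmail[]): WordCounts {
  // Object.create(null) avoids collisions with built-in keys such as "constructor".
  const counts: WordCounts = { spam: Object.create(null), not_spam: Object.create(null) };
  for (const example of examples) {
    for (const word of tokenize(example.email)) {
      counts[example.label][word] = (counts[example.label][word] || 0) + 1;
    }
  }
  return counts;
}

// "Prediction": pick the label whose training words match the input most often.
function predict(counts: WordCounts, email: string): string {
  let spamScore = 0;
  let notSpamScore = 0;
  for (const word of tokenize(email)) {
    spamScore += counts.spam[word] || 0;
    notSpamScore += counts.not_spam[word] || 0;
  }
  return spamScore > notSpamScore ? 'spam' : 'not_spam';
}

const trained = train([
  { email: 'You won a prize!', label: 'spam' },
  { email: 'Claim your free prize now', label: 'spam' },
  { email: 'Meeting at 3', label: 'not_spam' },
  { email: 'Notes from the meeting', label: 'not_spam' },
]);

console.log(predict(trained, 'Free prize inside')); // spam
console.log(predict(trained, 'Agenda for the meeting')); // not_spam
```

The labels are what make this supervised: without them, the counting step would have no way to tell which words belong to which category.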

Unsupervised Learning

Model finds patterns without labeled data:

// Unsupervised: no labels, model finds structure
const unlabeledData = [
  { spending: 100, visits: 5, age: 25 },
  { spending: 5000, visits: 50, age: 45 },
  { spending: 50, visits: 2, age: 30 },
  // Model might discover "high-value" vs "casual" customer groups
];

// Clustering: group similar items
const clusters = await model.cluster(unlabeledData);
// Returns: { cluster1: [...], cluster2: [...] }
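
As one possible way to picture what a `cluster` call might do internally, here is a minimal sketch of k-means, a common clustering algorithm, applied to a single feature (spending). The use of k-means, the choice of two clusters, the starting centers, and the sample values are all assumptions made for illustration; the `model.cluster` call above is not tied to any specific algorithm.

```typescript
// Index of the center closest to the value.
function nearestCenter(value: number, centers: number[]): number {
  let best = 0;
  for (let i = 1; i < centers.length; i++) {
    if (Math.abs(value - centers[i]) < Math.abs(value - centers[best])) {
      best = i;
    }
  }
  return best;
}

// Assign each value to its nearest center, then move each center to the
// mean of its assigned values. Repeat a fixed number of times.
function kMeans1D(values: number[], initialCenters: number[], iterations: number): number[] {
  let centers = initialCenters.slice();
  for (let step = 0; step < iterations; step++) {
    const sums = centers.map(() => 0);
    const counts = centers.map(() => 0);
    for (const value of values) {
      const nearest = nearestCenter(value, centers);
      sums[nearest] += value;
      counts[nearest] += 1;
    }
    // Keep the old center if no value was assigned to it.
    centers = centers.map((center, i) => (counts[i] > 0 ? sums[i] / counts[i] : center));
  }
  return centers;
}

// Made-up monthly spending values with two visible groups.
const spending = [50, 100, 150, 4000, 5000, 6000];
const foundCenters = kMeans1D(spending, [0, 1000], 10);

console.log(foundCenters.join(', ')); // 100, 5000
console.log(spending.map((v) => nearestCenter(v, foundCenters)).join(', ')); // 0, 0, 0, 1, 1, 1
```

No labels were supplied. The "casual" and "high-value" groups emerge from the numbers alone, and naming them is left to a human.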

Common unsupervised tasks:

  • Clustering: Group similar items (customer segmentation)
  • Anomaly detection: Find unusual patterns (fraud detection)
  • Dimensionality reduction: Simplify complex data
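
Anomaly detection can likewise be sketched without labels. One simple approach, assumed here purely for illustration, flags values that lie far from the mean when measured in standard deviations (a z-score). The threshold of 2 and the transaction amounts are made up.

```typescript
// Flag values whose distance from the mean exceeds `threshold` standard deviations.
function findAnomalies(values: number[], threshold: number): number[] {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / values.length;
  const stdDev = Math.sqrt(variance);
  if (stdDev === 0) {
    return []; // All values are identical, so nothing stands out.
  }
  return values.filter((v) => Math.abs(v - mean) / stdDev > threshold);
}

// Made-up transaction amounts; one is far larger than the rest.
const amounts = [20, 25, 22, 19, 24, 21, 23, 500];
console.log(findAnomalies(amounts, 2).join(', ')); // 500
```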

Self-Supervised Learning

LLMs use a clever trick called self-supervised learning. The labels are created automatically from the data itself:

// Self-supervised: the data creates its own labels
const text = 'The cat sat on the mat';

// Training examples generated automatically:
// Input: "The cat sat on the" -> Label: "mat"
// Input: "The cat sat on" -> Label: "the"
// Input: "The cat sat" -> Label: "on"

// No human labeling needed!

This is how LLMs can train on trillions of words without manual labeling.
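
The automatic labeling step can be sketched in a few lines. The function below splits on whitespace for simplicity, which is an assumption; real LLMs operate on subword tokens rather than whole words.

```typescript
type TrainingPair = { input: string; label: string };

// Turn raw text into (context, next word) pairs without any human labeling.
function makeTrainingPairs(text: string): TrainingPair[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const pairs: TrainingPair[] = [];
  for (let i = 1; i < words.length; i++) {
    pairs.push({ input: words.slice(0, i).join(' '), label: words[i] });
  }
  return pairs;
}

const examplePairs = makeTrainingPairs('The cat sat on the mat');
const lastPair = examplePairs[examplePairs.length - 1];

console.log(examplePairs.length); // 5
console.log(lastPair.input + ' -> ' + lastPair.label); // The cat sat on the -> mat
```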


Choosing the Right Approach

How do you know which type of ML task you need?

Decision Framework

┌─────────────────────────────────────────────────────────┐
│                    What do you need?                     │
│                                                         │
│  "Assign to a category"  ──────> Classification         │
│                                                         │
│  "Predict a number"      ──────> Regression             │
│                                                         │
│  "Create new content"    ──────> Generation             │
│                                                         │
│  "Find groups/patterns"  ──────> Clustering             │
│                                                         │
│  "Find outliers"         ──────> Anomaly Detection      │
│                                                         │
└─────────────────────────────────────────────────────────┘

Example Scenarios

Scenario                          | Task Type                 | Why
----------------------------------|---------------------------|---------------------
"Should we approve this loan?"    | Classification            | Yes/No decision
"How much should we charge?"      | Regression                | Continuous number
"Write an email reply"            | Generation                | Create new content
"Group similar customers"         | Clustering                | Find natural groups
"Is this transaction suspicious?" | Classification or Anomaly | Category or outlier
"What will sales be next month?"  | Regression                | Predict a number
"Summarize this document"         | Generation                | Create new text

LLMs Can Do Many Tasks

Modern LLMs are versatile and can handle multiple task types:

// Same model, different tasks

// Classification
const category = await llm.complete(
  'Classify this email as billing, support, or sales: [email text]'
);

// Regression (sort of)
const estimate = await llm.complete('Estimate the word count of a 5-page document');

// Generation
const summary = await llm.complete('Summarize this article in 3 sentences: [article]');

// Extraction (a form of classification/generation)
const entities = await llm.complete('Extract all company names from this text: [text]');

Specialized Models vs General LLMs

When should you use a specialized model vs an LLM?

Specialized Models

Better for:

  • High-volume, low-latency needs
  • Specific, well-defined tasks
  • Cost-sensitive applications
  • When you have good training data

// Specialized sentiment model
// - Fast (milliseconds)
// - Cheap (trained once, runs locally)
// - Accurate for this specific task
const sentiment = await sentimentModel.predict(text);

General LLMs

Better for:

  • Flexible, varied tasks
  • Natural language understanding
  • Complex reasoning
  • When you do not have training data
  • Rapid prototyping

// LLM for sentiment
// - Slower (hundreds of milliseconds)
// - Per-request cost
// - Can handle nuance and context
// - No training data needed
const sentiment = await llm.complete('What is the sentiment of: ' + text);

The Practical Reality

For most developers building applications:

  1. Start with LLMs - faster to build, no training needed
  2. Move to specialized models - when you need speed/cost optimization
  3. Combine both - use LLMs for complex tasks, specialized for simple ones
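
The "combine both" option is often built as a confidence-based fallback: try the cheap specialized model first and call the LLM only when that model is unsure. The sketch below uses stand-in functions for both models. Their names, the `confidence` field, and the 0.8 threshold are hypothetical, and real model calls would be asynchronous.

```typescript
type Prediction = { label: string; confidence: number };
type Classifier = (text: string) => Prediction;

// Route to the fast model first; fall back to the slower, costlier one
// only when the fast model's confidence is below the threshold.
function classifyWithFallback(
  text: string,
  fastModel: Classifier,
  llm: Classifier,
  minConfidence: number
): { label: string; usedLlm: boolean } {
  const quick = fastModel(text);
  if (quick.confidence >= minConfidence) {
    return { label: quick.label, usedLlm: false };
  }
  return { label: llm(text).label, usedLlm: true };
}

// Stand-in models for demonstration (not real model calls).
const fastStub: Classifier = (text) =>
  text.indexOf('love') >= 0
    ? { label: 'positive', confidence: 0.95 }
    : { label: 'neutral', confidence: 0.4 };
const llmStub: Classifier = () => ({ label: 'negative', confidence: 0.9 });

const easyCase = classifyWithFallback('I love this product!', fastStub, llmStub, 0.8);
const hardCase = classifyWithFallback('Well, that was just great...', fastStub, llmStub, 0.8);

console.log(easyCase.label, easyCase.usedLlm); // positive false
console.log(hardCase.label, hardCase.usedLlm); // negative true
```

With this arrangement the per-request LLM cost is paid only for the inputs the specialized model cannot handle confidently.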

Exercises

Exercise 1: Classify the Task

For each scenario, identify the ML task type (classification, regression, generation, clustering):

  1. Predict how many stars a user will give a product
  2. Decide if a news article is real or fake
  3. Write a response to a customer complaint
  4. Group news articles by topic (without predefined categories)
  5. Estimate delivery time for a package
  6. Tag a photo with relevant keywords
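  7. Translate a document from English to Spanish

Solution

  1. Regression - Predicting a number (1-5 stars). Because there are only five possible values, treating it as multi-class classification is also a reasonable answer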
  2. Classification - Binary category (real/fake)
  3. Generation - Creating new text
  4. Clustering - Finding groups without labels
  5. Regression - Predicting a number (minutes/hours)
  6. Multi-label Classification - Multiple categories
  7. Generation - Creating new text (translation)

Exercise 2: Design the Solution

You are building a customer service system. For each requirement, describe what type of ML task you would use and why:

  1. Route incoming emails to the right department
  2. Suggest response templates to agents
  3. Predict how long a ticket will take to resolve
  4. Identify VIP customers who might churn
  5. Generate personalized responses to common questions

Solution

  1. Multi-class Classification - Route to billing/support/sales/returns

    • Input: email text
    • Output: one department
  2. Retrieval/Classification - Match to existing templates

    • Input: customer email
    • Output: relevant template IDs
  3. Regression - Predict resolution time

    • Input: ticket details
    • Output: hours/days
  4. Classification + Regression - Two-step approach

    • Classification: is customer at risk? (yes/no)
    • Regression: churn probability (0-100%)
  5. Generation - Create custom responses

    • Input: customer question + context
    • Output: personalized answer

Exercise 3: LLM Prompt Design

Write prompts that would make an LLM perform each task type:

  1. Binary classification (spam detection)
  2. Multi-class classification (emotion detection)
  3. Regression-like estimation (difficulty rating)
  4. Generation (product description)

Solution

// 1. Binary classification
const spamPrompt = `
Classify the following email as "spam" or "not_spam".
Respond with only one word: either "spam" or "not_spam".

Email: ${emailText}
`;

// 2. Multi-class classification
const emotionPrompt = `
Identify the primary emotion in this text.
Choose exactly one: happy, sad, angry, fearful, surprised, disgusted.
Respond with only the emotion word.

Text: ${text}
`;

// 3. Regression-like estimation
const difficultyPrompt = `
Rate the difficulty of this programming problem on a scale of 1-10,
where 1 is trivial and 10 is expert-level.
Respond with only a number.

Problem: ${problemDescription}
`;

// 4. Generation
const descriptionPrompt = `
Write a compelling product description for an e-commerce website.
Keep it under 100 words. Highlight key benefits.

Product name: ${name}
Features: ${features}
Target customer: ${audience}
`;

Key Takeaways

  1. Classification assigns inputs to categories (which one?)
  2. Regression predicts numerical values (how much?)
  3. Generation creates new content (what should I write?)
  4. Supervised learning uses labeled data; unsupervised finds patterns without labels
  5. LLMs can perform classification, rough numeric estimation, and text generation through prompting alone
  6. Choose specialized models for high-volume/low-cost; LLMs for flexibility
  7. Start with LLMs, optimize with specialized models when needed

Resources

Resource                                   | Type          | Description
-------------------------------------------|---------------|-------------------------------
Google ML Concepts                         | Tutorial      | ML terminology and task types
Scikit-learn: Choosing the Right Estimator | Documentation | Visual guide to ML algorithms
OpenAI Cookbook: Classification            | Tutorial      | Using GPT for classification

Next Lesson

Now that you understand the types of ML tasks, let us see how AI is being used in the real world across different industries.

Continue to Lesson 1.4: AI in the Real World