Lesson 1.3: Types of ML Tasks

Duration: 60 minutes

Learning Objectives

By the end of this lesson, you will be able to:

Identify the main categories of machine learning tasks
Understand when to use classification, regression, or generation
Recognize supervised vs unsupervised learning
Match real-world problems to appropriate ML approaches
Understand what types of tasks LLMs can handle

The Three Main Categories

Machine learning tasks fall into three broad categories based on what you want the model to output: classification (a category), regression (a number), and generation (new content).

Let us explore each one.

Classification: Which Category?

Classification assigns inputs to predefined categories. The model answers "which one?"

Binary Classification

Two possible outcomes:

// Spam detection
type SpamPrediction = 'spam' | 'not_spam';

async function classifyEmail(email: Email): Promise<SpamPrediction> {
  // `model` is a placeholder for any trained classifier; it returns one of two categories
  return await model.predict(email);
}

// Examples:
// "You won $1000!" -> "spam"
// "Meeting at 3pm" -> "not_spam"

Common binary classification tasks:

Spam detection (spam / not spam)
Fraud detection (fraudulent / legitimate)
Medical diagnosis (disease present / absent)
Sentiment (positive / negative)
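
Many binary classifiers produce a score between 0 and 1 and then apply a cutoff to pick one of the two labels. The sketch below illustrates that idea only; `toLabel`, the example scores, and the default threshold of 0.5 are hypothetical and not tied to any particular model.

```typescript
// Hypothetical sketch: turning a model's spam score into a binary label.
type SpamLabel = 'spam' | 'not_spam';

// `score` is assumed to be the model's estimated probability of spam (0 to 1).
function toLabel(score: number, threshold = 0.5): SpamLabel {
  return score >= threshold ? 'spam' : 'not_spam';
}

console.log(toLabel(0.92)); // "spam"
console.log(toLabel(0.08)); // "not_spam"

// A stricter threshold flags fewer emails as spam.
console.log(toLabel(0.6, 0.8)); // "not_spam"
```

Raising the threshold reduces false alarms at the cost of letting more spam through, so the right value depends on which mistake is more expensive.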

Multi-class Classification

Multiple possible categories (pick one):

// Document categorization
type DocumentCategory = 'sports' | 'politics' | 'technology' | 'entertainment' | 'business';

async function categorizeArticle(article: string): Promise<DocumentCategory> {
  return await model.predict(article);
}

// Examples:
// "The team won the championship..." -> "sports"
// "New AI chip announced..." -> "technology"

Common multi-class tasks:

Email routing (billing / support / sales / other)
Language detection (English / Spanish / French / ...)
Image recognition (cat / dog / bird / ...)
Intent detection (book flight / check status / cancel / ...)
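
A multi-class classifier typically scores every category and returns the single best one. The following is a minimal sketch of that "pick the highest score" step; `pickCategory` and the scores are made up for illustration.

```typescript
// Hypothetical sketch: choose the one category with the highest score.
type Category = 'sports' | 'politics' | 'technology';

function pickCategory(scores: Record<Category, number>): Category {
  let best: Category = 'sports';
  for (const category of Object.keys(scores) as Category[]) {
    if (scores[category] > scores[best]) best = category;
  }
  return best;
}

// Made-up scores for an article about a new AI chip.
const articleScores: Record<Category, number> = {
  sports: 0.05,
  politics: 0.15,
  technology: 0.8,
};

console.log(pickCategory(articleScores)); // "technology"
```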

Multi-label Classification

Multiple categories can apply simultaneously:

// Movie genre classification
type Genre = 'action' | 'comedy' | 'drama' | 'horror' | 'romance';

async function classifyMovie(movie: Movie): Promise<Genre[]> {
  // Returns multiple applicable genres
  return await model.predict(movie);
}

// Examples:
// "Romantic Comedy" -> ["comedy", "romance"]
// "Action Horror" -> ["action", "horror"]

Common multi-label tasks:

Content tagging (blog post topics)
Product categorization (multiple categories)
Skill detection (multiple skills in a resume)
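
The difference from multi-class is that each label is decided independently, so zero, one, or several labels can pass. A minimal sketch, assuming the model returns one score per genre (the scores and the 0.5 threshold are hypothetical):

```typescript
// Hypothetical sketch: keep every genre whose score clears the threshold.
type MovieGenre = 'action' | 'comedy' | 'drama' | 'horror' | 'romance';

function pickGenres(scores: Record<MovieGenre, number>, threshold = 0.5): MovieGenre[] {
  return (Object.keys(scores) as MovieGenre[]).filter((genre) => scores[genre] >= threshold);
}

// Made-up scores for a romantic comedy.
const romComScores: Record<MovieGenre, number> = {
  action: 0.1,
  comedy: 0.9,
  drama: 0.3,
  horror: 0.02,
  romance: 0.85,
};

console.log(pickGenres(romComScores)); // ["comedy", "romance"]
```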

Classification in Code

Here is how you might use classification via an API:

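// Using an LLM for classification
import OpenAI from 'openai';

// The client reads the API key from the OPENAI_API_KEY environment variable
const openai = new OpenAI();
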
async function classifySentiment(text: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `Classify the sentiment of the text as exactly one of: 
                  positive, negative, or neutral. 
                  Respond with only the classification.`,
      },
      {
        role: 'user',
        content: text,
      },
    ],
  });

  // content can be null, so fall back to an empty string before normalizing
  return (response.choices[0].message.content ?? '').trim().toLowerCase();
}

// Usage
const sentiment = await classifySentiment('I love this product!');
console.log(sentiment); // "positive"
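
An LLM returns free text, so the reply is not guaranteed to be one of the allowed labels. One possible safeguard is to validate the reply before using it. The helper below is a hypothetical sketch (`parseSentiment` is not part of any SDK) and runs without calling an API:

```typescript
// Hypothetical sketch: accept an LLM reply only if it is an allowed label.
const SENTIMENTS = ['positive', 'negative', 'neutral'] as const;
type Sentiment = (typeof SENTIMENTS)[number];

function parseSentiment(raw: string): Sentiment | null {
  // Normalize case and strip whitespace and punctuation, e.g. " Positive.\n"
  const cleaned = raw.trim().toLowerCase().replace(/[^a-z]/g, '');
  return SENTIMENTS.find((label) => label === cleaned) ?? null;
}

console.log(parseSentiment(' Positive.\n')); // "positive"
console.log(parseSentiment('I think it is mostly positive')); // null
```

Returning `null` for anything unexpected lets the caller decide whether to retry, fall back to a default, or flag the item for review.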

Regression: How Much?

Regression predicts a continuous numerical value. The model answers "how much?" or "what number?"

Examples of Regression

// House price prediction
type HouseFeatures = {
  squareFeet: number;
  bedrooms: number;
  bathrooms: number;
  yearBuilt: number;
  zipCode: string;
};

async function predictPrice(house: HouseFeatures): Promise<number> {
  // Returns a specific dollar amount
  return await model.predict(house);
}

// Example:
// { squareFeet: 2000, bedrooms: 3, ... } -> $425,000
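
To make "predicting a number" concrete, here is a minimal sketch of the simplest regression model: a straight line fitted by least squares to a single feature. The sales data is made up and deliberately perfectly linear; real price models use many features and noisier data.

```typescript
// Hypothetical sketch: fit price = slope * squareFeet + intercept by least squares.
type Sale = { squareFeet: number; price: number };

function fitLine(sales: Sale[]): { slope: number; intercept: number } {
  const n = sales.length;
  const meanX = sales.reduce((sum, s) => sum + s.squareFeet, 0) / n;
  const meanY = sales.reduce((sum, s) => sum + s.price, 0) / n;

  let covariance = 0;
  let variance = 0;
  for (const s of sales) {
    covariance += (s.squareFeet - meanX) * (s.price - meanY);
    variance += (s.squareFeet - meanX) * (s.squareFeet - meanX);
  }

  const slope = covariance / variance;
  return { slope, intercept: meanY - slope * meanX };
}

// Made-up training data: every extra square foot adds exactly $200.
const line = fitLine([
  { squareFeet: 1000, price: 200000 },
  { squareFeet: 1500, price: 300000 },
  { squareFeet: 2000, price: 400000 },
]);

// Predict the price of an unseen 2,500 sq ft house.
const predicted = line.slope * 2500 + line.intercept;
console.log(predicted); // 500000
```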

Classification vs Regression

The key difference is the output type:

// Classification: discrete categories
type CreditRiskCategory = 'low' | 'medium' | 'high';
async function classifyCreditRisk(applicant: Applicant): Promise<CreditRiskCategory> {
  return await model.predict(applicant); // Returns a category
}

// Regression: continuous number
async function predictDefaultProbability(applicant: Applicant): Promise<number> {
  return await model.predict(applicant); // Returns 0.0 to 1.0
}

// Classification: "This applicant is high risk"
// Regression: "This applicant has 73.2% chance of default"
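
The two views are related: a continuous score can be bucketed into categories when a decision needs discrete labels. A minimal sketch, where the cutoffs 0.2 and 0.6 are arbitrary assumptions chosen for illustration:

```typescript
// Hypothetical sketch: convert a predicted default probability into a risk band.
type RiskBand = 'low' | 'medium' | 'high';

function toRiskBand(defaultProbability: number): RiskBand {
  if (defaultProbability < 0.2) return 'low';
  if (defaultProbability < 0.6) return 'medium';
  return 'high';
}

console.log(toRiskBand(0.732)); // "high"
console.log(toRiskBand(0.35)); // "medium"
console.log(toRiskBand(0.05)); // "low"
```

The reverse is not possible: a category alone cannot be turned back into the exact number, which is why the regression output carries more information.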

Common Regression Tasks

Task	Input	Output
House price prediction	House features	Dollar amount
Temperature forecasting	Weather data	Degrees
Stock price prediction	Market data	Price
Delivery time estimation	Order details	Minutes
Age estimation	Photo	Years
Energy consumption	Building data	Kilowatt-hours

Regression with LLMs

LLMs can produce numeric estimates, though they are not specialized for regression and the number has to be parsed out of their text reply:

async function estimateReadingTime(article: string): Promise<number> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `Estimate the reading time in minutes for the given article. 
                  Consider complexity and length. 
                  Respond with only a number.`,
      },
      {
        role: 'user',
        content: article,
      },
    ],
  });

  // Reading time may be fractional; the result is NaN if the reply is not a number
  return Number.parseFloat((response.choices[0].message.content ?? '').trim());
}
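
Even with an instruction like "Respond with only a number", a reply may include extra words or units. One defensive option is to extract the first number and signal failure explicitly. `parseEstimate` below is a hypothetical helper, shown without any API call:

```typescript
// Hypothetical sketch: pull a numeric estimate out of an LLM's text reply.
function parseEstimate(raw: string): number | null {
  // Match the first integer or decimal, e.g. "About 7.5 minutes" -> "7.5"
  const match = raw.match(/\d+(\.\d+)?/);
  if (!match) return null;

  const value = Number.parseFloat(match[0] ?? '');
  return Number.isFinite(value) ? value : null;
}

console.log(parseEstimate('7')); // 7
console.log(parseEstimate('About 7.5 minutes')); // 7.5
console.log(parseEstimate('I cannot estimate that')); // null
```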

Generation: Create Something New

Generation creates new content that did not exist before. This is where LLMs truly shine.

Text Generation

// Generate a product description
async function generateDescription(product: Product): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'Write compelling product descriptions for an e-commerce site.',
      },
      {
        role: 'user',
        content: `Product: ${product.name}
                  Features: ${product.features.join(', ')}
                  Target audience: ${product.audience}`,
      },
    ],
  });

  // content can be null, so fall back to an empty string
  return response.choices[0].message.content ?? '';
}

// Input: { name: "Wireless Earbuds", features: ["noise canceling", "24hr battery"], ... }
// Output: "Experience pure audio freedom with our premium Wireless Earbuds..."

Code Generation

// Generate code from description
async function generateFunction(description: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'Generate TypeScript code based on the description. Include types.',
      },
      {
        role: 'user',
        content: description,
      },
    ],
  });

  // content can be null, so fall back to an empty string
  return response.choices[0].message.content ?? '';
}

// Input: "A function that filters an array to keep only even numbers"
// Output: "function filterEven(numbers: number[]): number[] { ... }"

Other Generation Tasks

Task	Input	Output
Summarization	Long text	Short summary
Translation	Text in one language	Text in another
Question answering	Question + context	Answer
Chatbot response	Conversation history	Reply
Image generation	Text description	Image
Music generation	Style/mood	Audio

Why LLMs Excel at Generation

LLMs were trained to predict "what comes next" in text. This makes them natural generators:

// LLM training objective: predict next token
// Input: "The capital of France is"
// LLM learns: "Paris" is likely next

// This same ability enables generation:
// Input: "Write a haiku about programming"
// LLM generates token by token:
// "Code" -> "flows" -> "like" -> "water" -> ...
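
The token-by-token loop can be sketched with a toy stand-in for the model. Here the "model" is a made-up lookup table that maps the current token to its most likely successor. A real LLM conditions on the entire preceding sequence and samples from a probability distribution, so treat this purely as an illustration of the loop.

```typescript
// Toy "model" (made up): the single most likely next token for each token.
const nextToken: Record<string, string> = {
  '<start>': 'Code',
  Code: 'flows',
  flows: 'like',
  like: 'water',
  water: '<end>',
};

// Generate one token at a time until the end marker or the token limit.
function generate(maxTokens = 10): string[] {
  const output: string[] = [];
  let current = '<start>';

  for (let i = 0; i < maxTokens; i++) {
    const next = nextToken[current];
    if (next === undefined || next === '<end>') break;
    output.push(next);
    current = next; // the new token becomes the context for the next step
  }
  return output;
}

console.log(generate().join(' ')); // "Code flows like water"
```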

Supervised vs Unsupervised Learning

Another way to categorize ML is by how models learn:

Supervised Learning

Model learns from labeled examples (input + correct answer):

// Supervised: we tell the model the right answer
const labeledData = [
  { email: 'You won!', label: 'spam' },
  { email: 'Meeting at 3', label: 'not_spam' },
  // Model learns the pattern from these examples
];

Most classification and regression tasks use supervised learning.
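
As a rough illustration of learning from labeled examples, the sketch below "trains" by counting which words appear under each label and "predicts" by letting the words of a new email vote. This is a deliberately simplified, hypothetical classifier with made-up data, not how production spam filters work.

```typescript
type Label = 'spam' | 'not_spam';
type Example = { email: string; label: Label };
type WordCounts = Map<string, Record<Label, number>>;

const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/\W+/).filter(Boolean);

// "Training": count how often each word appears under each label.
function train(examples: Example[]): WordCounts {
  const counts: WordCounts = new Map();
  for (const { email, label } of examples) {
    for (const word of tokenize(email)) {
      const entry = counts.get(word) ?? { spam: 0, not_spam: 0 };
      entry[label] += 1;
      counts.set(word, entry);
    }
  }
  return counts;
}

// "Prediction": each known word votes for the label it was seen under more often.
function predict(counts: WordCounts, email: string): Label {
  let spamVotes = 0;
  for (const word of tokenize(email)) {
    const entry = counts.get(word);
    if (entry) spamVotes += entry.spam - entry.not_spam;
  }
  return spamVotes > 0 ? 'spam' : 'not_spam';
}

// Made-up labeled examples.
const spamCounts = train([
  { email: 'You won a prize!', label: 'spam' },
  { email: 'Claim your prize now', label: 'spam' },
  { email: 'Meeting at 3', label: 'not_spam' },
  { email: 'Notes from the meeting', label: 'not_spam' },
]);

console.log(predict(spamCounts, 'Claim the prize')); // "spam"
console.log(predict(spamCounts, 'Meeting notes')); // "not_spam"
```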

Unsupervised Learning

Model finds patterns without labeled data:

// Unsupervised: no labels, model finds structure
const unlabeledData = [
  { spending: 100, visits: 5, age: 25 },
  { spending: 5000, visits: 50, age: 45 },
  { spending: 50, visits: 2, age: 30 },
  // Model might discover "high-value" vs "casual" customer groups
];

// Clustering: group similar items
const clusters = await model.cluster(unlabeledData);
// Returns: { cluster1: [...], cluster2: [...] }

Common unsupervised tasks:

Clustering: Group similar items (customer segmentation)
Anomaly detection: Find unusual patterns (fraud detection)
Dimensionality reduction: Simplify complex data
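
Clustering can be illustrated with a minimal k-means sketch on a single feature. The version below assumes exactly two clusters and one-dimensional data, and the spending values are made up; library implementations handle many dimensions and any number of clusters.

```typescript
// Minimal 1-D k-means with two clusters (k = 2).
function twoMeans(values: number[], iterations = 10): { low: number[]; high: number[] } {
  // Start the two centers at the smallest and largest values.
  let centerLow = Math.min(...values);
  let centerHigh = Math.max(...values);
  let low: number[] = [];
  let high: number[] = [];

  const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

  for (let i = 0; i < iterations; i++) {
    low = [];
    high = [];
    // Assignment step: each value joins its nearest center.
    for (const v of values) {
      if (Math.abs(v - centerLow) <= Math.abs(v - centerHigh)) low.push(v);
      else high.push(v);
    }
    // Update step: move each center to the mean of its members.
    if (low.length > 0) centerLow = mean(low);
    if (high.length > 0) centerHigh = mean(high);
  }
  return { low, high };
}

// Made-up monthly spending per customer; no labels are provided.
const spending = [100, 5000, 50, 4200, 75];
console.log(twoMeans(spending));
// { low: [ 100, 50, 75 ], high: [ 5000, 4200 ] }
```

The algorithm never sees the words "casual" or "high-value"; naming the discovered groups is left to a human.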

Self-Supervised Learning

LLMs use a clever trick. Their training labels come from the data itself:

// Self-supervised: the data creates its own labels
const text = 'The cat sat on the mat';

// Training examples generated automatically:
// Input: "The cat sat on the" -> Label: "mat"
// Input: "The cat sat on" -> Label: "the"
// Input: "The cat sat" -> Label: "on"

// No human labeling needed!

This is how LLMs can train on trillions of words without manual labeling.
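
The automatic labeling step can be sketched in a few lines. This simplified version splits on spaces and treats whole words as units, whereas real LLMs operate on tokens; `makeNextWordPairs` is a hypothetical helper.

```typescript
type TrainingPair = { input: string; label: string };

// Turn raw text into (prefix, next word) training pairs with no human labeling.
function makeNextWordPairs(text: string): TrainingPair[] {
  const words = text.split(' ').filter(Boolean);
  return words.slice(1).map((label, i) => ({
    input: words.slice(0, i + 1).join(' '), // everything before the label
    label,
  }));
}

const pairs = makeNextWordPairs('The cat sat on the mat');
for (const pair of pairs) {
  console.log(`"${pair.input}" -> "${pair.label}"`);
}
// "The" -> "cat"
// "The cat" -> "sat"
// "The cat sat" -> "on"
// "The cat sat on" -> "the"
// "The cat sat on the" -> "mat"
```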

Choosing the Right Approach

How do you know which type of ML task you need?

Decision Framework
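
One way to sketch such a framework is as a short sequence of questions about the output you want. The helper below is a hypothetical illustration based on the categories covered in this lesson, not a definitive rule set; the field names are assumptions.

```typescript
type TaskType = 'classification' | 'regression' | 'generation' | 'clustering';

interface Problem {
  createsNewContent: boolean; // e.g. write, summarize, translate
  outputIsNumber: boolean; // e.g. price, minutes, probability
  hasPredefinedCategories: boolean; // e.g. spam / not spam
}

// Ask the questions in order and stop at the first "yes".
function chooseTaskType(problem: Problem): TaskType {
  if (problem.createsNewContent) return 'generation';
  if (problem.outputIsNumber) return 'regression';
  if (problem.hasPredefinedCategories) return 'classification';
  return 'clustering'; // grouping without predefined labels
}

// "How much should we charge?"
console.log(
  chooseTaskType({ createsNewContent: false, outputIsNumber: true, hasPredefinedCategories: false }),
); // "regression"
```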

Example Scenarios

Scenario	Task Type	Why
"Should we approve this loan?"	Classification	Yes/No decision
"How much should we charge?"	Regression	Continuous number
"Write an email reply"	Generation	Create new content
"Group similar customers"	Clustering	Find natural groups
"Is this transaction suspicious?"	Classification or Anomaly	Category or outlier
"What will sales be next month?"	Regression	Predict a number
"Summarize this document"	Generation	Create new text

LLMs Can Do Many Tasks

Modern LLMs are versatile and can handle multiple task types:

// Same model, different tasks

// Classification
const category = await llm.complete(
  'Classify this email as billing, support, or sales: [email text]'
);

// Regression (sort of)
const estimate = await llm.complete('Estimate the word count of a 5-page document: ');

// Generation
const summary = await llm.complete('Summarize this article in 3 sentences: [article]');

// Extraction (a form of classification/generation)
const entities = await llm.complete('Extract all company names from this text: [text]');

Specialized Models vs General LLMs

When should you use a specialized model vs an LLM?

Specialized Models

Better for:

High-volume, low-latency needs
Specific, well-defined tasks
Cost-sensitive applications
When you have good training data

// Specialized sentiment model
// - Fast (milliseconds)
// - Cheap (trained once, runs locally)
// - Accurate for this specific task
const sentiment = await sentimentModel.predict(text);

General LLMs

Better for:

Flexible, varied tasks
Natural language understanding
Complex reasoning
When you do not have training data
Rapid prototyping

// LLM for sentiment
// - Slower (hundreds of milliseconds)
// - Per-request cost
// - Can handle nuance and context
// - No training data needed
const sentiment = await llm.complete('What is the sentiment of: ' + text);

The Practical Reality

For most developers building applications:

Start with LLMs - faster to build, no training needed
Move to specialized models - when you need speed/cost optimization
Combine both - use LLMs for complex tasks, specialized for simple ones
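
The "combine both" option is often implemented as a confidence check: accept the specialized model's answer when it is confident, and escalate to the LLM otherwise. The sketch below covers only the routing decision; `route`, the `FastPrediction` shape, and the 0.8 cutoff are hypothetical.

```typescript
// Assumed output of a specialized model: a label plus a confidence in [0, 1].
type FastPrediction = { label: string; confidence: number };

type Route = { useLlm: false; label: string } | { useLlm: true };

// Keep the cheap answer when confident; otherwise hand the input to the LLM.
function route(prediction: FastPrediction, minConfidence = 0.8): Route {
  return prediction.confidence >= minConfidence
    ? { useLlm: false, label: prediction.label }
    : { useLlm: true };
}

console.log(route({ label: 'positive', confidence: 0.95 }));
// { useLlm: false, label: 'positive' }

console.log(route({ label: 'positive', confidence: 0.55 }));
// { useLlm: true }
```

With this split, the per-request LLM cost is paid only for the ambiguous inputs, while clear-cut cases stay fast and cheap.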

Exercises

Exercise 1: Classify the Task

For each scenario, identify the ML task type (classification, regression, generation, clustering):

Predict how many stars a user will give a product
Decide if a news article is real or fake
Write a response to a customer complaint
Group news articles by topic (without predefined categories)
Estimate delivery time for a package
Tag a photo with relevant keywords
Translate a document from English to Spanish

Solution

Regression - Predicting a number (1-5 stars); because star ratings are discrete and ordered, treating this as classification is also defensible
Classification - Binary category (real/fake)
Generation - Creating new text
Clustering - Finding groups without labels
Regression - Predicting a number (minutes/hours)
Multi-label Classification - Multiple categories
Generation - Creating new text (translation)

Exercise 2: Design the Solution

You are building a customer service system. For each requirement, describe what type of ML task you would use and why:

Route incoming emails to the right department
Suggest response templates to agents
Predict how long a ticket will take to resolve
Identify VIP customers who might churn
Generate personalized responses to common questions

Solution

Multi-class Classification - Route to billing/support/sales/returns
- Input: email text
- Output: one department
Retrieval/Classification - Match to existing templates
- Input: customer email
- Output: relevant template IDs
Regression - Predict resolution time
- Input: ticket details
- Output: hours/days
Classification + Regression - Two-step approach
- Classification: is customer at risk? (yes/no)
- Regression: churn probability (0-100%)
Generation - Create custom responses
- Input: customer question + context
- Output: personalized answer

Exercise 3: LLM Prompt Design

Write prompts that would make an LLM perform each task type:

Binary classification (spam detection)
Multi-class classification (emotion detection)
Regression-like estimation (difficulty rating)
Generation (product description)

Solution

// 1. Binary classification
const spamPrompt = `
Classify the following email as "spam" or "not_spam".
Respond with only one word: either "spam" or "not_spam".

Email: ${emailText}
`;

// 2. Multi-class classification
const emotionPrompt = `
Identify the primary emotion in this text.
Choose exactly one: happy, sad, angry, fearful, surprised, disgusted.
Respond with only the emotion word.

Text: ${text}
`;

// 3. Regression-like estimation
const difficultyPrompt = `
Rate the difficulty of this programming problem on a scale of 1-10,
where 1 is trivial and 10 is expert-level.
Respond with only a number.

Problem: ${problemDescription}
`;

// 4. Generation
const descriptionPrompt = `
Write a compelling product description for an e-commerce website.
Keep it under 100 words. Highlight key benefits.

Product name: ${name}
Features: ${features}
Target customer: ${audience}
`;

Key Takeaways

Classification assigns inputs to categories (which one?)
Regression predicts numerical values (how much?)
Generation creates new content (what should I write?)
Supervised learning uses labeled data; unsupervised finds patterns without labels
LLMs can attempt all of these tasks through prompting, though they are not specialized for regression
Choose specialized models for high-volume/low-cost; LLMs for flexibility
Start with LLMs, optimize with specialized models when needed

Resources

Resource	Type	Description
Google ML Concepts	Tutorial	ML terminology and task types
Scikit-learn: Choosing the Right Estimator	Documentation	Visual guide to ML algorithms
OpenAI Cookbook: Classification	Tutorial	Using GPT for classification

Next Lesson

Now that you understand the types of ML tasks, let us see how AI is being used in the real world across different industries.

Continue to Lesson 1.4: AI in the Real World