Lesson 1.3: Types of ML Tasks
Duration: 60 minutes
Learning Objectives
By the end of this lesson, you will be able to:
- Identify the main categories of machine learning tasks
- Understand when to use classification, regression, or generation
- Recognize supervised vs unsupervised learning
- Match real-world problems to appropriate ML approaches
- Understand what types of tasks LLMs can handle
The Three Main Categories
Machine learning tasks fall into three broad categories based on what you want the model to output:
┌─────────────────────────────────────────────────────────┐
│ ML Task Types │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │Classification│ │ Regression │ │ Generation │ │
│ │ │ │ │ │ │ │
│ │ Which │ │ How much? │ │ Create │ │
│ │ category? │ │ What value? │ │ something │ │
│ │ │ │ │ │ new │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Examples: Examples: Examples: │
│ - Spam/not spam - House price - Write text │
│ - Cat/dog - Temperature - Create images │
│ - Sentiment - Stock price - Generate code │
│ │
└─────────────────────────────────────────────────────────┘
Let us explore each one.
Classification: Which Category?
Classification assigns inputs to predefined categories. The model answers "which one?"
Binary Classification
Two possible outcomes:
// Spam detection
type SpamPrediction = 'spam' | 'not_spam';
async function classifyEmail(email: Email): Promise<SpamPrediction> {
// Model returns one of two categories
return await model.predict(email);
}
// Examples:
// "You won $1000!" -> "spam"
// "Meeting at 3pm" -> "not_spam"
Common binary classification tasks:
- Spam detection (spam / not spam)
- Fraud detection (fraudulent / legitimate)
- Medical diagnosis (disease present / absent)
- Sentiment (positive / negative)
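In practice, many binary classifiers first produce a score and only then turn it into one of the two labels. The sketch below illustrates that last step. It assumes a hypothetical `spamScore` between 0 and 1 coming from some model, and the default threshold of 0.5 is an illustrative choice, not a recommendation.

```typescript
type SpamLabel = 'spam' | 'not_spam';

// Convert a model score (assumed to be a probability in [0, 1]) into a label.
// The threshold is an assumption: raising it means fewer emails get flagged as spam.
function toSpamLabel(spamScore: number, threshold = 0.5): SpamLabel {
  // The negated range check also rejects NaN.
  if (!(spamScore >= 0 && spamScore <= 1)) {
    throw new RangeError(`Score must be between 0 and 1, got ${spamScore}`);
  }
  return spamScore >= threshold ? 'spam' : 'not_spam';
}

console.log(toSpamLabel(0.92)); // "spam"
console.log(toSpamLabel(0.08)); // "not_spam"
console.log(toSpamLabel(0.6, 0.8)); // "not_spam" (stricter threshold)
```

Choosing the threshold is a product decision: a stricter value lets more spam through but blocks fewer legitimate emails.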
Multi-class Classification
Multiple possible categories (pick one):
// Document categorization
type DocumentCategory = 'sports' | 'politics' | 'technology' | 'entertainment' | 'business';
async function categorizeArticle(article: string): Promise<DocumentCategory> {
return await model.predict(article);
}
// Examples:
// "The team won the championship..." -> "sports"
// "New AI chip announced..." -> "technology"
Common multi-class tasks:
- Email routing (billing / support / sales / other)
- Language detection (English / Spanish / French / ...)
- Image recognition (cat / dog / bird / ...)
- Intent detection (book flight / check status / cancel / ...)
Multi-label Classification
Multiple categories can apply simultaneously:
// Movie genre classification
type Genre = 'action' | 'comedy' | 'drama' | 'horror' | 'romance';
async function classifyMovie(movie: Movie): Promise<Genre[]> {
// Returns multiple applicable genres
return await model.predict(movie);
}
// Examples:
// "Romantic Comedy" -> ["comedy", "romance"]
// "Action Horror" -> ["action", "horror"]
Common multi-label tasks:
- Content tagging (blog post topics)
- Product categorization (multiple categories)
- Skill detection (multiple skills in a resume)
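The difference between multi-class and multi-label often comes down to how per-category scores are turned into an answer. Here is a minimal sketch, assuming a model has already produced one score per genre (the scores and the 0.5 threshold are made up for illustration):

```typescript
type MovieGenre = 'action' | 'comedy' | 'drama' | 'horror' | 'romance';
type GenreScores = Record<MovieGenre, number>;

// Multi-class: exactly one label, the highest-scoring one.
function pickOne(scores: GenreScores): MovieGenre {
  const entries = Object.entries(scores) as [MovieGenre, number][];
  return entries.reduce((best, current) => (current[1] > best[1] ? current : best))[0];
}

// Multi-label: every label whose score clears the threshold (possibly none).
function pickAll(scores: GenreScores, threshold = 0.5): MovieGenre[] {
  const entries = Object.entries(scores) as [MovieGenre, number][];
  return entries.filter(([, score]) => score >= threshold).map(([genre]) => genre);
}

// Hypothetical scores for a romantic comedy.
const romComScores: GenreScores = {
  action: 0.05,
  comedy: 0.81,
  drama: 0.3,
  horror: 0.02,
  romance: 0.77,
};

console.log(pickOne(romComScores)); // "comedy"
console.log(pickAll(romComScores)); // ["comedy", "romance"]
```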
Classification in Code
Here is how you might use classification via an API:
// Using an LLM for classification
async function classifySentiment(text: string): Promise<string> {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{
role: 'system',
content: `Classify the sentiment of the text as exactly one of:
positive, negative, or neutral.
Respond with only the classification.`,
},
{
role: 'user',
content: text,
},
],
});
// content can be null, so fall back to an empty string before normalizing
return (response.choices[0].message.content ?? '').trim().toLowerCase();
}
// Usage
const sentiment = await classifySentiment('I love this product!');
console.log(sentiment); // "positive"
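An LLM returns free text, so nothing guarantees the reply is one of your categories. A common safeguard is to normalize the reply and check it against the allowed labels. The helper below is a sketch of that idea; `parseSentiment` is a hypothetical name, and returning `null` on a mismatch is one possible design (you could also retry or fall back to a default).

```typescript
const SENTIMENTS = ['positive', 'negative', 'neutral'] as const;
type Sentiment = (typeof SENTIMENTS)[number];

// Normalize a raw model reply and map it onto the allowed labels.
// Returns null when the reply does not match, so the caller can retry or fall back.
function parseSentiment(raw: string | null): Sentiment | null {
  if (raw === null) return null;
  const cleaned = raw
    .trim()
    .toLowerCase()
    .replace(/^["']+/, '') // strip leading quotes
    .replace(/[."'!]+$/, ''); // strip trailing quotes and punctuation
  const match = SENTIMENTS.find((label) => label === cleaned);
  return match ?? null;
}

console.log(parseSentiment('Positive.')); // "positive"
console.log(parseSentiment(' "neutral" ')); // "neutral"
console.log(parseSentiment('I think it is positive')); // null
```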
Regression: How Much?
Regression predicts a continuous numerical value. The model answers "how much?" or "what number?"
Examples of Regression
// House price prediction
type HouseFeatures = {
squareFeet: number;
bedrooms: number;
bathrooms: number;
yearBuilt: number;
zipCode: string;
};
async function predictPrice(house: HouseFeatures): Promise<number> {
// Returns a specific dollar amount
return await model.predict(house);
}
// Example:
// { squareFeet: 2000, bedrooms: 3, ... } -> $425,000
Classification vs Regression
The key difference is the output type:
// Classification: discrete categories
type CreditRiskCategory = 'low' | 'medium' | 'high';
async function classifyCreditRisk(applicant: Applicant): Promise<CreditRiskCategory> {
return await model.predict(applicant); // Returns a category
}
// Regression: continuous number
async function predictDefaultProbability(applicant: Applicant): Promise<number> {
return await model.predict(applicant); // Returns 0.0 to 1.0
}
// Classification: "This applicant is high risk"
// Regression: "This applicant has 73.2% chance of default"
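The two views are related: a numeric prediction can always be bucketed into categories afterward, but not the other way around. A small sketch, where the cut points 0.2 and 0.6 are illustrative assumptions rather than any industry standard:

```typescript
type RiskCategory = 'low' | 'medium' | 'high';

// Bucket a predicted default probability (0.0 to 1.0) into a category.
// The cut points are made up for this example.
function toRiskCategory(defaultProbability: number): RiskCategory {
  if (defaultProbability < 0.2) return 'low';
  if (defaultProbability < 0.6) return 'medium';
  return 'high';
}

console.log(toRiskCategory(0.732)); // "high"
console.log(toRiskCategory(0.35)); // "medium"
console.log(toRiskCategory(0.1)); // "low"
```

Keeping the number around is often useful, since it lets you change the cut points later without retraining anything.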
Common Regression Tasks
| Task | Input | Output |
|---|---|---|
| House price prediction | House features | Dollar amount |
| Temperature forecasting | Weather data | Degrees |
| Stock price prediction | Market data | Price |
| Delivery time estimation | Order details | Minutes |
| Age estimation | Photo | Years |
| Energy consumption | Building data | Kilowatt-hours |
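To make "predicting a number" concrete, here is a minimal sketch of one of the simplest regression techniques: fitting a straight line by ordinary least squares. It uses a single feature (square footage) and made-up sale prices. Real models use many features and far more data.

```typescript
type Sample = { squareFeet: number; price: number };

// Fit price = slope * squareFeet + intercept by ordinary least squares.
function fitLine(samples: Sample[]): { slope: number; intercept: number } {
  const n = samples.length;
  if (n < 2) throw new Error('Need at least two samples');
  const meanX = samples.reduce((sum, s) => sum + s.squareFeet, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.price, 0) / n;
  let covariance = 0;
  let variance = 0;
  for (const s of samples) {
    covariance += (s.squareFeet - meanX) * (s.price - meanY);
    variance += (s.squareFeet - meanX) ** 2;
  }
  if (variance === 0) throw new Error('All squareFeet values are identical');
  const slope = covariance / variance;
  return { slope, intercept: meanY - slope * meanX };
}

// Made-up training data that happens to lie exactly on a line.
const sales: Sample[] = [
  { squareFeet: 1000, price: 250000 },
  { squareFeet: 1500, price: 350000 },
  { squareFeet: 2000, price: 450000 },
];

const line = fitLine(sales);
const predicted = line.slope * 1800 + line.intercept;
console.log(Math.round(predicted)); // 410000
```

The output is a number on a continuous scale, so an 1,800 square foot house gets a price between its smaller and larger neighbors.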
Regression with LLMs
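LLMs can produce numeric estimates, but they are not trained as regressors. The number comes back as text, so you have to parse and validate it, and a dedicated regression model trained on your data will usually be more accurate: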
async function estimateReadingTime(article: string): Promise<number> {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{
role: 'system',
content: `Estimate the reading time in minutes for the given article.
Consider complexity and length.
Respond with only a number.`,
},
{
role: 'user',
content: article,
},
],
});
// content can be null; parseInt returns NaN if the reply is not a number
return parseInt((response.choices[0].message.content ?? '').trim(), 10);
}
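Even with a "respond with only a number" instruction, a model may reply with something like "About 7.5 minutes". One defensive option is to extract the first number from the reply and treat anything else as a failure. The helper below is a sketch under that assumption, and `parseNumericReply` is a hypothetical name.

```typescript
// Pull the first number out of a model reply such as "About 7.5 minutes".
// Returns null when the reply contains no finite number.
function parseNumericReply(raw: string | null): number | null {
  if (raw === null) return null;
  const match = raw.match(/-?\d+(?:\.\d+)?/);
  if (match === null) return null;
  const value = parseFloat(match[0]);
  return isFinite(value) ? value : null;
}

console.log(parseNumericReply('7')); // 7
console.log(parseNumericReply('About 7.5 minutes')); // 7.5
console.log(parseNumericReply('I cannot estimate that.')); // null
```

Returning `null` instead of `NaN` forces the caller to decide what to do when the model does not cooperate.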
Generation: Create Something New
Generation creates new content that did not exist before. This is where LLMs truly shine.
Text Generation
// Generate a product description
async function generateDescription(product: Product): Promise<string> {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{
role: 'system',
content: 'Write compelling product descriptions for an e-commerce site.',
},
{
role: 'user',
content: `Product: ${product.name}
Features: ${product.features.join(', ')}
Target audience: ${product.audience}`,
},
],
});
// content can be null, so fall back to an empty string
return response.choices[0].message.content ?? '';
}
// Input: { name: "Wireless Earbuds", features: ["noise canceling", "24hr battery"], ... }
// Output: "Experience pure audio freedom with our premium Wireless Earbuds..."
Code Generation
// Generate code from description
async function generateFunction(description: string): Promise<string> {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{
role: 'system',
content: 'Generate TypeScript code based on the description. Include types.',
},
{
role: 'user',
content: description,
},
],
});
// content can be null, so fall back to an empty string
return response.choices[0].message.content ?? '';
}
// Input: "A function that filters an array to keep only even numbers"
// Output: "function filterEven(numbers: number[]): number[] { ... }"
Other Generation Tasks
| Task | Input | Output |
|---|---|---|
| Summarization | Long text | Short summary |
| Translation | Text in one language | Text in another |
| Question answering | Question + context | Answer |
| Chatbot response | Conversation history | Reply |
| Image generation | Text description | Image |
| Music generation | Style/mood | Audio |
Why LLMs Excel at Generation
LLMs are trained to predict "what comes next" in text, one token at a time. This makes them natural generators:
// LLM training objective: predict next token
// Input: "The capital of France is"
// LLM learns: "Paris" is likely next
// This same ability enables generation:
// Input: "Write a haiku about programming"
// LLM generates token by token:
// "Code" -> "flows" -> "like" -> "water" -> ...
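The token-by-token loop can be sketched with a toy stand-in for the model. The probability table below is invented for illustration and only looks at the previous token. A real LLM computes these probabilities with a neural network that sees the whole preceding context, and it usually samples rather than always taking the top choice.

```typescript
// Toy "model": for each token, the probability of each possible next token.
// All numbers are made up for this example.
const nextTokenProbs: Record<string, Record<string, number>> = {
  '<start>': { Code: 0.7, Bugs: 0.3 },
  Code: { flows: 0.6, breaks: 0.4 },
  flows: { like: 0.9, fast: 0.1 },
  like: { water: 0.8, sand: 0.2 },
  water: { '<end>': 1.0 },
};

// Greedy decoding: at each step, append the most probable next token.
function generate(maxTokens = 10): string[] {
  const output: string[] = [];
  let current = '<start>';
  for (let i = 0; i < maxTokens; i++) {
    const candidates = nextTokenProbs[current];
    if (candidates === undefined) break; // no known continuation
    const [best] = Object.entries(candidates).reduce((a, b) => (b[1] > a[1] ? b : a));
    if (best === '<end>') break;
    output.push(best);
    current = best;
  }
  return output;
}

console.log(generate().join(' ')); // "Code flows like water"
```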
Supervised vs Unsupervised Learning
Another way to categorize ML is by how models learn:
Supervised Learning
Model learns from labeled examples (input + correct answer):
// Supervised: we tell the model the right answer
const labeledData = [
{ email: 'You won!', label: 'spam' },
{ email: 'Meeting at 3', label: 'not_spam' },
// Model learns the pattern from these examples
];
Most classification and regression tasks use supervised learning.
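To show what "learning from labeled examples" can look like, here is a deliberately tiny sketch: it counts how often each word appears under each label, then lets the words of a new email vote. This is a simplification for illustration, not how production spam filters work, and the training emails are made up.

```typescript
type EmailLabel = 'spam' | 'not_spam';
type LabeledEmail = { email: string; label: EmailLabel };
type WordCounts = Map<string, Record<EmailLabel, number>>;

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// "Training": count how often each word appears under each label.
function trainWordCounts(examples: LabeledEmail[]): WordCounts {
  const counts: WordCounts = new Map();
  for (const { email, label } of examples) {
    for (const word of tokenize(email)) {
      const entry = counts.get(word) ?? { spam: 0, not_spam: 0 };
      entry[label] += 1;
      counts.set(word, entry);
    }
  }
  return counts;
}

// "Prediction": known words vote with their counts; ties default to not_spam.
function predictLabel(counts: WordCounts, email: string): EmailLabel {
  let spamVotes = 0;
  let notSpamVotes = 0;
  for (const word of tokenize(email)) {
    const entry = counts.get(word);
    if (entry !== undefined) {
      spamVotes += entry.spam;
      notSpamVotes += entry.not_spam;
    }
  }
  return spamVotes > notSpamVotes ? 'spam' : 'not_spam';
}

const trainedCounts = trainWordCounts([
  { email: 'You won a free prize', label: 'spam' },
  { email: 'Claim your free prize now', label: 'spam' },
  { email: 'Meeting at 3', label: 'not_spam' },
  { email: 'Lunch meeting tomorrow', label: 'not_spam' },
]);

console.log(predictLabel(trainedCounts, 'Free prize inside')); // "spam"
console.log(predictLabel(trainedCounts, 'Team meeting at noon')); // "not_spam"
```

The labels are what make this supervised: without them, the counts could not be split by category.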
Unsupervised Learning
Model finds patterns without labeled data:
// Unsupervised: no labels, model finds structure
const unlabeledData = [
{ spending: 100, visits: 5, age: 25 },
{ spending: 5000, visits: 50, age: 45 },
{ spending: 50, visits: 2, age: 30 },
// Model might discover "high-value" vs "casual" customer groups
];
// Clustering: group similar items
const clusters = await model.cluster(unlabeledData);
// Returns: { cluster1: [...], cluster2: [...] }
Common unsupervised tasks:
- Clustering: Group similar items (customer segmentation)
- Anomaly detection: Find unusual patterns (fraud detection)
- Dimensionality reduction: Simplify complex data
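Clustering can be illustrated with a stripped-down version of k-means. The sketch below makes several simplifying assumptions: one feature (monthly spending), exactly two clusters, and centers initialized at the smallest and largest values so the result is deterministic. The spending figures are made up.

```typescript
// Minimal 1-D k-means with k = 2, initialized at the min and max values.
function twoMeans(values: number[], iterations = 10): { low: number[]; high: number[] } {
  if (values.length < 2) throw new Error('Need at least two values');
  const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;
  let centerLow = Math.min(...values);
  let centerHigh = Math.max(...values);
  let low: number[] = [];
  let high: number[] = [];
  for (let i = 0; i < iterations; i++) {
    low = [];
    high = [];
    // Assignment step: each value joins its nearest center.
    for (const v of values) {
      if (Math.abs(v - centerLow) <= Math.abs(v - centerHigh)) {
        low.push(v);
      } else {
        high.push(v);
      }
    }
    // Update step: move each center to the mean of its members.
    if (low.length > 0) centerLow = mean(low);
    if (high.length > 0) centerHigh = mean(high);
  }
  return { low, high };
}

const monthlySpending = [100, 5000, 50, 4200, 80, 6100];
console.log(twoMeans(monthlySpending));
// low: [100, 50, 80], high: [5000, 4200, 6100]
```

Nobody told the algorithm which customers are "casual" or "high-value". The two groups emerge from the data alone.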
Self-Supervised Learning
LLMs use a clever trick. They derive their own labels from the raw data:
// Self-supervised: the data creates its own labels
const text = 'The cat sat on the mat';
// Training examples generated automatically:
// Input: "The cat sat on the" -> Label: "mat"
// Input: "The cat sat on" -> Label: "the"
// Input: "The cat sat" -> Label: "on"
// No human labeling needed!
This is how LLMs can train on trillions of words without manual labeling.
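Generating these training examples is mechanical, which is why no human labeling is required. Here is a minimal sketch that splits on whitespace for readability. Real LLMs work on subword tokens rather than whole words, so treat the word-level split as a simplifying assumption.

```typescript
type TrainingPair = { input: string; label: string };

// Turn raw text into (context so far, next word) pairs.
function makeTrainingPairs(sentence: string): TrainingPair[] {
  const words = sentence.split(/\s+/).filter((w) => w.length > 0);
  // Every word after the first becomes a label; everything before it is the input.
  return words.slice(1).map((label, i) => ({
    input: words.slice(0, i + 1).join(' '),
    label,
  }));
}

const trainingPairs = makeTrainingPairs('The cat sat on the mat');
console.log(trainingPairs.length); // 5
console.log(trainingPairs.map((p) => p.label).join(', ')); // "cat, sat, on, the, mat"
```

A six-word sentence yields five training examples, so a large text corpus yields an enormous number of them for free.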
Choosing the Right Approach
How do you know which type of ML task you need?
Decision Framework
┌─────────────────────────────────────────────────────────┐
│ What do you need? │
│ │
│ "Assign to a category" ──────> Classification │
│ │
│ "Predict a number" ──────> Regression │
│ │
│ "Create new content" ──────> Generation │
│ │
│ "Find groups/patterns" ──────> Clustering │
│ │
│ "Find outliers" ──────> Anomaly Detection │
│ │
└─────────────────────────────────────────────────────────┘
Example Scenarios
| Scenario | Task Type | Why |
|---|---|---|
| "Should we approve this loan?" | Classification | Yes/No decision |
| "How much should we charge?" | Regression | Continuous number |
| "Write an email reply" | Generation | Create new content |
| "Group similar customers" | Clustering | Find natural groups |
| "Is this transaction suspicious?" | Classification or Anomaly Detection | Known category or unusual outlier |
| "What will sales be next month?" | Regression | Predict a number |
| "Summarize this document" | Generation | Create new text |
LLMs Can Do Many Tasks
Modern LLMs are versatile and can handle multiple task types:
// Same model, different tasks
// Classification
const category = await llm.complete(
'Classify this email as billing, support, or sales: [email text]'
);
// Regression (sort of)
const estimate = await llm.complete('Estimate the word count of a 5-page document: ');
// Generation
const summary = await llm.complete('Summarize this article in 3 sentences: [article]');
// Extraction (a form of classification/generation)
const entities = await llm.complete('Extract all company names from this text: [text]');
Specialized Models vs General LLMs
When should you use a specialized model vs an LLM?
Specialized Models
Better for:
- High-volume, low-latency needs
- Specific, well-defined tasks
- Cost-sensitive applications
- When you have good training data
// Specialized sentiment model
// - Fast (milliseconds)
// - Cheap (trained once, runs locally)
// - Accurate for this specific task
const sentiment = await sentimentModel.predict(text);
General LLMs
Better for:
- Flexible, varied tasks
- Natural language understanding
- Complex reasoning
- When you do not have training data
- Rapid prototyping
// LLM for sentiment
// - Slower (hundreds of milliseconds)
// - Per-request cost
// - Can handle nuance and context
// - No training data needed
const sentiment = await llm.complete('What is the sentiment of: ' + text);
The Practical Reality
For most developers building applications:
- Start with LLMs - faster to build, no training needed
- Move to specialized models - when you need speed/cost optimization
- Combine both - use LLMs for complex tasks, specialized for simple ones
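One way to combine both is to let a fast specialized model answer when it is confident and hand everything else to an LLM. The sketch below shows that routing logic only. `FastModel`, `LlmFallback`, the two stubs, and the 0.85 confidence cutoff are all hypothetical placeholders for whatever models you actually use.

```typescript
type Scored = { label: string; confidence: number };
type FastModel = (text: string) => Scored; // hypothetical local model
type LlmFallback = (text: string) => Promise<string>; // hypothetical LLM call
type HybridResult = { label: string; source: 'specialized' | 'llm' };

// Use the cheap model when it is confident; otherwise pay for the LLM.
async function classifyHybrid(
  text: string,
  fastModel: FastModel,
  llmFallback: LlmFallback,
  minConfidence = 0.85
): Promise<HybridResult> {
  const quick = fastModel(text);
  if (quick.confidence >= minConfidence) {
    return { label: quick.label, source: 'specialized' };
  }
  return { label: await llmFallback(text), source: 'llm' };
}

// Stubs standing in for real models, so the example runs on its own.
const stubFastModel: FastModel = (text) =>
  text.includes('refund')
    ? { label: 'billing', confidence: 0.97 }
    : { label: 'other', confidence: 0.4 };
const stubLlm: LlmFallback = async () => 'support';

classifyHybrid('I need a refund', stubFastModel, stubLlm).then((r) => console.log(r));
// { label: 'billing', source: 'specialized' }
classifyHybrid('My app crashes on launch', stubFastModel, stubLlm).then((r) => console.log(r));
// { label: 'support', source: 'llm' }
```

Logging the `source` field makes it easy to measure how much traffic the cheap path absorbs.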
Exercises
Exercise 1: Classify the Task
For each scenario, identify the ML task type (classification, regression, generation, clustering):
- Predict how many stars a user will give a product
- Decide if a news article is real or fake
- Write a response to a customer complaint
- Group news articles by topic (without predefined categories)
- Estimate delivery time for a package
- Tag a photo with relevant keywords
- Translate a document from English to Spanish
Solution
- Regression - Predicting a number (1-5 stars); treating the five star levels as classes is also a reasonable answer
- Classification - Binary category (real/fake)
- Generation - Creating new text
- Clustering - Finding groups without labels
- Regression - Predicting a number (minutes/hours)
- Multi-label Classification - Multiple categories
- Generation - Creating new text (translation)
Exercise 2: Design the Solution
You are building a customer service system. For each requirement, describe what type of ML task you would use and why:
- Route incoming emails to the right department
- Suggest response templates to agents
- Predict how long a ticket will take to resolve
- Identify VIP customers who might churn
- Generate personalized responses to common questions
Solution
- Multi-class Classification - Route to billing/support/sales/returns
  - Input: email text
  - Output: one department
- Retrieval/Classification - Match to existing templates
  - Input: customer email
  - Output: relevant template IDs
- Regression - Predict resolution time
  - Input: ticket details
  - Output: hours/days
- Classification + Regression - Two-step approach
  - Classification: is customer at risk? (yes/no)
  - Regression: churn probability (0-100%)
- Generation - Create custom responses
  - Input: customer question + context
  - Output: personalized answer
Exercise 3: LLM Prompt Design
Write prompts that would make an LLM perform each task type:
- Binary classification (spam detection)
- Multi-class classification (emotion detection)
- Regression-like estimation (difficulty rating)
- Generation (product description)
Solution
// 1. Binary classification
const spamPrompt = `
Classify the following email as "spam" or "not_spam".
Respond with only one word: either "spam" or "not_spam".
Email: ${emailText}
`;
// 2. Multi-class classification
const emotionPrompt = `
Identify the primary emotion in this text.
Choose exactly one: happy, sad, angry, fearful, surprised, disgusted.
Respond with only the emotion word.
Text: ${text}
`;
// 3. Regression-like estimation
const difficultyPrompt = `
Rate the difficulty of this programming problem on a scale of 1-10,
where 1 is trivial and 10 is expert-level.
Respond with only a number.
Problem: ${problemDescription}
`;
// 4. Generation
const descriptionPrompt = `
Write a compelling product description for an e-commerce website.
Keep it under 100 words. Highlight key benefits.
Product name: ${name}
Features: ${features}
Target customer: ${audience}
`;
Key Takeaways
- Classification assigns inputs to categories (which one?)
- Regression predicts numerical values (how much?)
- Generation creates new content (what should I write?)
- Supervised learning uses labeled data; unsupervised finds patterns without labels
- LLMs can handle classification, regression-style estimation, and generation through prompting, though their output needs validation
- Choose specialized models for high-volume/low-cost; LLMs for flexibility
- Start with LLMs, optimize with specialized models when needed
Resources
| Resource | Type | Description |
|---|---|---|
| Google ML Concepts | Tutorial | ML terminology and task types |
| Scikit-learn: Choosing the Right Estimator | Documentation | Visual guide to ML algorithms |
| OpenAI Cookbook: Classification | Tutorial | Using GPT for classification |
Next Lesson
Now that you understand the types of ML tasks, let us see how AI is being used in the real world across different industries.