Introduction to Unsupervised Learning

Definition

Unsupervised Learning is a type of Machine Learning in which the model learns from unlabeled data and tries to discover hidden patterns, structures, or relationships within the data.

Unlike Supervised Learning, there are no predefined outputs or correct answers.

The objective is:

To identify meaningful patterns and relationships from data without prior labels.

Why is it called "Unsupervised"?

It is called unsupervised because no teacher supplies correct answers to guide the learning.

In supervised learning:

Input → Correct Output

In unsupervised learning:

Input only

The machine itself attempts to discover hidden structures.

Example

Suppose an online shopping company has customer data:

Customer | Age | Spending
A        | 22  | 500
B        | 24  | 600
C        | 50  | 5000
D        | 52  | 4500

No labels exist such as:

Young Customer
High Spender
Low Spender

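A clustering model can discover two groups in this data on its own:

Group 1 → Young, low-spending customers (A, B)

Group 2 → Older, high-spending customers (C, D)

The model only produces the groups. Descriptive names such as these are assigned afterwards by whoever interprets the result.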
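The grouping above can be reproduced with a minimal K-Means sketch written with only the Python standard library. The helper names (min_max_scale, k_means), the choice of two clusters, and the starting centroids are assumptions made for illustration; a real project would more likely use a library implementation.

```python
import math

# Customer table from the example above: (name, age, spending).
customers = [("A", 22, 500), ("B", 24, 600), ("C", 50, 5000), ("D", 52, 4500)]

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Scale both features so that spending does not dominate the distance.
ages = min_max_scale([c[1] for c in customers])
spending = min_max_scale([c[2] for c in customers])
points = list(zip(ages, spending))

def k_means(points, centroids, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then update centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Assumption: start from the first and last customer so the run is deterministic.
labels = k_means(points, [points[0], points[-1]])
for (name, _, _), label in zip(customers, labels):
    print(name, "-> Group", label + 1)
```

No label such as "young" or "high spender" is passed in; the split comes only from distances between the scaled feature values.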

Working of Unsupervised Learning

Input Data
(No Labels)
       ↓
Find similarities and patterns
       ↓
Identify relationships
       ↓
Create groups or structures

Steps involved in Unsupervised Learning

Step 1: Collect Data

Example:

Customer information
Sales records
Website activity

Step 2: Preprocess Data

Tasks:

Handle missing values
Remove duplicates
Scale features
Normalize data
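The preprocessing tasks above could look like the following sketch. The records are hypothetical, and mean imputation and standardisation (zero mean, unit variance) are only one possible choice for handling missing values and scaling.

```python
from statistics import mean, pstdev

# Hypothetical raw records: (age, spending); None marks a missing value.
raw = [(22, 500), (24, None), (50, 5000), (50, 5000), (52, 4500)]

# Remove exact duplicates while keeping the original order.
unique = list(dict.fromkeys(raw))

def impute(column):
    """Replace missing values with the mean of the observed values in the column."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def standardise(column):
    """Scale a column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

# Work column by column, then turn the columns back into rows.
columns = [impute(list(col)) for col in zip(*unique)]
scaled = [standardise(col) for col in columns]
rows = list(zip(*scaled))

for row in rows:
    print(tuple(round(v, 2) for v in row))
```

Scaling matters for distance-based algorithms such as K-Means, because a feature with a large numeric range would otherwise outweigh the others.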

Step 3: Apply Algorithm

Examples:

K-Means
Hierarchical Clustering
PCA

Step 4: Discover Patterns

Possible outputs:

Clusters
Patterns
Relationships
Reduced dimensions

Step 5: Analyze Results

Interpret the discovered patterns.

Characteristics of Unsupervised Learning

Uses unlabeled data
No predefined target variable
Discovers hidden structures
Learns automatically from data
Mostly exploratory in nature

Major Types of Unsupervised Learning

Unsupervised Learning
        |
        |---- Clustering
        |
        |---- Association
        |
        |---- Dimensionality Reduction

1. Clustering

Definition

Clustering groups similar data points into clusters.

The goal:

Similar data points should belong to the same group, and dissimilar points to different groups.

Example:

Customer segmentation:

Cluster 1 → Students

Cluster 2 → Working Professionals

Cluster 3 → Senior Citizens

Algorithms:

K-Means
Hierarchical Clustering
DBSCAN
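To illustrate one of the algorithms listed above, here is a small sketch of hierarchical (agglomerative) clustering with single linkage. The function name single_linkage, the sample points, and the stopping rule (stop at a requested number of clusters) are assumptions for this example, not a reference implementation.

```python
import math

def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        x, y = min(pairs, key=lambda xy: gap(clusters[xy[0]], clusters[xy[1]]))
        clusters[x] = clusters[x] + clusters[y]  # merge cluster y into cluster x
        del clusters[y]
    return clusters

# Hypothetical 2-D points that form three visually separate groups.
points = [(1, 1), (1, 2), (8, 8), (9, 8), (1, 9), (2, 9)]
groups = single_linkage(points, n_clusters=3)
print(groups)  # each inner list holds the indices of the points in one cluster
```

Unlike K-Means, this approach needs no starting centroids; it builds clusters bottom-up from pairwise distances.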

Applications:

Customer segmentation
Market analysis
Image grouping

2. Association

Definition

Association finds relationships among variables or items.

Goal:

Identify items that frequently occur together.

Example:

Customers buying bread
also buy butter

Association Rule:

Bread → Butter
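How strong a rule like this is can be measured with support and confidence. The sketch below computes both for a handful of hypothetical shopping baskets invented for illustration; it counts occurrences directly and is not an implementation of Apriori or FP Growth.

```python
# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def support(itemset):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

print("support(bread, butter):", support({"bread", "butter"}))
print("confidence(bread -> butter):", confidence({"bread"}, {"butter"}))
```

A rule is usually kept only when both values exceed thresholds chosen by the analyst.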

Algorithms:

Apriori
FP Growth

Applications:

Recommendation systems
Market basket analysis

3. Dimensionality Reduction

Definition

Dimensionality Reduction reduces the number of features while retaining important information.

Goal:

Reduce complexity without losing significant information.

Example:

Suppose:

100 features

Reduced to:

10 features

Algorithms:

PCA (Principal Component Analysis)
t-SNE
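The idea behind PCA can be sketched on a tiny case: reducing two correlated features to one. The data is hypothetical, and the closed-form angle used here only applies to two features; a general implementation would compute eigenvectors of the full covariance matrix.

```python
import math
from statistics import mean

# Hypothetical data with two strongly correlated features.
data = [(2.0, 1.9), (4.0, 4.1), (6.0, 6.2), (8.0, 7.8)]

# 1. Centre each feature on its mean.
mx = mean(x for x, _ in data)
my = mean(y for _, y in data)
centred = [(x - mx, y - my) for x, y in data]

# 2. Entries of the 2x2 covariance matrix (population form).
n = len(centred)
sxx = sum(x * x for x, _ in centred) / n
syy = sum(y * y for _, y in centred) / n
sxy = sum(x * y for x, y in centred) / n

# 3. Direction of greatest variance (first principal axis) for the 2-D case.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
axis = (math.cos(theta), math.sin(theta))

# 4. Project every point onto that axis: two features become one.
reduced = [x * axis[0] + y * axis[1] for x, y in centred]
print([round(v, 3) for v in reduced])
```

Because the two features move together, the single projected value keeps almost all of the variation in the original pair.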

Applications:

Data compression
Visualization
Faster training

Difference between Supervised and Unsupervised Learning

Supervised Learning              | Unsupervised Learning
Uses labeled data                | Uses unlabeled data
Output available                 | Output not available
Learns input-output relationship | Finds hidden patterns
Used for prediction              | Used for pattern discovery
Example: Spam detection          | Example: Customer segmentation

Performance Evaluation Metrics

Unlike in supervised learning, evaluation is more difficult because there are no ground-truth labels to compare the results against.

Common metrics:

For Clustering

Silhouette Score
Davies-Bouldin Index
Inertia
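Two of these metrics are short enough to sketch directly. The points and labels below are hypothetical, and the code assumes Euclidean distance and at least two clusters. The silhouette for one point is (b - a) / max(a, b), where a is the mean distance to its own cluster and b is the mean distance to the nearest other cluster; inertia is the sum of squared distances to the cluster centroids.

```python
import math
from statistics import mean

# Hypothetical 2-D points with the cluster labels some algorithm produced.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]

def silhouette(points, labels):
    """Mean silhouette over all points; values near 1 indicate tight, well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)  # convention for a single-point cluster
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(       # mean distance to the nearest other cluster
            mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)

def inertia(points, labels):
    """Sum of squared distances from each point to its cluster centroid (lower is tighter)."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        centroid = tuple(mean(axis) for axis in zip(*members))
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

print("silhouette:", round(silhouette(points, labels), 3))
print("inertia:", round(inertia(points, labels), 3))
```

Inertia always decreases as the number of clusters grows, so it is usually compared across runs rather than read as an absolute score.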

For Dimensionality Reduction

Explained Variance Ratio
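The explained variance ratio of a component is its variance divided by the total variance. The sketch below works this out for the two-feature case, where the component variances are the eigenvalues of a 2x2 covariance matrix; the function name and the input numbers are assumptions for illustration.

```python
import math

def explained_variance_ratio(sxx, syy, sxy):
    """Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]], each divided by their sum."""
    trace = sxx + syy
    # The discriminant is written in a form that is never negative.
    root = math.sqrt((sxx - syy) ** 2 + 4 * sxy * sxy)
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    return [ev / trace for ev in eigenvalues]

# Hypothetical covariance entries for two correlated features.
ratios = explained_variance_ratio(sxx=5.0, syy=4.0, sxy=4.0)
print([round(r, 3) for r in ratios])
```

A first ratio close to 1 suggests that a single component retains most of the information in both features.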

Advantages

Does not require labeled data
Finds hidden patterns
Useful for exploratory analysis
Can use large datasets that would be too costly to label
Can discover unknown relationships

Disadvantages

Hard to evaluate results
Interpretation may be difficult
Accuracy cannot be measured directly because there is no ground truth
Results may vary across algorithms

Real-world Applications

E-commerce

Customer segmentation
Recommendation systems

Banking

Fraud pattern identification

Healthcare

Disease pattern discovery

Social Media

User behavior analysis

Marketing

Market basket analysis

Complete Workflow

Collect Data
      ↓
Preprocess Data
      ↓
Apply Unsupervised Algorithm
      ↓
Identify Patterns
      ↓
Create Groups/Relationships
      ↓
Evaluate Results

One-line summary

Unsupervised Learning is a machine learning technique where a model learns from unlabeled data and discovers hidden patterns, structures, and relationships automatically.
