Machine Learning — Hierarchical Clustering
Hierarchical Clustering (Agglomerative & Divisive), Dendrogram Analysis
Introduction to Hierarchical Clustering
Hierarchical Clustering is an Unsupervised Machine Learning algorithm used to group similar data points into clusters by creating a hierarchy of clusters.
Unlike K-Means:
K-Means → Need K value initially
Hierarchical clustering:
No need to specify K initially
The objective is:
To create a hierarchy or tree-like structure of clusters based on similarity.
Basic Idea of Hierarchical Clustering
Hierarchical clustering builds clusters gradually:
Data Points
↓
Merge or divide clusters
↓
Build hierarchy
↓
Generate cluster tree
Types of Hierarchical Clustering
Hierarchical Clustering has two approaches:
Hierarchical Clustering
|
|------ Agglomerative
|
|------ Divisive
1. Agglomerative Hierarchical Clustering
Definition
Agglomerative clustering follows a bottom-up approach.
Rule:
Start with individual data points
↓
Merge closest clusters
↓
Continue merging
↓
One large cluster
This is the most commonly used hierarchical clustering method.
Working of Agglomerative Clustering
Suppose data points:
A B C D
Step 1: Start with individual clusters
A B C D
Step 2: Find nearest clusters
Suppose:
A and B are nearest
Merge:
(A,B) C D
Step 3: Repeat merging
Suppose:
C and D are nearest
Merge:
(A,B) (C,D)
Step 4: Continue until one cluster remains
((A,B),(C,D))
Agglomerative Clustering Workflow
Start with individual points
↓
Calculate distances
↓
Merge nearest clusters
↓
Update distances
↓
Repeat process
↓
Create hierarchy
2. Divisive Hierarchical Clustering
Definition
Divisive clustering follows a top-down approach.
Rule:
Start with one large cluster
↓
Split cluster
↓
Split again
↓
Individual clusters
Working of Divisive Clustering
Suppose:
A B C D
Initially:
(A,B,C,D)
Step 1: Divide into groups
(A,B) (C,D)
Step 2: Split further
A B C D
Divisive Clustering Workflow
Start with all data
↓
Find dissimilar groups
↓
Split cluster
↓
Continue splitting
↓
Create hierarchy
Difference Between Agglomerative and Divisive
AgglomerativeDivisiveBottom-up approachTop-down approachStarts with individual pointsStarts with one clusterMerge clustersSplit clustersMore commonly usedLess commonly usedComputationally efficientMore expensiveDistance Measures Used
Hierarchical clustering requires distance calculations.
Euclidean Distance
genui{"math_block_widget_always_prefetch_v2":{"content":"d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}"}}
Manhattan Distance
d=\sum|x_i-y_i|
Cosine Distance
Measures similarity using angle between vectors.
Linkage Methods
When clusters contain multiple points, distance between clusters must be calculated.
1. Single Linkage
Uses minimum distance between clusters.
D(A,B)=min(d(a,b))
2. Complete Linkage
Uses maximum distance.
D(A,B)=max(d(a,b))
3. Average Linkage
Uses average distance.
D(A,B)=\frac{1}{n}\sum d(a,b)
4. Ward Linkage
Minimizes variance within clusters.
Most commonly used in practice.
Dendrogram Analysis
Definition
A Dendrogram is a tree-like diagram used to visualize hierarchical clustering.
It shows:
- Cluster formation
- Cluster merging
- Similarity among observations
Basic Structure of Dendrogram
Distance ^ | 10 | _________ | | | 8 | ___|___ | | | | | 6 | __|__ | | | | | | | 4 | A B C D | +------------------------
Understanding Dendrogram
Horizontal axis
Represents:
Data points or observations
Vertical axis
Represents:
Distance between clusters
How to Determine Number of Clusters using Dendrogram
Rule:
Draw a horizontal line through the largest vertical gap.
Example:
Distance ^ 10 | _________ | | | 8 | ____|____ | | | | | 6 | |---------|----| ← Cut here | | | | 4 | A B C D
Clusters formed:
Cluster 1 → A,B Cluster 2 → C,D
Performance Metrics
1. Silhouette Score
Measures compactness and separation.
S=\frac{b-a}{max(a,b)}
Range:
-1 → Poor 0 → Overlap 1 → Perfect
2. Davies-Bouldin Index
Measures cluster similarity.
Interpretation:
Lower value
↓
Better clustering
Advantages
- No need to specify K initially
- Produces hierarchy of clusters
- Easy visualization using dendrogram
- Works with small datasets
Disadvantages
- Computationally expensive
- Sensitive to noise and outliers
- Difficult for very large datasets
- Once merged, clusters cannot be reversed in agglomerative clustering
Applications
Business
- Customer segmentation
Healthcare
- Disease pattern analysis
Social Media
- User behavior grouping
Biology
- Gene sequence analysis
Image Processing
- Image segmentation
Complete Workflow
Collect Data
↓
Calculate Distance Matrix
↓
Choose Linkage Method
↓
Merge/Split Clusters
↓
Generate Dendrogram
↓
Determine Clusters
↓
Evaluate Clustering
One-line summary
Hierarchical Clustering is an unsupervised learning method that creates a hierarchy of clusters using bottom-up (Agglomerative) or top-down (Divisive) approaches and visualizes them through a dendrogram.