Hierarchical Clustering (Agglomerative & Divisive), Dendrogram Analysis

Introduction to Hierarchical Clustering

Hierarchical Clustering is an Unsupervised Machine Learning algorithm used to group similar data points into clusters by creating a hierarchy of clusters.

Unlike K-Means:

K-Means → Need K value initially

Hierarchical clustering:

No need to specify K initially

The objective is:

To create a hierarchy or tree-like structure of clusters based on similarity.

Basic Idea of Hierarchical Clustering

Hierarchical clustering builds clusters gradually:

Data Points
      ↓
Merge or divide clusters
      ↓
Build hierarchy
      ↓
Generate cluster tree

Types of Hierarchical Clustering

Hierarchical Clustering has two approaches:

Hierarchical Clustering
          |
          |------ Agglomerative
          |
          |------ Divisive

1. Agglomerative Hierarchical Clustering

Definition

Agglomerative clustering follows a bottom-up approach.

Rule:

Start with individual data points
            ↓
Merge closest clusters
            ↓
Continue merging
            ↓
One large cluster

This is the most commonly used hierarchical clustering method.

Working of Agglomerative Clustering

Suppose data points:

A   B   C   D

Step 1: Start with individual clusters

A     B     C     D

Step 2: Find nearest clusters

Suppose:

A and B are nearest

Merge:

(A,B)     C     D

Step 3: Repeat merging

Suppose:

C and D are nearest

Merge:

(A,B)     (C,D)

Step 4: Continue until one cluster remains

((A,B),(C,D))

Agglomerative Clustering Workflow

Start with individual points
          ↓
Calculate distances
          ↓
Merge nearest clusters
          ↓
Update distances
          ↓
Repeat process
          ↓
Create hierarchy

2. Divisive Hierarchical Clustering

Definition

Divisive clustering follows a top-down approach.

Rule:

Start with one large cluster
           ↓
Split cluster
           ↓
Split again
           ↓
Individual clusters

Working of Divisive Clustering

Suppose:

A B C D

Initially:

(A,B,C,D)

Step 1: Divide into groups

(A,B)      (C,D)

Step 2: Split further

A    B    C    D

Divisive Clustering Workflow

Start with all data
         ↓
Find dissimilar groups
         ↓
Split cluster
         ↓
Continue splitting
         ↓
Create hierarchy

Difference Between Agglomerative and Divisive

AgglomerativeDivisiveBottom-up approachTop-down approachStarts with individual pointsStarts with one clusterMerge clustersSplit clustersMore commonly usedLess commonly usedComputationally efficientMore expensiveDistance Measures Used

Hierarchical clustering requires distance calculations.

Euclidean Distance

genui{"math_block_widget_always_prefetch_v2":{"content":"d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}"}}

Manhattan Distance

d=\sum|x_i-y_i|

Cosine Distance

Measures similarity using angle between vectors.

Linkage Methods

When clusters contain multiple points, distance between clusters must be calculated.

1. Single Linkage

Uses minimum distance between clusters.

D(A,B)=min(d(a,b))

2. Complete Linkage

Uses maximum distance.

D(A,B)=max(d(a,b))

3. Average Linkage

Uses average distance.

D(A,B)=\frac{1}{n}\sum d(a,b)

4. Ward Linkage

Minimizes variance within clusters.

Most commonly used in practice.

Dendrogram Analysis

Definition

A Dendrogram is a tree-like diagram used to visualize hierarchical clustering.

It shows:

  • Cluster formation
  • Cluster merging
  • Similarity among observations

Basic Structure of Dendrogram

Distance
   ^
   |
10 |          _________
   |         |         |
8  |      ___|___      |
   |     |       |     |
6  |   __|__     |     |
   |  |     |    |     |
4  |  A     B    C     D
   |
   +------------------------

Understanding Dendrogram

Horizontal axis

Represents:

Data points or observations

Vertical axis

Represents:

Distance between clusters

How to Determine Number of Clusters using Dendrogram

Rule:

Draw a horizontal line through the largest vertical gap.

Example:

Distance
   ^
10 |        _________
   |       |         |
8  |   ____|____     |
   |  |         |    |
6  |  |---------|----| ← Cut here
   |  |         |    |
4  | A B       C D

Clusters formed:

Cluster 1 → A,B

Cluster 2 → C,D

Performance Metrics

1. Silhouette Score

Measures compactness and separation.

S=\frac{b-a}{max(a,b)}

Range:

-1 → Poor

0 → Overlap

1 → Perfect

2. Davies-Bouldin Index

Measures cluster similarity.

Interpretation:

Lower value
       ↓
Better clustering

Advantages

  1. No need to specify K initially
  2. Produces hierarchy of clusters
  3. Easy visualization using dendrogram
  4. Works with small datasets

Disadvantages

  1. Computationally expensive
  2. Sensitive to noise and outliers
  3. Difficult for very large datasets
  4. Once merged, clusters cannot be reversed in agglomerative clustering

Applications

Business

  • Customer segmentation

Healthcare

  • Disease pattern analysis

Social Media

  • User behavior grouping

Biology

  • Gene sequence analysis

Image Processing

  • Image segmentation

Complete Workflow

Collect Data
      ↓
Calculate Distance Matrix
      ↓
Choose Linkage Method
      ↓
Merge/Split Clusters
      ↓
Generate Dendrogram
      ↓
Determine Clusters
      ↓
Evaluate Clustering

One-line summary

Hierarchical Clustering is an unsupervised learning method that creates a hierarchy of clusters using bottom-up (Agglomerative) or top-down (Divisive) approaches and visualizes them through a dendrogram.