Decision Tree (Detailed Version)

Definition

A decision tree is a supervised machine learning algorithm used for both:

  • Classification problems
  • Regression problems

It builds a tree-like structure of decisions in which the data is repeatedly split into smaller subsets based on conditions on the feature values.

The main idea is:

Divide data into smaller groups by asking questions until a prediction can be made.

Examples:

  • Loan approval prediction
  • Disease diagnosis
  • Student performance prediction
  • Customer churn prediction
  • House price prediction

Why is it called a Decision Tree?

It resembles an upside-down tree: the root is at the top and the leaves are at the bottom.

Structure:

                    Root Node
                         |
            -------------------------
            |                       |
        Condition 1            Condition 2
            |                       |
      -----------             ----------
      |         |             |        |
   Leaf      Leaf         Leaf      Leaf

Components of Decision Tree

1. Root Node

The topmost node. It represents the complete dataset and holds the first condition to be tested.

Example:

Age > 30 ?

2. Decision Node

A node where a condition is tested.

Example:

Income > 50000 ?

3. Branch

Represents the outcome of a condition.

Example:

Yes
No

4. Leaf Node

Final prediction/output node.

Example:

Loan Approved

Working of Decision Tree

Suppose we want to predict whether a student will pass.

Dataset:

Study Hours    Attendance    Pass
2              60            No
3              65            No
6              85            Yes
8              90            Yes

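A decision tree for this data might look like the one below. It is illustrative: on these four rows the first split alone already separates Pass from Fail, so the Attendance node shows how a second condition would be added on a larger dataset.

        Study Hours > 5 ?
               |
       ------------------
       |                |
       No              Yes
       |                |
      Fail       Attendance > 80 ?
                        |
                 ---------------
                 |             |
                 No           Yes
                 |             |
                Fail          Pass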
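The tree above is simply a set of nested conditions. As a minimal sketch, it can be written by hand as follows; the function name predict_pass is hypothetical and chosen only for illustration.

```python
def predict_pass(study_hours: float, attendance: float) -> str:
    """Follow the example tree from the root to a leaf."""
    if study_hours > 5:          # root node: Study Hours > 5 ?
        if attendance > 80:      # decision node: Attendance > 80 ?
            return "Pass"        # leaf
        return "Fail"            # leaf
    return "Fail"                # leaf


# The four rows of the example dataset: (study hours, attendance)
students = [(2, 60), (3, 65), (6, 85), (8, 90)]
predictions = [predict_pass(h, a) for h, a in students]
print(predictions)  # ['Fail', 'Fail', 'Pass', 'Pass']
```

A trained decision tree produces the same kind of rules, but it learns the features and thresholds from the data instead of having them written by hand.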

Step-by-Step Working

Step 1: Start with complete dataset

All records

Step 2: Select best feature for splitting

Example:

Possible features:

  • Study Hours
  • Attendance
  • Age

The algorithm determines that Study Hours provides the best separation.

Step 3: Split dataset

Example:

Study Hours >5

Step 4: Repeat process recursively

Continue splitting until:

  • The node becomes pure (all of its samples belong to one class)
  • The maximum depth is reached
  • The node has fewer samples than the minimum required to split

Step 5: Create leaf node

Final prediction (the majority class of the samples that reach the leaf):

Pass

or

Fail
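The five steps can be sketched in a few dozen lines of Python. This is a minimal illustration, not a production implementation. It assumes Gini impurity as the splitting criterion, candidate thresholds at the midpoints between consecutive feature values, and a plain dictionary as the tree structure. The names gini, best_split, build_tree and predict are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Step 2: try every feature and threshold, keep the lowest weighted impurity."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples_split=2):
    """Steps 3 to 5: split recursively until a stopping condition creates a leaf."""
    majority = Counter(labels).most_common(1)[0][0]
    if (len(set(labels)) == 1                     # node is pure
            or depth >= max_depth                 # maximum depth reached
            or len(labels) < min_samples_split):  # too few samples to split
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:                             # no feature has two distinct values
        return {"leaf": majority}
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples_split),
    }


def predict(tree, row):
    """Walk from the root to a leaf, following the branch each condition selects."""
    while "leaf" not in tree:
        tree = tree["right"] if row[tree["feature"]] > tree["threshold"] else tree["left"]
    return tree["leaf"]


# Step 1: the complete dataset, as (study hours, attendance) and the Pass label.
X = [(2, 60), (3, 65), (6, 85), (8, 90)]
y = ["No", "No", "Yes", "Yes"]
tree = build_tree(X, y)
print(tree)
print(predict(tree, (7, 88)))  # Yes
```

On the four example rows this sketch learns a single split on feature 0 (Study Hours) at the midpoint 4.5 and then stops, because both child nodes are already pure.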

How does Decision Tree decide the best split?

Decision trees score each candidate split with a measure called a splitting criterion.

Common methods:

  1. Entropy
  2. Information Gain
  3. Gini Index
  4. Variance Reduction (used by regression trees)

Entropy

Entropy measures the impurity (randomness) of the class labels in a node.

Formula:

Entropy(S)=-\sum p_i\log_2(p_i)

Where:

  • (p_i) = Proportion of samples in S that belong to class i

Interpretation:

Entropy = 0

means:

Pure node

Example:

Pass    Fail
10      0

Entropy:

0

Maximum entropy (for two classes) occurs when the classes are evenly mixed:

p(Pass) = 0.5
p(Fail) = 0.5

Entropy:

1
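The two cases above can be checked numerically. The following is a small sketch of the entropy formula that works on per-class sample counts; the function name entropy and the example counts are illustrative assumptions. Classes with zero samples are skipped, following the convention that 0 * log2(0) counts as 0.

```python
import math


def entropy(counts):
    """Entropy(S) = -sum(p_i * log2(p_i)) over classes with a non-zero count."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


print(entropy([10, 0]))  # 0.0, pure node (10 Pass, 0 Fail)
print(entropy([5, 5]))   # 1.0, evenly mixed node
```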

Information Gain

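Information Gain measures the reduction in entropy achieved by a split.

Formula:

Information\ Gain=Entropy(Parent)-Weighted\ Entropy(Children)

Where:

  • Weighted Entropy(Children) = Sum of each child's entropy multiplied by the fraction of the parent's samples that fall into that child

Goal:

Choose the feature with the maximum information gain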
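As a worked sketch of the formula, the code below compares two candidate splits of the four-student example. The function names and the second, weaker split are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["No", "No", "Yes", "Yes"]

# Split on Study Hours > 5: both children are pure.
gain_hours = information_gain(parent, [["No", "No"], ["Yes", "Yes"]])

# A hypothetical weaker split that leaves one child mixed.
gain_weak = information_gain(parent, [["No"], ["No", "Yes", "Yes"]])

print(gain_hours)  # 1.0
print(gain_weak)   # about 0.311
```

The first split removes all of the parent's entropy, so it has the larger gain and would be chosen.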

Gini Index

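The Gini Index measures impurity: the probability that a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution.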

Formula:

Gini=1-\sum p_i^2

Interpretation:

Lower Gini
↓
Better split
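A short sketch of the Gini formula, again using per-class sample counts. The helper weighted_gini is a hypothetical name for the size-weighted average over the child nodes of a split, which is the quantity compared between candidate splits.

```python
def gini(counts):
    """Gini = 1 - sum(p_i^2), computed from per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Size-weighted Gini of the child nodes; lower means a better split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


print(gini([10, 0]))                    # 0.0, pure node
print(gini([5, 5]))                     # 0.5, maximum for two classes
print(weighted_gini([[2, 0], [0, 2]]))  # 0.0, perfect split
print(weighted_gini([[1, 0], [1, 2]]))  # about 0.333
```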

Types of Decision Trees

1. Classification Tree

Used for categorical outputs.

Examples:

  • Spam / Not Spam
  • Pass / Fail
  • Disease prediction

2. Regression Tree

Used for continuous outputs.

Examples:

  • House price prediction
  • Salary prediction
  • Temperature prediction
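Regression trees are commonly split by variance reduction, the fourth criterion named earlier: a good split produces child nodes whose target values vary little. The sketch below uses made-up house prices and a hypothetical function name, and it relies on the population variance from Python's statistics module.

```python
from statistics import pvariance


def variance_reduction(parent, left, right):
    """Variance(parent) minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return pvariance(parent) - weighted


# Hypothetical house prices (in thousands), ordered by house size.
prices = [100.0, 110.0, 300.0, 320.0]

good = variance_reduction(prices, [100.0, 110.0], [300.0, 320.0])
poor = variance_reduction(prices, [100.0], [110.0, 300.0, 320.0])
print(good, poor)  # the first split reduces variance far more

# A regression leaf predicts the mean target value of its samples.
print(sum([300.0, 320.0]) / 2)  # 310.0
```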

Hyperparameters in Decision Tree

1. Maximum Depth

Maximum allowed depth of tree.

Example:

max_depth=5

Purpose:

  • Prevent overfitting

2. Minimum Samples Split

Minimum number of samples a node must contain before it can be split.

Example:

min_samples_split=4

3. Minimum Samples Leaf

Minimum observations required at leaf node.

Example:

min_samples_leaf=2

4. Maximum Features

Maximum number of features considered when searching for the best split at each node.

Example:

max_features=3
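The parameter names above match those of scikit-learn's DecisionTreeClassifier. As a hedged sketch, assuming scikit-learn is installed and using its bundled Iris dataset purely as stand-in data, the four hyperparameters can be set together like this:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=5,          # never grow deeper than 5 levels
    min_samples_split=4,  # a node needs at least 4 samples to be split
    min_samples_leaf=2,   # every leaf keeps at least 2 samples
    max_features=3,       # examine 3 of the 4 features at each split
    random_state=0,       # fix the random feature choice for repeatability
)
clf.fit(X, y)

print(clf.get_depth(), clf.get_n_leaves())
```

Suitable values depend on the dataset; they are usually chosen by cross-validation rather than fixed in advance.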

Performance Metrics for Classification Trees

1. Accuracy

Accuracy=\frac{TP+TN}{TP+TN+FP+FN}

Measures the proportion of all predictions that are correct.

2. Precision

Precision=\frac{TP}{TP+FP}

Measures the proportion of predicted positives that are actually positive.

3. Recall

Recall=\frac{TP}{TP+FN}

Measures the proportion of actual positives that the model identifies.

4. F1 Score

F1=2\times\frac{Precision\times Recall}{Precision+Recall}

The harmonic mean of precision and recall.
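The four formulas can be evaluated directly from the counts TP, TN, FP and FN. The counts below are invented for illustration, and the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 40 true positives, 45 true negatives,
# 5 false positives, 10 false negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

This sketch does not guard against division by zero, which occurs when the model predicts no positives or the data contains none.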

5. Confusion Matrix


                    Predicted Positive    Predicted Negative
Actual Positive     TP                    FN
Actual Negative     FP                    TN

Performance Metrics for Regression Trees

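MAE (Mean Absolute Error)

MAE=\frac{1}{n}\sum|y_i-\hat y_i|

MSE (Mean Squared Error)

MSE=\frac{1}{n}\sum(y_i-\hat y_i)^2

RMSE (Root Mean Squared Error)

RMSE=\sqrt{\frac{1}{n}\sum(y_i-\hat y_i)^2}

R² Score

R^2=1-\frac{SS_{res}}{SS_{tot}}

Where:

  • (y_i) = Actual value
  • (\hat y_i) = Predicted value
  • (SS_{res}) = \sum(y_i-\hat y_i)^2, the residual sum of squares
  • (SS_{tot}) = \sum(y_i-\bar y)^2, the total sum of squares around the mean (\bar y)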
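A brief sketch that evaluates all four regression metrics on the same predictions. The values are invented for illustration and the function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)             # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical house prices (in thousands) and model predictions.
y_true = [200.0, 250.0, 300.0, 350.0]
y_pred = [210.0, 240.0, 310.0, 340.0]
scores = regression_metrics(y_true, y_pred)
print(scores)  # MAE 10, MSE 100, RMSE 10, R^2 about 0.968
```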

Overfitting in Decision Trees

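Decision trees overfit easily: an unconstrained tree keeps splitting until it fits the training data almost perfectly, noise included.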

Example:

Very deep tree
      ↓
Memorizes training data
      ↓
Poor performance on new data

Solutions:

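  • Pruning (remove branches that contribute little to predictive accuracy)
  • Limit depth (max_depth)
  • Set minimum samples (min_samples_split, min_samples_leaf)
  • Use an ensemble of trees such as Random Forest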
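Pruning can be tried with scikit-learn's cost-complexity pruning parameter ccp_alpha. The sketch below assumes scikit-learn is installed, uses the bundled Iris dataset as stand-in data, and picks ccp_alpha=0.02 arbitrarily; it compares an unconstrained tree with a pruned one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Unconstrained tree: grows until the training data is fitted as closely as possible.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: ccp_alpha > 0 removes branches whose benefit does not justify their size.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pruned", pruned)]:
    print(
        name,
        model.tree_.node_count,          # size of the tree
        model.score(X_train, y_train),   # accuracy on training data
        model.score(X_test, y_test),     # accuracy on unseen data
    )
```

The pruned tree is never larger than the full one. Whether it generalizes better depends on the data, so the pruning strength is normally tuned with cross-validation.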

Advantages

  1. Easy to understand and visualize
  2. Works for classification and regression
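  3. Handles numerical and categorical data (some libraries require categorical features to be encoded as numbers first)
  4. Requires little preprocessing
  5. Feature scaling is usually not needed, because each split compares a single feature against a threshold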

Disadvantages

  1. Can overfit easily
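  2. Unstable: small changes in the training data can produce a very different tree
  3. Deep trees become complex and hard to interpret
  4. Training may become computationally expensive on large datasets with many features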

Real-world Applications

  • Fraud detection
  • Medical diagnosis
  • Loan approval systems
  • Customer segmentation
  • Risk analysis
  • Sales prediction

Complete Workflow

Collect Data
      ↓
Preprocess Data
      ↓
Select Splitting Criteria
      ↓
Build Tree
      ↓
Split Nodes
      ↓
Create Leaf Nodes
      ↓
Predict Output
      ↓
Evaluate Performance

One-line summary

A decision tree is a supervised learning algorithm that predicts outputs by recursively splitting data into smaller groups using decision rules.