Machine Learning — Decision Tree
Decision Tree (Detailed Version)
Definition
A Decision Tree is a supervised machine learning algorithm used for both:
- Classification problems
- Regression problems
It works by building a tree-like structure of decisions in which the data is repeatedly split into smaller subsets based on conditions on the feature values.
The main idea is:
Divide data into smaller groups by asking questions until a prediction can be made.
Examples:
- Loan approval prediction
- Disease diagnosis
- Student performance prediction
- Customer churn prediction
- House price prediction
Why is it called a Decision Tree?
It resembles an upside-down tree: the root is at the top and the leaves (the final predictions) are at the bottom.
Structure:
             Root Node
                 |
     -------------------------
     |                       |
Condition 1             Condition 2
     |                       |
-----------             ----------
|         |             |        |
Leaf     Leaf          Leaf     Leaf
Components of a Decision Tree
1. Root Node
The topmost node representing the complete dataset.
Example:
Age > 30 ?
2. Decision Node
A node where a condition is tested.
Example:
Income > 50000 ?
3. Branch
Represents an outcome of a condition.
Example:
Yes / No
4. Leaf Node
Final prediction/output node.
Example:
Loan Approved
Working of Decision Tree
Suppose we want to predict whether a student will pass.
Dataset:
| Study Hours | Attendance | Pass |
|-------------|------------|------|
| 2           | 60         | No   |
| 3           | 65         | No   |
| 6           | 85         | Yes  |
| 8           | 90         | Yes  |
A Decision Tree might learn:
        Study Hours > 5 ?
                |
        -----------------
        |               |
        No             Yes
        |               |
       Fail     Attendance > 80 ?
                        |
                 ---------------
                 |             |
                 No           Yes
                 |             |
                Fail          Pass
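Once learned, a tree is simply a set of nested if/else rules. A minimal Python sketch of the tree above (the function name `predict_pass` is hypothetical) checks it against the four records in the dataset:

```python
# The four records from the example dataset: (study_hours, attendance, pass)
data = [(2, 60, "No"), (3, 65, "No"), (6, 85, "Yes"), (8, 90, "Yes")]

def predict_pass(study_hours, attendance):
    """Walk the example tree from the root node down to a leaf."""
    if study_hours > 5:          # root node test
        if attendance > 80:      # decision node on the "Yes" branch
            return "Yes"         # leaf: Pass
        return "No"              # leaf: Fail
    return "No"                  # leaf: Fail

predictions = [predict_pass(h, a) for h, a, _ in data]
print(predictions)  # ['No', 'No', 'Yes', 'Yes']
```

On these four records the root test alone already separates Pass from Fail. The Attendance test only changes the outcome for a new student who studies more than 5 hours but has attendance of 80 or below, e.g. `predict_pass(7, 70)` returns `"No"`.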
Step-by-Step Working
Step 1: Start with the complete dataset
The root node contains all records.
Step 2: Select the best feature for splitting
Example:
Possible features:
- Study Hours
- Attendance
- Age
The algorithm determines that Study Hours provides the best separation.
Step 3: Split the dataset
Example:
Study Hours > 5
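Step 4: Repeat the process recursively on each subset
Continue splitting until:
- The node becomes pure (all records belong to the same class)
- The maximum depth is reached
- The node has fewer records than the minimum required to split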
Step 5: Create a leaf node
Each leaf stores the final prediction, usually the majority class of its records:
Pass or Fail
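The five steps can be sketched in plain Python. This is a minimal, CART-style illustration under stated assumptions, not a production implementation: it assumes numeric features, binary splits at midpoints between observed values, the Gini index (covered under the splitting criteria below) as the criterion, and majority-class leaves. All function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, n_features):
    """Step 2: try every feature and midpoint threshold; return the split
    with the lowest weighted Gini of the two children, or None."""
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(rows)
    for f in range(n_features):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_samples_split=2):
    """Steps 1-5: recursively split until a stopping condition holds."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions: pure node, maximum depth, too few samples to split
    if len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples_split:
        return {"leaf": majority}  # Step 5: leaf holds the majority class
    split = best_split(rows, labels, len(rows[0]))
    if split is None:  # all feature values identical, nothing to split on
        return {"leaf": majority}
    _, f, t = split
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]   # Step 3: split
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        # Step 4: recurse on each subset
        "left": build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                            depth + 1, max_depth, min_samples_split),
    }

def predict(tree, row):
    """Follow the branches from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Features: (study_hours, attendance); target: pass
X = [(2, 60), (3, 65), (6, 85), (8, 90)]
y = ["No", "No", "Yes", "Yes"]
tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 4.5, 'left': {'leaf': 'No'}, 'right': {'leaf': 'Yes'}}
```

Because thresholds are placed at midpoints, the sketch learns Study Hours > 4.5, which separates this data exactly like the Study Hours > 5 rule. Attendance > 75 would separate it equally well; on ties the sketch keeps the first feature it tried.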
How does a Decision Tree decide the best split?
Decision Trees use measures called splitting criteria.
Common methods:
- Entropy
- Information Gain
- Gini Index
- Variance Reduction (for regression trees)
Entropy
Entropy measures impurity or randomness.
Formula:
Entropy(S)=-\sum_{i=1}^{c} p_i\log_2(p_i)
Where:
- p_i = proportion (probability) of records in S that belong to class i
- c = number of classes
Interpretation:
Entropy = 0 means the node is pure (all records belong to one class).
Example:
| Pass | Fail |
|------|------|
| 10   | 0    |
Entropy:
0
Maximum entropy (two classes):
p(Pass) = 0.5, p(Fail) = 0.5
Entropy:
1
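A small Python helper reproduces the two values above. It is a sketch; the name `entropy` and its input format (a list of class counts) are illustrative choices:

```python
import math

def entropy(counts):
    """Entropy in bits, computed from a list of class counts, e.g. [pass, fail]."""
    total = sum(counts)
    # Classes with zero count contribute 0 (the limit of p*log2(p) as p -> 0)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([10, 0]))  # 0.0  (pure node)
print(entropy([5, 5]))   # 1.0  (maximum for two classes)
```

With c classes the maximum entropy is log2(c), so a node split evenly across four classes has entropy 2.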
Information Gain
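Information Gain measures the reduction in entropy achieved by a split.
Formula:
Information\ Gain=Entropy(Parent)-\sum_{k}\frac{n_k}{n}Entropy(Child_k)
Where n is the number of records in the parent and n_k the number in child k, so the second term is the weighted entropy of the children.
Goal:
Choose the feature (and threshold) with the maximum information gain.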
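Applied to the student dataset, splitting on Study Hours > 5 gives the maximum possible gain of 1 bit, while a hypothetical worse threshold (Study Hours > 2) gains much less. The helper names below are illustrative, not from any library:

```python
import math

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(labels.count(c) / n) * math.log2(labels.count(c) / n)
               for c in set(labels))

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Student dataset: study hours and Pass labels
hours = [2, 3, 6, 8]
passed = ["No", "No", "Yes", "Yes"]

def split_on_hours(threshold):
    """Hypothetical helper: split labels into Study Hours <= t and > t."""
    left = [y for h, y in zip(hours, passed) if h <= threshold]
    right = [y for h, y in zip(hours, passed) if h > threshold]
    return [left, right]

print(information_gain(passed, split_on_hours(5)))            # 1.0  (both children pure)
print(round(information_gain(passed, split_on_hours(2)), 3))  # 0.311
```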
Gini Index
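Measures impurity like entropy, but without logarithms. It is the criterion used by the CART algorithm for classification.
Formula:
Gini=1-\sum_{i=1}^{c} p_i^2
Interpretation:
Gini = 0 for a pure node; for two classes the maximum is 0.5 (a 50/50 mix). Lower Gini means a purer node, so the best split is the one with the lowest weighted Gini of its child nodes.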
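A minimal Python sketch of the Gini index and of the weighted Gini used to compare splits (function names are illustrative; inputs are lists of class counts):

```python
def gini(counts):
    """Gini impurity from a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Size-weighted Gini of the child nodes produced by a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

print(gini([10, 0]))  # 0.0  (pure node)
print(gini([5, 5]))   # 0.5  (maximum for two classes)
# Study Hours > 5 on the student data: children [0 pass, 2 fail] and [2 pass, 0 fail]
print(weighted_gini([[0, 2], [2, 0]]))  # 0.0
```

A worse split such as Study Hours > 2 produces children [0, 1] and [2, 1], with weighted Gini 1/3, so the Study Hours > 5 split is preferred.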
Types of Decision Trees
1. Classification Tree
Used for categorical outputs.
Examples:
- Spam / Not Spam
- Pass / Fail
- Disease prediction
2. Regression Tree
Used for continuous outputs.
Examples:
- House price prediction
- Salary prediction
- Temperature prediction
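Regression trees usually choose splits by Variance Reduction, and a common choice (used by CART) is for each leaf to predict the mean target value of its training records. A minimal sketch of choosing one split, on hypothetical toy house-size and price values made up for illustration:

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Variance of the parent minus the size-weighted variance of the children."""
    n = len(parent)
    return variance(parent) - (len(left) / n * variance(left) + len(right) / n * variance(right))

# Hypothetical toy data: house size (feature) and price (continuous target)
sizes = [50, 60, 80, 100, 120]
prices = [100, 110, 200, 210, 220]

best = None  # (reduction, threshold)
for t in sizes[:-1]:  # candidate thresholds: split into size <= t and size > t
    left = [p for s, p in zip(sizes, prices) if s <= t]
    right = [p for s, p in zip(sizes, prices) if s > t]
    r = variance_reduction(prices, left, right)
    if best is None or r > best[0]:
        best = (r, t)

reduction, threshold = best
left_leaf = [p for s, p in zip(sizes, prices) if s <= threshold]
right_leaf = [p for s, p in zip(sizes, prices) if s > threshold]
# Each leaf predicts the mean target value of its training records
print(threshold, sum(left_leaf) / len(left_leaf), sum(right_leaf) / len(right_leaf))  # 60 105.0 210.0
```

The chosen threshold (size <= 60) separates the two low-priced houses from the three high-priced ones, which removes most of the variance in price.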
Hyperparameters of a Decision Tree
1. Maximum Depth
Maximum allowed depth of the tree (number of levels of splits below the root).
Example:
max_depth=5
Purpose:
- Prevent overfitting
2. Minimum Samples Split
Minimum number of records a node must contain before it can be split.
Example:
min_samples_split=4
3. Minimum Samples Leaf
Minimum number of records required in each leaf node.
Example:
min_samples_leaf=2
4. Maximum Features
Maximum number of features considered when searching for the best split at each node.
Example:
max_features=3
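The parameter names above match those of scikit-learn's `DecisionTreeClassifier` and `DecisionTreeRegressor`. The stdlib sketch below only illustrates how such limits typically gate tree growth; it is not scikit-learn's implementation, and the functions `split_allowed` and `candidate_features` are hypothetical:

```python
import random

def split_allowed(depth, n_node, n_left, n_right,
                  max_depth=5, min_samples_split=4, min_samples_leaf=2):
    """Return True if a candidate split passes the depth and size limits."""
    if max_depth is not None and depth >= max_depth:
        return False  # node is already at the maximum depth (root is depth 0)
    if n_node < min_samples_split:
        return False  # too few records in this node to attempt a split
    if n_left < min_samples_leaf or n_right < min_samples_leaf:
        return False  # a child would be smaller than the minimum leaf size
    return True

def candidate_features(features, max_features=3, rng=random):
    """Pick the random subset of features examined at one node."""
    if max_features is None or max_features >= len(features):
        return list(features)
    return rng.sample(list(features), max_features)

print(split_allowed(depth=0, n_node=10, n_left=5, n_right=5))  # True
print(split_allowed(depth=0, n_node=10, n_left=9, n_right=1))  # False (leaf too small)
```

In scikit-learn the same settings are passed as constructor arguments, e.g. `DecisionTreeClassifier(max_depth=5, min_samples_split=4, min_samples_leaf=2, max_features=3)`.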
Performance Metrics for Classification Trees
1. Accuracy
Accuracy=\frac{TP+TN}{TP+TN+FP+FN}
Measures the proportion of all predictions that are correct.
2. Precision
Precision=\frac{TP}{TP+FP}
Measures the proportion of predicted positives that are actually positive.
3. Recall
Recall=\frac{TP}{TP+FN}
Measures the proportion of actual positives that are correctly identified.
4. F1 Score
F1=2\times\frac{Precision\times Recall}{Precision+Recall}
Harmonic mean of precision and recall.
5. Confusion Matrix
|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | TP                 | FN                 |
| Actual Negative | FP                 | TN                 |
Performance Metrics for Regression Trees
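MAE (Mean Absolute Error)
MAE=\frac{1}{n}\sum|y_i-\hat y_i|
MSE (Mean Squared Error)
MSE=\frac{1}{n}\sum(y_i-\hat y_i)^2
RMSE (Root Mean Squared Error)
RMSE=\sqrt{\frac{1}{n}\sum(y_i-\hat y_i)^2}
R² Score
R^2=1-\frac{SS_{res}}{SS_{tot}}
Where SS_{res}=\sum(y_i-\hat y_i)^2 and SS_{tot}=\sum(y_i-\bar y)^2; y_i is the actual value, \hat y_i the predicted value and \bar y the mean of the actual values.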
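The classification and regression formulas above translate directly into plain Python. This is a sketch with hypothetical function names; the labels and predictions are made-up values for illustration only (classification uses 1 as the positive class):

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 (0.0 where a denominator is zero)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 for lists of actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))
```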
Overfitting in Decision Trees
Decision Trees can easily overfit: a tree allowed to grow until every leaf is pure also fits the noise in the training data.
Example:
Very deep tree
↓
Memorizes training data
↓
Poor performance on new data
Solutions:
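- Pruning (removing branches that add little predictive value)
- Limit the tree depth
- Set a minimum number of samples per split or per leaf
- Use an ensemble such as Random Forest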
Advantages
- Easy to understand and visualize
- Works for classification and regression
- Handles numerical and categorical data
- Requires little data preprocessing
- Feature scaling usually not needed, since each split compares a single feature against a threshold
Disadvantages
- Can overfit easily
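- Unstable: small changes in the data can produce a very different tree
- Deep trees become complex and hard to interpret
- Training can become computationally expensive on large datasets with many features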
Real-world Applications
- Fraud detection
- Medical diagnosis
- Loan approval systems
- Customer segmentation
- Risk analysis
- Sales prediction
Complete Workflow
Collect Data
↓
Preprocess Data
↓
Select Splitting Criterion
↓
Build Tree
↓
Split Nodes
↓
Create Leaf Nodes
↓
Predict Output
↓
Evaluate Performance
One-line summary
Decision Tree is a supervised learning algorithm that predicts outputs by recursively splitting data into smaller groups using decision rules.