Machine Learning — Support Vector Machine (SVM)
Support Vector Machine (SVM) (Detailed Version)
Definition
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for:
- Classification problems
- Regression problems
SVM works by finding the best boundary (hyperplane) that separates different classes with the maximum possible margin.
The main idea is:
Find a decision boundary that separates the classes while maximizing its distance to the nearest data points of each class.
Examples:
- Spam detection
- Face recognition
- Cancer diagnosis
- Handwritten digit recognition
- Image classification
Why SVM?
Suppose we have two classes:
○ = Class A
× = Class B
Data:
× × × ×
-------------------
○ ○ ○ ○
Many different lines can separate the classes:
Line 1 Line 2 Line 3
SVM chooses:
The line with the maximum margin
because a wider margin usually generalizes better to unseen data.
Basic Working of SVM
Input Data
↓
Find separating boundary
↓
Maximize margin
↓
Create optimal hyperplane
↓
Classify new observations
Important Components of SVM
1. Hyperplane
A hyperplane is the decision boundary separating classes.
For two dimensions:
w_1x_1+w_2x_2+b=0
Where:
- (w_1,w_2) = weights
- (x_1,x_2) = input variables
- (b) = bias
Example:
× ×
------------------ Hyperplane
○ ○
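A minimal sketch of how the hyperplane equation is used to classify a point: compute w_1x_1+w_2x_2+b and take the sign. The weights and bias below are hypothetical values chosen for illustration, not taken from a trained model.

```python
def decision_value(w, x, b):
    """Return w·x + b, the signed score of point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Return +1 on the positive side of the hyperplane, -1 otherwise."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical hyperplane: x_1 + x_2 - 3 = 0
w, b = (1.0, 1.0), -3.0

print(classify(w, (3.0, 3.0), b))  # point on the positive side
print(classify(w, (1.0, 1.0), b))  # point on the negative side
```

Points whose decision value is exactly 0 lie on the hyperplane itself; this sketch assigns them to the positive class by convention.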
2. Support Vectors
Support vectors are the nearest data points to the hyperplane.
Example:
× ×
×
------------------
○
○ ○
Nearest points:
× and ○
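These points alone determine the position and orientation of the hyperplane; moving the other points (without crossing the margin) does not change it.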
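To illustrate "nearest to the hyperplane", the sketch below computes the distance |w·x+b| / ||w|| of each point from a given hyperplane and picks the closest ones. The hyperplane and the points are hypothetical; in a real SVM the hyperplane is the result of training, not an input.

```python
import math


def distance_to_hyperplane(w, x, b):
    """Perpendicular distance |w·x + b| / ||w|| from point x to the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane x_1 + x_2 - 3 = 0 and four sample points
w, b = (1.0, 1.0), -3.0
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (4.0, 3.0)]

distances = [distance_to_hyperplane(w, p, b) for p in points]
closest = min(distances)

# Points tied for the smallest distance play the role of support vectors here
nearest = [p for p, d in zip(points, distances) if math.isclose(d, closest)]
print(nearest)
```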
3. Margin
Margin is the distance between the hyperplane and the nearest data points (the support vectors) on each side.
Support Vector
|
|← Margin →
|
------------------ Hyperplane
|
|← Margin →
|
Support Vector
Goal:
Maximum Margin
Mathematical Objective of SVM
SVM tries to maximize the total margin width (the sum of the margins on both sides of the hyperplane):
Margin=\frac{2}{||w||}
Goal:
Maximize Margin
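Equivalent optimization:
\min_{w,b}\frac{1}{2}||w||^2
Subject to:
y_i(w\cdot x_i+b)\ge1 \quad \text{for all } i
Where:
- (x_i) = i-th training point
- (y_i) = its class label, +1 or -1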
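The objective and constraint can be checked numerically. The sketch below, using a hypothetical hyperplane and a toy dataset, computes the margin width 2/||w|| and verifies that every point satisfies y_i(w·x_i+b) ≥ 1. It only checks feasibility of a given (w, b); it does not solve the optimization.

```python
import math


def margin_width(w):
    """Total margin width 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def constraint_values(w, b, X, y):
    """Return y_i * (w·x_i + b) for every training point."""
    return [yi * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            for x, yi in zip(X, y)]


# Hypothetical hyperplane and toy data (labels are +1 / -1)
w, b = (1.0, 1.0), -3.0
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (4.0, 3.0)]
y = [-1, -1, 1, 1]

values = constraint_values(w, b, X, y)
feasible = all(v >= 1 for v in values)

print(values)           # points with value exactly 1 sit on the margin
print(feasible)
print(margin_width(w))
```

Minimizing ||w|| while keeping every constraint value at or above 1 is what makes the margin as wide as possible.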
Working of SVM (Step by Step)
Step 1: Collect data
Example:
Study Hours    Attendance    Result
2              50            Fail
3              55            Fail
6              85            Pass
8              90            Pass
Step 2: Plot data
○ ○    Pass (○)
--------------------
× ×    Fail (×)
Step 3: Find support vectors
The observations of each class that lie closest to the other class are selected.
Step 4: Create optimal hyperplane
Choose boundary with largest margin.
Step 5: Predict new observations
Example:
Study Hours=7, Attendance=80
Predict:
Pass
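The five steps can be sketched end to end on the table from Step 1. This is a minimal illustration under stated assumptions: it trains a linear soft-margin SVM with plain batch sub-gradient descent on the hinge loss, which is a simple stand-in and not the quadratic-programming solvers that SVM libraries normally use. The function names, the feature standardization, and the values of `lam`, `lr`, and `epochs` are all choices made for this example.

```python
def standardize(X):
    """Scale each feature to zero mean and unit variance; also return the stats."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [(sum((v - m) ** 2 for v in col) / n) ** 0.5
            for col, m in zip(zip(*X), means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Batch sub-gradient descent on (lam/2)*||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (Pass) or -1 (Fail) from the sign of w·x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Step 1: data from the table, (study hours, attendance); Pass = +1, Fail = -1
X = [[2, 50], [3, 55], [6, 85], [8, 90]]
y = [-1, -1, 1, 1]

# Steps 3-4: fit the hyperplane on standardized features
X_scaled, means, stds = standardize(X)
w, b = train_linear_svm(X_scaled, y)

# Step 5: predict the new observation, scaled with the training statistics
new_point = [(v - m) / s for v, m, s in zip([7, 80], means, stds)]
label = predict(w, b, new_point)
print("Pass" if label == 1 else "Fail")
```

Standardizing first matters here because attendance is numerically much larger than study hours and would otherwise dominate the distance calculation.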
What if data is not linearly separable?
Real-world data often looks like:
○ ○ ○ × ○ × ○ × ○
No single straight line can separate these classes.
For such data, SVM uses kernel functions.
Kernel Trick
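The kernel trick implicitly maps lower-dimensional data into a higher-dimensional feature space where separation becomes easier. The kernel function computes inner products in that space directly, so the transformed coordinates never have to be computed explicitly.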
Original Data
↓
Transform dimensions
↓
Find hyperplane
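A small sketch of why this works, using the kernel K(x, z) = (x·z)^2 on 2-D points as an illustrative case. The explicit feature map `phi` below is the standard one for this kernel; the sample points are arbitrary. Both routes give the same inner product, but the kernel never builds the higher-dimensional vectors.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit degree-2 feature map for a 2-D point: 2-D -> 3-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """K(x, z) = (x·z)^2, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)

explicit = dot(phi(x), phi(z))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)   # same quantity without building phi

print(explicit, implicit)
```

For kernels such as RBF the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.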
Types of Kernels
1. Linear Kernel
Used for linearly separable data.
Formula:
K(x_i,x_j)=x_i^Tx_j
Applications:
- Text classification
- Spam detection
2. Polynomial Kernel
Used when relationships are curved.
Formula:
K(x_i,x_j)=(x_i^Tx_j+c)^d
Where:
- (c) = constant offset
- (d) = degree of the polynomial
Applications:
- Image processing
3. Radial Basis Function (RBF) Kernel
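The most commonly used kernel in practice and a sensible first choice when the shape of the boundary is unknown.
Formula:
K(x_i,x_j)=e^{-\gamma||x_i-x_j||^2}
Where:
- (\gamma) = positive parameter controlling how quickly similarity decays with distance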
Applications:
- Complex datasets
4. Sigmoid Kernel
Formula:
K(x_i,x_j)=\tanh(\alpha x_i^Tx_j+c)
Where:
- (\alpha) = scale parameter
- (c) = constant offset
Applications:
- Neural-network-like behavior
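The four kernel formulas above translate directly into code. This is a plain-Python sketch for illustration; the default parameter values (`c`, `d`, `gamma`, `alpha`) are arbitrary choices for the example, not recommended settings.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(x, z):
    """K(x, z) = x^T z"""
    return dot(x, z)


def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (x^T z + c)^d"""
    return (dot(x, z) + c) ** d


def rbf_kernel(x, z, gamma=0.1):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, alpha=0.01, c=0.0):
    """K(x, z) = tanh(alpha * x^T z + c)"""
    return math.tanh(alpha * dot(x, z) + c)


x, z = (1.0, 2.0), (3.0, 4.0)

print(linear_kernel(x, z))      # 11.0
print(polynomial_kernel(x, z))  # 144.0
print(rbf_kernel(x, z))         # exp(-0.8), since ||x - z||^2 = 8
print(sigmoid_kernel(x, z))     # tanh(0.11)
```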
Types of SVM
1. Linear SVM
Uses a linear (flat) hyperplane, which is a straight line in two dimensions.
Example:
Spam / Not Spam
2. Nonlinear SVM
Uses kernels.
Example:
Image Classification
3. Support Vector Regression (SVR)
Used for regression problems.
Example:
House Price Prediction
Hyperparameters in SVM
1. C Parameter
Controls the penalty for misclassified training points (and points inside the margin).
Example:
C=100
High C:
Low training error, narrower margin, higher overfitting risk
Low C:
Wider margin, more training errors tolerated, less overfitting
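The role of C is easiest to see in the standard soft-margin form of the objective, which extends the hard-margin problem given earlier. The slack variables \xi_i are symbols introduced here for illustration: \xi_i measures how far point i falls short of the margin, and C sets how heavily those shortfalls are penalized.

```latex
\min_{w,b,\xi}\ \frac{1}{2}||w||^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i(w\cdot x_i+b)\ge 1-\xi_i,\qquad \xi_i\ge 0
```

With a large C the second term dominates, so the optimizer avoids violations even at the cost of a narrow margin; with a small C the first term dominates, so a wider margin is preferred.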
2. Gamma
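Controls how far the influence of a single training point reaches (used by the RBF kernel).
Example:
gamma=0.1
High Gamma:
Each point influences only its close neighbourhood: complex boundaries, higher overfitting risk
Low Gamma:
Each point influences a wide region: smooth boundaries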
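The effect of gamma can be seen by evaluating the RBF kernel at a few distances. In this sketch the distances and gamma values are arbitrary illustrative choices; the point is only how fast the similarity drops.

```python
import math


def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by the given distance."""
    return math.exp(-gamma * distance ** 2)


for gamma in (0.1, 1.0, 10.0):
    values = [round(rbf_similarity(d, gamma), 4) for d in (0.5, 1.0, 2.0)]
    print(gamma, values)
```

With a high gamma the similarity is already close to zero at distance 1, so only very near neighbours affect a prediction; with a low gamma distant points still contribute, which smooths the boundary.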
Performance Metrics for Classification SVM
Accuracy
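Accuracy=\frac{TP+TN}{TP+TN+FP+FN}
Where:
- (TP) = true positives
- (TN) = true negatives
- (FP) = false positives
- (FN) = false negatives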
Precision
Precision=\frac{TP}{TP+FP}
Recall
Recall=\frac{TP}{TP+FN}
F1 Score
F1=2\times\frac{Precision\times Recall}{Precision+Recall}
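A short sketch computing the four metrics above from confusion-matrix counts. The counts in the example are made up for illustration, and the function does not guard against division by zero (for instance when there are no predicted positives).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 100-sample test set
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)

for name, value in m.items():
    print(f"{name}: {value:.4f}")
```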
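ROC-AUC
Area under the ROC curve, which plots the true positive rate against the false positive rate across all decision thresholds.
Interpretation:
AUC = 1 → Perfect
AUC = 0.5 → Random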
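AUC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair (ties count as half). The labels and scores are made up for illustration; for an SVM the scores would typically be decision-function values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]

print(roc_auc(labels, [0.9, 0.8, 0.3, 0.2]))  # 1.0: every positive outranks every negative
print(roc_auc(labels, [0.9, 0.3, 0.8, 0.2]))  # 0.75: one of four pairs is inverted
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5: scores carry no information
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but not for large test sets.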
Performance Metrics for SVR
MAE
MAE=\frac{1}{n}\sum|y_i-\hat y_i|
MSE
MSE=\frac{1}{n}\sum(y_i-\hat y_i)^2
RMSE
RMSE=\sqrt{\frac{1}{n}\sum(y_i-\hat y_i)^2}
R² Score
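R^2=1-\frac{SS_{res}}{SS_{tot}}
Where:
- (SS_{res}) = \sum(y_i-\hat y_i)^2, the residual sum of squares
- (SS_{tot}) = \sum(y_i-\bar y)^2, the total sum of squares around the mean \bar y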
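The four regression metrics can be computed together, since they all derive from the same residuals. The true and predicted values below are made-up numbers for illustration, not output from an actual SVR model.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical targets and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.0]

r = regression_metrics(y_true, y_pred)
print(r)  # mae 0.375, mse 0.3125, r2 0.9375
```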
Advantages
- Works well with high-dimensional data
- Effective with small datasets
- Handles complex boundaries using kernels
- Lower overfitting risk, because maximizing the margin acts as a form of regularization
Disadvantages
- Slow to train on very large datasets (training time grows faster than linearly with the number of samples)
- Difficult to interpret
- Choosing kernel can be difficult
- Requires parameter tuning
Real-world Applications
- Face recognition
- Text classification
- Disease prediction
- Spam detection
- Handwriting recognition
- Image classification
Complete Workflow
Collect Data
↓
Preprocess Data
↓
Select Kernel
↓
Find Support Vectors
↓
Build Hyperplane
↓
Classify Data
↓
Evaluate Performance
One-line summary
Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane with maximum margin to separate classes and can use kernels to handle non-linear data.