Support Vector Machine (SVM) (Detailed Version)

Definition

Support Vector Machine (SVM) is a Supervised Machine Learning algorithm used for:

  • Classification problems
  • Regression problems

SVM works by finding the best boundary (hyperplane) that separates different classes with the maximum possible margin.

The main idea is:

Find a decision boundary that separates the classes while maximizing its distance to the nearest data points of each class.

Examples:

  • Spam detection
  • Face recognition
  • Cancer diagnosis
  • Handwritten digit recognition
  • Image classification

Why SVM?

Suppose we have two classes:

○ = Class A
× = Class B

Data:

×      ×

×      ×

-------------------

○      ○

○      ○

Many different lines can separate the two classes:

Line 1
Line 2
Line 3

SVM chooses:

The line with maximum margin

because it gives better generalization.

Basic Working of SVM

Input Data
      ↓
Find separating boundary
      ↓
Maximize margin
      ↓
Create optimal hyperplane
      ↓
Classify new observations

Important Components of SVM

1. Hyperplane

A hyperplane is the decision boundary separating classes.

For two dimensions:

w_1x_1+w_2x_2+b=0

Where:

  • (w_1,w_2) = weights
  • (x_1,x_2) = input variables
  • (b) = bias

Example:

×    ×

------------------ Hyperplane

○    ○
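
The hyperplane equation can be turned into a classifier by checking which side of the boundary a point falls on. The following is a minimal sketch; the weights and bias are hand-picked, hypothetical values (the horizontal line x_2 = 2), not values learned from data.

```python
def decision_value(w, b, x):
    """Return w_1*x_1 + w_2*x_2 + b for one point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Positive side of the hyperplane -> Class B (×), otherwise Class A (○)."""
    return "Class B" if decision_value(w, b, x) > 0 else "Class A"


# Hypothetical hyperplane: 0*x_1 + 1*x_2 - 2 = 0, i.e. the line x_2 = 2
w, b = (0.0, 1.0), -2.0

above = classify(w, b, (1.0, 3.0))  # point above the line
below = classify(w, b, (1.0, 1.0))  # point below the line
print(above, below)
```

Only the sign of the decision value matters for the predicted class; its magnitude grows with the distance from the boundary.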

2. Support Vectors

Support vectors are the nearest data points to the hyperplane.

Example:

×       ×

      ×

------------------

      ○

○       ○

Nearest points:

× and ○

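These points alone determine the position of the hyperplane. Points farther away can move without changing it, as long as they stay outside the margin.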

3. Margin

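The margin is the distance between the hyperplane and the nearest data points (the support vectors). The total margin width is measured from the support vectors on one side to those on the other.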

Support Vector
        |
        |← Margin →
        |
------------------ Hyperplane
        |
        |← Margin →
        |
Support Vector

Goal:

Maximum Margin
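
To make the distance idea concrete, the sketch below computes the perpendicular distance |w·x + b| / ||w|| from each point to a hyperplane and picks out the nearest points. The hyperplane and the four points are hypothetical values chosen for illustration; a real SVM would learn the hyperplane instead of being given it.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance |w·x + b| / ||w|| from point x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical hyperplane x_2 = 2 and four illustrative points
w, b = (0.0, 1.0), -2.0
points = [(0.0, 4.0), (2.0, 3.0), (2.0, 1.0), (0.0, 0.0)]

distances = [distance_to_hyperplane(w, b, p) for p in points]
closest = min(distances)

# The points at the smallest distance play the role of support vectors here.
nearest_points = [p for p, d in zip(points, distances) if d == closest]
print(distances, nearest_points)
```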

Mathematical Objective of SVM

SVM tries to maximize:

Margin=\frac{2}{||w||}

Goal:

Maximize Margin

Equivalent optimization:

\min\frac{1}{2}||w||^2

Subject to, for every training point i with label y_i equal to +1 or -1:

y_i(w\cdot x_i+b)\ge 1
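
The link between the constraint and the margin can be checked numerically. In this sketch, three hypothetical hyperplanes describe the same boundary (x_2 = 2) with different scalings of w and b; the data points and labels are also made up for illustration. Among the scalings that satisfy the constraint, the one with the smallest ||w|| gives the widest margin.

```python
import math


def margin_width(w):
    """Margin width 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_constraints(w, b, X, y):
    """Check y_i (w·x_i + b) >= 1 for every training point."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 1
        for xi, yi in zip(X, y)
    )


# Hypothetical data: two points per class, labels +1 / -1
X = [(0.0, 4.0), (2.0, 3.0), (2.0, 1.0), (0.0, 0.0)]
y = [1, 1, -1, -1]

candidates = {
    "w=(0,2), b=-4": ((0.0, 2.0), -4.0),    # feasible, but margin only 1
    "w=(0,1), b=-2": ((0.0, 1.0), -2.0),    # feasible, margin 2
    "w=(0,0.5), b=-1": ((0.0, 0.5), -1.0),  # margin 4, but violates the constraint
}

for name, (w, b) in candidates.items():
    print(name, margin_width(w), satisfies_constraints(w, b, X, y))
```

Shrinking w further would widen the margin on paper, but the nearest points would then fall inside it, which is exactly what the constraint rules out.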

Working of SVM (Step by Step)

Step 1: Collect data

Example:

Study Hours   Attendance   Result
2             50           Fail
3             55           Fail
6             85           Pass
8             90           Pass

Step 2: Plot data

Pass(○)      ○

             ○

--------------------

×

× Fail

Step 3: Find support vectors

Nearest observations are selected.

Step 4: Create optimal hyperplane

Choose boundary with largest margin.

Step 5: Predict new observations

Example:

Study Hours=7
Attendance=80

Predict:

Pass
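
The five steps can be sketched end to end on the four observations from Step 1. This is an illustrative sketch only: it assumes a linear SVM trained by batch subgradient descent on the regularized hinge loss, and the feature standardization, learning rate, regularization strength, and epoch count are all assumptions, not part of the steps above. Library implementations use different, more efficient solvers.

```python
import math

# Step 1: data from the table (study hours, attendance); Pass = +1, Fail = -1
X = [(2, 50), (3, 55), (6, 85), (8, 90)]
y = [-1, -1, 1, 1]

# Standardize each feature so hours and attendance are on a comparable scale.
n = len(X)
means = [sum(col) / n for col in zip(*X)]
stds = [
    math.sqrt(sum((v - m) ** 2 for v in col) / n)
    for col, m in zip(zip(*X), means)
]


def scale(x):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Batch subgradient descent on the regularized hinge loss (assumed method)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # Point is misclassified or inside the margin: hinge loss is active.
                grad_w = [g - yi * xj / len(X) for g, xj in zip(grad_w, xi)]
                grad_b -= yi / len(X)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Steps 3 and 4: fit the maximum-margin boundary on the scaled data
w, b = train_linear_svm([scale(x) for x in X], y)


def predict(x):
    """Step 5: classify a new observation by the sign of w·x + b."""
    score = sum(wj * xj for wj, xj in zip(w, scale(x))) + b
    return "Pass" if score > 0 else "Fail"


print(predict((7, 80)))
```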

What if data is not linearly separable?

Real-world data often looks like:

○ ○ ○

× ○ ×

○ × ○

No straight line can separate this.

SVM uses Kernel Functions.

Kernel Trick

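The kernel trick lets SVM work as if the data had been mapped into a higher-dimensional space, where separation becomes easier. The kernel function computes the similarity of two points in that space directly, so the transformation never has to be carried out explicitly.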

Original Data
       ↓
Transform dimensions
       ↓
Find hyperplane
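
A small numeric check shows what the kernel trick saves. The sketch below uses a degree-2 polynomial kernel without a constant term, (x·z)^2, as an assumed example: computing it directly in two dimensions gives the same number as mapping both points into three dimensions and taking the dot product there. The feature map phi is the standard one for this kernel.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D point (3-D output)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x·z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)

explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # transform, then dot product
implicit = poly_kernel(x, z)                           # kernel only, no transform

print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is why the higher-dimensional coordinates never need to be stored.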

Types of Kernels

1. Linear Kernel

Used for linearly separable data.

Formula:

K(x_i,x_j)=x_i^Tx_j

Applications:

  • Text classification
  • Spam detection

2. Polynomial Kernel

Used when relationships are curved.

Formula:

K(x_i,x_j)=(x_i^Tx_j+c)^d

Applications:

  • Image processing

3. Radial Basis Function (RBF) Kernel

The most commonly used kernel, and a common first choice when the shape of the boundary is unknown.

Formula:

K(x_i,x_j)=e^{-\gamma||x_i-x_j||^2}

Applications:

  • Complex datasets

4. Sigmoid Kernel

Formula:

K(x_i,x_j)=\tanh(\alpha x_i^Tx_j+c)

Applications:

  • Neural-network-like behavior
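
The four formulas translate directly into code. This is a plain-Python sketch for illustration; the default values for c, d, gamma, and alpha are arbitrary assumptions, not recommended settings.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    """K(x, z) = x^T z"""
    return dot(x, z)


def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (x^T z + c)^d"""
    return (dot(x, z) + c) ** d


def rbf_kernel(x, z, gamma=0.1):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x, z, alpha=0.01, c=0.0):
    """K(x, z) = tanh(alpha * x^T z + c)"""
    return math.tanh(alpha * dot(x, z) + c)


x, z = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, z))
print(polynomial_kernel(x, z))
print(rbf_kernel(x, z))
print(sigmoid_kernel(x, z))
```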

Types of SVM

1. Linear SVM

Uses a linear decision boundary: a straight line in two dimensions, a flat hyperplane in higher dimensions.

Example:

Spam / Not Spam

2. Nonlinear SVM

Uses kernels.

Example:

Image Classification

3. Support Vector Regression (SVR)

Used for regression problems.

Example:

House Price Prediction

Hyperparameters in SVM

1. C Parameter

Controls the penalty for misclassified training points (and for points that fall inside the margin).

Example:

C=100

High C:

Low training error
Higher overfitting risk

Low C:

Wider margin that tolerates some misclassified points
Less overfitting, but a risk of underfitting if C is too small
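
The effect of C can be seen by scoring two candidate boundaries with the standard soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − y_i(w·x_i + b)). That objective is not written out in these notes, so treat it as an added assumption; the 1-D data set and both candidate boundaries are likewise hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i (w·x_i + b))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return reg + C * hinge


# Hypothetical 1-D data: the point at 0.5 is a class -1 outlier close to class +1.
X = [(-3.0,), (-2.0,), (0.5,), (2.0,), (3.0,)]
y = [-1, -1, -1, 1, 1]

candidates = {
    "wide": ((0.5,), 0.0),            # wide margin, misclassifies the outlier
    "narrow": ((4 / 3,), -5 / 3),     # narrow margin, classifies every point
}

preferred = {}
for C in (0.1, 100):
    scores = {
        name: soft_margin_objective(w, b, X, y, C)
        for name, (w, b) in candidates.items()
    }
    preferred[C] = min(scores, key=scores.get)

print(preferred)
```

With the small C the wide boundary has the lower objective even though it misclassifies one point; with the large C the penalty dominates and the narrow boundary wins.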

2. Gamma

Controls how far the influence of a single training point reaches in the RBF kernel.

Example:

gamma=0.1

High Gamma:

Complex boundaries

Low Gamma:

Smooth boundaries
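
A quick way to see why gamma changes the boundary is to evaluate the RBF kernel for one fixed pair of points under several gamma values. The two points and the gamma grid below are arbitrary choices for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (0.0, 0.0), (1.0, 1.0)  # squared distance between the points is 2

similarity = {gamma: rbf_kernel(x, z, gamma) for gamma in (0.01, 0.1, 1.0, 10.0)}

for gamma, value in similarity.items():
    print(gamma, value)
```

As gamma grows, the similarity between the same two points drops toward zero, so each training point influences only its immediate neighborhood and the boundary can bend around individual points.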

Performance Metrics for Classification SVM

Accuracy

Accuracy=\frac{TP+TN}{TP+TN+FP+FN}

Precision

Precision=\frac{TP}{TP+FP}

Recall

Recall=\frac{TP}{TP+FN}

F1 Score

F1=2\times\frac{Precision\times Recall}{Precision+Recall}
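
The four formulas above can be computed directly from confusion-matrix counts. The counts used here are made-up numbers for illustration, and the sketch does not guard against division by zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(accuracy, precision, recall, f1)
```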

ROC-AUC

Interpretation:

AUC = 1 → Perfect

AUC = 0.5 → Random
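
One way to compute AUC without drawing the ROC curve is its ranking interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below uses that pairwise definition; the labels and scores are made up, and the quadratic loop is only suitable for small examples.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive) and classifier scores
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(auc, perfect)
```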

Performance Metrics for SVR

MAE

MAE=\frac{1}{n}\sum_{i=1}^{n}|y_i-\hat y_i|

MSE

MSE=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat y_i)^2

RMSE

RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat y_i)^2}

R² Score

R^2=1-\frac{SS_{res}}{SS_{tot}}
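
The regression metrics follow the same pattern. In this sketch the true and predicted values are made-up numbers, and SS_res and SS_tot are taken to be the residual and total sums of squares.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2


# Hypothetical values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.0]

mae, mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, mse, rmse, r2)
```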

Advantages

  1. Works well with high-dimensional data
  2. Effective with small datasets
  3. Handles complex boundaries using kernels
  4. Lower overfitting risk, because maximizing the margin favors simpler boundaries

Disadvantages

  1. Slow for very large datasets
  2. Difficult to interpret
  3. Choosing kernel can be difficult
  4. Requires parameter tuning

Real-world Applications

  • Face recognition
  • Text classification
  • Disease prediction
  • Spam detection
  • Handwriting recognition
  • Image classification

Complete Workflow

Collect Data
      ↓
Preprocess Data
      ↓
Select Kernel
      ↓
Find Support Vectors
      ↓
Build Hyperplane
      ↓
Classify Data
      ↓
Evaluate Performance

One-line summary

Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane with maximum margin to separate classes and can use kernels to handle non-linear data.