Model Evaluation Metrics for Supervised Learning

Model evaluation metrics are used to measure how well a machine learning model performs on unseen data.

Different metrics are used depending on the problem:

Supervised Learning
       |
       |---- Regression Metrics
       |        |
       |        |--- MSE
       |        |--- RMSE
       |        |--- MAE
       |
       |---- Classification Metrics
                |
                |--- Accuracy
                |--- Precision
                |--- Recall
                |--- F1 Score
                |--- Confusion Matrix
                |--- ROC Curve

Regression Metrics

Regression metrics are used when predicting continuous values.

Examples:

  • House price prediction
  • Salary prediction
  • Temperature prediction

1. Mean Squared Error (MSE)

Definition

Mean Squared Error measures the average squared difference between actual and predicted values.

The purpose of squaring:

  • Makes every error non-negative, so positive and negative errors do not cancel out
  • Penalizes larger errors more heavily than smaller ones

Formula:

MSE=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat y_i)^2

Where:

  • n = Number of observations
  • yi = Actual value
  • ŷi = Predicted value

Example

Actual values:

[10,20,30]

Predicted values:

[12,18,35]

Errors (actual - predicted):

[-2,2,-5]

Squared errors:

[4,4,25]

MSE:

(4+4+25)/3

=11
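The worked example above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name mse is chosen here for illustration and does not come from any particular library.

```python
def mse(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


actual = [10, 20, 30]
predicted = [12, 18, 35]
print(mse(actual, predicted))  # 11.0
```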

Interpretation

Lower MSE → Better model

Advantages

  • Penalizes larger errors

Disadvantages

  • Sensitive to outliers

2. Root Mean Squared Error (RMSE)

Definition

RMSE is the square root of MSE.

Formula:

RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat y_i)^2}

Example

If:

MSE=11

Then:

RMSE=√11

≈3.32
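As a quick check, the same value can be computed in Python from the actual and predicted values used in the MSE example. This is a sketch with an illustrative function name (rmse), not a library API.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    n = len(actual)
    mean_squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n
    return math.sqrt(mean_squared)


# Same data as the MSE example: MSE = 11, so RMSE = sqrt(11)
print(round(rmse([10, 20, 30], [12, 18, 35]), 3))  # 3.317
```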

Interpretation

Lower RMSE → Better model

RMSE has the same unit as output values.

Advantages

  • Easy interpretation
  • Same units as target variable

Disadvantages

  • Sensitive to outliers

3. Mean Absolute Error (MAE)

Definition

MAE calculates the average absolute difference between actual and predicted values.

Formula:

MAE=\frac{1}{n}\sum_{i=1}^{n}|y_i-\hat y_i|

Example

Actual:

[10,20,30]

Predicted:

[12,18,35]

Absolute errors:

[2,2,5]

MAE:

(2+2+5)/3

=3
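The same calculation as a minimal Python sketch (the function name mae is illustrative only):

```python
def mae(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    absolute_errors = [abs(y - y_hat) for y, y_hat in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


print(mae([10, 20, 30], [12, 18, 35]))  # 3.0
```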

Interpretation

Lower MAE → Better model

Advantages

  • Easy to understand
  • Less affected by outliers

Disadvantages

  • Does not strongly penalize large errors
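To see what "sensitive to outliers" means in practice, the sketch below compares MSE and MAE on two sets of predictions that differ in a single value. The data is made up for illustration and is not part of the examples above.

```python
def mse(actual, predicted):
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical data: every prediction is off by 1, except one outlier in the second set
actual = [10, 20, 30, 40]
close_predictions = [11, 19, 31, 39]
one_outlier = [11, 19, 31, 60]  # last prediction is off by 20

print(mse(actual, close_predictions), mae(actual, close_predictions))  # 1.0 1.0
print(mse(actual, one_outlier), mae(actual, one_outlier))              # 100.75 5.75
```

One bad prediction multiplies MSE by about 100 here, while MAE grows by less than 6 times.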

Classification Metrics

Classification metrics are used when predicting categories.

Examples:

  • Spam detection
  • Disease prediction
  • Fraud detection

Confusion Matrix

The confusion matrix counts how often each actual class was predicted as each class. It forms the basis for several classification metrics.


                    Predicted Positive    Predicted Negative
Actual Positive     TP                    FN
Actual Negative     FP                    TN

Where:

TP = True Positive

TN = True Negative

FP = False Positive

FN = False Negative
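The four counts can be tallied directly from lists of actual and predicted labels. The sketch below is one minimal way to do it; the function name confusion_counts, the positive parameter, and the sample labels are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification problem."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive case, correctly predicted positive
        elif a == positive:
            fn += 1  # positive case, missed
        elif p == positive:
            fp += 1  # negative case, wrongly predicted positive
        else:
            tn += 1  # negative case, correctly predicted negative
    return tp, fn, fp, tn


# Hypothetical labels: 1 = positive, 0 = negative
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 3)
```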

Example:


                Predicted Yes    Predicted No
Actual Yes      45               5
Actual No       10               40

4. Accuracy

Definition

Accuracy measures the proportion of all predictions that are correct.

Formula:

Accuracy=\frac{TP+TN}{TP+TN+FP+FN}

Example:

TP=45
TN=40
FP=10
FN=5

Calculation:

Accuracy=(45+40)/(45+40+10+5)

=0.85

Accuracy:

85%
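The same calculation as a short Python sketch (the function name accuracy is illustrative):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


print(accuracy(tp=45, tn=40, fp=10, fn=5))  # 0.85
```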

Advantages

  • Easy to understand

Disadvantages

  • Not suitable for imbalanced datasets

Example:

990 healthy
10 disease

Predicting everyone as healthy gives 990/1000 = 99% accuracy, but the model detects none of the 10 disease cases, so it is a poor model.
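The imbalanced example can be simulated directly. In this sketch the label encoding (1 = disease, 0 = healthy) and the variable names are assumptions made for illustration.

```python
# 1 = disease (positive class), 0 = healthy
actual = [0] * 990 + [1] * 10
predicted = [0] * 1000  # a "model" that always predicts healthy

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Fraction of the real disease cases that the model flagged
detected = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
detected_fraction = detected / sum(a == 1 for a in actual)

print(accuracy)           # 0.99
print(detected_fraction)  # 0.0
```

High accuracy here only reflects the class balance; not a single disease case is found.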

5. Precision

Definition

Precision measures how many predicted positive cases were actually positive.

Formula:

Precision=\frac{TP}{TP+FP}

Example:

TP=45
FP=10

Calculation:

45/(45+10)

≈0.818

Precision:

81.8%
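A minimal Python sketch of the same calculation follows. The function name precision is illustrative, and returning 0.0 when there are no positive predictions is an assumption of this sketch, since the formula is undefined in that case.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    if tp + fp == 0:
        return 0.0  # assumption: no positive predictions, report 0
    return tp / (tp + fp)


print(round(precision(tp=45, fp=10), 3))  # 0.818
```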

Interpretation

Among predicted positives,
how many are correct?

Applications

  • Spam detection
  • Fraud detection

6. Recall

Definition

Recall measures how many actual positive cases were identified.

Formula:

Recall=\frac{TP}{TP+FN}

Example:

TP=45
FN=5

Calculation:

45/(45+5)

=0.90

Recall:

90%
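The same calculation as a short Python sketch. The function name recall is illustrative, and returning 0.0 when there are no actual positives is an assumption of this sketch.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model identified."""
    if tp + fn == 0:
        return 0.0  # assumption: no actual positives, report 0
    return tp / (tp + fn)


print(recall(tp=45, fn=5))  # 0.9
```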

Interpretation

Among actual positives,
how many were found?

Applications

  • Cancer detection
  • Disease prediction

7. F1 Score

Definition

F1 score is the harmonic mean of Precision and Recall.

Formula:

F1=2\times\frac{Precision\times Recall}{Precision+Recall}

Example:

Precision:

0.818

Recall:

0.90

Calculation:

F1=2×(0.818×0.90)/(0.818+0.90)

≈0.857
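The sketch below computes F1 from unrounded precision and recall, using the counts from the earlier examples (TP=45, FP=10, FN=5). With unrounded inputs the result is roughly 0.857. The function name f1_score is used here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: report 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


precision = 45 / (45 + 10)  # TP / (TP + FP)
recall = 45 / (45 + 5)      # TP / (TP + FN)
print(round(f1_score(precision, recall), 3))  # 0.857
```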

Interpretation

Higher F1 Score → Better balance between Precision and Recall

Applications

Useful for imbalanced datasets.

8. ROC Curve

Definition

ROC stands for:

Receiver Operating Characteristic Curve

The ROC curve shows the relationship between the following two rates as the classification threshold varies:

True Positive Rate
            vs
False Positive Rate

Where:

True Positive Rate (the same quantity as Recall):

TPR=\frac{TP}{TP+FN}

False Positive Rate:

FPR=\frac{FP}{FP+TN}
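Each point on a ROC curve is one (FPR, TPR) pair obtained by choosing a threshold on the model's predicted scores. The sketch below computes a few such points. The scores, labels, thresholds, and the function name roc_point are all made up for illustration, and the rule "score >= threshold means positive" is an assumption of this sketch.

```python
def roc_point(actual, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive.

    Assumes labels are 1 (positive) or 0 (negative) and that both classes occur.
    """
    tp = fp = fn = tn = 0
    for label, score in zip(actual, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical labels and model scores
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

for threshold in [1.0, 0.75, 0.5, 0.0]:
    fpr, tpr = roc_point(actual, scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Lowering the threshold moves the point from (0, 0) toward (1, 1); plotting all such points gives the ROC curve.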

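ROC Interpretation

The ROC curve is commonly summarized by the AUC (Area Under the Curve). The values below are a rough rule of thumb: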

AUC=1.0
Perfect model

AUC=0.9
Excellent model

AUC=0.8
Good model

AUC=0.7
Average model

AUC=0.5
Random prediction
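AUC can also be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that pairwise view rather than integrating the curve. The function name auc_by_pairs and the sample data are assumptions for illustration, and the double loop is only practical for small datasets.

```python
def auc_by_pairs(actual, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes labels are 1 or 0 and that
    both classes occur at least once.
    """
    positives = [s for label, s in zip(actual, scores) if label == 1]
    negatives = [s for label, s in zip(actual, scores) if label == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels and model scores
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(auc_by_pairs(actual, scores), 3))  # 0.889

# A model that gives every case the same score cannot rank at all
print(auc_by_pairs(actual, [0.5] * 6))  # 0.5
```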

Quick Comparison

Metric       Used For          Best Value
MSE          Regression        Lower
RMSE         Regression        Lower
MAE          Regression        Lower
Accuracy     Classification    Higher
Precision    Classification    Higher
Recall       Classification    Higher
F1 Score     Classification    Higher
ROC-AUC      Classification    Higher

Simple Memory Trick

MSE → Squared Error

RMSE → Root of Squared Error

MAE → Absolute Error

Accuracy → Overall Correctness

Precision → Correct predicted positives

Recall → Found actual positives

F1 Score → Balance between Precision and Recall

ROC → Model discrimination ability