Performance Metrics for Regression

Gaurav Kumar
5 min read · Jul 24, 2021


Credits: analyticsvidhya.com

While working on regression models, we may come across different levels of performance from different models.

For example, the images below show the performance of two different linear regression models.

Performance of two simple linear regression models

Although we can clearly see that the first model is performing better than the second, we also need to quantify the difference between the performance of the two.

That is where performance metrics come in handy :).

So, let’s start learning about different performance metrics one by one.

1. Mean Squared Error (MSE)

It is the average of the squared differences between the actual and the predicted values.

Mean Squared Error:

MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2

where y_i is the actual value, ŷ_i is the predicted value, and n is the number of samples.

This metric is not used much on its own to judge the performance of a machine learning model, because squaring magnifies the errors and expresses them in squared units, which inflates the value and makes it hard to interpret.

For example, suppose

y_actual = 100, 105, 110, 115, 200

y_predicted = 120, 125, 135, 140, 160

For this case, the MSE would be 730, which is such a large value that one might think the model is not performing well at all.

Since this can lead to a wrong interpretation of the model, this metric is not commonly used on its own.
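As a quick check, here is a minimal sketch of computing MSE on the example above (assuming NumPy and scikit-learn are installed):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Example values from the article
y_actual = np.array([100, 105, 110, 115, 200])
y_predicted = np.array([120, 125, 135, 140, 160])

# MSE by hand: average of the squared differences
mse_manual = np.mean((y_actual - y_predicted) ** 2)

# MSE via scikit-learn's built-in metric
mse_sklearn = mean_squared_error(y_actual, y_predicted)

print(mse_manual, mse_sklearn)  # both print 730.0
```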

2. Root Mean Squared Error (RMSE)

As the name suggests, it is simply the square root of Mean Squared Error. It is written as:

Root Mean Squared Error:

RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}

It is a more commonly used and preferred metric, as it is easier to interpret, and its value is not inflated by large errors to the extent that MSE's is.

Let’s understand it with an example:

Suppose, y_actual = 100m, 105m, 110m, 115m, 200m

y_predicted = 120m, 125m, 135m, 140m, 160m

So, MSE = 730 m² and RMSE ≈ 27 m.

From this we can clearly see why MSE is awkward: it is ridiculous to say that a model predicts a value of 160 m with an error of 730 m². With RMSE there is no such problem, as we can interpret it directly: the model predicts 160 m with an error of about 27 m.
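A minimal sketch of the RMSE computation, reusing the same example data (note that RMSE comes out in the same units as the target):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_actual = np.array([100, 105, 110, 115, 200])
y_predicted = np.array([120, 125, 135, 140, 160])

# RMSE is simply the square root of MSE
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))
print(rmse)  # ~27.02, in the same units (metres) as the target
```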

3. Mean Absolute Error (MAE)

It is given as the average of the absolute differences between the actual and the predicted values.

Mean Absolute Error:

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|

It is not as commonly used or preferred a metric for evaluating a model. WHY?

Let’s continue with the above example:

Considering the same y_actual and y_predicted as taken above,

MSE = 730 m², MAE = 26 m, RMSE ≈ 27 m.

As we can clearly see, RMSE > MAE. Because RMSE squares the errors before averaging and only then takes the square root, it penalizes large errors more heavily, i.e., it reflects the effect of outliers on the model. MAE cannot capture this as effectively, since it weighs all errors equally.
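Here is a short sketch comparing MAE and RMSE on the same data, illustrating that RMSE is pulled up more by the one large error:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_actual = np.array([100, 105, 110, 115, 200])
y_predicted = np.array([120, 125, 135, 140, 160])

mae = mean_absolute_error(y_actual, y_predicted)           # 26.0
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))  # ~27.02

# RMSE is always >= MAE; the gap grows as large errors dominate
print(mae, rmse)
```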

4. Median Absolute Error

This is very similar to MAE, but here the median of the absolute errors is considered rather than their average.

Median Absolute Error:

MedAE = \text{median}\left(\left|y_1 - \hat{y}_1\right|, \ldots, \left|y_n - \hat{y}_n\right|\right)

It is hardly affected by outliers at all, but it is not used much because computing a median is more expensive: the absolute errors have to be (at least partially) sorted before the median can be found, which takes noticeably longer than a simple average on huge datasets.
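A minimal sketch using scikit-learn's built-in median absolute error; note how the one large error (40 m on the last point) barely moves the result:

```python
import numpy as np
from sklearn.metrics import median_absolute_error

y_actual = np.array([100, 105, 110, 115, 200])
y_predicted = np.array([120, 125, 135, 140, 160])

# |errors| = [20, 20, 25, 25, 40]; the median ignores the outlier at 40
medae = median_absolute_error(y_actual, y_predicted)
print(medae)  # 25.0
```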

5. R-Squared (Coefficient of Determination)

It quantifies how well the machine learning model performs in comparison to a simple mean model, i.e., a model that always predicts the mean of the target values.

It is given by:

R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}

where ȳ is the mean of the actual values.

With the help of the two images below, you can clearly understand which two lines the comparison is made between :)

The first is the simple mean model and the second is the simple linear regression model

So, the comparison between the above two images is what the R-Squared metric actually quantifies.

Its value usually ranges from 0 to 1; the closer the value is to 1, the better the model is performing.

Sometimes the term R² confuses users, and they start thinking that it can never be negative. But in some cases the value does become negative: when the ML model performs even worse than the simple mean model, R² comes out negative. Such a case is shown below:

The mean is predicting better than the regression line
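A quick sketch with scikit-learn's r2_score, including a deliberately bad set of predictions (a hypothetical example, not from the article) to show how a negative value arises:

```python
from sklearn.metrics import r2_score

y_actual = [100, 105, 110, 115, 200]
y_predicted = [120, 125, 135, 140, 160]

# The model compared against the simple mean model
print(r2_score(y_actual, y_predicted))  # ~0.48

# Predictions worse than just predicting the mean give a negative R²
y_bad = [200, 10, 250, 5, 90]
print(r2_score(y_actual, y_bad))  # negative
```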

6. Adjusted R Squared

This is a modified version of the R-Squared metric that tries to overcome its drawback.

The problem with R-Squared is that its value increases when you add more independent variables to the prediction, even if they have no correlation with the target variable. This can mislead the user into thinking the model performs better with the new feature, even though the feature has no significance.

So, in the case of Adjusted R Squared, the equation is modified so that the value of the metric decreases when a useless feature is added during the prediction of the target variable.

Adjusted R Squared:

\text{Adjusted } R^2 = 1 - \frac{\left(1 - R^2\right)\left(N - 1\right)}{N - P - 1}

N = Number of items in the dataset

P = Number of independent features

So, if a useless feature is added, P increases and the denominator (N − P − 1) decreases, which makes the subtracted term larger in value, so the overall Adjusted R² decreases, showing that the added feature is irrelevant.
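scikit-learn has no built-in adjusted R², so here is a minimal sketch computing it from the formula above; adjusted_r2 is a hypothetical helper written for this example:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, p):
    """Hypothetical helper: adjusted R² from plain R².
    n = number of samples, p = number of independent features."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_actual = [100, 105, 110, 115, 200]
y_predicted = [120, 125, 135, 140, 160]

# Same predictions, but pretending more features were used:
# the adjusted value drops as p grows, penalizing the extra features
print(adjusted_r2(y_actual, y_predicted, p=1))  # ~0.30
print(adjusted_r2(y_actual, y_predicted, p=3))  # ~-1.09
```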
