Evaluation metrics for Linear Regression
Assume we have the following data set:
X | Y |
---|---|
20 | 23 |
21 | 21 |
22 | 26 |
23 | 22 |
24 | 25 |
25 | 24 |
Fitting a linear regression to this data with a regression calculator gives the line:
y = 15.14 + 0.37x
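The coefficients can be reproduced without a calculator. The following is a minimal sketch, assuming ordinary least squares with a single predictor; the variable names (`s_xy`, `s_xx`, `slope`, `intercept`) are illustrative and not part of the original text.

```python
# Closed-form simple least squares:
#   slope = Sxy / Sxx, intercept = y_mean - slope * x_mean
X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(Y) / n

# Sum of cross-deviations and sum of squared deviations of X
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))
s_xx = sum((x - x_mean) ** 2 for x in X)

slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

print("slope:", round(slope, 2))          # 0.37
print("intercept:", round(intercept, 2))  # 15.14
```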
Evaluating the regression equation at each value of X gives the predicted value $\hat y$:
X | Y | $\hat Y$ |
---|---|---|
20 | 23 | 22.54 |
21 | 21 | 22.91 |
22 | 26 | 23.28 |
23 | 22 | 23.65 |
24 | 25 | 24.02 |
25 | 24 | 24.39 |
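The $\hat Y$ column can be generated from the rounded coefficients. This is a small sketch; the helper name `predict` and its default arguments are assumptions made for illustration.

```python
X = [20, 21, 22, 23, 24, 25]

def predict(x, intercept=15.14, slope=0.37):
    """Evaluate the fitted line y = intercept + slope * x."""
    return intercept + slope * x

# Round to two decimals to match the table
Y_HAT = [round(predict(x), 2) for x in X]
print(Y_HAT)
```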
We can evaluate this regression line with several error metrics:
MAE (Mean Absolute Error) :-
$$MAE = (\frac{1}{n})\sum_{i=1}^{n}\left | y_{i} - \hat y_{i} \right |$$
where $y_i$ is the actual value in the data set, $\hat y_i$ is the value predicted by the regression equation, and $n$ is the number of observations.
- Calculate the difference between Y and $\hat Y$ for each row
- Take the absolute value of each difference
- Take the mean, i.e. sum the absolute values and divide by the number of elements
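As a check on the steps above, here is the arithmetic for the six rows of the table, with the absolute differences taken from the Y and $\hat Y$ columns and the result rounded to three decimals:

$$MAE = \frac{0.46 + 1.91 + 2.72 + 1.65 + 0.98 + 0.39}{6} = \frac{8.11}{6} \approx 1.352$$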
X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x

# Core Python
n = len(Y)  # number of observations
s = 0
for i in range(n):
    s += abs(Y[i] - Y_HAT[i])  # accumulate the absolute errors
MAE = s / n
print("MAE using Python:", MAE)

# Using the scikit-learn library
from sklearn.metrics import mean_absolute_error
MAE_sci = mean_absolute_error(Y, Y_HAT)
print("MAE using Sklearn:", MAE_sci)

# Using NumPy
import numpy as np
MAE_numpy = np.mean(np.abs(np.subtract(Y, Y_HAT)))
print("MAE using Numpy:", MAE_numpy)
MSE (Mean Square Error) :-
$$MSE = (\frac{1}{n})\sum_{i=1}^{n}\left ( y_{i} - \hat y_{i} \right )^2$$
where $y_i$ is the actual value in the data set, $\hat y_i$ is the value predicted by the regression equation, and $n$ is the number of observations.
- Calculate the difference between Y and $\hat Y$ for each row
- Square each difference
- Take the mean, i.e. sum the squares and divide by the number of elements
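Worked through for the six rows of the table, with each term being a squared difference between Y and $\hat Y$ and the result rounded to three decimals:

$$MSE = \frac{0.2116 + 3.6481 + 7.3984 + 2.7225 + 0.9604 + 0.1521}{6} = \frac{15.0931}{6} \approx 2.516$$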
X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x

# Core Python
n = len(Y)  # number of observations
s = 0
for i in range(n):
    s += (Y[i] - Y_HAT[i]) ** 2  # accumulate the squared errors
MSE = s / n
print("MSE using Python:", MSE)

# Using the scikit-learn library
from sklearn.metrics import mean_squared_error
MSE_sci = mean_squared_error(Y, Y_HAT)
print("MSE using Sklearn:", MSE_sci)

# Using NumPy
import numpy as np
MSE_numpy = np.mean(np.square(np.subtract(Y, Y_HAT)))
print("MSE using Numpy:", MSE_numpy)
RMSE (Root Mean Square Error) :-
$$RMSE = \sqrt{(\frac{1}{n})\sum_{i=1}^{n}\left ( y_{i} - \hat y_{i} \right )^2}$$
where $y_i$ is the actual value in the data set, $\hat y_i$ is the value predicted by the regression equation, and $n$ is the number of observations.
- Calculate the difference between Y and $\hat Y$ for each row
- Square each difference
- Take the mean, i.e. sum the squares and divide by the number of elements
- Take the square root of the mean
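For the six rows of the table, the squared differences between Y and $\hat Y$ sum to 15.0931, so the steps above give (rounded to three decimals):

$$RMSE = \sqrt{\frac{15.0931}{6}} \approx \sqrt{2.5155} \approx 1.586$$

Because of the square root, RMSE is expressed in the same units as Y, unlike MSE.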
X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x

# Core Python
from math import sqrt
n = len(Y)  # number of observations
s = 0
for i in range(n):
    s += (Y[i] - Y_HAT[i]) ** 2  # accumulate the squared errors
RMSE = sqrt(s / n)
print("RMSE using Python:", RMSE)

# Using the scikit-learn library (square root of its MSE)
from sklearn.metrics import mean_squared_error
RMSE_sci = sqrt(mean_squared_error(Y, Y_HAT))
print("RMSE using Sklearn:", RMSE_sci)

# Using NumPy
import numpy as np
RMSE_numpy = np.sqrt(np.mean(np.square(np.subtract(Y, Y_HAT))))
print("RMSE using Numpy:", RMSE_numpy)
RAE (Relative Absolute Error) :-
$$RAE = \frac{\sum_{i=1}^{n}\left | y_{i} - \hat y_{i} \right |}{\sum_{i=1}^{n}\left | y_{i} - \bar y \right |}$$
where $y_i$ is the actual value in the data set, $\hat y_i$ is the value predicted by the regression equation, and $\bar y$ is the mean of the actual values.
- For each row, take the absolute difference between Y and $\hat Y$, then sum over all rows
- Calculate the mean of Y, denoted by $\bar Y$
- For each row, take the absolute difference between Y and $\bar Y$, then sum over all rows
- Divide the sum from the first step by the sum from the third step
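Worked through for the six rows of the table (result rounded to three decimals), the mean of Y is

$$\bar y = \frac{23 + 21 + 26 + 22 + 25 + 24}{6} = 23.5$$

and the ratio becomes

$$RAE = \frac{0.46 + 1.91 + 2.72 + 1.65 + 0.98 + 0.39}{0.5 + 2.5 + 2.5 + 1.5 + 1.5 + 0.5} = \frac{8.11}{9} \approx 0.901$$

A value below 1 means the regression line has a smaller total absolute error than always predicting $\bar y$.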
X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x

# Using NumPy
import numpy as np
abs_error = np.sum(np.abs(np.subtract(Y, Y_HAT)))            # sum of |y - y_hat|
abs_deviation = np.sum(np.abs(np.subtract(Y, np.mean(Y))))   # sum of |y - mean(y)|
RAE_numpy = abs_error / abs_deviation
print("RAE using Numpy:", RAE_numpy)
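For symmetry with the earlier metrics, RAE can also be computed in core Python. This is a minimal sketch using only built-ins; the intermediate names (`abs_error`, `abs_deviation`) are chosen here for illustration.

```python
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predicted values from the regression line

y_mean = sum(Y) / len(Y)

# Numerator: total absolute error of the regression line
abs_error = sum(abs(y - y_hat) for y, y_hat in zip(Y, Y_HAT))
# Denominator: total absolute error of always predicting the mean
abs_deviation = sum(abs(y - y_mean) for y in Y)

RAE = abs_error / abs_deviation
print("RAE using Python:", RAE)
```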
RSE (Relative Squared Error) :-
$$RSE = \frac{\sum_{i=1}^{n}\left ( y_{i} - \hat y_{i} \right )^2}{\sum_{i=1}^{n}\left ( y_{i} - \bar y \right )^2}$$
where $y_i$ is the actual value in the data set, $\hat y_i$ is the value predicted by the regression equation, and $\bar y$ is the mean of the actual values.
- For each row, square the difference between Y and $\hat Y$, then sum over all rows
- Calculate the mean of Y, denoted by $\bar Y$
- For each row, square the difference between Y and $\bar Y$, then sum over all rows
- Divide the sum from the first step by the sum from the third step
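Worked through for the six rows of the table, with $\bar y = 23.5$ and the result rounded to three decimals:

$$RSE = \frac{0.2116 + 3.6481 + 7.3984 + 2.7225 + 0.9604 + 0.1521}{0.25 + 6.25 + 6.25 + 2.25 + 2.25 + 0.25} = \frac{15.0931}{17.5} \approx 0.862$$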
X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x

# Using NumPy
import numpy as np
sq_error = np.sum(np.square(np.subtract(Y, Y_HAT)))            # sum of (y - y_hat)^2
sq_deviation = np.sum(np.square(np.subtract(Y, np.mean(Y))))   # sum of (y - mean(y))^2
RSE_numpy = sq_error / sq_deviation
print("RSE using Numpy:", RSE_numpy)
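The five metrics share the same building blocks, so they can be gathered into one helper. The sketch below is illustrative only: the function name `regression_metrics` and its dictionary return value are assumptions, not part of any library.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, RAE and RSE for paired actual/predicted values."""
    n = len(y_true)
    y_mean = sum(y_true) / n

    # Errors of the model's predictions
    abs_err = sum(abs(y - p) for y, p in zip(y_true, y_pred))
    sq_err = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

    # Errors of the baseline that always predicts the mean
    abs_dev = sum(abs(y - y_mean) for y in y_true)
    sq_dev = sum((y - y_mean) ** 2 for y in y_true)

    mse = sq_err / n
    return {
        "MAE": abs_err / n,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "RAE": abs_err / abs_dev,
        "RSE": sq_err / sq_dev,
    }

Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]

metrics = regression_metrics(Y, Y_HAT)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that RAE and RSE are undefined when every value of Y is identical, because both denominators are then zero; this sketch does not guard against that case.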