Bias vs Variance

Pooventhiran G · Published in Nerd For Tech · May 30, 2021

Bias and variance errors come into the picture when we estimate the performance of models. The aim of this estimation is to find how well our model generalizes to unseen data.

Bias

Bias is the error arising from the difference between the model's predictions and the ground truth. In simple terms, the model makes incorrect or overly simple assumptions about the training data. When it encounters unseen data, these assumptions fail, which badly affects model performance. The higher the bias, the poorer the model performs on both training and testing data, i.e., underfitting.
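
Here is a minimal sketch of high bias, assuming scikit-learn and a synthetic quadratic dataset (both are my own illustrative choices, not part of the original article). A plain linear model is too simple for the quadratic relationship, so training and test errors are both high:

```python
# High bias (underfitting): a linear model fitted to quadratic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)   # quadratic ground truth + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test  MSE:", mean_squared_error(y_test, model.predict(X_test)))
# Both errors stay large and similar -> high bias, underfitting.
```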

Variance

Variance is the model’s sensitivity to noise in the dataset. The model tries to fit even the noise, making it harder to generalize to unseen data. If the variance is high, the model starts to memorize the data instead of learning from it, i.e., overfitting. Due to this, the model performs very poorly on unseen data.
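
A minimal sketch of high variance, again assuming scikit-learn and a small synthetic dataset of my own choosing: a high-degree polynomial fitted to a handful of noisy points nearly memorizes them, so the training error is tiny while the test error is much larger.

```python
# High variance (overfitting): a degree-15 polynomial on 15 noisy points.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X_train = np.sort(rng.uniform(0, 1, size=(15, 1)), axis=0)
y_train = np.sin(2 * np.pi * X_train[:, 0]) + rng.normal(0, 0.2, size=15)
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test[:, 0])

model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))  # near zero
print("test  MSE:", mean_squared_error(y_test, model.predict(X_test)))    # much larger
```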

Bias-Variance trade-off

The bias-variance trade-off is the dilemma we face when trying to minimize both bias and variance, since it is not possible to do so simultaneously. A well-generalized model is one with an optimal balance of bias and variance.
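
One way to see the trade-off is to sweep model complexity and watch validation error first fall (bias shrinks) and then rise (variance grows). The sketch below does this with polynomial degree as the complexity knob, using a synthetic dataset and cross-validation as assumed tooling:

```python
# Trade-off sketch: validation error vs. model complexity (polynomial degree).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=60)

for degree in (1, 3, 6, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree:2d}: 5-fold CV MSE = {mse:.3f}")
# The best degree sits between the underfitting and overfitting extremes.
```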

Identifying overfitting and underfitting

We can identify overfitting and underfitting from the loss values:

  1. The gap in loss (error) between human-level performance and training performance indicates bias (a large gap means high bias, i.e., underfitting).
  2. The gap in loss between training and testing performance indicates variance (a large gap means high variance, i.e., overfitting). A small diagnostic sketch follows this list.
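
The sketch below turns these two gaps into a rough check. The diagnose helper and its tolerance value are hypothetical, purely illustrative choices, not a standard rule:

```python
# Rough diagnosis from the loss gaps described above (illustrative thresholds).
def diagnose(baseline_loss, train_loss, test_loss, tol=0.05):
    """Flag underfitting/overfitting from the gaps between losses."""
    if train_loss - baseline_loss > tol:
        return "high bias: underfitting (large gap to baseline/human performance)"
    if test_loss - train_loss > tol:
        return "high variance: overfitting (large train-test gap)"
    return "model generalizes reasonably well"

print(diagnose(baseline_loss=0.02, train_loss=0.20, test_loss=0.22))  # underfitting
print(diagnose(baseline_loss=0.02, train_loss=0.03, test_loss=0.25))  # overfitting
```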

Reducing underfitting and overfitting

An overly simple model is the most common cause of underfitting. Increasing the model's complexity, or choosing a different model that can learn the proper relationship between the data points, helps reduce underfitting, as the sketch below shows.
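
Continuing the earlier high-bias sketch (same assumed synthetic quadratic data), adding quadratic features gives the linear model enough capacity, and both errors drop:

```python
# Reducing underfitting: add capacity (quadratic features) to the linear model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test  MSE:", mean_squared_error(y_test, model.predict(X_test)))
# Both errors drop compared to the plain linear model -> bias reduced.
```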

The common causes of overfitting are too little data and an imbalanced or noisy dataset, which make the model overly sensitive to the training data. The following can be done to reduce overfitting:

  1. Train with more data. If data is scarce, transfer learning or data augmentation will help.
  2. Use an optimal number of model parameters. A model with a huge number of parameters can memorize the data, making it less generalizable.
  3. Cross-validation is a powerful tool to reduce overfitting, as the model is trained and validated on different folds (subsets) of the data.
  4. Regularization is another powerful technique: it adds a penalty on model complexity (for example, large weights) to the loss, discouraging the model from fitting noise. A sketch combining points 3 and 4 follows this list.
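
Here is a minimal sketch of points 3 and 4 together: k-fold cross-validation plus L2 regularization (Ridge). The synthetic dataset, polynomial degree, and alpha values are my own illustrative assumptions:

```python
# Cross-validation + L2 regularization (Ridge) to tame a high-variance model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(50, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, size=50)

for alpha in (0.001, 0.1, 10.0):
    model = make_pipeline(PolynomialFeatures(degree=12, include_bias=False),
                          StandardScaler(),
                          Ridge(alpha=alpha))
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"alpha={alpha:<6}: 5-fold CV MSE = {mse:.3f}")
# A stronger penalty on the weights reduces the high-degree polynomial's variance.
```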

Note: Feature selection/dataset cleansing using EDA (Exploratory Data Analysis) should be the first step in any machine learning problem.

The bullseye diagram below illustrates the trade-off:

Bias-Variance trade-off (Source: ResearchGate)

In the above diagram, the grayed circle at the center is the ground truth, and the green points are the predictions. Bias is seen in the distance between the predictions and the center circle; variance is seen in the spread of the prediction points relative to one another.

  1. Low Bias-Low Variance: The ideal model, where the predictions are close to the center and to one another.
  2. Low Bias-High Variance: The predictions are around the center but distant from one another.
  3. High Bias-Low Variance: The predictions are close to one another but distant from the center.
  4. High Bias-High Variance: The predictions are distant both from one another and from the center.
