Bias Vs Variance

May 18, 2023

Bias and variance are two fundamental sources of error in machine learning models. Understanding them is key to building models that generalize well instead of just memorizing the training data. Together they give rise to the bias-variance tradeoff.

Bias: Bias is the error due to overly simplified assumptions in the learning algorithm, which leave the model unable to capture the underlying pattern in the data. For example, assuming that the data is linear when it has a more complicated structure can lead to a high-bias model. High-bias models are said to “underfit” the data.

Variance: Variance is the error due to the model’s excessive sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs. Such an overly complex model may fit the training data well but perform poorly on unseen data, which is known as “overfitting”.

The bias-variance tradeoff is a key problem in supervised learning. Ideally, you want to choose a model complexity that minimizes the total expected error, which for squared-error loss is the sum of squared bias, variance, and irreducible error. Irreducible error is the noise inherent in the problem: no algorithm can reduce it, because it is caused by factors not represented in the data.
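
The decomposition into squared bias and variance can be estimated by simulation when the true function is known. The following is a minimal sketch under stated assumptions, not a definitive experiment: the ground truth `true_f`, the noise level, and the two toy models (a constant "predict the training mean" model and a 1-nearest-neighbor model) are all hypothetical choices made for illustration. Each model is retrained on many independent training sets, and its predictions at fixed test points are compared with the true function.

```python
import math
import random


def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n=30, noise_sd=0.3):
    # Draw n noisy observations of true_f on [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    # Very simple model: always predict the training mean.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_1nn(xs, ys):
    # Very flexible model: copy the label of the closest training point.
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict


def bias2_and_variance(fit_fn, n_repeats=200, seed=0):
    # Retrain on many independent training sets and compare the
    # predictions at fixed test points with the known true function.
    rng = random.Random(seed)
    grid = [(i + 0.5) / 20 for i in range(20)]
    preds = [[] for _ in grid]
    for _ in range(n_repeats):
        model = fit_fn(*sample_training_set(rng))
        for k, x in enumerate(grid):
            preds[k].append(model(x))
    bias2 = 0.0
    variance = 0.0
    for k, x in enumerate(grid):
        avg = sum(preds[k]) / n_repeats
        bias2 += (avg - true_f(x)) ** 2  # squared gap of the average prediction
        variance += sum((p - avg) ** 2 for p in preds[k]) / n_repeats
    return bias2 / len(grid), variance / len(grid)


for name, fit_fn in [("mean", fit_mean), ("1-NN", fit_1nn)]:
    b2, var = bias2_and_variance(fit_fn)
    print(f"{name:5s} bias^2={b2:.3f} variance={var:.3f}")
```

In this setup the constant model should show the larger squared bias and the 1-nearest-neighbor model the larger variance; the exact numbers depend on the assumed function, noise level, and sample size.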

High-bias, low-variance models have a high error on the training data, but the difference in error between the training and test datasets is small.

Low-bias, high-variance models have a low error on the training data, but the difference in error between the training and test datasets is large.
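
These two error patterns can be seen in a small experiment. The sketch below is illustrative only: the data-generating function `true_f`, the noise level, and the sample sizes are assumptions, and the constant model and the 1-nearest-neighbor model stand in for a "high bias" and a "high variance" learner respectively.

```python
import math
import random


def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def make_data(rng, n, noise_sd=0.5):
    # Noisy observations of true_f at random points in [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def mse(predict, xs, ys):
    # Mean squared error of a prediction function on a dataset.
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_test_errors(seed=0, n_train=300, n_test=1500):
    rng = random.Random(seed)
    train_x, train_y = make_data(rng, n_train)
    test_x, test_y = make_data(rng, n_test)

    mean_y = sum(train_y) / n_train

    def predict_mean(x):
        # High-bias model: ignores x entirely.
        return mean_y

    def predict_1nn(x):
        # High-variance model: copies the nearest training label.
        j = min(range(n_train), key=lambda i: abs(train_x[i] - x))
        return train_y[j]

    # Each entry is (training error, test error).
    return {
        "mean": (mse(predict_mean, train_x, train_y),
                 mse(predict_mean, test_x, test_y)),
        "1-NN": (mse(predict_1nn, train_x, train_y),
                 mse(predict_1nn, test_x, test_y)),
    }


for name, (train_err, test_err) in train_test_errors().items():
    print(f"{name:5s} train={train_err:.3f} test={test_err:.3f} "
          f"gap={test_err - train_err:.3f}")
```

The 1-nearest-neighbor model reproduces every training label, so its training error is zero while its test error is not. The constant model has a similar, and high, error on both sets.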

Common ways to manage the bias-variance tradeoff include cross-validation, regularization (such as Lasso and Ridge), and ensemble methods (such as bagging and boosting).
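
As one illustration of cross-validation, the sketch below uses 5-fold cross-validation to pick the number of neighbors `k` in a k-nearest-neighbor regressor. This is a swapped-in example rather than Lasso or Ridge: `k` is used here simply as a complexity knob (small `k` means a flexible, high-variance model; large `k` means a smooth, high-bias one). The function `true_f`, the noise level, and the candidate values of `k` are assumptions made for illustration.

```python
import math
import random


def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def knn_predict(train_x, train_y, x, k):
    # Average the labels of the k training points closest to x.
    order = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    nearest = order[:k]
    return sum(train_y[j] for j in nearest) / len(nearest)


def cross_val_mse(xs, ys, k, n_folds=5):
    # k-fold cross-validation: each point is predicted by a model that
    # never saw it. The data are i.i.d., so interleaved folds suffice.
    n = len(xs)
    total = 0.0
    for fold in range(n_folds):
        val_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in val_idx]
        train_y = [ys[i] for i in range(n) if i not in val_idx]
        for i in val_idx:
            total += (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
    return total / n


def choose_k(candidates=(1, 3, 9, 27, 81, 240), n=300, noise_sd=0.3, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    cv_errors = {k: cross_val_mse(xs, ys, k) for k in candidates}
    best_k = min(cv_errors, key=cv_errors.get)
    return best_k, cv_errors


best_k, cv_errors = choose_k()
for k, err in cv_errors.items():
    print(f"k={k:3d} cv_mse={err:.3f}")
print("selected k:", best_k)
```

With 300 points and 5 folds, each fold trains on 240 points, so `k=240` reduces to predicting the training mean. The cross-validated error is typically high at both extremes and lowest somewhere in between, which is the tradeoff described above.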

Written by Sanjay Kumar PhD
