Updated: Nov 14, 2021
Part one of this series dealt with the main components of error: bias and noise. Now, part two will explore what errors mean in terms of simple statistics.
Assume that you know for a fact that you weigh 70 kg. That would be the red line, or the true value. You then step on a scale ten times and plot all the returned values, which make up a curve. In a normal distribution, the mean of the values (for example, 72 kg) would be the point where the curve can be cut into two symmetrical parts. Then, we can say that:
Bias, the inverse of accuracy, is the distance between the average of the measurements and the true value.
Noise is the standard deviation of the measured values. Interestingly, that means that you do not need to know the true value to reduce noise! We will investigate that further in part three. Noise is also known as imprecision, random error, scatter between measurements, or variability.
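These two definitions are easy to check numerically. Here is a minimal sketch in Python, using ten made-up scale readings (illustrative numbers, not data from the article): bias needs the true value, while noise can be computed from the readings alone.

```python
import statistics

true_weight = 70.0  # kg, the known true value (the red line)

# Ten hypothetical scale readings (illustrative numbers only)
readings = [72.1, 71.8, 72.4, 71.5, 72.0, 72.3, 71.7, 72.2, 71.9, 72.1]

mean = statistics.mean(readings)    # where the curve splits symmetrically
bias = mean - true_weight           # requires knowing the true value
noise = statistics.stdev(readings)  # requires only the readings themselves

print(f"mean  = {mean:.2f} kg")   # 72.00 kg
print(f"bias  = {bias:.2f} kg")   # 2.00 kg
print(f"noise = {noise:.2f} kg")  # 0.28 kg
```

Note that `noise` never touches `true_weight`, which is exactly why noise can be reduced without knowing the true value.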
However, we now have an issue with the true value. In part one, we assumed that the true value was known: you knew for a fact that you weighed 70 kg and you were trying to assess your scale's error. But in real life, although you have a scale, you don't know your true weight and you don't know anything about the error behavior of your scale. This missing true value brings us to the trade-off between bias and variance, a pillar of machine learning.
When you do not know the true value, a little bit of variability and a little bit of bias is a good thing. Imagine that you have an old map of a treasure buried on an island. The landscape has changed since the map was drawn, so when you dig exactly where the map marks the treasure, you find nothing. To maximize your chances, you dig around the marked spot. You want to space the holes widely enough that you don't miss the true spot, but not so widely that you stray too far from the marked one. If you dig three holes, you neither want them all next to each other (no variability, so the map's bias dooms you), nor randomly scattered across the island (too much variability). Bias gives you a rule, noise gives you flexibility.
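The treasure-hunt intuition can be made concrete with a small Monte Carlo sketch. All the distances and error sizes below are made-up assumptions chosen for illustration: the map's error is modeled as a normally distributed shift of the true spot, and a hole "finds" the treasure if it lands within a fixed radius of it.

```python
import random

random.seed(0)

def hit_probability(offsets, trials=10_000, dig_radius=1.0, map_error_sd=2.0):
    """Chance that at least one hole, dug at the given offsets from the
    marked spot, lands within dig_radius of the true spot. The true spot
    is offset from the mark by a Gaussian map error (assumed parameters)."""
    hits = 0
    for _ in range(trials):
        true_spot = random.gauss(0.0, map_error_sd)  # landscape has shifted
        if any(abs(o - true_spot) <= dig_radius for o in offsets):
            hits += 1
    return hits / trials

clustered = hit_probability([-0.2, 0.0, 0.2])  # holes all next to each other
spread = hit_probability([-2.0, 0.0, 2.0])     # anchored at the mark, spaced out
scattered = hit_probability([-8.0, 1.0, 9.0])  # scattered across the island

print(clustered, spread, scattered)
```

Under these assumptions, the spaced-but-anchored strategy wins: clustering duplicates the map's bias, while scattering throws away the information the map does carry.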
Curve with too much bias and too little noise, which is unhelpful for forecasting.
Curve with the ideal balance of bias and noise for forecasting.
Curve with too much noise and too little bias, which is unhelpful for forecasting.