Precision and Recall, and autumn leaves
This essay is from a blog post I wrote 3 years ago. My daughter (mentioned in the essay) is now 10.
TL;DR - Precision measures how many of your positive predictions are actually correct. Recall measures how many of the actual positives you correctly predicted. High Precision together with high Recall is obviously great, but you can seldom have both. The right balance between the two depends entirely on the situation.
Fall is now in full force here in the Seattle area, and with it came the blanket of leaves covering my yard. Last weekend, I was shoveling leaves from the nice pile my 7-year-old daughter had proudly created into the organic waste bin when I noticed a good number of pebbles mixed in. It occurred to me that this was a great analogy for explaining two concepts that are commonly used to measure the performance of ML models (but are easily confused): Precision and Recall.
Say there are 100 leaves and 100 pebbles in the pile. I want to make each scoop of my shovel as efficient as possible, so I take a calculated scoop. I end up with 40 leaves and 10 pebbles on my shovel.
The performance of my scoop (the ML model or solution) can be measured as follows. Out of the 100 leaves (positives) in the pile, I was able to pick out (predict) 40. My scoop has a Recall of 40% (40/100).
Of the 50 items I scooped (40 leaves + 10 pebbles), 40 were leaves. My scoop has a Precision of 80% (40/50).
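An easy way to remember this is to mentally note 'precision = shovel', since everything you need to calculate Precision is on the shovel. Recall, by contrast, also needs the count of all the leaves in the pile, including the ones left behind.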
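The arithmetic for this scoop can be sketched in a few lines of Python. The function name `precision_recall` and its parameter names are made up for this illustration. The numbers are the ones from the pile above.

```python
def precision_recall(leaves_scooped, pebbles_scooped, leaves_in_pile):
    """Return (precision, recall) for a single scoop.

    Hypothetical helper for the leaf-pile analogy:
    leaves are positives, pebbles are negatives.
    """
    items_on_shovel = leaves_scooped + pebbles_scooped
    # Precision only looks at what is on the shovel.
    precision = leaves_scooped / items_on_shovel if items_on_shovel else 0.0
    # Recall compares the shovel against all the leaves in the pile.
    recall = leaves_scooped / leaves_in_pile if leaves_in_pile else 0.0
    return precision, recall


precision, recall = precision_recall(
    leaves_scooped=40, pebbles_scooped=10, leaves_in_pile=100
)
print(f"Precision: {precision:.0%}, Recall: {recall:.0%}")
# Precision: 80%, Recall: 40%
```

Note that the pebbles left in the pile never enter either formula.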
We can also use this analogy to explain the trade-off between Precision and Recall. Obviously, the higher both are, the better, but increasing one often decreases the other.
Say I want to maximize my Recall. I can get a giant shovel and just scoop the entire pile. I would have scooped up 100 out of the 100 leaves in the pile, resulting in perfect Recall (100%, or 100/100). But the shovel now also holds all 100 pebbles, so my Precision drops from 80% to 50% (100/200).
Say I now want to maximize Precision. I can use a much smaller shovel to avoid accidentally picking up pebbles, but due to its small size I can only pick up 10 leaves. Now a scoop gives me 10 leaves and no pebbles, and thus I have perfect Precision (100%, or 10/10). But my Recall drops from 40% to 10% (10/100).
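To see the trade-off side by side, here is a small Python sketch that scores the three scoops described so far. The `Scoop` class, the `score` function, and the shovel labels are invented for this illustration. The counts come straight from the examples above.

```python
from dataclasses import dataclass

LEAVES_IN_PILE = 100  # total positives in the pile


@dataclass
class Scoop:
    name: str
    leaves: int   # true positives on the shovel
    pebbles: int  # false positives on the shovel


def score(scoop, leaves_in_pile=LEAVES_IN_PILE):
    """Return (precision, recall) for one scoop."""
    on_shovel = scoop.leaves + scoop.pebbles
    precision = scoop.leaves / on_shovel
    recall = scoop.leaves / leaves_in_pile
    return precision, recall


scoops = [
    Scoop("calculated scoop", leaves=40, pebbles=10),
    Scoop("giant shovel", leaves=100, pebbles=100),
    Scoop("small shovel", leaves=10, pebbles=0),
]

results = {s.name: score(s) for s in scoops}
for name, (p, r) in results.items():
    print(f"{name:<17} precision={p:.0%}  recall={r:.0%}")
```

The giant shovel wins on Recall and loses on Precision, the small shovel does the opposite, and the calculated scoop sits in between.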
So which combination of Precision and Recall is best?
This depends entirely on the situation in which the solution (the shovel scoop) will be deployed. If I were pressed for time and needed to complete the chore with the minimum number of scoops, I would value Recall over Precision and just scoop up the pile with a single giant shovel (along with the 100 pebbles). But since the whole point in this case was spending time with my 7-year-old, we hand-picked leaves from the pile instead. In other words, we ran the solution with perfect Precision 10 times.
This is also why data scientists cannot build great solutions in isolation. They need to understand the environment or the nuances of the application in order to make the right decisions during development and training.
I found the following two well-written blog posts that describe Precision and Recall. Coincidentally, both use fishing analogies, and both are very intuitive.