November 19, 2013

# A confused tangle

A confusion matrix is a confusing thing. There’s a surprising number of useful statistics that can be built out of just four numbers, and the links between them are not always obvious. The terminology doesn’t help (is a true negative an observation that is truly in the class but classified negative, or one that is negative and has truly been classified as such?) and neither does the fact that many of the statistics have more than one name (recall = sensitivity = power!).

To unravel it a little, I’ve used the Tangle JS library to create an interactive document that shows how these values are related. The code can be found here.

###### The interactive example

I have trained my classifier to separate wolves from sheep. Let’s say sheep is a positive result and wolf is a negative result (sorry, wolves). I now need to test it on my test set. This consists of wolves and sheep; together they make up the test subjects.

Say my classifier correctly identifies sheep as sheep (true positives) and wolves as wolves (true negatives).

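This gives us the confusion matrix, which has four cells: sheep classified as sheep are true positives (TP), sheep classified as wolves are false negatives (FN), wolves classified as sheep are false positives (FP), and wolves classified as wolves are true negatives (TN).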
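The four cells can also be counted directly from paired labels. Here is a minimal sketch, assuming a tiny made-up test set (the labels and the `confusion_counts` helper are hypothetical, not the values or code behind the interactive example), with sheep as the positive class:

```python
from collections import Counter

POSITIVE = "sheep"  # the post treats sheep as the positive class


def confusion_counts(actual, predicted, positive=POSITIVE):
    """Count TP, FN, FP and TN from paired actual/predicted labels."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            # A real sheep: either caught (TP) or missed (FN).
            counts["TP" if guess == positive else "FN"] += 1
        else:
            # A real wolf: either mistaken for a sheep (FP) or caught (TN).
            counts["FP" if guess == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FN", "FP", "TN")}


# Hypothetical test set: 3 sheep and 2 wolves.
actual = ["sheep", "sheep", "sheep", "wolf", "wolf"]
predicted = ["sheep", "sheep", "wolf", "sheep", "wolf"]

cells = confusion_counts(actual, predicted)
print(cells)
```

Every test subject lands in exactly one cell, so the four counts always add up to the size of the test set.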

Now some statistics that need to be untangled!

Precision (aka positive predictive value) is TP/(TP+FP).

Recall (aka sensitivity, power) is TP/(TP+FN).

Specificity is TN/(TN+FP).

Negative predictive value is TN/(TN+FN).

Accuracy is (TP+TN)/(TP+TN+FP+FN).

False discovery rate is FP/(FP+TP).

False positive rate (aka false alarm rate, fall-out) is FP/(FP+TN).
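To see the formulas side by side, here is a small sketch that computes all seven statistics from the four cells. The counts passed in are hypothetical, chosen only for illustration, and the `confusion_stats` helper is my own naming rather than part of any library:

```python
def confusion_stats(tp, fn, fp, tn):
    """Return the seven statistics as a dict of floats.

    Assumes every denominator is non-zero; an empty row or column
    of the matrix would raise ZeroDivisionError.
    """
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "negative_predictive_value": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "false_discovery_rate": fp / (fp + tp),
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts: 50 sheep (40 found, 10 missed), 50 wolves (45 found, 5 missed).
stats = confusion_stats(tp=40, fn=10, fp=5, tn=45)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Two of the links between the statistics fall straight out of the definitions: false discovery rate shares a denominator with precision, so it is 1 minus precision, and false positive rate shares a denominator with specificity, so it is 1 minus specificity.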