Ternary Classification Matrix

Chaquayla Halmon
4 min read · Feb 20, 2021

Starting off, this is my first experience with ternary classification, and I have to say that I didn’t like it. I wasn’t sure how to tackle it, especially since I had only been taught how to use a confusion matrix on binary classification. Of course, to practice this I had to choose a large dataset. So we’re going to be looking at the Tanzania Water Pump data, which has about 59k data points. So let’s go.

What is a Confusion Matrix?

After cleaning and preprocessing your data, you’re going to want to feed it to a model and get your predictions. Once you have them, you need to test your model’s performance. That’s where the confusion matrix, also known as an error matrix, steps in. It is a table that visualizes the performance of a classification algorithm, typically a supervised one. A confusion matrix tells you four important things: true positives, true negatives, false positives, and false negatives. Let’s look at our data from the Tanzanian dataset.

So, after cleaning the data and making it look pretty for modeling, let’s pull out our classes, which live in status_group. I renamed status_group to target. So you’re dealing with the three classes below:

0 — Nonfunctional Water Points, 1 — Functional Water Points, and 2 — Functional needs repair.

Note that the classes are in numeric form. Before fitting your model you have to make sure all categorical data has a numeric datatype, either by coding it yourself or through one-hot encoding.
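Here is a minimal sketch of coding the classes yourself with pandas. The tiny DataFrame is a made-up stand-in for the real data, and the label strings in class_codes are my assumption about how status_group is spelled, so check the unique values in your own copy before mapping.

```python
import pandas as pd

# Hypothetical miniature of the water pump labels; the real data has ~59k rows.
df = pd.DataFrame({
    "status_group": [
        "functional",
        "non functional",
        "functional needs repair",
        "functional",
    ],
})

# Assumed label spellings; verify with df["status_group"].unique() first.
class_codes = {
    "non functional": 0,
    "functional": 1,
    "functional needs repair": 2,
}

# Map each text label to its numeric class and store it as the target column.
df["target"] = df["status_group"].map(class_codes)
print(df)
```

Any label that is missing from class_codes comes back as NaN, so a quick df["target"].isna().sum() is a cheap sanity check.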

Now that we have our three classes, it’s time to build a confusion matrix from them. This is the easy part, with a little bit of coding. Sklearn has a quick and easy way to create one: the confusion_matrix() function inside the sklearn.metrics module. Here are the steps:

1. Assign your classes to the y variable and all the other columns to the X variable.

2. Train-test-split your variables and fit to a model. Use that model to make some predictions.

3. Import the confusion matrix function, plus the plot function for later. Pass the true y values and the predicted values to the confusion matrix function. Voila!!! Your first ternary confusion matrix.
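The three steps above can be sketched roughly like this. To keep it runnable on its own, I’m assuming a synthetic three-class dataset from make_classification in place of the cleaned water pump data, and LogisticRegression as a placeholder model; neither is necessarily what this project used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Step 1: X holds the features, y holds the classes (0, 1, 2).
# Synthetic stand-in for the cleaned water pump data.
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=5,
    n_classes=3,
    random_state=42,
)

# Step 2: train-test-split, fit a model, and make some predictions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000)  # placeholder model
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 3: true values first, predicted values second.
# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

The argument order matters: swapping y_test and y_pred transposes the matrix. For the plot, recent versions of sklearn offer ConfusionMatrixDisplay.from_predictions(y_test, y_pred), which needs matplotlib installed.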

Great, you did it, and it didn’t take that long. Here comes the hard part: understanding what the matrix is actually telling you. With binary classification it is easy to tell what is a true positive and what’s a false positive. However, when you add a third class it can get a bit tricky. Just a little.

Remember, a confusion matrix tells you four things: true positives, true negatives, false positives, and false negatives. Referring back to this project, those four would be all we had to read off if it was binary.

However, we have to add our third class to the mix. So let’s look at our confusion matrix again and interpret the third class.

The diagonal represents our True Positives, since the row and column indexes are the same there. If we look at location [0,2], we can conclude that 34 non-functional water points were classified as functional needs repair. Note that when viewed through the non-functional lens, these are False Negatives, because our model predicted they were functional needs repair when they were actually non-functional. However, they are also False Positives for functional needs repair, since our model said they were functional needs repair and they weren’t. Are you confused yet? It gets easier over time, and visuals help a lot.
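If it helps, here is a small sketch that does this lens-by-lens counting for you. The matrix is illustrative: only the 34 at [0][2] comes from this project, and every other number is made up. The one_vs_rest helper is a hypothetical name of mine, not something from sklearn.

```python
# Rows = true class, columns = predicted class.
# Only the 34 at [0][2] is from the article; the rest is made up.
cm = [
    [500, 120, 34],   # true 0: non-functional
    [90, 700, 40],    # true 1: functional
    [30, 60, 26],     # true 2: functional needs repair
]
labels = ["non-functional", "functional", "functional needs repair"]


def one_vs_rest(matrix, k):
    """Count TP, FN, FP, TN for class k, treating all other classes as 'not k'."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # diagonal cell
    fn = sum(matrix[k]) - tp                   # rest of row k: true k, predicted other
    fp = sum(row[k] for row in matrix) - tp    # rest of column k: predicted k, true other
    tn = total - tp - fn - fp                  # everything not involving class k
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


for k, name in enumerate(labels):
    print(name, one_vs_rest(cm, k))
```

The 34 in cell [0][2] sits in row 0, so it is counted among the False Negatives for non-functional, and in column 2, so it is also counted among the False Positives for functional needs repair.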

SO GOOD LUCK!!!!
