
A simple way to improve your CV model with unlabeled data

Jack Lynch

Semi-supervised learning (SSL) unlocks value in your unlabeled data, but it can be difficult to implement. Masterful can fully automate model training and apply SSL, but it also lets you use SSL with your existing training code. We call this Simple SSL, and the benefits are less labeling effort and a more accurate model. In this post we'll show you how to try it.


Masterful’s Simple SSL augments your training data using unlabeled data and a model you’ve trained on the labeled set. It does this with a single function, masterful.ssl.analyze_data_then_save_to(). We can try this out pretty quickly on an example dataset—let’s start with TensorFlow’s Flowers dataset, removing half of the labels from our images to simulate a bank of unlabeled data.

Example images from TensorFlow’s Flowers dataset.

To use Masterful’s Simple SSL, we need (1) a dataset with an unlabeled split, and (2) a model trained well on our labeled data. We can start by loading the data:
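Here's a minimal sketch of that setup. The 50/50 shard split, image size, batch size, and held-out evaluation slice are illustrative choices, not the exact configuration from the original experiment:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load TensorFlow's Flowers dataset (3,670 images, 5 classes), holding out
# a slice for evaluation later.
(train_raw, test_raw), info = tfds.load(
    "tf_flowers",
    split=["train[:80%]", "train[80%:]"],
    as_supervised=True,
    with_info=True,
)
num_classes = info.features["label"].num_classes

def preprocess(image, label):
    # Resize and scale to [0, 1] for the ResNet backbone used below.
    return tf.image.resize(image, (224, 224)) / 255.0, label

# Keep half of the training images labeled, and strip the labels from the
# other half to simulate a bank of unlabeled data.
labeled_ds = (
    train_raw.shard(num_shards=2, index=0)
    .map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)
)
unlabeled_ds = (
    train_raw.shard(num_shards=2, index=1)
    .map(lambda image, label: preprocess(image, label)[0])
    .batch(32).prefetch(tf.data.AUTOTUNE)
)
test_ds = test_raw.map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)
```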

And then training a simple ResNet-50 backbone on the data for 15 epochs:
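A sketch of that baseline, assuming a randomly initialized ResNet-50 with a small softmax head; the optimizer, learning rate, and loss shown here are illustrative:

```python
def build_model():
    # Randomly initialized ResNet-50 backbone with a small softmax head.
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=(224, 224, 3), pooling="avg"
    )
    model = tf.keras.Sequential(
        [backbone, tf.keras.layers.Dense(num_classes, activation="softmax")]
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Baseline: trained for 15 epochs on only the labeled half of the data.
baseline_model = build_model()
baseline_model.fit(labeled_ds, epochs=15)
```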

Now that we have a baseline model, we can use Simple SSL to enrich our dataset:
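The enrichment step is the single function named above, masterful.ssl.analyze_data_then_save_to(). The argument names and order below are assumptions based on the description in this post (a trained model, the labeled and unlabeled splits, and an output location); check the Masterful documentation for the actual signature:

```python
import masterful

# NOTE: the arguments here are illustrative assumptions; the documented
# signature may differ. See the Masterful docs for the real call.
masterful.ssl.analyze_data_then_save_to(
    model=baseline_model,             # baseline trained on the labeled split
    labeled_data=labeled_ds,          # original labeled examples
    unlabeled_data=unlabeled_ds,      # images with labels removed
    path="./flowers_ssl_enriched",    # where the enriched dataset is written
)
```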

Once you’ve generated the enriched data, it’s easy to use: in fact, it can be loaded like any other TensorFlow dataset. Let’s use the augmented data to train a new model:
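A sketch of that retraining step. The loading call is an assumption (the post only says the enriched data loads like any other TensorFlow dataset), so substitute whatever loader the Masterful documentation prescribes:

```python
# ASSUMPTION: the enriched data loads as a standard tf.data.Dataset of
# (image, label) pairs; replace this with the loader the Masterful docs specify.
enriched_ds = (
    tf.data.Dataset.load("./flowers_ssl_enriched")
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# Train a fresh model of the same architecture on labeled + autolabeled data.
ssl_model = build_model()
ssl_model.fit(enriched_ds, epochs=15)
```

Both models here share the same architecture and training settings, so any accuracy difference comes from the enriched data alone.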

Comparing the model trained with Simple SSL to the baseline, we can see a 26% boost in accuracy:

Accuracy improvement achieved by incorporating Simple SSL into the above rudimentary training code.
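A sketch of how that comparison can be measured, using the held-out slice from the loading step above; the 26% figure comes from the original experiment, not from running this sketch:

```python
# Evaluate both models on the held-out slice and compare top-1 accuracy.
_, baseline_acc = baseline_model.evaluate(test_ds)
_, ssl_acc = ssl_model.evaluate(test_ds)
print(f"Baseline accuracy:    {baseline_acc:.3f}")
print(f"Simple SSL accuracy:  {ssl_acc:.3f}")
print(f"Relative improvement: {(ssl_acc - baseline_acc) / baseline_acc:.1%}")
```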

How did we achieve this jump in accuracy? Simple SSL attempts to assign each of your unlabeled examples to one of your existing label categories (we call this “automatic labeling”). Automatic labels can be thought of as predicted labels, subject to some level of uncertainty. If created correctly, they can help “fill in” a model’s understanding of each class. To illustrate this, let’s plot our baseline model’s embeddings for the labeled data (reduced to 3D with UMAP and colored by class), and then fade in the autolabeled data:


UMAP visualization of our model’s latent space, colored by label. Autolabeled data fades into view.
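For the curious, here is a rough sketch of how a projection like this could be produced, using the trained backbone as an embedding model and the umap-learn package; the plotting details are illustrative, and the autolabeled points would be added as a second scatter on the same axes to reproduce the fade-in:

```python
import numpy as np
import umap  # pip install umap-learn
import matplotlib.pyplot as plt

# Use the trained ResNet-50 backbone (the first layer of the baseline
# Sequential model) as an embedding model producing 2048-d features.
embedding_model = baseline_model.layers[0]
images = np.concatenate([x.numpy() for x, _ in labeled_ds], axis=0)
labels = np.concatenate([y.numpy() for _, y in labeled_ds], axis=0)
embeddings = embedding_model.predict(images, batch_size=32)

# Reduce the embeddings to 3D with UMAP and color each point by its class.
coords = umap.UMAP(n_components=3).fit_transform(embeddings)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c=labels, s=3, cmap="tab10")
plt.show()
```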

 

We can see that various clusters of roughly same-class samples are “filled in” with unlabeled data. When we autolabel that data and train on it, the model learns sharper classification decision boundaries.

Simple SSL is an easy way to start using SSL, but Masterful’s full SSL support can push performance even further. For example, see our previous post demonstrating how the use of unlabeled data helps Masterful extend its lead over Google Vertex. (If you plug that dataset into this approach, Simple SSL still nets you a multiple-point jump in accuracy, but the platform’s overall performance is better.)

Masterful is free for personal, academic, and trial use. Just run 'pip install masterful' in your terminal to get the latest release, then head over to the Quickstart to start training your first model in minutes. If you give Simple SSL a try and observe anything interesting, exciting, or confusing, feel free to talk about it in our Community Slack!

