
Tackling a real-world CV problem with Masterful AI

Nikhil Gajendrakumar

Deep learning has opened the door to tackling many real-world computer vision problems. But building and deploying deep learning models has always been tedious: labeling the entire dataset, finding the right hyperparameters, determining a data augmentation policy, and then deploying the model.

Masterful makes building and deploying deep learning models orders of magnitude easier. All you need to get started is to label a small fraction of your image data. Masterful’s meta-learner finds the optimal hyperparameters and augmentation policy, and trains the model on both labeled and unlabeled data.

In this post, we’ll show how easy it is to solve a real-world computer vision problem using Masterful.

Road segmentation

Autonomous vehicles are on the rise, and one of their primary tasks is to predict the drivable area in front of the vehicle. Cameras mounted on the body of the car capture images of the scene. Our task is to use the data coming from these cameras to train a deep neural network that can predict the drivable area.

During training, we run the network through batches of labeled images, calculate the loss per batch, compute the gradient (the direction and relative magnitude of steepest increase in the loss), and then update the model parameters in the opposite direction, the direction of steepest descent. We repeat this process until we reach the bottom of the loss valley.
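In plain TensorFlow terms, one pass of that loop looks roughly like the sketch below. Here model and dataset stand in for the model and input pipeline we build later in this post, and Masterful ultimately runs this loop for us.

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

for images, labels in dataset:                  # batches of labeled images
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)     # loss per batch
    gradients = tape.gradient(loss, model.trainable_variables)
    # Step against the gradient, i.e. in the direction of steepest descent.
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))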

Data collection

We need to collect and label data to train a neural network for a specific task. An obvious way to collect data for the road segmentation task is to capture images from the camera mounted on the car and label every pixel of each image as road or not road. To save time, let’s work with the Cityscapes dataset.

Data description

The Cityscapes dataset consists of 3475 annotated images across the train and val sets and 1525 images in the test set. Each image has a resolution of 2048x1024, and its corresponding label (segmentation mask) maps each pixel to one of 30 classes. Since we’re focusing only on the road in this example, we’ll throw away all the classes except ‘road’.

Below is a sample image and its corresponding label.

Build the data pipeline with tf.data

The first coding task is to build a data input pipeline, which we will use later to train our deep learning model. This blog post includes code snippets, but click here for the full working Jupyter notebook.

Unzip and save the Cityscapes dataset anywhere on your disk, then build the tf.data input pipeline from the image and label paths. Scale the image pixel values to the range [0, 1], which is the normalized input range Masterful requires.
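Below is a minimal sketch of such a pipeline. It assumes image_paths and label_paths are matched lists of file paths on disk; the notebook’s preprocessing may differ in details such as resizing and batch size.

import tensorflow as tf

def parse_example(image_path, label_path):
    # Decode the RGB image and its single-channel segmentation mask.
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    label = tf.io.decode_png(tf.io.read_file(label_path), channels=1)
    # Scale pixels to [0, 1], the normalized input range Masterful requires.
    image = tf.cast(image, tf.float32) / 255.0
    # Keep only the 'road' class (label id 7 in Cityscapes) as a binary mask.
    label = tf.cast(tf.equal(label, 7), tf.float32)
    return image, label

train_dataset = (
    tf.data.Dataset.from_tensor_slices((image_paths, label_paths))
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(8)
    .prefetch(tf.data.AUTOTUNE)
)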

Model selection

Next, we select the model architecture. In nearly all cases, you should start with an existing architecture and pretrained weights. Building a quick, simple convnet from scratch is rarely the easier path: you’ll likely run into bugs in your architecture as well as unexpected behaviors. But an existing massive architecture is not an efficient use of your time either when you are just trying to get a project off the ground. For segmentation, the U-Net architecture has proven to yield excellent performance. It consists of a contracting path (left side) and an expansive path (right side). Since ImageNet weights are readily available for modern architectures like EfficientNetB0, let’s use it as the contracting path and extract 5 feature maps from it to feed the expansive path.

The segmentation_models package provides an implementation of an EfficientNetB0-based U-Net with pretrained weights. A minimal version of that setup is below; the notebook may configure the model further.
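import segmentation_models as sm

# Depending on the package version, you may need to select the Keras backend.
sm.set_framework('tf.keras')

# U-Net with an EfficientNetB0 encoder initialized from ImageNet weights.
model = sm.Unet(
    'efficientnetb0',
    encoder_weights='imagenet',
    classes=1,               # a single 'road' class
    activation='sigmoid',    # per-pixel road probability
)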

Check the model output before training

Before we train the model, it’s always good to confirm that the model actually runs in inference mode, and that its predictions are random, as expected from untrained weights.
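A quick way to check, assuming the train_dataset built in the pipeline above:

import numpy as np

# Pull one batch and run it through the untrained model.
images, labels = next(iter(train_dataset))
predictions = model.predict(images)

# Untrained predictions should hover around chance rather than collapse to 0 or 1.
print(predictions.shape, float(np.mean(predictions)))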

Set up Masterful

The Masterful platform learns how to train the model by focusing on five core organizational principles in deep learning: architecture, data, optimization, regularization, and semi-supervision.

  1. Architecture is the structure of weights, biases, and activations that define a model. In this example, the architecture is defined by the model you created above.
  2. Data is the input used to train the model. In this example, you are using a labeled training dataset, Cityscapes. More advanced uses of the Masterful platform can also take unlabeled and synthetic data into account, using a variety of different techniques.
  3. Optimization means finding the best weights for a model and training data. Optimization is different from regularization because optimization does not consider generalization to unseen data. The central challenge of optimization is speed - find the best weights faster.
  4. Regularization means helping a model generalize to data it has not yet seen. Another way of saying this is that regularization is about fighting overfitting.
  5. Semi-Supervision is the process by which a model can be trained using both labeled and unlabeled data, to improve accuracy and reduce labeling time and costs.

Next, we set up Masterful to learn the optimal parameters for each of the five buckets above.
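The snippet below sketches that setup. It follows the learn-then-train pattern in Masterful’s documentation, but treat the exact function signatures and enum values as assumptions and confirm them against docs.masterfulai.com.

import masterful

# Register the Masterful package (sketch; see docs.masterfulai.com for details).
masterful = masterful.register()

# 1. Architecture: describe the model built above.
model_params = masterful.architecture.learn_architecture_params(
    model=model,
    task=masterful.enums.Task.SEMANTIC_SEGMENTATION,
    input_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=1,
    prediction_logits=False,
)

# 2. Data: describe the labeled training dataset.
training_dataset_params = masterful.data.learn_data_params(
    dataset=train_dataset,
    task=masterful.enums.Task.SEMANTIC_SEGMENTATION,
    image_range=masterful.enums.ImageRange.ZERO_ONE,
    num_classes=1,
    sparse_labels=False,
)

# 3-5. Optimization, regularization, and semi-supervision policies are meta-learned.
optimization_params = masterful.optimization.learn_optimization_params(
    model, model_params, train_dataset, training_dataset_params)
regularization_params = masterful.regularization.learn_regularization_params(
    model, model_params, optimization_params,
    train_dataset, training_dataset_params)
ssl_params = masterful.ssl.learn_ssl_params(
    train_dataset, training_dataset_params)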

 

Training

Now let’s train the model using Masterful.
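With the learned parameters in hand, training is a single call (the same caveat applies: confirm the exact arguments against the Masterful docs):

training_report = masterful.training.train(
    model, model_params, optimization_params,
    regularization_params, ssl_params,
    train_dataset, training_dataset_params,
)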

Results

In under 20 minutes of training with only 10% of the data, the model achieves an accuracy of 94.3% on the test set. As the images below show, the model predicts the road accurately. Of course, predicting the drivable area is in reality a more complex task that involves combining the results of different models, such as road segmentation, lane line detection, traffic sign detection, and object detection.

The next step is to save the trained model and deploy it as a service.

Deployment

Now that we’ve trained the model to an acceptable performance level, we need to deploy it so applications can use it. In this section, we will describe how to deploy the model as a service, which is appropriate for batch processing, web clients, and mobile apps. Another approach to consider is deploying to an edge device, which improves privacy, reduces latency, and enables offline inference; however, edge deployment requires optimizing the model to fit hardware constraints, and it typically needs a C/C++ client.

We will use Tensorflow Serving for deployment, which is the official method for deploying models trained with Tensorflow. Tensorflow Serving is a server that accepts inference requests, serving a trained model under a specific name and version.

The first step is to save the trained model. Note the output path so that you can pick it up in the following step for serving. The default output format is Tensorflow’s SavedModel, which is directly compatible with Tensorflow Serving. Tensorflow Serving discovers model versions from numeric subdirectory names, so save the model under a directory named 1:

model.save('/path/to/model_output_dir/1/')

The model is now ready to be deployed. The preferred method to set up Tensorflow Serving is Docker.

$ docker pull tensorflow/serving

The last step is binding the model to an endpoint; Tensorflow Serving picks up the version from the numeric directory name. You can do that in one command by running:

$ docker run -p 8501:8501 \
    --mount type=bind,source=/path/to/model_output_dir,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving

At this point the model is deployed and ready to accept incoming inference requests from clients at the following endpoint, which exposes a REST API:

http://localhost:8501/v1/models/my_model:predict

To test the server, a client encodes an image, sends a request to the endpoint, and gets a prediction back in the response.
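Here is a minimal Python client sketch. The placeholder input stands in for a preprocessed frame scaled to [0, 1] like the training inputs, and the endpoint name matches the docker command above; for payloads this large, a real client may prefer gRPC or smaller crops.

import json
import numpy as np
import requests

# A placeholder input: one 1024x2048 RGB frame in the [0, 1] range.
image = np.random.rand(1, 1024, 2048, 3).astype(np.float32)

response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps({"instances": image.tolist()}),
)

# Per-pixel road probabilities, same spatial shape as the input.
mask = np.array(response.json()["predictions"])
print(mask.shape)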

Congratulations! In this blog post you’ve experienced the full journey: starting from a real-world problem, building and training the right model for it, and deploying it as an AI-powered solution to that problem.

The steps in this blog post are generally applicable when building an application using AI to solve a computer vision problem. It is a recipe for starting from scratch on a practical real-world CV problem, whether for fun, academic research, spawning a new team in a company, or even building a CV startup. If you’ve used this blog post to do any of the above, we’d love to hear about it!

For the full code version, see this guide at docs.masterfulai.com. Masterful is free for personal, academic, and trial use. Just run 'pip install masterful' in your Python terminal to get the latest release. If you have any questions or ideas, we’d love to connect with you in our Community Slack.

Acknowledgements

Ray Tawil contributed the Deployment section and literature reviews to this post.

References

  1. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation". https://arxiv.org/pdf/1505.04597.pdf
  2. Mingxing Tan and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks". https://arxiv.org/pdf/1905.11946.pdf
  3. Segmentation models with pretrained backbones. https://pypi.org/project/segmentation-models/0.1.2/
  4. Marius Cordts et al. "The Cityscapes Dataset for Semantic Urban Scene Understanding". https://openaccess.thecvf.com/content_cvpr_2016/papers/Cordts_The_Cityscapes_Dataset_CVPR_2016_paper.pdf

