Deep learning has opened the door to tackling many real-world computer vision problems. But building and deploying Deep Learning models has always been a tedious task of labeling the entire dataset, finding the right hyperparameters, determining a data augmentation policy, and then deploying the model.
Masterful makes building and deploying Deep Learning models orders of magnitude easier. All you need to get started is to label a small fraction of your image data. Masterful’s meta-learner finds the optimal hyperparameters and augmentation policy, then trains the model on both labeled and unlabeled data.
In this post, we’ll show how easy it is to solve a real-world computer vision problem using Masterful.
Autonomous driving vehicles have been on the rise and one of their primary tasks is to predict the drivable area in front of a vehicle. There are cameras mounted on the body of the car capturing images. Our task is to make use of the data coming from these cameras to train a Deep Neural Network that can predict the drivable area.
During training, we run the network through batches of labeled images, calculate the loss per batch, and compute the gradient, which gives the direction and relative magnitude of the steepest increase in loss. We then update the model parameters in the opposite direction, the direction of steepest descent. We repeat this process until we get to the bottom of the loss valley.
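To make that loop concrete, here is a minimal sketch of mini-batch gradient descent in plain Python. It fits a toy line rather than a segmentation network; the function name, the toy data, and the hyperparameters are all made up for illustration and are not part of Masterful.

```python
import random

def train_linear(data, lr=0.2, batch_size=4, epochs=500, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the batch-mean squared error with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient (steepest descent).
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1 (hypothetical example values).
points = [(x / 10, 3 * (x / 10) + 1) for x in range(10)]
w, b = train_linear(points)
```

A deep learning framework does the same thing at scale: it computes the gradients automatically and applies the update through an optimizer.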
We need to collect and label data to train a neural network to accomplish a specific task. An obvious way of collecting data for the road segmentation task is to capture the images from the camera mounted on the car and label every pixel in each image whether it belongs to the road or not. To save time, let’s work with the Cityscapes dataset.
The Cityscapes dataset consists of 3475 annotated images for train and val sets and 1525 images for the test set. Each image is of resolution 2048x1024 and its corresponding label (segmentation mask) maps each image pixel to one of the 30 classes. Since we’re just focusing on the road in this example, we’ll throw away all the classes except ‘road’.
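As a rough sketch of what "throwing away" the other classes means: assuming the labels come from the `*_gtFine_labelIds.png` files, where 'road' is encoded as ID 7, the binary road mask is a single comparison. The helper name and the tiny example array are ours.

```python
import numpy as np

ROAD_LABEL_ID = 7  # 'road' in the Cityscapes labelIds encoding

def road_mask(label_ids: np.ndarray) -> np.ndarray:
    """Return a float32 mask: 1.0 where the pixel is road, 0.0 everywhere else."""
    return (label_ids == ROAD_LABEL_ID).astype(np.float32)

# A made-up 2x3 label image: 7 = road, 8 = sidewalk, 26 = car, 0 = unlabeled.
labels = np.array([[7, 7, 8], [26, 7, 0]], dtype=np.uint8)
mask = road_mask(labels)
```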
Below is a sample image and its corresponding label
The first coding task is to build a data input pipeline. We will use this pipeline later to train our deep learning model. This blog post includes code snippets, but click here for the full working Jupyter notebook.
Unzip and save the Cityscapes dataset anywhere on your disk, then build the tf.data input pipeline from the image and label paths. Scale the image pixel values to the range [0, 1], which is the normalized input range Masterful requires.
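One way such a pipeline could look is sketched below. This is not the notebook's exact code: the function names, the downsampled `IMAGE_SIZE`, and the batch size are assumptions for illustration, and the road ID of 7 assumes the `*_gtFine_labelIds.png` label files.

```python
import tensorflow as tf

ROAD_LABEL_ID = 7        # 'road' in Cityscapes *_labelIds.png files
IMAGE_SIZE = (256, 512)  # (height, width); hypothetical downsampled size

def load_example(image_path, label_path):
    """Read one image/label pair and return (image in [0, 1], binary road mask)."""
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, IMAGE_SIZE) / 255.0
    label = tf.io.decode_png(tf.io.read_file(label_path), channels=1)
    # Nearest-neighbor resizing keeps label IDs intact instead of blending them.
    label = tf.image.resize(label, IMAGE_SIZE, method="nearest")
    mask = tf.cast(tf.equal(label, ROAD_LABEL_ID), tf.float32)
    return image, mask

def build_dataset(image_paths, label_paths, batch_size=8, shuffle=True):
    """Build a batched tf.data pipeline from parallel lists of file paths."""
    ds = tf.data.Dataset.from_tensor_slices((image_paths, label_paths))
    if shuffle:
        ds = ds.shuffle(len(image_paths))
    ds = ds.map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

The two path lists must be in matching order. In the standard Cityscapes layout they can be collected by globbing `leftImg8bit/train/*/*_leftImg8bit.png` and `gtFine/train/*/*_gtFine_labelIds.png` and sorting both results.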
Next, we select the model architecture. In nearly all cases, we should start with an existing model architecture and pretrained weights. Building a quick, simple convnet from scratch is rarely the easier place to start: you'll likely run into bugs in your architecture as well as unexpected behaviors. But a massive existing architecture is not an efficient use of your time either when you are just trying to get a project off the ground.
For segmentation, the U-Net architecture has proven to yield excellent performance. It consists of a contracting path (the encoder, on the left side of the usual diagram) and an expansive path (the decoder, on the right side). Since ImageNet weights are readily available for modern architectures like EfficientNetB0, let’s use that as the contracting path and take 5 feature maps from it to feed the expansive path.
The segmentation_models repo provides an EfficientNetB0-based U-Net implementation with pretrained weights. We use that as follows:
Before we train the model, it’s always good to confirm that the model will actually run in inference mode, and that the predictions are random.
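A minimal version of that check might look like the sketch below. The `smoke_test` helper is hypothetical, and the one-layer `stand_in` model is only there to keep the snippet self-contained; in practice you would pass in the U-Net.

```python
import tensorflow as tf

def smoke_test(model, height=64, width=128):
    """Run one forward pass on random input and return the predictions."""
    batch = tf.random.uniform([2, height, width, 3])  # values already in [0, 1)
    preds = model(batch, training=False)
    # Expect one output channel per pixel at the input's spatial size.
    assert tuple(preds.shape) == (2, height, width, 1), preds.shape
    return preds

# Stand-in model for illustration only; replace it with the real U-Net.
stand_in = tf.keras.Sequential(
    [tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid")]
)
preds = smoke_test(stand_in)
```

If the forward pass runs and the output shape matches the mask shape, the model is wired correctly. Predicted masks from an untrained decoder should look like noise when plotted.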
The Masterful platform learns how to train the model by focusing on five core organizational principles in deep learning: architecture, data, optimization, regularization, and semi-supervision.
Next, we set up Masterful to learn the optimal parameters for each of the five buckets above.
Now let’s train the model using Masterful.
In under 20 minutes of training with only 10% of the data, the model achieves an accuracy of 94.3% on the test set. As the images below show, the model predicts the road accurately. Of course, predicting the drivable area is a more complex task that involves combining the results of different models such as road segmentation, lane line detection, traffic sign detection, and object detection.
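For reference, here is a sketch of how a per-pixel accuracy can be computed from predicted probabilities, assuming the reported figure is pixel accuracy at a 0.5 threshold (the post does not say). Intersection-over-union for the road class is included as a complementary check, since accuracy alone can look high when most pixels are background. Both helpers and the example arrays are ours.

```python
import numpy as np

def pixel_accuracy(pred_prob, true_mask, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the label."""
    pred = pred_prob >= threshold
    true = true_mask >= 0.5
    return float(np.mean(pred == true))

def road_iou(pred_prob, true_mask, threshold=0.5):
    """Intersection-over-union for the road class."""
    pred = pred_prob >= threshold
    true = true_mask >= 0.5
    union = np.logical_or(pred, true).sum()
    # With no road in either mask, treat the prediction as a perfect match.
    return float(np.logical_and(pred, true).sum() / union) if union else 1.0

# Made-up 2x2 example: three of four pixels agree, one of two road pixels overlaps.
pred_prob = np.array([[0.9, 0.2], [0.7, 0.1]])
true_mask = np.array([[1.0, 0.0], [0.0, 0.0]])
acc = pixel_accuracy(pred_prob, true_mask)
iou = road_iou(pred_prob, true_mask)
```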
The next step is to save the above trained model and deploy it as a service.
Now that we’ve trained the model to an acceptable performance level, we need to deploy it to put it in use for an application. In this section, we will describe how to deploy the model as a service, which is appropriate for batch processing, web clients and mobile apps. Another approach to consider is deploying to an edge device, which increases privacy, reduces latency, and enables offline inference. But edge device deployment requires work to optimize the model to work within hardware constraints as well as needing a C/C++ client.
We will use TensorFlow Serving for deployment, which is the official method for deploying models trained with TensorFlow. TensorFlow Serving is a server that accepts inference requests. It takes a trained model and assigns a specific version to it.
The first step is to save the trained model. You’ll need to save the path so that you can pick it up on the following step for serving. The default output format is TensorFlow’s SavedModel, which is perfectly compatible with TensorFlow Serving.
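A sketch of the export step, assuming a TensorFlow 2.x Keras model. The helper name is hypothetical. The numeric subdirectory is there because TensorFlow Serving looks for version folders named with integers under the model's base path.

```python
import os
import tensorflow as tf

def export_for_serving(model, base_dir, version=1):
    """Write `model` as a SavedModel under base_dir/<version>/ and return that path."""
    export_dir = os.path.join(base_dir, str(version))
    tf.saved_model.save(model, export_dir)
    return export_dir
```

For example, `export_for_serving(model, "/path/to/model_output_dir/v1")` writes the SavedModel to `/path/to/model_output_dir/v1/1`, and the `v1` directory is then the one to hand to TensorFlow Serving as the model's base path.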
The model is now ready to be deployed. The preferred way to set up TensorFlow Serving is with Docker.
$ docker pull tensorflow/serving
The last step is binding the model to an endpoint and assigning it a version. You can do that in one command by running:
$ docker run -p 8501:8501 --mount type=bind,source=/path/to/model_output_dir/v1/,target=/models/my_model -e MODEL_NAME=my_model -t tensorflow/serving
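At this point the model is deployed and ready to accept incoming inference requests from clients. With the port and model name used in the command above, TensorFlow Serving exposes its REST API at the following endpoint:
http://localhost:8501/v1/models/my_model:predict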
To test the server, the client will encode an image and send a request to the endpoint and get a prediction back in the response.
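A minimal client sketch using only the Python standard library is shown below. The helper names are ours. The URL follows TensorFlow Serving's REST predict format, with the port and model name taken from the `docker run` command, and the image is sent as nested lists of floats already scaled to [0, 1].

```python
import json
import urllib.request

def build_predict_request(image, host="localhost", port=8501, model_name="my_model"):
    """Build a POST request for TensorFlow Serving's REST predict endpoint.

    `image` is a nested list of floats shaped [height][width][3], scaled to [0, 1].
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # "instances" holds a batch; here the batch contains a single image.
    body = json.dumps({"instances": [image]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def predict(image, **kwargs):
    """Send the request and return the predicted mask for the single image."""
    with urllib.request.urlopen(build_predict_request(image, **kwargs)) as resp:
        return json.loads(resp.read())["predictions"][0]
```

With the server running, `predict(image)` returns the predicted mask as nested lists. If the image is a NumPy array, convert it with `image.tolist()` first.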
Congratulations! In this blog post you’ve experienced the full journey from beginning with a real-world problem, to building and training the right model for it, to deploying it and creating a solution to that problem powered by AI.
The steps in this blog post are generally applicable when building an application using AI to solve a computer vision problem. It is a recipe for starting from scratch on a practical real-world CV problem, whether for fun, academic research, spawning a new team in a company, or even building a CV startup. If you’ve used this blog post to do any of the above, we’d love to hear about it!
For the full code version, see this guide at docs.masterfulai.com. Masterful is free for personal, academic, and trial use. Just run 'pip install masterful' in your Python terminal to get the latest release. If you have any questions or ideas, we'd love to connect with you in our Community Slack.
Ray Tawil contributed to this post with the Deployment section and literature reviews.
Machine Learning Engineer and Researcher, Masterful AI