Serving inferences from your machine learning model with SageMaker and TensorFlow

When working on a project that requires building and training a custom machine learning model, TensorFlow makes the work much easier. It provides you with most of the tools you’ll need to:

  • Handle your datasets.
  • Pre-process the data and post-process the model results.
  • Train your model.
  • Transform the information into useful visualizations (especially when running in a Jupyter notebook).
  • Save your pre-trained models.
  • Test inferences.
  • Etc.

TensorFlow’s capabilities are thoroughly documented, from a complete beginner’s level to an advanced point of view.

Making real use of our trained model

Once your model has been designed, built and trained to infer predictions about the problem at hand, you’ll likely need to make use of it from one of your applications.

For example, let’s say you’ve trained the model from this TensorFlow example, which returns the segmentation mask of a pet given an input image. When a user uploads a new picture of a pet, we’ll need to pre-process the image into a valid format the model can use, and post-process the result to convert the segmentation mask into something that can be used internally by our backend or visualised externally in an app.

Note: in this article we’ll focus on a model based on a convolutional neural network built with Keras but, basically, the same approach should apply to any other model built with TensorFlow.
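As an illustration, a minimal sketch of that pre- and post-processing could look like the snippet below. The 128x128 input size and the normalisation to [0, 1] are assumptions taken from the TensorFlow segmentation tutorial, and the function names are made up for the example:

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 128  # input size assumed from the TensorFlow segmentation tutorial


def preprocess(image_bytes):
    """Convert a raw uploaded image into the 4-D float tensor the model expects."""
    image = tf.io.decode_image(image_bytes, channels=3)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0   # normalise to [0, 1]
    return tf.expand_dims(image, axis=0)         # add the batch dimension


def postprocess(prediction):
    """Turn the per-pixel class scores returned by the model into a plain mask."""
    mask = tf.argmax(prediction, axis=-1)        # most likely class per pixel
    return np.squeeze(mask.numpy()).tolist()     # JSON-serialisable 2-D list
```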

SageMaker to serve model inferences

Although TensorFlow already provides some tools to serve your model inferences through its API, AWS SageMaker will allow you to complete the rest of the picture:

  • Host the model in a docker container that can be deployed to your AWS infrastructure.
  • Take advantage of one of the machine-learning-optimised AWS instances. They come super-powered with different options for CPU, network performance and memory, and can rely on GPUs for accelerated computing.
  • Create an endpoint that can be externally invoked to request predictions.

You can proceed by directly selecting one of the available pre-built Docker images when building a SageMaker model, or you can build your own container, following a specific structure, and deploy it via ECR. The advantages of opting for the latter are:

  • Local testing of your inference endpoint
  • Total control over the machine

Amazon has simplified the job of creating your own container by publishing and documenting projects like the SageMaker TensorFlow Serving Container on GitHub, with all the code you need to run the container locally pretty much out of the box, which is much appreciated.

Preparing the SageMaker TensorFlow Serving Container

Clone the aforementioned repository and choose the TensorFlow version and the required architecture, so you can start running, on your machine, the container where your model will be placed.

Save the already trained model:
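Something along these lines should work (the stand-in model and the export path are just placeholders; the important part is that TensorFlow Serving expects the SavedModel under a numbered version directory):

```python
import tensorflow as tf

# `model` stands for the Keras model trained earlier; a trivial stand-in is
# built here only so the snippet runs on its own.
model = tf.keras.Sequential(
    [tf.keras.layers.Conv2D(3, 3, padding='same', input_shape=(128, 128, 3))]
)

# TensorFlow Serving expects the SavedModel under a numbered version directory.
export_path = 'models/pet_segmentation/1'
tf.saved_model.save(model, export_path)
```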

Check the model is saved correctly:
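Sticking to Python (instead of the saved_model_cli tool), one way to do it is to load the SavedModel back and inspect its serving signature, assuming the export path used above:

```python
import tensorflow as tf

# Load the SavedModel back and confirm the expected input/output tensors.
loaded = tf.saved_model.load('models/pet_segmentation/1')
infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)
print(infer.structured_outputs)
```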

And place it in the container to start testing it. The expected location inside the container is /opt/ml/model, which is shared with your (host) machine through test/resources/models.
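With the layout TensorFlow Serving expects, the shared folder would look something like this (the model name and version number are just examples):

```
test/resources/models/
└── pet_segmentation/
    └── 1/
        ├── saved_model.pb
        └── variables/
```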

At this point, your model is already reachable and predictions can be requested via HTTP...

… if you have data examples following exactly the structure expected by the model, in our case a 4-D tensor of floats.
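For reference, a raw request could look like the sketch below. The URL follows the SageMaker container contract (port 8080, /invocations path), and the tiny 2x2 "image" is only a placeholder for a real 128x128x3 tensor:

```python
import json

import requests

# A placeholder 4-D tensor (batch, height, width, channels); a real request
# would contain a full 128x128x3 image worth of floats.
payload = {'instances': [[[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                          [[0.7, 0.8, 0.9], [0.1, 0.2, 0.3]]]]}

response = requests.post(
    'http://localhost:8080/invocations',   # SageMaker serving contract
    data=json.dumps(payload),
    headers={'Content-Type': 'application/json'},
)
print(response.json())
```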

However, the goal is to make the endpoint as accessible as possible to our apps and to take any heavy computation away from them. So we’ll simplify the design of the API to work with standard application/json requests, as follows:
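For example, something along these lines, where the exact field name is hypothetical and the image travels as plain base64 text instead of a pre-computed tensor:

```json
{ "image": "<base64-encoded JPEG/PNG bytes>" }
```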

The pre- and post-inference data processing can be achieved by implementing the input_handler and output_handler methods (or just a handler that covers both) in an inference.py file that must be placed in a folder named code, alongside the requirements.txt with the packages required to be installed. The final structure would look like this:
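Assuming the local test folder from before and our example model name, something like:

```
test/resources/models/
├── code/
│   ├── inference.py
│   └── requirements.txt
└── pet_segmentation/
    └── 1/
        ├── saved_model.pb
        └── variables/
```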

And the inference.py and requirements.txt files would look like this:
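The following is a sketch rather than the article’s original code: the handler signatures match the ones documented in the SageMaker TensorFlow Serving Container repository, while the base64 image field, the 128x128 input size and the mask post-processing are assumptions carried over from the earlier examples.

```python
# inference.py
import base64
import io
import json

import numpy as np
from PIL import Image

IMG_SIZE = 128  # input size assumed for the example segmentation model


def input_handler(data, context):
    """Pre-process the request before it reaches TensorFlow Serving."""
    if context.request_content_type != 'application/json':
        raise ValueError(
            'Unsupported content type: {}'.format(context.request_content_type))

    payload = json.loads(data.read().decode('utf-8'))
    # The "image" field is the hypothetical base64 payload from our simplified API.
    image = Image.open(io.BytesIO(base64.b64decode(payload['image']))).convert('RGB')
    image = image.resize((IMG_SIZE, IMG_SIZE))
    tensor = (np.asarray(image, dtype=np.float32) / 255.0)[np.newaxis, ...]

    # The TensorFlow Serving REST API expects an "instances" JSON document.
    return json.dumps({'instances': tensor.tolist()})


def output_handler(data, context):
    """Post-process the TensorFlow Serving response before returning it to the client."""
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    predictions = json.loads(data.content.decode('utf-8'))['predictions']
    # Most likely class per pixel -> 2-D segmentation mask.
    mask = np.argmax(np.array(predictions), axis=-1)[0].tolist()
    return json.dumps({'prediction': mask}), 'application/json'
```

And a minimal requirements.txt with the extra packages used above:

```
numpy
Pillow
```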

Now, our container is ready to be used by real applications.

Deploying the container and creating the endpoint

Push your custom image to ECR by executing the script:

Compress and upload the saved model and code folder to an S3 bucket:
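For instance, using boto3 (the bucket name, key and local paths are placeholders):

```python
import tarfile

import boto3

bucket = 'my-models-bucket'                 # placeholder bucket
key = 'pet-segmentation/model.tar.gz'       # placeholder key

# The archive contains the numbered model directory plus the code/ folder.
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('models/pet_segmentation/1', arcname='pet_segmentation/1')
    tar.add('code', arcname='code')

boto3.client('s3').upload_file('model.tar.gz', bucket, key)
```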

Create the model in SageMaker, choosing our published custom image and the uploaded model.tar.gz, and setting other parameters like the TensorFlow version and architecture type:
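A sketch with boto3; the account id, region, role, image tag and names are placeholders, and the TensorFlow version and architecture are the ones encoded in the tag of the published image:

```python
import boto3

sagemaker = boto3.client('sagemaker')

sagemaker.create_model(
    ModelName='pet-segmentation',
    ExecutionRoleArn='arn:aws:iam::123456789012:role/sagemaker-execution-role',
    PrimaryContainer={
        # Custom image previously pushed to ECR (placeholder URI and tag).
        'Image': '123456789012.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu',
        # model.tar.gz uploaded in the previous step.
        'ModelDataUrl': 's3://my-models-bucket/pet-segmentation/model.tar.gz',
    },
)
```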

Create the endpoint configuration (referencing the created SageMaker model) and spin up the endpoint. You’ll need to choose the instance type(s) that best fit your needs:
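Again with boto3, using example names and an example instance type:

```python
import boto3

sagemaker = boto3.client('sagemaker')

# The endpoint configuration references the SageMaker model created above.
sagemaker.create_endpoint_config(
    EndpointConfigName='pet-segmentation-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'pet-segmentation',
        'InstanceType': 'ml.c5.xlarge',   # pick whatever fits your workload
        'InitialInstanceCount': 1,
    }],
)

# Spinning up the endpoint takes a few minutes.
sagemaker.create_endpoint(
    EndpointName='pet-segmentation',
    EndpointConfigName='pet-segmentation-config',
)
```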

And voilà, the endpoint will be live and available for serving inferences in a few minutes. The invoke-endpoint command will allow you to double-check the deployment was successful:
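Below is the boto3 equivalent of the invoke-endpoint CLI command, reusing the hypothetical JSON request shape from earlier (the endpoint name and the image file are placeholders):

```python
import base64
import json

import boto3

runtime = boto3.client('sagemaker-runtime')

# Send a local test picture as base64, matching the simplified JSON API.
with open('my_pet.jpg', 'rb') as f:
    body = json.dumps({'image': base64.b64encode(f.read()).decode('utf-8')})

response = runtime.invoke_endpoint(
    EndpointName='pet-segmentation',
    ContentType='application/json',
    Body=body,
)
print(json.loads(response['Body'].read()))
```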

If everything works as expected, you should retrieve a JSON response with the processed prediction:

Written by Jesús Larrubia (Full Stack Developer). Read more in Insights by Jesús or check out their socials: Twitter, Instagram.