Deployment is one of the most overlooked topics in the Machine Learning world. But it is certainly important if you want to get into the industry as a Machine Learning Engineer (MLE). In this article, we will take a sober look at how painless this process can be, once you know the ins and outs of the technologies involved in deployment.

All the files for this project are available on GitHub. You can use this project as a Hello World application: get something running first, then replace it with something more complex later on.

Table of Contents (Click To Scroll)

  1. Setup & Installation
  2. What Is Docker And Kubernetes?
  3. A Sample Project
  4. The Deployment With Docker And Kubernetes

Setup & Installation

These are all the steps to set up your environment. We are going to use Google Cloud Platform (GCP), as Google is largely the leader in Kubernetes: they developed the project internally and later open-sourced it. The main benefit of GCP is that it has a great UI/UX and is easy to set up; the same cannot be said for its competitors.

  1. You probably already have a Google account. Sign up for Google Cloud Platform and get $300 free credits. You have to enter your credit card, but it won't be charged unless you give them permission. Remember to enable billing.
  2. Download the Google Cloud SDK.
  3. Add the SDK to your PATH by finding the path to the SDK bin folder.
    a.    Windows: Add an environment variable. 1) Open search and search for "Edit the environment variables", 2) click the "Environment Variables" button at the bottom, 3) under "User variables for <user>", double-click Path (or click New and name the variable "Path" if it does not exist). Then click New, enter the path to the SDK bin folder and save.
    b.    MacOS: In Terminal, type in nano ~/.bash_profile and add the path to your bin folder as a new line in the file.
    export PATH="$PATH:/Applications/google-cloud-sdk/bin"
    Save by doing CTRL+O, press Enter/Return and press CTRL+X.
    c.    Linux: Depending on the distribution and what you have installed, there could be different profiles for Terminal. Check nano ~/.bash_profile, nano ~/.bash_login, nano ~/.profile or nano ~/.bashrc. Add the path to the bin folder of the SDK:
    export PATH="$PATH:$HOME/google-cloud-sdk/bin"
    Save by doing CTRL+O, press Enter/Return and press CTRL+X.
  4. Download Docker. Note that you cannot use Docker with Windows Home – consider using a local Ubuntu VM by using Hyper-V on Windows instead.
    a.    Windows Pro/Enterprise/Education: Sign up and download Docker Desktop on Windows. Once logged in, you can download Docker. Make sure the program is running and that you are logged in locally.
    b.    MacOS: Sign up and download Docker Desktop on Mac. Once logged in, you can download Docker. Make sure the program is running and that you are logged in locally.
    c.    Linux: Use the following three commands to install and start Docker:
    1) sudo apt install docker.io, 2) sudo systemctl start docker, and 3) sudo systemctl enable docker. You can log in with docker login if you have a registry you want to log in to.
  5. After this is done, you should be able to type gcloud init and configure the SDK for the setup.
    a.    Type Y, press enter and log into your account when gcloud displays: "You must log in to continue. Would you like to log in (Y/n)?"
    b.    Type the number of your project, mine was 1, and press enter when gcloud displays: "Please enter numeric choice or text value (must exactly match list item)"
    c.    Type n and press enter when gcloud displays: "Do you want to configure a default Compute Region and Zone? (Y/n)?"
  6. Run gcloud auth configure-docker in the Terminal to be able to push your containers later on. If this does not run, try restarting the Terminal.

After you have prepared the tools, we need to create a cluster, such that we can go through using these tools to deploy a Machine Learning application.

Creating Your Cluster

Creating your cluster is very individual and depends on your needs. For this tutorial, I simply need 1 node, since we are just deploying a model on a toy dataset that only needs to work remotely. If you want to replicate my cluster, adjust the number of nodes to 1, set the machine type to g1-small and create your cluster – these are the only steps I took for this article.

We will go through the options, but first, please go ahead and create your cluster in the Kubernetes Engine.

Step 1: Choosing The Cluster Type For Your Clusters

Consider the resources your application needs. Are you doing Deep Learning? Then you will need GPUs, so you should use the "GPU Accelerated Computing" cluster template in the left sidebar.

If you want high availability, you should make use of the cluster template "Highly Available" in the left sidebar. If your application has a need for CPU power or lots of RAM, then choose those cluster templates.

Remember that you can actually combine these templates, so you can have highly available GPU nodes by setting the location type to regional.

Step 2: Automatic Scaling, Automatic Upgrades And Automatic Repairs

While creating the cluster, you should pay attention to the menu under "Machine Configuration" > "More Options".

What if your application hits the front page of Hacker News or some big blog or magazine? Are you sure that your application can handle that amount of traffic? You should enable autoscaling, such that Kubernetes automatically scales your application up and down when needed. Just enter the maximum number of nodes you are prepared to pay for if your application experiences a huge load.

Other great options are auto-upgrading and auto-repairing. The nodes can be automatically upgraded with no downtime, just by enabling auto-upgrade – this ensures fewer security flaws and gives you the latest features of the stable version.

If a node is somehow failing or not ready to be used, Kubernetes can take care of automatically restoring it and making the node work again. Auto-repair does this by monitoring all nodes with health checks multiple times an hour.

After Your Cluster Is Created

After you have created your cluster, you want to connect to it from your local Terminal. Click "Connect", copy the command that pops up under "Command-line access", and run it locally.

What Is Docker And Kubernetes?

Normally, I don't recommend YouTube videos. But the following video is a very comprehensive and great explanation of Docker and Kubernetes. It even answers why we want to use Docker and Kubernetes, and why the industry shifted away from VMs to containers.

The basic idea of Kubernetes consists of Ingresses, Services, Deployments and Pods. We can think of an Ingress as a load balancer between multiple services, and of a Service as a load balancer between the Pods of a specific Deployment. For this article, we will not be using an Ingress, since we have just a single container, but you should look into it, since you will likely need it for larger applications or a microservice architecture.

Overview of how Kubernetes distributes your traffic. Screenshot from the video above.

Making A Sample Project

We are deploying a machine learning model on the Auto MPG dataset, which is a toy dataset. Let's walk through the steps of training a Random Forest model.

Training A Random Forest Model

To train a model, we don't have to do much work with the chosen dataset. We start by importing the packages we need, as well as the classes from the core folder.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import joblib

from core.clean_data import CleanData
from core.predict_data import PredictData
from core.load_data import LoadData

Then we instantiate the classes, such that we can call all the functions from the classes later on.

loader = LoadData()
cleaner = CleanData()
predicter = PredictData()

The next step is loading the dataset into a dataframe, using a function we provided in the LoadData class. The function uses pandas to read the data table and applies the column names specified in the class. Remember to check out the GitHub repository to see the full code.

df = loader.load_dataset_as_df()
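
For reference, here is a minimal sketch of what load_dataset_as_df could look like. This is an assumption based on the raw UCI Auto MPG file, not the repository's actual code, and the path data/auto-mpg.data is purely illustrative.

```python
import pandas as pd

# Column names of the Auto MPG dataset
COLUMN_NAMES = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                "acceleration", "model_year", "origin", "car_name"]

def load_dataset_as_df(path="data/auto-mpg.data"):
    # Hypothetical sketch: the raw UCI file is whitespace-delimited
    # with no header row, so we supply the column names ourselves
    return pd.read_csv(path, sep=r"\s+", names=COLUMN_NAMES)
```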

Now that we have the dataset loaded, we want to think about which preprocessing steps we need to take to be able to build a model. I found that there are some question marks in the horsepower feature of the dataset, so we need to get rid of those rows.

The CleanData class has a function for this, which filters the dataframe to only the rows where horsepower does not contain a question mark – in other words, it removes the rows where horsepower has a question mark instead of a number. In the same fashion as the loader, we call the cleaner with the following line of code.

df = cleaner.clear_question_marks(df)
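
To make the cleaning step concrete, here is a hypothetical sketch of what clear_question_marks might do; the real implementation is in the repository's core/clean_data.py. This version assumes the horsepower column was read in as strings because of the question marks.

```python
import pandas as pd

def clear_question_marks(df):
    # Hypothetical sketch: drop rows where horsepower is "?", then cast
    # the remaining values back to floats so the model can use them
    df = df[df["horsepower"] != "?"].copy()
    df["horsepower"] = df["horsepower"].astype(float)
    return df
```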

The next and last step before training our model is specifying which feature we want to predict. Naturally, for this dataset at least, we want to predict the mpg feature, which is miles per gallon for a car.

What happens here is that we split the dataset into training and testing sets, where y is the feature we are trying to predict, and X contains the features we use to predict y.

y = df['mpg']
X = cleaner.drop_unused_columns(df)

X_train, X_test, y_train, y_test = train_test_split(
                                    X, y, 
                                    test_size=0.2, 
                                    random_state=42
                                   )

Now we are finally ready to fit a random forest model to the dataset, since it has been cleaned and prepared for the algorithm.

We start off by instantiating the Random Forest with default parameters, and then we tell scikit-learn to train a random forest model with the training data. After that is done, we can predict on the testing data, and we can also score how well the predictions went.

rf = RandomForestRegressor()
rf.fit(X_train, y_train)

pred = predicter.predict(X_test, rf)
score = predicter.score_r2(y_test)
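
Note that score_r2 only receives y_test, which suggests the predicter stores the predictions from the previous predict call. Below is a hypothetical sketch of PredictData along those lines; the real implementation is in the repository's core/predict_data.py.

```python
from sklearn.metrics import r2_score

class PredictData:
    # Hypothetical sketch of core/predict_data.py, not the actual code

    def predict(self, X, model):
        # Store the predictions so score_r2 can reuse them
        self.pred = model.predict(X)
        return self.pred

    def score_r2(self, y_true):
        # Compare the stored predictions against the true targets
        return r2_score(y_true, self.pred)
```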

The absolute last step is "dumping" the model, which means exporting it to a file that we can load back into Python at another point in time. For this, we use joblib.

joblib.dump(rf, "models/rf_model.pkl")
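
A quick way to verify the export worked is to round-trip the model: dump it, load it back and compare predictions. The standalone check below uses random data and a temporary file path, both purely illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fit a small model on random data, just for the round-trip check
X = np.random.rand(20, 3)
y = np.random.rand(20)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

path = os.path.join(tempfile.gettempdir(), "rf_model.pkl")
joblib.dump(rf, path)          # export the fitted model
restored = joblib.load(path)   # load it back at another point in time

# The restored model produces identical predictions
assert np.allclose(rf.predict(X), restored.predict(X))
```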

Making Models Accessible Through A Service

Now that we have a model, we need a way to expose it as a service. Flask is the most common way to do this, and it scales easily. The first step is to create the Flask app with app = Flask(__name__), then define a function with def, decorate it with @app.route, and specify a route / and the allowed HTTP methods with the methods=[] parameter.

Upon running the script, the following code imports the random forest model we exported earlier. Then, whenever someone sends a request to the specified route, we make predictions and return them.

from core.clean_data import CleanData
from core.predict_data import PredictData
from core.load_data import LoadData
from flask import Flask, jsonify, request

app = Flask(__name__)

loader = LoadData()
cleaner = CleanData()
model = loader.load_model_from_path('./models/rf_model.pkl')
predicter = PredictData(model)

@app.route("/", methods=['POST'])
def do_prediction():
    json = request.get_json()
    df = loader.json_to_df(json)
    df = cleaner.clear_question_marks(df)
    X = cleaner.drop_car_name(df)

    prediction = predicter.predict(X)
    return jsonify(prediction[0])

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

When we run this code, we have told Flask to run the method do_prediction() whenever someone queries the machine's IP on port 5000 (e.g. localhost:5000 locally). This does not expose it to the world yet, though, since we need a cloud provider for that. This is simply the entrypoint to our application, which we will package and ship off to a cloud provider.
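
Before packaging anything, it is worth sanity-checking the route locally. The sketch below uses Flask's built-in test client with the cleaning and model call stubbed out; the stand-in route and the hard-coded 16.383 are illustrative, not the repository's code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for main.py's route; the real app would clean the payload
# and call predicter.predict(X) instead of returning a constant
@app.route("/", methods=["POST"])
def do_prediction():
    payload = request.get_json()  # parsed JSON body (unused in this stub)
    return jsonify(16.383)

# The test client exercises the route without starting a server
with app.test_client() as client:
    resp = client.post("/", json={"cylinders": 8, "horsepower": 130.0})
    print(resp.status_code, resp.get_json())
```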

The Deployment With Docker And Kubernetes

This section combines Docker, Kubernetes and Machine Learning to expose your application to the world. The following information was not easy to learn, although it seems easy now.

Making A Dockerfile

The very first thing we always do is create a Dockerfile. In a Dockerfile, we take only the necessary files from our project and package them into an image. We want the image to be as small as possible, so we don't clutter it with tons of unnecessary files.

The syntax is not hard to learn. Let me give you a brief overview of the commands used in this Dockerfile.

Command   What it does
FROM      Specifies the base image, which is usually a Python image for Machine Learning. You can browse base images on Docker Hub.
WORKDIR   Changes (and creates) the working directory within the image to the specified path.
RUN       Runs a command in the Terminal inside the image. It could be anything, but don't clutter it up.
ADD       Adds the specified files from your directory to a directory in the image (creating the directory if it does not exist).
EXPOSE    Opens up a specific port number, like port 5000.
CMD       Takes the arguments for running the application.

Now, the following is the Dockerfile, which I created for this project. We run the installation of packages directly by specifying the names of the packages, instead of through a requirements.txt file, like one would usually do.

After having installed all the packages and added the necessary files, we tell Docker to run the command gunicorn --bind 0.0.0.0:5000 main:app, which is the syntax for using Gunicorn. We always want to run Gunicorn on top of Flask, because Gunicorn is suited as a production server, whereas Flask's built-in server is suited for development purposes. The application will work exactly the same in development and production.

Here, main is the filename without the extension (main.py), and app is what we called the Flask app in the file. So if you change either name, you also have to change it here.

FROM python:3.7

WORKDIR /app

RUN pip install pandas scikit-learn flask gunicorn

ADD ./core ./core
ADD ./models ./models
ADD main.py main.py

EXPOSE 5000

CMD [ "gunicorn", "--bind", "0.0.0.0:5000", "main:app" ]

The image does not create itself, though. The Dockerfile just specifies how the image should look – so we need a way to build the image before we can use it.

Building & Pushing An Image To Google Cloud

To be able to push the image to Google Cloud, we have to build it, tag it and push it. Luckily for you, I have mostly automated this process for Google Cloud, to the point where you just have to enter your details once and change the version number when you want to update the images.

First, you need to find your project id. You can find your project in the top left and you can find your ID by clicking on the arrow on the right.

All you have to enter is the address, project ID, repository and version, and then this script will build and push the image. Use that ID for your project ID.

#!/bin/bash
ADDRESS=gcr.io
PROJECT_ID=macro-authority-266522
REPOSITORY=auto
VERSION=0.17

# Build the image with the repository name, so the grep below finds its ID
docker build -t ${REPOSITORY}:${VERSION} .
ID="$(docker images | grep ${REPOSITORY} | head -n 1 | awk '{print $3}')"

docker tag ${ID} $ADDRESS/${PROJECT_ID}/${REPOSITORY}:${VERSION}

docker push $ADDRESS/${PROJECT_ID}/${REPOSITORY}:${VERSION}

You can run the script from Git Bash or a terminal with sh build_push_image.sh. You should run this script instead of manually typing these commands each time.

Alternatively, you can experiment with using docker build, docker tag and docker push yourself.

Deploying With Kubernetes

The image is now built and pushed to the cloud. To be able to run it, we need to create a deployment file and a service file, which are defined in YAML syntax.

I'm going to show you how to deploy the product now. The general procedure consists of the following steps.

  1. Create/Update the actual containers.
  2. Update the details in the bash script and run it. If you want to update the container, you just have to update the version in the script.
  3. Update the details of the image in the deployment YAML file. If you want to update the container, you just have to update the version in the image.
  4. Run the kubectl apply -f <filename>.yml command on a deployment or service.

Creating A Deployment

Below, you can see the deployment.yml file that I used – it works by naming your project (rename mpg) and specifying the URL for the image. When we apply this deployment, we have specified for Kubernetes to create 3 replicas, which are called Pods – a Pod will run one instance of the container.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mpg
  labels:
    app: mpg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mpg
  template:
    metadata:
      labels:
        app: mpg
    spec:
      containers:
      - name: auto
        image: gcr.io/macro-authority-266522/auto:0.17
        imagePullPolicy: Always
        ports:
        - containerPort: 5000

The containerPort is set to 5000, because that is the port we set in our Flask application. The image is set to gcr.io/<project_id>/<repository>:<version>, which uses the same variable names as in the bash file from earlier.

When you have such a deployment file, you very simply run the below line of code.

kubectl apply -f deployment.yml

It should say deployment.apps/mpg created when you create it for the first time, and deployment.apps/mpg configured when you update the image.

Remember we can always go back and reapply this deployment file, if we have made changes to the container. It really is as simple as running the apply command once again, after having changed the version number in your bash script to push to the cloud and in the deployment file.

Though, we are not quite done yet. The application you made is running in the cloud, but it's not exposed yet. Let me introduce services.

Creating A Service

A service is also defined in a YAML file, just like the deployment, and you use the same command to create and update it. This is why some DevOps engineers are jokingly called YAML engineers: most of the configuration for deployments is done using YAML files just like the ones presented here.

Below is the service.yml file used for this tutorial. You want to specify the type to be LoadBalancer, since that will distribute traffic equally amongst the available pods from your deployment. This enables immense scaling capacity, since you can just create more nodes in your cluster and create more pods.

We called our deployment mpg, so we set the app in the selector to the same app name, such that this service is linked to our deployment. The port is set to 80, the default HTTP port, so the service is reachable at just the external IP address, while the targetPort is set to 5000, since that is what we specified in the deployment file and the Flask application.

apiVersion: v1
kind: Service
metadata:
  name: mlfromscratch
spec:
  type: LoadBalancer
  selector:
    app: mpg
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000

Just as simply as before, we apply the service the same way as the deployment.

kubectl apply -f service.yml

And we get back a response service/mlfromscratch created.

Ok, now we know that everything is created. Let me give you a rundown of how you can access your deployment.

How Do I Access My Application?

First things first, let's take a look at what we just created, shall we? The very first command to run is the following (note that service can be shortened to svc).

kubectl get service

We get back a response. Your EXTERNAL-IP might still say <pending>, but it will come through rather quickly with the actual public IP. Mine took 2 minutes to come online and another 2 minutes to give me the response I was expecting, instead of endlessly loading. Note that we don't care about the service named kubernetes, but about the name specified in your service file, which in this case was mlfromscratch.

NAME            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP      10.12.0.1      <none>        443/TCP        20m
mlfromscratch   LoadBalancer   10.12.15.238   35.225.0.74   80:30408/TCP   5m14s

All we have to do to access the application is visit the EXTERNAL-IP from the service, and that's it! We just made a Machine Learning model ready to serve predictions at an endpoint. Ideally, we would query this IP address from another service/product, such that we could continually use these predictions. You could even make your own SaaS by turning this into a microservice and deploying it using this Flask template I integrated with Stripe.

Diagnosing And Checking In On Our Application

Sometimes you want to check what the console prints, or just the status of the application, to see if it's running and whether there are any errors.

Just like we used the kubectl get service command to get information about our services earlier, we can use it to get information on individual pieces of our Kubernetes resources, or on all of them at once with the command below. Optionally, we can add -o wide at the end for even more information.

kubectl get svc,deployment,pods -o wide

The above command will give us a look at the pods. They should say Running under STATUS; otherwise, your application is not working as expected, and you need to go back and make sure that it runs locally.

After getting a specific pod name, we can see the logs for that specific pod and check what happened. Note that --tail takes a number of lines; use --since=1h instead if you want the last hour of logs rather than the last 20 lines.

kubectl logs -f pod/mpg-768578c99c-lt6j2 --tail=20

Similar to the get command, there is a whole bunch of other commands, which you can list by typing kubectl in your Terminal. Most interestingly, you can delete your deployments and services, e.g. kubectl delete -f service.yml, if you just want to start over.

Basic Commands (Beginner):
  create         Create a resource from a file or from stdin.
  expose         Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
  run            Run a particular image on the cluster
  set            Set specific features on objects

Basic Commands (Intermediate):
  explain        Documentation of resources
  get            Display one or many resources
  edit           Edit a resource on the server
  delete         Delete resources by filenames, stdin, resources and names, or by resources and label selector

Deploy Commands:
  rollout        Manage the rollout of a resource
  scale          Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
  autoscale      Auto-scale a Deployment, ReplicaSet, or ReplicationController

Cluster Management Commands:
  certificate    Modify certificate resources.
  cluster-info   Display cluster info
  top            Display Resource (CPU/Memory/Storage) usage.
  cordon         Mark node as unschedulable
  uncordon       Mark node as schedulable
  drain          Drain node in preparation for maintenance
  taint          Update the taints on one or more nodes

Troubleshooting and Debugging Commands:
  describe       Show details of a specific resource or group of resources
  logs           Print the logs for a container in a pod
  attach         Attach to a running container
  exec           Execute a command in a container
  port-forward   Forward one or more local ports to a pod
  proxy          Run a proxy to the Kubernetes API server
  cp             Copy files and directories to and from containers.
  auth           Inspect authorization

Advanced Commands:
  diff           Diff live version against would-be applied version
  apply          Apply a configuration to a resource by filename or stdin
  patch          Update field(s) of a resource using strategic merge patch
  replace        Replace a resource by filename or stdin
  wait           Experimental: Wait for a specific condition on one or many resources.
  convert        Convert config files between different API versions
  kustomize      Build a kustomization target from a directory or a remote url.

Settings Commands:
  label          Update the labels on a resource
  annotate       Update the annotations on a resource
  completion     Output shell completion code for the specified shell (bash or zsh)

Other Commands:
  api-resources  Print the supported API resources on the server
  api-versions   Print the supported API versions on the server, in the form of "group/version"
  config         Modify kubeconfig files
  plugin         Provides utilities for interacting with plugins.
  version        Print the client and server version information

Making A Request To Our Application

We have the EXTERNAL-IP from earlier, which we are going to reuse here. The following JSON body can now be sent with an HTTP POST request, and we will receive the expected response.

{
    "cylinders": 8,
    "displacement": 307.0,
    "horsepower": 130.0,
    "weight": 3504,
    "acceleration": 12.0,
    "model_year": 70,
    "origin": 1,
    "car_name": "chevrolet chevelle malibu"
}

In just 0.29 seconds, we received the prediction for this car: our machine learning model predicted 16.383.

Postman request with the provided JSON body.

Or from a Python script.

import json
import requests
data = {
    "cylinders": 8,
    "displacement": 307.0,
    "horsepower": 130.0,
    "weight": 3504,
    "acceleration": 12.0,
    "model_year": 70,
    "origin": 1,
    "car_name": "chevrolet chevelle malibu"
}
r = requests.post('http://35.225.0.74', json=data)

Printing r.text gives us the prediction 16.383, and printing r.status_code gives us an HTTP status code of 200.