How to deploy a Machine Learning microservice to Google Cloud Run

Amplemarket's mission is to help companies grow all around the world. At the core of this mission is making sales teams' lives easier, and Machine Learning is a big part of that.

Our Machine Learning team is still small, but we're able to achieve a great amount of scale. Our secret sauce? Keeping things simple until we can't anymore. One of the big reasons we can maintain this dynamic is that we leverage the Public Cloud.

One Cloud, too many options.

Most Cloud providers offer multiple options for ML teams to deploy their software. Those options fall into three big buckets: FaaS (Functions as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service). To simplify things, on Google Cloud this means: Cloud Functions (FaaS), Cloud Run (PaaS), and Compute Engine (IaaS). On a different Cloud? There's probably an equivalent service.

These three services differ in complexity, pricing structure, cost, and "elasticity". At Amplemarket, we usually start with the simplest - that's Cloud Functions (FaaS). Unfortunately, Machine Learning services usually require some sort of storage (e.g., for your model), or access to a large amount of memory (e.g., for loading a transformer model).

This is when Cloud Run comes into play. Cloud Run allows us to deploy our Machine Learning models as a Docker Container. This has several benefits:

  • Cloud independence: Containers can run in any Public Cloud, and even in a private Data Center.
  • Elasticity: If we get many requests, Google Cloud can spin up more containers to meet demand - without us having to touch the service.
  • Cost-effectiveness: We only pay for the time we are using the service (e.g., if you deploy to a Virtual Machine, you pay for the entire life cycle of the instance).

But how exactly do we deploy our models to Cloud Run?

From model to microservice.

In a previous post we talked about how we serve our Machine Learning models with FastAPI. To ensure we can run our model in a Docker Container, we need to package it.

Through several iterations, we've found that the following structure tends to work well:

├── README.md
├── Makefile
├── Dockerfile
├── dev-requirements.txt
├── requirements.txt
├── notebooks
├── .github
│   └── workflows
│       └── main.yml
├── src
│   ├── __init__.py
│   ├── api
│   │   ├── __init__.py
│   │   └── main.py
│   ├── project_name
│   │   ├── __init__.py
│   │   └── pipeline.py
│   └── setup.py
└── tests
    ├── results
    └── test_sample.py

This structure is by no means fixed, but it has several advantages:

  • Your code is modular (e.g., it does not live in a single gigantic file).
  • You can install your codebase as a Python package (e.g., pip install -e src).
  • All of your tests live in a separate folder.
  • The src folder can include pretty much anything (e.g., pipeline, API, web app, etc.).

To keep things even simpler, we tend to use a Makefile with the most common operations you can run in the microservice. With it, the next developer doesn't need to know all the commands: instead of running python -m streamlit run src.., they can just run make app - and the Streamlit app opens directly in their browser.

Yes - we know Makefiles are old. Guess what? So is git - and we use it. Here's an example of a Makefile:

# installs dependencies
install:  
    @echo Make sure you are running in a virtual env!!
    python -m pip install --upgrade pip
    python -m pip install -r requirements.txt
    python -m pip install -e src

# runs unit tests
test:  
    pytest tests

# formats all code
format:  
    black -l 79 src/ tests/ 

# runs the API
api:  
    python -m uvicorn src.api.main:app 

# runs the streamlit app
app:  
    python -m streamlit run src/streamlit/app.py

Dockerizing the microservice

In the example above, we are packaging our application as a FastAPI microservice. This will expose the API to other teams at Amplemarket. But in order to deploy to Cloud Run, we need to package our app into a Docker Container.

Here's the contents of the Dockerfile:

FROM python:3.9

COPY requirements.txt /tmp/  
RUN pip install --upgrade pip  
RUN pip install torch --extra-index-url https://download.pytorch.org/whl/cpu  
RUN pip install -r /tmp/requirements.txt

RUN mkdir -p /src  
COPY src/ /src/  
RUN pip install -e /src

WORKDIR /

EXPOSE 80

CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "80"]  

Let me walk you through it: our base image is python:3.9. If we're not using anything too heavy (e.g., a full GCC toolchain), this is still a great default image for Python applications. Notice the pip install -e /src that installs our Python package inside the container itself. The last instruction specifies the uvicorn CMD that launches our FastAPI app.

Bonus: If you're using PyTorch and don't need GPU access, the instruction above installs a much smaller version. It's the difference between a 5GB and a 2GB container.

To run the microservice locally, we start by building it:

$ docker build -t ml-api . 

Now, we can run our API on localhost:8000:

$ docker run -p 8000:80 ml-api

When you visit localhost:8000, your API will be up and running!
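You can also check it from a short script. This is just a sketch: it assumes the container from docker run above is still listening on port 8000, and relies on the fact that FastAPI serves its OpenAPI schema at /openapi.json out of the box:

```python
# smoke_test.py -- ping the locally running container.
# Assumes the API is up on localhost:8000 (see `docker run` above).
import json
import urllib.request


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode())


if __name__ == "__main__":
    # FastAPI exposes the OpenAPI schema automatically
    schema = fetch_json("http://localhost:8000/openapi.json")
    print(sorted(schema["paths"]))
```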

Deploying to Google Cloud Run

In order to deploy our container to Cloud Run, we first need to make sure we have the Google Cloud CLI installed. To do so, follow Google's official installation instructions.

Google Cloud Run launches services based on Containers. If we want to launch a new service, we first need to build and push a container to Google Cloud:

$ docker build -t gcr.io/my-project/ml-api:latest -f Dockerfile . 
$ docker push gcr.io/my-project/ml-api:latest

Now that our container has been pushed, we need to tell Cloud Run to launch a service based on it:

$ gcloud run deploy ml-api --image gcr.io/my-project/ml-api:latest \
                           --region your-region-name \
                           --port 80 \
                           --memory 4Gi

Let's break this command down:

  • gcloud run is the Google CLI command to manage Cloud Run services.
  • region specifies the region where you want to deploy the service (e.g., us-central1 - see a list here).
  • port is the port that is being served by our container. Remember we specified 80 in the Dockerfile?
  • memory is the amount of RAM you want to give to your container.

There are many more options you can specify if you want - for example, --min-instances to keep warm instances around, or --concurrency to control how many requests each container handles at once.

In the end, this command will also output a link where your API is now running.

Continuously delivering our service.

Running the above commands every time we want to deploy or update our API is not an option. Any reliable service must have some sort of continuous delivery system.

Using GitHub Actions, we can include a file in our repo that automatically deploys the service every time we update the code.

The first step in this process is to make sure GitHub actions can authenticate to update services on your Google Cloud account. To do so, you can follow these instructions. Ultimately, you are looking to create a repository secret with the name CLOUD_RUN_SECRET.

Once that is done, we need to create a main.yml in the .github/workflows directory of our repository, just as specified in our initial repository structure.

Here are the contents of main.yml:

name: Deploy # the name of our workflow

# some details that we will re-use throughout the file
env:  
  PROJECT_ID: my-project
  REGION: us-central1 

  API_SERVICE: ml-api
  API_CONTAINER_PORT: 80
  API_DOCKERFILE: Dockerfile

# when this workflow should be run
on:  
  push:
    branches:
      - main
  workflow_dispatch:

# what jobs need to run - maybe we could also test?
jobs:  
  deploy_api:
    runs-on: ubuntu-latest
    steps:
        # downloads the repository code
      - name: Checkout
        uses: actions/checkout@v2

        # authenticates with Google Cloud
      - name: Authenticate
        uses: google-github-actions/auth@v0
        with:
          credentials_json: '${{ secrets.CLOUD_RUN_SECRET }}'

        # authorizes docker to push images to our GCP account
      - name: Authorize Docker push
        run: gcloud auth configure-docker

        # builds and pushes container to GCR
      - name: Build and Push Container
        run: |-
          docker build -t gcr.io/${{ env.PROJECT_ID }}/${{ env.API_SERVICE }}:${{ github.sha }} -f ${{ env.API_DOCKERFILE }} .
          docker push gcr.io/${{ env.PROJECT_ID }}/${{ env.API_SERVICE }}:${{ github.sha }}

        # updates Cloud Run service
      - name: Deploy to Cloud Run 
        run: |-
          gcloud run deploy ${{ env.API_SERVICE }} --image gcr.io/${{ env.PROJECT_ID }}/${{ env.API_SERVICE }}:${{ github.sha }} --region ${{ env.REGION }} --port ${{ env.API_CONTAINER_PORT }} --memory 4Gi

With the above file, every time you push to your repository, GitHub Actions will automatically build a new version of the container, push it, and update the Cloud Run service. This ensures that whatever you are running in the Cloud is an exact replica of your repo.

Closing thoughts

If you've reached this far, congrats! You've just learned how to deploy a Machine Learning Microservice to Google Cloud Run. Google Cloud will now spin up as many instances of your container as required, depending on traffic. This also allows your app to scale down when it's not being used, making it a great option for serving ML models.

But it's not all rainbows and unicorns. The service can sometimes take a while to wake up from a cold start, though you can configure it to keep instances "always on" (e.g., with --min-instances) - which obviously carries an increased cost. I recommend reviewing your Cloud Run settings to ensure everything is set up according to your needs.


Amplemarket is growing! If you are interested in solving some difficult problems, we have a bunch of open positions!

Duarte O.Carmo

ML Engineer @ Amplemarket. Passionate about the intersection of technology and people.

Copenhagen, Denmark http://duarteocarmo.com
