MLOps Recipes: Deploying ML Models in Production

An end-to-end guide to building MLOps pipelines in Python using Gitlab, Terraform, Serverless and AWS.

About the Book

I still remember my excitement in the early days of my studies when I learned about linear regression and started building my very first statistical models. For a data science practitioner, life was good in those days: we hardly had to worry about Docker, the cloud, DevOps, MLOps, distributed systems, infrastructure as code and all those other scary things that have caused headaches and confusion for many of us in the community. When we attempted to build our first “production” models around 2013 in a startup where I was an intern, there were very few best practices, and few people with enough experience in this particular field to show you “how to do things right”. Our work was constant trial and error, and we learned from our own mistakes the hard way.

Since then, the widespread adoption of the cloud has brought data science and machine learning to completely new levels. On the one hand, it has made our lives much easier in certain respects: deploying models at scale, at an incredibly low cost and with just a few lines of code, has never been easier. On the other hand, it also means ever-increasing demands on the classical data scientist skillset. Nowadays, data scientists are not only required to understand the best way of building machine learning models. They should also know at least the basics of Docker, CICD, testing frameworks, efficient coding practices, cloud deployment, and many other very technical topics that historically have never been part of our domain. Without that knowledge we are simply not equipped for the modern way of working. For some of us, getting to grips with this new work paradigm comes easily, but for many, making sense of all those puzzle pieces is more of a struggle.

This book aims to bridge that gap. It is the hands-on, real-life guide that I wish I had had a few years back, written by a data scientist for fellow data scientists. After reading it and following along with the examples, you will have a complete, end-to-end understanding of building a modern, well-structured, and scalable machine learning pipeline. I will demonstrate deploying an example Python model developed in sklearn to AWS, along with all the supporting tools and frameworks (Gitlab, Terraform, Serverless and more). The examples used in this book come from my own experience and reflect the challenges that data scientists will sooner or later encounter in their day-to-day work. I share with you the best practices distilled from many trials and errors. I am certain that after completing this book these things will finally “click”, and your confidence at work and big-picture perspective will be better than ever :)

What you will learn

This book and the accompanying code repository offer a complete, end-to-end perspective on deploying a machine learning solution to AWS using the following tools:

  • AWS tools useful for deploying machine learning models, such as ECR, Lambda, Batch, Step Functions and others
  • Terraform for deploying your AWS resources, such as networking, storage, compute environments and other infrastructure
  • Serverless for deploying your machine learning pipelines, jobs and supporting infrastructure
  • Gitlab CICD for managing your project's continuous integration and delivery pipelines

On top of that, you will learn best practices of efficient machine learning code packaging with tools and concepts such as:

  • pyenv for Python version management
  • poetry for dependency management
  • sklearn for building ML models in a structured way
  • Docker for code execution environment isolation
  • click for building command-line interfaces
  • pytest for writing code tests to secure your deployments
  • tox for executing your code quality and testing logic
  • ...and others
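
To give a flavour of how two of these tools fit together, here is a minimal sketch of a click command line that is exercised by a pytest-style test. It is not taken from the book's repository: the names `cli`, `train`, `--alpha` and `--output` are hypothetical, and the command only echoes what it would do rather than training anything.

```python
import click
from click.testing import CliRunner


@click.group()
def cli() -> None:
    """Entry point for a hypothetical ML package."""


@cli.command()
@click.option("--alpha", default=0.1, type=float, show_default=True,
              help="Regularisation strength (illustrative parameter).")
@click.option("--output", default="model.joblib", show_default=True,
              help="Where the trained model would be stored.")
def train(alpha: float, output: str) -> None:
    """Placeholder training step: only reports its arguments."""
    click.echo(f"training with alpha={alpha}, saving to {output}")


def test_train_command() -> None:
    """pytest-style test: invoke the CLI in-process and check its output."""
    runner = CliRunner()
    result = runner.invoke(cli, ["train", "--alpha", "0.5"])
    assert result.exit_code == 0
    assert "alpha=0.5" in result.output
```

Running pytest against a file containing this code would pick up `test_train_command` automatically. In a real package the `cli` group would typically be exposed as a console-script entry point, so that the same command runs locally, inside Docker and in a CI job.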

How you will learn

After buying this book you will be granted access to a private Gitlab repository which you will be able to clone. There you will find the end-to-end code with examples, which you can execute yourself to deploy the pipeline in your own Gitlab and AWS accounts. Since I'm a big believer that well-written and well-documented code is the best form of documentation, the book will merely guide you in the learning process and offer best practices and other perspectives. You should consider the code itself the main source of knowledge.

Please also note that, given the breadth of topics and concepts covered in the book, none of them is covered in the depth that a specialized book on a single topic would offer. The main goal of the book is to demonstrate the big, end-to-end picture and give the reader "just enough" knowledge to run the code with sufficient understanding. However, there will be references to other resources, both in the book and in the code, so that readers can deepen their knowledge of particular concepts.

Target audience

  • individuals considering a career in machine learning will learn about the more technical flavours of the job
  • beginner data scientists will be able to see the full picture, gain practical experience and learn how an end-to-end ML project works
  • experienced data scientists will be able to improve their skills and knowledge in areas that are new to them
  • data and devops engineers will be able to discover the data science side and perspective of deploying solutions to the cloud

What this book is not

  1. This book doesn't cover sophisticated ML algorithms. Since the book is focused on the big picture, we will train a relatively simple ElasticNet model in our pipeline. The goal is to demonstrate how to approach this task end-to-end with a relatively simple model, so that you can later adjust it and apply it to your own use case.
  2. This book doesn't discuss various ML algorithms. Building on the previous point, an ML pipeline can often be applied to many different problems and algorithms. Therefore I won't cover other algorithms in this book.
  3. This book won't teach you about data science or ML. This book doesn't explain the inner workings of an ElasticNet model, of any other model, or of data science as a whole. There are plenty of other great books on the market that you can read for that purpose.
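
For orientation, a "relatively simple ElasticNet model" in sklearn can be as small as the sketch below. This is an illustration only, not the book's actual code: the function names, the scaling step, the hyperparameter values and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(alpha: float = 0.1, l1_ratio: float = 0.5) -> Pipeline:
    """Scale the features, then fit an ElasticNet regressor."""
    return Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", ElasticNet(alpha=alpha, l1_ratio=l1_ratio)),
        ]
    )


def make_toy_data(n_samples: int = 200, seed: int = 0):
    """Synthetic regression data (hypothetical, for illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    # The target depends on the first two features; the third is pure noise.
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    pipeline = build_pipeline()
    pipeline.fit(X[:150], y[:150])  # train on the first 150 rows
    r2 = pipeline.score(X[150:], y[150:])  # evaluate on the remaining 50
    print(f"R^2 on held-out rows: {r2:.3f}")
```

Because preprocessing and model live in one `Pipeline` object, the estimator in the last step can be swapped without touching the surrounding code, which is the sense in which a pipeline carries over to other algorithms.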
About the Author

Konrad Semsch

Konrad is a predictive modelling practitioner passionate about ML, MLOps and deploying simple solutions to production that just work! He has worked for several years in data science and machine learning, deploying a wide variety of solutions at scale, both at small startups and at large enterprises. Born and raised in Poland, Konrad currently lives with his wife in Essen, Germany. In his free time (apart from writing this book...) he enjoys bouldering, volleyball and all kinds of watersports.

Table of Contents

  • I What is MLOps
  • II Getting started with your own MLOps project
  • III Structuring an ML package
  • IV Test your model locally in Docker
  • VI Building your CICD pipeline in Gitlab
  • VIII Deploy your AWS resources with Terraform
  • IX Run your project end-to-end