Email the Author
You can use this page to email Konrad Semsch about MLOps Recipes: Deploying ML Models in Production.
About the Book
I still remember my excitement in the early days of my studies, when I learned about linear regression and started building my very first statistical models. For a data science practitioner life was good in those days: we had so little to worry about when it came to Docker, cloud, DevOps, MLOps, distributed systems, infrastructure as code and all those other scary things that caused headaches and confusion for many of us in the community. When we attempted to build our first “production” models around 2013, in a startup where I was an intern, there were very few best practices, and very few people with enough experience in this particular field to show you “how to do things right”. Our work was constant trial and error, and we learned from our own mistakes the hard way.
Since then, the widespread adoption of the cloud has brought data science and machine learning to completely new levels. On the one hand, it has made our lives much easier in certain respects: deploying models at scale, at an incredibly low cost and with just a few lines of code, has never been easier. On the other hand, it also means ever-increasing demands on and expectations of the classical data scientist skillset. Nowadays, data scientists are not only required to understand the best way of building machine learning models. They should also know at least the basics of things such as Docker, CI/CD, testing frameworks, efficient coding practices, cloud deployment, and many other very technical topics that historically were never part of our domain. Without that knowledge we are simply not equipped for the modern way of working. For some of us getting to grips with this new work paradigm comes easily, but for many, making sense of all those puzzle pieces is more problematic.
This book aims to bridge that gap and to be the hands-on, real-life guide that I wish I had had a few years back. It is written by a data scientist for fellow data scientists. After reading it and following along with the examples, you will have a complete, end-to-end understanding of building a modern, well-structured, and scalable machine learning pipeline. I will demonstrate deploying an example Python model developed in sklearn to AWS, along with all the technical novelties and frameworks involved (GitLab, Terraform, Serverless and more). The examples used in this book come from my own experience and reflect the challenges that data scientists will sooner or later encounter in their day-to-day work. I share with you the best practices arrived at through much trial and error. I am certain that after completing this book these things will finally “click”, and your confidence at work and big-picture perspective will be better than ever :)
What you will learn
This book and the accompanying code repository offer a complete, end-to-end perspective on deploying a machine learning solution to AWS using the following tools:
- AWS tools useful for deploying machine learning models, such as ECR, Lambda, Batch, Step Functions and others
- Terraform for deploying your AWS resources, such as networking, storage, compute environments and other infrastructure
- Serverless for deploying your machine learning pipelines, jobs and supporting infrastructure
- GitLab CI/CD for managing your project's continuous integration and delivery pipelines
On top of that, you will learn best practices for efficient machine learning code packaging, with tools and concepts such as:
- pyenv for Python version management
- poetry for dependency management
- sklearn for building ML models in a structured way
- Docker for code execution environment isolation
- click for building command-line interfaces
- pytest for writing code tests to secure your deployments
- tox for executing your code quality and testing logic
- ...and others
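To give a rough feel for how a few of these tools fit together, here is a minimal sketch of an sklearn pipeline wrapped in a click command. It is not taken from the book's repository: the function names (build_pipeline, make_toy_data, train), the option names and the synthetic data are all assumptions made for illustration only.

```python
import click
import numpy as np
from click.testing import CliRunner
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(alpha: float = 0.1, l1_ratio: float = 0.5) -> Pipeline:
    """Scale the features, then fit an ElasticNet regressor (hypothetical helper)."""
    return Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", ElasticNet(alpha=alpha, l1_ratio=l1_ratio)),
        ]
    )


def make_toy_data(n_samples: int = 200, seed: int = 0):
    """Synthetic regression data; a stand-in for a real training set."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n_samples)
    return X, y


@click.command()
@click.option("--alpha", default=0.1, show_default=True, help="Regularisation strength.")
@click.option("--l1-ratio", default=0.5, show_default=True, help="Mix of L1 and L2 penalty.")
def train(alpha: float, l1_ratio: float) -> None:
    """Train the toy model and print its R^2 on the training data."""
    X, y = make_toy_data()
    pipeline = build_pipeline(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)
    click.echo(f"r2={pipeline.score(X, y):.3f}")


# Invoke the command in-process, the way a pytest test might,
# instead of calling it from a shell.
result = CliRunner().invoke(train, ["--alpha", "0.05"])
print(result.output)
```

Keeping the model construction in a plain function and the command-line handling in a thin click wrapper is one common way to make such code easy to test; a real project would load its own data and persist the fitted model instead of printing a score.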
How you will learn
After buying this book you will be granted access to a private GitLab repository which you will be able to clone. There you will find the end-to-end code with examples, which you can execute yourself to deploy the pipeline in your own GitLab and AWS accounts. Since I'm a big believer that well-written and well-documented code is itself the best form of documentation, the book will merely guide you through the learning process and offer best practices and other perspectives. You should consider the code itself the main source of knowledge.
Please also note that, given the breadth of topics and concepts covered in the book, none of them is covered in the depth that more specialized books would offer. The main goal of the book is to demonstrate the big, end-to-end picture and give the reader "just enough" knowledge to run the code with sufficient understanding. However, there will be references to other resources, both in the book and in the code, so that readers can deepen their knowledge of particular concepts.
Target audience
- individuals considering a career in a machine learning field will learn about the more technical flavours of the job
- beginner data scientists will be able to see the full picture, gain practical experience and work through an end-to-end ML project
- experienced data scientists will be able to improve their skills and knowledge in areas that are new to them
- data and devops engineers will be able to discover the data science side and perspective of deploying solutions to the cloud
What this book is not
- This book doesn't cover sophisticated ML algorithms. Since the book is focused on the big picture, we will train a relatively simple ElasticNet model in our pipeline. The goal is to demonstrate how to approach this task end-to-end with a relatively simple model, so that you can later adjust it and apply it to your own use case.
- This book doesn't discuss various ML algorithms. Building on the previous point, an ML pipeline can often be applied universally to various problems and with various algorithms. Therefore I won't cover other algorithms in this book.
- This book won't teach you about data science or ML. It doesn't explain the inner workings of an ElasticNet model, of any other model, or of data science as a whole. There are plenty of other great books on the market that you can read for that purpose.
About the Author
Konrad is a predictive modelling practitioner passionate about ML, MLOps and deploying simple solutions to production that just work! He has worked for several years in data science and machine learning, deploying a wide variety of solutions at scale, both at small startups and at large enterprises. Born and raised in Poland, Konrad currently lives with his wife in Essen, Germany. In his free time (apart from writing this book...) he enjoys bouldering, volleyball and all kinds of watersports.