Email the Author

You can use this page to email Enrique Garcia-Ceja about MOST COMMON MISTAKES IN MACHINE LEARNING AND HOW TO AVOID THEM: With Examples in Python.

About the Book

This book is a compilation of the most common mistakes made when building machine learning models. I gathered this list from mistakes I typically find when grading assignments, supervising graduate students, reading blog posts, and reviewing the code that accompanies published papers, and, of course, from my own experience making those mistakes.

This book includes examples in Python. Some examples of mistakes that you will find in this book include:

- Not understanding the data

- Including irrelevant variables

- Data injection

- Assuming all users behave the same

- Wasting unlabeled data

- and much more!

Table of Contents

Introduction

Terminology

1 Not understanding the data

2 Reporting train performance

3 Not setting a seed value

4 Including irrelevant features

5 Ignoring differences in scales

6 Using the test set for fine-tuning

7 Only reporting accuracy

8 Not comparing against a baseline

9 Not accounting for variance

10 Injecting data into the test set

11 Not shuffling the training data

12 Not saving the results

13 Not parallelizing

14 Encoding categories as integers

15 Forgetting that data changes over time

16 Ignoring inter-user variance

17 Wasting unlabeled data

Appendix Setup Your Environment

About the Author

Enrique Garcia-Ceja

Enrique is a professor at Tecnologico de Monterrey University. Previously, he worked as a data scientist at Optimeering, Norway, and as a Research Scientist at SINTEF, Norway. He did a postdoc at the University of Oslo and received his PhD degree in intelligent systems from Tecnologico de Monterrey University, Mexico. He also worked as a software engineer at Huawei. For the last 12 years, he has been working on behavior monitoring and analysis with machine learning and wearable devices.