Email the Author
You can use this page to email Enrique Garcia-Ceja about MOST COMMON MISTAKES IN MACHINE LEARNING AND HOW TO AVOID THEM.
About the Book
This book is a compilation of the most common mistakes made when building machine learning models. I have gathered this list from mistakes I typically find when grading assignments, supervising graduate students, reading blog posts, and examining the code that accompanies published papers, and of course, from my own experience making those mistakes.
All examples in this book are written in Python. Some of the mistakes covered in this book are:
- Not understanding the data
- Including irrelevant variables
- Injecting data into the test set
- Assuming all users behave the same
- Wasting unlabeled data
- and much more!
Table of Contents
Introduction
Terminology
1 Not understanding the data
2 Reporting train performance
3 Not setting a seed value
4 Including irrelevant features
5 Ignoring differences in scales
6 Using the test set for fine-tuning
7 Only reporting accuracy
8 Not comparing against a baseline
9 Not accounting for variance
10 Injecting data into the test set
11 Not shuffling the training data
12 Not saving the results
13 Not parallelizing
14 Encoding categories as integers
15 Forgetting that data changes over time
16 Ignoring inter-user variance
17 Wasting unlabeled data
Appendix: Set Up Your Environment
About the Author
Enrique is a professor at Tecnologico de Monterrey University. Previously, he worked as a data scientist at Optimeering, Norway, and as a Research Scientist at SINTEF, Norway. He did a postdoc at the University of Oslo and received his PhD in intelligent systems from Tecnologico de Monterrey University, Mexico. He also worked as a software engineer at Huawei. For the last 12 years, he has been working on behavior monitoring and analysis with machine learning and wearable devices.