Email the Author

You can use this page to email Enrique Garcia-Ceja about Most Common Mistakes in Machine Learning and How to Avoid Them.


About the Book

This book is a compilation of the most common mistakes made when building machine learning models. I gathered this list from mistakes I typically find when grading assignments, supervising graduate students, reading blog posts, and reviewing the code that accompanies published papers, and, of course, from my own experience making those mistakes.

The book includes examples in Python. Some of the mistakes it covers:

- Not understanding the data

- Including irrelevant variables

- Data injection

- Assuming all users behave the same

- Wasting unlabeled data

- and much more!

Table of Contents

Introduction

Terminology

1 Not understanding the data

2 Reporting train performance

3 Not setting a seed value

4 Including irrelevant features

5 Ignoring differences in scales

6 Using the test set for fine-tuning

7 Only reporting accuracy

8 Not comparing against a baseline

9 Not accounting for variance

10 Injecting data into the test set

11 Not shuffling the training data

12 Not saving the results 

13 Not parallelizing 

14 Encoding categories as integers

15 Forgetting that data changes over time

16 Ignoring inter-user variance

17 Wasting unlabeled data

Appendix Setup Your Environment


About the Author

Enrique Garcia-Ceja

Enrique is a professor at Tecnologico de Monterrey University. Previously, he worked as a data scientist at Optimeering, Norway, and as a Research Scientist at SINTEF, Norway. He did a postdoc at the University of Oslo and received his PhD in intelligent systems from Tecnologico de Monterrey University, Mexico. He also worked as a software engineer at Huawei. For the last 12 years, he has been working on behavior monitoring and analysis with machine learning and wearable devices.
