The Art of Data Science (The Book + Lecture Videos)
The Art of Data Science
A Guide for Anyone Who Works with Data
About the Book
Data analysis is a difficult process largely because few people can describe exactly how to do it. It's not that there aren't any people doing data analysis on a regular basis. It's that the process by which we state a question, explore data, conduct formal modeling, interpret results, and communicate findings is difficult to generalize and abstract. Fundamentally, data analysis is an art. It is not yet something that we can easily automate. Data analysts have many tools at their disposal, from linear regression to classification trees to random forests, and these tools have all been carefully implemented on computers. But ultimately, it takes a data analyst, a person, to find a way to assemble all of the tools and apply them to data to answer a question of interest to people.
This book writes down the process of data analysis with a minimum of technical detail. What we describe is not a specific "formula" for data analysis but rather a general process that can be applied in a variety of situations. Through our extensive experience both managing data analysts and conducting our own data analyses, we have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of our experience in a format that is applicable to both practitioners and managers in data science.
If you are interested in obtaining a printed copy of this book, you can purchase one at Lulu.
The package containing the lecture videos offers short commentaries on each of the chapters and contains additional explanatory material for each of the topics. In addition, the lectures include some material that is not in the book.
The Book + Lecture Videos
This package includes the book and lecture video files. The videos and chapters are aligned so that together they make an ideal self-learning curriculum in which students interested in data science can pair video lectures with reading material. The videos complement the reading material by extending concepts covered in the book and by providing visual and auditory presentation of the concepts. This self-guided curriculum can be covered at any pace and the completion of material should provide students with a solid foundation for thinking about the data science process. The complete package should be of interest to students interested in doing their own data analyses and to people who need to manage data science teams.
1. Data Analysis as Art
2. Epicycles of Analysis
- 2.1 Setting the Scene
- 2.2 Epicycle of Analysis
- 2.3 Setting Expectations
- 2.4 Collecting Information
- 2.5 Comparing Expectations to Data
- 2.6 Applying the Epicycle of Analysis Process
3. Stating and Refining the Question
- 3.1 Types of Questions
- 3.2 Applying the Epicycle to Stating and Refining Your Question
- 3.3 Characteristics of a Good Question
- 3.4 Translating a Question into a Data Problem
- 3.5 Case Study
- 3.6 Concluding Thoughts
4. Exploratory Data Analysis
- 4.1 Exploratory Data Analysis Checklist: A Case Study
- 4.2 Formulate your question
- 4.3 Read in your data
- 4.4 Check the Packaging
- 4.5 Look at the Top and the Bottom of your Data
- 4.6 ABC: Always be Checking Your “n”s
- 4.7 Validate With at Least One External Data Source
- 4.8 Make a Plot
- 4.9 Try the Easy Solution First
- 4.10 Follow-up Questions
5. Using Models to Explore Your Data
- 5.1 Models as Expectations
- 5.2 Comparing Model Expectations to Reality
- 5.3 Reacting to Data: Refining Our Expectations
- 5.4 Examining Linear Relationships
- 5.5 When Do We Stop?
- 5.6 Summary
6. Inference: A Primer
- 6.1 Identify the population
- 6.2 Describe the sampling process
- 6.3 Describe a model for the population
- 6.4 A Quick Example
- 6.5 Factors Affecting the Quality of Inference
- 6.6 Example: Apple Music Usage
- 6.7 Populations Come in Many Forms
7. Formal Modeling
- 7.1 What Are the Goals of Formal Modeling?
- 7.2 General Framework
- 7.3 Associational Analyses
- 7.4 Prediction Analyses
- 7.5 Summary
8. Inference vs. Prediction: Implications for Modeling Strategy
- 8.1 Air Pollution and Mortality in New York City
- 8.2 Inferring an Association
- 8.3 Predicting the Outcome
- 8.4 Summary
9. Interpreting Your Results
- 9.1 Principles of Interpretation
- 9.2 Case Study: Non-diet Soda Consumption and Body Mass Index
10. Communication
- 10.1 Routine communication
- 10.2 The Audience
- 10.3 Content
- 10.4 Style
- 10.5 Attitude
11. Concluding Thoughts
About the Authors