Data Science Essentials
Essential Tools Before Doing Data Science (Python)
About the Book
#1 About the book
This book comes from my experience teaching Python in a variety of settings and through different stages of its (and my) development. Much of the material comes from multiple drafts I wrote on machine learning and data science, as well as the Python programming drafts I published on my blog. I look forward to teaching Python for as long as people will let me, and I’m interested in seeing how the next generation of students will approach it (and how my approach to them will change). It has been an amazing experience to watch the widespread adoption of Python over the past decade, and I’m sure the next decade will be just as amazing.
#2 How Is This Book Organized?
This book is organized into Iterations rather than parts. Each Iteration is meant to teach you one specific thing in data science, for example:
 Iteration One: Introduction to Data Science.
 Iteration Two: Python.
 Iteration Three: Jupyter.
 Iteration Four: NumPy.
 Iteration Five: Pandas.
 Iteration Six: Matplotlib.
 Iteration Seven: SKLearn.
 Iteration Eight: Finals.
#3 Iterations Done So Far...
The book is a work in progress. We have finished the first seven of the eight iterations and will soon publish the third. Stay tuned for the sheets that come with the book and for updates covering the latest tech.
#4 Updated Regularly?
If you have bought one of my books, you will find that they are updated monthly. We follow a calendar schedule to decide which book will be updated next, so subscribe to the readers’ mail to find out which book is next.
Table of Contents
 Iteration One: Introduction to Data Science

Chapter 1. A Gentle Introduction
 1.1 About Data.
 1.2 What Is Data Science?
 1.3 Motivating Hypothetical.
 1.4 What Is This Book About?
 1.5 Why Python for Data Analysis?
 1.6 Summary.

Chapter 2. Talking About Data Analysis
 2.1 Data Analysis.
 2.2 Knowledge Domains of the Data Analyst.
 2.3 Understanding the Nature of the Data.
 2.4 The Data Analysis Process.
 2.5 Quantitative and Qualitative Data Analysis.
 2.6 Data In This Book.
 2.7 Python and Data Analysis.
 2.8 Summary.
 Iteration Two: Python
 Chapter 3. Introduction to Python and Its Environment
 3.1 What is Python?
 3.2 The Python Philosophy.
 3.3 Back to Python.
 3.4 Basic Features of Python.
 3.5 Free Software.
 3.6 Design of the Python System.
 3.7 Limitations of Python.
 3.8 Python Resources.
 3.9 Summary.

Chapter 4. Crashing Python
 4.1 Getting Started with the Python Interface
 4.2 Choosing the IDE.
 4.3 Beyond the Python you Know.
 4.4 The Basics.
 4.5 The Advanced.
 4.6 Summary.

Chapter 5. Your Favourite IDE is Jupyter
 5.1 Introduction.
 5.1.1 What is IPython?
 5.1.2 What is Jupyter?
 5.2 Installing IPython and Jupyter.
 5.3 Introducing the Jupyter Notebook.
 5.3.1 Jupyter Main Page.
 5.3.2 Jupyter Notebook.
 5.3.2.1 What is an ipynb File?
 5.3.2.2 Code Cells.
 5.3.2.3 Markdown Cells.
 5.4 Basic Functionality and Features.
 5.4.1 Exercise 1 — Navigating the Platform.
 5.4.2 Exercise 2 — Implementing Jupyter Features.
 5.5 Getting Started with Exploratory Data Analysis in the Jupyter Notebook.
 5.5.1 The Import and Preparations.
 5.5.2 Locating The Data on Web.
 5.5.3 Loading Data in Pandas.
 5.5.4 Checking Data.
 5.5.5 Summarizing Data.
 5.5.6 Data Visualization.
 5.5.7 Advanced Analysis.
 5.5.8 Grouping Data.
 5.5.9 Display The Analysis.
 5.5.10 The Interactivity.
 5.6 Tips About Matplotlib and Pandas.
 5.7 Summary.

Chapter 6. Crashing Python.
 6.1 Introduction.
 6.1.1 The Jupyter Notebook Ecosystem.
 6.1.2 The Jupyter Notebook Architecture.
 6.1.2.1 Connecting Multiple Clients to One Kernel.
 6.1.2.2 JupyterHub.
 6.1.3 Jupyter Notebooks Security.
 6.2 Teaching Programming in The Notebook with IPython Blocks.
 6.3 Converting a Jupyter Notebook to Other Formats with nbconvert.
 6.4 Mastering Widgets in The Jupyter Notebook.
 6.5 Creating Custom Jupyter Notebook Widgets in Python, HTML, and JavaScript.
 6.6 Configuring The Jupyter Notebook.
 6.7 Creating an IPython Extension with Custom Magic Commands.
 6.8 Mastering IPython's Configuration System.
 6.8.1 Configurables.
 6.8.2 Magics.
 6.9 Introducing JupyterLab.
 6.10 Summary.

Chapter 7. NumPy Introduction and Basics
 7.1 Introduction.
 7.1.1 Notes Around Built-In Documentation.
 7.2 Data Types in Python.
 7.2.1 A Python Integer Is More Than Just an Integer.
 7.2.2 A Python List Is Not Just a List.
 7.2.3 Fixed-Type Arrays in Python.
 7.2.4 Creating Arrays from Python Lists.
 7.2.5 Creating Arrays from Scratch.
 7.2.6 NumPy Standard Data Types.
 7.3 The Basics of NumPy Arrays.
 7.3.1 NumPy Array Attributes.
 7.3.2 Array Indexing: Accessing Single Elements.
 7.3.3 Array Slicing: Accessing Subarrays.
 7.3.3.1 One-dimensional subarrays.
 7.3.3.2 Multidimensional subarrays.
 7.3.3.3 Subarrays as no-copy views.
 7.3.3.4 Creating copies of arrays.
 7.3.4 Reshaping of Arrays.
 7.3.5 Array Concatenation and Splitting.
 7.3.5.1 Concatenation of arrays.
 7.3.5.2 Splitting of arrays.
 7.4 Summary.

Chapter 8. NumPy Advanced
 8.1 Computation on NumPy Arrays: Universal Functions.
 8.1.1 Compared To Loops.
 8.2 Introducing UFuncs.
 8.2.1 Exploring NumPy’s UFuncs.
 8.2.1.1 Array arithmetic.
 8.2.1.2 Absolute value.
 8.2.1.3 Trigonometric functions.
 8.2.1.4 Exponents and logarithms.
 8.2.1.5 Specialized ufuncs.
 8.2.2 Advanced Ufunc Features.
 8.2.2.1 Specifying output.
 8.2.2.2 Aggregates.
 8.2.2.3 Outer products.
 8.2.3 Ufuncs: Learning More.
 8.3 Aggregations: Min, Max, and Everything in Between.
 8.3.1 Summing the Values in an Array.
 8.3.2 Minimum and Maximum.
 8.3.2.1 Multidimensional aggregates.
 8.3.2.2 Other aggregation functions.
 8.3.3 Example: What Is the Average Height of US Presidents?
 8.4 Computation on Arrays: Broadcasting.
 8.4.1 Introducing Broadcasting.
 8.4.2 Rules of Broadcasting.
 8.4.2.1 Broadcasting Example 1.
 8.4.2.2 Broadcasting Example 2.
 8.4.2.3 Broadcasting Example 3.
 8.4.3 Broadcasting in Practice.
 8.4.3.1 Centering an array.
 8.4.3.2 Plotting a two-dimensional function.
 8.5 Comparisons, Masks, and Boolean Logic.
 8.5.1 Example: Counting Rainy Days.
 8.5.1.1 Digging into the data.
 8.5.2 Comparison Operators as ufuncs.
 8.5.3 Working with Boolean Arrays.
 8.5.3.1 Counting entries.
 8.5.3.2 Boolean operators.
 8.5.4 Boolean Arrays as Masks.
 8.6 Fancy Indexing.
 8.6.1 Exploring Fancy Indexing.
 8.6.2 Combined Indexing.
 8.6.3 Example: Selecting Random Points.
 8.6.4 Modifying Values with Fancy Indexing.
 8.6.5 Example: Binning Data.
 8.7 Sorting Arrays.
 8.7.1 Fast Sorting in NumPy: np.sort and np.argsort.
 8.7.1.1 Sorting along rows or columns.
 8.7.2 Partial Sorts: Partitioning.
 8.7.3 Example: k-Nearest Neighbors.
 8.8 Structured Data: NumPy’s Structured Arrays.
 8.8.1 Creating Structured Arrays.
 8.8.2 More Advanced Compound Types.
 8.8.3 RecordArrays: Structured Arrays with a Twist.
 8.9 Summary.

Chapter 9. Pandas Introduction and Basics
 9.1 Common introductory questions.
 9.1.1 What is Pandas?
 9.1.2 What Can Pandas Do?
 9.2 Exploring Series, DataFrame and Index Objects.
 9.2.1 Series.
 9.2.2 DataFrame.
 9.2.3 Index Objects.
 9.3 Summary.

Chapter 10. Pandas and Data Wrangling
 10.1 Essential Functionality.
 10.1.1 Reindexing.
 10.1.2 Dropping Entries from an Axis.
 10.1.2.1 Indexing, Selection, and Filtering.
 10.1.2.2 Selection with loc and iloc.
 10.1.3 Integer Indexes.
 10.1.4 Arithmetic and Data Alignment.
 10.1.4.1 Arithmetic methods with fill values.
 10.1.5 Operations between DataFrame and Series.
 10.1.6 Function Application and Mapping.
 10.1.7 Sorting and Ranking.
 10.1.8 Axis Indexes with Duplicate Labels.
 10.2 Summarizing and Computing Descriptive Statistics.
 10.2.1 Unique Values, Value Counts, and Membership.
 10.3 Summary.

Chapter 11. Matplotlib Introduction and Basics
 11.1 Introduction to Data Visualization with Matplotlib.
 11.2 Getting Started.
 11.2.1 Importing plt.
 11.2.2 Matplotlib Different Styles.
 11.2.3 Usage of show Method.
 11.2.4 Plotting From a Script.
 11.2.4.1 Plotting from an IPython shell.
 11.2.4.2 Saving Figures to File.
 11.3 Simple Line Plots.
 11.3.1 Line Colors and Styles.
 11.3.2 Labeling Plots.
 11.3.3 Tips and Tricks.
 11.4 Simple Scatter Plots.
 11.4.1 Scatter with plt.plot.
 11.4.2 Scatter with plt.scatter.
 11.5 Visualizing Errors.
 11.5.1 Basic Error Bars.
 11.5.2 Continuous Errors.
 11.6 Density and Contour Plots.
 11.6.1 Visualizing a Three-Dimensional Function.
 11.7 Histograms, Binnings, and Density.
 11.8 Summary.

Chapter 12. The Art of Matplotlib Data Visualization
 12.1 Legends Customization.
 12.1.1 Choosing Elements for the Legend.
 12.1.2 Legend for Size of Points.
 12.1.3 Multiple Legends.
 12.2 Colorbars.
 12.2.1 Customizing Colorbars.
 12.2.2 Color limits and extensions.
 12.2.3 Discrete colorbars.
 12.2.4 Example — Handwritten Digits.
 12.3 Multiple Subplots.
 12.3.1 Handmade Subplots with plt.axes.
 12.3.2 Simple Grids of Subplots with plt.subplot.
 12.3.3 All in One Plot with plt.subplots.
 12.3.4 Going Deeper with plt.GridSpec.
 12.4 Visualization with Seaborn.
 12.4.1 Seaborn Against Matplotlib.
 12.4.2 Exploring Seaborn Plots.
 12.4.3 Histograms vs. KDEs vs Joint plots.
 12.4.4 Pair plots.
 12.4.5 Faceted histograms.
 12.4.6 Factor plots.
 12.4.7 Joint distributions.
 12.4.8 Bar plots.
 12.4.9 Example — Exploring Marathon Finishing Times.
 12.5 Summary.

Chapter 13. SciKit Learn Introduction
 13.1 Introduction.
 13.2 Data Representation.
 13.2.1 Tabular Data.
 13.2.2 Features Matrix.
 13.2.3 Target Vector.
 13.3 Scikit-Learn API Structure.
 13.4 Estimator API.
 13.4.1 Basics of the API.
 13.4.2 Supervised Learning Example—Linear Regression.
 13.4.2.1 Step 1: Choose a Class of Your Model.
 13.4.2.2 Step 2: Choose Model Hyperparameters.
 13.4.2.3 Step 3: Features Matrix and Target Vector.
 13.4.2.4 Step 4: Fit the Model with Data.
 13.4.2.5 Step 5: Predict Targets for Unknown Data.
 13.4.3 Supervised Learning Example—Iris Dataset Classification.
 13.4.4 Unsupervised Learning Example—Iris Dataset Dimensionality.
 13.4.5 Unsupervised Learning Example—Iris Dataset Clustering.
 13.5 An Exploring Application—Handwritten Digits.
 13.5.1 Loading and visualizing the digits data.
 13.5.2 Unsupervised learning—Dimensionality reduction.
 13.5.3 Classification on digits.
 13.6 Checkpoint.
 13.7 Hyperparameters and Model Validation.
 13.8 Thinking About Model Validation.
 13.8.1 Model validation the wrong way.
 13.8.2 Model validation the right way: Holdout sets.
 13.8.3 Model validation via cross-validation.
 13.9 Selecting the Best Model.
 13.9.1 The bias–variance tradeoff.
 13.9.2 Validation curves in Scikit-Learn.
 13.10 Learning Curves.
 13.10.1 Learning curves in Scikit-Learn.
 13.11 Practicing Validation—Grid Search.
 13.12 Making a Checkpoint.
 13.13 Feature Engineering.
 13.14 Feature Engineering—Categorical Features.
 13.15 Feature Engineering—Text Features.
 13.16 Feature Engineering—Image Features.
 13.17 Feature Engineering—Derived Features.
 13.18 Feature Engineering—Imputation of Missing Data.
 13.19 Feature Pipelines.
 13.20 Summary.

Chapter 14. SciKit Learn Machine Learning
 14.1 Introduction.
 14.2 Supervised Classification — Naive Bayes.
 14.2.1 Naive Bayes Classifier.
 14.2.2 Gaussian Naive Bayes.
 14.2.3 Multinomial Naive Bayes.
 14.2.3.1 Example: Text Classification.
 14.2.4 When to Use Naive Bayes.
 14.3 Supervised Regression — Linear Regression.
 14.3.1 Simple Linear Regression.
 14.3.2 Basis Function Regression.
 14.3.2.1 Polynomial basis functions.
 14.3.2.2 Gaussian basis functions.
 14.3.3 Regularization.
 14.3.3.1 Ridge regression (L2 regularization).
 14.3.3.2 Lasso regularization (L1 regularization).
 14.3.4 Example: Predicting Bicycle Traffic.
 14.4 Unsupervised — Principal Component Analysis.
 14.4.1 Introducing Principal Component Analysis.
 14.4.1.1 PCA as dimensionality reduction.
 14.4.1.2 PCA for visualization: Handwritten digits.
 14.4.1.3 The Components.
 14.4.1.4 Choosing the number of components.
 14.4.2 PCA as Noise Filtering.
 14.4.3 Example: Eigenfaces.
 14.5 Summary.
 Chapter 15. What’s Next.