Advanced Python for Data Science
About the Book
Today, most people enter the world of Data Science through the buzz and allure of “AI.” We tackle Kaggle challenges, voraciously consume Stack Overflow, and eat, live, and breathe through the Jupyter Notebook. Python, along with its “killer app” of Machine Learning, has done nothing short of revolutionize the way we “do data science,” and the world is a more interesting place because of it!
The Big Cloud providers, and many open source tools, have done wonders to democratize this technology. But, ‘easy access’ to high technology comes with a cost - we can easily go too far, rely too much on the tools we have today, and forget how to build the tools we need to truly transform our individual projects.
Most of the time, your impact as a Data Scientist is limited by your ability to enact your ideas - not by the ideas themselves. You can train a model on ‘clean’ data using Scikit Learn or FastAI, or run an ANOVA, in a notebook. Enacting that idea means getting to the data in the first place. It means knowing how to store it. It means processing your data at scale. It means running your processing script, reliably, every day on fresh data. It means testing that script. It means collaborating on that script with a coworker - or 10 - as the project scales. It means curating a library and building tools to solve the same problem for 5 new projects. It means packaging a model up for distribution - sharing with another data scientist, or deploying it as a service.
It means changing the way you think about problems by adopting new paradigms that accelerate you - and your work - across your organization. It means building an approach to data science within the broader Python ecosystem.
This book is about Python, and about how to be an effective Python programmer as a Data Scientist. You will learn the advanced Python skills you need to accelerate your work and solve the real, daily problems you face in your DS role.
Table of Contents
- Preface
  - Introduction
    - What is this book?
    - Who is it for?
    - What will you learn?
    - The state of the book
  - An introduction to Advanced Python
    - What is Advanced Python?
    - Tech Requirements
    - Helpful Themes
    - Readings
- Workflows
  - Continuous science
    - Debugging
    - “Bug Report” Rules
    - Primers
    - Higher Levels
    - Testing
    - Readings
  - Scientific workflows
    - What is Python?
    - Config
    - Decorators
    - Bootstrapping
    - Readings
  - Packages and iteration
    - Packages
    - Versioning
    - Functional Programming
  - Avoiding the for loop
    - Primitives
    - Vectorizing
    - Einstein Summation
    - Iteration Primitives
    - Vectorization: A Case Study
    - Iterators
    - Readings
- Skeletons
  - Classes, composition, and graphs
    - Inheritance
    - Composition
  - The DAG
    - Graphical Programs
    - What does Data Science look like?
    - The Revelation
  - Luigi
    - Project scaffolding
    - The Task
    - The Pieces
    - The Big Picture
    - Atomicity
    - Readings
  - Graphs
    - Luigi
    - The Big Picture
    - Salted Graphs
    - The Sorry State of Stateful Data
    - Advanced Luigi
- Data
  - Dask and Parquet
    - Micro Sciences
    - Dask - Basics
    - Rookie Mistakes
    - Executive Summary
    - Split, Apply, Combine
    - Data Containers
    - Parquet
    - Dask - Partitioning
    - Case Study - Fancy Indexing
    - Dask and Luigi
    - Readings
  - Django and SQL
    - Mutability
    - Living Data
    - Django
    - ORM
    - Django Code
    - ORM Breakdowns
    - The Competition
    - Readings
  - APIs and Data
    - Metaprogramming
    - DB Design
    - Atomic Targets
    - Migrations
    - The Web
    - APIs
    - Reading
  - More Meta
    - APIs and Clients
    - Factories
    - Optimization
    - Readings
- Algorithms
  - Smart & Lazy Coding
    - Parallel Code
    - Memory Views
    - Memoization
    - Sketching
    - Readings
  - Visualization
    - Data Viz
    - Declarative Grammars
    - Javascript and HTML5
    - Colormaps
    - Data Shading
    - Readings
  - Where We Are
    - “Python”
    - Testing
    - Workflows
    - Higher Levels
    - Deployment
    - Looping
    - Functional Coding
    - Composition
    - Graphical Programs
    - Data Scaling
    - The Web
    - DBs
    - APIs
    - Meta
    - Optimization
    - Visualization
- Appendix
  - Changelog