Introduction to Data Engineering

Learn the skills needed to break into Data Engineering.

About the Book

This is a book about the basic theory behind data engineering. It's not about writing code in a particular language; it's about the concepts you can use to learn and thrive as a data engineer.

About the Author

Daniel Beach

Daniel Beach is a data engineer who has spent years building large, scalable, high-throughput data pipelines for data warehousing and machine learning systems.

Table of Contents

  • Introduction
    • Knowledge and Experience
    • What are the topics we will cover?
  • Chapter 1 - The Theory.
    • What Is a Data Pipeline?
    • Data Pipelines built with Passion and Creativity
    • Storage and File Types
    • Access
    • Repeatable
    • Resilient
    • Scalable
    • In Summary
  • Chapter 2 - Data Pipeline Basics
    • Project Structure
    • Data Pipeline Code Structure
    • Code Readability and Organization
    • Tests.
    • Documentation
    • Containerization
    • Architecture First
    • Review
  • Chapter 3 - Pipeline Architecture
    • Architecture Applied to Data
    • Data Size and Velocity
    • Calculating Compute Requirements
    • Calculating Storage Requirements
    • Understanding the End Result
    • Understanding Cost
    • Code Architecture
    • Batch vs Streaming Architecture
    • Puzzle Pieces
    • Summary
  • Chapter 4 - Storage
    • Access Patterns
    • SQL/NoSQL Databases vs Files.
    • File Types
    • Row vs Columnar Storage.
    • Common file types in data engineering.
    • Parquet.
    • Avro.
    • ORC.
    • CSV / Flat-file.
    • JSON
    • Compression.
    • Storage location.
    • Partitions.
  • Chapter 5 - Compute and Resources
    • Overview
    • RAM/Memory
    • CPU/Cores
    • Storage
    • Cluster/Nodes
  • Chapter 6 - Mastering SQL
    • Introduction To SQL
    • Does the type of database matter?
    • The fundamentals of SQL/Databases.
    • OLTP vs. OLAP
    • Table design/layout.
    • Table Design in Real Life.
    • Understanding Indexing Basics.
    • How to write fast/tune queries.
    • Where to look for common problems.
    • SQL Fundamentals
    • Python + SQL
    • SQL Summary
  • Chapter 7 - Data Warehousing / Data Lakes
    • Data Warehouse vs Data Lake vs Lake House
    • Data Modeling in Data Warehouses, Data Lakes, and Lake Houses.
    • Facts and Dimensions.
    • Constraints and Schema.
    • Data Types.
    • Column Names.
    • The Role of IDs in a Data Warehouse or Data Lake.
    • CDC / History Tracking.
    • Summary
  • Chapter 8 - Data Modeling
    • Data Types and Schema.
    • Data Types.
    • Example
    • Data Size.
    • Constraints.
    • Data Definitions.
    • Modeling Data Logically.
    • Logical data models lead to physical relationships.
    • Grain of Data.
    • Uniqueness of Data.
    • Access Patterns.
    • Example
    • Talking to the Business.
    • Normal Forms.
    • De-Duplication of Data.
    • Join Integrity.
    • Keys - Primary and Foreign.
    • The Idea Behind Keys.
    • Relational Databases (SQL) vs Data Lake (File Based) Modeling.
    • The number of Fact tables and Dimensions and normalization.
    • File size and table size matter in the new File-Based Data Lakes.
    • Partitions vs Indexes.
    • Walking the data model line between old and new.
  • Chapter 9 - Data Quality
    • What is Data Quality.
    • Reasoning about data.
    • Double meanings.
    • Data value quality.
    • Measures of Data Quality.
    • Correct Header or Column Names.
    • Correct File Formatting.
    • Correct data types.
    • Value ranges and value integrity.
    • Data Quality Applied
  • Chapter 10 - DevOps for Data Engineers
    • DevOps applied to Data Engineering
    • Dockerfiles and Docker-compose.
    • Unit Testing.
    • CI/CD.
    • Automation is the name of the game.
    • CI for Data Engineering
  • Conclusion
