Learn Data with Bash Shell (The Book)

Learn Data with Bash Shell

Explore real-world data at the Linux command line

About the Book

Bash may not be the best way to handle all kinds of data. But there often comes a time when you are given a pure Bash environment, such as what you get on common Linux-based supercomputers, and you just want an early result or view of the data before you dive into the real programming with Python, R, SQL, SPSS, and so on.

Why this book?

Expertise in these data-intensive languages comes at the price of spending a lot of time on them. In contrast, Bash scripting is simple, easy to learn and well suited to mining textual data, particularly if you deal with genomics, microarrays, social networks, life sciences, and so on. It can help you quickly sort, search, match, replace, clean and optimise various aspects of your data, without a steep learning curve.

Many practical data mining tasks follow the same flow: import specific data resources into flat text files, then let Bash run different programs (grep, sort, sed, and so on) on those files to clean, optimise and extract preliminary views (cut, csvlook, view, cat, head, etc.) of the data. Another part of data mining involves taking unstructured data and transforming it into structured data (awk, shell), and a scripting language like Bash can be very useful for that transformation. We strongly believe that learning and using Bash shell scripting should be the first step if you want to say, Hello Big Data!
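
As a rough illustration of the flow described above, the sketch below previews, searches, cleans and summarises a small flat file. The file contents, the column layout (name,state,tuition) and the function names count_by_state and avg_tuition are assumptions made up for this example. They are not taken from the book's data sets or code samples.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: name,state,tuition (invented for illustration).
data=$(mktemp)
cat > "$data" <<'EOF'
name,state,tuition
Alpha College,CA,42000
Beta University,NY,51000
Gamma Institute,CA,38000
Delta College,TX,29000
EOF

# Preview: the header plus the first two records.
head -n 3 "$data"

# Search: records whose second field is CA.
grep -E '^[^,]*,CA,' "$data"

# Clean: shorten a value with sed (prints to stdout, the file is untouched).
sed 's/ Institute,/ Inst.,/' "$data"

# Summarise: count records per state, skipping the header line.
count_by_state() {
  tail -n +2 "$1" | cut -d, -f2 | sort | uniq -c | sort -k1,1nr -k2,2
}
count_by_state "$data"

# Transform: compute the average of the third column with awk.
avg_tuition() {
  awk -F, 'NR > 1 { sum += $3; n++ } END { if (n) printf "%.0f\n", sum / n }' "$1"
}
avg_tuition "$data"
```

Each step is a small, single-purpose command, so a pipeline like count_by_state can be built up and checked one stage at a time before it is applied to a real data set.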

This book starts with some practical Bash-based flat-file data mining projects involving university ranking data, Facebook data, Australian crime data, and Shakespeare-era plays and poems.

If you haven’t used Bash before, feel free to skip the projects and go straight to the tutorials. Read the tutorials and then come back to the projects. The tutorial section will introduce you to Bash scripting, regular expressions, AWK, sed, grep and so on.

Finally, it gives you a concise, beginner-friendly guide to the big data landscape, including an overview of critical Big Data tools such as HDFS, MapReduce, YARN, Flume, Hive and more. The book finishes with a near-complete list of references to all the relevant command-line and Big Data tools.

Get the interactive version!

  • Categories

    • Data Science
  • Installments completed

    5 / 5

About the Author

Scientific Programming School
Scientific Programmer

Scientific programming is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems. 

The Scientific Programming School team helps you learn to use scientific programming languages and tools, such as CUDA, Julia, OpenMP, MPI, C++, Matlab, Octave, Bash, Python, Sed and AWK, including RegEx, in processing scientific and real-world data. The team is formed by PhD-educated instructors in the areas of Computational Sciences.

The team deploys interactive courses at Scientific Programming School (now Learnitive.com), an interactive and advanced e-learning platform for learning scientific coding. It gives you the opportunity to run scientific code and OS commands as you learn, with playgrounds and interactive shells inside your browser.


The Book

The Book only!

  • PDF

  • EPUB

  • WEB

  • English

The Book + Data sets + Code Samples + Video Lectures

The Book + Data sets + Code Samples + Video Lectures (animated)!


  • extras
    Data sets

    Project data sets: a) University ranking data, b) Facebook data, c) AU crime data, d) Shakespeare-era plays and poems data

  • extras
    Code samples

    Code samples for the Learn Data with Bash Shell projects

  • extras
    Video Tutorials

    Instructional videos and whiteboard animations covering every project in this book

  • PDF

  • EPUB

  • WEB

  • English


Reader Testimonials

Ramon Diaz

Great job!

Excellent explanation and content. Love the real-world scenarios and coding involved with this course. Great job!

Tori Joy


Easy to understand tutorials on bash commands with practical data mining projects.

John de Vries

Good for beginners!

If you want to learn Bash in the context of practical hands-on projects, this is the best course for you, but I think it's been targeted at beginners only. So if you don't know how to use sort, uniq, Bash functions, or awk for basic tasks, this course is right for you. I enjoyed the animated presentations, they're just awesome!


Makes sense!

This is really the course that makes sense of using Bash to solve data problems rather than just focusing on syntax. People learn better this way, because it shows whether and how you can make use of these commands.

Table of Contents

    • About
    • Introduction
      • What is Bash?
      • When is Bash useful?
      • Bash in data mining
      • Who is this book for?
      • How to read this book?
  • Part 1: Projects
    • Project 1: The ‘US News’ Uni Ranks
      • Dataset Preview
      • Data Analysis
        • Find the colleges
        • Finding the percent of colleges in the ranklist
        • Listing the Institutes from a given state
        • Finding the number of Institutes from each state
        • Finding a correlation between ranks and tuition fees?
      • Chapter Summary
    • Project 2: Facebook Data Mining
      • Dataset Preview
        • How many columns and rows?
        • What does the data look like?
      • Data Analysis
        • How many statuses in each status type?
        • Find the most popular status entry
      • Chapter Summary
    • Project 3: Best Australian Cities - Least Crimes
      • Data Preview
      • Finding the number of rows and columns
        • The hard way
        • The easy way
      • Data Analysis
        • Finding the top crime in the whole country
        • Finding the top crime per city
        • Finding the best city in Australia!
      • Chapter Summary
    • Project 4: Mining Shakespeare-era Plays and Poems
      • Data Preview
      • Analysis
        • How many plays/poems?
        • How many plays/poems by each author?
        • What are the most frequent words?
      • Chapter Summary
  • Part 2: Tutorials
    • Hello Bash!
      • which bash?
      • Hello world! bash
      • Bash variables
      • Bash functions
      • Bash meta characters
        • Bash quotation basics
      • Read and store user input
      • Bash redirections
      • Bash if-else (conditional statements)
      • Bash case statement
      • Bash loop statements
      • Bash arithmetic
      • Bash arrays
    • Hello! Regular Expressions
      • REGEX Types
      • Basic Regular Expressions
        • Metachar .
        • Metachar [ ]
        • Metachar [^ ]
        • Metachar ^
        • Metachar $
        • Metachar ( )
        • Metachar *
        • Metachar {m,n}
      • Extended Regular Expressions
        • Metachar ?
        • Metachar +
        • Metachar |
      • REGEX Character Classes
      • REGEX Look Arounds
        • REGEX Atomic Groups (?>)
      • How to Use REGEX in Bash?
    • Hello! AWK
      • AWK Built-in Variables
      • AWK statements
      • AWK built-in functions
      • AWK Examples
        • Example 1. AWK print function
        • Example 2. AWK print specific field
        • Example 3. AWK’s BEGIN and END Actions
        • Example 4. AWK fields variable ($1, $2 and so on)
        • Example 5. AWK built-in variables
        • Example 6. AWK fields comparison >
      • Self-contained AWK scripts
    • Hello! SED, GREP and Find
      • SED - Stream Editor
      • SED substitution
        • Some important SED options
        • SED substitute and regular expressions
        • SED delete
        • SED print
        • SED grouping
      • GREP
        • GREP and regular expressions
        • Find command find
  • Part 3: Hello Big Data!
      • Big Data Terminologies
        • HDFS
        • MapReduce
        • YARN
        • Flume
        • Sqoop
        • Hive
        • Pig
        • Spark
        • HBase
        • Big Data file formats
    • Conclusion
  • References
      • Bash
      • REGEX
      • AWK
      • SED
      • GREP
      • Big data
      • A companion book
