Learn Data with Bash Shell
Explore real-world data at the Linux command line
About the Book
Bash may not be the best way to handle every kind of data. But there often comes a time when all you have is a pure Bash environment, such as what you get on common Linux-based supercomputers, and you just want an early result or a first view of the data before you dive into the real programming with Python, R, SQL, SPSS, and so on.
Expertise in these data-intensive languages also comes at the price of spending a lot of time on them. In contrast, Bash scripting is simple, easy to learn and well suited to mining textual data, particularly if you work with genomics, microarrays, social networks, life sciences, and so on. It can help you quickly sort, search, match, replace, clean and optimise various aspects of your data, without a steep learning curve.
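As a rough sketch of the kind of one-liners meant here, the snippet below searches, sorts and replaces text in a tiny CSV file using standard tools. The file name `unis.csv` and its contents are made up for illustration; they are not one of the book's data sets.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; not one of the book's data sets.
workdir=$(mktemp -d)
cat > "$workdir/unis.csv" <<'EOF'
name,state,rank
Alpha College,NY,3
Beta University,CA,1
Gamma Institute,NY,2
EOF

# Search: keep only the rows whose state column is NY.
ny_rows=$(grep ',NY,' "$workdir/unis.csv")

# Sort: skip the header, then order rows numerically by the third (rank) field.
by_rank=$(tail -n +2 "$workdir/unis.csv" | sort -t, -k3,3n)

# Replace: expand the state code NY in every row that has it.
renamed=$(sed 's/,NY,/,New York,/' "$workdir/unis.csv")

printf '%s\n' "$ny_rows" "---" "$by_rank" "---" "$renamed"
rm -r "$workdir"
```

Each step is a single standard command, so a pipeline like this can be tried interactively on a sample of a file before it is run on the whole thing.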
Many practical data mining tasks start by importing specific data resources into flat text files. Bash can run different programs (grep, sort, sed, and so on) on those files to clean and optimise them, and to extract preliminary views of the data (cut, csvlook, view, cat, head, etc.). Another part of data mining involves taking unstructured data and transforming it into structured data (awk, shell), and a scripting language like Bash can be very useful for that transformation. We strongly believe that learning and using Bash shell scripting should be the first step if you want to say, Hello Big Data!
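The unstructured-to-structured step could look something like the sketch below. It assumes a made-up input layout (one `key: value` pair per line, with records separated by blank lines) and turns it into CSV with awk, then previews the result with cut, sort and uniq. The sample records and the variable names `raw` and `structured` are hypothetical and do not come from the book.

```shell
#!/usr/bin/env bash
# Hypothetical loosely structured input: one "key: value" pair per line,
# records separated by blank lines. Not taken from the book's data sets.
raw='name: Alpha College
state: NY

name: Beta University
state: CA'

# Paragraph mode (RS = "") makes each blank-line-separated block one record,
# and FS = "\n" makes each line of the block one field.
structured=$(printf '%s\n' "$raw" | awk '
BEGIN { RS = ""; FS = "\n"; print "name,state" }
{
    for (i = 1; i <= NF; i++) sub(/^[^:]*: */, "", $i)  # strip the "key: " prefix
    print $1 "," $2
}')

# Show the structured CSV, then count rows per state as a quick preview.
printf '%s\n' "$structured"
printf '%s\n' "$structured" | tail -n +2 | cut -d, -f2 | sort | uniq -c
```

For this sample the CSV has a header line followed by one row per record. Real data would also need handling for missing keys and for values that contain commas, which this sketch leaves out.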
This book starts with some practical bash-based flat file data mining projects involving:
- University ranking data [Previews: Part I, Part II, Part III] Sample video lectures.
- Facebook data [Previews: Part I, Part II]
- Crime Data
- Shakespeare-era plays and poems data
If you haven’t used Bash before, feel free to skip the projects and go straight to the tutorials. Read the tutorials and then come back to the projects. The tutorial section introduces Bash scripting, regular expressions, AWK, sed, grep and so on.
Finally, the book gives you a concise, beginner-friendly guide to the big data landscape, including an overview of critical Big Data tools such as HDFS, MapReduce, YARN, Flume, Hive and more. It finishes with a near-complete list of references to all the relevant command-line and Big Data tools.
Packages
The Book
The Book only!
PDF
EPUB
WEB
English
The Book + Data sets + Code Samples + Video Lectures
The Book + Data sets + Code Samples + Video Lectures (animated)!
Includes:
Data sets
Project data sets: a) University ranking data, b) Facebook data, c) AU Crime Data, d) Shakespeare-era plays and poems data
Code samples
Code samples for the Learn Data with Bash Shell projects
Video Tutorials
Instructional videos and whiteboard animations covering every project in this book
PDF
EPUB
WEB
English
Reader Testimonials
Ramon Diaz
Great job!
Excellent explanation and content. Love the real-world scenarios and coding involved with this course. Great job!
Tori Joy
Awesome!
Easy to understand tutorials on bash commands with practical data mining projects.
John de Vries
Good for beginners!
If you want to learn Bash in the context of practical hands-on projects, this is the best course for you, but I think it's been targeted at beginners only. So if you don't know how to sort, uniq, use bash functions, or use awk for basic tasks, this course is right for you. I enjoyed the animated presentations, it's just awesome!
Vijay
Makes sense!
This is really the course which makes sense to use bash in order to solve data problem rather than just focusing on syntax, this way people learn better as it makes sense that whether you can make use of these commands.
Table of Contents
- About
- Introduction
  - What is Bash?
  - When Bash is useful?
  - Bash in data mining
  - Who is this book for?
  - How to read this book?
- Part 1: Projects
  - Project 1: The ‘US News’ Uni Ranks
    - Dataset Preview
    - Data Analysis
      - Find the colleges
      - Finding the percent of colleges in the ranklist
      - Listing the Institutes from a given state
      - Finding the number of Institutes from each state
      - Finding a correlation between ranks and tuition fees?
    - Chapter Summary
  - Project 2: Facebook Data Mining
    - Dataset Preview
      - How many columns and rows?
      - How the data looks like?
    - Data Analysis
      - How many status, in each status type?
      - Find the most popular status entry
    - Chapter Summary
  - Project 3: Best Australian Cities - Least Crimes
    - Data Preview
    - Finding the number of rows and columns
      - The hard way
      - The easy way
    - Data Analysis
      - Finding the top most crime in the whole country
      - Finding the top most crime per city
      - Finding the best city in Australia!
    - Chapter Summary
  - Project 4: Mining Shakespeare-era Plays and Poems
    - Data Preview
    - Analysis
      - How many plays/poems?
      - How many plays/poems by each author?
      - What are the most frequent words?
    - Chapter Summary
- Part 2: Tutorials
  - Hello Bash!
    - which bash?
    - Hello world! bash
    - Bash variables
    - Bash functions
    - Bash meta characters
    - Bash quotation basics
    - Read and store user input
    - Bash redirections
    - Bash if-else (conditional statements)
    - Bash case statement
    - Bash loop statements
    - Bash arithmetic
    - Bash arrays
  - Hello! Regular Expressions
    - REGEX Types
    - Basic Regular Expressions
      - Metachar .
      - Metachar [ ]
      - Metachar [^ ]
      - Metachar ^
      - Metachar $
      - Metachar ( )
      - Metachar *
      - Metachar {m,n}
    - Extended Regular Expressions
      - Metachar ?
      - Metachar +
      - Metachar |
    - REGEX Character Classes
    - REGEX Look Arounds
      - REGEX Atomic Groups (?>)
    - How to Use REGEX in Bash?
  - Hello! AWK
    - AWK Built-in Variables
    - AWK statements
    - AWK built-in functions
    - AWK Examples
      - Example 1. AWK print function
      - Example 2. AWK print specific field
      - Example 3. AWK’s BEGIN and END Actions
      - Example 4. AWK fields variable ($1, $2 and so on)
      - Example 5. AWK built-in variables
      - Example 6. AWK fields comparison >
    - Self-contained AWK scripts
  - Hello! SED, GREP and Find
    - SED - Stream Editor
    - SED substitution
    - Some important SED options
    - SED substitute and regular expressions
    - SED delete
    - SED print
    - SED grouping
    - GREP
      - GREP and regular expressions
    - Find command find
- Part 3: Hello Big Data!
  - Big Data Terminologies
    - HDFS
    - MapReduce
    - YARN
    - Flume
    - Sqoop
    - Hive
    - Pig
    - Spark
    - HBase
    - Big Data file formats
  - Conclusion
- References
  - Bash
  - REGEX
  - AWK
  - SED
  - GREP
  - Big data
  - A companion book