Natural Language Processing For Hackers
Learn to build awesome apps that can understand people
About the Book
This is not your typical research-oriented book that takes a theoretical approach and uses the kind of clean datasets you only find in introductory courses, never in the real world. This is a hands-on, practical course on getting started with Natural Language Processing, learning key concepts as you code. No guesswork required.
Throughout the book you'll touch on some of the most important and practical areas of Natural Language Processing, and everything you do will produce a working result.
Here are some of the things you will get to tackle:
- Building your own Text Analysis engine
- Understanding how data gathering works in the real world
- Building a Twitter listener that performs Sentiment Analysis on a given subject
- Understanding how the classic NLP tools are actually built, enabling you to build your own: a Part-Of-Speech Tagger, a Shallow Parser, a Named Entity Extractor and a Dependency Parser
- Cleaning and standardising messy datasets
- Understanding how to fine-tune Natural Language models
- Learning how chatbots work
The book contains complete code snippets and step-by-step examples. No need to fill in the blanks or wonder what the author meant. Everything is written in concise, easy-to-read Python 3 code.
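To give a flavour of the style, here is a minimal, illustrative sketch (not an excerpt from the book) of two of the earliest topics covered, splitting text and counting words and bigrams, written in plain Python 3 so it runs with no extra libraries:

```python
from collections import Counter

def tokenize(text):
    # Naive whitespace split with punctuation stripped;
    # the book's chapters use proper NLTK tokenizers instead.
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def bigrams(tokens):
    # Pair each token with its successor.
    return list(zip(tokens, tokens[1:]))

text = "NLP is fun. NLP is practical."
tokens = tokenize(text)

vocab = Counter(tokens)              # word frequencies
print(vocab.most_common(2))          # [('nlp', 2), ('is', 2)]
print(bigrams(tokens)[:2])           # [('nlp', 'is'), ('is', 'fun')]
```

The real chapters replace these toy helpers with NLTK's tokenizers and n-gram utilities, but the shape of the code stays this direct.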
Table of Contents
- Preface
- Introduction
  - What is Natural Language Processing?
  - Challenges in Natural Language Processing
  - What makes this book different?
- Part 1: Introduction to NLTK
  - NLTK Fundamentals
    - Installing NLTK
    - Splitting Text
    - Building a vocabulary
    - Fun with Bigrams and Trigrams
    - Part Of Speech Tagging
    - Named Entity Recognition
  - Getting started with Wordnet
    - Wordnet Structure
    - Lemma Operations
  - Lemmatizing and Stemming
    - How stemmers work
    - How lemmatizers work
- Part 2: Create a Text Analysis service
  - Introduction to Machine Learning
    - A Practical Machine Learning Example
  - Getting Started with Scikit-Learn
    - Installing Scikit-Learn and building a dataset
    - Training a Scikit-Learn Model
    - Making Predictions
  - Finding the data
    - Existing corpora
    - Ideas for Gathering Data
    - Getting the Data
  - Learning to Classify Text
    - Text Feature Extractor
    - Scikit-Learn Feature Extraction
    - Text Classification with Naive Bayes
    - Persisting models
  - Building the API
    - Building a Flask API
    - Deploy to Heroku
- Part 3: Create a Social Media Monitoring Service
  - Basics of Sentiment Analysis
    - Be Aware of Negations
    - Machine Learning doesn't get Humour
    - Multiple and Mixed Sentiments
    - Non-Verbal Communication
  - Twitter Sentiment Data
    - Twitter Corpora
    - Other Sentiment Analysis Corpora
    - Building a Tweets Dataset
    - Sentiment Analysis - A First Attempt
    - Better Tokenization
  - Fine Tuning
    - Try a different classifier
    - Use Ngrams Instead of Words
    - Using a Pipeline
    - Cross Validation
    - Grid Search
    - Picking the Best Tokenizer
    - Building the Twitter Listener
  - Classification Metrics
    - Binary Classification
    - Multi-Class Metrics
    - The Confusion Matrix
- Part 4: Build Your Own NLP Toolkit
  - Build Your Own Part-Of-Speech Tagger
    - Part-Of-Speech Corpora
    - Building Toy Models
    - About Feature Extraction
    - Using the NLTK Base Classes
    - Writing the Feature Extractor
    - Training the Tagger
    - Out-Of-Core Learning
  - Build a Chunker
    - IOB Tagging
    - Implementing the Chunk Parser
    - Chunker Feature Detection
  - Build a Named Entity Extractor
    - NER Corpora
    - The Groningen Meaning Bank Corpus
    - Feature Detection
    - NER Training
  - Build a Dependency Parser
    - Understanding the Problem
    - Greedy Transition-Based Parsing
    - Dependency Dataset
    - Writing the Dependency Parser Class
  - Adding Labels to the Parser
    - Learning to Label Dependencies
    - Training our Labelled Dependency Parser
- Part 5: Build Your Own Chatbot Engine
  - General Architecture
    - Train the Platform via Examples
    - Action Handlers
  - Building the Core
    - Chatbot Base Class and Training Set
    - Training the Chatbot
    - Everything together
  - MovieBot
    - The Movie DB API
    - Small-Talk Handlers
    - Simple Handlers
    - Execution Handlers
  - MovieBot on Facebook
    - Installing ngrok
    - Setting up Facebook
    - Trying it Out
    - What Next?