Finding Patterns in Everyday Text

This book is 60% complete


About the Book

This is a spinoff of a chapter from The Bastards Book of Ruby (BBoR). Regular expressions are an essential and useful skill even outside of programming. They can serve not only as a handy tool for anyone whose work involves writing or data, but also as a gateway into more interesting and complex kinds of programming. While you're waiting for me to finish this experiment in self-publishing, you can get a good start by reading the massive chapter on regular expressions in the BBoR.

Table of Contents

  • Regular Expressions are for Everyone
  • FAQ
  • Release notes & changelog
  • Getting Started
  • Finding a proper text editor
  • Why a dedicated text editor?
  • Windows text editors
  • Mac text editors
  • Sublime Text
  • Online regex testing sites
  • A better Find-and-Replace
  • How to find and replace
  • The limitations of Find-and-Replace
  • There’s more than find-and-replace
  • Your first regex
  • Hello, word boundaries
  • Word boundaries
  • Escape with backslash
  • Regex Fundamentals
  • Removing emptiness
  • The newline character
  • Viewing invisible characters
  • Match one-or-more with the plus sign
  • The plus operator
  • Backslash-s
  • Match zero-or-more with the star sign
  • The star sign
  • Specific and limited repetition
  • Curly braces
  • Curly braces, maximum and no-limit matching
  • Cleaning messily-spaced data
  • Anchors: A way to trim emptiness
  • The caret as starting anchor
  • The dollar sign as the ending anchor
  • Escaping special characters
  • Matching any letter, any number
  • The numeric character class
  • Word characters
  • Bracketed character classes
  • Matching ranges of characters with brackets and hyphens
  • All the characters with dot
  • Negative character sets
  • Negative character sets
  • Capture, Reuse
  • Parentheses for precedence
  • Parentheses for captured groups
  • Correcting dates with capturing groups
  • Using parentheses without capturing
  • Optionality and alternation
  • Alternation with the pipe character
  • Optionality with the question mark
  • Laziness and greediness
  • Greediness
  • Laziness
  • Lookarounds
  • Positive lookahead
  • Negative lookahead
  • Positive lookbehind
  • Negative lookbehind
  • The importance of zero-width (TODO)
  • Regexes in Real Life
  • Why learn Excel?
  • The limits of Excel (TODO)
  • Delimitation
  • Mixed commas and other delimiters
  • Dealing with text charts (TODO)
  • Completely unstructured text (TODO)
  • Moving in and out and into Excel
  • From Data to HTML (TODO)
  • Simple HTML tricks
  • Example Domain
  • Tabular data to HTML tables
  • Mocking full web pages from data
  • Visualizations
  • The Exercises
  • Data Cleaning with the Stars
  • Normalized alphabetical titles
  • Make your own delimiters
  • Finding needles in haystacks (TODO)
  • Shakespeare’s longest word
  • Changing phone format (TODO)
  • Telephone game
  • Ordering names and dates (TODO)
  • Year, months, days
  • Names
  • Preparing for a spreadsheet
  • Dating, Associated Press Style (TODO)
  • Scenario
  • The AP Date format
  • Real-world considerations
  • The limits of regex
  • Sorting a police blotter
  • Sloppy copy-and-paste
  • Start loose and simple
  • Conclusion
  • Converting XML to tab-delimited data
  • The payments XML
  • The pattern
  • Add more delimitation
  • Cleaning up Microsoft Word HTML (TODO)
  • Switching visualizations (TODO)
  • A visualization in Excel
  • From Excel to Google Static Chart
  • From Google Static Charts to Google Interactive Charts
  • Cleaning up OCR Text (TODO)
  • Scenario
  • Cheat Sheet
  • Moving forward
  • Additional references and resources

About the Author

