Understanding Regular Expressions
About the Book
Exploration is not enough
Many of us default to a journey of discovery when we play around with something we don’t know well. With regular expressions, the task seems too easy: we just have to create a short expression, right? Well, more often than not, this point of view is wrong.
Trial and error often takes more time than facing the pain and curing our lack of knowledge. Yet most developers repeat this approach over and over again, because learning regular expressions seems too hard at first glance. Therefore, my mission is to show you that
- learning regular expressions is a lot easier than you think,
- knowing regular expressions is fun,
- knowing regular expressions is very beneficial in many areas of your software developer career.
You can easily master regular expressions to the point where they do exactly what you intend them to do. This mastery comes from understanding the right theory and from a lot of practice.
Why are regular expressions important?
In today’s world, we have to process a lot of data. Accessing data is not the main problem. Filtering data is. Regular expressions provide one type of filter that you can use to extract the data relevant to you from the large amounts of data available to you.
For instance, suppose you have an XML file containing four gigabytes of data on movies. Regular expressions make it possible to query this XML text so that you can find, say, all movies filmed in Budapest in 2016.
Regular expressions are a must-have for software developers.
In frontend development, we often validate input using regular expressions. Many small features, such as splitting strings, parsing input, and matching patterns, are also easier to implement with regular expressions.
In backend and data science, we often search, replace, and process data using regular expressions.
In IT infrastructure, regular expressions have many use cases in Linux. Vim and Emacs also come with regex support, both for their search commands and for editing text files.
Regular expressions are everywhere. These skills come in handy throughout your IT engineering career.
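As a rough sketch of the idea, here is how such a query could look with Python's `re` module. The XML layout (a flat `<movie>` element with `<title>`, `<location>` and `<year>` children) and the function name `budapest_2016_titles` are hypothetical. For a four-gigabyte file you would stream the input rather than load it into a single string, and a real XML parser is usually more robust against layout variations.

```python
import re

# Hypothetical XML layout: the real file's schema is unknown, so this
# sketch assumes each movie is a flat <movie> element with these children.
xml_text = """
<movies>
  <movie><title>Alpha</title><location>Budapest</location><year>2016</year></movie>
  <movie><title>Beta</title><location>Vienna</location><year>2016</year></movie>
  <movie><title>Gamma</title><location>Budapest</location><year>2015</year></movie>
</movies>
"""

# Match one <movie> element located in Budapest in 2016 and capture its title.
# [^<]* keeps the title capture from running past its closing tag.
MOVIE_PATTERN = re.compile(
    r"<movie>\s*<title>([^<]*)</title>\s*"
    r"<location>Budapest</location>\s*"
    r"<year>2016</year>\s*</movie>"
)

def budapest_2016_titles(text: str) -> list[str]:
    """Return the titles of movies filmed in Budapest in 2016 (hypothetical helper)."""
    # With a single capture group, findall returns the captured titles.
    return MOVIE_PATTERN.findall(text)

print(budapest_2016_titles(xml_text))  # ['Alpha']
```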
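A minimal sketch of these three chores (validating, splitting, parsing), written with Python's `re` module for illustration. The five-digit postal-code rule, the comma-or-whitespace tag separator, the `HH:MM` time format and all helper names are assumptions chosen for the example.

```python
import re
from typing import Optional

# Illustrative rule only: a postal code made of exactly five ASCII digits.
POSTAL_CODE = re.compile(r"[0-9]{5}")

def is_valid_postal_code(value: str) -> bool:
    # fullmatch anchors the pattern to the whole input, so "12345x" is rejected.
    return POSTAL_CODE.fullmatch(value) is not None

def split_tags(raw: str) -> list[str]:
    # Split on runs of commas and/or whitespace, dropping empty pieces.
    return [tag for tag in re.split(r"[,\s]+", raw) if tag]

def parse_time(value: str) -> Optional[tuple[int, int]]:
    # Parse "HH:MM" into (hours, minutes); return None when the shape is wrong.
    match = re.fullmatch(r"(\d{1,2}):(\d{2})", value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For example, `split_tags("regex, python  js")` yields `["regex", "python", "js"]`, and `parse_time("09:30")` yields `(9, 30)`. The time pattern checks only the shape of the input, not whether the hour and minute values are in range.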
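For backend-style search and replace, a sketch along these lines could normalize dates in free text. The `DD/MM/YYYY` input format and the helper names are assumptions. Note that the pattern checks only the shape of a date, not whether the day and month are valid.

```python
import re

# Hypothetical input format: dates written as DD/MM/YYYY inside free text.
DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

def to_iso_dates(text: str) -> str:
    # Replace: reorder the three captured groups into YYYY-MM-DD.
    return DATE.sub(r"\3-\2-\1", text)

def find_dates(text: str) -> list[tuple[str, str, str]]:
    # Search: findall returns one (day, month, year) tuple per match.
    return DATE.findall(text)

print(to_iso_dates("Shipped 03/11/2016, delivered 07/11/2016."))
# Shipped 2016-11-03, delivered 2016-11-07.
```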
Why Regular Expressions are Widely Misunderstood
Regular expressions are widely misunderstood. The people who taught you regular expressions either came from a theoretical point of view rooted in formal languages and computer science, or developed their understanding through trial and error.
Whenever you hear that regular expressions are declarative, run from that tutorial or blog as far as you can. A regex is an imperative language. It’s like JavaScript, except that the syntax is different. If you try to understand regexes as declarative, chances are you will fail.
According to the theoretical definition, regexes specify a search pattern. Although this statement is true, it is easy to misinterpret, because we are not specifying a declarative structure. In practice, we specify a sequence of instructions that acts like a function in an imperative programming language. We use commands and loops, we pass arguments to our regex, we may pass arguments around inside our regex, we return a result, and we may even cause side effects.
If you have worked with at least one programming language in your life, chances are you already know almost everything you need to understand regular expressions. You are just not yet proficient in the weird language that describes them. As soon as you familiarize yourself with this weird language, everything will fall into place.
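To make the imperative reading concrete, here is a sketch using Python's `re` module; the same pattern is also valid in JavaScript and PCRE. The comments walk through the pattern as a sequence of instructions: a capture group stores an argument, and the backreference `\1` passes it to a later step. The helper name `find_repeated_words` is hypothetical.

```python
import re

# Read imperatively, this pattern is a small program:
#   \b      check: we are at a word boundary
#   (\w+)   loop: consume one or more word characters, store them in group 1
#   \s+     loop: consume one or more whitespace characters
#   \1      command: match exactly the text stored in group 1 again
#   \b      check: the repeated word ends here
# If a step fails, the engine backtracks into the earlier loops and retries.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b")

def find_repeated_words(text: str) -> list[str]:
    # Each match "returns" the word captured in group 1.
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]

print(find_repeated_words("this is is a test test"))  # ['is', 'test']
```

The final `\b` is what keeps `"the theory"` from being reported: `\1` does match the first three letters of `theory`, but the word-boundary check after them fails.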
About the Author
I am Zsolt Nagy, founder of zsoltnagy.eu, a blog on writing maintainable web applications, and devcareermastery.com, a career blog on designing a fulfilling career.
I am the author of two other Leanpub books:
- ES6 in Practice - The Complete Developer's Guide, and
- The Developer's Edge - How to Double Your Career Speed with Soft-Skills.
Launch Offer
If you are still reading this page, you might wonder whether now is a good time to master regular expressions.
During the launch period, prices are still low. Grab the book before prices are raised.
If you are interested in an upgrade and you happen to be a JavaScript developer, I am currently shooting a JavaScript course on regular expressions. During the launch offer, you can buy this video course at a discount. Be aware, though, that the course is currently incomplete. I will add videos to the course on a regular basis over the next months.
Table of Contents
- An Introduction to Regular Expressions
- Regex Syntax 101
  - Formulating an Expression
  - Arbitrary character class
  - Basic Concatenation
  - Alternative execution
  - Operator precedence and parentheses
  - Anchored start and end
  - Modifiers
  - Summary
- Executing Regular Expressions
  - Regular Expressions in JavaScript
  - Other PCRE-Based Regex Environments
- Visualizing Regex Execution using Finite State Machines
  - Regular Expressions are Finite State Machines
  - Backtracking
  - Deterministic and nondeterministic Regex modeling
  - Basic regex simplifications
  - A successful match is cheaper than failure
  - Automatically generating regex FSMs
  - Summary
- Repeat modifiers
  - Match at least once
  - Match at most once: optionals
  - Match any number of times
  - Fixed range matching
  - Loop exactly n times
  - Greedy repeat modifiers
  - Lazy repeat modifiers
  - Possessive repeat modifiers
  - Summary
- Character Sets and Character Classes
  - Character sets
  - Character Set Ranges
  - Exclusions from Character Sets
  - Concatenating Advanced Language Constructs
- Substring Extraction from Regular Expressions
  - Defining capture groups
  - Perl 6 capture groups
  - Retrieval of captured substrings
  - Reusing captured substrings within a regex
  - Capture groups and performance
  - Extensions to capture groups
  - Summary
- Lookahead and Lookbehind
  - Lookahead
  - Lookbehind
  - Summary
- Maintaining Regular Expressions
  - Extended mode
  - Regex Subroutines
  - Named Capture Groups
  - Case study: XRegExp Library for JavaScript
  - Summary
- Optimizing regular expressions
  - Summary of the optimization techniques
  - Making character classes more specific
  - Repeating character class loops
  - Use possessive repeat modifiers whenever possible
  - Use atomic groups
  - Refactor for optimization
  - Optimization techniques limit non-deterministic execution
  - Summary
- Parsing HTML Code and URL Query Strings with Regular Expressions
  - Parsing HTML tags
  - Processing the Query String of a URL
- This is not the end, but the beginning
  - 100% Understanding Regular Expressions Guarantee
  - What if I Want to Learn More?
  - Upgrade Discount
  - Keep in Touch
  - Notes
The Leanpub 60 Day 100% Happiness Guarantee
Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.
Now, this is technically risky for us, since you'll have the book or course files either way. But we're so confident in our products and services, and in our authors and readers, that we're happy to offer a full money back guarantee for everything we sell.
You can only find out how good something is by trying it, and because of our 100% money back guarantee there's literally no risk to do so!
Free Updates. DRM Free.
If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).
Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle).
Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.