Understanding Regular Expressions


This book is no longer available for sale.

About the Book

Exploration is not enough

Many of us default to a journey of discovery when we play around with something we don’t know well. With regular expressions, the task seems almost too easy: we just have to create a short expression, right? Well, this point of view is often very wrong.

Trial and error often takes more time than tackling the real problem and curing the lack of knowledge. Yet most developers do this over and over again, because learning regular expressions seems too hard at first glance. Therefore, my mission is to show you that

  • learning regular expressions is a lot easier than you thought,
  • knowing regular expressions is fun,
  • knowing regular expressions is very beneficial in many areas of your software developer career.

You can easily master regular expressions to the extent that they will do exactly what you intended them to do. This mastery comes through understanding the right theory, and a lot of practice.

Why are regular expressions important?

In today’s world, we have to deal with processing a lot of data. Accessing data is not the main problem. Filtering data is. Regular expressions provide you with one type of filter that you can use to extract data relevant to you from the big chunks of data available to you.

For instance, suppose you have an XML file containing four gigabytes of data on movies. Regular expressions make it possible to query this XML text so that you can find all movies that were filmed in Budapest in 2016.
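As a rough sketch of that kind of query, the Python snippet below filters a tiny inline sample instead of a four-gigabyte file. The `<movie>` element, its `title`, `location`, and `year` attributes, and their fixed order are all hypothetical, since the real file layout is not given here.

```python
import re

# Hypothetical sample data; the structure of the real XML file is an assumption.
SAMPLE = """
<movies>
  <movie title="Movie A" location="Budapest" year="2015"/>
  <movie title="Movie B" location="Budapest" year="2016"/>
  <movie title="Movie C" location="Vienna" year="2016"/>
</movies>
"""

# Capture the title of every movie whose location and year match.
# Assumes the attributes always appear in this exact order.
PATTERN = re.compile(
    r'<movie title="([^"]*)" location="Budapest" year="2016"\s*/>'
)

def movies_in_budapest_2016(xml_text):
    """Return the titles of all matching <movie> elements."""
    return PATTERN.findall(xml_text)

print(movies_in_budapest_2016(SAMPLE))
```

For a file of that size, one option would be to apply the same pattern line by line while reading, so the whole text never has to sit in memory at once.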

Regular expressions are a must-have for software developers.

In frontend development, we often validate input using regular expressions. Many small features are also easier with regular expressions, such as splitting strings, parsing input, and matching patterns.
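The tasks named above (validating, splitting, and parsing) might look like this in Python. This is only a minimal sketch: the username rule, the separator set, and the date format are hypothetical choices made for illustration.

```python
import re

# Validation: a deliberately simple, hypothetical username rule
# (3 to 16 letters, digits, or underscores).
USERNAME = re.compile(r'[A-Za-z0-9_]{3,16}')

def is_valid_username(text):
    """True only if the whole string satisfies the username rule."""
    return USERNAME.fullmatch(text) is not None

# Splitting: break a string on commas or semicolons,
# swallowing any whitespace around the separator.
def split_fields(text):
    return re.split(r'\s*[,;]\s*', text.strip())

# Parsing: extract the numeric parts of a date written as YYYY-MM-DD.
# Only the shape is checked, not whether the date exists.
def parse_date(text):
    match = re.fullmatch(r'(\d{4})-(\d{2})-(\d{2})', text)
    if match is None:
        return None
    year, month, day = match.groups()
    return int(year), int(month), int(day)

print(is_valid_username("zsolt_42"))
print(split_fields("a, b;c ,d"))
print(parse_date("2016-05-09"))
```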

In backend and data science, we often search, replace, and process data using regular expressions.

In IT infrastructure, regular expressions have many use cases in Linux. Vim and Emacs also come with regex support for search commands, as well as for editing text files.

Regular expressions are everywhere. These skills will come in handy throughout your IT engineering career.

Why Regular Expressions are Widely Misunderstood

Regular expressions are widely misunderstood. People who taught you regular expressions either come from a theoretical point of view using formal languages and computer science, or they developed their understanding using trial and error.

Whenever you hear that regular expressions are declarative, run from that tutorial or blog as far as you can. A regex is an imperative language. It’s like JavaScript, except that the syntax is different. If you want to understand regexes as declarative, chances are, you will fail.

According to the theoretical definition, regexes specify a search pattern. Although this is a true statement, it is easy to misinterpret, because we are not specifying a declarative structure. In the real world, we specify a sequence of instructions that acts like a function in an imperative programming language. We use commands and loops, we pass arguments to our regex, we may pass arguments around inside our regex, we return a result, and we may even cause side effects.
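One way to picture this reading is to annotate a pattern as if it were a small program. The sketch below is an illustration of the idea in Python rather than an example taken from the book, and the function name `first_repeated_word` is hypothetical.

```python
import re

# Read the pattern as a sequence of instructions:
#   \b      check that we are at the start of a word
#   (\w+)   loop over one or more word characters, storing them in group 1
#   \s+     loop over at least one whitespace character
#   \1      reuse the value stored in group 1 (a value passed along inside the regex)
#   \b      check that the repeated word ends here
REPEATED_WORD = re.compile(r'\b(\w+)\s+\1\b')

def first_repeated_word(text):
    """Return the first word that is immediately repeated, or None."""
    match = REPEATED_WORD.search(text)         # the input string is the argument
    return match.group(1) if match else None   # the captured group is the result

print(first_repeated_word("this is is a test"))
print(first_repeated_word("no repeats here"))
```

When an attempt fails partway through, the engine steps back and tries the next possibility, which is what makes the "sequence of instructions" view a useful mental model.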

If you have dealt with at least one programming language in your life, chances are, you know almost everything to understand regular expressions. You are just not yet proficient in this weird language describing regular expressions. As soon as you familiarize yourself with this weird language, everything will fall into place.

About the Author

I am Zsolt Nagy, founder of zsoltnagy.eu, a blog on writing maintainable web applications, and devcareermastery.com, a career blog on designing a fulfilling career.

I am the author of two other Leanpub books.

Launch Offer

If you are still reading this page, you might wonder: is now a good time to master regular expressions?

During the launch period, prices are still low. Grab the book before prices are raised.

If you are interested in an upgrade, and you happen to be a JavaScript developer, I am currently shooting a JavaScript course on regular expressions. During the launch offer, you can buy this video course at a discount. Be aware, though, that the course is currently incomplete. I will add videos to the course on a regular basis over the coming months.

About the Author

Zsolt Nagy

There were times when I was not conscious about my career choices at all. During my university years, I was far too busy with my studies and with an EU-funded research project. While writing my thesis, I earned more than the average starting salary of MSc graduates in my country. When things are going so well early in your career, believe me, you won't start thinking about the next step.

This proved to be a mistake. I started my job interviews just a couple of months before graduation. I only had a resume, and some references on a research project. I also happened to be a below-average communicator.

Even though I could choose from many options, none of those options seemed to be too lucrative. Having realized my awkward position, I chose the path of maximum responsibility, and joined a tech startup. The trade-off for excellent working conditions was bad pay. I kept on telling myself that I deserved more. Truth is, I deserved exactly the amount that I was able to negotiate. Back then, I didn't accept these facts, and was waiting for others to give me a raise whenever they praised me. A simple strategy destined to fail. 

Software developers are in high demand, as there is a shortage of good professionals all over the world. I asked myself: how come excellent developers still waste their talent by working for companies that don't respect them financially or professionally?

Throughout the last ten years, I have been continuously improving my tech skills as well as my soft skills. These improvements have enabled me to assume Team Lead and Technical Lead positions. Above all, these skills have enabled me to work with the companies I want. I encourage you to do the same. 

For more, visit my website, devcareermastery.com.

My technical blog is about developing maintainable web applications using JavaScript. Read it on zsoltnagy.eu.

Table of Contents

  • An Introduction to Regular Expressions
  • Regex Syntax 101
    • Formulating an Expression
    • Arbitrary character class
    • Basic Concatenation
    • Alternative execution
    • Operator precedence and parentheses
    • Anchored start and end
    • Modifiers
    • Summary
  • Executing Regular Expressions
    • Regular Expressions in JavaScript
    • Other PCRE-Based Regex Environments
  • Visualizing Regex Execution using Finite State Machines
    • Regular Expressions are Finite State Machines
    • Backtracking
    • Deterministic and nondeterministic Regex modeling
    • Basic regex simplifications
    • A successful match is cheaper than failure
    • Automatically generating regex FSMs
    • Summary
  • Repeat modifiers
    • Match at least once
    • Match at most once - optionals
    • Match any number of times
    • Fixed range matching
    • Loop exactly n times
    • Greedy repeat modifiers
    • Lazy repeat modifiers
    • Possessive repeat modifiers
    • Summary
  • Character Sets and Character Classes
    • Character sets
    • Character Set Ranges
    • Exclusions from Character Sets
    • Concatenating Advanced Language Constructs
  • Substring Extraction from Regular Expressions
    • Defining capture groups
    • Perl 6 capture groups
    • Retrieval of captured substrings
    • Reusing captured substrings within a regex
    • Capture groups and performance
    • Extensions to capture groups
    • Summary
  • Lookahead and Lookbehind
    • Lookahead
    • Lookbehind
    • Summary
  • Maintaining Regular Expressions
    • Extended mode
    • Regex Subroutines
    • Named Capture Groups
    • Case study: XRegExp Library for JavaScript
    • Summary
  • Optimizing regular expressions
    • Summary of the optimization techniques
    • Making character classes more specific
    • Repeating character class loops
    • Use possessive repeat modifiers whenever possible
    • Use atomic groups
    • Refactor for optimization
    • Optimization techniques limit non-deterministic execution
    • Summary
  • Parsing HTML Code and URL Query Strings with Regular Expressions
    • Parsing HTML tags
    • Processing the Query String of a URL
  • This is not the end, but the beginning
    • 100% Understanding Regular Expressions Guarantee
    • What if I Want to Learn More?
    • Upgrade Discount
    • Keep in Touch
  • Notes
