All about CAPTCHA's

Decoding CAPTCHA's for Fun and Profit

About the Book

Welcome to the practical guide to decoding CAPTCHA's. Everything you need to start decoding CAPTCHA's is contained within. Although there is a heavy emphasis on decoding, you can treat it as a general guide to CAPTCHA's. While mostly focussed on algorithmic methods of decoding CAPTCHA's, it also briefly covers paid/human services, history and anecdotes.

This book has come about mostly due to my continued blogging and writing about CAPTCHA's and how to decode them. At some point I thought I should combine all this information and unwritten posts into a book. I have expanded much of what was originally written, and added a lot more content. This book is essentially what I wish had existed when I started my Honours thesis, as it would have saved me several months of research and experimentation.

My Honours thesis was about using a computer program to read text out of web images. My theory was that if you could get a high level of successful extraction, you could use it as another source of data to improve search engine results. Every CAPTCHA published on the web was getting cracked within a matter of months, so I reasoned that if people can get a computer to read something it shouldn't be able to, then normal images such as website logos should be much easier to extract using the same methods.

I was surprisingly successful in my goal, with recognition rates of over 60% for most of the images in my sample set. That is rather high considering the variety of images on the web.

What I did find while doing my research, however, was a lack of sample code or applications that show you how to crack CAPTCHA’s. While there are some excellent tutorials and many published papers on the subject, they are very light on algorithms and sample code. In fact I didn’t find any beyond some non-working PHP scripts and some Perl fragments which strung together a few unrelated programs and gave reasonable results when presented with very simple CAPTCHA’s. None of them helped me very much. What I needed was detailed code with examples I could run, tweak and see working. I think I am just one of those people who can read the theory and follow along, but without something to prod and poke I never really understand it. Most of the papers and articles said they would not publish code due to the potential for misuse. Personally I think withholding the code is a waste of time, since in reality building a CAPTCHA breaker is quite easy once you know how.

  • Categories

    • Computer Science
    • Computer Security
    • Computers and Programming

About the Author

Ben E. C. Boyter

My name is Ben Boyter, and I'm a developer from Sydney, Australia. I have over 10 years of software development experience, specialising in search and software testing. Since 2010 I have been plugging away at what I hope will become the web's main source of indexed source code.

Table of Contents

  • 1. What’s the deal?
    • 1.1 Who is this book for?
    • 1.2 Welcome
    • 1.3 Glossary / Lingo
    • 1.4 Recommended Reading
    • 1.5 How to decode a CAPTCHA
    • 1.6 How to break a CAPTCHA
  • 2. Brief History of CAPTCHA’s
    • 2.1 What is a CAPTCHA?
    • 2.2 Who invented the CAPTCHA?
    • 2.3 Usage
    • 2.4 The Future
  • 3. How to Identify Weakness in CAPTCHA
    • 3.1 Letters and Numbers
    • 3.2 Uppercase / Lowercase
    • 3.3 Constant Font
    • 3.4 Aligned Characters
    • 3.5 Rotation
    • 3.6 Deformations
    • 3.7 Textured Background
    • 3.8 Colour Variation
    • 3.9 Character Position
    • 3.10 Constant Background
    • 3.11 Small Dictionary
    • 3.12 Dictionary Words
    • 3.13 Connected Letters
    • 3.14 Fixed Font Size
  • 4. Examples of Good and Bad CAPTCHA’s
  • 5. Extracting Characters From the Background
    • 5.1 Multivalued Image Decomposition
    • 5.2 Edge Detection
    • 5.3 Disjoint Sets
  • 6. Identifying Text Locations
    • 6.1 Blur Checking
    • 6.2 Fixed Position
    • 6.3 Vertical Slices
    • 6.4 Letter Dip Checks
    • 6.5 Disjoint Sets
    • 6.6 Extract and Test
  • 7. The Training Set
    • 7.1 Neural Network Follies by Neil Fraser
    • 7.2 Storing the training sets
  • 8. Textual Image Recognition
    • 8.1 Neural Networks
    • 8.2 Vector Space
    • 8.3 Bayesian filters
    • 8.4 Support Vector Machine
  • 9. Neural Networks Example
  • 10. Vector Space Explained
    • 10.1 Vector Space Implementations
  • 11. Improving Recognition
    • 11.1 Clearing noise
    • 11.2 Thinning
    • 11.3 Endpoint Detection
    • 11.4 Line Thickening
    • 11.5 Colour Reduction
    • 11.6 Spellcheck
    • 11.7 Dictionary Check
    • 11.8 Warping / Text Angle
    • 11.9 Common Pitfalls
  • 12. Decoding a simple CAPTCHA
    • 12.1 Identify Text in CAPTCHA and Extract
    • 12.2 Image Recognition
    • 12.3 Building a training set
    • 12.4 Putting it all together
    • 12.5 Results
  • 13. Don’t Write Your Own CAPTCHA
  • 14. Why CAPTCHA’s should never use the numbers 0 1 5 7
  • 15. Tools and Articles
  • 16. Alternatives to CAPTCHA
    • 16.1 Honey Pot Fields
    • 16.2 Stupidly Simple CAPTCHA
    • 16.3 Require Login
    • 16.4 Community Participation
    • 16.5 Blacklisting
    • 16.6 Whitelisting / Online Services
    • 16.7 Conclusion
