All about CAPTCHA's
Minimum price: $39.00
Suggested price: $39.00

This book is 100% complete

Completed on 2016-07-18

About the Book

Welcome to the practical guide to decoding CAPTCHAs. Everything you need to start decoding CAPTCHAs is contained within. Although there is a heavy emphasis on decoding, you can consider it a guide to CAPTCHAs in general. While it focuses mostly on algorithmic methods of decoding CAPTCHAs, it also briefly covers paid/human services, history and anecdotes.

This book came about mostly through my continued blogging and writing about CAPTCHAs and how to decode them. At some point I decided to combine all this information, along with posts I had never written up, into a book. I have expanded much of what was originally written and added a lot more content. This book is essentially what I wish had existed when I started my Honours thesis, as it would have saved me several months of research and experimentation.

My Honours thesis was about using a computer program to read text out of web images. My theory was that if you could achieve a high rate of successful extraction, you could use the extracted text as another source of data to improve search engine results. Every CAPTCHA published on the web was being cracked within a matter of months. I reasoned that if people could get a computer to read text it was designed not to be able to read, then ordinary images such as website logos should be much easier to extract using the same methods.

I was surprisingly successful, achieving recognition rates of over 60% for most of the images in my sample set. That is rather high considering the variety of images on the web.

What I did find while doing my research, however, was a lack of sample code or applications showing how to crack CAPTCHAs. While there are some excellent tutorials and many published papers on the subject, they are very light on algorithms or sample code. In fact, I didn't find any beyond some non-working PHP scripts and some Perl fragments that strung together a few unrelated programs and gave reasonable results on very simple CAPTCHAs. None of them helped me much. What I needed was detailed code with examples I could run, tweak and watch in action. I think I am one of those people who can read the theory and follow along, but without something to prod and poke I never really understand it. Most of the papers and articles said they would not publish code due to the potential for misuse. Personally I think withholding code is a waste of time, since in reality building a CAPTCHA breaker is quite easy once you know how.

Table of Contents

  • 1. What’s the deal?
    • 1.1 Who is this book for?
    • 1.2 Welcome
    • 1.3 Glossary / Lingo
    • 1.4 Recommended Reading
    • 1.5 How to decode a CAPTCHA
    • 1.6 How to break a CAPTCHA
  • 2. Brief History of CAPTCHA’s
    • 2.1 What is a CAPTCHA?
    • 2.2 Who invented the CAPTCHA?
    • 2.3 Usage
    • 2.4 The Future
  • 3. How to Identify Weakness in CAPTCHA
    • 3.1 Letters and Numbers
    • 3.2 Uppercase / Lowercase
    • 3.3 Constant Font
    • 3.4 Aligned Characters
    • 3.5 Rotation
    • 3.6 Deformations
    • 3.7 Textured Background
    • 3.8 Colour Variation
    • 3.9 Character Position
    • 3.10 Constant Background
    • 3.11 Small Dictionary
    • 3.12 Dictionary Words
    • 3.13 Connected Letters
    • 3.14 Fixed Font Size
  • 4. Examples of Good and Bad CAPTCHA’s
  • 5. Extracting Characters From the Background
    • 5.1 Multivalued Image Decomposition
    • 5.2 Edge Detection
    • 5.3 Disjoint Sets
  • 6. Identifying Text Locations
    • 6.1 Blur Checking
    • 6.2 Fixed Position
    • 6.3 Vertical Slices
    • 6.4 Letter Dip Checks
    • 6.5 Disjoint Sets
    • 6.6 Extract and Test
  • 7. The Training Set
    • 7.1 Neural Network Follies by Neil Fraser
    • 7.2 Storing the training sets
  • 8. Textual Image Recognition
    • 8.1 Neural Networks
    • 8.2 Vector Space
    • 8.3 Bayesian filters
    • 8.4 Support Vector Machine
  • 9. Neural Networks Example
  • 10. Vector Space Explained
    • 10.1 Vector Space Implementations
  • 11. Improving Recognition
    • 11.1 Clearing noise
    • 11.2 Thinning
    • 11.3 Endpoint Detection
    • 11.4 Line Thickening
    • 11.5 Colour Reduction
    • 11.6 Spellcheck
    • 11.7 Dictionary Check
    • 11.8 Warping / Text Angle
    • 11.9 Common Pitfalls
  • 12. Decoding a simple CAPTCHA
    • 12.1 Identify Text in CAPTCHA and Extract
    • 12.2 Image Recognition
    • 12.3 Building a training set
    • 12.4 Putting it all together
    • 12.5 Results
  • 13. Don’t Write Your Own CAPTCHA
  • 14. Why CAPTCHA’s should never use the numbers 0 1 5 7
  • 15. Tools and Articles
  • 16. Alternatives to CAPTCHA
    • 16.1 Honey Pot Fields
    • 16.2 Stupidly Simple CAPTCHA
    • 16.3 Require Login
    • 16.4 Community Participation
    • 16.5 Blacklisting
    • 16.6 Whitelisting / Online Services
    • 16.7 Conclusion

About the Author

Ben E. C. Boyter

My name is Ben Boyter, and I'm a developer from Sydney, Australia. I have over 10 years of software development experience, specialising in search and software testing. Since 2010 I have been plugging away at searchcode.com, which I hope will become the web's main source of indexed source code.
