Email the Author
You can use this page to email Ben E. C. Boyter about All about CAPTCHA's.
About the Book
Welcome to the practical guide to decoding CAPTCHAs. Everything you need to start decoding CAPTCHAs is contained within. Although there is a heavy emphasis on decoding, you can also treat it as a general guide to CAPTCHAs. It is mostly focussed on algorithmic methods of decoding, but it also briefly covers paid/human services, history and anecdotes.
This book has come about mostly due to my continued blogging and writing about CAPTCHAs and how to decode them. At some point I thought I should combine all this information, along with my unwritten posts, into a book. I have expanded much of what was originally written and added a lot more content. This book is essentially what I wish had existed when I started my honours thesis, as it would have saved me several months of research and experimentation.
My honours thesis was about using a computer program to read text out of web images. My theory was that if you could get a high rate of successful extraction, you could use the text as another source of data to improve search engine results. Every CAPTCHA published on the web was being cracked within a matter of months, so I reasoned that if people can get a computer to read something it shouldn't be able to, then normal images such as website logos should be much easier to read using the same methods.
I was surprisingly successful in my goal, with recognition rates of over 60% for most of the images in my sample set. That is rather high considering the variety of images on the web.
What I did find while doing my research, however, was a lack of sample code or applications that show you how to crack CAPTCHAs. While there are some excellent tutorials and many published papers on the subject, they are very light on algorithms and sample code. In fact, I didn't find any beyond some non-working PHP scripts and some Perl fragments which strung together a few unrelated programs and gave reasonable results when presented with very simple CAPTCHAs. None of them helped me very much. What I needed was detailed code with examples I could run, tweak and watch working. I think I am just one of those people who can read the theory and follow along, but without something to prod and poke I never really understand it. Most of the papers and articles said they would not publish code due to the potential for misuse. Personally, I think withholding it is a waste of time, since in reality building a CAPTCHA breaker is quite easy once you know how.
About the Author
My name is Ben Boyter, and I'm a developer from Sydney, Australia. I have over 10 years of software development experience, specialising in search and software testing. Since 2010 I have been plugging away at searchcode.com, which I hope will become the web's main source of indexed source code.