Preface

Please note that this book was published in the fall of 2015 and updated in July 2016 for version 2.0 of the Apache Spark machine learning library, the latest version at that time.

I have been programming since high school, and since the early 1980s I have worked on artificial intelligence, neural networks, machine learning, and general web engineering projects. Most of my professional work is reflected in the examples in this book. These example programs were also chosen for their technological importance: the rapidly changing big data scene, the use of machine learning in systems that touch most parts of our lives, and networked devices. I then narrowed the list of topics based on a public survey I announced on my blog. Many thanks to the people who took the time to take this survey. It is my hope that the Java example programs in this book will be useful in your projects. I also hope you will have a lot of fun working through these examples!

Java is a flexible language that has a huge collection of open source libraries and utilities. Java gets some criticism for being a verbose programming language. I have my own coding style that is concise but may break some of the things you have learned about “proper” use of the language. The Java language has seen many upgrades since its introduction over 20 years ago. This book requires and uses the features of Java 8, so please update to the latest JDK if you have not already done so. You will also need to have Maven installed on your system. I also provide project files for the free Community Edition of the IntelliJ IDEA IDE.

Everything you learn in this book can be used with some effort in the alternative JVM languages Clojure, JRuby, and Scala. In addition to Java I frequently use Clojure, Haskell, and Ruby in my work.

Book Outline

This book consists of eight chapters that I believe show the power of the Java language to good effect:

  • Network programming techniques for the Internet of Things (IoT)
  • Natural Language Processing using OpenNLP including using existing models and creating your own models
  • Machine learning using the Spark MLlib library
  • Anomaly Detection Machine Learning
  • Deep Learning using Deeplearning4j
  • Web Scraping
  • Using rich semantic and linked data sources on the web to enrich the data models you use in your applications
  • Java Strategies for Knowledge Management-Lite using Cloud Data Resources

The first chapter is a tutorial on network programming techniques for IoT development. I have used these same techniques for multiplayer game development and distributed virtual reality systems, and in the design and implementation of a world-wide nuclear test monitoring system. This chapter stands on its own and is not connected to any other material in this book.

The second chapter shows you how to use the OpenNLP library to train your own classifiers, tag parts of speech, and generally process English language text. Both this chapter and the next chapter, which covers the Spark MLlib library, use machine learning techniques.

The fourth chapter provides an example of anomaly detection using the University of Wisconsin cancer database. The web scraping chapter is a short introduction to pulling plain text and semi-structured data from web sites.

The last two chapters are for information architects or developers who would like to develop information design and knowledge management skills. These chapters cover linked data (semantic web) and knowledge management techniques.

The source code for the examples can be found at https://github.com/mark-watson/power-java and is released under the Apache 2 license. I have tried to use only existing libraries in the examples that are either Apache 2 or MIT style licensed. In general I prefer Free Software licenses like GPL, LGPL, and AGPL, but for examples in a book, where I expect readers to sometimes reuse entire example programs or at least small snippets of code, a license that allows use in commercial products makes more sense.

There is a subdirectory in this GitHub repository for each chapter, each with its own Maven pom.xml file to build and run the examples.

The chapters are independent of each other, so please feel free to skip around when reading and experimenting with the sample programs.

This book is available for purchase at https://leanpub.com/powerjava.

You might be interested in other books that I have self-published via Leanpub.

My older books published by Springer-Verlag, McGraw-Hill, Morgan Kaufmann, Apress, Sybex, M&T Press, and J. Wiley are listed on the books page of my web site.

One of the major themes of this book is machine learning. In addition to my general technical blog, I have a separate blog on machine learning and cognition technology, blog.cognition.tech, along with an associated website supporting cognition technology.

If You Did Not Buy This Book

I frequently find copies of my books on the web. If you have a copy of this book and did not buy it, please consider paying the minimum purchase price of $4 at leanpub.com/powerjava.