Introduction

Deep Learning has dominated recent machine learning competitions with superior predictions. Unlike the neural networks of the past, modern Deep Learning methods train stably, generalize well, and scale to big data, making Deep Learning the algorithm of choice when the highest predictive accuracy is required.

This documentation presents the Deep Learning framework in H2O, as experienced through the H2O R interface. Further documentation on H2O’s system and algorithms can be found at http://docs.h2o.ai, especially the “R User documentation”, and fully featured tutorials are available at http://learn.h2o.ai. The datasets, R code and instructions for this document can be found at the H2O GitHub repository at https://github.com/h2oai/h2o/tree/master/docs/deeplearning/DeepLearningRVignetteDemo. This introductory section provides instructions on getting H2O started from R, followed by a brief overview of deep learning.

Installation

To install H2O, follow the “Download” link at H2O’s website at http://h2o.ai/. For multi-node operation, download the H2O zip file and deploy H2O on your cluster, following the instructions in the “Full Documentation”. For single-node operation, follow the instructions in the “Install in R” tab. Open your R console and run the following to install and start H2O directly from R:

# The following two commands remove any previously installed H2O packages for R.
if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

# Next, download, install and initialize the H2O package for R
# (filling in the *'s with the latest version number obtained from the H2O download page)
install.packages("h2o", repos=(c("http://s3.amazonaws.com/h2o-release/h2o/master/****/R", getOption("repos"))))
library(h2o)

Initialize H2O with:

h2o_server = h2o.init(nthreads = -1)

With this command, the H2O R module automatically starts an H2O instance at localhost:54321. Alternatively, to connect to an existing H2O cluster node (other than localhost at port 54321), you must explicitly state the IP address and port number in the h2o.init(nthreads = -1) call. An example is given below; do not paste it directly, but substitute the IP address and port number appropriate to your environment.

h2o_server = h2o.init(ip = "192.168.1.123", port = 12345, startH2O = FALSE, nthreads = -1)

An automated demo is available to see h2o.deeplearning at work. Run the following command to observe an example binary classification model being built with H2O’s Deep Learning:

demo(h2o.deeplearning)

Support

Users of the H2O package may submit general inquiries and bug reports to the H2O.ai support address. Alternatively, specific bugs or issues may be filed to the H2O JIRA.

Deep Learning Overview

First we present a brief overview of deep neural networks for supervised learning tasks. There are several theoretical frameworks for deep learning, and here we summarize the feedforward architecture used by H2O.

Neuron

The basic unit in the model (shown above) is the neuron, a biologically inspired model of the human neuron. In humans, output signals of varying strengths travel along synaptic junctions and are then aggregated as input for a connected neuron’s activation. In the model, the weighted combination \alpha = \sum_{i=1}^{n} w_i x_i + b of input signals is aggregated, and then an output signal f(\alpha) is transmitted by the connected neuron. The function f represents the nonlinear activation function used throughout the network, and the bias b accounts for the neuron’s activation threshold.
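As a concrete illustration (not H2O code), the weighted combination and activation above can be computed in a few lines of base R. The weights, bias, and inputs below are made-up values chosen only for demonstration:

```r
# A single neuron: weighted combination of inputs plus bias,
# passed through a nonlinear activation function f (here tanh).
neuron <- function(x, w, b, f = tanh) {
  alpha <- sum(w * x) + b   # alpha = sum_i w_i x_i + b
  f(alpha)                  # output signal f(alpha)
}

x <- c(0.5, -1.2, 2.0)      # input signals
w <- c(0.4,  0.3, -0.1)     # synaptic weights
b <- 0.1                    # bias (activation threshold)
neuron(x, w, b)
```

Here f(\alpha) = \tanh(\alpha) is one common choice of activation; H2O also supports other nonlinearities, discussed later in this document.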

Neural net

Multi-layer, feedforward neural networks consist of many layers of interconnected neuron units: beginning with an input layer to match the feature space; followed by multiple layers of nonlinearity; and terminating with a linear regression or classification layer to match the output space. The inputs and outputs of the model’s units follow the basic logic of the single neuron described above. Bias units are included in each non-output layer of the network. The weights linking neurons and biases with other neurons fully determine the output of the entire network, and learning occurs when these weights are adapted to minimize the error on labeled training data. More specifically, for each training example j the objective is to minimize a loss function

L(W,B|j)

Here W is the collection \left\{W_i\right\}_{1:N-1}, where W_i denotes the weight matrix connecting layers i and i+1 for a network of N layers; similarly B is the collection \left\{b_i\right\}_{1:N-1}, where b_i denotes the column vector of biases for layer i+1.
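Under these definitions, a forward pass through the network is a short loop: each hidden layer applies f to a weighted combination of the previous layer's outputs, and the final layer is linear. The following base R sketch uses illustrative random weights, not anything produced by H2O:

```r
# Forward pass through a feedforward network of N layers.
# W[[i]] is the weight matrix connecting layers i and i+1;
# B[[i]] is the column vector of biases for layer i+1.
forward <- function(x, W, B, f = tanh) {
  a <- x
  n <- length(W)                      # N - 1 weight matrices for N layers
  for (i in seq_len(n - 1)) {
    a <- f(W[[i]] %*% a + B[[i]])     # hidden layers apply the nonlinearity
  }
  W[[n]] %*% a + B[[n]]               # linear output layer
}

set.seed(1)
W <- list(matrix(rnorm(4 * 3), 4, 3), # input layer (3 units) -> hidden (4 units)
          matrix(rnorm(2 * 4), 2, 4)) # hidden (4 units) -> output (2 units)
B <- list(rnorm(4), rnorm(2))
forward(c(0.5, -1.2, 2.0), W, B)
```

Training then amounts to adapting W and B to reduce the loss L(W,B|j) averaged over the training examples, typically by gradient-based methods.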

This basic framework of multi-layer neural networks can be used to accomplish deep learning tasks. Deep learning architectures are models of hierarchical feature extraction, typically involving multiple levels of nonlinearity. Such models are able to learn useful representations of raw data, and have exhibited high performance on complex data such as images, speech, and text (Bengio 2009).