Appendix B: References
Learning Deep Architectures for AI. Bengio, Yoshua, 2009.
Efficient BackProp. LeCun et al., 1998.
Maxout Networks. Goodfellow et al., 2013.
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Niu et al., 2011.
Improving neural networks by preventing co-adaptation of feature detectors. Hinton et al., 2012.
On the importance of initialization and momentum in deep learning. Sutskever et al., 2013.
ADADELTA: An Adaptive Learning Rate Method. Zeiler, 2012.
H2O GitHub repository for the H2O Deep Learning documentation.
Reducing the Dimensionality of Data with Neural Networks. Hinton et al., 2006.
The Definitive Performance Tuning Guide for H2O Deep Learning. Candel, Arno, 2015.