Abstract
Neural networks are powerful learners that have repeatedly proven to be capable of learning complex functions in many domains. However, this great power is also their greatest weakness; unless the learning process is designed carefully, neural networks often simply overfit the training data and fail to generalize to unseen examples.
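The sketch below illustrates this overfitting behavior concretely. It is a minimal example assuming scikit-learn and NumPy are available; the synthetic data, labels, and network sizes are illustrative choices, not taken from the chapter. Because the labels are purely random, any accuracy above chance on the training set can only come from memorization.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: 100 points with purely random labels, so there is no
# true signal and any "pattern" the network finds is memorization.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A heavily over-parameterized network relative to the amount of data.
net = MLPClassifier(hidden_layer_sizes=(256, 256),
                    max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# Near-perfect training accuracy but chance-level test accuracy:
# the classic signature of overfitting.
print("train accuracy:", net.score(X_train, y_train))
print("test accuracy:", net.score(X_test, y_test))
```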
Notes
1. Computational errors can be ignored by requiring that |w_i| be at least 10⁻⁶ in order for w_i to be considered truly non-zero.
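A minimal sketch of this thresholding rule, assuming the weights are available as a NumPy array (the function name and example values are illustrative):

```python
import numpy as np

def count_nonzero_weights(weights, tol=1e-6):
    """Count weights whose magnitude is at least the tolerance.

    Weights with |w_i| below tol are treated as numerically zero, so that
    tiny values produced by floating-point error are not counted.
    """
    w = np.asarray(weights)
    return int(np.sum(np.abs(w) >= tol))

# Example: only the first two entries count as truly non-zero.
print(count_nonzero_weights([0.5, -1e-3, 1e-9, 0.0]))  # -> 2
```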