Abstract
Neural networks are powerful learners that have repeatedly proven to be capable of learning complex functions in many domains. However, this great power is also their greatest weakness; unless the learning process is designed carefully, neural networks often simply overfit the training data and fail to generalize to unseen examples.
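The sketch below illustrates this overfitting behavior concretely. It is a minimal example assuming scikit-learn and NumPy are available; the synthetic data, labels, and network sizes are illustrative choices, not taken from the chapter. Because the labels are purely random, any accuracy above chance on the training set can only come from memorization.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: 100 points with purely random labels, so there is no
# true signal and any "pattern" the network finds is memorization.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A heavily over-parameterized network relative to the amount of data.
net = MLPClassifier(hidden_layer_sizes=(256, 256),
                    max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# Near-perfect training accuracy but chance-level test accuracy:
# the classic signature of overfitting.
print("train accuracy:", net.score(X_train, y_train))
print("test accuracy:", net.score(X_test, y_test))
```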
Notes
1. Computational errors can be ignored by requiring that |w_i| be at least 10⁻⁶ in order for w_i to be considered truly non-zero.
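A minimal sketch of this thresholding rule, assuming the weights are available as a NumPy array (the function name and example values are illustrative):

```python
import numpy as np

def count_nonzero_weights(weights, tol=1e-6):
    """Count weights whose magnitude is at least the tolerance.

    Weights with |w_i| below tol are treated as numerically zero, so that
    tiny values produced by floating-point error are not counted.
    """
    w = np.asarray(weights)
    return int(np.sum(np.abs(w) >= tol))

# Example: only the first two entries count as truly non-zero.
print(count_nonzero_weights([0.5, -1e-3, 1e-9, 0.0]))  # -> 2
```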