
Teaching Deep Learners to Generalize

Chapter in: Neural Networks and Deep Learning (Springer, Cham, 2023)

Abstract

Neural networks are powerful learners that have repeatedly proven capable of learning complex functions in many domains. However, this great power is also their greatest weakness: neural networks often simply overfit the training data unless the learning process is designed with care.
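To make that claim concrete, here is a minimal sketch (not taken from the chapter; the dataset, network width, and regularization strengths are arbitrary choices for illustration) using scikit-learn: an over-parameterized network fit on a small, noisy sample typically scores near-perfectly on the training data while doing noticeably worse on held-out data, and a stronger penalty on the weights is one standard way to shrink that gap.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small, noisy synthetic problem: easy to memorize, harder to generalize.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for alpha in (1e-5, 1.0):  # weak vs. strong L2 penalty on the weights
    net = MLPClassifier(hidden_layer_sizes=(256, 256), alpha=alpha,
                        max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    print(f"alpha={alpha:g}  train acc={net.score(X_tr, y_tr):.2f}  "
          f"test acc={net.score(X_te, y_te):.2f}")
```

With the weak penalty the training accuracy is typically close to 1.0 while the held-out accuracy lags well behind it; that gap is the overfitting this chapter addresses.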


Notes

  1. Computational errors can be ignored by requiring that |w_i| be at least 10⁻⁶ in order for w_i to be considered truly non-zero.
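As a quick illustration of that convention, the sketch below (not from the book; the synthetic Laplace-distributed weight matrix and the threshold constant are assumptions for illustration) counts how many weights of a layer remain "truly non-zero" under a 10⁻⁶ cutoff, which is how one would report sparsity after, say, an L1 penalty.

```python
import numpy as np

# Hypothetical weight matrix standing in for a trained layer; in practice
# this would come from the network being inspected.
rng = np.random.default_rng(0)
weights = rng.laplace(scale=0.01, size=(256, 128))
weights[np.abs(weights) < 0.005] = 1e-9  # mimic tiny numerical residues left by the optimizer

THRESHOLD = 1e-6  # |w_i| below this is treated as numerically zero, per the note above
num_nonzero = np.count_nonzero(np.abs(weights) >= THRESHOLD)
sparsity = 1.0 - num_nonzero / weights.size
print(f"non-zero weights: {num_nonzero} of {weights.size} ({sparsity:.1%} sparse)")
```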


Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Aggarwal, C. (2023). Teaching Deep Learners to Generalize. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-031-29642-0_5
