Higher Order Contractive Auto-Encoder

  • Salah Rifai
  • Grégoire Mesnil
  • Pascal Vincent
  • Xavier Muller
  • Yoshua Bengio
  • Yann Dauphin
  • Xavier Glorot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6912)


We propose a novel regularizer for training an auto-encoder for unsupervised feature extraction. We explicitly encourage the latent representation to contract the input space by regularizing the norm of the Jacobian (analytically) and of the Hessian (stochastically) of the encoder's output with respect to its input, at the training points. The penalty on the Jacobian's norm ensures robustness to tiny corruptions of samples in the input space. Constraining the norm of the Hessian extends this robustness to points farther away from the sample. From a manifold learning perspective, balancing this regularization against the auto-encoder's reconstruction objective yields a representation that varies most when moving along the data manifold in input space, and is most insensitive in directions orthogonal to the manifold. The second-order regularization, using the Hessian, penalizes curvature and thus favors a smooth manifold. We show that our proposed technique remains computationally efficient. It also yields representations that are significantly better suited for initializing deep architectures than previously proposed approaches, beating state-of-the-art performance on a number of datasets.
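The two penalties described above can be illustrated with a short NumPy sketch. It is not the authors' code and rests on several assumptions. The encoder is taken to be a sigmoid layer h(x) = sigmoid(W x + b), whose Jacobian is diag(h(1 - h)) W. For this encoder, the squared Frobenius norm of the Jacobian has a closed form. The Hessian penalty is approximated stochastically as the mean of ||J(x) - J(x + eps)||_F^2 over a few Gaussian corruptions eps ~ N(0, sigma^2 I). The helper names (`jacobian_penalty`, `hessian_penalty`, `cae_h_loss`), the weights `lam` and `gamma`, the noise scale `sigma`, the count `n_corrupt`, the tied-weight decoder, and the squared-error reconstruction are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def encode(W, b, x):
    """Assumed sigmoid encoder h(x) = sigmoid(W x + b); W has shape (d_h, d_x)."""
    return sigmoid(W @ x + b)


def jacobian(W, b, x):
    """Analytic Jacobian dh/dx = diag(h * (1 - h)) W, shape (d_h, d_x)."""
    h = encode(W, b, x)
    return (h * (1.0 - h))[:, None] * W


def jacobian_penalty(W, b, x):
    """||J(x)||_F^2 without forming J: sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2."""
    h = encode(W, b, x)
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))


def hessian_penalty(W, b, x, sigma=0.1, n_corrupt=4, rng=None):
    """Stochastic proxy for the Hessian norm (hypothetical helper): mean of
    ||J(x) - J(x + eps)||_F^2 over n_corrupt draws of eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng() if rng is None else rng
    J = jacobian(W, b, x)
    total = 0.0
    for _ in range(n_corrupt):
        eps = rng.normal(scale=sigma, size=x.shape)
        total += np.sum((J - jacobian(W, b, x + eps)) ** 2)
    return float(total / n_corrupt)


def cae_h_loss(W, b, b_dec, x, lam=0.1, gamma=0.1, sigma=0.1, n_corrupt=4, rng=None):
    """Reconstruction error plus both contraction penalties.
    Assumes a tied-weight sigmoid decoder and squared-error reconstruction."""
    h = encode(W, b, x)
    r = sigmoid(W.T @ h + b_dec)
    rec = np.sum((x - r) ** 2)
    return float(rec
                 + lam * jacobian_penalty(W, b, x)
                 + gamma * hessian_penalty(W, b, x, sigma, n_corrupt, rng))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(5, 8))
    b, b_dec = np.zeros(5), np.zeros(8)
    x = rng.uniform(size=8)
    print(cae_h_loss(W, b, b_dec, x, rng=rng))
```

The closed form for the Jacobian norm is what keeps the first penalty cheap: it costs about as much as one forward pass. The Hessian is never formed. Its norm is estimated from Jacobian differences at nearby corrupted points, so smaller `sigma` and larger `n_corrupt` give a more local, lower-variance estimate at extra cost.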


Keywords: unsupervised feature learning, deep learning, manifold

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Salah Rifai (1)
  • Grégoire Mesnil (1, 2)
  • Pascal Vincent (1)
  • Xavier Muller (1)
  • Yoshua Bengio (1)
  • Yann Dauphin (1)
  • Xavier Glorot (1)

  1. Dept. IRO, Université de Montréal, Montréal, Canada
  2. LITIS EA 4108, Université de Rouen, Saint Etienne du Rouvray, France
