Learning Invariant Feature Hierarchies

  • Yann LeCun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7583)

Abstract

Fast visual recognition in the mammalian cortex seems to be a hierarchical process by which the representation of the visual world is transformed in multiple stages from low-level retinotopic features to high-level, global and invariant features, and to object categories. Every single step in this hierarchy seems to be subject to learning. How does the visual cortex learn such hierarchical representations by just looking at the world? How could computers learn such representations from data? Computer vision models that are weakly inspired by the visual cortex will be described. A number of unsupervised learning algorithms to train these models will be presented, which are based on the sparse auto-encoder concept. The effectiveness of these algorithms for learning invariant feature hierarchies will be demonstrated with a number of practical tasks such as scene parsing, pedestrian detection, and object classification.
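For readers who want a concrete picture of the sparse auto-encoder concept mentioned above, the short sketch below trains a single-stage sparse auto-encoder by gradient descent. It is a minimal illustration only: the 256-dimensional inputs, 64 code units, L1 penalty weight, and plain NumPy implementation are assumptions made for this example, not the chapter's actual models, which use convolutional, multi-stage architectures trained on real images.

    import numpy as np

    # Minimal sparse auto-encoder sketch (illustrative assumptions: 256-D inputs,
    # 64 code units, L1 sparsity penalty, plain gradient descent, random data).
    rng = np.random.default_rng(0)
    n_in, n_code, lam, lr = 256, 64, 0.1, 0.01

    W_e = rng.normal(0, 0.1, (n_code, n_in))   # encoder weights
    W_d = rng.normal(0, 0.1, (n_in, n_code))   # decoder weights

    def encode(x):
        # Rectified linear code computed by the trainable encoder.
        return np.maximum(0.0, W_e @ x)

    for step in range(1000):
        x = rng.normal(size=n_in)              # stand-in for an image patch
        z = encode(x)
        x_hat = W_d @ z                        # linear decoder reconstruction
        err = x_hat - x
        # Loss: 0.5 * ||x_hat - x||^2 + lam * ||z||_1  (sparse auto-encoder objective)
        grad_Wd = np.outer(err, z)
        dz = W_d.T @ err + lam * np.sign(z)    # backprop through decoder, plus L1 term
        dz[z <= 0] = 0.0                       # gate by the rectifier
        grad_We = np.outer(dz, x)
        W_d -= lr * grad_Wd
        W_e -= lr * grad_We

Stacking several such stages, each trained on the codes produced by the previous one and interleaved with pooling, yields the kind of invariant feature hierarchy the abstract describes.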

Keywords

Visual Cortex, Sparse Coding, Neural Information Processing Systems, Restricted Boltzmann Machine, Machine Learning Research

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yann LeCun
    1. Courant Institute, New York University, USA
