Multipath Convolutional-Recursive Neural Networks for Object Recognition

  • Xiangyang Li
  • Shuqiang Jiang
  • Xinhang Song
  • Luis Herranz
  • Zhiping Shi
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 432)


Extracting good representations from images is essential for many computer vision tasks. While progress in deep learning shows the importance of learning hierarchical features, it is also important to learn features through multiple paths. This paper presents Multipath Convolutional-Recursive Neural Networks (M-CRNNs), a novel scheme that learns image features from multiple paths using models based on a combination of convolutional and recursive neural networks (CNNs and RNNs). CNNs learn low-level features, and RNNs, whose inputs are the outputs of the CNNs, learn efficient high-level features. The final features of an image are the combination of the features from all the paths. The results show that the features learned by M-CRNNs form a highly discriminative image representation that increases precision in object recognition.
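The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the paths differ, how the weights are obtained, or how the RNN merges its inputs. The sketch assumes that each path is a (filter size, pooling size, number of filters) triple, that all weights are random placeholders rather than learned, and that the RNN merges 2x2 blocks of neighbouring vectors with one shared weight matrix. All function and parameter names are hypothetical.

```python
import math
import random

def conv_valid(image, kernel):
    """Valid 2-D cross-correlation of a square image with a square kernel."""
    n, k = len(image), len(kernel)
    out = n - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out)] for i in range(out)]

def avg_pool(fmap, size):
    """Non-overlapping average pooling over size x size windows."""
    m = len(fmap) // size
    return [[sum(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size)) / size ** 2
             for j in range(m)] for i in range(m)]

def cnn_layer(image, kernels, pool):
    """Low-level stage: returns a grid whose cells are K-dimensional vectors."""
    maps = [avg_pool(conv_valid(image, k), pool) for k in kernels]
    m = len(maps[0])
    return [[[fm[i][j] for fm in maps] for j in range(m)] for i in range(m)]

def rnn_merge(grid, W):
    """High-level stage: merge 2x2 blocks of child vectors into one parent,
    parent = tanh(W . concat(children)), reusing W at every level until a
    single vector remains. W has K rows and 4K columns (an assumption)."""
    while len(grid) > 1:
        m = len(grid) // 2
        merged = []
        for i in range(m):
            row = []
            for j in range(m):
                children = (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
                            + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1])
                row.append([math.tanh(sum(w * c for w, c in zip(w_row, children)))
                            for w_row in W])
            merged.append(row)
        grid = merged
    return grid[0][0]

def mcrnn_features(image, paths, seed=0):
    """Concatenate the root vectors of all paths into one feature vector.
    Weights are random placeholders here, not trained parameters."""
    rng = random.Random(seed)
    features = []
    for ksize, pool, num_filters in paths:
        kernels = [[[rng.gauss(0, 1) for _ in range(ksize)] for _ in range(ksize)]
                   for _ in range(num_filters)]
        W = [[rng.gauss(0, 0.1) for _ in range(4 * num_filters)]
             for _ in range(num_filters)]
        features += rnn_merge(cnn_layer(image, kernels, pool), W)
    return features

# Toy 18x18 image and two hypothetical paths: (filter size, pool size, filters).
rng = random.Random(1)
image = [[rng.random() for _ in range(18)] for _ in range(18)]
paths = [(3, 2, 4), (7, 3, 6)]
features = mcrnn_features(image, paths)
print(len(features))  # 10: 4 values from the first path, 6 from the second
```

The path settings are chosen so that each pooled grid has a power-of-two side (8 and 4), which lets the 2x2 merging reduce it to exactly one vector. In practice the concatenated vector would be passed to a classifier, a step this sketch omits.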


Keywords: multiple paths, convolutional neural networks, recursive neural networks, classification



Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  • Xiangyang Li (1, 2)
  • Shuqiang Jiang (2)
  • Xinhang Song (2)
  • Luis Herranz (2)
  • Zhiping Shi (1)
  1. College of Information Engineering, Capital Normal University, Beijing, China
  2. Key Lab of Intelligent Information Processing, Institute of Computing Tech., Beijing, China
