Unsupervised Feature Learning for RGB-D Based Object Recognition

  • Liefeng Bo
  • Xiaofeng Ren
  • Dieter Fox
Part of the Springer Tracts in Advanced Robotics book series (STAR, volume 88)


Recently introduced RGB-D cameras are capable of providing high-quality synchronized videos of both color and depth. With its advanced sensing capabilities, this technology represents an opportunity to dramatically increase the capabilities of object recognition. It also raises the problem of developing expressive features for the color and depth channels of these sensors. In this paper, we introduce hierarchical matching pursuit (HMP) for RGB-D data. HMP uses sparse coding to learn hierarchical feature representations from raw RGB-D data in an unsupervised way. Extensive experiments on various datasets indicate that the features learned with our approach enable superior object recognition results using linear support vector machines.
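To make the sparse coding step concrete, the following is a minimal sketch of generic orthogonal matching pursuit, a standard greedy sparse coder, applied to a single flattened patch. It is not the authors' implementation: the function names (`omp`, `solve_least_squares`), the toy four-atom dictionary, and the sparsity level are all assumptions chosen for illustration, and a real system would learn the dictionary from data rather than hand-pick it.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_least_squares(atoms, y):
    """Coefficients c minimizing ||y - sum_j c[j] * atoms[j]||^2.

    Solves the normal equations with Gaussian elimination; this assumes
    the chosen atoms are linearly independent.
    """
    k = len(atoms)
    # Augmented system [G | b] with G[i][j] = <atom_i, atom_j>, b[i] = <atom_i, y>.
    aug = [[dot(atoms[i], atoms[j]) for j in range(k)] + [dot(atoms[i], y)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (aug[i][k] - tail) / aug[i][i]
    return coeffs


def omp(dictionary, y, sparsity):
    """Greedy sparse code of y: returns {atom_index: coefficient}.

    `dictionary` is a list of unit-norm atoms, each a list of floats.
    """
    residual = list(y)
    selected, coeffs = [], []
    for _ in range(sparsity):
        # Score every atom by its correlation with the current residual.
        scores = [abs(dot(atom, residual)) for atom in dictionary]
        for idx in selected:
            scores[idx] = -1.0  # never pick the same atom twice
        best = max(range(len(dictionary)), key=lambda i: scores[i])
        if scores[best] < 1e-12:
            break  # residual is already explained
        selected.append(best)
        # Orthogonal step: refit all selected coefficients jointly.
        coeffs = solve_least_squares([dictionary[i] for i in selected], y)
        residual = [
            yi - sum(c * dictionary[i][d] for c, i in zip(coeffs, selected))
            for d, yi in enumerate(y)
        ]
    return dict(zip(selected, coeffs))


# Hypothetical toy dictionary: three axis-aligned atoms plus one diagonal atom.
S = 1.0 / math.sqrt(2.0)
DICTIONARY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [S, S, 0.0]]
patch = [2.0, 0.0, 0.5]  # stand-in for a flattened image patch
code = omp(DICTIONARY, patch, sparsity=2)
```

In this toy run, `code` holds two nonzero entries, one for the first atom and one for the third. In a hierarchical pipeline of the kind the abstract describes, such sparse codes would be pooled over image regions and passed on to the next layer or to a linear classifier; that pooling step is not shown here.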


Keywords (machine-generated, not supplied by the authors): Object Recognition, Image Patch, Sparse Code, Orthogonal Match Pursuit, Spatial Pyramid





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  1. University of Washington, Seattle, USA
  2. ISTC-Pervasive Computing, Intel Labs, Seattle, USA
