RGB-D Object Recognition: Features, Algorithms, and a Large Scale Benchmark

  • Chapter
Consumer Depth Cameras for Computer Vision

Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))

Abstract

Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technology capable of providing high-quality synchronized videos of both color and depth: the RGB-D (Kinect-style) camera. With its advanced sensing capabilities and potential for mass adoption, this technology represents an opportunity to dramatically improve robotic object recognition, manipulation, navigation, and interaction capabilities. We introduce a large-scale, hierarchical multi-view object dataset collected using an RGB-D camera. The dataset consists of two parts: the RGB-D Object Dataset, containing views of 300 objects organized into 51 categories, and the RGB-D Scenes Dataset, containing 8 video sequences of office and kitchen environments. The dataset has been made publicly available to the research community to enable rapid progress based on this promising technology. We describe the dataset collection procedure and present techniques for RGB-D object recognition and for detecting objects in scenes recorded as RGB-D videos, demonstrating that combining color and depth information substantially improves the quality of results.

Acknowledgements

This work was funded in part by the Intel Science and Technology Center for Pervasive Computing and by ONR MURI grant N00014-07-1-0749.

Author information

Corresponding author

Correspondence to Kevin Lai.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Lai, K., Bo, L., Ren, X., Fox, D. (2013). RGB-D Object Recognition: Features, Algorithms, and a Large Scale Benchmark. In: Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. (eds) Consumer Depth Cameras for Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4640-7_9

  • DOI: https://doi.org/10.1007/978-1-4471-4640-7_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4639-1

  • Online ISBN: 978-1-4471-4640-7

  • eBook Packages: Computer Science, Computer Science (R0)
