Abstract
Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technology, the RGB-D (Kinect-style) camera, which provides high-quality synchronized videos of both color and depth. With its advanced sensing capabilities and its potential for mass adoption, this technology represents an opportunity to dramatically improve robotic object recognition, manipulation, navigation, and interaction capabilities. We introduce a large-scale, hierarchical, multi-view object dataset collected using an RGB-D camera. The dataset consists of two parts: the RGB-D Object Dataset, containing views of 300 objects organized into 51 categories, and the RGB-D Scenes Dataset, containing 8 video sequences of office and kitchen environments. The dataset has been made publicly available to the research community to enable rapid progress based on this promising technology. We describe the dataset collection procedure and present techniques for RGB-D object recognition and for detecting objects in scenes recorded as RGB-D videos, demonstrating that combining color and depth information substantially improves the quality of results.
Acknowledgements
This work was funded in part by the Intel Science and Technology Center for Pervasive Computing and by ONR MURI grant N00014-07-1-0749.
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Lai, K., Bo, L., Ren, X., Fox, D. (2013). RGB-D Object Recognition: Features, Algorithms, and a Large Scale Benchmark. In: Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. (eds) Consumer Depth Cameras for Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4640-7_9
DOI: https://doi.org/10.1007/978-1-4471-4640-7_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4639-1
Online ISBN: 978-1-4471-4640-7
eBook Packages: Computer Science (R0)