3D spatial pyramid: descriptors generation from point clouds for indoor scene classification

Machine Vision and Applications

Abstract

Traditionally, the indoor scene classification problem has been approached from a 2D image recognition point of view. In most visual scene classification systems, a descriptor is generated for the input image to obtain a suitable representation that includes features related to color, shape or spatial information. Techniques based on a spatial pyramid have proven adequate for this step. In recent years, 3D sensors have become widely available, which makes it possible to add new information sources to this framework. In this work we rely on RGB-D data to extend the spatial pyramid approach, with the aim of building descriptors that are more robust to changing lighting conditions. The proposed descriptors are evaluated on the RobotVision@ImageCLEF-2013 benchmark dataset, where they remarkably outperform state-of-the-art 3D local and global descriptors.
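To make the idea of a spatial pyramid over a point cloud concrete, the following is a minimal sketch and not the authors' implementation, whose details are not given in this abstract. It assumes each 3D point already carries a quantised feature index (a hypothetical "word" in a vocabulary of size `vocab_size`), splits the cloud's bounding box into 2^l cells per axis at pyramid level l, and concatenates the per-cell word histograms. The function name, its parameters and the uniform normalisation are illustrative assumptions; the per-level weights used in classic spatial pyramid matching are omitted for brevity.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spatial_pyramid_3d(
    points: Sequence[Point],
    words: Sequence[int],
    vocab_size: int,
    levels: int = 2,
) -> List[float]:
    """Concatenate per-cell word histograms over a pyramid of 3D grids.

    Level l splits the bounding box of the cloud into 2**l cells per axis.
    words[i] is a hypothetical quantised feature index for points[i].
    """
    if len(points) != len(words):
        raise ValueError("points and words must have the same length")
    if not points:
        raise ValueError("empty point cloud")
    if any(w < 0 or w >= vocab_size for w in words):
        raise ValueError("word index outside the vocabulary")

    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Fall back to a unit span on degenerate (flat) axes to avoid dividing by zero.
    spans = [(maxs[a] - mins[a]) or 1.0 for a in range(3)]

    descriptor: List[float] = []
    for level in range(levels + 1):
        n = 2 ** level  # cells per axis at this level
        hist = [0.0] * (n * n * n * vocab_size)
        for p, w in zip(points, words):
            # Cell index per axis, clamped so the maximum coordinate
            # falls into the last cell instead of one past it.
            idx = [
                min(int((p[a] - mins[a]) / spans[a] * n), n - 1)
                for a in range(3)
            ]
            cell = (idx[0] * n + idx[1]) * n + idx[2]
            hist[cell * vocab_size + w] += 1.0
        descriptor.extend(hist)

    # Every point is counted once per level, so this makes the entries sum to 1.
    total = float(len(points)) * (levels + 1)
    return [v / total for v in descriptor]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.2, 0.2, 0.2)]
    d = spatial_pyramid_3d(cloud, [0, 1, 0], vocab_size=2, levels=1)
    print(len(d))
```

With `levels=1` and a two-word vocabulary the descriptor has 2 entries for the whole cloud plus 8 × 2 entries for the octants. A descriptor of this kind could then be passed to a standard classifier; which features, vocabulary and classifier the paper actually uses is not stated in the abstract.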


Notes

  1. http://www.imageclef.org/2013/robot.

Acknowledgments

This work has been partially funded by FEDER funds and the Spanish Government (MICINN) through project TIN2013-46638-C3-3-P and by Consejería de Educación, Cultura y Deportes of the JCCM regional government through project PPII-2014-015-P. Cristina Romero-González is also funded by the MECD grant FPU12/04387, and Jesus Martínez-Gómez is also funded by the JCCM grant POST2014/8171.

Author information

Corresponding author

Correspondence to Cristina Romero-González.

Cite this article

Romero-González, C., Martínez-Gómez, J., García-Varea, I. et al. 3D spatial pyramid: descriptors generation from point clouds for indoor scene classification. Machine Vision and Applications 27, 263–273 (2016). https://doi.org/10.1007/s00138-015-0744-4
