Machine Vision and Applications, Volume 27, Issue 2, pp 263–273

3D spatial pyramid: descriptors generation from point clouds for indoor scene classification

  • Cristina Romero-González
  • Jesus Martínez-Gómez
  • Ismael García-Varea
  • Luis Rodríguez-Ruiz
Original Paper

Abstract

Traditionally, the indoor scene classification problem has been approached from a 2D image recognition point of view. In most visual scene classification systems, a descriptor is generated for the input image to obtain a suitable representation that includes features related to color, shape or spatial information. Techniques based on a spatial pyramid have proven adequate for this step. In recent years, however, 3D sensors have become widely available, making it possible to add new information sources to this framework. In this work we rely on RGB-D data to extend the spatial pyramid approach, with the aim of building descriptors that yield a representation more robust to changing lighting conditions. The proposed descriptors are evaluated on the RobotVision@ImageCLEF-2013 benchmark dataset, where they substantially outperform state-of-the-art 3D local and global descriptors.
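The abstract does not spell out how the spatial pyramid is carried over to 3D. As a hedged illustration of the general idea only, and not the authors' method, the sketch below assumes that at pyramid level l the bounding box of the point cloud is split into 2^l cells per axis, that every point carries a hypothetical integer label (for example a quantised local-feature id), and that each cell stores a histogram of those labels. The function name `spatial_pyramid_3d` and its parameters `labels`, `n_bins` and `levels` are hypothetical. Per-level normalisation by the point count is also an assumption; the level weighting used by the original 2D spatial pyramid is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spatial_pyramid_3d(points: Sequence[Point], labels: Sequence[int],
                       n_bins: int, levels: int) -> List[float]:
    """Hypothetical 3D spatial pyramid descriptor (illustrative sketch only).

    At level l the axis-aligned bounding box of the cloud is split into
    2**l cells per axis (8**l cells in total). Each cell holds a histogram
    of per-point labels; every level is divided by the number of points
    (an assumption), and all cell histograms are concatenated.
    """
    if not points or len(points) != len(labels):
        raise ValueError("need a non-empty cloud with one label per point")
    if any(not 0 <= lab < n_bins for lab in labels):
        raise ValueError("labels must lie in [0, n_bins)")

    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    n = float(len(points))
    descriptor: List[float] = []

    for level in range(levels):
        k = 2 ** level  # cells per axis at this level
        hist = [[0.0] * n_bins for _ in range(k ** 3)]
        for p, lab in zip(points, labels):
            idx = []
            for a in range(3):
                extent = maxs[a] - mins[a]
                # Map the coordinate to a cell index; points on the max face go to the last cell.
                c = 0 if extent == 0 else min(int((p[a] - mins[a]) / extent * k), k - 1)
                idx.append(c)
            cell = (idx[0] * k + idx[1]) * k + idx[2]  # flatten (x, y, z) cell index
            hist[cell][lab] += 1.0
        for h in hist:
            descriptor.extend(v / n for v in h)
    return descriptor


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 1.0)]
    desc = spatial_pyramid_3d(cloud, labels=[0, 1, 0, 1], n_bins=2, levels=2)
    print(len(desc))  # 2 bins * (1 + 8) cells = 18
```

The descriptor length grows as n_bins times the sum of 8^l over the levels, so the number of 3D levels has to stay small compared with a 2D pyramid, where each level only multiplies the cell count by 4.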

Keywords

RGB-D images; 3D spatial pyramid; Indoor scene classification; Feature extraction; Descriptor generation

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Cristina Romero-González (1)
  • Jesus Martínez-Gómez (1, 2)
  • Ismael García-Varea (1)
  • Luis Rodríguez-Ruiz (1)
  1. Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha, Albacete, Spain
  2. Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Alicante, Alicante, Spain
