RGB-D based place representation in topological maps

  • Special Issue Paper

Abstract

With recent developments in sensor technology, including the Microsoft Kinect, it has become much easier to augment visual data with three-dimensional depth information. In this paper, we propose a new approach to RGB-D based topological place representation, building on bubble space. While the bubble space representation is in principle transparent to the type and number of sensory inputs employed, in practice this has only been verified with visual data acquired either via a two-degrees-of-freedom camera head or an omnidirectional camera. The primary contribution of this paper is of a practical nature in this respect. We show that the bubble space representation can easily be used to combine RGB and depth data while affording acceptable recognition performance, even with limited field-of-view sensing and simple features.

Notes

  1. Of course, in future work, we will consider unsupervised learning.

References

  1. Bay, H., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 346–359 (2008)

  2. Blumenthal, S., Prassler, E., Fischer, J., Nowak, W.: Towards identification of best practice algorithms in 3D perception and modeling. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 3554–3561 (2011)

  3. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (FPFH) for 3D registration. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1848–1853 (2009)

  4. Boros, E., Gînsca, A.L., Iftene, A.: UAIC participation at Robot Vision @ 2012: an updated vision. In: CLEF (Online Working Notes/Labs/Workshop) (2012)

  5. Bosch, A., Zisserman, A., Munoz, X.: Scene classification using a hybrid generative-discriminative approach. IEEE Trans. Pattern Anal. Mach. Intell. 30(4), 712–727 (2008)

  6. Bozma, H.I., Çakiroglu, G., Soyer, C.: Biologically inspired Cartesian and non-Cartesian filters for attentional sequences. Pattern Recognit. Lett. 24(9–10), 1261–1274 (2003)

  7. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011)

  8. Cummins, M., Newman, P.: Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 30(9), 1100–1123 (2011)

  9. Davison, A., Reid, I., Molton, N., Stasse, O.: MonoSLAM: real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1052–1067 (2007)

  10. Erkent, O., Bozma, H.I.: Place representation in topological maps based on bubble space. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 3497–3502 (2012)

  11. Erkent, O., Bozma, H.I.: Bubble space and place representation in topological maps. Int. J. Robot. Res. 32(6), 671–688 (2013)

  12. Fazl-Ersi, E., Tsotsos, J.K.: Histogram of oriented uniform patterns for robust place recognition and categorization. Int. J. Robot. Res. 31(4), 468–483 (2012)

  13. Fraundorfer, F., Engels, C., Nister, D.: Topological mapping, localization and navigation using image collections. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3872–3877 (2007)

  14. Granström, K., Schön, T., Nieto, J., Ramos, F.: Learning to close loops from range data. Int. J. Robot. Res. 30(14), 1–27 (2011)

  15. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of the Alvey Vision Conference, pp. 147–151 (1988)

  16. Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGB-D mapping: using Kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 31(5), 647–663 (2012)

  17. ImageCLEF 2012 Robot Vision Task dataset. http://www.imageclef.org/2012/robot

  18. Jogan, M., Leonardis, A.: Robust localization using panoramic view-based recognition. In: Proceedings of 15th International Conference on Pattern Recognition, vol. 4, pp. 136–139 (2000)

  19. Konolige, K., Bowman, J., Chen, J., Mihelich, P., Calonder, M., Lepetit, V., Fua, P.: View-based maps. Int. J. Robot. Res. 29(8), 941–957 (2010)

  20. Kyushu University Kinect place recognition database. http://robotics.ait.kyushu-u.ac.jp/research-e.php?content=db

  21. Lamon, P., Nourbakhsh, I., Jensen, B., Siegwart, R.: Deriving and matching image fingerprint sequences for mobile robot localization. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1609–1610 (2001)

  22. Larson, A.M., Loschky, L.C.: The contributions of central versus peripheral vision to scene gist recognition. J. Vis. 9(10), 1–16 (2009)

  23. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2169–2178 (2006)

  24. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  25. Magnusson, M., Andreasson, H., Nuchter, A., Lilienthal, A.: Appearance-based loop detection from 3D laser data using the normal distributions transform. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 23–28 (2009)

  26. Martínez-Gómez, J., García-Varea, I., Caputo, B.: Overview of the ImageCLEF 2012 Robot Vision Task. In: CLEF 2012 Evaluation Labs and Workshop, Online Working Notes (2012)

  27. Martínez-Gómez, J., García-Varea, I., Caputo, B.: Baseline multimodal place classifier for the 2012 robot vision task. In: CLEF (Online Working Notes/Labs/Workshop) (2012)

  28. Mozos, O., Burgard, W.: Supervised learning of topological maps using semantic information extracted from range data. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 2722–2777 (2006)

  29. Mozos, O.M., Mizutani, H., Kurazume, R., Hasegawa, T.: Categorization of indoor places using the Kinect sensor. Sensors (Basel, Switzerland) 12(5), 6695–6711 (2012)

  30. Mozos, O.M., Rottmann, A., Triebel, R., Jensfelt, P., Burgard, W.: Semantic labeling of places using information extracted from laser and vision sensor data. In: Proceedings of IEEE IROS Workshop: From sensors to human spatial concepts (2006)

  31. Mozos, O.M., Triebel, R., Jensfelt, P., Rottmann, A., Burgard, W.: Supervised semantic labeling of places using information extracted from sensor data. Robot. Auton. Syst. 55(5), 391–402 (2007)

  32. Murillo, A., Guerrero, J., Sagues, C.: SURF features for efficient robot localization with omnidirectional images. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 3901–3907 (2007)

  33. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) Computer Vision-ECCV 2006. Lecture notes in computer science, vol. 3954, pp. 490–503. Springer, Berlin (2006)

  34. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  35. Perronnin, F., Liu, Y., Sanchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3384–3391 (2010)

  36. Pronobis, A., Caputo, B.: Confidence-based cue integration for visual place recognition. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2394–2401 (2007)

  37. Pronobis, A., Martinez Mozos, O., Caputo, B.: SVM-based discriminative accumulation scheme for place recognition. In: IEEE International Conference on Robotics and Automation, pp. 522–529 (2008)

  38. Pronobis, A., Martinez Mozos, O., Caputo, B., Jensfelt, P.: Multi-modal semantic place classification. Int. J. Robot. Res. 29(2–3), 298–320 (2010)

  39. Qiu, G.: Indexing chromatic and achromatic patterns for content-based colour image retrieval. Pattern Recognit. 35(8), 1675–1686 (2002)

  40. Redolfi, J., Sánchez, J.: Leveraging robust signatures for mobile robot semantic localization. In: CLEF (Online Working Notes/Labs/Workshop) (2012)

  41. Shi, L., Kodagoda, S., Dissanayake, G.: Laser range data based semantic labeling of places. In: Proceedings of IEEE International Conference on Intelligent Robots and Systems, pp. 5941–5946 (2010)

  42. Shi, L., Kodagoda, S., Ranasinghe, R.: Fast indoor scene classification using 3D point clouds. In: Proceedings of Australasian Conference on Robotics and Automation (2011)

  43. Smith, M., Posner, I., Newman, P.: Adaptive compression for 3D laser data. Int. J. Robot. Res. 30(7), 914–935 (2011)

  44. Smola, A.J., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.): Probabilistic outputs for support vector machines. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press, Cambridge, MA (1999)

  45. Sousa, P., Araújo, R., Nunes, U.: Real-time labeling of places using support vector machines. In: IEEE International Symposium on Industrial Electronics, pp. 2022–2027 (2007)

  46. Steder, B., Grisetti, G., Burgard, W.: Robust place recognition for 3D range data based on point features. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1400–1405 (2010)

  47. Steder, B., Ruhnke, M., Grzonka, S., Burgard, W.: Place recognition in 3D scans using a combination of bag of words and point feature based relative pose estimation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1249–1255 (2011)

  48. Tolstov, G.P.: Fourier series. Prentice-Hall, New Jersey (1962)

  49. Torralba, A., Murphy, K.P., Freeman, W.T., Rubin, M.A.: Context-based vision system for place and object recognition. In: IEEE International Conference on Computer Vision, vol. 1, p. 273 (2003)

  50. Ulrich, I., Nourbakhsh, I.: Appearance-based place recognition for topological localization. Proc. IEEE Int. Conf. Robot. Autom. 2, 1023–1029 (2000)

  51. Vasudevan, S., Gächter, S., Nguyen, V.: Cognitive maps for mobile robots: an object based approach. Robot. Auton. Syst. 55(5), 359–371 (2007)

  52. Wang, L., Chen, J., Yuan, B.: Simplified representation for 3D point cloud data. In: IEEE International Conference on Signal Processing, pp. 1271–1274 (2010)

  53. Wang, M.L., Lin, H.Y.: An extended-HCT semantic description for visual place recognition. Int. J. Robot. Res. 30(11), 1403–1420 (2011)

  54. Weiss, C., Tamimi, H., Masselli, A., Zell, A.: A hybrid approach for vision-based outdoor robot localization using global and local image features. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1047–1052 (2007)

  55. Williams, B., Cummins, M., Neira, J., Newman, P., Reid, I., Tardós, J.: A comparison of loop closing techniques in monocular SLAM. Robot. Auton. Syst. 57(12), 1188–1197 (2009)

  56. Wolf, J., Burgard, W., Burkhardt, H.: Robust vision-based localization by combining an image-retrieval system with Monte Carlo localization. IEEE Trans. Robot. 21(2), 208–216 (2005)

  57. Xing, L., Pronobis, A.: Multi-cue discriminative place recognition. In: Multilingual Information Access Evaluation II: Multimedia Experiments, pp. 315–323. Springer, New York (2010)

Acknowledgments

This work has been supported in part by Bogazici University BAP Project 5720, Tubitak Project EEAG 111E285 and the Turkish State Planning Organization (DPT) under the TAM Project, number 2007K120610.

Author information

Corresponding author

Correspondence to Hakan Karaoğuz.

Appendices

Appendix A: ImageCLEF performance

In this section, for completeness, the experimental results are evaluated based on the scoring used in the ImageCLEF 2012 challenge [26]. In particular, the score is computed as follows. First, it is initialized to zero. Each correctly classified frame increments the score by one, while each misclassified frame decrements it by one. Unclassified frames have no effect on the score. Learning is varied by considering daylight data only, night data only and both, while all testing is done using separate test data acquired in night conditions. The experiments are repeated six times with \( \tau \in \left[ 0.4, 0.9\right] \) in increments of \(0.1\). The resulting scores are given in Fig. 11. As discussed previously in Sect. 6, the best overall results and scores are achieved using night data in learning. The confidence parameter is optimal for values \(0.5 < \tau < 0.7\). The integrated approach has the highest success rates for all combinations, with visual data being the next best except in the case of daylight-only learning. Using depth-only features gives the worst results, as expected. With a maximum score of 874 for \(\mathcal{L}_4\) and 1,133 for \(\mathcal{L}_5\), the RGB-D based bubble space representations rank \(7\)th and \(5\)th, respectively.
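For concreteness, the sketch below implements this scoring rule in Python. Treating \( \tau \) as the cut-off below which a frame is left unclassified follows its description above as a confidence parameter, and the frame labels and confidence values in the example are hypothetical; this is an illustrative sketch, not the evaluation code used in the experiments.

```python
# Minimal sketch of the ImageCLEF 2012 scoring rule described above.
# Assumption: a frame whose confidence falls below the threshold tau is
# left unclassified and therefore does not affect the score.

def imageclef_score(predictions, ground_truth, confidences, tau=0.6):
    """+1 per correctly classified frame, -1 per misclassified frame,
    0 for frames left unclassified (confidence below tau)."""
    score = 0
    for pred, true, conf in zip(predictions, ground_truth, confidences):
        if conf < tau:              # unclassified: no effect on the score
            continue
        score += 1 if pred == true else -1
    return score


if __name__ == "__main__":
    # Hypothetical frames: sweep tau over [0.4, 0.9] in increments of 0.1,
    # mirroring the experimental setup described above.
    preds = ["corridor", "office", "office", "kitchen"]
    truth = ["corridor", "office", "kitchen", "kitchen"]
    confs = [0.9, 0.7, 0.5, 0.3]
    for tau in [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
        print(f"tau={tau:.1f}  score={imageclef_score(preds, truth, confs, tau)}")
```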

Fig. 11

ImageCLEF 2012 scores for the BuS approach. Left-to-right: results for daylight-only, night-only and combined daylight–night learning with varying feature sets: top row \({\mathcal L}_1\); second row \({\mathcal L}_2\); third row \({\mathcal L}_3\); fourth row \({\mathcal L}_4\); last row \({\mathcal L}_5\)

Appendix B: Variations in learning and testing

As explained previously, all the experiments are done using the officially provided test sequence so that our results can be compared with those of the ImageCLEF 2012 challenge. In this section, we consider varying the learning and test sets with respect to the robot's path or the illumination conditions.

First, we consider learning (training set 1) and test (training set 2) data that differ in the robot's path while having the same illumination conditions, namely daylight. The results are given in Fig. 12. As expected, the results with depth-only features \({\mathcal L}_3\) are the worst due to the dependency of these data on local geometry. Using only visual features \({\mathcal L}_1\) and \({\mathcal L}_2\) leads to significantly better performance. Integrated vision–depth sensing gives the best results, although they are very close to those of vision-only sensing. As expected, the additional features in \({\mathcal L}_2\) and \({\mathcal L}_5\) improve the results compared to \({\mathcal L}_1\) and \({\mathcal L}_4\), although the improvement is modest. When these results are compared with those of Sect. 6, the performance is observed to be close to that with night learning and testing. This is expected since the illumination conditions are the same in both cases.
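The sketch below illustrates how precision–recall curves such as those in Figs. 12 and 13 can be traced by sweeping the confidence threshold \( \tau \). The definitions used here, precision over the classified frames and recall over all frames, are a common convention for classifiers that may abstain; the paper's exact definitions may differ, and all data in the usage example are hypothetical.

```python
# Hedged sketch: tracing a precision-recall curve by sweeping the
# confidence threshold tau over a set of per-frame predictions.
import numpy as np


def precision_recall_curve(pred, truth, conf, taus):
    """For each tau, frames with confidence >= tau are classified;
    precision is computed over classified frames, recall over all frames."""
    pred, truth, conf = map(np.asarray, (pred, truth, conf))
    points = []
    for tau in taus:
        classified = conf >= tau
        correct = classified & (pred == truth)
        precision = correct.sum() / max(classified.sum(), 1)
        recall = correct.sum() / len(truth)
        points.append((recall, precision))
    return points


# Usage with hypothetical data: one curve per feature set
# (e.g. visual-only vs. integrated RGB-D features).
taus = np.arange(0.4, 0.91, 0.1)
curve = precision_recall_curve(
    ["corridor", "office", "kitchen", "office"],
    ["corridor", "office", "office", "office"],
    [0.9, 0.6, 0.45, 0.8],
    taus,
)
print(curve)
```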

Fig. 12

Precision–recall curves for training 1 vs. training 2 data with varying feature sets. Left for \({\mathcal L}_1\), \({\mathcal L}_3\) and \({\mathcal L}_4\); right for \({\mathcal L}_2\), \({\mathcal L}_3\) and \({\mathcal L}_5\)

Similar experiments are repeated by varying the illumination conditions for learning and testing. Learning is done with night data using training set 3, while testing is done with daylight data using training set 1, which is in contrast to the official test data. In both cases, the robot follows a similar path. The results are given in Fig. 13. In this case, it is observed that when only visual features are used, the extended set does not contribute much to performance and may even be misleading. Another interesting observation is that depth-only sensing, despite being independent of illumination conditions, does not gain an edge over visual sensing. The limited field of view and depth range of RGB-D sensors possibly increase the sensitivity of depth data to local geometry compared to traditional 2D laser range scanners. Using depth sensors with a wider field of view and higher resolution would probably improve performance, but for now, visual sensing continues to be the primary sensing modality for place recognition with robots. Compared to the results in Sect. 6, the performance is even worse than with daylight learning. We attribute this to the limited visual information in night data, which does not allow generalization. The rich sensory information from visual data is lost when learning is done in night conditions. As observed earlier, learning plays a critical role in recognition performance.

Fig. 13

Precision–recall curves for training 3 vs. training 1 data with varying feature sets. Left for \({\mathcal L}_1\), \({\mathcal L}_3\) and \({\mathcal L}_4\); right for \({\mathcal L}_2\), \({\mathcal L}_3\) and \({\mathcal L}_5\)

About this article

Cite this article

Karaoğuz, H., Erkent, Ö. & Bozma, H.I. RGB-D based place representation in topological maps. Machine Vision and Applications 25, 1913–1927 (2014). https://doi.org/10.1007/s00138-014-0595-4

Keywords

Navigation