RGB-D based place representation in topological maps

  • Special Issue Paper

Abstract

With recent developments in sensor technology, including the Microsoft Kinect, it has become much easier to augment visual data with three-dimensional depth information. In this paper, we propose a new approach to RGB-D based topological place representation, building on bubble space. While the bubble space representation is in principle transparent to the type and number of sensory inputs employed, in practice this has only been verified with visual data acquired either via a two-degrees-of-freedom camera head or an omnidirectional camera. The primary contribution of this paper is of a practical nature in this respect. We show that the bubble space representation can easily be used to combine RGB and depth data while affording acceptable recognition performance, even with limited field-of-view sensing and simple features.

Notes

  1. Of course, in future work, we will consider unsupervised learning.

References

  1. Bay, H., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110, 346–359 (2008)

  2. Blumenthal, S., Prassler, E., Fischer, J., Nowak, W.: Towards identification of best practice algorithms in 3D perception and modeling. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 3554–3561 (2011)

  3. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (FPFH) for 3D registration. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1848–1853 (2009)

  4. Boros, E., Gînsca, A.L., Iftene, A.: UAIC participation at Robot Vision @ 2012: an updated vision. In: CLEF (Online Working Notes/Labs/Workshop) (2012)

  5. Bosch, A., Zisserman, A., Munoz, X.: Scene classification using a hybrid generative-discriminative approach. IEEE Trans. Pattern Anal. Mach. Intell. 30(4), 712–727 (2008)

  6. Bozma, H.I., Çakiroglu, G., Soyer, C.: Biologically inspired Cartesian and non-Cartesian filters for attentional sequences. Pattern Recognit. Lett. 24(9–10), 1261–1274 (2003)

  7. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011)

  8. Cummins, M., Newman, P.: Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 30(9), 1100–1123 (2011)

  9. Davison, A., Reid, I., Molton, N., Stasse, O.: MonoSLAM: real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1052–1067 (2007)

  10. Erkent, O., Bozma, H.I.: Place representation in topological maps based on bubble space. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 3497–3502 (2012)

  11. Erkent, O., Bozma, H.I.: Bubble space and place representation in topological maps. Int. J. Robot. Res. 32(6), 671–688 (2013)

  12. Fazl-Ersi, E., Tsotsos, J.K.: Histogram of oriented uniform patterns for robust place recognition and categorization. Int. J. Robot. Res. 31(4), 468–483 (2012)

  13. Fraundorfer, F., Engels, C., Nister, D.: Topological mapping, localization and navigation using image collections. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3872–3877 (2007)

  14. Granström, K., Schön, T., Nieto, J., Ramos, F.: Learning to close loops from range data. Int. J. Robot. Res. 30(14), 1–27 (2011)

  15. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of the Alvey Vision Conference, pp. 147–151 (1988)

  16. Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGB-D mapping: using Kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 31(5), 647–663 (2012)

  17. ImageCLEF 2012 Robot Vision Task dataset. http://www.imageclef.org/2012/robot

  18. Jogan, M., Leonardis, A.: Robust localization using panoramic view-based recognition. In: Proceedings of 15th International Conference on Pattern Recognition, vol. 4, pp. 136–139 (2000)

  19. Konolige, K., Bowman, J., Chen, J., Mihelich, P., Calonder, M., Lepetit, V., Fua, P.: View-based maps. Int. J. Robot. Res. 29(8), 941–957 (2010)

  20. Kyushu University Kinect place recognition database. http://robotics.ait.kyushu-u.ac.jp/research-e.php?content=db

  21. Lamon, P., Nourbakhsh, I., Jensen, B., Siegwart, R.: Deriving and matching image fingerprint sequences for mobile robot localization. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1609–1610 (2001)

  22. Larson, A.M., Loschky, L.C.: The contributions of central versus peripheral vision to scene gist recognition. J. Vis. 9(10), 1–16 (2009)

  23. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2169–2178 (2006)

  24. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  25. Magnusson, M., Andreasson, H., Nuchter, A., Lilienthal, A.: Appearance-based loop detection from 3D laser data using the normal distributions transform. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 23–28 (2009)

  26. Martínez-Gómez, J., García-Varea, I., Caputo, B.: Overview of the ImageCLEF 2012 Robot Vision Task. In: CLEF 2012 Evaluation Labs and Workshop, Online Working Notes (2012)

  27. Martínez-Gómez, J., García-Varea, I., Caputo, B.: Baseline multimodal place classifier for the 2012 robot vision task. In: CLEF (Online Working Notes/Labs/Workshop) (2012)

  28. Mozos, O., Burgard, W.: Supervised learning of topological maps using semantic information extracted from range data. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 2722–2777 (2006)

  29. Mozos, O.M., Mizutani, H., Kurazume, R., Hasegawa, T.: Categorization of indoor places using the Kinect sensor. Sensors (Basel, Switzerland) 12(5), 6695–6711 (2012)

  30. Mozos, O.M., Rottmann, A., Triebel, R., Jensfelt, P., Burgard, W.: Semantic labeling of places using information extracted from laser and vision sensor data. In: Proceedings of IEEE IROS Workshop: From sensors to human spatial concepts (2006)

  31. Mozos, O.M., Triebel, R., Jensfelt, P., Rottmann, A., Burgard, W.: Supervised semantic labeling of places using information extracted from sensor data. Robot. Auton. Syst. 55(5), 391–402 (2007)

  32. Murillo, A., Guerrero, J., Sagues, C.: SURF features for efficient robot localization with omnidirectional images. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 3901–3907 (2007)

  33. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) Computer Vision-ECCV 2006. Lecture notes in computer science, vol. 3954, pp. 490–503. Springer, Berlin (2006)

  34. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  35. Perronnin, F., Liu, Y., Sanchez, J., Poirier, H.: Large-scale image retrieval with compressed fisher vectors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3384–3391 (2010)

  36. Pronobis, A., Caputo, B.: Confidence-based cue integration for visual place recognition. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2394–2401 (2007)

  37. Pronobis, A., Martinez Mozos, O., Caputo, B.: SVM-based discriminative accumulation scheme for place recognition. In: IEEE International Conference on Robotics and Automation, pp. 522–529 (2008)

  38. Pronobis, A., Martinez Mozos, O., Caputo, B., Jensfelt, P.: Multi-modal semantic place classification. Int. J. Robot. Res. 29(2–3), 298–320 (2010)

  39. Qiu, G.: Indexing chromatic and achromatic patterns for content-based colour image retrieval. Pattern Recognit. 35(8), 1675–1686 (2002)

  40. Redolfi, J., Sánchez, J.: Leveraging robust signatures for mobile robot semantic localization. In: CLEF (Online Working Notes/Labs/Workshop) (2012)

  41. Shi, L., Kodagoda, S., Dissanayake, G.: Laser range data based semantic labeling of places. In: Proceedings of IEEE International Conference on Intelligent Robots and Systems, pp. 5941–5946 (2010)

  42. Shi, L., Kodagoda, S., Ranasinghe, R.: Fast indoor scene classification using 3D point clouds. In: Proceedings of Australasian Conference on Robotics and Automation (2011)

  43. Smith, M., Posner, I., Newman, P.: Adaptive compression for 3D laser data. Int. J. Robot. Res. 30(7), 914–935 (2011)

  44. Smola, A.J., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.): Probabilistic outputs for support vector machines. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press, Cambridge, MA (1999)

  45. Sousa, P., Araújo, R., Nunes, U.: Real-time labeling of places using support vector machines. In: IEEE International Symposium on Industrial Electronics, pp. 2022–2027 (2007)

  46. Steder, B., Grisetti, G., Burgard, W.: Robust place recognition for 3D range data based on point features. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1400–1405 (2010)

  47. Steder, B., Ruhnke, M., Grzonka, S., Burgard, W.: Place recognition in 3D scans using a combination of bag of words and point feature based relative pose estimation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1249–1255 (2011)

  48. Tolstov, G.P.: Fourier series. Prentice-Hall, New Jersey (1962)

  49. Torralba, A., Murphy, K.P., Freeman, W.T., Rubin, M.A.: Context-based vision system for place and object recognition. In: IEEE International Conference on Computer Vision, vol. 1, p. 273 (2003)

  50. Ulrich, I., Nourbakhsh, I.: Appearance-based place recognition for topological localization. Proc. IEEE Int. Conf. Robot. Autom. 2, 1023–1029 (2000)

  51. Vasudevan, S., Gächter, S., Nguyen, V.: Cognitive maps for mobile robots: an object based approach. Robot. Auton. Syst. 55(5), 359–371 (2007)

  52. Wang, L., Chen, J., Yuan, B.: Simplified representation for 3D point cloud data. In: IEEE International Conference on Signal Processing, pp. 1271–1274 (2010)

  53. Wang, M.L., Lin, H.Y.: An extended-HCT semantic description for visual place recognition. Int. J. Robot. Res. 30(11), 1403–1420 (2011)

  54. Weiss, C., Tamimi, H., Masselli, A., Zell, A.: A hybrid approach for vision-based outdoor robot localization using global and local image features. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1047–1052 (2007)

  55. Williams, B., Cummins, M., Neira, J., Newman, P., Reid, I., Tardós, J.: A comparison of loop closing techniques in monocular SLAM. Robot. Auton. Syst. 57(12), 1188–1197 (2009)

  56. Wolf, J., Burgard, W., Burkhardt, H.: Robust vision-based localization by combining an image-retrieval system with Monte Carlo localization. IEEE Trans. Robot. 21(2), 208–216 (2005)

  57. Xing, L., Pronobis, A.: Multi-cue discriminative place recognition. In: Multilingual Information Access Evaluation II: Multimedia Experiments, pp. 315–323. Springer, New York (2010)

Acknowledgments

This work has been supported in part by Bogazici University BAP Project 5720, Tubitak Project EEAG 111E285 and the Turkish State Planning Organization (DPT) under the TAM Project, number 2007K120610.

Author information

Corresponding author

Correspondence to Hakan Karaoğuz.

Appendices

Appendix A: ImageCLEF performance

In this section, for completeness, the experimental results are evaluated based on the scoring used in the ImageCLEF 2012 challenge [26]. In particular, the score is computed as follows. First, it is initialized to zero. Each correctly classified frame increments the score by one, while each misclassified frame decrements it by one. Unclassified frames have no effect on the score. Learning is varied by considering daylight data only, night data only and both, while all testing is done using separate test data acquired in night conditions. The experiments are repeated six times with \( \tau \in \left[ 0.4, 0.9\right] \) in increments of \(0.1\). The resulting scores are given in Fig. 11. As discussed previously in Sect. 6, the best overall results and scores are achieved using night data in learning. The confidence parameter is optimal for values \(0.5 < \tau < 0.7\). The integrated approach has the highest success rates for all combinations, with visual data being the next best except in the case of daylight-only learning. Using depth-only features gives the worst results, as expected. With a maximum score of 874 for \(\mathcal{L}_4\) and 1,133 for \(\mathcal{L}_5\), the RGB-D based bubble space representations rank \(7\)th and \(5\)th, respectively.
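For concreteness, the sketch below implements this scoring rule in Python. Treating \( \tau \) as the cut-off below which a frame is left unclassified follows its description above as a confidence parameter, and the frame labels and confidence values in the example are hypothetical; this is an illustrative sketch, not the evaluation code used in the experiments.

```python
# Minimal sketch of the ImageCLEF 2012 scoring rule described above.
# Assumption: a frame whose confidence falls below the threshold tau is
# left unclassified and therefore does not affect the score.

def imageclef_score(predictions, ground_truth, confidences, tau=0.6):
    """+1 per correctly classified frame, -1 per misclassified frame,
    0 for frames left unclassified (confidence below tau)."""
    score = 0
    for pred, true, conf in zip(predictions, ground_truth, confidences):
        if conf < tau:              # unclassified: no effect on the score
            continue
        score += 1 if pred == true else -1
    return score


if __name__ == "__main__":
    # Hypothetical frames: sweep tau over [0.4, 0.9] in increments of 0.1,
    # mirroring the experimental setup described above.
    preds = ["corridor", "office", "office", "kitchen"]
    truth = ["corridor", "office", "kitchen", "kitchen"]
    confs = [0.9, 0.7, 0.5, 0.3]
    for tau in [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
        print(f"tau={tau:.1f}  score={imageclef_score(preds, truth, confs, tau)}")
```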

Fig. 11

ImageCLEF 2012 scores for the BuS approach. Left-to-right: results for daylight-only, night-only and combined daylight–night learning with varying feature sets: top row \({\mathcal L}_1\); second row \({\mathcal L}_2\); third row \({\mathcal L}_3\); fourth row \({\mathcal L}_4\); last row \({\mathcal L}_5\)

Appendix B: Variations in learning and testing

As explained previously, all the experiments are done using the officially provided test sequence so that our results can be compared with those of the ImageCLEF 2012 challenge. In this section, we consider varying the learning and test sets with respect to the robot's path or the illumination conditions.

First, we consider learning (training set 1) and test (training set 2) data that differ in the robot's path while having the same illumination conditions, namely daylight. The results are given in Fig. 12. As expected, the results with depth-only features \({\mathcal L}_3\) are the worst due to the dependency of these data on local geometry. Using only visual features \({\mathcal L}_1\) and \({\mathcal L}_2\) leads to significantly better performance. Integrated vision–depth sensing gives the best results, although they are very close to those of vision-only sensing. As expected, the additional features in \({\mathcal L}_2\) and \({\mathcal L}_5\) improve the results compared to \({\mathcal L}_1\) and \({\mathcal L}_4\), although the improvement is modest. When these results are compared with those of Sect. 6, the performance is observed to be close to that with night learning and testing. This is expected since the illumination conditions are the same in both cases.
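The sketch below illustrates how precision–recall curves such as those in Figs. 12 and 13 can be traced by sweeping the confidence threshold \( \tau \). The definitions used here, precision over the classified frames and recall over all frames, are a common convention for classifiers that may abstain; the paper's exact definitions may differ, and all data in the usage example are hypothetical.

```python
# Hedged sketch: tracing a precision-recall curve by sweeping the
# confidence threshold tau over a set of per-frame predictions.
import numpy as np


def precision_recall_curve(pred, truth, conf, taus):
    """For each tau, frames with confidence >= tau are classified;
    precision is computed over classified frames, recall over all frames."""
    pred, truth, conf = map(np.asarray, (pred, truth, conf))
    points = []
    for tau in taus:
        classified = conf >= tau
        correct = classified & (pred == truth)
        precision = correct.sum() / max(classified.sum(), 1)
        recall = correct.sum() / len(truth)
        points.append((recall, precision))
    return points


# Usage with hypothetical data: one curve per feature set
# (e.g. visual-only vs. integrated RGB-D features).
taus = np.arange(0.4, 0.91, 0.1)
curve = precision_recall_curve(
    ["corridor", "office", "kitchen", "office"],
    ["corridor", "office", "office", "office"],
    [0.9, 0.6, 0.45, 0.8],
    taus,
)
print(curve)
```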

Fig. 12

Precision–recall curves for training 1 vs. training 2 data with varying feature sets. Left for \({\mathcal L}_1\), \({\mathcal L}_3\) and \({\mathcal L}_4\); right for \({\mathcal L}_2\), \({\mathcal L}_3\) and \({\mathcal L}_5\)

Similar experiments are repeated by varying the illumination conditions for learning and testing. Learning is done with night data using training set 3, while testing is done with daylight data using training set 1, which is in contrast to the official test data. In both cases, the robot follows a similar path. The results are given in Fig. 13. In this case, it is observed that when only visual features are used, the extended set does not contribute much to performance and may even be misleading. Another interesting observation is that depth-only sensing, despite being independent of illumination conditions, does not gain an edge over visual sensing. The limited field of view and depth range of RGB-D sensors possibly increase the sensitivity of depth data to local geometry compared to traditional 2D laser range scanners. Using depth sensors with a wider field of view and higher resolution would probably improve performance, but for now, visual sensing continues to be the primary sensing modality for place recognition with robots. Compared to the results in Sect. 6, the performance is even worse than with daylight learning. We attribute this to the limited visual information in night data, which does not allow generalization. The rich sensory information from visual data is lost when learning is done in night conditions. As observed earlier, learning plays a critical role in recognition performance.

Fig. 13

Precision–recall curves for training 3 vs. training 1 data with varying feature sets. Left for \({\mathcal L}_1\), \({\mathcal L}_3\) and \({\mathcal L}_4\); right for \({\mathcal L}_2\), \({\mathcal L}_3\) and \({\mathcal L}_5\)

About this article

Cite this article

Karaoğuz, H., Erkent, Ö. & Bozma, H.I. RGB-D based place representation in topological maps. Machine Vision and Applications 25, 1913–1927 (2014). https://doi.org/10.1007/s00138-014-0595-4

Keywords

Navigation