Automatic semantic maps generation from lexical annotations

  • José Carlos Rangel
  • Miguel Cazorla
  • Ismael García-Varea
  • Cristina Romero-González
  • Jesus Martínez-Gómez
Abstract

The generation of semantic environment representations is still an open problem in robotics. Most current proposals are based on metric representations and incorporate semantic information in a supervised fashion. The purpose of the robot is key in the generation of these representations, which has traditionally limited the inter-usability of maps created for different applications. We propose the use of information provided by lexical annotations to generate general-purpose semantic maps from RGB-D images. We exploit the availability of deep learning models suitable for describing any input image by means of lexical labels. Lexical annotations are more appropriate than state-of-the-art visual descriptors for computing the semantic similarity between images. From these annotations, we perform a bottom-up clustering approach that associates each image with a category. The use of RGB-D images allows the robot pose associated with each acquisition to be obtained, thus complementing the semantic information with metric information.
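The pipeline described in the abstract (lexical labels per image, semantic similarity between annotations, bottom-up clustering into categories) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the label lists, the histogram descriptor, and the use of cosine distance with average-linkage agglomerative clustering are all assumptions made for the example.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical lexical annotations: each image is described by the labels
# a deep model might assign to it (label names are purely illustrative).
annotations = [
    ["desk", "chair", "monitor"],      # image 0
    ["desk", "monitor", "keyboard"],   # image 1
    ["sink", "towel", "mirror"],       # image 2
    ["sink", "mirror", "soap"],        # image 3
]

# Build a shared vocabulary and one normalized label histogram per image.
vocab = sorted({label for labels in annotations for label in labels})

def descriptor(labels):
    counts = Counter(labels)
    vec = np.array([counts[w] for w in vocab], dtype=float)
    return vec / vec.sum()

X = np.vstack([descriptor(labels) for labels in annotations])

# Bottom-up (agglomerative) clustering on the cosine distance between
# label histograms; cut the dendrogram into two categories.
Z = linkage(X, method="average", metric="cosine")
clusters = fcluster(Z, t=2, criterion="maxclust")
print(clusters)
```

Under these assumptions, images 0 and 1 (office-like labels) end up in one cluster and images 2 and 3 (bathroom-like labels) in the other; in the full system each cluster would additionally carry the robot poses recovered from the RGB-D acquisitions.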

Keywords

Semantic map · Lexical annotations · 3D registration · RGB-D data · Deep learning

Notes

Acknowledgements

This work has been partially sponsored by the Spanish Ministry of Economy and Competitiveness under grant number TIN2015-65686-C5-3-R, and by the Regional Council of Education, Culture and Sports of Castilla-La Mancha under grant number PPII-2014-015-P. It has also been supported by the Spanish Government grant DPI2016-76515-R, funded with FEDER funds. Cristina Romero-González is funded by the MECD Grant FPU12/04387. José Carlos Rangel is funded by the IFARHU Grant 8-2014-166 of the Republic of Panamá.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • José Carlos Rangel (1, 2)
  • Miguel Cazorla (3)
  • Ismael García-Varea (4)
  • Cristina Romero-González (4)
  • Jesus Martínez-Gómez (4)
  1. Institute for Computer Research, University of Alicante, Alicante, Spain
  2. RobotSIS, Universidad Tecnológica de Panamá, Santiago, Panama
  3. University of Alicante, Alicante, Spain
  4. Computing Systems Department, University of Castilla-La Mancha, Albacete, Spain