
Automatic semantic maps generation from lexical annotations

Published in Autonomous Robots

Abstract

The generation of semantic environment representations is still an open problem in robotics. Most current proposals are based on metric representations and incorporate semantic information in a supervised fashion. The purpose of the robot is key in the generation of these representations, which has traditionally limited the reuse of maps across different applications. We propose the use of information provided by lexical annotations to generate general-purpose semantic maps from RGB-D images. We exploit the availability of deep learning models capable of describing any input image by means of lexical labels. Lexical annotations are more appropriate for computing the semantic similarity between images than state-of-the-art visual descriptors. From these annotations, we apply a bottom-up clustering procedure that assigns each image to a category. The use of RGB-D images allows the robot pose associated with each acquisition to be obtained, thus complementing the semantic information with metric information.
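The pipeline outlined in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: each image is assumed to be described by a sparse set of lexical labels with confidence scores (as a CNN classifier would return), pairwise semantic similarity is computed over those label distributions, and a simple bottom-up (agglomerative) clustering groups images into categories. The cosine similarity choice, the merge threshold, and all names are assumptions.

```python
from math import sqrt

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse label->score annotations."""
    dot = sum(s * b.get(label, 0.0) for label, s in a.items())
    na = sqrt(sum(s * s for s in a.values()))
    nb = sqrt(sum(s * s for s in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bottom_up_cluster(annotations, threshold=0.5):
    """Greedy single-link agglomerative clustering on lexical annotations.

    Repeatedly merges the two most similar clusters until no pair of
    clusters exceeds the similarity threshold.
    """
    clusters = [[i] for i in range(len(annotations))]
    while True:
        best_sim, best_x, best_y = threshold, None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(cosine(annotations[i], annotations[j])
                          for i in clusters[x] for j in clusters[y])
                if sim > best_sim:
                    best_sim, best_x, best_y = sim, x, y
        if best_x is None:
            return clusters
        clusters[best_x] += clusters.pop(best_y)

# Toy annotations: label -> classifier confidence (hypothetical values).
imgs = [
    {"corridor": 0.8, "door": 0.2},
    {"corridor": 0.7, "wall": 0.3},
    {"office": 0.9, "desk": 0.4},
]
print(bottom_up_cluster(imgs))  # the first two images share 'corridor'
```

In the paper's setting, each cluster would become one category of the semantic map, and the RGB-D pose associated with each image would anchor that category metrically.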


Figs. 1–7 appear in the full article.


Notes

  1. Available for download at http://www.rovit.ua.es/dataset/vidrilo/.


Acknowledgements

This work has been partially sponsored by the Spanish Ministry of Economy and Competitiveness under Grant TIN2015-65686-C5-3-R, and by the Regional Council of Education, Culture and Sports of Castilla-La Mancha under Grant PPII-2014-015-P. It has also been supported by the Spanish Government Grant DPI2016-76515-R, co-financed with FEDER funds. Cristina Romero-González is funded by MECD Grant FPU12/04387. José Carlos Rangel is funded by IFARHU Grant 8-2014-166 of the Republic of Panamá.

Author information


Corresponding author

Correspondence to José Carlos Rangel.


About this article


Cite this article

Rangel, J.C., Cazorla, M., García-Varea, I. et al. Automatic semantic maps generation from lexical annotations. Auton Robot 43, 697–712 (2019). https://doi.org/10.1007/s10514-018-9723-8

