Automatic semantic map generation from lexical annotations
The generation of semantic representations of the environment remains an open problem in robotics. Most current proposals are based on metric representations and incorporate semantic information in a supervised fashion. These representations are typically tailored to the robot's intended purpose, which has traditionally limited the reusability of maps across different applications. We propose using the information provided by lexical annotations to generate general-purpose semantic maps from RGB-D images. We exploit the availability of deep learning models that can describe any input image by means of lexical labels. Such annotations are better suited than state-of-the-art visual descriptors for computing the semantic similarity between images. From these annotations, we apply a bottom-up clustering approach that assigns each image to a category. The use of RGB-D images also allows the robot pose associated with each acquisition to be recovered, thus complementing the semantic information with metric information.
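To make the annotation-based clustering step concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image has already been described by a pre-trained CNN as a small set of lexical labels with confidence scores, measures semantic similarity as histogram intersection over the shared labels, and groups images with SciPy's bottom-up (agglomerative) linkage; the function names, the similarity measure, and the clustering threshold are all illustrative assumptions.

```python
# Sketch: bottom-up clustering of images from lexical annotations.
# Assumptions (not from the paper): annotations are dicts mapping
# lexical labels to confidence scores, similarity is histogram
# intersection over shared labels, and clustering uses SciPy's
# agglomerative (bottom-up) average linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def lexical_similarity(a, b):
    """Histogram intersection over the labels shared by two images."""
    shared = set(a) & set(b)
    return sum(min(a[label], b[label]) for label in shared)

def cluster_images(annotations, threshold=0.5):
    """Assign each annotated image to a semantic category."""
    n = len(annotations)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Turn similarity into a distance in [0, 1].
            d = 1.0 - lexical_similarity(annotations[i], annotations[j])
            dist[i, j] = dist[j, i] = d
    # Condensed distance matrix -> bottom-up dendrogram -> flat labels.
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=threshold, criterion="distance")

# Example: three images described by top-k labels and scores.
images = [
    {"kitchen": 0.6, "stove": 0.3, "sink": 0.1},
    {"kitchen": 0.5, "refrigerator": 0.4, "sink": 0.1},
    {"corridor": 0.7, "door": 0.3},
]
print(cluster_images(images))  # e.g. [1 1 2]
```

In this toy example the two kitchen views merge into one category while the corridor view forms its own; in practice the cut threshold would govern the granularity of the resulting semantic map.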
Keywords: Semantic map · Lexical annotations · 3D registration · RGB-D data · Deep learning
This work has been partially sponsored by the Spanish Ministry of Economy and Competitiveness under Grant TIN2015-65686-C5-3-R, and by the Regional Council of Education, Culture and Sports of Castilla-La Mancha under Grant PPII-2014-015-P. It has also been supported by the Spanish Government under Grant DPI2016-76515-R, with FEDER funds. Cristina Romero-González is funded by MECD Grant FPU12/04387. José Carlos Rangel is funded by IFARHU Grant 8-2014-166 of the Republic of Panamá.