Semantic Map Annotation Through UAV Video Analysis Using Deep Learning Models in ROS

  • Efstratios Kakaletsis
  • Maria Tzelepi
  • Pantelis I. Kaplanoglou
  • Charalampos Symeonidis
  • Nikos Nikolaidis (corresponding author)
  • Anastasios Tefas
  • Ioannis Pitas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


Enriching the map of the flight environment with semantic knowledge is a common need in several UAV applications. Safety legislation requires no-fly zones near crowded areas, which can be indicated by semantic annotations on a geometric map. This work proposes the automatic annotation of 3D maps with crowded areas by projecting 2D annotations derived through visual analysis of UAV video frames. To this end, a fully convolutional neural network is proposed that complies with the computational restrictions of the application. Trained with a regularized multiple-loss method, it effectively distinguishes between crowded and non-crowded scenes and provides semantic heatmaps that are projected onto the 3D occupancy grid of Octomap. The projection is based on raycasting and yields polygonal areas that are geo-localized on the map and can be exported in KML format. An initial qualitative evaluation on both synthetic and real-world drone scenes demonstrates the applicability of the method.
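The projection and export steps can be illustrated with a minimal sketch. It is not the paper's implementation: the method described above raycasts into the Octomap occupancy grid, whereas this sketch substitutes a flat ground plane for simplicity. The function names, the pinhole camera parameters, and the small-offset flat-earth conversion to longitude/latitude are all assumptions introduced for illustration.

```python
import math
from xml.sax.saxutils import escape


def pixel_to_ground(u, v, fx, fy, cx, cy, R, cam_pos, ground_z=0.0):
    """Cast a ray through pixel (u, v) and intersect it with the plane z = ground_z.

    Assumption: a flat ground plane stands in for the 3D occupancy grid.
    R is a camera-to-world rotation (3x3, row-major); cam_pos is the camera
    position in a local east/north/up frame, in metres.
    """
    # Ray direction in the camera frame (pinhole model, z along the optical axis).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into the world frame.
    d = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d[2]) < 1e-12:
        return None  # ray is parallel to the ground plane
    s = (ground_z - cam_pos[2]) / d[2]
    if s <= 0:
        return None  # the plane lies behind the camera
    return tuple(cam_pos[i] + s * d[i] for i in range(3))


def local_to_lonlat(east, north, lon0, lat0):
    """Convert small local offsets (metres) to lon/lat around (lon0, lat0).

    Assumption: flat-earth approximation, adequate only for short distances.
    """
    r_earth = 6378137.0  # WGS 84 equatorial radius in metres
    lat = lat0 + math.degrees(north / r_earth)
    lon = lon0 + math.degrees(east / (r_earth * math.cos(math.radians(lat0))))
    return lon, lat


def polygon_to_kml(name, lonlat):
    """Serialize a polygon given as (lon, lat) pairs into a KML Placemark."""
    ring = list(lonlat)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # KML linear rings must be closed
    # KML coordinate order is longitude,latitude,altitude.
    coords = " ".join(f"{lon:.7f},{lat:.7f},0" for lon, lat in ring)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        f"  <Placemark><name>{escape(name)}</name>\n"
        "    <Polygon><outerBoundaryIs><LinearRing>\n"
        f"      <coordinates>{coords}</coordinates>\n"
        "    </LinearRing></outerBoundaryIs></Polygon>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )
```

In this sketch, the corner pixels of a region flagged as crowded in the heatmap would each be passed through `pixel_to_ground`, converted with `local_to_lonlat`, and collected into one polygon for `polygon_to_kml`. Replacing the plane intersection with a raycast against an occupancy grid would bring it closer to the approach described in the abstract.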


Keywords: Drone imaging, Crowd detection, Deep learning, FCNN, Semantic mapping, Octomap, ROS



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Efstratios Kakaletsis (1)
  • Maria Tzelepi (1)
  • Pantelis I. Kaplanoglou (1)
  • Charalampos Symeonidis (1)
  • Nikos Nikolaidis (1), corresponding author
  • Anastasios Tefas (1)
  • Ioannis Pitas (1)

  1. Aristotle University of Thessaloniki, Thessaloniki, Greece
