Abstract
Road segmentation in remote sensing imagery has become crucial in several areas, such as transportation network optimization, urban planning, and image analysis. In this study, we propose an upgraded mixed-scale UNet network (MAP-UNet) with a multi-head attention mechanism to identify and delineate road networks within aerial images. The modified MAP-UNet aims to enhance the efficiency of road segmentation by integrating multi-scale features with attention mechanisms. We compare it against the most recent methods. On the DeepGlobe dataset, the proposed approach achieves a recall of 76.18%, a precision of 80.30%, and an IoU of 63.00%.
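For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how precision, recall, and IoU are commonly computed for a binary road mask. It is not the evaluation code used in this study: the function names `confusion_counts` and `road_metrics`, the nested-list mask representation, and the convention of returning 0.0 when a denominator is zero are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Assumed representation: a 2D grid of 0/1 labels, where 1 marks a road pixel.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for the road class."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # road predicted, road present
            elif p and not t:
                fp += 1  # road predicted, background present
            elif t and not p:
                fn += 1  # road missed
    return tp, fp, fn


def road_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Return precision, recall, and IoU for the road class.

    A zero denominator yields 0.0 (an assumed convention, not taken from the paper).
    """
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "iou": iou}


if __name__ == "__main__":
    # Toy 2x3 masks, purely illustrative.
    predicted = [[1, 1, 0], [0, 1, 0]]
    reference = [[1, 0, 0], [0, 1, 1]]
    print(road_metrics(predicted, reference))
```

IoU penalizes false positives and false negatives together, so it is never higher than either precision or recall; this matches the pattern in the figures above, where the IoU is the lowest of the three.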
Data availability
The datasets analyzed during the current study are available from the Kaggle website (https://www.kaggle.com/datasets/balraj98/deepglobe-road-extraction-dataset).
Author information
Contributions
Authors 1–3 conceived the presented idea; authors 1–2 developed the theory and performed the computations; author 3 verified the analytical methods; authors 2–4 encouraged the investigation of other state-of-the-art findings and supervised this work. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Ben Salah, K., Othmani, M., Fourati, J. et al. Advancing spatial mapping for satellite image road segmentation with multi-head attention. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03431-1
DOI: https://doi.org/10.1007/s00371-024-03431-1