
Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map

  • Research
  • Published in The Visual Computer

Abstract

Road segmentation is a fundamental task in building dynamic maps for unmanned aerial vehicle (UAV) path navigation. In unplanned, unknown, and even damaged areas, roads are usually unpaved, with blurred edges, deformations, and occlusions, which makes unpaved road segmentation a significant challenge for dynamic map construction. Our major contributions are as follows: (1) Inspired by dilated convolution, we propose dilated cross window self-attention (DCWin-Attention), composed of a dilated cross window mechanism and a pixel regional module, to model the long-range horizontal and vertical dependencies of unpaved roads with deformations and blurred edges. (2) A shifted cross window mechanism is coupled with DCWin-Attention to reduce the influence of occluded roads in UAV imagery; stacking DCWin-Attention blocks yields a global vision transformer (GVT) backbone that extracts multilevel deep features with global dependency. (3) The unpaved road is segmented from a confidence map generated by fusing the deep features of different levels in a unified perceptual parsing network. We verify our method on the self-established BJUT-URD dataset and the public DeepGlobe dataset, where it achieves the highest IoU of 67.72% and 52.67%, respectively, at practical inference speeds of 2.7 and 2.8 FPS, demonstrating its effectiveness and superiority in unpaved road segmentation. Our code is available at https://github.com/BJUT-AIVBD/GVT-URS.
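
For readers who want a concrete picture of the attention mechanism, the snippet below is a minimal PyTorch sketch of dilated cross window self-attention, not the authors' released implementation (see the repository linked above for the official code). It reflects one plausible reading of the description: half of the channels attend along rows whose pixels are sampled at a dilation rate (long-range horizontal context), the other half along dilated columns (vertical context). The class name DCWinAttention, the channel split, and the grouping scheme are illustrative assumptions.

    import torch
    import torch.nn as nn


    class DCWinAttention(nn.Module):
        """Illustrative sketch: half of the channels attend along dilated
        rows, the other half along dilated columns; the two branches are
        concatenated and linearly projected."""

        def __init__(self, dim: int, num_heads: int = 4, dilation: int = 2):
            super().__init__()
            assert dim % 2 == 0 and (dim // 2) % num_heads == 0
            self.num_heads, self.dilation = num_heads, dilation
            self.qkv = nn.Linear(dim, 3 * dim, bias=False)
            self.proj = nn.Linear(dim, dim)

        def _axis_attn(self, q, k, v, B, H, W):
            # Attention along the width axis; every `dilation`-th pixel of
            # a row is gathered into one attention window.
            r, h = self.dilation, self.num_heads
            c = q.shape[-1]

            def group(t):  # (B, H, W, c) -> (B*H*r, heads, W//r, c//heads)
                t = t.view(B, H, W // r, r, c).permute(0, 1, 3, 2, 4)
                return t.reshape(B * H * r, W // r, h, c // h).transpose(1, 2)

            q, k, v = group(q), group(k), group(v)
            attn = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
            out = (attn.softmax(dim=-1) @ v).transpose(1, 2)  # undo head split
            out = out.reshape(B, H, r, W // r, c).permute(0, 1, 3, 2, 4)
            return out.reshape(B, H, W, c)

        def forward(self, x):
            # x: (B, H, W, C); H and W must be divisible by `dilation`.
            B, H, W, C = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            half = C // 2
            # Horizontal branch on the first half of the channels.
            out_h = self._axis_attn(q[..., :half], k[..., :half],
                                    v[..., :half], B, H, W)
            # Vertical branch: transpose H/W, reuse the row logic, undo it.
            tq, tk, tv = (t[..., half:].transpose(1, 2) for t in (q, k, v))
            out_v = self._axis_attn(tq, tk, tv, B, W, H).transpose(1, 2)
            return self.proj(torch.cat([out_h, out_v], dim=-1))


    if __name__ == "__main__":
        x = torch.randn(1, 16, 16, 64)          # toy (B, H, W, C) feature map
        print(DCWinAttention(dim=64)(x).shape)  # torch.Size([1, 16, 16, 64])

With dilation 1 each window is a full row or column; larger rates keep the full-image span of every window while reducing the attention cost, which is one way to realize the long-range horizontal and vertical road dependencies described above.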

Data availability

The data that support the findings of this study are openly available in public repositories: the DeepGlobe dataset at https://www.kaggle.com/datasets/balraj98/deepglobe-road-extraction-dataset and the self-built BJUT-URD dataset at https://github.com/BJUT-AIVBD/GVT-URS/tree/main/unpaved-road-dataset.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62371015) and the Beijing Natural Science Foundation (No. L211017).

Author information

Authors and Affiliations

Authors

Contributions

Wensheng Li was involved in investigation, methodology, data curation, writing—original draft, visualization and validation. Jing Zhang contributed to methodology, conceptualization, supervision and writing—review and editing. Jiafeng Li was involved in resources, funding acquisition and methodology. Li Zhuo contributed to project administration and supervision.

Corresponding author

Correspondence to Jing Zhang.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, W., Zhang, J., Li, J. et al. Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03416-0

Keywords

Navigation