
Robust semantic segmentation method of urban scenes in snowy environment

  • Research · Machine Vision and Applications

Abstract

Semantic segmentation plays a crucial role in various computer vision tasks, such as autonomous driving in urban scenes, and related research has made significant progress. However, because most work focuses on improving the performance of semantic segmentation models, the deterioration of these models in severe weather has received little attention. To address this issue, we study the robustness of multimodal semantic segmentation models in snowy environments, a representative subset of severe weather conditions. The proposed method generates realistically simulated snowy-environment images by combining unpaired image translation with adversarial snowflake generation, effectively misleading the segmentation model’s predictions. These generated adversarial images are then used for robustness learning, enabling the model to adapt to harsh snowy environments and, to some extent, enhancing its robustness to artificial adversarial perturbations. Experimental visualizations show that the proposed method generates approximately realistic snowy-environment images and yields satisfactory visual results for both daytime and nighttime scenes. Moreover, quantitative experiments on the MFNet dataset indicate that, compared with the model without enhancement, the proposed method achieves average improvements of 4.82% in mAcc and 3.95% in mIoU. These improvements increase the adaptability of multimodal semantic segmentation models to snowy environments and contribute to road safety. Furthermore, the proposed method is broadly applicable and can be seamlessly integrated into various multimodal semantic segmentation models.
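To make the described pipeline concrete, the minimal PyTorch sketch below illustrates one way the two stages could be combined: a clear-to-snow translation of the RGB input, followed by a PGD-style optimization of snowflake opacities that maximizes the segmentation loss, with the resulting adversarial image used for one robustness-training step. This is an illustrative assumption, not the authors' released code; in particular, the names clear_to_snow and seg_model, the RGB-thermal call signature seg_model(rgb, thermal), and the assumption that images lie in [0, 1] are hypothetical.

import torch
import torch.nn.functional as F

def random_flake_masks(h, w, n_flakes=150, radius=3.0, device="cpu"):
    # Pre-render n_flakes soft circular flakes as an (n_flakes, 1, h, w) alpha stack.
    ys = torch.randint(0, h, (n_flakes,), device=device)
    xs = torch.randint(0, w, (n_flakes,), device=device)
    yy, xx = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing="ij")
    d2 = (yy[None] - ys[:, None, None]) ** 2 + (xx[None] - xs[:, None, None]) ** 2
    return torch.exp(-d2.float() / (2 * radius ** 2)).unsqueeze(1)

def adversarial_snowflakes(seg_model, rgb_snow, thermal, label, steps=10, lr=0.1):
    # Inner maximization: optimize per-flake opacities so that compositing the
    # white flakes onto the translated snowy image maximizes the segmentation loss.
    flakes = random_flake_masks(rgb_snow.shape[-2], rgb_snow.shape[-1],
                                device=rgb_snow.device)
    opacity = torch.full((flakes.shape[0], 1, 1, 1), 0.5,
                         device=rgb_snow.device, requires_grad=True)
    for _ in range(steps):
        alpha = (opacity * flakes).sum(dim=0).clamp(0, 1)        # (1, h, w)
        adv = rgb_snow * (1 - alpha) + alpha                     # blend toward white
        loss = F.cross_entropy(seg_model(adv, thermal), label)
        grad, = torch.autograd.grad(loss, opacity)
        opacity = (opacity + lr * grad.sign()).clamp(0, 1).detach().requires_grad_(True)
    alpha = (opacity * flakes).sum(dim=0).clamp(0, 1)
    return (rgb_snow * (1 - alpha) + alpha).detach()

def robustness_step(seg_model, clear_to_snow, optimizer, rgb, thermal, label):
    # Outer minimization: train the segmentation model on the adversarial snowy image.
    with torch.no_grad():
        rgb_snow = clear_to_snow(rgb)        # unpaired clear-to-snow translation
    adv = adversarial_snowflakes(seg_model, rgb_snow, thermal, label)
    optimizer.zero_grad()
    loss = F.cross_entropy(seg_model(adv, thermal), label)
    loss.backward()
    optimizer.step()
    return loss.item()

In this formulation the inner maximization over flake opacities plays the role of the adversarial snowflake generator, while the outer minimization corresponds to robustness learning of the segmentation model.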


Data availability

All data generated or analyzed during this study are included in [13] and [26].

Code availability

The original codes of the current study are available from the corresponding author on reasonable request.

References

  1. Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)

  2. Kim, E., Medioni, G.: Urban scene understanding from aerial and ground lidar data. Mach. Vis. Appl. 22(4), 691–703 (2011)

  3. Gupta, S., Dileep, A.D., Thenkanidiyoor, V.: Recognition of varying size scene images using semantic analysis of deep activation maps. Mach. Vis. Appl. 32(2), 52 (2021)

  4. Tan, X., Xu, K., Cao, Y., Zhang, Y., Ma, L., Lau, R.W.: Night-time scene parsing with a large real dataset. IEEE Trans. Image Process. 30, 9085–9098 (2021)

  5. Xie, Z., Wang, S., Xu, K., Zhang, Z., Tan, X., Xie, Y., Ma, L.: Boosting night-time scene parsing with learnable frequency. IEEE Trans. Image Process. 32, 2386–2398 (2023)

  6. Yin, H., Xie, W., Zhang, J., Zhang, Y., Zhu, W., Gao, J., Shao, Y., Li, Y.: Dual context network for real-time semantic segmentation. Mach. Vis. Appl. 34(2), 22 (2023)

  7. Rizzoli, G., Barbato, F., Zanuttigh, P.: Multimodal semantic segmentation in autonomous driving: a review of current approaches and future perspectives. Technologies 10(4), 90 (2022)

  8. Gao, J., Yi, J., Murphey, Y.L.: Attention-based global context network for driving maneuvers prediction. Mach. Vis. Appl. 33(4), 53 (2022)

  9. Tan, X., Lin, J., Xu, K., Chen, P., Ma, L., Lau, R.W.: Mirror detection with the visual chirality cue. IEEE Trans. Pattern Anal. Mach. Intell. 45(3), 3492–3504 (2022)

  10. Tan, X., Ma, Q., Gong, J., Xu, J., Zhang, Z., Song, H., Qu, Y., Xie, Y., Ma, L.: Positive-negative receptive field reasoning for omni-supervised 3d segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 45(12), 15328–15344 (2023)

  11. Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., Cottrell, G.: Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460 (2018)

  12. Yang, Z., Wang, Q., Zeng, J., Qin, P., Chai, R., Sun, D.: RAU-Net: U-Net network based on residual multi-scale fusion and attention skip layer for overall spine segmentation. Mach. Vis. Appl. 34(1), 10 (2023)

  13. Ha, Q., Watanabe, K., Karasawa, T., Ushiku, Y., Harada, T.: MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5108–5115 (2017)

  14. Sun, Y., Zuo, W., Liu, M.: RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes. IEEE Robot. Autom. Lett. 4(3), 2576–2583 (2019)

  15. Houben, T., Huisman, T., Pisarenco, M., Sommen, F., With, P.H.: Depth estimation from a single SEM image using pixel-wise fine-tuning with multimodal data. Mach. Vis. Appl. 33(4), 56 (2022)

  16. McEnroe, P., Wang, S., Liyanage, M.: A survey on the convergence of edge computing and AI for UAVs: opportunities and challenges. IEEE Internet Things J. 9(17), 15435–15459 (2022)

  17. Carrillo, H., Quiroga, J., Zapata, L., Maldonado, E.: Automatic football video production system with edge processing. Mach. Vis. Appl. 33(2), 32 (2022)

  18. Asghar, K., Sun, X., Rosin, P.L., Saddique, M., Hussain, M., Habib, Z.: Edge-texture feature-based image forgery detection with cross-dataset evaluation. Mach. Vis. Appl. 30(7–8), 1243–1262 (2019)

  19. Hu, C., Tiliwalidi, K.: Adversarial neon beam: Robust physical-world adversarial attack to DNNs. arXiv preprint arXiv:2204.00853 (2022)

  20. Duan, R., Mao, X., Qin, A.K., Chen, Y., Ye, S., He, Y., Yang, Y.: Adversarial laser beam: Effective physical-world attack to DNNs in a blink. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16062–16071 (2021)

  21. Tremblay, M., Halder, S.S., De Charette, R., Lalonde, J.-F.: Rain rendering for evaluating and improving robustness to bad weather. Int. J. Comput. Vision 129(2), 341–360 (2020)

  22. Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuScenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11621–11631 (2020)

  23. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3213–3223 (2016)

  24. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, pp. 3354–3361 (2012)

  25. Choi, S., Jung, S., Yun, H., Kim, J.T., Kim, S., Choo, J.: RobustNet: Improving domain generalization in urban-scene segmentation via instance selective whitening. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11580–11590 (2021)

  26. Pitropov, M., Garcia, D.E., Rebello, J., Smart, M., Wang, C., Czarnecki, K., Waslander, S.: Canadian adverse driving conditions dataset. Int. J. Robot. Res. 40(4–5), 681–690 (2021)

  27. Liu, M.-Y., Tuzel, O.: Coupled generative adversarial networks. In: Advances in neural information processing systems, vol. 29 (2016)

  28. Liu, M.-Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: Advances in neural information processing systems, vol. 30 (2017)

  29. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3722–3731 (2017)

  30. Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2107–2116 (2017)

  31. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp. 2223–2232 (2017)

  32. Yi, Z., Zhang, H., Tan, P., Gong, M.: DualGAN: unsupervised dual learning for image-to-image translation. In: Proceedings of the IEEE international conference on computer vision, pp. 2849–2857 (2017)

  33. Choi, Y., Uh, Y., Yoo, J., Ha, J.-W.: StarGAN v2: diverse image synthesis for multiple domains. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8188–8197 (2020)

  34. Liu, M.-Y., Huang, X., Mallya, A., Karras, T., Aila, T., Lehtinen, J., Kautz, J.: Few-shot unsupervised image-to-image translation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 10551–10560 (2019)

  35. Pizzati, F., Charette, R.d., Zaccaria, M., Cerri, P.: Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 2990–2998 (2020)

  36. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)

  37. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

  38. Yi-de, M., Qing, L., Zhi-Bai, Q.: Automated image segmentation using improved PCNN model based on cross-entropy. In: Proceedings of 2004 international symposium on intelligent multimedia, video and speech processing, pp. 743–746 (2004)

  39. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988 (2017)

  40. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, pp. 240–248 (2017)

  41. Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp. 516–520 (2016)

  42. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)

  43. Fan, X., Wang, Q., Ke, J., Yang, F., Gong, B., Zhou, M.: Adversarially adaptive normalization for single domain generalization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8208–8217 (2021)

  44. Volpi, R., Namkoong, H., Sener, O., Duchi, J.C., Murino, V., Savarese, S.: Generalizing to unseen domains via adversarial data augmentation. In: Advances in neural information processing systems, vol. 31 (2018)

  45. Qiao, F., Peng, X.: Uncertainty-guided model generalization to unseen domains. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 6790–6800 (2021)

  46. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)

  47. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015, pp. 234–241 (2015)

  48. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)

  49. Hazirbas, C., Ma, L., Domokos, C., Cremers, D.: FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture. In: Computer Vision–ACCV 2016: 13th Asian conference on computer vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part I 13, pp. 213–228 (2017)

  50. Deng, F., Feng, H., Liang, M., Wang, H., Yang, Y., Gao, Y., Chen, J., Hu, J., Guo, X., Lam, T.L.: FEANet: Feature-enhanced attention network for RGB-thermal real-time semantic segmentation. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp. 4467–4473 (2021)

  51. Zhou, W., Liu, J., Lei, J., Yu, L., Hwang, J.-N.: GMNet: graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation. IEEE Trans. Image Process. 30, 7790–7802 (2021)

  52. Zhou, W., Zhu, Y., Lei, J., Yang, R., Yu, L.: LSNet: lightweight spatial boosting network for detecting salient objects in RGB-thermal images. IEEE Trans. Image Process. 32, 1329–1340 (2023)

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant No. 2021YFC3320302, in part by the National Natural Science Foundation of China under Grant No. 62303449, in part by the Fundamental Research Funds for the Central Universities under Grant No. 3072022TS0604, and in part by the Natural Science Foundation of Liaoning Province under Grant No. 2023-BS-029.

Author information

Contributions

HY: methodology, writing original draft. LZ and YS: methodology, validation, review and editing. GY and YT: methodology, validation and review. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Liguo Zhang.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yin, H., Yin, G., Sun, Y. et al. Robust semantic segmentation method of urban scenes in snowy environment. Machine Vision and Applications 35, 59 (2024). https://doi.org/10.1007/s00138-024-01540-4
