Jdlmask: joint defogging learning with boundary refinement for foggy scene instance segmentation

Wang, Xiaojian; Guo, Jichang; Wang, Yudong; He, Wanru

doi:10.1007/s00371-023-03230-0

Original article
Published: 30 January 2024

The Visual Computer

Xiaojian Wang¹, Jichang Guo¹,² (ORCID: orcid.org/0000-0003-3130-1685), Yudong Wang¹,² & Wanru He¹

Abstract

State-of-the-art instance segmentation approaches, such as Mask R-CNN, perform remarkably well under clear weather conditions. However, their effectiveness is significantly compromised in foggy environments, primarily due to reduced visibility and obscured object details. To address this challenge, we introduce a joint defogging learning with boundary refinement (JDLMask) framework. Unlike conventional strategies that treat image dehazing as a preprocessing step, JDLMask employs a shared structure that enables the joint learning of defogging and instance segmentation. This integrated approach greatly improves the model’s adaptability to foggy scenarios. Because fog interferes with feature extraction, we propose a multi-scale feature fusion mask head based on the encoder–decoder architecture. This component is designed to acquire both local and global information, thereby enhancing the model’s feature representation capacity. Furthermore, we integrate a boundary refinement module, which sharpens the model’s localization accuracy by focusing on critical boundary details. To address the scarcity of datasets tailored for instance segmentation in real-world foggy scenes, we have enriched the Foggy Driving dataset with meticulously crafted instance mask annotations and named it Foggy Driving InstanceSeg. Comprehensive experiments demonstrate JDLMask’s superiority. Compared to the baseline Mask R-CNN, JDLMask achieves improvements of 4.4% and 3.8% in mask AP on the Foggy Cityscapes and Cityscapes validation sets, respectively, and a 2.5% gain on the Foggy Driving InstanceSeg dataset.
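
The abstract does not give the loss formulation or the way boundary supervision is derived, so the following is only a minimal sketch of how joint training with boundary supervision might be wired up, not the authors' implementation. It assumes that boundary targets are obtained from ground-truth masks with a 3×3 Sobel filter, that the boundary branch is scored with a Dice-style loss, and that the three task losses are combined as a weighted sum. The function names (`boundary_target`, `dice_loss`, `joint_loss`) and the weights `w_defog` and `w_boundary` are hypothetical.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Correlate a 2-D array with a 3x3 kernel (edge padding, same-size output)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def boundary_target(mask):
    """Assumed boundary label: 1 where the Sobel gradient of the mask is non-zero."""
    m = mask.astype(np.float32)
    gx = filter3x3(m, SOBEL_X)
    gy = filter3x3(m, SOBEL_Y)
    return (np.hypot(gx, gy) > 0).astype(np.float32)


def dice_loss(pred, target, eps=1e-6):
    """Dice-style loss between a predicted and a target boundary map."""
    inter = np.sum(pred * target)
    denom = np.sum(pred ** 2) + np.sum(target ** 2)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def joint_loss(seg_loss, defog_loss, boundary_loss, w_defog=1.0, w_boundary=1.0):
    """Hypothetical joint objective: weighted sum of the three task losses."""
    return seg_loss + w_defog * defog_loss + w_boundary * boundary_loss


# Usage: an 8x8 ground-truth mask containing a 4x4 square instance.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
target = boundary_target(mask)
total = joint_loss(seg_loss=1.0, defog_loss=0.5,
                   boundary_loss=dice_loss(target, target))
```

In a shared-structure design such as the one the abstract describes, a single scalar objective of this kind would let gradients from the defogging and boundary branches reach the shared features. The relative weights are a tuning choice that the abstract does not specify.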

Availability of data and materials

The datasets utilized and/or analyzed during this study can be accessed from the following repositories: Cityscapes: https://www.cityscapes-dataset.com/. Foggy Cityscapes: https://www.cityscapes-dataset.com/. Foggy Driving InstanceSeg: https://github.com/XJWang628/Foggy-Driving-InstanceSeg.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62171315), in part by the National Key Research and Development Program of China (No. 2022ZD0160400), and in part by the Tianjin Research Innovation Project for Postgraduate Students (No. 2021YJSB153).

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
Xiaojian Wang, Jichang Guo, Yudong Wang & Wanru He
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Jichang Guo & Yudong Wang

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by XW, JG, YW, and WH. The first draft of the manuscript was written by XW, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jichang Guo.

Ethics declarations

Conflict of interest

The authors certify that there are no competing interests, as defined by Springer, or any other interests that could potentially influence the results and/or discussion reported in this paper.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Guo, J., Wang, Y. et al. Jdlmask: joint defogging learning with boundary refinement for foggy scene instance segmentation. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03230-0

Accepted: 12 December 2023
Published: 30 January 2024
DOI: https://doi.org/10.1007/s00371-023-03230-0
