Causal reasoning in typical computer vision tasks

  • Review
  • Science China Technological Sciences

Abstract

Deep learning has revolutionized the field of artificial intelligence. Building on the statistical correlations uncovered by deep learning-based methods, computer vision applications such as autonomous driving and robotics are growing rapidly. Although such correlations are the basis of deep learning, they depend strongly on the distribution of the original data and are susceptible to uncontrolled factors. Without the guidance of prior knowledge, statistical correlations alone cannot correctly reflect the essential causal relations and may even introduce spurious correlations. As a result, researchers are now trying to enhance deep learning-based methods with causal theory, which can model the intrinsic causal structure unaffected by data bias and effectively avoid spurious correlations. This paper comprehensively reviews the existing causal methods in typical vision and vision-language tasks such as semantic segmentation, object detection, and image captioning. It summarizes the advantages of causality and the approaches for building causal paradigms. It also proposes future roadmaps, including facilitating the development of causal theory and its application in other complex scenarios and systems.
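
The contrast between a spurious correlation and a causal effect can be made concrete with a toy example. The sketch below is an illustration only and is not taken from the reviewed methods: it assumes a single observed confounder (image context) and uses backdoor adjustment, one standard tool from causal inference, to compare the naive association with the adjusted effect. All counts, names, and the "grass texture" feature are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts, keyed by (context z, feature x):
#   value = (number of images, number of those labelled "cow").
# Context z is assumed to be the only confounder and to be observed.
COUNTS = {
    ("pasture", 1): (80, 64),
    ("pasture", 0): (20, 16),
    ("beach", 1): (20, 4),
    ("beach", 0): (80, 16),
}


def naive_rate(x):
    """P(Y=1 | X=x): pools all contexts, so the confounder leaks in."""
    total = 0
    positives = 0
    for (_, feature), (n_images, n_cow) in COUNTS.items():
        if feature == x:
            total += n_images
            positives += n_cow
    return Fraction(positives, total)


def adjusted_rate(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, z) * P(z) (backdoor adjustment)."""
    grand_total = sum(n_images for n_images, _ in COUNTS.values())
    rate = Fraction(0)
    for context in sorted({z for z, _ in COUNTS}):
        # Marginal probability of this context, over both feature values.
        n_context = sum(n for (z, _), (n, _) in COUNTS.items() if z == context)
        p_context = Fraction(n_context, grand_total)
        n_images, n_cow = COUNTS[(context, x)]
        rate += Fraction(n_cow, n_images) * p_context
    return rate


naive_gap = naive_rate(1) - naive_rate(0)
adjusted_gap = adjusted_rate(1) - adjusted_rate(0)
print("naive gap:", naive_gap)
print("adjusted gap:", adjusted_gap)
```

In this constructed data the feature looks strongly predictive when contexts are pooled, yet within each context it makes no difference, so the adjusted gap vanishes. Real vision data rarely offers a single, fully observed confounder, which is why the adjustment here should be read as a simplification.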

Author information

Corresponding author

Correspondence to Yang Tang.

Additional information

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62233005 and 62293502), the Programme of Introducing Talents of Discipline to Universities (the 111 Project, Grant No. B17017), the Fundamental Research Funds for the Central Universities (Grant No. 222202317006), and Shanghai AI Lab.

About this article

Cite this article

Zhang, K., Sun, Q., Zhao, C. et al. Causal reasoning in typical computer vision tasks. Sci. China Technol. Sci. 67, 105–120 (2024). https://doi.org/10.1007/s11431-023-2502-9

  • DOI: https://doi.org/10.1007/s11431-023-2502-9
