
RealPatch: A Statistical Matching Framework for Model Patching with Real Samples

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)


Abstract

Machine learning classifiers are typically trained to minimise the average error across a dataset. Unfortunately, in practice, this process often exploits spurious correlations caused by subgroup imbalance within the training data, resulting in high average performance but highly variable performance across subgroups. Recent work to address this problem proposes model patching with CAMEL. This previous approach uses generative adversarial networks to perform intra-class, inter-subgroup data augmentation, requiring (a) the training of several computationally expensive models and (b) synthetic model outputs of sufficient quality for the given domain. In this work, we propose RealPatch, a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching. Our framework performs model patching by augmenting a dataset with real samples, removing the need to train generative models for the target task. We demonstrate the effectiveness of RealPatch on three benchmark datasets, CelebA, Waterbirds, and a subset of iWildCam, showing improvements in worst-case subgroup performance and reductions in the subgroup performance gap in binary classification. Furthermore, we conduct experiments on the imSitu dataset, which has 211 classes, a setting where generative-model-based patching such as CAMEL is impractical. We show that RealPatch can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility. The code for RealPatch can be found at
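The core idea in the abstract — pairing each sample from an under-represented subgroup with a similar *real* sample from another subgroup of the same class — can be illustrated with a toy sketch. RealPatch's actual procedure (statistical matching) is described in the paper; the snippet below uses plain nearest-neighbour matching in feature space as a simplified stand-in, and all names and data are illustrative:

```python
import numpy as np

def match_across_subgroups(X_minority, X_majority):
    """For each minority-subgroup sample, return the index of its nearest
    majority-subgroup sample under Euclidean distance. The retrieved real
    samples could then be used to rebalance the training set."""
    # Pairwise distances via broadcasting: (n_min, 1, d) - (1, n_maj, d)
    dists = np.linalg.norm(X_minority[:, None, :] - X_majority[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Toy 2-D features for two subgroups of the same class,
# e.g. "landbirds on water" (rare) vs. "landbirds on land" (common).
rng = np.random.default_rng(0)
X_min = rng.normal(size=(3, 2))
X_maj = rng.normal(size=(10, 2))
pairs = match_across_subgroups(X_min, X_maj)  # one majority index per minority sample
```

Unlike GAN-based patching, the augmenting samples here are drawn from the existing dataset, so no generative model needs to be trained.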






Acknowledgements

This research was supported by a European Research Council (ERC) Starting Grant for the project “Bayesian Models and Algorithms for Fairness and Transparency”, funded under the European Union’s Horizon 2020 Framework Programme (grant agreement no. 851538). NQ is also supported by the Basque Government through the BERC 2018-2021 programme and by the Spanish Ministry of Science, Innovation and Universities: BCAM Severo Ochoa accreditation SEV-2017-0718.

Author information



Corresponding author

Correspondence to Sara Romiti.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4164 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Romiti, S., Inskip, C., Sharmanska, V., Quadrianto, N. (2022). RealPatch: A Statistical Matching Framework for Model Patching with Real Samples. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13685. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19805-2

  • Online ISBN: 978-3-031-19806-9

  • eBook Packages: Computer Science (R0)
