
Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13683)


Abstract

Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert’s understanding of a task. Underspecification [6] refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models. We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve as a criterion for OOD model selection without additional assumptions (see https://arxiv.org/abs/2207.02598 for the full-length version of this work).
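The exact independence constraint and the data-manifold alignment are specified in the full-length version of the paper. As a rough illustration of the overall recipe (several models trained on the same data while being pushed to rely on different features, then distilled into one global model), the following PyTorch-style sketch uses a squared cosine-similarity penalty on input gradients as a stand-in for the independence constraint; the model sizes, penalty weight, and all names are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the authors' exact method): two binary classifiers are
    # trained on the same data while agreement between their input gradients is
    # penalized, pushing them to rely on different predictive features.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_model(d_in):
        return nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, 1))

    d_in = 32
    f1, f2 = make_model(d_in), make_model(d_in)
    opt = torch.optim.Adam(list(f1.parameters()) + list(f2.parameters()), lr=1e-3)
    lam = 1.0  # weight of the independence penalty (hypothetical value)

    def training_step(x, y):
        x = x.requires_grad_(True)
        logits1, logits2 = f1(x).squeeze(-1), f2(x).squeeze(-1)
        task_loss = (F.binary_cross_entropy_with_logits(logits1, y)
                     + F.binary_cross_entropy_with_logits(logits2, y))
        # Input gradients of each model, kept in the graph so the penalty is differentiable.
        g1 = torch.autograd.grad(logits1.sum(), x, create_graph=True)[0]
        g2 = torch.autograd.grad(logits2.sum(), x, create_graph=True)[0]
        indep_penalty = F.cosine_similarity(g1, g2, dim=1).pow(2).mean()
        loss = task_loss + lam * indep_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Toy usage with random data standing in for a real dataset.
    x = torch.randn(128, d_in)
    y = (x[:, 0] > 0).float()
    for _ in range(10):
        training_step(x, y)

The models obtained this way can then be distilled into a single global model, for instance by training it to reproduce their combined predictions on the training data.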


Notes

  1. In this paper, OOD means there is a covariate shift between training and test data [44].

  2. We define f to output logits. A binary prediction \(\hat{y}\) is obtained as \(\hat{y} = \text{round}\big(\sigma\big(f(\boldsymbol{x})\big)\big)\).

  3. Previously, [19, 42] used volumes of hypothesis spaces to define Rashomon sets.

  4. In our implementation, masked elements are not replaced with zeros, but rather with random values from other instances in the current mini-batch (a minimal sketch of this replacement follows these notes).

  5. We obtain very similar results whether we fine-tune or retrain models from scratch on the masked data.
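Footnote 4 can be implemented by shuffling the mini-batch along the instance dimension and copying the shuffled values into the masked positions. The snippet below is a minimal sketch under that reading; the function name and tensor layout are assumptions, not the authors' code.

    # Masked elements are filled with values taken from other instances in the same
    # mini-batch (obtained here by shuffling the batch), rather than being zeroed out.
    import torch

    def mask_with_in_batch_values(x, mask):
        """x: (batch, features); mask: boolean tensor of the same shape,
        True where an element should be masked out."""
        perm = torch.randperm(x.size(0), device=x.device)
        donor = x[perm]                     # values from other instances in the batch
        return torch.where(mask, donor, x)  # keep original values where mask is False

    # Toy usage: mask a random 30% of the elements of each instance.
    x = torch.randn(8, 16)
    mask = torch.rand_like(x) < 0.3
    x_masked = mask_with_in_batch_values(x, mask)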

References

  1. Alesiani, F., Yu, S., Yu, X.: Gated information bottleneck for generalization in sequential environments. arXiv preprint arXiv:2110.06057 (2021)

  2. Arjovsky, M., Bottou, L., Gulrajani, I., Lopez-Paz, D.: Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019)

  3. Banerjee, I., et al.: Reading race: AI recognises patient’s racial identity in medical images. arXiv preprint arXiv:2107.10356 (2021)

  4. Cubuk, E.D., Dyer, E.S., Lopes, R.G., Smullin, S.: Tradeoffs in data augmentation: An empirical study. In: Proceedings of the International Conference on Learning Representations (2021)


  5. Teney, D., Abbasnejad, E., van den Hengel, A.: Unshuffling data for improved generalization. arXiv preprint arXiv:2002.11894 (2020)

  6. D’Amour, A., et al.: Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395 (2020)

  7. Das, S., Cashman, D., Chang, R., Endert, A.: Beames: Interactive multimodel steering, selection, and inspection for regression tasks. IEEE Comput. Graphics Appl. 39(5), 20–32 (2019)


  8. Deng, W., Gould, S., Zheng, L.: What does rotation prediction tell us about classifier accuracy under varying testing environments? arXiv preprint arXiv:2106.05961 (2021)

  9. Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016)


  10. Gardner, M., et al.: Evaluating NLP models via contrast sets. arXiv preprint arXiv:2004.02709 (2020)

  11. Garg, S., Balakrishnan, S., Kolter, J.Z., Lipton, Z.C.: RATT: leveraging unlabeled data to guarantee generalization. arXiv preprint arXiv:2105.00303 (2021)

  12. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231 (2018)

  13. Ghimire, S., Kashyap, S., Wu, J.T., Karargyris, A., Moradi, M.: Learning invariant feature representation to improve generalization across chest x-ray datasets. In: International Workshop on Machine Learning in Medical Imaging (2020)


  14. Goyal, Y., Wu, Z., Ernst, J., Batra, D., Parikh, D., Lee, S.: Counterfactual visual explanations. In: International Conference on Machine Learning, pp. 2376–2384. PMLR (2019)


  15. Gretton, A., Herbrich, R., Smola, A.J.: The kernel mutual information. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP 2003), vol. 4, pp. IV-880. IEEE (2003)


  16. Gulrajani, I., Lopez-Paz, D.: In search of lost domain generalization. In: Proceedings of the International Conference on Learning Representations (2021)


  17. Hälvä, H., Hyvarinen, A.: Hidden Markov nonlinear ICA: unsupervised learning from nonstationary time series. In: Conference on Uncertainty in Artificial Intelligence, pp. 939–948. PMLR (2020)


  18. Hoffman, J., Roberts, D.A., Yaida, S.: Robust learning with jacobian regularization. arXiv preprint arXiv:1908.02729 (2019)

  19. Hsu, H., Calmon, F.d.P.: Rashomon capacity: A metric for predictive multiplicity in probabilistic classification. arXiv preprint arXiv:2206.01295 (2022)

  20. Hudson, D.A., Manning, C.D.: GQA: A new dataset for real-world visual reasoning and compositional question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)


  21. Ilse, M., Tomczak, J.M., Forré, P.: Designing data augmentation for simulating interventions. In: Proceedings of the International Conference on Machine Learning (2021)


  22. Immer, A., Bauer, M., Fortuin, V., Rätsch, G., Khan, M.E.: Scalable marginal likelihood estimation for model selection in deep learning. arXiv preprint arXiv:2104.04975 (2021)

  23. Kaushik, D., Hovy, E., Lipton, Z.C.: Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434 (2019)

  24. Kervadec, C., Antipov, G., Baccouche, M., Wolf, C.: Roses are red, violets are blue... but should VQA expect them to? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021)


  25. Kirichenko, P., Izmailov, P., Wilson, A.G.: Last layer re-training is sufficient for robustness to spurious correlations. arXiv preprint arXiv:2204.02937 (2022)

  26. Koh, P.W., et al.: Wilds: A benchmark of in-the-wild distribution shifts. arXiv preprint arXiv:2012.07421 (2020)

  27. Koh, P.W., et al.: Wilds: A benchmark of in-the-wild distribution shifts. In: Proceedings of the International Conference on Machine Learning (2021)


  28. von Kügelgen, J., et al.: Self-supervised learning with data augmentations provably isolates content from style. arXiv preprint arXiv:2106.04619 (2021)

  29. Lee, D.H., et al.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML (2013)


  30. Mehrer, J., Spoerer, C.J., Kriegeskorte, N., Kietzmann, T.C.: Individual differences among deep neural network models. Nat. Commun. 11(1), 1–12 (2020)


  31. Miller, J.P., et al.: Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization. In: Proceedings of the International Conference on Machine Learning (2021)


  32. Mitchell, T.M.: The need for biases in learning generalizations. Rutgers University (1980)


  33. Ortiz-Jimenez, G., Salazar-Reque, I.F., Modas, A., Moosavi-Dezfooli, S.M., Frossard, P.: A neural anisotropic view of underspecification in deep learning. In: Proceedings of the International Conference on Learning Representations (2021)


  34. Peters, J., Bühlmann, P., Meinshausen, N.: Causal inference by using invariant prediction: identification and confidence intervals. J. Royal Stat. Soc. Ser. B (Stat. Methodol.) 78, 947–1012 (2016)


  35. Pezeshki, M., Kaba, S.O., Bengio, Y., Courville, A., Precup, D., Lajoie, G.: Gradient starvation: A learning proclivity in neural networks. arXiv preprint arXiv:2011.09468 (2020)

  36. Pfister, N., Bühlmann, P., Peters, J.: Invariant causal prediction for sequential data. J. Am. Stat. Assoc. 114(527), 1264–1276 (2019)


  37. Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. arXiv preprint arXiv:2104.08894 (2021)

  38. Rosenfeld, E., Ravikumar, P., Risteski, A.: Domain-adjusted regression or: Erm may already learn features sufficient for out-of-distribution generalization. arXiv preprint arXiv:2202.06856 (2022)

  39. Ross, A., Pan, W., Celi, L., Doshi-Velez, F.: Ensembles of locally independent prediction models. In: Proceedings of the Conference on AAAI (2020)


  40. Ross, A.S., Pan, W., Doshi-Velez, F.: Learning qualitatively diverse and interpretable rules for classification. arXiv preprint arXiv:1806.08716 (2018)

  41. Selvaraju, R.R., et al.: Taking a hint: Leveraging explanations to make vision and language models more grounded. In: Proceedings of the IEEE International Conference on Computer Vision (2019)


  42. Semenova, L., Rudin, C., Parr, R.: A study in rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning. arXiv preprint arXiv:1908.01755 (2019)

  43. Shah, H., Tamuly, K., Raghunathan, A., Jain, P., Netrapalli, P.: The pitfalls of simplicity bias in neural networks. arXiv preprint arXiv:2006.07710 (2020)

  44. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Planning Inference 90(2), 227–244 (2000)


  45. Sohn, K., et al.: FixMatch: simplifying semi-supervised learning with consistency and confidence. In: Proceedings of the Advances in Neural Information Processing Systems (2020)


  46. Sun, B., Feng, J., Saenko, K.: Correlation alignment for unsupervised domain adaptation. In: Csurka, G. (ed.) Domain Adaptation in Computer Vision Applications. ACVPR, pp. 153–171. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58347-1_8


  47. Teney, D., Abbasnedjad, E., van den Hengel, A.: Learning what makes a difference from counterfactual examples and gradient supervision. arXiv preprint arXiv:2004.09034 (2020)

  48. Teney, D., Abbasnejad, E., Lucey, S., van den Hengel, A.: Evading the simplicity bias: Training a diverse set of models discovers solutions with superior OOD generalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2022)


  49. Thiagarajan, J., Narayanaswamy, V.S., Rajan, D., Liang, J., Chaudhari, A., Spanias, A.: Designing counterfactual generators using deep model inversion. In: Proceedings of the Advances in Neural Information Processing Systems (2021)


  50. Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, Inc., New York (1998)


  51. Venkateswaran, P., Muthusamy, V., Isahagian, V., Venkatasubramanian, N.: Environment agnostic invariant risk minimization for classification of sequential datasets. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1615–1624 (2021)


  52. Wald, Y., Feder, A., Greenfeld, D., Shalit, U.: On calibration and out-of-domain generalization. arXiv preprint arXiv:2102.10395 (2021)

  53. Weinberger, K.Q., Saul, L.K.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision 70, 77–90 (2006)


  54. Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)


  55. Xiao, T., Wang, X., Efros, A.A., Darrell, T.: What should not be contrastive in contrastive learning. arXiv preprint arXiv:2008.05659 (2020)

  56. Xie, Q., Luong, M.T., Hovy, E., Le, Q.V.: Self-training with noisy student improves ImageNet classification. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2020)


  57. Zimmermann, R.S., Sharma, Y., Schneider, S., Bethge, M., Brendel, W.: Contrastive learning inverts the data generating process. arXiv preprint arXiv:2102.08850 (2021)


Author information


Corresponding author

Correspondence to Damien Teney.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 8623 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Teney, D., Peyrard, M., Abbasnejad, E. (2022). Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_27

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20050-2_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20049-6

  • Online ISBN: 978-3-031-20050-2

  • eBook Packages: Computer Science, Computer Science (R0)
