
Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13683)


Abstract

Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert’s understanding of a task. Underspecification [6] refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models. We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve as a criterion for OOD model selection without additional assumptions (see https://arxiv.org/abs/2207.02598 for the full-length version of this work).
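The exact independence constraint and the data-manifold alignment are specified in the full-length version of the paper. As a rough illustration of the overall recipe (several models trained on the same data while being pushed to rely on different features, then distilled into one global model), the following PyTorch-style sketch uses a squared cosine-similarity penalty on input gradients as a stand-in for the independence constraint; the model sizes, penalty weight, and all names are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the authors' exact method): two binary classifiers are
    # trained on the same data while agreement between their input gradients is
    # penalized, pushing them to rely on different predictive features.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_model(d_in):
        return nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, 1))

    d_in = 32
    f1, f2 = make_model(d_in), make_model(d_in)
    opt = torch.optim.Adam(list(f1.parameters()) + list(f2.parameters()), lr=1e-3)
    lam = 1.0  # weight of the independence penalty (hypothetical value)

    def training_step(x, y):
        x = x.requires_grad_(True)
        logits1, logits2 = f1(x).squeeze(-1), f2(x).squeeze(-1)
        task_loss = (F.binary_cross_entropy_with_logits(logits1, y)
                     + F.binary_cross_entropy_with_logits(logits2, y))
        # Input gradients of each model, kept in the graph so the penalty is differentiable.
        g1 = torch.autograd.grad(logits1.sum(), x, create_graph=True)[0]
        g2 = torch.autograd.grad(logits2.sum(), x, create_graph=True)[0]
        indep_penalty = F.cosine_similarity(g1, g2, dim=1).pow(2).mean()
        loss = task_loss + lam * indep_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Toy usage with random data standing in for a real dataset.
    x = torch.randn(128, d_in)
    y = (x[:, 0] > 0).float()
    for _ in range(10):
        training_step(x, y)

The models obtained this way can then be distilled into a single global model, for instance by training it to reproduce their combined predictions on the training data.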


Notes

  1. In this paper, OOD means there is a covariate shift between training and test data [44].

  2. We define f to output logits. A binary prediction \(\hat{y}\) is obtained as \(\hat{y} = \text{round}\big(\sigma\big(f(\boldsymbol{x})\big)\big)\).

  3. Previously, [19, 42] used volumes of hypothesis spaces to define Rashomon sets.

  4. In our implementation, masked elements are not replaced with zeros, but rather with random values from other instances in the current mini-batch (a minimal sketch of this replacement follows these notes).

  5. We obtain very similar results whether we fine-tune or retrain models from scratch on the masked data.
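Footnote 4 can be implemented by shuffling the mini-batch along the instance dimension and copying the shuffled values into the masked positions. The snippet below is a minimal sketch under that reading; the function name and tensor layout are assumptions, not the authors' code.

    # Masked elements are filled with values taken from other instances in the same
    # mini-batch (obtained here by shuffling the batch), rather than being zeroed out.
    import torch

    def mask_with_in_batch_values(x, mask):
        """x: (batch, features); mask: boolean tensor of the same shape,
        True where an element should be masked out."""
        perm = torch.randperm(x.size(0), device=x.device)
        donor = x[perm]                     # values from other instances in the batch
        return torch.where(mask, donor, x)  # keep original values where mask is False

    # Toy usage: mask a random 30% of the elements of each instance.
    x = torch.randn(8, 16)
    mask = torch.rand_like(x) < 0.3
    x_masked = mask_with_in_batch_values(x, mask)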

References

  1. Alesiani, F., Yu, S., Yu, X.: Gated information bottleneck for generalization in sequential environments. arXiv preprint arXiv:2110.06057 (2021)

  2. Arjovsky, M., Bottou, L., Gulrajani, I., Lopez-Paz, D.: Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019)

  3. Banerjee, I., et al.: Reading race: AI recognises patient’s racial identity in medical images. arXiv preprint arXiv:2107.10356 (2021)

  4. Cubuk, E.D., Dyer, E.S., Lopes, R.G., Smullin, S.: Tradeoffs in data augmentation: An empirical study. In: Proceedings of the International Conference on Learning Representations (2021)


  5. Teney, D., Abbasnejad, E., van den Hengel, A.: Unshuffling data for improved generalization. arXiv preprint arXiv:2002.11894 (2020)

  6. D’Amour, A., et al.: Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395 (2020)

  7. Das, S., Cashman, D., Chang, R., Endert, A.: Beames: Interactive multimodel steering, selection, and inspection for regression tasks. IEEE Comput. Graphics Appl. 39(5), 20–32 (2019)


  8. Deng, W., Gould, S., Zheng, L.: What does rotation prediction tell us about classifier accuracy under varying testing environments? arXiv preprint arXiv:2106.05961 (2021)

  9. Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016)


  10. Gardner, M., et al.: Evaluating NLP models via contrast sets. arXiv preprint arXiv:2004.02709 (2020)

  11. Garg, S., Balakrishnan, S., Kolter, J.Z., Lipton, Z.C.: RATT: leveraging unlabeled data to guarantee generalization. arXiv preprint arXiv:2105.00303 (2021)

  12. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231 (2018)

  13. Ghimire, S., Kashyap, S., Wu, J.T., Karargyris, A., Moradi, M.: Learning invariant feature representation to improve generalization across chest x-ray datasets. In: International Workshop on Machine Learning in Medical Imaging (2020)


  14. Goyal, Y., Wu, Z., Ernst, J., Batra, D., Parikh, D., Lee, S.: Counterfactual visual explanations. In: International Conference on Machine Learning, pp. 2376–2384. PMLR (2019)


  15. Gretton, A., Herbrich, R., Smola, A.J.: The kernel mutual information. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP 2003), vol. 4, pp. IV-880. IEEE (2003)


  16. Gulrajani, I., Lopez-Paz, D.: In search of lost domain generalization. In: Proceedings of the International Conference on Learning Representations (2021)


  17. Hälvä, H., Hyvarinen, A.: Hidden Markov nonlinear ICA: unsupervised learning from nonstationary time series. In: Conference on Uncertainty in Artificial Intelligence, pp. 939–948. PMLR (2020)


  18. Hoffman, J., Roberts, D.A., Yaida, S.: Robust learning with jacobian regularization. arXiv preprint arXiv:1908.02729 (2019)

  19. Hsu, H., Calmon, F.d.P.: Rashomon capacity: A metric for predictive multiplicity in probabilistic classification. arXiv preprint arXiv:2206.01295 (2022)

  20. Hudson, D.A., Manning, C.D.: GQA: A new dataset for real-world visual reasoning and compositional question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)


  21. Ilse, M., Tomczak, J.M., Forré, P.: Designing data augmentation for simulating interventions. In: Proceedings of the International Conference on Machine Learning (2021)


  22. Immer, A., Bauer, M., Fortuin, V., Rätsch, G., Khan, M.E.: Scalable marginal likelihood estimation for model selection in deep learning. arXiv preprint arXiv:2104.04975 (2021)

  23. Kaushik, D., Hovy, E., Lipton, Z.C.: Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434 (2019)

  24. Kervadec, C., Antipov, G., Baccouche, M., Wolf, C.: Roses are red, violets are blue... but should VQA expect them to? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021)


  25. Kirichenko, P., Izmailov, P., Wilson, A.G.: Last layer re-training is sufficient for robustness to spurious correlations. arXiv preprint arXiv:2204.02937 (2022)

  26. Koh, P.W., et al.: Wilds: A benchmark of in-the-wild distribution shifts. arXiv preprint arXiv:2012.07421 (2020)

  27. Koh, P.W., et al.: Wilds: A benchmark of in-the-wild distribution shifts. In: Proceedings of the International Conference on Machine Learning (2021)


  28. von Kügelgen, J., et al.: Self-supervised learning with data augmentations provably isolates content from style. arXiv preprint arXiv:2106.04619 (2021)

  29. Lee, D.H., et al.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML (2013)


  30. Mehrer, J., Spoerer, C.J., Kriegeskorte, N., Kietzmann, T.C.: Individual differences among deep neural network models. Nat. Commun. 11(1), 1–12 (2020)


  31. Miller, J.P., et al.: Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization. In: Proceedings of the International Conference on Machine Learning (2021)


  32. Mitchell, T.M.: The need for biases in learning generalizations. Rutgers University (1980)


  33. Ortiz-Jimenez, G., Salazar-Reque, I.F., Modas, A., Moosavi-Dezfooli, S.M., Frossard, P.: A neural anisotropic view of underspecification in deep learning. In: Proceedings of the International Conference on Learning Representations (2021)


  34. Peters, J., Bühlmann, P., Meinshausen, N.: Causal inference by using invariant prediction: identification and confidence intervals. J. Royal Stat. Soc. Ser. B (Stat. Methodol.) 78, 947–1012 (2016)


  35. Pezeshki, M., Kaba, S.O., Bengio, Y., Courville, A., Precup, D., Lajoie, G.: Gradient starvation: A learning proclivity in neural networks. arXiv preprint arXiv:2011.09468 (2020)

  36. Pfister, N., Bühlmann, P., Peters, J.: Invariant causal prediction for sequential data. J. Am. Stat. Assoc. 114(527), 1264–1276 (2019)


  37. Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. arXiv preprint arXiv:2104.08894 (2021)

  38. Rosenfeld, E., Ravikumar, P., Risteski, A.: Domain-adjusted regression or: Erm may already learn features sufficient for out-of-distribution generalization. arXiv preprint arXiv:2202.06856 (2022)

  39. Ross, A., Pan, W., Celi, L., Doshi-Velez, F.: Ensembles of locally independent prediction models. In: Proceedings of the Conference on AAAI (2020)


  40. Ross, A.S., Pan, W., Doshi-Velez, F.: Learning qualitatively diverse and interpretable rules for classification. arXiv preprint arXiv:1806.08716 (2018)

  41. Selvaraju, R.R., et al.: Taking a hint: Leveraging explanations to make vision and language models more grounded. In: Proceedings of the IEEE International Conference on Computer Vision (2019)


  42. Semenova, L., Rudin, C., Parr, R.: A study in rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning. arXiv preprint arXiv:1908.01755 (2019)

  43. Shah, H., Tamuly, K., Raghunathan, A., Jain, P., Netrapalli, P.: The pitfalls of simplicity bias in neural networks. arXiv preprint arXiv:2006.07710 (2020)

  44. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Planning Inference 90(2), 227–244 (2000)


  45. Sohn, K., et al.: FixMatch: simplifying semi-supervised learning with consistency and confidence. In: Proceedings of the Advances in Neural Information Processing Systems (2020)


  46. Sun, B., Feng, J., Saenko, K.: Correlation alignment for unsupervised domain adaptation. In: Csurka, G. (ed.) Domain Adaptation in Computer Vision Applications. ACVPR, pp. 153–171. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58347-1_8


  47. Teney, D., Abbasnedjad, E., van den Hengel, A.: Learning what makes a difference from counterfactual examples and gradient supervision. arXiv preprint arXiv:2004.09034 (2020)

  48. Teney, D., Abbasnejad, E., Lucey, S., van den Hengel, A.: Evading the simplicity bias: Training a diverse set of models discovers solutions with superior OOD generalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2022)


  49. Thiagarajan, J., Narayanaswamy, V.S., Rajan, D., Liang, J., Chaudhari, A., Spanias, A.: Designing counterfactual generators using deep model inversion. In: Proceedings of the Advances in Neural Information Processing Systems (2021)


  50. Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, Inc., New York (1998)


  51. Venkateswaran, P., Muthusamy, V., Isahagian, V., Venkatasubramanian, N.: Environment agnostic invariant risk minimization for classification of sequential datasets. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1615–1624 (2021)


  52. Wald, Y., Feder, A., Greenfeld, D., Shalit, U.: On calibration and out-of-domain generalization. arXiv preprint arXiv:2102.10395 (2021)

  53. Weinberger, K.Q., Saul, L.K.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision 70, 77–90 (2006)


  54. Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)


  55. Xiao, T., Wang, X., Efros, A.A., Darrell, T.: What should not be contrastive in contrastive learning. arXiv preprint arXiv:2008.05659 (2020)

  56. Xie, Q., Luong, M.T., Hovy, E., Le, Q.V.: Self-training with noisy student improves ImageNet classification. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2020)


  57. Zimmermann, R.S., Sharma, Y., Schneider, S., Bethge, M., Brendel, W.: Contrastive learning inverts the data generating process. arXiv preprint arXiv:2102.08850 (2021)


Author information


Corresponding author

Correspondence to Damien Teney.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 8623 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Teney, D., Peyrard, M., Abbasnejad, E. (2022). Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_27

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20050-2_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20049-6

  • Online ISBN: 978-3-031-20050-2

  • eBook Packages: Computer Science, Computer Science (R0)
