Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert’s understanding of a task. Underspecification [6] refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models. We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve for OOD model selection without additional assumptions. See https://arxiv.org/abs/2207.02598 for the full-length version of this work.
- 1. In this paper, OOD means there is a covariate shift between training and test data [44].
- 2. We define \(f\) to output logits. A binary prediction \(\hat{y}\) is obtained as \(\hat{y}={\text {round}}\big (\sigma \big (f({\boldsymbol{x}})\big )\big )\).
- 3.
- 4. In our implementation, masked elements are not replaced with zeros, but rather with random values from other instances in the current mini-batch.
- 5. We obtain very similar results between fine-tuning and retraining models from scratch on the masked data.
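The two implementation notes above (the prediction rule in footnote 2 and the masking scheme in footnote 4) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `predict_binary` and `mask_with_batch_values` are hypothetical, and the donor-selection rule (a random cyclic shift of the batch, borrowing at the same feature position) is an assumption, since the footnote only states that replacement values come from other instances in the current mini-batch.

```python
import numpy as np


def predict_binary(logits):
    """Footnote 2: y_hat = round(sigmoid(f(x))), where f(x) are logits."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return np.round(probs).astype(int)


def mask_with_batch_values(x, mask, rng):
    """Footnote 4: fill masked entries with values from other batch instances.

    Assumption (not from the paper): every instance borrows from one other
    instance, at the same feature position, chosen by a random cyclic shift.
    """
    x = np.asarray(x)
    batch_size = x.shape[0]
    if batch_size < 2:
        raise ValueError("need at least two instances to borrow values from")
    shift = rng.integers(1, batch_size)  # in [1, batch_size - 1], never 0
    donors = np.roll(x, shift, axis=0)   # row i receives row (i - shift) mod B
    return np.where(mask, donors, x)     # keep unmasked entries unchanged


rng = np.random.default_rng(0)
x = np.arange(12, dtype=float).reshape(4, 3)  # toy batch: 4 instances, 3 features
mask = np.zeros_like(x, dtype=bool)
mask[:, 0] = True                             # mask the first feature everywhere
x_masked = mask_with_batch_values(x, mask, rng)
y_hat = predict_binary([-2.0, 3.0])
```

Filling masked entries with values drawn from the batch, rather than zeros, presumably keeps the masked inputs closer to the statistics of real data; the cyclic shift is just one simple way to guarantee that the donor is a different instance.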
Alesiani, F., Yu, S., Yu, X.: Gated information bottleneck for generalization in sequential environments. arXiv preprint arXiv:2110.06057 (2021)
Arjovsky, M., Bottou, L., Gulrajani, I., Lopez-Paz, D.: Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019)
Banerjee, I., et al.: Reading race: AI recognises patient’s racial identity in medical images. arXiv preprint arXiv:2107.10356 (2021)
Cubuk, E.D., Dyer, E.S., Lopes, R.G., Smullin, S.: Tradeoffs in data augmentation: An empirical study. In: Proceedings of the International Conference on Learning Representations (2021)
Teney, D., Abbasnejad, E., van den Hengel, A.: Unshuffling data for improved generalization. arXiv preprint arXiv:2002.11894 (2020)
D’Amour, A., et al.: Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395 (2020)
Das, S., Cashman, D., Chang, R., Endert, A.: Beames: Interactive multimodel steering, selection, and inspection for regression tasks. IEEE Comput. Graphics Appl. 39(5), 20–32 (2019)
Deng, W., Gould, S., Zheng, L.: What does rotation prediction tell us about classifier accuracy under varying testing environments? arXiv preprint arXiv:2106.05961 (2021)
Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016)
Gardner, M., et al.: Evaluating NLP models via contrast sets. arXiv preprint arXiv:2004.02709 (2020)
Garg, S., Balakrishnan, S., Kolter, J.Z., Lipton, Z.C.: Ratt: Leveraging unlabeled data to guarantee generalization. arXiv preprint arXiv:2105.00303 (2021)
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231 (2018)
Ghimire, S., Kashyap, S., Wu, J.T., Karargyris, A., Moradi, M.: Learning invariant feature representation to improve generalization across chest x-ray datasets. In: International Workshop on Machine Learning in Medical Imaging (2020)
Goyal, Y., Wu, Z., Ernst, J., Batra, D., Parikh, D., Lee, S.: Counterfactual visual explanations. In: International Conference on Machine Learning, pp. 2376–2384. PMLR (2019)
Gretton, A., Herbrich, R., Smola, A.J.: The kernel mutual information. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP 2003), vol. 4, pp. IV-880. IEEE (2003)
Gulrajani, I., Lopez-Paz, D.: In search of lost domain generalization. In: Proceedings of the International Conference on Learning Representations (2021)
Hälvä, H., Hyvarinen, A.: Hidden Markov nonlinear ICA: Unsupervised learning from nonstationary time series. In: Conference on Uncertainty in Artificial Intelligence, pp. 939–948. PMLR (2020)
Hoffman, J., Roberts, D.A., Yaida, S.: Robust learning with Jacobian regularization. arXiv preprint arXiv:1908.02729 (2019)
Hsu, H., Calmon, F.d.P.: Rashomon capacity: A metric for predictive multiplicity in probabilistic classification. arXiv preprint arXiv:2206.01295 (2022)
Hudson, D.A., Manning, C.D.: GQA: A new dataset for real-world visual reasoning and compositional question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
Ilse, M., Tomczak, J.M., Forré, P.: Designing data augmentation for simulating interventions. In: Proceedings of the International Conference on Machine Learning (2021)
Immer, A., Bauer, M., Fortuin, V., Rätsch, G., Khan, M.E.: Scalable marginal likelihood estimation for model selection in deep learning. arXiv preprint arXiv:2104.04975 (2021)
Kaushik, D., Hovy, E., Lipton, Z.C.: Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434 (2019)
Kervadec, C., Antipov, G., Baccouche, M., Wolf, C.: Roses are red, violets are blue... but should VQA expect them to? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2021)
Kirichenko, P., Izmailov, P., Wilson, A.G.: Last layer re-training is sufficient for robustness to spurious correlations. arXiv preprint arXiv:2204.02937 (2022)
Koh, P.W., et al.: Wilds: A benchmark of in-the-wild distribution shifts. arXiv preprint arXiv:2012.07421 (2020)
Koh, P.W., et al.: Wilds: A benchmark of in-the-wild distribution shifts. In: Proceedings of the International Conference on Machine Learning (2021)
von Kügelgen, J., et al.: Self-supervised learning with data augmentations provably isolates content from style. arXiv preprint arXiv:2106.04619 (2021)
Lee, D.H., et al.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning, ICML (2013)
Mehrer, J., Spoerer, C.J., Kriegeskorte, N., Kietzmann, T.C.: Individual differences among deep neural network models. Nat. Commun. 11(1), 1–12 (2020)
Miller, J.P., et al.: Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization. In: Proceedings of the International Conference on Machine Learning (2021)
Mitchell, T.M.: The need for biases in learning generalizations. Rutgers University (1980)
Ortiz-Jimenez, G., Salazar-Reque, I.F., Modas, A., Moosavi-Dezfooli, S.M., Frossard, P.: A neural anisotropic view of underspecification in deep learning. In: Proceedings of the International Conference on Learning Representations (2021)
Peters, J., Bühlmann, P., Meinshausen, N.: Causal inference by using invariant prediction: identification and confidence intervals. J. Royal Stat. Soc. Ser. B (Stat. Methodol.) 78, 947–1012 (2016)
Pezeshki, M., Kaba, S.O., Bengio, Y., Courville, A., Precup, D., Lajoie, G.: Gradient starvation: A learning proclivity in neural networks. arXiv preprint arXiv:2011.09468 (2020)
Pfister, N., Bühlmann, P., Peters, J.: Invariant causal prediction for sequential data. J. Am. Stat. Assoc. 114(527), 1264–1276 (2019)
Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. arXiv preprint arXiv:2104.08894 (2021)
Rosenfeld, E., Ravikumar, P., Risteski, A.: Domain-adjusted regression or: ERM may already learn features sufficient for out-of-distribution generalization. arXiv preprint arXiv:2202.06856 (2022)
Ross, A., Pan, W., Celi, L., Doshi-Velez, F.: Ensembles of locally independent prediction models. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)
Ross, A.S., Pan, W., Doshi-Velez, F.: Learning qualitatively diverse and interpretable rules for classification. arXiv preprint arXiv:1806.08716 (2018)
Selvaraju, R.R., et al.: Taking a hint: Leveraging explanations to make vision and language models more grounded. In: Proceedings of the IEEE International Conference on Computer Vision (2019)
Semenova, L., Rudin, C., Parr, R.: A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning. arXiv preprint arXiv:1908.01755 (2019)
Shah, H., Tamuly, K., Raghunathan, A., Jain, P., Netrapalli, P.: The pitfalls of simplicity bias in neural networks. arXiv preprint arXiv:2006.07710 (2020)
Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Planning Inference 90(2), 227–244 (2000)
Sohn, K., et al.: Fixmatch: Simplifying semi-supervised learning with consistency and confidence. In: Proceedings of the Advances in Neural Information Processing Systems (2020)
Sun, B., Feng, J., Saenko, K.: Correlation alignment for unsupervised domain adaptation. In: Csurka, G. (ed.) Domain Adaptation in Computer Vision Applications. ACVPR, pp. 153–171. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58347-1_8
Teney, D., Abbasnedjad, E., van den Hengel, A.: Learning what makes a difference from counterfactual examples and gradient supervision. arXiv preprint arXiv:2004.09034 (2020)
Teney, D., Abbasnejad, E., Lucey, S., van den Hengel, A.: Evading the simplicity bias: Training a diverse set of models discovers solutions with superior OOD generalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2022)
Thiagarajan, J., Narayanaswamy, V.S., Rajan, D., Liang, J., Chaudhari, A., Spanias, A.: Designing counterfactual generators using deep model inversion. In: Proceedings of the Advances in Neural Information Processing Systems (2021)
Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, Inc., New York (1998)
Venkateswaran, P., Muthusamy, V., Isahagian, V., Venkatasubramanian, N.: Environment agnostic invariant risk minimization for classification of sequential datasets. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1615–1624 (2021)
Wald, Y., Feder, A., Greenfeld, D., Shalit, U.: On calibration and out-of-domain generalization. arXiv preprint arXiv:2102.10395 (2021)
Weinberger, K.Q., Saul, L.K.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision 70, 77–90 (2006)
Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
Xiao, T., Wang, X., Efros, A.A., Darrell, T.: What should not be contrastive in contrastive learning. arXiv preprint arXiv:2008.05659 (2020)
Xie, Q., Luong, M.T., Hovy, E., Le, Q.V.: Self-training with noisy student improves ImageNet classification. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2020)
Zimmermann, R.S., Sharma, Y., Schneider, S., Bethge, M., Brendel, W.: Contrastive learning inverts the data generating process. arXiv preprint arXiv:2102.08850 (2021)
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Teney, D., Peyrard, M., Abbasnejad, E. (2022). Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_27
DOI: https://doi.org/10.1007/978-3-031-20050-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20049-6
Online ISBN: 978-3-031-20050-2
eBook Packages: Computer Science, Computer Science (R0)