OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13668)

Abstract

Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited: they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context, and weather conditions, and enables benchmarking of models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1) some nuisance factors have a much stronger negative effect on performance than others, with the effect also depending on the vision task; 2) current approaches to enhance robustness have only marginal effects, and can even reduce robustness; 3) we do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich testbed to study robustness and will help push forward research in this area.
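To make the evaluation protocol in the abstract concrete: the benchmark measures robustness as the accuracy drop between an in-distribution test split and splits where exactly one nuisance factor (pose, shape, texture, context, or weather) shifts. Below is a minimal sketch of that per-nuisance evaluation. The directory layout (`ood-cv/iid_test`, `ood-cv/ood_<nuisance>`), the checkpoint file name, and the use of a ResNet-50 baseline are illustrative assumptions, not the benchmark's actual API; see http://ood-cv.org/ for the real data format.

```python
# Hypothetical per-nuisance evaluation sketch. Assumes the benchmark's ten
# classes are arranged in ImageFolder layout under one IID test split and one
# OOD split per nuisance factor; none of these paths are official.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUISANCES = ["pose", "shape", "texture", "context", "weather"]

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def accuracy(model, split_dir, device):
    """Top-1 accuracy over one ImageFolder-style split."""
    loader = DataLoader(datasets.ImageFolder(split_dir, transform=preprocess),
                        batch_size=64, num_workers=4)
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

device = "cuda" if torch.cuda.is_available() else "cpu"
# Stand-in baseline with a 10-way head; assumes a checkpoint fine-tuned on
# the benchmark's training split (the file name is hypothetical).
model = models.resnet50(weights=None, num_classes=10)
model.load_state_dict(torch.load("resnet50_oodcv.pt", map_location=device))
model.eval().to(device)

iid_acc = accuracy(model, "ood-cv/iid_test", device)
print(f"IID accuracy: {iid_acc:.3f}")
for nuisance in NUISANCES:
    ood_acc = accuracy(model, f"ood-cv/ood_{nuisance}", device)
    # The per-factor accuracy drop is the robustness measure of interest.
    print(f"{nuisance:>8}: acc={ood_acc:.3f}  drop={iid_acc - ood_acc:+.3f}")
```

Reporting the drop per factor, rather than a single aggregate OOD score, is what lets the paper observe that some nuisances (and some tasks) are hit much harder than others.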


Notes

  1. https://github.com/jsbroks/coco-annotator.

  2. https://cvgl.stanford.edu/projects/pascal3d.html.

  3. http://ood-cv.org/; also see the supplementary material.


Acknowledgements

AK acknowledges support via his Emmy Noether Research Group funded by the German Research Foundation (DFG) under Grant No. 468670075. BZ acknowledges compute support from LunarAI. AY acknowledges grants ONR N00014-20-1-2206 and ONR N00014-21-1-2812.

Author information

Corresponding author

Correspondence to Adam Kortylewski.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2761 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, B. et al. (2022). OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_10

  • DOI: https://doi.org/10.1007/978-3-031-20074-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20073-1

  • Online ISBN: 978-3-031-20074-8

  • eBook Packages: Computer Science, Computer Science (R0)
