Learning Domain-Invariant Representations from Text for Domain Generalization

Zhang, Huihuang; Hu, Haigen; Chen, Qi; Zhou, Qianwei; Jiang, Mingfeng

doi:10.1007/978-981-99-8543-2_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14432)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Domain generalization (DG) aims to transfer the knowledge learned in the source domain to an unseen target domain. Most DG methods focus on learning representations that remain invariant across different domains. Humans tend to use the same word or text to describe images of the same category, even when those images come from different domains. Text can therefore be considered a natural domain-invariant representation. Inspired by this, this paper studies how to introduce text representations into domain generalization tasks. Specifically, the text representations generated by the CLIP text encoder are used to guide the image representation learning of the visual model. To alleviate the domain bias and weak discriminability caused by CLIP representations, a joint loss is proposed that combines a text representation regularization loss with the standard image-level supervised loss. The proposed method is simple yet efficient, and achieves competitive performance compared with existing state-of-the-art methods on five standard DG datasets.
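As an illustration only, the joint loss described in the abstract can be sketched as a supervised cross-entropy term plus a weighted term that pulls each image representation toward the text representation of its class. The abstract does not give the exact form of the regularization loss, so the cosine distance, the weight `lam`, and all function names below are assumptions made for this sketch. The text features are stand-in vectors here; in the described setting they would come from the CLIP text encoder.

```python
import math


def cross_entropy(logits, label):
    """Standard supervised loss: negative log-softmax of the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def cosine_distance(u, v):
    """1 - cosine similarity; an ASSUMED choice of regularizer, not the paper's."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def joint_loss(logits, image_feat, text_feats, label, lam=0.5):
    """Supervised loss plus text-representation regularization.

    text_feats[k] is the (frozen) text representation of class k;
    lam is a hypothetical trade-off weight.
    """
    supervised = cross_entropy(logits, label)
    text_reg = cosine_distance(image_feat, text_feats[label])
    return supervised + lam * text_reg


# Two classes with orthogonal placeholder text features.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned = joint_loss([0.0, 0.0], [1.0, 0.0], text_feats, label=0)
misaligned = joint_loss([0.0, 0.0], [0.0, 1.0], text_feats, label=0)
print(aligned, misaligned)
```

In this toy case the image feature that matches its class text feature incurs only the supervised term, while the misaligned feature pays an additional `lam` times the cosine distance.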



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62373324, 62271448 and U20A20171), in part by the Zhejiang Provincial Natural Science Foundation of China (Grant Nos. LGF22F030016 and LY21F020027), and in part by the Key Programs for Science and Technology Development of Zhejiang Province (2022C03113).

Author information

Authors and Affiliations

Zhejiang University of Technology, Hangzhou, China
Huihuang Zhang, Haigen Hu, Qi Chen & Qianwei Zhou
Zhejiang Sci-Tech University, Hangzhou, China
Mingfeng Jiang

Corresponding authors

Correspondence to Haigen Hu or Mingfeng Jiang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Zhang, H., Hu, H., Chen, Q., Zhou, Q., Jiang, M. (2024). Learning Domain-Invariant Representations from Text for Domain Generalization. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_10

DOI: https://doi.org/10.1007/978-981-99-8543-2_10
Published: 29 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)
