
CLIP-guided black-box domain adaptation of image classification

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Recently, the significant success of large pre-trained models has attracted great attention, and how to make full use of these models is an important issue. Black-box domain adaptation trains a target model through a cloud API offered by a large pre-trained model, without access to model details or source data. Existing black-box domain adaptation methods for image classification rely only on the prediction results returned by the cloud API, but this information is very limited. On the other hand, the recently proposed vision-language model CLIP, trained on large-scale datasets, aligns visual and text features in a common space and thus provides useful auxiliary information. In this work, we propose a new black-box domain adaptation method guided by CLIP (BBC). The key idea is to generate more accurate pseudo-labels. Two strategies are adopted. The first, joint pseudo-label generation, combines the predictions from the cloud API and the CLIP model. The second, structure-preserved pseudo-labeling, further refines the pseudo-labels using the previously stored predictions of each sample's k-nearest neighbors. Experiments on three benchmark datasets show that our method achieves state-of-the-art results by a large margin.
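The two labeling strategies described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the mixing weight `alpha`, and the use of cosine similarity for the neighbor search are assumptions made for the sketch.

```python
import numpy as np

def joint_pseudo_labels(api_probs, clip_probs, alpha=0.5):
    """Combine cloud-API and CLIP class probabilities into joint pseudo-labels.

    api_probs, clip_probs: (n_samples, n_classes) probability matrices.
    alpha weights the API prediction; (1 - alpha) weights the CLIP prediction.
    """
    joint = alpha * api_probs + (1.0 - alpha) * clip_probs
    return joint.argmax(axis=1)

def knn_refined_labels(features, stored_probs, k=3):
    """Structure-preserving refinement: relabel each sample by averaging the
    previously stored predictions of its k nearest neighbors in feature space
    (cosine similarity assumed here)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)            # exclude the sample itself
    nn_idx = np.argsort(-sim, axis=1)[:, :k]  # indices of k nearest neighbors
    return stored_probs[nn_idx].mean(axis=1).argmax(axis=1)
```

For example, with two classes and `alpha=0.5`, a sample the API scores as `[0.9, 0.1]` and CLIP scores as `[0.6, 0.4]` receives the joint label 0; the kNN step then smooths labels so that samples close in feature space agree with their neighborhood.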


Availability of data and materials

The datasets Office-31, Office-Home and VisDA-C can be downloaded from https://github.com/tim-learn/SHOT [7].

References

  1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)


  2. Saito, K., Watanabe, K., Ushiku, Y., Harada, T.: Maximum classifier discrepancy for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3723–3732 (2018)

  3. Long, M., Cao, Z., Wang, J., Jordan, M.I.: Conditional adversarial domain adaptation. In: Advances in Neural Information Processing Systems, 31 (2018)

  4. Liang, J., Hu, D., He, R., Feng, J.: Distill and fine-tune: effective adaptation from a black-box source model, arXiv preprint arXiv:2104.01539 (2021)

  5. Zhang, H., Zhang, Y., Jia, K., Zhang, L.: Unsupervised domain adaptation of black-box source models, arXiv preprint arXiv:2101.02839 (2021)

  6. Li, R., Jiao, Q., Cao, W., Wong, H.-S., Wu, S.: Model adaptation: Unsupervised domain adaptation without source data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9641–9650 (2020)

  7. Liang, J., Hu, D., Feng, J.: Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In: International Conference on Machine Learning, PMLR, pp. 6028–6039 (2020)

  8. Yang, S., Wang, Y., Weijer, J.V.D., Herranz, L., Jui, S.: Unsupervised domain adaptation without source data by casting a bait, arXiv preprint arXiv:2010.12427 (2020)

  9. Kim, Y., Cho, D., Han, K., Panda, P., Hong, S.: Domain adaptation without source data. IEEE Trans. Artif. Intell. 2(6), 508–518 (2021)


  10. Qiu, Z., Zhang, Y., Lin, H., Niu, S., Liu, Y., Du, Q., Tan, M.: Source-free domain adaptation via avatar prototype generation and adaptation, arXiv preprint arXiv:2106.15326 (2021)

  11. Tang, S., Zou, Y., Song, Z., Lyu, J., Chen, L., Ye, M., Zhong, S., Zhang, J.: Semantic consistency learning on manifold for source data-free unsupervised domain adaptation. Neural Netw. (2022)

  12. Ding, Y., Sheng, L., Liang, J., Zheng, A., He, R.: Proxymix: proxy-based mixup training with label refinery for source-free domain adaptation, arXiv preprint arXiv:2205.14566 (2022)

  13. Liu, C., Zhou, L., Ye, M., Li, X.: Self-alignment for black-box domain adaptation of image classification. IEEE Signal Process. Lett. 29, 1709–1713 (2022)


  14. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, PMLR, pp. 8748–8763 (2021)

  15. Gal, R., Patashnik, O., Maron, H., Bermano, A.H., Chechik, G., Cohen-Or, D.: StyleGAN-NADA: CLIP-guided domain adaptation of image generators. ACM Trans. Graph. 41(4), 1–13 (2022)


  16. Tian, L., Zhou, L., Zhang, H., Wang, Z., Ye, M.: Robust self-supervised learning for source-free domain adaptation. Signal, Image and Video Processing, pp. 1–9 (2023)

  17. Tian, J., Zhang, J., Li, W., Xu, D.: VDM-DA: virtual domain modeling for source data-free domain adaptation. IEEE Trans. Circuits Syst. Video Technol. (2021)

  18. Xia, H., Zhao, H., Ding, Z.: Adaptive adversarial network for source-free domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9010–9019 (2021)

  19. Liang, J., Hu, D., Feng, J., He, R.: Dine: Domain adaptation from single and multiple black-box predictors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8003–8013 (2022)

  20. Quattoni, A., Collins, M., Darrell, T.: Learning visual representations using images with captions. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1–8 (2007)

  21. Srivastava, N., Salakhutdinov, R.R.: Multimodal learning with deep Boltzmann machines. In: Advances in Neural Information Processing Systems, 25 (2012)

  22. Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., Duerig, T.: Scaling up visual and vision-language representation learning with noisy text supervision. In: International Conference on Machine Learning, PMLR, pp. 4904–4916 (2021)

  23. Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International Conference on Machine Learning, PMLR, pp. 8821–8831 (2021)

  24. Kang, G., Jiang, L., Yang, Y., Hauptmann, A.G.: Contrastive adaptation network for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4893–4902 (2019)

  25. Yang, S., Wang, Y., van de Weijer, J., Herranz, L., Jui, S.: Generalized source-free domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8978–8987 (2021)

  26. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)


  27. Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: European Conference on Computer Vision, Springer, pp. 213–226 (2010)

  28. Venkateswara, H., Eusebio, J., Chakraborty, S., Panchanathan, S.: Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5018–5027 (2017)

  29. Peng, X., Usman, B., Kaushik, N., Hoffman, J., Wang, D., Saenko, K.: VisDA: the visual domain adaptation challenge, arXiv preprint arXiv:1710.06924 (2017)

  30. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  31. Xu, R., Li, G., Yang, J., Lin, L.: Larger norm more transferable: an adaptive feature norm approach for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1426–1435 (2019)

  32. Jin, Y., Wang, X., Long, M., Wang, J.: Minimum class confusion for versatile domain adaptation. In: European Conference on Computer Vision, Springer, pp. 464–480 (2020)

  33. Tang, H., Chen, K., Jia, K.: Unsupervised domain adaptation via structurally regularized deep clustering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8725–8735 (2020)

  34. Liang, J., Hu, D., Feng, J.: Domain adaptation with auxiliary target domain-oriented classifier. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16632–16642 (2021)

  35. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11) (2008)


Acknowledgements

This work was supported in part by National Natural Science Foundation of China (62276048) and Chengdu Science and Technology Projects (2023-YF06-00009-HZ).

Funding

This work was supported in part by National Natural Science Foundation of China (62276048).

Author information

Authors and Affiliations

Authors

Contributions

Liang Tian contributed to the conception of the study and performed the experiments; Mao Ye contributed significantly to the analysis and manuscript preparation; Lihua Zhou and Qichen He helped perform the analysis with constructive discussions.

Corresponding author

Correspondence to Mao Ye.

Ethics declarations

Conflict of interest

This declaration is not applicable.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tian, L., Ye, M., Zhou, L. et al. CLIP-guided black-box domain adaptation of image classification. SIViP (2024). https://doi.org/10.1007/s11760-024-03101-8
