
CLIP-guided black-box domain adaptation of image classification

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Recently, the significant success of large pre-trained models has attracted great attention, and how to make full use of these models is an important issue. Black-box domain adaptation trains a target model through a cloud API offered by a large pre-trained model, without access to model details or source data. Existing black-box domain adaptation methods for image classification rely only on the prediction results returned by the cloud API, but this information is very limited. On the other hand, the recently proposed vision-language model CLIP, trained on large-scale datasets, aligns visual and text features in a common space and thus provides useful auxiliary information. In this work, we propose a new black-box domain adaptation method guided by CLIP (BBC). The key idea is to generate more accurate pseudo-labels. Two strategies are adopted. The first, joint pseudo-label generation, combines the predictions from the cloud API and the CLIP model. The second, structure-preserved pseudo-labeling, further refines the pseudo-labels using the previously stored predictions of each sample's k-nearest neighbors. Experiments on three benchmark datasets show that our method achieves state-of-the-art results by a large margin.
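The two labeling strategies described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the mixing weight `alpha`, and the use of cosine similarity for the neighbor search are assumptions made for the sketch.

```python
import numpy as np

def joint_pseudo_labels(api_probs, clip_probs, alpha=0.5):
    """Combine cloud-API and CLIP class probabilities into joint pseudo-labels.

    api_probs, clip_probs: (n_samples, n_classes) probability matrices.
    alpha weights the API prediction; (1 - alpha) weights the CLIP prediction.
    """
    joint = alpha * api_probs + (1.0 - alpha) * clip_probs
    return joint.argmax(axis=1)

def knn_refined_labels(features, stored_probs, k=3):
    """Structure-preserving refinement: relabel each sample by averaging the
    previously stored predictions of its k nearest neighbors in feature space
    (cosine similarity assumed here)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)            # exclude the sample itself
    nn_idx = np.argsort(-sim, axis=1)[:, :k]  # indices of k nearest neighbors
    return stored_probs[nn_idx].mean(axis=1).argmax(axis=1)
```

For example, with two classes and `alpha=0.5`, a sample the API scores as `[0.9, 0.1]` and CLIP scores as `[0.6, 0.4]` receives the joint label 0; the kNN step then smooths labels so that samples close in feature space agree with their neighborhood.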


Availability of data and materials

The datasets Office-31, Office-Home and VisDA-C can be downloaded from https://github.com/tim-learn/SHOT [7].

References

  1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)


  2. Saito, K., Watanabe, K., Ushiku, Y., Harada, T.: Maximum classifier discrepancy for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3723–3732 (2018)

  3. Long, M., Cao, Z., Wang, J., Jordan, M.I.: Conditional adversarial domain adaptation. In: Advances in Neural Information Processing Systems, 31 (2018)

  4. Liang, J., Hu, D., He, R., Feng, J.: Distill and fine-tune: effective adaptation from a black-box source model, arXiv preprint arXiv:2104.01539 (2021)

  5. Zhang, H., Zhang, Y., Jia, K., Zhang, L.: Unsupervised domain adaptation of black-box source models, arXiv preprint arXiv:2101.02839 (2021)

  6. Li, R., Jiao, Q., Cao, W., Wong, H.-S., Wu, S.: Model adaptation: Unsupervised domain adaptation without source data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9641–9650 (2020)

  7. Liang, J., Hu, D., Feng, J.: Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In: International Conference on Machine Learning, PMLR, pp. 6028–6039 (2020)

  8. Yang, S., Wang, Y., Weijer, J.V.D., Herranz, L., Jui, S.: Unsupervised domain adaptation without source data by casting a bait, arXiv preprint arXiv:2010.12427 (2020)

  9. Kim, Y., Cho, D., Han, K., Panda, P., Hong, S.: Domain adaptation without source data. IEEE Trans. Artif. Intell. 2(6), 508–518 (2021)


  10. Qiu, Z., Zhang, Y., Lin, H., Niu, S., Liu, Y., Du, Q., Tan, M.: Source-free domain adaptation via avatar prototype generation and adaptation, arXiv preprint arXiv:2106.15326 (2021)

  11. Tang, S., Zou, Y., Song, Z., Lyu, J., Chen, L., Ye, M., Zhong, S., Zhang, J.: Semantic consistency learning on manifold for source data-free unsupervised domain adaptation. Neural Netw. (2022)

  12. Ding, Y., Sheng, L., Liang, J., Zheng, A., He, R.: Proxymix: proxy-based mixup training with label refinery for source-free domain adaptation, arXiv preprint arXiv:2205.14566 (2022)

  13. Liu, C., Zhou, L., Ye, M., Li, X.: Self-alignment for black-box domain adaptation of image classification. IEEE Signal Process. Lett. 29, 1709–1713 (2022)


  14. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, PMLR, pp. 8748–8763 (2021)

  15. Gal, R., Patashnik, O., Maron, H., Bermano, A.H., Chechik, G., Cohen-Or, D.: StyleGAN-NADA: CLIP-guided domain adaptation of image generators. ACM Trans. Graph. 41(4), 1–13 (2022)


  16. Tian, L., Zhou, L., Zhang, H., Wang, Z., Ye, M.: Robust self-supervised learning for source-free domain adaptation. Signal, Image and Video Processing, pp. 1–9 (2023)

  17. Tian, J., Zhang, J., Li, W., Xu, D.: VDM-DA: virtual domain modeling for source data-free domain adaptation. IEEE Trans. Circuits Syst. Video Technol. (2021)

  18. Xia, H., Zhao, H., Ding, Z.: Adaptive adversarial network for source-free domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9010–9019 (2021)

  19. Liang, J., Hu, D., Feng, J., He, R.: Dine: Domain adaptation from single and multiple black-box predictors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8003–8013 (2022)

  20. Quattoni, A., Collins, M., Darrell, T.: Learning visual representations using images with captions. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1–8 (2007)

  21. Srivastava, N., Salakhutdinov, R.R.: Multimodal learning with deep Boltzmann machines. In: Advances in Neural Information Processing Systems, 25 (2012)

  22. Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., Duerig, T.: Scaling up visual and vision-language representation learning with noisy text supervision. In: International Conference on Machine Learning, PMLR, pp. 4904–4916 (2021)

  23. Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International Conference on Machine Learning, PMLR, pp. 8821–8831 (2021)

  24. Kang, G., Jiang, L., Yang, Y., Hauptmann, A.G.: Contrastive adaptation network for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4893–4902 (2019)

  25. Yang, S., Wang, Y., van de Weijer, J., Herranz, L., Jui, S.: Generalized source-free domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8978–8987 (2021)

  26. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)


  27. Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: European Conference on Computer Vision, Springer, pp. 213–226 (2010)

  28. Venkateswara, H., Eusebio, J., Chakraborty, S., Panchanathan, S.: Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5018–5027 (2017)

  29. Peng, X., Usman, B., Kaushik, N., Hoffman, J., Wang, D., Saenko, K.: VisDA: the visual domain adaptation challenge, arXiv preprint arXiv:1710.06924 (2017)

  30. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  31. Xu, R., Li, G., Yang, J., Lin, L.: Larger norm more transferable: an adaptive feature norm approach for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1426–1435 (2019)

  32. Jin, Y., Wang, X., Long, M., Wang, J.: Minimum class confusion for versatile domain adaptation. In: European Conference on Computer Vision, Springer, pp. 464–480 (2020)

  33. Tang, H., Chen, K., Jia, K.: Unsupervised domain adaptation via structurally regularized deep clustering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8725–8735 (2020)

  34. Liang, J., Hu, D., Feng, J.: Domain adaptation with auxiliary target domain-oriented classifier. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16632–16642 (2021)

  35. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11) (2008)


Acknowledgements

This work was supported in part by National Natural Science Foundation of China (62276048) and Chengdu Science and Technology Projects (2023-YF06-00009-HZ).

Funding

This work was supported in part by National Natural Science Foundation of China (62276048).

Author information

Authors and Affiliations

Authors

Contributions

Liang Tian contributed to the conception of the study and performed the experiments; Mao Ye contributed significantly to the analysis and manuscript preparation; Lihua Zhou and Qichen He helped perform the analysis with constructive discussions.

Corresponding author

Correspondence to Mao Ye.

Ethics declarations

Conflict of interest

This declaration is not applicable.

Ethical approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tian, L., Ye, M., Zhou, L. et al. CLIP-guided black-box domain adaptation of image classification. SIViP (2024). https://doi.org/10.1007/s11760-024-03101-8
