Cross Validation Framework to Choose amongst Models and Datasets for Transfer Learning

  • Erheng Zhong
  • Wei Fan
  • Qiang Yang
  • Olivier Verscheure
  • Jiangtao Ren
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)


One solution to the problem of scarce labeled data is transfer learning, whereby knowledge acquired from source domains is used to improve learning performance in the target domain. The main challenge is that the source and target domains may have different distributions. An open problem is how to select among the available models (including algorithms and parameters) and, importantly, among the abundant source-domain data, through statistically reliable methods, thus making transfer learning practical and easy to use for real-world applications. To address this challenge, one must account for differences in both the marginal and conditional distributions at the same time, not just one of them. In this paper, we formulate a new criterion to overcome this “double” distribution shift and present a practical approach, “Transfer Cross Validation” (TrCV), to select both models and data within a cross-validation framework optimized for transfer learning. The idea is to use density-ratio weighting to overcome the difference in marginal distributions, and to propose a “reverse validation” procedure to quantify how well a model approximates the true conditional distribution of the target domain. The usefulness of TrCV is demonstrated on different cross-domain tasks, including wine quality evaluation, web-user ranking, and text categorization. The experimental results show that the proposed method outperforms both traditional cross-validation and a state-of-the-art method that considers only marginal distribution shift. The software and datasets are available from the authors.


Keywords: Cross Validation · Conditional Distribution · Density Ratio · Target Domain · Label Data
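The abstract's two ingredients can be illustrated with a minimal sketch: importance weights estimated from a domain discriminator stand in for the density ratio, and a forward/reverse training loop stands in for reverse validation. All function names, the logistic-regression ratio estimator, and the error measure below are our assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def density_ratio(X_src, X_tgt):
    # Train a source-vs-target discriminator and convert its probabilities
    # into importance weights w(x) ~ p_tgt(x) / p_src(x) for source points.
    X = np.vstack([X_src, X_tgt])
    d = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_src)[:, 1]
    return (p / (1.0 - p)) * (len(X_src) / len(X_tgt))

def reverse_validation_error(model_cls, X_tr, y_tr, X_tgt, X_va, y_va, w_va):
    # Forward step: the candidate model pseudo-labels the unlabeled target data.
    fwd = model_cls().fit(X_tr, y_tr)
    y_pseudo = fwd.predict(X_tgt)
    # Reverse step: train on the pseudo-labeled target data and measure how
    # well this reverse model recovers the held-out labeled source fold,
    # weighted by the density ratio to mimic the target marginal.
    rev = model_cls().fit(X_tgt, y_pseudo)
    err = (rev.predict(X_va) != y_va).astype(float)
    return float(np.average(err, weights=w_va))

def trcv_score(model_cls, X_src, y_src, X_tgt, k=5, seed=0):
    # k-fold cross validation on the source data, scoring each fold with the
    # density-ratio-weighted reverse-validation error; lower is better.
    w = density_ratio(X_src, X_tgt)
    errs = []
    for tr, va in KFold(k, shuffle=True, random_state=seed).split(X_src):
        errs.append(reverse_validation_error(
            model_cls, X_src[tr], y_src[tr], X_tgt,
            X_src[va], y_src[va], w[va]))
    return float(np.mean(errs))
```

Under this sketch, competing models (or candidate source datasets) would be ranked by their `trcv_score`, with the lowest weighted reverse-validation error selected.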



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Erheng Zhong — Sun Yat-sen University, Guangzhou, China
  • Wei Fan — IBM T.J. Watson Research Center, USA
  • Qiang Yang — Department of Computer Science, Hong Kong University of Science and Technology
  • Olivier Verscheure — IBM T.J. Watson Research Center, USA
  • Jiangtao Ren — Sun Yat-sen University, Guangzhou, China
