
Knowledge and Information Systems, Volume 33, Issue 2, pp 309–349

Parsimonious unsupervised and semi-supervised domain adaptation with good similarity functions

  • Emilie Morvant
  • Amaury Habrard
  • Stéphane Ayache
Regular paper

Abstract

In this paper, we address the problem of domain adaptation for binary classification. This problem arises when the distributions generating the source learning data and the target test data differ. From a theoretical standpoint, a classifier has better generalization guarantees when the two marginal distributions over the input space are close. Classical approaches mainly try to build new projection spaces or to reweight the source data with the objective of bringing the two distributions closer. We study an original direction based on a recent framework introduced by Balcan et al. that enables one to learn linear classifiers in an explicit projection space built from a similarity function, which need be neither symmetric nor positive semi-definite. We propose a well-founded general method for learning a low-error classifier on target data, made effective by an iterative procedure compatible with Balcan et al.'s framework. A reweighting scheme for the similarity function is then introduced in order to bring the distributions closer in a new projection space. The hyperparameters and the reweighting quality are controlled by a reverse validation procedure. Our approach is based on a linear programming formulation and shows good adaptation performance with very sparse models. We first consider the challenging unsupervised case, where no target label is accessible, which is helpful when no manual annotation is possible. We also propose a generalization to the semi-supervised case, allowing us to exploit a few target labels when available. Finally, we evaluate our method on a synthetic problem and on a real image annotation task.
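The two building blocks the abstract describes — an explicit projection of each example onto its similarities to a set of landmark points, and a sparse linear classifier learned by linear programming with an L1 penalty — can be sketched as follows. This is an illustrative reconstruction of a Balcan et al.-style learner only, not the authors' method: the Gaussian similarity, the function names, and the regularization parameter `lam` are our own choices, and the sketch omits the paper's reweighting scheme and reverse validation procedure.

```python
import numpy as np
from scipy.optimize import linprog

def similarity(X, landmarks, gamma=1.0):
    """Pairwise similarity between rows of X and the landmarks.
    The framework allows any bounded similarity, not necessarily
    symmetric or PSD; a Gaussian is used here purely for illustration."""
    sq = np.sum((X[:, None, :] - landmarks[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq)

def learn_sparse_linear_classifier(X, y, landmarks, lam=0.1):
    """L1-regularized hinge-loss learning posed as a linear program.
    Variables: alpha = alpha_plus - alpha_minus (classifier weights in
    the explicit similarity space) and one slack per training example."""
    n, d = X.shape[0], landmarks.shape[0]
    phi = similarity(X, landmarks)  # explicit projection space
    # objective: lam * ||alpha||_1 + sum of slacks
    c = np.concatenate([lam * np.ones(2 * d), np.ones(n)])
    # margin constraints: y_i * <phi_i, alpha> >= 1 - xi_i,
    # rewritten as A_ub @ z <= b_ub with z = [a+, a-, xi] >= 0
    A_ub = np.hstack([-y[:, None] * phi, y[:, None] * phi, -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:d] - res.x[d:2 * d]  # sparse weight vector alpha

def predict(X, landmarks, alpha):
    return np.sign(similarity(X, landmarks) @ alpha)
```

The L1 objective is what yields the very sparse models mentioned in the abstract: most landmark weights are driven exactly to zero, so prediction only needs similarities to a few reference points.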

Keywords

Machine learning · Transfer learning · Domain adaptation · Good similarity functions


References

  1. Abbasnejad M, Ramachandram D, Mandava R (2012) A survey of the state of the art in learning the kernels. Knowl Inf Syst 31(2): 193–221. doi: 10.1007/s10115-011-0404-6
  2. Ando R, Zhang T (2005) A framework for learning predictive structures from multiple tasks and unlabeled data. J Mach Learn Res 6: 1817–1853
  3. Ayache S, Quénot G (2008) Video corpus annotation using active learning. In: Proceedings of the 30th European conference on information retrieval research (ECIR), vol 4956 of LNCS. Springer, pp 187–198
  4. Ayache S, Quénot G, Gensel J (2007) Image and video indexing using networks of operators. J Image Video Process 1: 1–113
  5. Bahadori MT, Liu Y, Zhang D (2011) Learning with minimum supervision: a general framework for transductive transfer learning. In: Proceedings of the 11th IEEE international conference on data mining (ICDM), pp 61–70
  6. Balcan M, Blum A, Srebro N (2008a) Improved guarantees for learning via similarity functions. In: Proceedings of the annual conference on computational learning theory (COLT), pp 287–298
  7. Balcan M, Blum A, Srebro N (2008b) A theory of learning with similarity functions. Mach Learn J 72(1–2): 89–112
  8. Bellet A, Habrard A, Sebban M (2011) Learning good edit similarities with generalization guarantees. In: Proceedings of the European conference on machine learning and principles of data mining and knowledge discovery (ECML/PKDD), vol 6911 of LNCS, pp 188–203
  9. Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan J (2010) A theory of learning from different domains. Mach Learn J 79(1–2): 151–175
  10. Ben-David S, Blitzer J, Crammer K, Pereira F (2007) Analysis of representations for domain adaptation. In: Proceedings of advances in neural information processing systems (NIPS), pp 137–144
  11. Ben-David S, Lu T, Luu T, Pal D (2010) Impossibility theorems for domain adaptation. JMLR W&CP 9: 129–136
  12. Bergamo A, Torresani L (2010) Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In: Proceedings of advances in neural information processing systems (NIPS)
  13. Blitzer J, Foster D, Kakade S (2011) Domain adaptation with coupled subspaces. In: Proceedings of AISTATS
  14. Bruzzone L, Marconcini M (2010) Domain adaptation problems: a DASVM classification technique and a circular validation strategy. IEEE Trans Pattern Anal Mach Intell 32(5): 770–787
  15. Cao B, Ni X, Sun J-T, Wang G, Yang Q (2011) Distance metric learning under covariate shift. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 1204–1210
  16. Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
  17. Chattopadhyay R, Ye J, Panchanathan S, Fan W, Davidson I (2011) Multi-source domain adaptation and its application to early detection of fatigue. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 717–725
  18. Chen M, Weinberger K, Blitzer J (2011) Co-training for domain adaptation. In: Proceedings of advances in neural information processing systems (NIPS)
  19. Cortes C, Mohri M (2011) Domain adaptation in regression. In: Proceedings of the international conference on algorithmic learning theory (ALT), vol 6925 of LNCS, pp 308–323
  20. Daumé H III (2007) Frustratingly easy domain adaptation. In: Proceedings of the association for computational linguistics (ACL)
  21. Daumé H III, Kumar A, Saha A (2010) Co-regularization based semi-supervised domain adaptation. In: Proceedings of advances in neural information processing systems (NIPS)
  22. Duan L, Tsang I, Xu D, Chua T (2009) Domain adaptation from multiple sources via auxiliary classifiers. In: Proceedings of the international conference on machine learning (ICML), p 37
  23. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/
  24. Fei H, Huan J (2011) Structured feature selection and task relationship inference for multi-task learning. In: Proceedings of the 11th IEEE international conference on data mining (ICDM). IEEE, pp 171–180
  25. Freund R (1991) Polynomial-time algorithms for linear programming based only on primal scaling and projected gradients of a potential function. Math Program 51: 203–222
  26. Geng B, Tao D, Xu C (2011) DAML: domain adaptation metric learning. IEEE Trans Image Process 20(10): 2980–2989
  27. Guerra P, Veloso A, Meira W Jr, Almeida V (2011) From bias to opinion: a transfer-learning approach to real-time sentiment analysis. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 150–158
  28. Huang J, Smola A, Gretton A, Borgwardt K, Schölkopf B (2006) Correcting sample selection bias by unlabeled data. In: Proceedings of advances in neural information processing systems (NIPS), pp 601–608
  29. Jiang J (2008) A literature survey on domain adaptation of statistical classifiers. Technical report, Computer Science Department, University of Illinois at Urbana-Champaign. http://sifaka.cs.uiuc.edu/jiang4/domain_adaptation/da_survey.pdf
  30. Jiang J, Zhai C (2007) Instance weighting for domain adaptation in NLP. In: Proceedings of the association for computational linguistics (ACL)
  31. Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the international conference on machine learning (ICML), pp 200–209
  32. Junejo K, Karim A (2012) Robust personalizable spam filtering via local and global discrimination modeling. Knowl Inf Syst 1–36. doi: 10.1007/s10115-012-0477-x
  33. Kulis B, Saenko K, Darrell T (2011) What you saw is not what you get: domain adaptation using asymmetric kernel transforms. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1785–1792
  34. MacQueen J (1967) Some methods of classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
  35. Mansour Y, Mohri M, Rostamizadeh A (2008) Domain adaptation with multiple sources. In: Proceedings of advances in neural information processing systems (NIPS), pp 1041–1048
  36. Mansour Y, Mohri M, Rostamizadeh A (2009) Domain adaptation: learning bounds and algorithms. In: Proceedings of the annual conference on learning theory (COLT), pp 19–30
  37. Pan S, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10): 1345–1359
  38. Quiñonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence N (2009) Dataset shift in machine learning. MIT Press, Cambridge
  39. Schweikert G, Widmer C, Schölkopf B, Rätsch G (2008) An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In: Proceedings of advances in neural information processing systems (NIPS), pp 1433–1440
  40. Seah C, Tsang I, Ong Y, Lee K (2010) Predictive distribution matching SVM for multi-domain learning. In: Proceedings of the European conference on machine learning and principles of data mining and knowledge discovery (ECML/PKDD), vol 6321 of LNCS. Springer, pp 231–247
  41. Smeaton A, Over P, Kraaij W (2009) High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. In: Multimedia content analysis, theory and applications. Springer, pp 151–174
  42. Sugiyama M, Nakajima S, Kashima H, von Bünau P, Kawanabe M (2007) Direct importance estimation with model selection and its application to covariate shift adaptation. In: Proceedings of advances in neural information processing systems (NIPS)
  43. Vapnik V (1998) Statistical learning theory. Springer, Berlin
  44. Wang B, Tang J, Fan W, Chen S, Tan C, Yang Z (2012) Query-dependent cross-domain ranking in heterogeneous network. Knowl Inf Syst 1–37. doi: 10.1007/s10115-011-0472-7
  45. Xu H, Mannor S (2010) Robustness and generalization. In: Proceedings of the annual conference on computational learning theory (COLT), pp 503–515
  46. Xu H, Mannor S (2012) Robustness and generalization. Mach Learn J 86(3): 391–423
  47. Xu Z, Kersting K (2011) Multi-task learning with task relations. In: Proceedings of the 11th IEEE international conference on data mining (ICDM). IEEE, pp 884–893
  48. Xue G-R, Dai W, Yang Q, Yu Y (2008) Topic-bridged PLSA for cross-domain text classification. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp 627–634
  49. Ye Y (1991) An O(n³L) potential reduction algorithm for linear programming. Math Program 50: 239–258
  50. Zhang Y, Yeung D-Y (2010) Transfer metric learning by learning task relationships. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 1199–1208
  51. Zhong E, Fan W, Yang Q, Verscheure O, Ren J (2010) Cross validation framework to choose amongst models and datasets for transfer learning. In: Proceedings of the European conference on machine learning and principles of data mining and knowledge discovery (ECML/PKDD), vol 6323 of LNCS. Springer, pp 547–562

Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Emilie Morvant (1)
  • Amaury Habrard (2)
  • Stéphane Ayache (1)

  1. LIF-QARMA, CNRS, UMR 7279, Aix-Marseille University, Marseille, France
  2. Lab. Hubert Curien, CNRS, UMR 5516, University of St-Etienne, St-Etienne, France
