Knowledge and Information Systems, Volume 38, Issue 1, pp 61–83

A general framework for scalable transductive transfer learning

Regular paper

Abstract

Transductive transfer learning is a special type of transfer learning problem in which abundant labeled examples are available in the source domain and only unlabeled examples are available in the target domain. It readily finds applications in spam filtering, microblog mining, and related tasks. In this paper, we propose a general framework that solves the problem by mapping the input features of both the source domain and the target domain into a shared latent space while simultaneously minimizing the feature reconstruction loss and the prediction loss. We develop one specific instance of the framework, the latent large-margin transductive transfer learning algorithm, and analyze its theoretical bound on the classification loss via Rademacher complexity. We also provide a unified view of several popular transfer learning algorithms under our framework. Experimental results on one synthetic dataset and three application datasets demonstrate the advantages of the proposed algorithm over other state-of-the-art methods.
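To make the setting concrete, the following is a minimal sketch (not the authors' code) of the kind of joint objective the abstract describes: source and target inputs are encoded into a shared latent space, a reconstruction loss is minimized on both domains, and a large-margin (hinge) prediction loss is minimized on the labeled source examples, with the whole objective optimized by stochastic gradient descent. All names, dimensions, and hyperparameters below (D, w, lam_rec, lam_pred, lr) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of a shared-latent-space objective with
# reconstruction loss + large-margin prediction loss, trained by SGD.
import numpy as np

rng = np.random.default_rng(0)

def hinge(margin):
    return max(0.0, 1.0 - margin)

def sgd_step(x, y, D, w, lam_rec, lam_pred, lr):
    """One SGD update on a single example.

    x : input feature vector (source or target domain)
    y : label in {-1, +1} for source examples, None for unlabeled target ones
    D : map from the latent space back to the input space (d x k)
    w : large-margin classifier weights in the latent space (k,)
    """
    # Encode x into the shared latent space (here: a simple least-squares code).
    z, *_ = np.linalg.lstsq(D, x, rcond=None)

    # Gradient of the reconstruction loss ||D z - x||^2 / 2 w.r.t. D.
    residual = D @ z - x                      # (d,)
    D -= lr * lam_rec * np.outer(residual, z)

    # Subgradient of the hinge loss, only for labeled source examples.
    if y is not None and hinge(y * (w @ z)) > 0.0:
        w += lr * lam_pred * y * z
    return D, w

# Toy usage: 5-dim inputs, 3-dim shared latent space.
d, k = 5, 3
D = rng.normal(size=(d, k))
w = np.zeros(k)
for _ in range(100):
    x_src, y_src = rng.normal(size=d), rng.choice([-1.0, 1.0])
    x_tgt = rng.normal(size=d)                # unlabeled target example
    D, w = sgd_step(x_src, y_src, D, w, 1.0, 1.0, 0.01)
    D, w = sgd_step(x_tgt, None, D, w, 1.0, 1.0, 0.01)
```

Because each update touches only one example, this style of optimization scales to large datasets, which is the "scalable" aspect emphasized in the title.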

Keywords

Transductive transfer learning · Large-margin approach · Rademacher complexity · Stochastic gradient descent


Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Department of Computer Science, University of Southern California, Los Angeles, USA
  2. Department of Computer Science, Purdue University, West Lafayette, USA
