Online transfer learning with multiple decision trees

  • Yimin Wen
  • Yixiu Qin
  • Keke Qin
  • Xiaoxia Lu
  • Pingshan Liu
Original Article

Abstract

Online learning techniques have been widely used in many fields where instances arrive one by one. However, in the early stage of a data stream, online learning models cannot achieve good classification accuracy because they have not yet collected sufficient instances to learn from. For example, a well-known online learning algorithm, the very fast decision tree (VFDT), must wait until the Hoeffding bound is satisfied before it splits a node, which leads to poor classification accuracy at the beginning of a data stream. Thus, VFDT may not be appropriate for real applications that demand fast and accurate online detection. The problem becomes more serious in data stream classification with concept drift. This paper attempts to use transfer learning to make up for this shortcoming of VFDT. To achieve this goal, a new decision tree method named VFDT-D is first proposed; it caches instances in its leaf nodes to handle numerical attributes and fits into the framework of online transfer learning (OTL). Then a measure that considers tree path, classification accuracy and classification confidence is proposed to evaluate the local similarity between source and target domain classifiers. Finally, a multiple-source online transfer learning algorithm named DMOTL is proposed, which takes VFDT-D as its base classifier and uses the proposed local similarity measure to select the optimal source domain classifier to help transfer learning. Extensive experiments on several synthetic and real-world datasets demonstrate the advantage of the proposed algorithm.
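
To illustrate why waiting for the Hoeffding bound delays early splits, the following is a minimal sketch of the standard split test used by Hoeffding trees such as VFDT. It is not the authors' VFDT-D or DMOTL code. The function names and the parameter values (a gain range of 1.0, delta = 1e-7, a gain gap of 0.05) are assumptions chosen for illustration, and practical details such as tie-breaking thresholds and grace periods are omitted.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split heuristic, delta is the
    allowed probability of choosing the wrong attribute, and n is the
    number of instances seen at the leaf.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """A leaf may split once the gap between the two best attributes
    exceeds the Hoeffding bound for the instances seen so far."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def instances_needed(gain_gap: float, value_range: float, delta: float) -> int:
    """Smallest n for which the bound falls strictly below gain_gap."""
    threshold = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    return math.floor(threshold) + 1


if __name__ == "__main__":
    # Hypothetical values: with few instances the leaf cannot split yet.
    print(can_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=100))
    print(can_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=5000))
    print(instances_needed(0.05, value_range=1.0, delta=1e-7))
```

Under these assumed values, a leaf has to accumulate several thousand instances before the bound permits a split, which is the early-stream weakness that transfer from source domain classifiers is meant to offset.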

Keywords

Online learning, Transfer learning, Multiple sources, Local similarity, Incremental decision tree, Concept drift

Notes

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (61866007, 61762029, 61662014, 61763007), the Natural Science Foundation of Guangxi District (2018GXNSFDA138006), the Guangxi Key Laboratory of Trusted Software (KX201721), the Collaborative Innovation Center of Cloud Computing and Big Data (YD16E12), and the Image Intelligent Processing Project of the Key Laboratory Fund (GIIP201505).

Supplementary material

13042_2019_998_MOESM1_ESM.rar (1 mb)
Supplementary material 1 (RAR 1027 kb)
13042_2019_998_MOESM2_ESM.rar (51 kb)
Supplementary material 2 (RAR 50 kb)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China
  2. Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin, China
  3. School of Computer Science and Information Safety, Guilin University of Electronic Technology, Guilin, China
  4. Information Engineering College, Guangzhou Huaxia Vocational College, Guangzhou, China
