Online transfer learning with multiple decision trees

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Online learning techniques are widely used in fields where instances arrive one by one. In the early stage of a data stream, however, an online learning model cannot achieve good classification accuracy because it has not yet collected enough instances to learn from. For example, the well-known online learning algorithm very fast decision tree (VFDT) must wait until the Hoeffding bound is satisfied before it splits a node, which leads to poor classification accuracy at the beginning of a data stream. VFDT may therefore be inappropriate for real applications that demand fast and accurate online detection. The problem becomes more serious when classifying data streams with concept drift. This paper uses transfer learning to compensate for this shortcoming of VFDT. First, a new decision tree method named VFDT-D is proposed; it caches instances in its leaf nodes to handle numerical attributes and to fit into the framework of online transfer learning (OTL). Second, a measure that considers tree path, classification accuracy, and classification confidence is proposed to evaluate the local similarity between source and target domain classifiers. Finally, a multiple-source online transfer learning algorithm named DMOTL is proposed; it takes VFDT-D as its base classifier and uses the proposed local similarity measure to select the optimal source domain classifier to assist transfer learning. Extensive experiments on several synthetic and real-world datasets demonstrate the advantage of the proposed algorithm.
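To illustrate why waiting for the Hoeffding bound delays early splits, the following is a minimal Python sketch of the standard form of the bound used by Hoeffding-tree learners such as VFDT, epsilon = sqrt(R^2 ln(1/delta) / (2n)). It is not the authors' implementation: the function names (`hoeffding_bound`, `can_split`, `min_instances_to_split`) and the numeric values in the usage note are assumptions chosen only for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of
    a variable with range `value_range` is within epsilon of the mean
    observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """A leaf may split once the observed gap between the two best split
    candidates exceeds the Hoeffding bound for the n instances seen."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def min_instances_to_split(gain_gap: float, value_range: float,
                           delta: float) -> int:
    """Smallest n for which the bound drops strictly below `gain_gap`
    (hypothetical helper, derived by solving epsilon < gain_gap for n)."""
    if gain_gap <= 0.0:
        raise ValueError("gain_gap must be positive")
    threshold = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    return math.floor(threshold) + 1
```

Under the assumed values `value_range=1.0` (information gain for two classes), `delta=1e-7`, and a gain gap of 0.05, `min_instances_to_split` indicates that a leaf needs on the order of a few thousand instances before its first split, which is consistent with the slow start the abstract describes.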

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (61866007, 61762029, 61662014, 61763007), the Natural Science Foundation of Guangxi District (2018GXNSFDA138006), the Guangxi Key Laboratory of Trusted Software (KX201721), the Collaborative Innovation Center of Cloud Computing and Big Data (YD16E12), and the Image Intelligent Processing Project of the Key Laboratory Fund (GIIP201505).

Author information

Corresponding author

Correspondence to Yimin Wen.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (RAR 1027 kb)

Supplementary material 2 (RAR 50 kb)

About this article

Cite this article

Wen, Y., Qin, Y., Qin, K. et al. Online transfer learning with multiple decision trees. Int. J. Mach. Learn. & Cyber. 10, 2941–2962 (2019). https://doi.org/10.1007/s13042-019-00998-3

  • DOI: https://doi.org/10.1007/s13042-019-00998-3
