Abstract
Online learning techniques are widely used in fields where instances arrive one by one. In the early stage of a data stream, however, online learning models cannot achieve good classification accuracy because they have not yet collected enough instances to learn from. For example, the well-known online learning algorithm very fast decision tree (VFDT) must wait until the Hoeffding bound is satisfied before it splits a node, which leads to poor classification accuracy at the beginning of a data stream. VFDT may therefore be inappropriate for real applications that demand fast and accurate online detection. The problem becomes more serious when the data stream exhibits concept drift. This paper uses transfer learning to compensate for this shortcoming of VFDT. To this end, a new decision tree method named VFDT-D is first proposed; it caches instances in its leaf nodes so that it can handle numerical attributes and fit into a framework of online transfer learning (OTL). Next, a measure that considers tree path, classification accuracy and classification confidence is proposed to evaluate the local similarity between source and target domain classifiers. Finally, a multiple-source online transfer learning algorithm named DMOTL is proposed; it takes VFDT-D as its base classifier and uses the proposed local similarity measure to select the optimal source domain classifier for transfer. Extensive experiments on several synthetic and real-world datasets demonstrate the advantage of the proposed algorithm.
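To illustrate why a Hoeffding-bound split test is slow to fire early in a stream, the following Python sketch computes the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and the smallest number of instances a leaf must see before the bound drops below a given gap between the two best split candidates. The function names and the example values (R = 1, delta = 1e-7, gap = 0.05) are assumptions chosen for illustration, not settings taken from the paper, and the sketch ignores the tie-breaking threshold that VFDT implementations commonly add.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within this epsilon of the
    mean observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_instances_to_split(gain_gap: float, value_range: float, delta: float) -> int:
    """Smallest n for which the Hoeffding bound is strictly below gain_gap.

    Rearranging epsilon < gain_gap gives
    n > value_range^2 * ln(1/delta) / (2 * gain_gap^2).
    """
    threshold = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    return math.floor(threshold) + 1


if __name__ == "__main__":
    # Illustrative (assumed) values: two-class information gain has range 1.
    value_range, delta = 1.0, 1e-7
    for gap in (0.2, 0.1, 0.05):
        n = min_instances_to_split(gap, value_range, delta)
        print(f"gap={gap}: a leaf needs at least {n} instances before it can split")
```

Because the required count grows with the inverse square of the gap, halving the gap between the two best attributes roughly quadruples the number of instances a leaf must wait for, which is the cold-start behaviour the abstract describes.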
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (61866007, 61762029, 61662014, 61763007), the Natural Science Foundation of Guangxi District (2018GXNSFDA138006), the Guangxi Key Laboratory of Trusted Software (KX201721), the Collaborative innovation center of cloud computing and big data (YD16E12), and the Image intelligent processing project of Key Laboratory Fund (GIIP201505).
About this article
Cite this article
Wen, Y., Qin, Y., Qin, K. et al. Online transfer learning with multiple decision trees. Int. J. Mach. Learn. & Cyber. 10, 2941–2962 (2019). https://doi.org/10.1007/s13042-019-00998-3
DOI: https://doi.org/10.1007/s13042-019-00998-3