Abstract
Concept evolution detection is an important and difficult problem in streaming data mining. Detection performance is further restricted when the labeled samples in a data stream are too few to reflect the training data distribution. This paper proposes a concept evolution detection method based on active learning (CE_AL). First, initial classifiers are constructed from a small number of labeled samples, and the sample areas are divided into automatic labeling areas and active labeling areas according to the relationships between the classifiers of different categories. Second, each newly arriving online sample is handled according to the area it falls in: samples in automatic labeling areas are labeled by the model, while samples in active labeling areas are labeled by an expert through active learning. This improves online learning performance with only a small number of labeled samples. In addition, a strategy combining “data enhance” with “model enhance” is adopted to accelerate the convergence of the evolution category detection model. Experimental results show that CE_AL improves concept evolution detection performance and achieves efficient learning in an unstable environment by labeling only a small number of key samples.
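The abstract does not specify the form of the per-class classifiers or the exact rule that separates the two areas. The following is a minimal sketch under stated assumptions, not the CE_AL implementation. It assumes hypothetical hypersphere class models (a centroid plus an acceptance radius fitted from the small labeled set). Under that assumption, a sample claimed by exactly one class model lies in an automatic labeling area and is labeled by the model. A sample claimed by several classes (overlap) or by none (a possible new class) lies in an active labeling area and is sent to the expert. The names `ClassRegion`, `fit_regions`, `route`, `process_stream`, and `ask_expert` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class ClassRegion:
    """Hypothetical per-class model: a centroid and an acceptance radius."""
    centroid: List[float]
    radius: float

    def accepts(self, x: Vector) -> bool:
        # The class "claims" x if x lies inside its hypersphere.
        return math.dist(self.centroid, x) <= self.radius


def fit_regions(samples: Dict[str, List[List[float]]]) -> Dict[str, ClassRegion]:
    """Fit one region per class from a small labeled set.

    The radius is the largest centroid-to-sample distance in that class.
    """
    regions = {}
    for label, points in samples.items():
        dim = len(points[0])
        centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        radius = max(math.dist(centroid, p) for p in points)
        regions[label] = ClassRegion(centroid, radius)
    return regions


def route(x: Vector, regions: Dict[str, ClassRegion]) -> Tuple[str, List[str]]:
    """Decide which area x falls in (assumed rule, not the paper's).

    Exactly one accepting class -> ("auto", [label]): the model labels x.
    Several accepting classes (overlap) or none (possible new class)
    -> ("active", candidates): x is sent to the expert.
    """
    accepting = [label for label, region in regions.items() if region.accepts(x)]
    if len(accepting) == 1:
        return "auto", accepting
    return "active", accepting


def process_stream(
    stream: List[Vector],
    regions: Dict[str, ClassRegion],
    ask_expert: Callable[[Vector], str],
) -> Tuple[List[Tuple[Vector, str]], int]:
    """Label each arriving sample by model or by expert; count expert queries."""
    labeled: List[Tuple[Vector, str]] = []
    queries = 0
    for x in stream:
        area, candidates = route(x, regions)
        if area == "auto":
            y = candidates[0]
        else:
            y = ask_expert(x)
            queries += 1
        labeled.append((x, y))
    return labeled, queries


if __name__ == "__main__":
    train = {
        "a": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
        "b": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]],
    }
    regions = fit_regions(train)
    stream = [[0.3, 0.3], [5.2, 5.4], [10.0, 10.0]]
    labeled, queries = process_stream(stream, regions, ask_expert=lambda x: "new")
    print([y for _, y in labeled], queries)  # ['a', 'b', 'new'] 1
```

In this toy run only the far-away sample `[10.0, 10.0]` costs an expert query, which mirrors the abstract's goal of labeling a small number of key samples. The hypersphere models and the "automatic only if exactly one class accepts" rule are stand-ins chosen for brevity. The paper's actual classifiers, its area boundaries, and its “data enhance” and “model enhance” updates are not reproduced here.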
Acknowledgements
This research was partially supported by the National Natural Science Foundation of China (Nos. 62276157, U21A20513, 62076154, 61503229) and the Key R&D Program of Shanxi Province (No. 202202020101003).
Additional information
Responsible editor: Albert Bifet.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, H., Li, H., Cong, L. et al. Online concept evolution detection based on active learning. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01011-4
DOI: https://doi.org/10.1007/s10618-024-01011-4