
Online concept evolution detection based on active learning

Data Mining and Knowledge Discovery

Abstract

Concept evolution detection is an important and difficult problem in streaming data mining. When the labeled samples in a data stream are too few to reflect the training data distribution, detection performance is often further restricted. This paper proposes a concept evolution detection method based on active learning (CE_AL). First, initial classifiers are constructed from a small number of labeled samples, and the sample space is divided into an automatic labeling area and an active labeling area according to the relationship between the classifiers of different categories. Second, each newly arriving sample is handled according to the area it falls in: samples in the automatic labeling area are labeled by the model, and samples in the active labeling area are labeled by an expert through active learning. This improves online learning performance while requiring only a small number of labeled samples. In addition, a strategy combining “data enhance” with “model enhance” is adopted to accelerate the convergence of the evolving-category detection model. The experimental results show that CE_AL improves concept evolution detection performance and achieves efficient learning in an unstable environment by labeling only a small number of key samples.
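The area-based routing described in the abstract can be illustrated with a small sketch. This is not the authors' CE_AL algorithm, and several choices below are assumptions made only for illustration.

- Each per-class "classifier" is a hypothetical centroid-plus-radius region (`HypersphereClassifier`, with an assumed `min_radius` floor).
- A sample falls in the automatic labeling area only when exactly one class region claims it.
- A sample falls in the active labeling area when no region claims it or when several regions conflict. It is then sent to a simulated expert (`simulated_expert`, a stand-in oracle), subject to a query budget.
- An expert label that has never been seen before is treated as an emerging class.

The paper's “data enhance” and “model enhance” steps are not modeled.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class HypersphereClassifier:
    """Hypothetical per-class model: the class region is a ball around the centroid."""
    members: List[Vector] = field(default_factory=list)
    min_radius: float = 1.0  # assumed floor so a one-sample class still has a region

    @property
    def centroid(self) -> Vector:
        n = len(self.members)
        return tuple(sum(col) / n for col in zip(*self.members))

    @property
    def radius(self) -> float:
        c = self.centroid
        return max(self.min_radius, max(distance(m, c) for m in self.members))

    def accepts(self, x: Vector) -> bool:
        return distance(x, self.centroid) <= self.radius


class ActiveStreamLearner:
    """Routes each arriving sample to model labeling or expert labeling (sketch)."""

    def __init__(self, labeled: List[Tuple[Vector, str]],
                 oracle: Callable[[Vector], str], budget: int,
                 min_radius: float = 1.0) -> None:
        self.min_radius = min_radius
        self.models: Dict[str, HypersphereClassifier] = {}
        for x, y in labeled:  # initial classifiers from a few labeled samples
            self._add(x, y)
        self.oracle = oracle
        self.budget = budget  # maximum number of expert queries
        self.queries = 0
        self.novel_classes: List[str] = []

    def _add(self, x: Vector, y: str) -> None:
        model = self.models.setdefault(y, HypersphereClassifier(min_radius=self.min_radius))
        model.members.append(x)

    def process(self, x: Vector) -> Optional[str]:
        accepting = [y for y, m in self.models.items() if m.accepts(x)]
        if len(accepting) == 1:
            # Automatic labeling area: exactly one class region claims x.
            self._add(x, accepting[0])
            return accepting[0]
        # Active labeling area: no region (possible new class) or several (conflict).
        if self.queries >= self.budget:
            return None  # budget exhausted: leave x unlabeled
        self.queries += 1
        label = self.oracle(x)
        if label not in self.models:
            self.novel_classes.append(label)  # expert confirms an emerging class
        self._add(x, label)
        return label


def simulated_expert(x: Vector) -> str:
    """Hypothetical annotator: points with x[0] > 8 belong to an unseen class "C"."""
    if x[0] > 8:
        return "C"
    return "A" if distance(x, (0.0, 0.0)) < distance(x, (5.0, 5.0)) else "B"


def run_demo():
    initial = [((0.0, 0.0), "A"), ((0.5, 0.0), "A"), ((5.0, 5.0), "B"), ((5.5, 5.0), "B")]
    learner = ActiveStreamLearner(initial, oracle=simulated_expert, budget=5)
    stream = [(0.2, 0.3), (5.1, 4.8), (10.0, 0.0), (10.2, 0.1), (2.0, 2.5)]
    labels = [learner.process(x) for x in stream]
    return labels, learner


if __name__ == "__main__":
    labels, learner = run_demo()
    print(labels)                                  # ['A', 'B', 'C', 'C', 'A']
    print(learner.novel_classes, learner.queries)  # ['C'] 2
```

In the demo, only two of the five stream samples reach the expert. The first is the sample at `(10.0, 0.0)`, which reveals the new class "C"; after that, the class region for "C" labels the next nearby sample automatically. Every labeled sample, whether labeled by the model or by the expert, is appended to its class pool, so the regions keep adapting. Once the budget is spent, samples in the active labeling area are returned unlabeled (`None`).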


[Figures 1–16 and Algorithms 1–2 are not reproduced here.]



Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (Nos. 62276157, U21A20513, 62076154, 61503229) and the Key R&D Program of Shanxi Province (No. 202202020101003).

Author information


Corresponding author

Correspondence to Wenjian Wang.

Additional information

Responsible editor: Albert Bifet.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Guo, H., Li, H., Cong, L. et al. Online concept evolution detection based on active learning. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01011-4


  • DOI: https://doi.org/10.1007/s10618-024-01011-4
