
Knowledge and Information Systems, Volume 55, Issue 1, pp 15–44

Modeling recurring concepts in data streams: a graph-based framework

  • Zahra Ahmadi
  • Stefan Kramer
Regular Paper

Abstract

Classifying a stream of non-stationary data with recurrent drift is a challenging task that has attracted considerable interest in recent years. Existing approaches to handling recurring concepts maintain a pool of concepts/classifiers and reuse it for future classification, reducing the error on instances drawn from a recurring concept. However, the number of classifiers in the pool usually grows very fast, because accurately detecting the underlying concept is a challenging task in itself; as a result, many entries in the pool may represent the same underlying concept. This paper proposes the GraphPool framework, which refines the pool of concepts by applying a merging mechanism whenever necessary: after receiving a new batch of data, we extract a concept representation from the current batch that takes the correlation among features into account. Then, we compare this representation to the concept representations in the pool using a statistical multivariate likelihood test. If more than one concept is similar to the current batch, all the corresponding concepts are merged. GraphPool not only keeps the concepts but also maintains the transitions among them via a first-order Markov chain. The current state is maintained at all times, and new instances are predicted based on it. Keeping these transitions helps to recover quickly from drift in real-world problems with periodic behavior. Comprehensive experiments on synthetic and real-world data show the effectiveness of the framework in terms of predictive performance and pool management.
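To make the pool-refinement loop concrete, the following is a minimal sketch of the idea described above, under stated assumptions: each concept is summarized by per-feature means and a covariance matrix (capturing feature correlation), similarity to a new batch is checked with a Hotelling-style chi-square test as a stand-in for the multivariate likelihood test, and matching concepts are merged. The class and method names are hypothetical; the per-concept classifiers and drift detection used for prediction are omitted.

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np
from scipy import stats


class GraphPoolSketch:
    def __init__(self, alpha=0.01):
        self.alpha = alpha       # significance level of the similarity test
        self.concepts = {}       # concept id -> {"mean", "cov", "n"}
        self.transitions = {}    # (id_from, id_to) -> count (first-order Markov chain)
        self.current = None      # id of the currently active concept
        self._next_id = 0

    def _similar(self, concept, mean, n):
        # Test the batch mean against a stored concept summary (a simple stand-in
        # for the multivariate likelihood test described in the paper).
        diff = mean - concept["mean"]
        stat = n * diff @ np.linalg.pinv(concept["cov"]) @ diff
        return stat < stats.chi2.ppf(1 - self.alpha, df=len(mean))

    def _merge(self, survivor, victim):
        # Fold one concept's statistics and transition counts into another.
        a, b = self.concepts[survivor], self.concepts[victim]
        total = a["n"] + b["n"]
        a["mean"] = (a["n"] * a["mean"] + b["n"] * b["mean"]) / total
        a["cov"] = (a["n"] * a["cov"] + b["n"] * b["cov"]) / total
        a["n"] = total
        for (i, j), c in list(self.transitions.items()):
            if victim in (i, j):
                key = (survivor if i == victim else i, survivor if j == victim else j)
                self.transitions[key] = self.transitions.get(key, 0) + c
                del self.transitions[(i, j)]
        if self.current == victim:
            self.current = survivor
        del self.concepts[victim]

    def update(self, batch):
        # Summarize the new batch, find all similar concepts, merge them, and record
        # the transition from the previous concept to the (possibly new) current one.
        mean, cov, n = batch.mean(axis=0), np.cov(batch, rowvar=False), len(batch)
        matches = [cid for cid, c in self.concepts.items() if self._similar(c, mean, n)]
        if not matches:                              # genuinely new concept
            target, self._next_id = self._next_id, self._next_id + 1
            self.concepts[target] = {"mean": mean, "cov": cov, "n": n}
        else:                                        # merge all matching concepts
            target = matches[0]
            for cid in matches[1:]:
                self._merge(target, cid)
        if self.current is not None:
            key = (self.current, target)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = target
```

In a streaming setting, `update` would be called once per arriving batch; the maintained `current` state and transition counts are what allow a fast switch back to a previously seen concept when a periodic drift recurs.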

Keywords

Pool management · Recurring concepts · Concept drift · Data stream classification


Copyright information

© Springer-Verlag London 2017

Authors and Affiliations

  1. Institut für Informatik, Johannes Gutenberg-Universität, Mainz, Germany
