A novel online ensemble approach to handle concept drifting data streams: diversified dynamic weighted majority

Abstract

We present an online ensemble approach, diversified dynamic weighted majority (DDWM), for classifying new data instances drawn from streams whose underlying concepts change over time. Our approach maintains two weighted ensembles that differ in their level of diversity. An expert in either ensemble is updated or removed according to its classification accuracy, and a new expert is added based on the final global prediction of the algorithm and the global prediction of the ensemble for a given data instance. Experimental evaluation on various artificial and real-world datasets shows that DDWM achieves very high accuracy in classifying new data instances, irrespective of dataset size, type of drift, or presence of noise. We compare DDWM with other learners using additional performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. In these experiments our approach was highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
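To make the update, remove, and add cycle described above more concrete, the following is a minimal sketch of a single weighted-majority step in the style of the original dynamic weighted majority algorithm of Kolter and Maloof. It is not the authors' DDWM: it keeps only one ensemble and does not model diversity. The names `MajorityClassExpert` and `dwm_step`, the toy base learner, and the parameters `beta` (weight penalty) and `theta` (removal threshold) are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


class MajorityClassExpert:
    """Hypothetical toy base learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = defaultdict(int)

    def predict(self, x):
        # Returns None until at least one label has been observed.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] += 1


def dwm_step(experts, weights, x, y, beta=0.5, theta=0.01):
    """One online step: predict, penalise, prune, possibly add, then train.

    beta and theta are assumed values for illustration only.
    Returns the updated experts, their weights, and the global prediction.
    """
    preds = [e.predict(x) for e in experts]

    # Global prediction: the label with the largest total weight.
    totals = defaultdict(float)
    for p, w in zip(preds, weights):
        totals[p] += w
    global_pred = max(totals, key=totals.get)

    # Penalise every expert that misclassified this instance.
    weights = [w * beta if p != y else w for p, w in zip(preds, weights)]

    # Rescale so the best expert has weight 1, then drop weak experts.
    top = max(weights)
    weights = [w / top for w in weights]
    kept = [(e, w) for e, w in zip(experts, weights) if w >= theta]
    experts = [e for e, _ in kept]
    weights = [w for _, w in kept]

    # Add a fresh expert whenever the ensemble as a whole was wrong.
    if global_pred != y:
        experts.append(MajorityClassExpert())
        weights.append(1.0)

    # Every surviving expert learns from the labelled instance.
    for e in experts:
        e.learn(x, y)
    return experts, weights, global_pred


# Usage: feed a short stream whose label changes from "a" to "b".
experts, weights = [MajorityClassExpert()], [1.0]
for label in ["a", "a", "b"]:
    experts, weights, pred = dwm_step(experts, weights, x=None, y=label)
```

In this sketch the ensemble grows only when its combined vote is wrong, which is the mechanism that lets a weighted-majority learner react to a change in concept. According to the abstract, DDWM additionally compares the algorithm's final prediction with each ensemble's own prediction before adding an expert; that logic is not reproduced here.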



Author information

Corresponding author

Correspondence to Parneeta Sidhu.

About this article

Cite this article

Sidhu, P., Bhatia, M.P.S. A novel online ensemble approach to handle concept drifting data streams: diversified dynamic weighted majority. Int. J. Mach. Learn. & Cyber. 9, 37–61 (2018). https://doi.org/10.1007/s13042-015-0333-x

Keywords

  • Concept drift
  • Ensemble
  • Diversity
  • Data stream
  • Online learning