An Ensemble Classification Algorithm Based on Information Entropy for Data Streams

Abstract

Data stream mining has attracted much attention from scholars. In recent research, ensemble classification has been widely applied to concept drift detection; however, most existing methods use classification accuracy as the criterion for judging whether concept drift has occurred. Information entropy is an important and effective measure of uncertainty. Based on information entropy theory, a new algorithm that uses information entropy to evaluate classification results is developed. It employs ensemble learning, and the weight of each classifier is determined by the entropy of the result produced by the ensemble classifier system. When the concept in the data stream changes, the classifiers whose weights are below a predefined threshold are abandoned so that the ensemble adapts to the new concept. In the experimental analysis, the proposed algorithm and six comparison algorithms are executed on six experimental data sets. The results show that the proposed method not only handles concept drift effectively but also performs better than the comparison algorithms.
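The two ingredients named in the abstract, an entropy measure and threshold-based pruning of weighted classifiers, can be illustrated with a minimal Python sketch. The abstract does not give the paper's weight-update rule, so none is reproduced here: the function names `shannon_entropy` and `prune_ensemble`, the classifier names, and the example weights and threshold are all hypothetical and serve only to show the general idea.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    # Sum of -p * log2(p) over the observed label frequencies.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def prune_ensemble(weights, threshold):
    """Drop classifiers whose weight is below `threshold` (hypothetical helper).

    `weights` maps a classifier name to its current weight. How the weights
    are derived from entropy is NOT specified in the abstract, so they are
    simply taken as given here.
    """
    return {name: w for name, w in weights.items() if w >= threshold}


# Entropy of the predictions made on one block of the stream.
balanced = shannon_entropy(["a", "b", "a", "b"])   # maximally uncertain, 2 classes
certain = shannon_entropy(["a", "a", "a"])         # no uncertainty

# Illustrative weights only; classifiers below the threshold are abandoned.
survivors = prune_ensemble({"c1": 0.8, "c2": 0.1, "c3": 0.5}, threshold=0.5)
```

In this sketch a block of evenly split predictions has an entropy of one bit, a block of identical predictions has an entropy of zero, and only the classifiers `c1` and `c3` survive pruning.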

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Nos. 61772323, 61202018, 61432011, and U1435212), the National Key Basic Research and Development Program of China (973) (No. 2013CB329404), and the Natural Science Foundation of Shanxi Province, China (Nos. 201701D121051 and 201701D221098). The authors are grateful to the editor and the anonymous reviewers for constructive comments that helped to improve the quality and presentation of this paper.

Author information

Corresponding author

Correspondence to Junhong Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, J., Xu, S., Duan, B. et al. An Ensemble Classification Algorithm Based on Information Entropy for Data Streams. Neural Process Lett 50, 2101–2117 (2019). https://doi.org/10.1007/s11063-019-09995-7

Keywords

  • Data streams
  • Data mining
  • Concept drift
  • Information entropy
  • Ensemble classification