Multi-Window Based Ensemble Learning for Classification of Imbalanced Streaming Data

  • Ye Wang (email author)
  • Hu Li
  • Hua Wang
  • Bin Zhou
  • Yanchun Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9419)

Abstract

Imbalanced streaming data is widespread in the real world and has attracted much attention in recent years. Most studies focus on either imbalanced data or streaming data; in practice, however, class imbalance and streaming usually occur together. In this paper, we propose a multi-window based ensemble learning (MWEL for short) method for the classification of imbalanced streaming data. Three types of windows are defined to store the current batch of instances, the latest minority instances, and the ensemble classifier. The ensemble classifier consists of a set of the latest sub-classifiers, together with the instances each sub-classifier was trained on. All sub-classifiers are weighted before predicting the class labels of newly arriving instances, and new sub-classifiers are trained if the precision is below a threshold. Extensive experiments on synthetic and real-world datasets demonstrate that the new approach can classify imbalanced streaming data efficiently and effectively, and that it outperforms existing approaches.
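The three windows and the retraining rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the base learner, the weighting scheme, or which precision is compared with the threshold, so the sketch assumes a toy nearest-centroid sub-classifier on 1-D features, weights equal to accuracy on the current batch, and minority-class precision on the current batch as the retraining trigger. All class, method, and parameter names (`CentroidClassifier`, `MultiWindowEnsemble`, `process_batch`, `threshold`, and so on) are hypothetical.

```python
from collections import deque


class CentroidClassifier:
    """Toy stand-in for a sub-classifier (assumption): nearest class centroid, 1-D."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            points = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = sum(points) / len(points)
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: abs(x - self.centroids[c]))


class MultiWindowEnsemble:
    """Hypothetical sketch of the three windows named in the abstract."""

    def __init__(self, max_members=3, max_minority=50, threshold=0.8,
                 minority_label=1):
        # Window 2: the latest minority instances.
        self.minority_window = deque(maxlen=max_minority)
        # Window 3: the latest sub-classifiers plus the data each was trained on.
        self.ensemble_window = deque(maxlen=max_members)
        self.weights = []
        self.threshold = threshold
        self.minority_label = minority_label

    def _reweight(self, X, y):
        # Assumption: a sub-classifier's weight is its accuracy on the batch.
        self.weights = [
            sum(clf.predict(x) == t for x, t in zip(X, y)) / len(y)
            for clf, _ in self.ensemble_window
        ]

    def predict(self, x):
        # Weighted vote over all sub-classifiers currently in the ensemble.
        votes = {}
        for (clf, _), w in zip(self.ensemble_window, self.weights):
            label = clf.predict(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)

    def _minority_precision(self, X, y):
        # Assumption: "precision" means minority-class precision on the batch.
        hits = [t for x, t in zip(X, y) if self.predict(x) == self.minority_label]
        if not hits:
            return 0.0
        return sum(t == self.minority_label for t in hits) / len(hits)

    def process_batch(self, X, y):
        """Window 1 is the batch (X, y). Returns True if a sub-classifier was trained."""
        self.minority_window.extend(
            x for x, t in zip(X, y) if t == self.minority_label)
        if self.ensemble_window:
            self._reweight(X, y)
            if self._minority_precision(X, y) >= self.threshold:
                return False  # ensemble is still good enough; no retraining
        # Train on the batch's majority instances plus all stored minority ones.
        majority = [(x, t) for x, t in zip(X, y) if t != self.minority_label]
        train_X = [x for x, _ in majority] + list(self.minority_window)
        train_y = ([t for _, t in majority]
                   + [self.minority_label] * len(self.minority_window))
        clf = CentroidClassifier().fit(train_X, train_y)
        self.ensemble_window.append((clf, (train_X, train_y)))
        self._reweight(X, y)
        return True
```

Calling `process_batch` once per arriving batch and `predict` on new instances is enough to exercise the sketch. Reusing stored minority instances when training a new sub-classifier is one simple way to offset the scarcity of minority examples in any single batch; whether MWEL combines the windows in exactly this way is not stated in the abstract.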

Keywords

Streaming data, Class imbalance, Multi-window, Ensemble learning

Notes

Acknowledgements

This work was supported by the ARC DP project (DP 130101327), the 973 Program (Grant Nos. 2013CB329601, 2013CB329602, 2013CB329604), the NSFC (Grant Nos. 60933005, 91124002), the 863 Program (Grant Nos. 2012AA01A401, 2012AA01A402), and the National Key Technology R&D Program (Grant Nos. 2012BAH38B04, 2012BAH38B06).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ye Wang (1, 2), email author
  • Hu Li (2)
  • Hua Wang (1)
  • Bin Zhou (2, 3)
  • Yanchun Zhang (1)

  1. Centre for Applied Informatics, Victoria University, Melbourne, Australia
  2. College of Computer, National University of Defense Technology, Changsha, China
  3. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China
