Evolving Systems, Volume 10, Issue 3, pp 351–362

A novel approach using incremental oversampling for data stream mining

  • N. Anupama
  • Sudarson Jena
Original Paper

Abstract

Data stream mining has become very popular in recent years, with advanced electronic devices generating continuous data streams. The performance of standard learning algorithms is compromised by the class imbalance present in real-world data streams. In this paper we propose a novel algorithm, increment over sampling for data streams (IOSDS), which uses a unique oversampling technique to nearly balance the data sets and so minimize the effect of imbalance on the stream mining process. The experimental analysis is conducted on 15 data chunks of data streams with varied sizes and different imbalance ratios. The results suggest that the proposed IOSDS algorithm improves knowledge discovery over benchmark algorithms such as C4.5 and Hoeffding tree in terms of standard performance measures, namely accuracy, AUC, precision, recall and F-measure.
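
The abstract does not give the details of IOSDS, so the following is only a minimal sketch of the general idea of oversampling a data chunk toward balance, not the authors' algorithm. It assumes binary labels and swaps in plain random duplication of minority instances; the names oversample_chunk, target_ratio and precision_recall_f1 are hypothetical and introduced here for illustration. The second function computes three of the performance measures named above.

```python
import random
from collections import Counter


def oversample_chunk(chunk, target_ratio=1.0, seed=0):
    """Randomly duplicate minority-class rows in one chunk of a stream.

    chunk: list of (features, label) pairs with binary labels.
    target_ratio: desired minority/majority size ratio after oversampling
                  (hypothetical parameter; 1.0 means fully balanced).
    This is plain random oversampling, NOT the IOSDS technique itself.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in chunk)
    if len(counts) < 2:
        # Only one class present in this chunk: nothing to balance against.
        return list(chunk)
    (_, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [row for row in chunk if row[1] == minority]
    needed = max(0, int(target_ratio * n_major) - n_minor)
    extra = [rng.choice(minority_rows) for _ in range(needed)]
    return list(chunk) + extra


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Usage: a chunk with 8 majority (label 0) and 2 minority (label 1) rows.
chunk = [([i], 0) for i in range(8)] + [([100 + i], 1) for i in range(2)]
balanced = oversample_chunk(chunk)
print(Counter(label for _, label in balanced))
```

In a streaming setting such a function would be applied to each incoming chunk before the learner is updated; choosing target_ratio below 1.0 would correspond to only approximately balancing the chunk.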

Keywords

Knowledge discovery; Data streams; Imbalanced data; Oversampling; Increment over sampling for data streams (IOSDS)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. GITAM University, Hyderabad, India
  2. Sambalpur University Institute of Information Technology, Sambalpur, India
