Online cost-sensitive neural network classifiers for non-stationary and imbalanced data streams

  • Original Article
Neural Computing and Applications

Abstract

Classifying non-stationary and imbalanced data streams involves two important challenges: concept drift and class imbalance. Concept drift is a change over time in the underlying function being learnt, and class imbalance is a large difference between the numbers of instances in the different classes of the data. Class imbalance is an obstacle to the efficiency of most classifiers. Previous methods for classifying non-stationary and imbalanced data streams mainly focus on batch solutions, in which the classification model is trained on a chunk of data. Here, we propose two online classifiers, both of which are one-layer neural networks (NNs). The proposed classifiers handle class imbalance with two separate cost-sensitive strategies: the first incorporates a fixed misclassification cost matrix and the second an adaptive one. The proposed classifiers are evaluated on 3 synthetic and 8 real-world datasets. The results show statistically significant improvements in imbalanced data metrics.
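The abstract does not state the update rule, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a binary problem, a single-layer logistic unit updated one instance at a time, and a per-class misclassification cost that scales each gradient step. The class name `OnlineCostSensitiveLogistic`, the learning rate, the cost values, and the inverse-frequency rule used for the adaptive variant are all hypothetical choices made for illustration.

```python
import numpy as np


class OnlineCostSensitiveLogistic:
    """Illustrative single-layer classifier trained one instance at a time."""

    def __init__(self, n_features, lr=0.1, costs=None):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr
        # costs[c] is the cost of misclassifying an instance of true class c.
        # costs=None switches to an adaptive cost (an assumption of this sketch).
        self.costs = costs
        # Running class counts, started at 1 to avoid division by zero.
        self.counts = np.ones(2)

    def predict_proba(self, x):
        z = float(np.dot(self.w, x) + self.b)
        z = min(max(z, -30.0), 30.0)  # clip to keep exp() well behaved
        return 1.0 / (1.0 + np.exp(-z))

    def predict(self, x):
        return int(self.predict_proba(x) >= 0.5)

    def _cost(self, y):
        if self.costs is not None:
            return self.costs[y]  # fixed cost
        # Adaptive cost: inverse relative frequency of class y seen so far.
        return self.counts.sum() / (2.0 * self.counts[y])

    def partial_fit(self, x, y):
        self.counts[y] += 1
        error = y - self.predict_proba(x)
        # The class cost scales the step, so rare-class errors move the
        # weights further than common-class errors.
        step = self.lr * self._cost(y) * error
        self.w += step * x
        self.b += step


# Test-then-train loop on a hypothetical imbalanced stream (about 10% minority).
rng = np.random.default_rng(0)
model = OnlineCostSensitiveLogistic(n_features=2, costs=(1.0, 5.0))
minority_seen = minority_hit = 0
for _ in range(1000):
    y = int(rng.random() < 0.1)
    x = rng.normal(loc=2.0 * y, size=2)
    if y == 1:
        minority_seen += 1
        minority_hit += model.predict(x) == 1
    model.partial_fit(x, y)
print("minority recall so far:", minority_hit / max(minority_seen, 1))
```

Passing `costs=None` selects the adaptive variant of this sketch, in which the cost of a class grows as that class becomes rarer in the stream observed so far. Each instance is scored before it is used for training, which is a common way to evaluate an online learner on a stream.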

Author information

Corresponding author

Correspondence to Adel Ghazikhani.

About this article

Cite this article

Ghazikhani, A., Monsefi, R. & Sadoghi Yazdi, H. Online cost-sensitive neural network classifiers for non-stationary and imbalanced data streams. Neural Comput & Applic 23, 1283–1295 (2013). https://doi.org/10.1007/s00521-012-1071-6

  • DOI: https://doi.org/10.1007/s00521-012-1071-6
