Abstract
Classifying non-stationary and imbalanced data streams involves two major challenges: concept drift and class imbalance. Concept drift is a change in the underlying function being learnt, and class imbalance is a large difference between the numbers of instances in the different classes of data. Class imbalance hinders the performance of most classifiers. Previous methods for classifying non-stationary and imbalanced data streams mainly focus on batch solutions, in which the classification model is trained on a chunk of data. Here, we propose two online classifiers, each a one-layer neural network (NN). In the proposed classifiers, class imbalance is handled with two separate cost-sensitive strategies: the first incorporates a fixed misclassification cost matrix and the second an adaptive one. The proposed classifiers are evaluated on 3 synthetic and 8 real-world datasets. The results show statistically significant improvements in imbalanced data metrics.
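To make the idea of an online, cost-sensitive one-layer network concrete, the following is a minimal sketch and not the method of the paper. The abstract does not give the update rules, so everything here is an assumption made for illustration: the class name `OnlineCostSensitiveUnit`, the logistic output, the cost-scaled gradient step, and in particular the adaptive cost, which is taken as the inverse of an exponentially decayed class-frequency estimate.

```python
import math


class OnlineCostSensitiveUnit:
    """Single-layer logistic unit trained one instance at a time.

    Hypothetical illustration: the cost weighting and the decayed
    class-frequency estimate are assumptions, not the paper's rules.
    """

    def __init__(self, n_features, lr=0.1, costs=None, decay=0.99):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.costs = costs      # fixed per-class costs, or None for adaptive
        self.decay = decay      # forgetting factor, lets the estimate track drift
        self.freq = [0.5, 0.5]  # decayed estimate of the class proportions

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def cost(self, y):
        if self.costs is not None:
            return self.costs[y]               # fixed strategy
        return 1.0 / max(self.freq[y], 1e-3)   # adaptive: rarer class costs more

    def partial_fit(self, x, y):
        # Update the decayed class proportions with the new label.
        for c in (0, 1):
            seen = 1.0 if c == y else 0.0
            self.freq[c] = self.decay * self.freq[c] + (1.0 - self.decay) * seen
        # Gradient step on the log-loss, scaled by the cost of the true class.
        err = self.cost(y) * (y - self.predict_proba(x))
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err
```

In a stream setting, such a model would typically be used in a test-then-train loop: call `predict(x)` on each arriving instance, record the outcome, then call `partial_fit(x, y)` once the label is available. Passing `costs=[1.0, 5.0]` selects the fixed strategy with illustrative values, and leaving `costs` unset selects the adaptive one.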
Cite this article
Ghazikhani, A., Monsefi, R. & Sadoghi Yazdi, H. Online cost-sensitive neural network classifiers for non-stationary and imbalanced data streams. Neural Comput & Applic 23, 1283–1295 (2013). https://doi.org/10.1007/s00521-012-1071-6
DOI: https://doi.org/10.1007/s00521-012-1071-6