Abstract
With advances in both hardware and software technologies, streaming data is ubiquitous today, and it is often a challenging task to store, analyze, and visualize such rapid, large volumes of data. One of the most difficult problems in the data stream domain is data stream classification. Traditional classification algorithms must be adapted to run in a streaming environment because of the underlying resource constraints on memory and running time. There are at least three hard aspects of data stream classification: large length, concept drift, and feature selection. Concept drift is a common attribute of data streams that occurs as a result of changes in the underlying concepts. Feature selection has been extensively studied from a conventional mining perspective, but it is a much more challenging problem in the data stream domain: concept drift and large length make it impossible to apply classical feature selection methods in the learning procedure. This paper proposes a new Bayesian framework for feature selection in the data stream pattern recognition problem. We suggest a hierarchical probabilistic model with sparse regularization for estimating the parameters of the decision rule. The proposed approach gives a strong Bayesian formulation of the shrinkage criterion for predictor selection. Experimental results show that the proposed framework outperforms other methods of concept drift analysis.
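The abstract's core idea, a hierarchical probabilistic model whose sparse prior shrinks irrelevant predictors to zero while the model is refit on successive portions of the stream, can be illustrated with a minimal sketch. The code below is not the authors' algorithm; it is an assumption-laden toy using an ARD-style (automatic relevance determination) update, in which each feature gets its own prior precision `alpha[j]` that grows without bound for uninformative features. The function name `ard_stream_fit`, the chunked-stream interface, and the pruning threshold are all illustrative choices, not part of the paper.

```python
import numpy as np

def ard_stream_fit(chunks, n_features, n_iters=20, prune_tol=1e-3):
    """Toy sparse Bayesian fit over a stream of (X, y) chunks.

    Each feature j has a zero-mean Gaussian prior with precision alpha[j]
    (a hierarchical sparse prior). For every chunk we alternate between
    the posterior over weights and an evidence-style update of alpha;
    precisions of irrelevant features blow up, shrinking their weights
    toward zero. Carrying alpha across chunks lets the selected feature
    subset adapt as the stream evolves (a crude stand-in for drift handling).
    Unit noise variance is assumed for simplicity.
    """
    alpha = np.ones(n_features)                  # per-feature prior precisions
    w = np.zeros(n_features)
    for X, y in chunks:                          # one refit per stream chunk
        for _ in range(n_iters):
            A = np.diag(alpha)
            S = np.linalg.inv(X.T @ X + A)       # posterior covariance
            w = S @ X.T @ y                      # posterior mean of weights
            gamma = 1.0 - alpha * np.diag(S)     # effective d.o.f. per feature
            alpha = gamma / np.maximum(w ** 2, 1e-12)
            alpha = np.minimum(alpha, 1e6)       # cap to keep S invertible
    selected = np.where(np.abs(w) > prune_tol)[0]
    return w, selected
```

On a synthetic stream where only two of ten features carry signal, the irrelevant weights collapse below the pruning threshold after a few chunks, which is the qualitative behavior the shrinkage criterion in the paper formalizes.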
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Turkov, P., Krasotkina, O., Mottl, V., Sychugov, A. (2016). Feature Selection for Handling Concept Drift in the Data Stream Classification. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol. 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_48
Print ISBN: 978-3-319-41919-0
Online ISBN: 978-3-319-41920-6