Abstract
Data stream mining refers to methods for mining continuously arriving, evolving data sequences, or even large-scale static databases. Mining data streams has attracted much attention recently. Many data stream classification methods are supervised, so they require labeled samples, which are harder and more expensive to obtain than unlabeled ones. This paper proposes an incremental semi-supervised clustering approach for data stream classification. Preliminary experimental results on the benchmark data set KDD-CUP’99 show the effectiveness of the proposed algorithm.
Notes
1. Any stream can be turned into a chunked stream by simply waiting for enough data points to arrive.
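The note above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `chunk_stream` and the parameter `chunk_size` are hypothetical, and the choice to emit a final, shorter chunk when the stream ends is an assumption made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_stream(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Buffer points from `stream` and yield them in chunks of `chunk_size`.

    Hypothetical helper: it waits until `chunk_size` points have arrived
    before handing a chunk to the consumer. If the stream ends early, the
    remaining points are emitted as a shorter final chunk (an assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        # islice pulls at most chunk_size items, blocking on the source
        # iterator until they are available or the stream is exhausted.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Usage: seven points arriving one at a time, grouped into chunks of three.
chunks = list(chunk_stream(range(7), chunk_size=3))
```

Because the function is a generator, each chunk becomes available as soon as enough points have arrived, so a chunk-based learner could process one chunk while later points are still being produced.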
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Castellano, G., Fanelli, A.M. (2017). Classification of Data Streams by Incremental Semi-supervised Fuzzy Clustering. In: Petrosino, A., Loia, V., Pedrycz, W. (eds) Fuzzy Logic and Soft Computing Applications. WILF 2016. Lecture Notes in Computer Science, vol 10147. Springer, Cham. https://doi.org/10.1007/978-3-319-52962-2_16
DOI: https://doi.org/10.1007/978-3-319-52962-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52961-5
Online ISBN: 978-3-319-52962-2
eBook Packages: Computer Science, Computer Science (R0)