Classification of Data Streams by Incremental Semi-supervised Fuzzy Clustering

Castellano, G.; Fanelli, A. M.

doi:10.1007/978-3-319-52962-2_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10147)

Included in the following conference series:

International Workshop on Fuzzy Logic and Applications

Abstract

Data stream mining refers to methods for mining continuously arriving, evolving data sequences, or even large-scale static databases, and it has attracted much attention recently. Many data stream classification methods are supervised, so they require labeled samples, which are more difficult and expensive to obtain than unlabeled ones. This paper proposes an incremental semi-supervised clustering approach for data stream classification. Preliminary experimental results on the KDD-CUP’99 benchmark data set show the effectiveness of the proposed algorithm.

Notes

1. Any stream can be turned into a chunked stream by simply waiting for enough data points to arrive.
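
The note above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `chunked` and the parameter `chunk_size` are hypothetical, and the choice to emit a shorter final chunk when the stream ends is an assumption made only for this illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Group a stream into consecutive chunks of `chunk_size` points.

    Hypothetical helper: it buffers incoming points and emits a chunk
    only once enough of them have arrived. If the stream ends early,
    the last chunk is shorter (an assumption of this sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        # islice blocks on the underlying iterator until it has
        # collected chunk_size items or the stream is exhausted.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Seven points grouped into chunks of three.
    for chunk in chunked(range(7), 3):
        print(chunk)
```

Because `chunked` is a generator, it also works on an unbounded stream: each chunk is produced as soon as its last point arrives, and nothing beyond the current chunk is held in memory.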

Author information

Authors and Affiliations

Computer Science Department, Università degli Studi di Bari “A. Moro”, Via E. Orabona 4, 70126, Bari, Italy
G. Castellano & A. M. Fanelli

Corresponding author

Correspondence to G. Castellano.

Editor information

Editors and Affiliations

University of Naples “Parthenope”, Naples, Italy
Alfredo Petrosino
University of Salerno, Fisciano, (Salerno), Italy
Vincenzo Loia
University of Alberta, Edmonton, Alberta, Canada
Witold Pedrycz

About this paper

Cite this paper

Castellano, G., Fanelli, A.M. (2017). Classification of Data Streams by Incremental Semi-supervised Fuzzy Clustering. In: Petrosino, A., Loia, V., Pedrycz, W. (eds) Fuzzy Logic and Soft Computing Applications. WILF 2016. Lecture Notes in Computer Science, vol 10147. Springer, Cham. https://doi.org/10.1007/978-3-319-52962-2_16

DOI: https://doi.org/10.1007/978-3-319-52962-2_16
Published: 07 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52961-5
Online ISBN: 978-3-319-52962-2
eBook Packages: Computer Science, Computer Science (R0)
