Classification of Data Streams by Incremental Semi-supervised Fuzzy Clustering

  • Conference paper
Fuzzy Logic and Soft Computing Applications (WILF 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10147)

Abstract

Data stream mining refers to methods for mining continuously arriving, evolving data sequences, or even large-scale static databases. Mining data streams has attracted much attention recently. Many data stream classification methods are supervised, so they require labeled samples, which are more difficult and expensive to obtain than unlabeled ones. This paper proposes an incremental semi-supervised clustering approach for data stream classification. Preliminary experimental results on the benchmark data set KDD-CUP’99 show the effectiveness of the proposed algorithm.


Notes

  1. Any stream can be turned into a chunked stream by simply waiting for enough data points to arrive.
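
The chunking idea in the note above can be sketched as a small buffering generator. This is only an illustration of turning an arbitrary stream into fixed-size chunks, not the paper's implementation; the function name `chunked` and the parameter `chunk_size` are hypothetical, and emitting a shorter final chunk when the stream ends is an assumption of this sketch.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `chunk_size` items from `stream`.

    Hypothetical helper: it waits until `chunk_size` data points have
    arrived, then hands them over as one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        # islice pulls at most chunk_size items from the underlying stream.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # stream exhausted
        # Assumption: a final, shorter chunk is still emitted.
        yield chunk


if __name__ == "__main__":
    for chunk in chunked(range(7), 3):
        print(chunk)
```

Because the generator only holds one chunk in memory at a time, it also works on streams that never end; each chunk becomes available as soon as enough points have arrived.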



Corresponding author

Correspondence to G. Castellano.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Castellano, G., Fanelli, A.M. (2017). Classification of Data Streams by Incremental Semi-supervised Fuzzy Clustering. In: Petrosino, A., Loia, V., Pedrycz, W. (eds) Fuzzy Logic and Soft Computing Applications. WILF 2016. Lecture Notes in Computer Science, vol 10147. Springer, Cham. https://doi.org/10.1007/978-3-319-52962-2_16

  • DOI: https://doi.org/10.1007/978-3-319-52962-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52961-5

  • Online ISBN: 978-3-319-52962-2

  • eBook Packages: Computer Science (R0)
