Semi-Supervised Stream Clustering Using Labeled Data Points

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Abstract

Semi-supervised stream clustering performs cluster analysis of data streams by exploiting background or domain expert knowledge. Almost all existing semi-supervised stream clustering techniques exploit background knowledge in the form of constraints, such as must-link and cannot-link constraints. Such constraints are ill-suited to the dynamic nature of data streams. In this paper, we propose a new semi-supervised stream clustering algorithm, SSE-Stream. SSE-Stream exploits background knowledge in the form of single labeled data points to monitor and detect changes in the evolution of the clustering structure. Single labeled data points are more appropriate for data streams: they can be used immediately to determine the class of a cluster, and they effectively support the changing behavior of data streams. SSE-Stream defines a new cluster representation that includes labeled data points, and uses it to extend clustering operations such as merge and split to detect changes in the evolution of the clustering structure. Experimental results on real-world stream datasets show that SSE-Stream improves the output clustering quality, especially for highly complex and drifting datasets.
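
To make the idea of a cluster representation that carries labeled data points more concrete, the following is a minimal sketch only. It is not the SSE-Stream algorithm: the abstract does not give the actual representation or the merge and split criteria. The class `LabeledCluster`, the functions `can_merge` and `merge`, the majority-label rule for a cluster's class, and the rule that clusters with conflicting classes are not merged are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class LabeledCluster:
    """Hypothetical cluster summary: running statistics plus label counts."""
    dim: int
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    label_counts: Counter = field(default_factory=Counter)

    def __post_init__(self):
        # Start with a zero vector when no sums are supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim

    def add(self, point, label=None):
        """Absorb one stream point; the label is optional (semi-supervised)."""
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
        if label is not None:
            self.label_counts[label] += 1

    def center(self):
        if self.n == 0:
            raise ValueError("empty cluster has no center")
        return [s / self.n for s in self.linear_sum]

    def cluster_class(self):
        """Assumed rule: the class is the most frequent label seen, if any."""
        if not self.label_counts:
            return None
        return self.label_counts.most_common(1)[0][0]


def can_merge(a, b):
    """Assumed rule: block a merge only when both classes are known and differ."""
    ca, cb = a.cluster_class(), b.cluster_class()
    return ca is None or cb is None or ca == cb


def merge(a, b):
    """Combine two summaries by adding their statistics and label counts."""
    merged = LabeledCluster(dim=a.dim)
    merged.n = a.n + b.n
    merged.linear_sum = [x + y for x, y in zip(a.linear_sum, b.linear_sum)]
    merged.label_counts = a.label_counts + b.label_counts
    return merged


a = LabeledCluster(dim=2)
a.add([0.0, 0.0], label="x")
a.add([2.0, 2.0])              # unlabeled point
b = LabeledCluster(dim=2)
b.add([4.0, 4.0])              # unlabeled cluster
c = LabeledCluster(dim=2)
c.add([1.0, 1.0], label="y")   # cluster with a conflicting class
m = merge(a, b)
```

In this sketch a single labeled point is enough to give a cluster a class as soon as it arrives, which mirrors the abstract's argument that labeled points can be used immediately, whereas a pairwise constraint needs both of its endpoints to be present in the stream.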


Author information

Corresponding author

Correspondence to Kitsana Waiyamai.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Treechalong, K., Rakthanmanon, T., Waiyamai, K. (2015). Semi-Supervised Stream Clustering Using Labeled Data Points. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_19

  • DOI: https://doi.org/10.1007/978-3-319-21024-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21023-0

  • Online ISBN: 978-3-319-21024-7

  • eBook Packages: Computer Science, Computer Science (R0)
