Effective Evaluation Measures for Subspace Clustering of Data Streams

  • Marwan Hassani
  • Yunsu Kim
  • Seungjin Choi
  • Thomas Seidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7867)

Abstract

Streaming data sources are increasingly high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data and cannot capture clustering quality under the evolving nature of data streams. Conversely, available stream clustering evaluation measures account only for the errors of full-space clustering and ignore the quality of subspace clustering.

In this paper we propose, to the best of our knowledge, the first subspace clustering measure designed for streaming data, called SubCMM (Subspace Cluster Mapping Measure). SubCMM is an effective evaluation measure for stream subspace clustering that can handle errors caused by emerging, moving, or splitting subspace clusters. Additionally, we propose a novel method for applying available offline subspace clustering measures to data streams within the Subspace MOA framework.
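The abstract does not describe how offline measures are carried over to streams. As an illustration only, and not the authors' method, one simple approach is to score the found clusters against the ground truth restricted to a sliding window (horizon) of the most recent stream objects. The minimal sketch below does this with RNIA (Relative Non-Intersecting Area) of Patrikainen and Meila, an offline measure that compares subspace clusterings on (object, dimension) pairs. The cluster representation, the `WindowedEvaluator` class, and its `horizon` parameter are hypothetical choices made for this example.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, Iterable, List, Set, Tuple

# Assumed representation: a subspace cluster is (object ids, relevant dimension indices).
SubspaceCluster = Tuple[FrozenSet[int], FrozenSet[int]]


def micro_objects(clusters: Iterable[SubspaceCluster], window: Set[int]) -> Dict[Tuple[int, int], int]:
    """Count how many clusters cover each (object, dimension) pair, keeping only windowed objects."""
    counts: Dict[Tuple[int, int], int] = {}
    for objects, dims in clusters:
        for o in objects & window:
            for d in dims:
                counts[(o, d)] = counts.get((o, d), 0) + 1
    return counts


def rnia(found: List[SubspaceCluster], truth: List[SubspaceCluster], window: Set[int]) -> float:
    """RNIA = (|U| - |I|) / |U| over covered micro-objects; 0 means identical, 1 means disjoint."""
    f = micro_objects(found, window)
    t = micro_objects(truth, window)
    keys = f.keys() | t.keys()
    union = sum(max(f.get(k, 0), t.get(k, 0)) for k in keys)
    inter = sum(min(f.get(k, 0), t.get(k, 0)) for k in keys)
    return 0.0 if union == 0 else (union - inter) / union


class WindowedEvaluator:
    """Hypothetical helper: remembers the ids of the last `horizon` objects and scores on them only."""

    def __init__(self, horizon: int) -> None:
        self.window: Deque[int] = deque(maxlen=horizon)

    def observe(self, object_id: int) -> None:
        # Older ids fall out automatically once the deque reaches maxlen.
        self.window.append(object_id)

    def evaluate(self, found: List[SubspaceCluster], truth: List[SubspaceCluster]) -> float:
        return rnia(found, truth, set(self.window))


if __name__ == "__main__":
    ev = WindowedEvaluator(horizon=4)
    for i in range(6):
        ev.observe(i)  # window now holds ids 2, 3, 4, 5
    truth = [(frozenset({2, 3, 4}), frozenset({0, 1}))]
    found = [(frozenset({3, 4, 5}), frozenset({0, 1}))]
    print(ev.evaluate(found, truth))  # 0.5: 4 shared of 8 covered micro-objects
```

Restricting both clusterings to the same window keeps the offline measure unchanged while letting it follow the stream; the window size trades responsiveness to evolving clusters against the stability of the score.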

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marwan Hassani (1)
  • Yunsu Kim (1)
  • Seungjin Choi (2)
  • Thomas Seidl (1)

  1. Data Management and Data Exploration Group, RWTH Aachen University, Germany
  2. Department of Computer Science and Engineering, Pohang University of Science and Technology, Korea