Density-Based Projected Clustering of Data Streams

  • Marwan Hassani
  • Pascal Spaus
  • Mohamed Medhat Gaber
  • Thomas Seidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7520)

Abstract

In this paper, we propose, develop and experimentally validate a novel subspace clustering technique for data streams, termed PreDeConStream. The technique follows the two-phase model of mining streaming data: an online phase maintains a summary data structure, which is then passed to an offline phase that generates the final clustering model. PreDeConStream incrementally updates the output of the online phase, which is stored in a micro-cluster structure, and takes into account those micro-clusters that fade out over time, which speeds up the assignment of new data points to existing clusters. PreDeConStream is built on a density-based projected clustering model. Such a technique can benefit many important applications, and our experiments show the superiority of the proposed method over state-of-the-art techniques.
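To make the online phase described in the abstract more concrete, the following is a minimal sketch of a fading micro-cluster structure. It is not the PreDeConStream implementation: the exponential decay `2 ** (-lam * dt)`, the class name `MicroCluster`, the function `online_update`, and the parameters `lam`, `radius` and `min_weight` are all assumptions chosen for illustration, and the subspace-preference part of the model is omitted entirely.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class MicroCluster:
    """Decayed summary of nearby stream points (illustrative, not the paper's structure)."""
    dim: int
    lam: float                      # hypothetical decay rate
    weight: float = 0.0             # decayed number of absorbed points
    last_update: float = 0.0        # timestamp of the last decay step
    linear_sum: List[float] = field(default_factory=list)

    def __post_init__(self):
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim

    def fade(self, now: float) -> None:
        """Apply the assumed exponential decay for the time elapsed since last_update."""
        factor = 2.0 ** (-self.lam * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.last_update = now

    def insert(self, point: List[float], now: float) -> None:
        """Decay the summary to the current time, then absorb the new point."""
        self.fade(now)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def center(self) -> List[float]:
        return [s / self.weight for s in self.linear_sum]


def online_update(clusters, point, now, radius=1.0, min_weight=0.01, lam=1.0):
    """One online step: fade all summaries, drop faded ones, assign the point."""
    for mc in clusters:
        mc.fade(now)
    # Micro-clusters whose weight has faded below the threshold are discarded,
    # so later points are compared against fewer candidates.
    clusters[:] = [mc for mc in clusters if mc.weight >= min_weight]

    best, best_dist = None, radius
    for mc in clusters:
        dist = math.dist(mc.center(), point)
        if dist <= best_dist:
            best, best_dist = mc, dist

    if best is None:  # no micro-cluster within the radius: open a new one
        best = MicroCluster(dim=len(point), lam=lam, last_update=now)
        clusters.append(best)
    best.insert(point, now)
    return best
```

In a two-phase setting, the list of surviving micro-clusters would be handed to the offline phase, which runs the actual clustering model over the summaries rather than over the raw stream.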

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marwan Hassani (1)
  • Pascal Spaus (1)
  • Mohamed Medhat Gaber (2)
  • Thomas Seidl (1)

  1. Data Management and Data Exploration Group, RWTH Aachen University, Germany
  2. School of Computing, University of Portsmouth, UK
