Data Mining and Knowledge Discovery, Volume 24, Issue 2, pp 387–410

Tracing Evolving Subspace Clusters in Temporal Climate Data

  • Stephan Günnemann
  • Hardy Kremer
  • Charlotte Laufkötter
  • Thomas Seidl
Open Access
Article

Abstract

Analysis of temporal climate data is an active research area. Advanced data mining methods designed specifically for such temporal data support domain experts in understanding phenomena such as climate change, which is crucial for a sustainable world. Important solutions for mining temporal data are cluster tracing approaches, which mine the temporal evolution of clusters. Generally, clusters represent groups of objects with similar values. In a temporal context like tracing, similar values correspond to similar behavior in one snapshot in time. Each cluster can be interpreted as a behavior type, and cluster tracing corresponds to tracking similar behaviors over time. Existing tracing approaches are designed for datasets that satisfy two specific conditions: the clusters appear in all attributes (i.e., they are fullspace clusters), and the data objects have unique identifiers. These identifiers are used for tracking clusters by measuring the number of objects two clusters have in common, i.e., clusters are traced based on similar object sets. These conditions, however, are strict. First, in complex data, clusters are often hidden in individual subsets of the dimensions. Second, mapping clusters based on similar object sets does not reflect the idea of tracing similar behavior types over time, because similar behavior can be represented even by clusters that have no objects in common. A tracing method based on similar object values is needed. In this paper, we introduce a novel approach that traces subspace clusters based on object value similarity. Neither subspace tracing nor tracing by object value similarity has been done before.
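
The contrast between tracing by shared objects and tracing by similar values can be illustrated with a small sketch. This is not the method introduced in the paper: the Jaccard overlap, the mean-based distance, the function names, and the attribute names `temperature` and `salinity` are all assumptions chosen for illustration only.

```python
from statistics import mean


def object_overlap(ids_a, ids_b):
    """Jaccard overlap of two clusters' object-identifier sets."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def value_distance(cluster_a, cluster_b):
    """Mean absolute difference of per-attribute means over shared attributes.

    Each cluster is a dict mapping an attribute name to its list of values;
    the dict keys play the role of the cluster's subspace (hypothetical layout).
    Returns None when the two subspaces share no attribute.
    """
    shared = sorted(set(cluster_a) & set(cluster_b))
    if not shared:
        return None
    return mean(abs(mean(cluster_a[d]) - mean(cluster_b[d])) for d in shared)


# Two clusters from consecutive snapshots with disjoint object identifiers
# but similar values (all data below is made up for the example).
ids_t0, ids_t1 = [1, 2, 3], [7, 8, 9]
values_t0 = {"temperature": [10.0, 11.0, 12.0], "salinity": [35.0, 35.0, 35.0]}
values_t1 = {"temperature": [10.5, 11.5, 12.5], "salinity": [35.0, 35.0, 35.0]}

print(object_overlap(ids_t0, ids_t1))        # 0.0: no objects in common
print(value_distance(values_t0, values_t1))  # 0.25: values are close
```

In this toy case an overlap-based mapping would not link the two clusters at all, while a value-based comparison treats them as the same behavior type, which is the situation the abstract describes.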

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Stephan Günnemann (1)
  • Hardy Kremer (1)
  • Charlotte Laufkötter (2)
  • Thomas Seidl (1)

  1. Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
  2. Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, Zürich, Switzerland
