Knowledge and Information Systems, Volume 16, Issue 1, pp 29–51

Clustering multidimensional sequences in spatial and temporal databases

  • Ira Assent
  • Ralph Krieger
  • Boris Glavic
  • Thomas Seidl
Regular Paper

Abstract

Many environmental, scientific, technical and medical database applications require effective and efficient mining of time series, sequences or trajectories of measurements taken at different time points and positions, which together form large temporal or spatial databases. In particular, the analysis of concurrent and multidimensional sequences poses new challenges: clusters may be of arbitrary length and may involve a varying number of attributes. We present a novel algorithm capable of finding parallel clusters in different subspaces and demonstrate our results on temporal and spatial applications. Our analysis of structural quality parameters in rivers is being used successfully by hydrologists to develop measures for improving river quality.
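The abstract does not describe the algorithm itself, so the following is only a deliberately naive sketch of the problem setting, not the authors' method. It assumes that a multidimensional sequence is a list of equal-length numeric tuples (one tuple per time point or river position) and that a "cluster" in a subspace is a maximal run of consecutive positions over which every selected attribute varies by at most a tolerance `tol`. The names `stable_runs`, `subspace_runs`, `tol` and `min_len` are hypothetical and chosen for illustration.

```python
from itertools import combinations

# Hypothetical illustration only: NOT the algorithm from the paper.
# A "cluster" here is a maximal half-open interval [start, end) of positions
# in which every selected attribute stays within a value range of width tol.


def _within_tol(seq, attrs, start, end, tol):
    """True if, for every attribute in attrs, max - min over seq[start:end] <= tol."""
    for a in attrs:
        values = [row[a] for row in seq[start:end]]
        if max(values) - min(values) > tol:
            return False
    return True


def stable_runs(seq, attrs, tol, min_len):
    """Maximal runs (start, end) of length >= min_len that are stable on attrs."""
    n = len(seq)
    runs = []
    last_end = -1  # maximal valid end found for the previous start position
    end = 0
    for start in range(n):
        # If [start - 1, end) was valid, [start, end) is too, so reuse end.
        end = max(end, start)
        while end < n and _within_tol(seq, attrs, start, end + 1, tol):
            end += 1
        # Valid ends never decrease as start grows, so [start, end) is contained
        # in an earlier run exactly when end did not advance.
        if end > last_end:
            if end - start >= min_len:
                runs.append((start, end))
            last_end = end
    return runs


def subspace_runs(seq, tol, min_len, min_dims=1):
    """Exhaustively check every attribute subset (exponential in dimensionality)."""
    d = len(seq[0])
    result = {}
    for k in range(min_dims, d + 1):
        for attrs in combinations(range(d), k):
            runs = stable_runs(seq, attrs, tol, min_len)
            if runs:
                result[attrs] = runs
    return result


if __name__ == "__main__":
    # Toy sequence: 5 positions, 3 attributes (made-up values).
    seq = [
        (1.0, 5.0, 0.0),
        (1.1, 5.2, 3.0),
        (1.2, 9.0, 6.0),
        (1.1, 9.1, 9.0),
        (4.0, 9.2, 12.0),
    ]
    for attrs, runs in subspace_runs(seq, tol=0.5, min_len=2).items():
        print(attrs, runs)
```

On the toy data, attribute 0 yields one run over positions 0 to 3, while attribute 1 yields two runs, (0, 2) and (2, 5), which overlap it in time. These are "parallel" clusters in different subspaces in the sense of the abstract. The exhaustive loop over all 2^d attribute subsets and the redundant reporting of sub-subspaces are exactly the costs a dedicated algorithm would need to avoid.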

Keywords

Data mining · Clustering · Spatial and temporal data · Multidimensional sequences

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Ira Assent (1)
  • Ralph Krieger (1)
  • Boris Glavic (1)
  • Thomas Seidl (1)
  1. Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany
