Knowledge and Information Systems, Volume 18, Issue 1, pp 1–27

Pattern-based time-series subsequence clustering using radial distribution functions

  • Anne M. Denton
  • Christopher A. Besemann
  • Dietmar H. Dorr
Regular Paper

Abstract

Clustering of time series subsequence data commonly produces results that are not specific to the data set. This paper introduces a clustering algorithm that creates clusters exclusively from those subsequences that occur more frequently in a data set than would be expected by random chance. As such, it partially adopts a pattern mining perspective into clustering. When subsequences are labeled based on such clusters, some may remain without a label. In fact, if the clustering was done on an unrelated time series, the subsequences are expected not to receive a label. We show that pattern-based clusters are indeed specific to the data set for 7 out of 10 real-world data sets we tested, and for window lengths up to 128 time points. While kernel-density-based clustering can be used to find clusters with similar properties for window sizes of 8–16 time points, its performance degrades quickly with increasing window size.
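The core idea in the abstract, keeping only subsequences that recur more often than chance would suggest, can be illustrated with a small sketch. This is not the authors' radial-distribution-function algorithm; it is a simplified neighbour-count comparison against a reference series, written under stated assumptions. The function names (`sliding_windows`, `count_within`, `random_walk_like`, `frequent_windows`), the fixed `radius`, the zero-mean normalization, and the use of a Gaussian random walk as the noise reference are all illustrative choices, not taken from the paper.

```python
import math
import random


def sliding_windows(series, w):
    """Return all length-w subsequences, each shifted to zero mean."""
    out = []
    for i in range(len(series) - w + 1):
        win = series[i:i + w]
        m = sum(win) / w
        out.append([x - m for x in win])
    return out


def count_within(query, candidates, radius, skip=()):
    """Count candidates within Euclidean distance `radius` of `query`.

    Indices contained in `skip` are ignored (used to drop trivial,
    overlapping matches).
    """
    n = 0
    for j, c in enumerate(candidates):
        if j in skip:
            continue
        if math.dist(query, c) <= radius:
            n += 1
    return n


def random_walk_like(series, rng):
    """Hypothetical noise model: Gaussian random walk whose step size
    matches the root-mean-square step of `series`."""
    steps = [b - a for a, b in zip(series, series[1:])]
    std = math.sqrt(sum(s * s for s in steps) / len(steps))
    walk = [series[0]]
    for _ in steps:
        walk.append(walk[-1] + rng.gauss(0.0, std))
    return walk


def frequent_windows(series, noise, w, radius):
    """Indices of windows that have more neighbours in `series`
    than in the reference series `noise`."""
    data = sliding_windows(series, w)
    ref = sliding_windows(noise, w)
    keep = []
    for i, q in enumerate(data):
        # Windows overlapping window i are trivially similar; skip them.
        overlapping = range(max(0, i - w + 1), min(len(data), i + w))
        in_data = count_within(q, data, radius, skip=overlapping)
        in_noise = count_within(q, ref, radius)
        if in_data > in_noise:
            keep.append(i)
    return keep


if __name__ == "__main__":
    sawtooth = [float(t % 4) for t in range(40)]
    noise = random_walk_like(sawtooth, random.Random(0))
    kept = frequent_windows(sawtooth, noise, w=4, radius=0.5)
    print(len(kept), "of", len(sawtooth) - 4 + 1, "windows kept")
```

Windows that are not returned correspond to the subsequences that would "remain without label" in the abstract's terms. In practice the radius and the noise model would need to be chosen with care, particularly for larger window lengths.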

Keywords

Density-based clustering; Time series subsequence clustering; Clustering noisy data; Noise elimination; Time series labeling

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Anne M. Denton (1)
  • Christopher A. Besemann (1)
  • Dietmar H. Dorr (1)
  1. Department of Computer Science and Operations Research, North Dakota State University, Fargo, USA
