Pattern-based time-series subsequence clustering using radial distribution functions

  • Regular Paper
Knowledge and Information Systems

Abstract

Clustering time-series subsequences commonly produces results that are not specific to the data set. This paper introduces a clustering algorithm that creates clusters exclusively from those subsequences that occur more frequently in a data set than would be expected by random chance. As such, it partially adopts a pattern-mining perspective on clustering. When subsequences are labeled based on such clusters, some may remain unlabeled. In fact, if the clustering was done on an unrelated time series, the subsequences are expected not to receive a label. We show that pattern-based clusters are indeed specific to the data set for 7 out of 10 real-world data sets we tested, and for window lengths of up to 128 time points. While kernel-density-based clustering can find clusters with similar properties for window sizes of 8–16 time points, its performance degrades quickly as the window size increases.
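The central idea, keeping only subsequences that recur more often than chance would predict, can be illustrated with a small sketch. This is not the paper's algorithm and does not reproduce its radial-distribution-function machinery. It assumes a fixed-radius neighbor count as a stand-in for density and a seeded Gaussian random walk as the noise model. The function names and the `radius`, `quantile`, and `seed` parameters are hypothetical choices made for illustration.

```python
import numpy as np


def sliding_windows(series, w):
    """Return every length-w subsequence, each shifted to zero mean."""
    x = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, w)
    return windows - windows.mean(axis=1, keepdims=True)


def neighbor_counts(windows, radius, exclusion):
    """For each window, count the other windows within `radius`.

    Windows whose start indices differ by less than `exclusion` are
    skipped, so overlapping (trivially similar) windows do not count.
    """
    n = len(windows)
    idx = np.arange(n)
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        dist = np.linalg.norm(windows - windows[i], axis=1)
        far_enough = np.abs(idx - i) >= exclusion
        counts[i] = np.count_nonzero((dist <= radius) & far_enough)
    return counts


def frequent_subsequences(series, w, radius, quantile=0.99, seed=0):
    """Boolean mask: True where a window recurs more often than in noise.

    Assumption (not from the paper): the noise reference is a Gaussian
    random walk with the same step standard deviation as `series`, and
    the threshold is a high quantile of its neighbor counts.
    """
    x = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    walk = np.cumsum(rng.normal(0.0, np.diff(x).std(), size=len(x)))
    observed = neighbor_counts(sliding_windows(x, w), radius, w)
    reference = neighbor_counts(sliding_windows(walk, w), radius, w)
    return observed > np.quantile(reference, quantile)


# A strictly periodic series repeats every pattern many times, whereas
# a random walk offers no recurring patterns for the mask to pick up.
t = np.arange(2000)
periodic_mask = frequent_subsequences(np.sin(2 * np.pi * t / 100), w=16, radius=0.05)
```

Windows where the mask is `False` correspond to the subsequences that would remain unlabeled. The quadratic neighbor search is only practical for short series; a real implementation would need an index structure or a different density estimate.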

Author information

Corresponding author

Correspondence to Anne M. Denton.

About this article

Cite this article

Denton, A.M., Besemann, C.A. & Dorr, D.H. Pattern-based time-series subsequence clustering using radial distribution functions. Knowl Inf Syst 18, 1–27 (2009). https://doi.org/10.1007/s10115-008-0125-7

  • DOI: https://doi.org/10.1007/s10115-008-0125-7
