Abstract
Clustering of time series subsequence data commonly produces results that are unspecific to the data set. This paper introduces a clustering algorithm that creates clusters exclusively from those subsequences that occur more frequently in a data set than would be expected by random chance. As such, it partially adopts a pattern-mining perspective on clustering. When subsequences are labeled based on such clusters, they may remain without a label. In fact, if the clustering was done on an unrelated time series, the subsequences are expected not to receive a label. We show that pattern-based clusters are indeed specific to the data set for 7 out of 10 real-world data sets we tested, and for window lengths of up to 128 time points. While kernel-density-based clustering can be used to find clusters with similar properties for window sizes of 8–16 time points, its performance degrades quickly as the window size increases.
Cite this article
Denton, A.M., Besemann, C.A. & Dorr, D.H. Pattern-based time-series subsequence clustering using radial distribution functions. Knowl Inf Syst 18, 1–27 (2009). https://doi.org/10.1007/s10115-008-0125-7
DOI: https://doi.org/10.1007/s10115-008-0125-7