Pattern-based time-series subsequence clustering using radial distribution functions

Denton, Anne M.; Besemann, Christopher A.; Dorr, Dietmar H.

doi:10.1007/s10115-008-0125-7

Regular Paper
Published: 11 March 2008

Volume 18, pages 1–27, (2009)

Knowledge and Information Systems

Anne M. Denton¹,
Christopher A. Besemann¹ &
Dietmar H. Dorr¹

Abstract

Clustering of time series subsequence data commonly produces results that are not specific to the data set. This paper introduces a clustering algorithm that creates clusters exclusively from those subsequences that occur more frequently in a data set than would be expected by random chance. As such, it partially adopts a pattern-mining perspective into clustering. When subsequences are labeled based on such clusters, some may remain without a label. In fact, if the clustering was done on an unrelated time series, the subsequences are expected not to receive a label. We show that pattern-based clusters are indeed specific to the data set for 7 out of 10 real-world data sets we tested, and for window lengths up to 128 time points. While kernel-density-based clustering can be used to find clusters with similar properties for window sizes of 8–16 time points, its performance degrades quickly as the window size increases.
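
The idea of keeping only subsequences that occur more often than chance would predict can be illustrated with a minimal sketch. It is not the authors' algorithm, which is based on radial distribution functions and is not described in the abstract. The sketch assumes a random-walk series as the chance model and a fixed Euclidean radius for counting neighbors; the function names and the parameters `w`, `radius`, `factor`, and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sliding_windows(series, w):
    """Return all length-w subsequences, each shifted to zero mean and scaled to unit norm."""
    x = np.lib.stride_tricks.sliding_window_view(np.asarray(series, dtype=float), w)
    x = x - x.mean(axis=1, keepdims=True)          # new array, so it is writable
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                        # leave constant windows as all zeros
    return x / norms


def neighbor_counts(queries, pool, radius):
    """For each query window, count the pool windows within Euclidean distance `radius`."""
    # Brute-force pairwise squared distances; adequate only for short series.
    d2 = ((queries[:, None, :] - pool[None, :, :]) ** 2).sum(axis=2)
    return (d2 <= radius ** 2).sum(axis=1)


def overrepresented(series, w=16, radius=0.3, factor=2.0, seed=0):
    """Flag windows with more neighbors in the data than in a random-walk reference.

    Assumption: a random walk of the same length stands in for "random chance".
    """
    rng = np.random.default_rng(seed)
    data = sliding_windows(series, w)
    noise = sliding_windows(np.cumsum(rng.standard_normal(len(series))), w)
    in_data = neighbor_counts(data, data, radius) - 1   # drop the self-match
    in_noise = neighbor_counts(data, noise, radius)
    # The +1 keeps the threshold positive when no reference window is nearby.
    return in_data > factor * (in_noise + 1)


if __name__ == "__main__":
    pattern = np.array([0, 3, -1, 2, -2, 1, -3, 0], dtype=float)
    series = np.tile(pattern, 50)                  # a strongly repetitive toy series
    mask = overrepresented(series, w=8)
    print(mask.shape, int(mask.sum()))
```

Windows that are not flagged would simply stay unlabeled, which mirrors the behavior described above. Note that this sketch counts overlapping neighboring windows as matches, so smooth series will look more repetitive than they are; a fuller treatment would exclude such trivial matches.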

Author information

Authors and Affiliations

Department of Computer Science and Operations Research, North Dakota State University, Fargo, ND, 58105-5164, USA
Anne M. Denton, Christopher A. Besemann & Dietmar H. Dorr

Corresponding author

Correspondence to Anne M. Denton.

About this article

Cite this article

Denton, A.M., Besemann, C.A. & Dorr, D.H. Pattern-based time-series subsequence clustering using radial distribution functions. Knowl Inf Syst 18, 1–27 (2009). https://doi.org/10.1007/s10115-008-0125-7

Received: 18 December 2006
Revised: 07 January 2008
Accepted: 19 January 2008
Published: 11 March 2008
Issue Date: January 2009
DOI: https://doi.org/10.1007/s10115-008-0125-7
