Clustering Time Series with k-Medoids Based Algorithms

Holder, Christopher; Guijo-Rubio, David; Bagnall, Anthony

doi:10.1007/978-3-031-49896-1_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14343)

Included in the following conference series:

International Workshop on Advanced Analytics and Learning on Temporal Data

Abstract

Time Series Clustering (TSCL) involves grouping unlabelled time series into homogeneous groups. A popular approach to TSCL is to use the partitional clustering algorithms k-means or k-medoids in conjunction with an elastic distance function such as Dynamic Time Warping (DTW). We explore TSCL using nine different elastic distance measures. Both partitional algorithms characterise each cluster with an exemplar series, but they find it in different ways: k-means uses an averaging algorithm to construct an exemplar, whereas k-medoids selects a training case (the medoid). Traditionally, k-means was used with the arithmetic mean of a collection of time series. However, the arithmetic mean ignores any offset. In 2011, an averaging technique specific to DTW, called DTW Barycentre Averaging (DBA), was proposed. Since then, k-means with DBA has been the algorithm of choice for the majority of partition-based TSCL, and much of the research on medoids-based approaches to TSCL has stopped. We revisit k-medoids based TSCL with a range of elastic distance measures. Our results show that k-medoids approaches are significantly better than k-means on a standard test suite, independent of the elastic distance measure used. We also compare the most commonly used alternating k-medoids approach against the Partitioning Around Medoids (PAM) algorithm. PAM significantly outperforms the default alternating k-medoids for all nine elastic measures used. Additionally, we evaluate six variants of PAM designed to speed up TSCL. Finally, we show that PAM with the best elastic distance measure is significantly better than popular alternative TSCL algorithms, including the k-means DBA approach, and competitive with the best deep learning algorithms.
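To make the distinction between the alternating k-medoids approach and PAM concrete, the following is a minimal pure-Python sketch, not the authors' implementation and not the aeon API. It assumes univariate series, DTW with a squared pointwise cost and no warping window, and a precomputed pairwise distance matrix. All function names (`dtw_distance`, `pairwise`, `alternate_k_medoids`, `pam_swap`) are hypothetical, and only the swap phase of PAM is shown; the BUILD initialisation is omitted and the initial medoids are passed in by the caller.

```python
from itertools import product


def dtw_distance(a, b):
    """DTW with squared pointwise cost and no warping-window constraint."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise(series, dist):
    """Full pairwise distance matrix as a list of lists."""
    n = len(series)
    return [[dist(series[i], series[j]) for j in range(n)] for i in range(n)]


def total_deviation(D, medoids):
    """Sum over all cases of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in D)


def assign(D, medoids):
    """Label each case with its nearest medoid; a medoid always labels itself."""
    return [i if i in medoids else min(medoids, key=lambda m: D[i][m])
            for i in range(len(D))]


def alternate_k_medoids(D, medoids, max_iter=100):
    """Lloyd-style loop: assign cases, then re-pick the medoid of each cluster."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            # New medoid: the member with the smallest summed distance to its cluster.
            new.append(min(members, key=lambda c: sum(D[c][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    return medoids


def pam_swap(D, medoids, max_iter=100):
    """PAM swap phase: apply the best (medoid, non-medoid) swap until none improves."""
    medoids = list(medoids)
    best = total_deviation(D, medoids)
    for _ in range(max_iter):
        best_swap = None
        for pos, cand in product(range(len(medoids)), range(len(D))):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            td = total_deviation(D, trial)
            if td < best:
                best, best_swap = td, trial
        if best_swap is None:
            break
        medoids = best_swap
    return medoids
```

The alternating variant only ever considers new medoids from inside the current cluster, whereas the swap phase considers every non-medoid case as a replacement for every medoid, which is why it can escape poor initialisations at a higher cost per iteration. This naive swap recomputes the total deviation for each candidate; it does not include any speed-up.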

Notes

1. https://github.com/aeon-toolkit/aeon/

Acknowledgements

This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) (grant ref.: EP/W030756/1), and by “Agencia Española de Investigación (España)” (grant ref.: PID2020-115454GB-C22/AEI/10.13039/501100011033). David Guijo-Rubio’s research has been subsidised by the University of Córdoba and financed by the European Union - NextGenerationEU (grant ref.: UCOR01MS). The experiments were carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia. We would like to thank all those responsible for helping maintain the time series dataset archives and those contributing to open source implementations of the algorithms.

Author information

Authors and Affiliations

School of Computing Sciences, University of East Anglia, NR4 7TQ, Norwich, UK
Christopher Holder, David Guijo-Rubio & Anthony Bagnall
Department of Computer Science, Universidad de Córdoba, Córdoba, Spain
David Guijo-Rubio

Corresponding author

Correspondence to Christopher Holder.

Editor information

Editors and Affiliations

University College Dublin, Dublin, Ireland
Georgiana Ifrim
University of Rennes 2, Rennes, France
Romain Tavenard
University of Southampton, Southampton, UK
Anthony Bagnall
Humboldt University of Berlin, Berlin, Germany
Patrick Schaefer
University of Rennes, Rennes, France
Simon Malinowski
Claude Bernard University Lyon 1, Villeurbanne, France
Thomas Guyet
Orange Innovation, Lannion, France
Vincent Lemaire

About this paper

Cite this paper

Holder, C., Guijo-Rubio, D., Bagnall, A. (2023). Clustering Time Series with k-Medoids Based Algorithms. In: Ifrim, G., et al. Advanced Analytics and Learning on Temporal Data. AALTD 2023. Lecture Notes in Computer Science, vol 14343. Springer, Cham. https://doi.org/10.1007/978-3-031-49896-1_4

DOI: https://doi.org/10.1007/978-3-031-49896-1_4
Published: 20 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49895-4
Online ISBN: 978-3-031-49896-1
eBook Packages: Computer Science, Computer Science (R0)