Clustering Time Series with k-Medoids Based Algorithms

  • Conference paper
Advanced Analytics and Learning on Temporal Data (AALTD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14343)

Abstract

Time Series Clustering (TSCL) involves grouping unlabelled time series into homogeneous groups. A popular approach to TSCL is to use the partitional clustering algorithms k-means or k-medoids in conjunction with an elastic distance function such as Dynamic Time Warping (DTW). We explore TSCL using nine different elastic distance measures. Both partitional algorithms characterise clusters with an exemplar series, but use different techniques to do so: k-means uses an averaging algorithm to find an exemplar, whereas k-medoids chooses a training case (medoid). Traditionally, the arithmetic mean of a collection of time series was used with k-means. However, this ignores any offset between the series. In 2011, an averaging technique specific to DTW, called DTW Barycentre Averaging (DBA), was proposed. Since then, k-means with DBA has been the algorithm of choice for the majority of partition-based TSCL, and much of the research using medoids-based approaches for TSCL has stopped. We revisit k-medoids based TSCL with a range of elastic distance measures. Our results show that k-medoids approaches are significantly better than k-means on a standard test suite, independent of the elastic distance measure used. We also compare the most commonly used alternating k-medoids approach against the Partitioning Around Medoids (PAM) algorithm. PAM significantly outperforms the default alternating k-medoids for all nine elastic measures used. Additionally, we evaluate six variants of PAM designed to speed up TSCL. Finally, we show that PAM with the best elastic distance measure is significantly better than popular alternative TSCL algorithms, including the k-means DBA approach, and competitive with the best deep learning algorithms.
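
To make the distinction between the two k-medoids strategies concrete, the following is a minimal sketch rather than the authors' implementation. It assumes full-window DTW with a squared pointwise cost as the elastic distance, and all function names (`dtw_distance`, `pairwise`, `alternating_k_medoids`, `pam_swap`) are hypothetical. The PAM part covers only a swap phase started from a given set of medoids; the BUILD initialisation of the full algorithm is omitted.

```python
import math
import random


def dtw_distance(a, b):
    """Full-window DTW with squared pointwise cost (an assumed elastic distance)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise(series, dist=dtw_distance):
    """Symmetric matrix of distances between all pairs of series."""
    n = len(series)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(series[i], series[j])
    return D


def assign(D, medoids):
    """Label each series with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def total_deviation(D, medoids):
    """Sum over all series of the distance to the closest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def alternating_k_medoids(D, k, seed=0, max_iter=100):
    """Lloyd-style k-medoids: alternate assignment and per-cluster medoid update."""
    medoids = random.Random(seed).sample(range(len(D)), k)
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # ties can empty a cluster; keep its old medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest summed distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def pam_swap(D, medoids, max_iter=100):
    """Swap phase only: apply the best medoid/non-medoid swap until none improves."""
    medoids = list(medoids)
    best = total_deviation(D, medoids)
    for _ in range(max_iter):
        best_trial = None
        for pos in range(len(medoids)):
            for cand in range(len(D)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                td = total_deviation(D, trial)
                if td < best:
                    best, best_trial = td, trial
        if best_trial is None:
            break
        medoids = best_trial
    return medoids


# Toy data: two shapes, each repeated with a different time offset.
series = [
    [0, 0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0, 0], [0, 0, 0, 1, 2, 1, 0],
    [5, 5, 4, 3, 4, 5, 5], [5, 4, 3, 4, 5, 5, 5], [5, 5, 5, 4, 3, 4, 5],
]
D = pairwise(series)
start = alternating_k_medoids(D, k=2)
refined = pam_swap(D, start)
```

The alternating update only ever considers moving a medoid within its current cluster, whereas the swap phase evaluates every medoid/non-medoid exchange against the whole dataset, so `total_deviation(D, refined)` can never exceed `total_deviation(D, start)`. This sketch recomputes the objective from scratch for each candidate swap, which is simple but slow; faster PAM variants avoid that recomputation.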

Notes

  1. https://github.com/aeon-toolkit/aeon/

Acknowledgements

This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) (grant ref.: EP/W030756/1), and by “Agencia Española de Investigación (España)” (grant ref.: PID2020-115454GB-C22 / AEI / 10.13039 / 501100011033). David Guijo-Rubio’s research has been subsidised by the University of Córdoba and financed by the European Union - NextGenerationEU (grant ref.: UCOR01MS). The experiments were carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia. We would like to thank all those responsible for helping maintain the time series dataset archives and those contributing to open source implementations of the algorithms.

Author information

Corresponding author

Correspondence to Christopher Holder.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Holder, C., Guijo-Rubio, D., Bagnall, A. (2023). Clustering Time Series with k-Medoids Based Algorithms. In: Ifrim, G., et al. Advanced Analytics and Learning on Temporal Data. AALTD 2023. Lecture Notes in Computer Science, vol 14343. Springer, Cham. https://doi.org/10.1007/978-3-031-49896-1_4

  • DOI: https://doi.org/10.1007/978-3-031-49896-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49895-4

  • Online ISBN: 978-3-031-49896-1

  • eBook Packages: Computer Science (R0)
