Abstract
Time series are ubiquitous in data mining applications. As with other types of data, annotations can be challenging to acquire, which prevents the training of time series classification models. In this context, clustering methods are an appropriate alternative, as they create homogeneous groups that allow a better analysis of the data structure. Time series clustering has been investigated for many years, and multiple approaches have already been proposed. Following the advent of deep learning in computer vision, researchers have recently started to study the use of deep clustering on time series data. Existing approaches mostly rely on representation learning (imported from computer vision), which consists of learning a representation of the data and performing the clustering task using this new representation. The goal of this paper is to provide a careful study and an experimental comparison of the existing literature on time series representation learning for deep clustering. We go beyond the sole comparison of existing approaches and propose to decompose deep clustering methods into three main components: (1) network architecture, (2) pretext loss, and (3) clustering loss. We evaluate all combinations of these components (300 different models in total) in order to study their relative influence on clustering performance. We also experimentally compare the most efficient combinations we identified with existing non-deep clustering methods. Experiments were performed using the largest repository of time series datasets (the UCR/UEA archive), composed of 128 univariate and 30 multivariate datasets. Finally, we propose an extension of the class activation maps method to the unsupervised case, which makes it possible to identify patterns that shed light on how the network clustered the time series.
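The decomposition into a network architecture, a pretext loss, and a clustering loss is commonly formalized as a joint training objective. A generic formulation, given here as an illustration of the general pattern rather than the paper's exact equations, combines the two losses on an encoder \(f\) and decoder \(g\):

\[
\mathcal{L} \;=\; \mathcal{L}_{\text{pretext}}\bigl(X,\, g(f(X))\bigr) \;+\; \gamma\, \mathcal{L}_{\text{cluster}}\bigl(f(X)\bigr),
\]

where \(\mathcal{L}_{\text{pretext}}\) is typically a reconstruction error, \(\mathcal{L}_{\text{cluster}}\) measures how well the latent representation \(f(X)\) separates into \(k\) groups, and \(\gamma\) is a weighting hyperparameter.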
Abbreviations
- X: The dataset to cluster
- x: An element of X
- \(x_i\): The ith element of X
- N: The number of elements in X
- k: The number of expected clusters
- \(|.|\): The cardinality of a set
- f(): The encoder non-linear function
- g(): The decoder non-linear function
- Z: The projection of X in the latent space, equal to f(X)
- z: An element of Z
- \(z_i\): The ith element of Z
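The notation above describes an encoder-based pipeline in which clustering is performed on \(Z = f(X)\). The following is a minimal, runnable sketch of that pipeline; the fixed random non-linear projection is only a placeholder for the learned encoders compared in the paper, and the toy two-pattern dataset is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset X: N univariate time series of length T, drawn from two
# noisy shape families (hypothetical data, for illustration only).
N, T, k = 40, 50, 2
t = np.linspace(0, 2 * np.pi, T)
X = np.concatenate([
    np.sin(t) + 0.1 * rng.standard_normal((N // 2, T)),
    np.cos(t) + 0.1 * rng.standard_normal((N // 2, T)),
])

# Encoder f(): a fixed random non-linear projection stands in for the
# trained networks studied in the paper (placeholder, not a real model).
W = rng.standard_normal((T, 8))

def f(X):
    return np.tanh(X @ W)  # Z = f(X): projection into the latent space

Z = f(X)

# Cluster Z into the k expected clusters with a minimal Lloyd's k-means.
centers = Z[rng.choice(N, size=k, replace=False)]
for _ in range(20):
    # Assign each latent vector to its nearest center.
    labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    # Recompute centers, keeping the old one if a cluster empties out.
    centers = np.stack([
        Z[labels == c].mean(0) if np.any(labels == c) else centers[c]
        for c in range(k)
    ])

print(Z.shape, sorted(set(labels.tolist())))
```

In the deep clustering methods surveyed by the paper, the random projection is replaced by a trained network and the clustering step is typically coupled to the training through a clustering loss, rather than run once on a frozen representation.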
Acknowledgements
The authors would like to thank the creators and providers of the datasets: Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Eamonn Keogh, and Mustafa Baydogan. The authors would also like to thank the Mésocentre of Strasbourg for providing access to the GPU cluster, and Hassan Fawaz, who gave free access to his code on time series processing and cd-diagrams. This work was supported by the ANR TIMES project (Grant ANR-17-CE23-0015) of the French Agence Nationale de la Recherche.
Additional information
Responsible editor: Eamonn Keogh.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Lafabregue, B., Weber, J., Gançarski, P. et al. End-to-end deep representation learning for time series clustering: a comparative study. Data Min Knowl Disc 36, 29–81 (2022). https://doi.org/10.1007/s10618-021-00796-y