Abstract
We present a new method for time series clustering, which we call the Hierarchical Spectral Merger (HSM) method. The procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. Similarity between a pair of time series is measured by the total variation distance between their estimated spectral densities. Every time two clusters merge, a new spectral density is estimated from all the information in both clusters, and this estimate is representative of all the series in the new cluster. The method is implemented in the R package HSMClust. We present two applications of the HSM method, one to wave-height measurements from oceanography and the other to electroencephalogram (EEG) data.
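The merging idea described in the abstract can be illustrated with a minimal Python sketch. It is not the HSMClust implementation. The function names (`normalized_spectrum`, `tv_distance`, `hsm_sketch`), the moving-average smoother, its `bandwidth` parameter, and the choice to represent a merged cluster by the average of its members' normalized spectra are all assumptions made for illustration. The sketch also assumes that all series have the same length.

```python
import numpy as np

def normalized_spectrum(x, bandwidth=5):
    """Smoothed periodogram, rescaled to sum to 1 over the frequency grid.

    The moving-average smoother and its bandwidth are illustrative choices.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    pgram = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    kernel = np.ones(bandwidth) / bandwidth
    smooth = np.convolve(pgram, kernel, mode="same")
    return smooth / smooth.sum()

def tv_distance(f, g):
    """Total variation distance between two normalized spectra, in [0, 1]."""
    return 0.5 * np.abs(f - g).sum()

def hsm_sketch(series, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    After each merge, the cluster spectrum is re-estimated from every
    member series (here: the mean of the members' normalized spectra,
    which is an assumption of this sketch).
    """
    member_spec = [normalized_spectrum(x) for x in series]
    clusters = [[i] for i in range(len(series))]
    spectra = list(member_spec)
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest TV distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = tv_distance(spectra[a], spectra[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        merged_spec = np.mean([member_spec[i] for i in merged], axis=0)
        # Delete the higher index first so the lower index stays valid.
        for lst in (clusters, spectra):
            del lst[b]
            del lst[a]
        clusters.append(merged)
        spectra.append(merged_spec)
    return clusters
```

Calling `hsm_sketch(series, n_clusters=2)` on a list of equal-length arrays returns the member indices of each cluster. Stopping at a fixed number of clusters is a simplification chosen for this sketch.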
Additional information
The authors would like to thank the reviewers for their comments which led to improvements in this work.
Cite this article
Euán, C., Ombao, H. & Ortega, J. The Hierarchical Spectral Merger Algorithm: A New Time Series Clustering Procedure. J Classif 35, 71–99 (2018). https://doi.org/10.1007/s00357-018-9250-5
DOI: https://doi.org/10.1007/s00357-018-9250-5