Machine Learning, Volume 108, Issue 12, pp 2159–2195

Covariance-based dissimilarity measures applied to clustering wide-sense stationary ergodic processes

  • Qidi Peng
  • Nan Rao
  • Ran Zhao


We introduce a new unsupervised learning problem: clustering wide-sense stationary ergodic stochastic processes. We design a covariance-based dissimilarity measure, together with asymptotically consistent algorithms for clustering offline and online datasets, respectively. We also suggest a formal criterion for the efficiency of dissimilarity measures, and discuss an approach to improve the efficiency of our clustering algorithms when they are applied to particular types of processes, such as self-similar processes with wide-sense stationary ergodic increments. Applications to clustering synthetic and real-world data are provided as examples.
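The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea, assuming that a "covariance-based dissimilarity" compares estimated covariance matrices of blocks of consecutive observations, and that offline clustering picks k well-separated centres and assigns each series to its nearest centre. It is not the authors' exact measure or algorithm. The block-size cut-off `max_m`, the weights 1/(m(m+1)), the Frobenius matrix distance, the farthest-point initialisation and all function names (`dissimilarity`, `cluster_offline`, `ar1`, ...) are hypothetical choices made for illustration.

```python
import math
import random


def empirical_autocov(x, lag):
    """Sample autocovariance of x at the given lag (biased 1/n normalisation)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def cov_matrix(x, m):
    """Estimated m x m covariance matrix of a block (X_t, ..., X_{t+m-1}).

    Under wide-sense stationarity entry (i, j) depends only on |i - j|,
    so the matrix is Toeplitz in the lag-k autocovariances."""
    gamma = [empirical_autocov(x, k) for k in range(m)]
    return [[gamma[abs(i - j)] for j in range(m)] for i in range(m)]


def frobenius_dist(a, b):
    """Frobenius norm of a - b for two square matrices of equal size."""
    size = len(a)
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(size) for j in range(size)))


def dissimilarity(x, y, max_m=5):
    """Hypothetical covariance-based dissimilarity: a weighted sum over block
    sizes m = 1..max_m of the Frobenius distance between estimated m x m
    covariance matrices. Weights 1/(m(m+1)) and max_m are assumptions."""
    return sum(frobenius_dist(cov_matrix(x, m), cov_matrix(y, m)) / (m * (m + 1))
               for m in range(1, max_m + 1))


def cluster_offline(samples, k, dist=dissimilarity):
    """Offline clustering into k groups: farthest-point choice of k centres,
    then assignment of every sample to its nearest centre."""
    n = len(samples)
    d = [[dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    centres = [0]  # the first sample seeds the first cluster
    while len(centres) < k:
        # Next centre: the sample whose nearest existing centre is farthest away.
        centres.append(max(range(n), key=lambda i: min(d[i][c] for c in centres)))
    # Label each sample by the index (0..k-1) of its nearest centre.
    return [min(range(k), key=lambda c: d[i][centres[c]]) for i in range(n)]


def ar1(phi, n, rng, burn_in=100):
    """Simulate an AR(1) path X_t = phi * X_{t-1} + eps_t, dropping a burn-in."""
    x, path = 0.0, []
    for _ in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path[burn_in:]


rng = random.Random(0)
data = [ar1(0.8, 1000, rng) for _ in range(3)] + [ar1(-0.8, 1000, rng) for _ in range(3)]
labels = cluster_offline(data, 2)
print(labels)
```

Because the two groups of AR(1) series have lag-1 autocovariances of opposite sign, the sketch should place the first three series in one cluster and the last three in the other. For a self-similar process with wide-sense stationary increments, such as fractional Brownian motion, one natural reading of the abstract is to apply the same routine to the increment series `x[t+1] - x[t]` rather than to the raw path; the paper's exact transformation may differ.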


Keywords

Cluster analysis · Wide-sense stationary ergodic processes · Covariance-based dissimilarity measure · Self-similar processes

Mathematics Subject Classification

62-07 60G10 62M10 



We gratefully thank the editor Dr. João Gama and three anonymous referees for their careful reading of our manuscript and their many insightful comments and suggestions.

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Mathematical Sciences, Claremont Graduate University, Claremont, USA
  2. Claremont, USA
  3. School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
  4. Institute of Mathematical Sciences and Drucker School of Management, Claremont Graduate University, Claremont, USA
  5. Claremont, USA
