On Clustering Financial Time Series: A Need for Distances Between Dependent Random Variables

  • Chapter
Computational Information Geometry

Part of the book series: Signals and Communication Technology (SCT)

Abstract

This article summarizes our work on the clustering of financial time series. It was written for a workshop on information geometry and its applications to image and signal processing. This workshop brought several experts in pure and applied mathematics together with applied researchers from medical imaging, radar signal processing and finance. The authors belong to the latter group. This document was written as a long introduction to the further development of geometric tools in financial applications such as risk or portfolio analysis. Indeed, risk and portfolio analysis essentially rely on covariance matrices. Besides the fact that the Gaussian assumption is known to be inaccurate, covariance matrices are difficult to estimate from empirical data. To filter noise from the empirical estimate, Mantegna proposed using hierarchical clustering. In this work, we first show that this procedure is statistically consistent. Then, we propose to use clustering for a much broader range of applications than filtering empirical covariance matrices built from the estimated correlation coefficients. To do so, we need distances between financial time series that incorporate all the available information in these cross-dependent random processes.
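The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the correlation distance d = sqrt(2(1 − ρ)) and single-linkage clustering, as in Mantegna (1999); the function name `filter_correlation` and the synthetic returns are hypothetical and only serve the illustration, and this is not claimed to be the exact procedure analyzed in the chapter.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def filter_correlation(returns):
    """Filter an empirical correlation matrix with single-linkage clustering.

    `returns` is a (T, N) array: T observations of N assets (hypothetical layout).
    Returns the empirical and the filtered correlation matrices.
    """
    corr = np.corrcoef(returns, rowvar=False)
    # Correlation distance d_ij = sqrt(2 (1 - rho_ij)); clip guards against
    # tiny negative values caused by floating-point error.
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # Single linkage on the condensed distance vector.
    tree = linkage(squareform(dist, checks=False), method="single")
    # Cophenetic distances form an ultrametric that replaces the noisy distances.
    ultrametric = squareform(cophenet(tree))
    # Map the ultrametric distances back to correlations.
    filtered = 1.0 - ultrametric ** 2 / 2.0
    return corr, filtered


if __name__ == "__main__":
    # Synthetic example: two groups of three assets, each driven by its own factor.
    rng = np.random.default_rng(0)
    factors = rng.standard_normal((500, 2))
    loadings = np.repeat(np.eye(2), 3, axis=0)  # shape (6, 2)
    returns = factors @ loadings.T + 0.5 * rng.standard_normal((500, 6))
    corr, filtered = filter_correlation(returns)
    print(np.round(filtered, 2))
```

With single linkage, the cophenetic distance between two assets never exceeds their original distance, so every filtered correlation is at least as large as its empirical counterpart. Other linkage rules would give different filtered matrices.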

References

  • Allez, R., Bun, J., & Bouchaud, J.-P. (2014). The eigenvectors of Gaussian matrices with an external source. arXiv:1412.7108.

  • Ao, S. I., Yip, K., Ng, M., Cheung, D., Fong, P.-Y., Melhado, I., et al. (2005). Clustag: Hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics, 21(8), 1735–1736.

  • Atkinson, C., & Mitchell, A. F. S. (1981). Rao’s distance measure. Sankhyā: The Indian Journal of Statistics, Series A (pp. 345–365).

  • Balakrishnan, S., Xu, M., Krishnamurthy, A., & Singh, A. (2011). Noise thresholds for spectral clustering. NIPS, 2011, 954–962.

  • Basalto, N., Bellotti, R., De Carlo, F., Facchi, P., Pantaleo, E., & Pascazio, S. (2007). Hausdorff clustering of financial time series. Physica A: Statistical Mechanics and its Applications, 379(2), 635–644.

  • Bien, J., & Tibshirani, R. (2011). Hierarchical clustering with prototypes via minimax linkage. Journal of the American Statistical Association, 106(495), 1075–1084.

  • Borysov, P., Hannig, J., & Marron, J. S. (2014). Asymptotics of hierarchical clustering for growing dimension. Journal of Multivariate Analysis, 124, 465–479.

  • Bun, J., Allez, R., Bouchaud, J.-P., & Potters, M. (2015). Rotational invariant estimator for general noisy matrices. arXiv:1502.06736.

  • Chen, Z., & Van Ness, J. W. (1996). Space-conserving agglomerative algorithms. Journal of Classification, 13(1), 157–168.

  • Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, 1(2), 223–236.

  • Costa, S. I. R., Santos, S. A., & Strapasson, J. E. (2014). Fisher information distance: A geometrical reading. Discrete Applied Mathematics, 197, 59–69.

  • Dasu, T., Swayne, D. F., & Poole, D. (2005). Grouping multivariate time series: A case study. In Proceedings of the IEEE Workshop on Temporal Data Mining: Algorithms, Theory and Applications, in conjunction with the Conference on Data Mining, Houston (pp. 25–32).

  • Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d’indépendance. Académie Royale de Belgique. Bulletin de la Classe des Sciences (5), 65(6), 274–292.

  • Deheuvels, P. (1981). An asymptotic decomposition for multivariate distribution-free tests of independence. Journal of Multivariate Analysis, 11(1), 102–113.

  • Donnat, P., Marti, G., & Very, P. (2016). Toward a generic representation of random variables for machine learning. Pattern Recognition Letters, 70, 24–31.

  • El Maliani, A. D., El Hassouni, M., Lasmar, N.-E., Berthoumieu, Y., & Aboutajdine, D. (2011). Color texture classification using Rao distance between multivariate copula based models. Computer analysis of images and patterns (pp. 498–505). Berlin: Springer.

  • Fredricks, G. A., & Nelsen, R. B. (2007). On the relationship between Spearman’s rho and Kendall’s tau for pairs of continuous random variables. Journal of Statistical Planning and Inference, 137(7), 2143–2150.

  • Genest, C., Quesada Molina, J. J., & Rodríguez Lallena, J. A. (1995). De l’impossibilité de construire des lois à marges multidimensionnelles données à partir de copules. Comptes rendus de l’Académie des sciences. Série 1, Mathématique, 320(6), 723–726.

  • Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76(374), 388–394.

  • Khaleghi, A., Ryabko, D., Mary, J., & Preux, P. (2012). Online clustering of processes. (pp. 601–609).

  • Killiches, M., Kraus, D., & Czado, C. (2015). Model distances for vine copulas in high dimensions with application to testing the simplifying assumption. arXiv:1510.03671.

  • Krishnamurthy, A., Balakrishnan, S., Xu, M., & Singh, A. (2012). Efficient active algorithms for hierarchical clustering. In International Conference on Machine Learning.

  • Laloux, L., Cizeau, P., Bouchaud, J.-P., & Potters, M. (1999). Noise dressing of financial correlation matrices. Physical Review Letters, 83(7), 1467.

  • Laloux, L., Cizeau, P., Potters, M., & Bouchaud, J.-P. (2000). Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03), 391–397.

  • Lange, T., Roth, V., Braun, M. L., & Buhmann, J. M. (2004). Stability-based validation of clustering solutions. Neural Computation, 16(6), 1299–1323.

  • Lemieux, V., Rahmdel, P. S., Walker, R., Wong, B. L., & Flood, M. (2014). Clustering techniques and their effect on portfolio formation and risk analysis (pp. 1–6).

  • Li, H., Scarsini, M., & Shaked, M. (1996). Linkages: A tool for the construction of multivariate distributions with given nonoverlapping multivariate marginals. Journal of Multivariate Analysis, 56(1), 20–41.

  • Mantegna, R. N. (1999). Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems, 11(1), 193–197.

  • Mantegna, R. N., & Stanley, H. E. (1999). Introduction to econophysics: Correlations and complexity in finance. Cambridge: Cambridge University Press.

  • Marti, G., Nielsen, F., & Donnat, P. (2016). Optimal copula transport for clustering multivariate time series. IEEE ICASSP.

  • Marti, G., Very, P., Donnat, P., & Nielsen, F. (2015). A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series. IEEE ICMLA.

  • Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473.

  • Murtagh, F., & Contreras, P. (2011). Methods of hierarchical clustering. arXiv:1105.0121.

  • Pantaleo, E., Tumminello, M., Lillo, F., & Mantegna, R. N. (2011). When do improved covariance matrix estimators enhance portfolio optimization? An empirical comparative study of nine estimators. Quantitative Finance, 11(7), 1067–1080.

  • Plerou, V., Gopikrishnan, P., Rosenow, B., Nunes Amaral, L. A., Guhr, T., & Stanley, H. E. (2002). Random matrix approach to cross correlations in financial data. Physical Review E, 65(6), 066126.

  • Pollard, D. (1981). Strong consistency of \(k\)-means clustering. The Annals of Statistics, 9(1), 135–140.

  • Potters, M., Bouchaud, J.-P., & Laloux, L. (2005). Financial applications of random matrix theory: Old laces and new pieces. arXiv:physics/0507111.

  • Ryabko, D. (2010a). Clustering processes (pp. 919–926).

  • Ryabko, D. (2010b). Clustering processes. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010) (pp. 919–926). Haifa, Israel.

  • Shamir, O., & Tishby, N. (2007). Cluster stability for finite samples. In NIPS.

  • Shamir, O., & Tishby, N. (2008). Model selection and stability in k-means clustering. In Learning theory.

  • Singhal, A., & Seborg, D. E. (2002). Clustering of multivariate time-series data. In American Control Conference, 2002. Proceedings of the 2002 (Vol 5, pp. 3931–3936). IEEE.

  • Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Université Paris, 8.

  • Smola, A., Gretton, A., Song, L., & Schölkopf, B. (2007). A Hilbert space embedding for distributions. Algorithmic learning theory (pp. 13–31). Berlin: Springer.

  • Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Lanckriet, G. R. G., & Schölkopf, B. (2009). Kernel choice and classifiability for RKHS embeddings of probability distributions. In NIPS (pp. 1750–1758).

  • Takatsu, A. (2011). Wasserstein geometry of Gaussian measures. Osaka Journal of Mathematics, 48(4), 1005–1026.

  • Terada, Y. (2013). Strong consistency of factorial k-means clustering. Annals of the Institute of Statistical Mathematics, 67(2), 335–357.

  • Terada, Y. (2014). Strong consistency of reduced k-means clustering. Scandinavian Journal of Statistics, 41(4), 913–931.

  • Tola, V., Lillo, F., Gallegati, M., & Mantegna, R. N. (2008). Cluster analysis for portfolio optimization. Journal of Economic Dynamics and Control, 32(1), 235–258.

  • Tumminello, M., Lillo, F., & Mantegna, R. N. (2007). Shrinkage and spectral filtering of correlation matrices: A comparison via the Kullback-Leibler distance. arXiv:0710.0576.

  • Von Luxburg, U., Belkin, M., & Bousquet, O. (2008). Consistency of spectral clustering. The Annals of Statistics, 36, 555–586.

  • Yang, K., & Shahabi, C. (2004). A PCA-based similarity measure for multivariate time series. In Proceedings of the 2nd ACM International Workshop on Multimedia Databases (pp. 65–74). ACM.

Acknowledgements

Gautier Marti wants to thank Prof. Eguchi for helpful and encouraging remarks, Prof. Brigo for pointing us to interesting research directions on dependence, copulas and optimal transport, and Frédéric Barbaresco for sending us relevant literature and historical references, and for interesting discussions. We also want to thank our colleagues at Hellebore Capital, and Philippe Very for his friendly feedback. Finally, the authors thank the organizers of the workshop “Computational information geometry for image and signal processing” at the International Centre for Mathematical Sciences, Edinburgh, UK, for the invitation.

Corresponding author

Correspondence to Gautier Marti.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Marti, G., Nielsen, F., Donnat, P., Andler, S. (2017). On Clustering Financial Time Series: A Need for Distances Between Dependent Random Variables. In: Nielsen, F., Critchley, F., Dodson, C. (eds) Computational Information Geometry. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-47058-0_8

  • DOI: https://doi.org/10.1007/978-3-319-47058-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47056-6

  • Online ISBN: 978-3-319-47058-0

  • eBook Packages: Engineering, Engineering (R0)
