Cluster Analysis for Investment Funds Portfolio Optimisation: A Symbolic Data Approach

Financial Risk Management and Modeling

Part of the book series: Risk, Systems and Decisions (RSD)

Abstract

In risk management and portfolio optimization, it is important to know which assets move individually and which move in groups in order to build a diversified portfolio. The statistical uncertainty of the correlation matrix is the main problem in the optimization of a financial portfolio. Indeed, estimates of correlations are often noisy, particularly in stress periods, and unreliable because estimation horizons are always finite. Another drawback of the classical estimation of correlations is that they are estimated on historical data, and prediction based on past data is very difficult, since finding elementary structures in data that remain valid and persistent in the future is not easy. Markowitz portfolio optimization approaches suffer from these estimation errors. From the perspective of machine learning, new approaches have been proposed in the applied finance literature. Among these techniques, clustering has been considered a significant method for capturing the natural structure of data. The objective of this research is to use data mining approaches to identify the best clustering indicators for building optimal portfolios. Clustering is an empirical procedure for grouping financial assets into homogeneous groups. The aim of cluster analysis is to maximize similarity within groups of assets and minimize similarity between groups. The similarities and dissimilarities are based on the attribute values and frequently involve distance measures. Different techniques are used for clustering, including partitioning-based, density-based, model-based, and grid-based techniques. In this research, we consider a symbolic approach based on histogram-valued data and clusters as a new approach to investment fund portfolio optimization. First, it aggregates individual-level data into group-level data summarized by symbols. In our case, the symbols are histogram-valued data that take into account variability inside groups. Second, for partitioning, we use dynamical clustering, an extension of K-means in which the means are replaced by other kinds of centers called ‘kernels’, which are distributions in our case. After clustering, stock samples are selected from these clusters to create optimal funds-of-funds portfolios that carry the lowest risk, measured in terms of Conditional Value at Risk, for a given return. The funds’ portfolios are compared over the period 2008–2016 using the conditional Sharpe ratio, and the year 2017 is used to validate our results out of sample. We show that the use of symbolic data clustering algorithms can improve the reliability of the portfolio in terms of risk-adjusted performance.
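
The two risk quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names `cvar` and `conditional_sharpe`, the historical (empirical) estimator, the confidence level, and the convention of dividing the mean excess return by the CVaR of losses are all assumptions made here for illustration.

```python
import numpy as np

def cvar(returns, alpha=0.95):
    """Historical Conditional Value at Risk of losses at level alpha.

    Assumption: losses are the negated returns, and CVaR is the mean
    of the losses at or beyond the empirical Value at Risk.
    """
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)   # empirical Value at Risk
    tail = losses[losses >= var]       # losses in the tail beyond VaR
    return tail.mean()

def conditional_sharpe(returns, risk_free=0.0, alpha=0.95):
    """Mean excess return per unit of CVaR (one common convention)."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return excess.mean() / cvar(returns, alpha)

# Hypothetical return sample, used only to exercise the functions.
sample = [-0.10, -0.05, 0.0, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
sample_cvar = cvar(sample, alpha=0.8)
sample_ratio = conditional_sharpe(sample, alpha=0.8)
```

A higher ratio indicates more return per unit of tail risk, which is the sense in which portfolios would be ranked under this convention.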

Author information

Corresponding author

Correspondence to Virginie Terraza.

Appendix

  • Period 2010–2012

Fig. A1 Intra-class inertia for a number of classes between 2 and 8

Table A1 The symbolic data table for the four classes

Table A2 Funds by cluster

  • Period 2013–2014

Fig. A2 Intra-class inertia for a number of classes between 2 and 8

Table A3 The symbolic data table for the three classes

Table A4 Funds by cluster

  • Period 2015–2016

Fig. A3 Intra-class inertia for a number of classes between 2 and 8

Table A5 The symbolic data table for the five classes

Table A6 Funds by cluster
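
The intra-class inertia curves captioned above (for 2 to 8 classes) can be approximated with the following Python sketch. It rests on assumptions that this page does not confirm: each fund's return histogram is summarized by its quantiles at fixed levels, squared Euclidean distance between quantile vectors stands in for a squared Wasserstein-type distance between one-dimensional distributions, and the cluster "kernel" is the average quantile vector. The helper names `quantile_symbols` and `dynamic_clustering` are hypothetical, and the returns are synthetic.

```python
import numpy as np

def quantile_symbols(returns_by_fund, levels):
    """Summarize each fund's return distribution by its quantiles."""
    return np.array([np.quantile(r, levels) for r in returns_by_fund])

def dynamic_clustering(X, k, n_iter=50, seed=0):
    """K-means-style alternation between assignment and kernel update.

    Returns (labels, centers, intra-class inertia).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest kernel in squared Euclidean distance.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Representation step: kernel = mean quantile vector of the class.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    inertia = d[np.arange(len(X)), labels].sum()
    return labels, centers, inertia

# Synthetic data: 30 hypothetical funds with three volatility regimes.
rng = np.random.default_rng(42)
vols = [0.005, 0.01, 0.02]
funds = [rng.normal(0.0003, vols[i % 3], size=250) for i in range(30)]
levels = np.linspace(0.05, 0.95, 10)
X = quantile_symbols(funds, levels)

# Intra-class inertia for a number of classes between 2 and 8.
inertia_by_k = {k: dynamic_clustering(X, k)[2] for k in range(2, 9)}
```

Plotting `inertia_by_k` against the number of classes and looking for the point where the decrease flattens is one common way to pick the number of classes; whether the chapter uses exactly this criterion is not stated here.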

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Terraza, V., Toque, C. (2021). Cluster Analysis for Investment Funds Portfolio Optimisation: A Symbolic Data Approach. In: Zopounidis, C., Benkraiem, R., Kalaitzoglou, I. (eds) Financial Risk Management and Modeling. Risk, Systems and Decisions. Springer, Cham. https://doi.org/10.1007/978-3-030-66691-0_5
