Abstract
In risk management and portfolio optimization, building a diversified portfolio requires knowing which assets move individually and which move in groups. The statistical uncertainty of the correlation matrix is the main problem in the optimization of a financial portfolio. Indeed, estimates of correlations are often noisy, particularly in periods of stress, and unreliable because estimation horizons are always finite. Another drawback of the classical estimation of correlations is that it relies on historical data, and prediction from past data is difficult because structures that remain valid and persistent in the future are hard to find. Markowitz portfolio optimization approaches suffer from these estimation errors. New approaches drawn from machine learning have been proposed in the applied finance literature. Among these techniques, clustering is considered an important method for capturing the natural structure of data. The objective of this research is to use data mining approaches to identify the best clustering indicators for building optimal portfolios. Clustering is an empirical procedure for grouping financial assets into homogeneous groups. The aim of cluster analysis is to maximize similarity within groups of assets and minimize similarity between groups. Similarities and dissimilarities are based on attribute values and frequently involve distance measures. Clustering techniques include partitioning-based, density-based, model-based, and grid-based techniques. In this research we consider a symbolic approach, based on histogram-valued data and clustering, as a new approach to investment funds portfolio optimization. First, individual-level data are aggregated into group-level data summarized by symbols. In our case, the symbols are histogram-valued data, which take into account the variability inside groups.
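As an illustration of the aggregation step, the sketch below summarizes each fund's individual returns as a histogram of relative frequencies over a shared set of bins. It is a minimal example under stated assumptions, not the chapter's implementation: the function name `histogram_symbol`, the toy returns, and the bin edges are all hypothetical.

```python
import numpy as np

def histogram_symbol(returns, bin_edges):
    """Summarise one group's returns as relative frequencies over shared bins."""
    counts, _ = np.histogram(returns, bins=bin_edges)
    total = counts.sum()
    if total == 0:
        raise ValueError("no observations fall inside the bin range")
    return counts / total

# Hypothetical toy data: returns (in percent) for two funds.
fund_returns = {
    "fund_a": np.array([-1.5, -0.5, 0.2, 0.4, 0.6, 1.5]),
    "fund_b": np.array([-0.2, -0.1, 0.0, 0.1, 0.1, 0.3]),
}

# Assumed common bin edges, so that the histograms are directly comparable.
bin_edges = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# One histogram-valued symbol per fund: the spread of each fund's returns
# is kept, rather than being collapsed into a single mean.
symbols = {name: histogram_symbol(r, bin_edges) for name, r in fund_returns.items()}
```

Both toy funds have a mean return close to zero, yet their histograms differ markedly: this within-group variability is what a histogram-valued symbol retains.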
Second, for partitioning, we use dynamical clustering, an extension of K-means in which the means are replaced by other kinds of centers called ‘kernels’, which in our case are distributions. After clustering, stock samples are selected from these clusters to create optimal funds-of-funds portfolios that have the lowest risk, measured by Conditional Value at Risk, for a given return. The funds' portfolios are compared over the period 2008–2016 using the conditional Sharpe ratio, and the year 2017 is used to validate our results out of sample. We show that symbolic data clustering algorithms can improve the reliability of the portfolio in terms of risk-adjusted performance.
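The two risk quantities named above can be sketched as follows. This is an illustrative empirical estimator, not the chapter's procedure: it assumes that Conditional Value at Risk is estimated as the average loss at or beyond the empirical Value at Risk, and that the conditional Sharpe ratio is the mean excess return divided by that CVaR. The function names, the confidence level, and the sample returns are hypothetical, and the portfolio optimization step itself is not shown.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.95):
    """Average loss at or beyond the empirical Value at Risk at level alpha."""
    losses = -np.asarray(returns, dtype=float)   # a loss is a negative return
    var = np.quantile(losses, alpha)             # empirical Value at Risk
    tail = losses[losses >= var]                 # worst-case observations
    return tail.mean()

def conditional_sharpe(returns, risk_free=0.0, alpha=0.95):
    """Mean excess return per unit of CVaR (assumed definition)."""
    cvar = empirical_cvar(returns, alpha)
    if cvar <= 0:
        raise ValueError("CVaR must be positive for the ratio to be meaningful")
    return (np.mean(returns) - risk_free) / cvar

# Hypothetical portfolio returns over ten periods.
portfolio_returns = np.array(
    [-0.10, -0.06, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
)

cvar_80 = empirical_cvar(portfolio_returns, alpha=0.8)
ratio_80 = conditional_sharpe(portfolio_returns, alpha=0.8)
```

With only ten observations the tail contains very few points, so a low confidence level is used here purely to keep the toy example readable; in practice the tail estimate needs a much longer return history.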
Appendix
- Period 2010–2012
- Period 2013–2014
- Period 2015–2016
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Terraza, V., Toque, C. (2021). Cluster Analysis for Investment Funds Portfolio Optimisation: A Symbolic Data Approach. In: Zopounidis, C., Benkraiem, R., Kalaitzoglou, I. (eds) Financial Risk Management and Modeling. Risk, Systems and Decisions. Springer, Cham. https://doi.org/10.1007/978-3-030-66691-0_5
DOI: https://doi.org/10.1007/978-3-030-66691-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66690-3
Online ISBN: 978-3-030-66691-0
eBook Packages: Business and Management (R0)