Abstract
In this paper, we propose a new clustering procedure for financial instruments. Unlike the prevalent clustering procedures based on time series analysis, our procedure employs the jump tail dependence coefficient as the dissimilarity measure, assuming that the observed logarithms of the prices/indices of the financial instruments are embedded into multidimensional Lévy processes. The efficiency of the proposed clustering procedure is tested in a simulation study. Finally, using real data on country indices, we illustrate that our clustering procedure could help investors avoid potentially huge losses when constructing portfolios.
Notes
The data set consists of the country indices of the following 14 countries: Australia, Belgium, Canada, France, Germany, Japan, China, India, Korea, Malaysia, Mexico, Russia, Brazil, Chile.
References
Baragona R (2001) A simulation study on clustering time series with metaheuristic methods. Quaderni di Statistica 3:1–26
Billio M, Caporin M (2009) A generalized dynamic conditional correlation model for portfolio risk evaluation. Math Comput Simul 79(8):2566–2578. https://doi.org/10.1016/j.matcom.2008.12.011, http://linkinghub.elsevier.com/retrieve/pii/S0378475408004138
Billio M, Caporin M, Gobbo M (2006) Flexible dynamic conditional correlation multivariate GARCH models for asset allocation. Appl Financ Econ Lett 2(2):123–130. https://doi.org/10.1080/17446540500428843
Brockwell PJ, Davis RA (eds) (2002) Introduction to time series and forecasting. Springer texts in statistics. Springer, New York. https://doi.org/10.1007/b97391
Calinski T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat Theory Methods 3(1):1–27. https://doi.org/10.1080/03610927408827101
Cont R, Tankov P (2003) Financial modelling with jump processes, vol 2. Financial mathematics series. Chapman and Hall/CRC, Boca Raton. https://doi.org/10.1201/9780203485217
De Luca G, Zuccolotto P (2011) A tail dependence-based dissimilarity measure for financial time series clustering. Adv Data Anal Classif 5(4):323–340. https://doi.org/10.1007/s11634-011-0098-3
Dobrić J, Schmid F (2005) Nonparametric estimation of the lower tail dependence \(\lambda _L\) in bivariate copulas. J Appl Stat 32(4):387–407. https://doi.org/10.1080/02664760500079217
Durante F, Jaworski P (2010) Spatial contagion between financial markets: a copula-based approach. Appl Stoch Models Bus Ind 26(5):551–564. https://doi.org/10.1002/asmb.799
Durante F, Pappadà R, Torelli N (2014) Clustering of financial time series in risky scenarios. Adv Data Anal Classif 8(4):359–376. https://doi.org/10.1007/s11634-013-0160-4
Durante F, Pappadà R, Torelli N (2015) Clustering of time series via non-parametric tail dependence estimation. Stat Pap 56(3):701–721. https://doi.org/10.1007/s00362-014-0605-7
Embrechts M, Arciniegas F, Ozdemir M, Momma M (2001) Scientific data mining with StripMiner™. In: SMCia/01: Proceedings of the 2001 IEEE mountain workshop on soft computing in industrial applications (Cat. No.01EX504), IEEE, pp 13–16. https://doi.org/10.1109/SMCIA.2001.936721, http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=936721
Engle R (2002) Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J Bus Econ Stat 20(3):339–350
Engle R, Sheppard K (2001) Theoretical and empirical properties of dynamic conditional correlation multivariate GARCH. Tech. Rep., National Bureau of Economic Research, Cambridge, MA, https://doi.org/10.3386/w8554, http://www.nber.org/papers/w8554.pdf
Grothe O (2013) Jump tail dependence in Lévy copula models. Extremes 16(3):303–324
Grothe O, Hofert M (2015) Construction and sampling of Archimedean and nested Archimedean Lévy copulas. J Multivar Anal 138:182–198
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Jing BY, Kong XB, Liu Z (2012) Modeling high-frequency financial data by pure jump processes. Ann Stat 40:759–784
Kallsen J, Tankov P (2006) Characterization of dependence of multidimensional Lévy processes using Lévy copulas. J Multivar Anal 97(7):1551–1572. https://doi.org/10.1016/j.jmva.2005.11.001, http://linkinghub.elsevier.com/retrieve/pii/S0047259X05002022
Kaufman L, Rousseeuw PJ (eds) (1990) Finding groups in data: an introduction to cluster analysis. Wiley Series in Probability and Statistics. Wiley, Hoboken. https://doi.org/10.1002/9780470316801
Kyprianou A (2006) Introductory lectures on fluctuations of Lévy processes with applications. Springer, Berlin
Madan DB, Carr PP, Chang EC (1998) The variance gamma process and option pricing. Eur Finance Rev 2(1):79–105
Meilă M, Pentney W (2007) Clustering by weighted cuts in directed graphs. Society for Industrial and Applied Mathematics, Philadelphia, pp 135–144
Nelsen RB (2006) An introduction to Copulas. Springer series in statistics. Springer, New York. https://doi.org/10.1007/0-387-28678-0
Nelsen RB (2007) An introduction to Copulas. Springer, Berlin
Poirot J, Tankov P (2007) Monte Carlo option pricing for tempered stable (CGMY) processes. Asia-Pacific Financ Mark 13(4):327–344. https://doi.org/10.1007/s10690-007-9048-7
Schmidt R, Stadtmüller U (2006) Non-parametric estimation of tail dependence. Scand J Stat 33(2):307–335. https://doi.org/10.1111/j.1467-9469.2005.00483.x
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905. https://doi.org/10.1109/34.868688
Sklar A (1959) Fonctions de répartition à \(n\) dimensions et leurs marges. Publications de l’Institut de Statistique de L’Université de Paris 8:229–231
Tankov P (2003a) Dependence structure of spectrally positive multidimensional Lévy processes. Unpublished manuscript
Tankov P (2003b) Financial modelling with jump processes, vol 2. CRC Press, Boca Raton
Tankov P (2006) Simulation and option pricing in Lévy copula models. Mathematical Modelling of Financial Derivatives, IMA Volumes in Mathematics and Applications, Springer
Tankov P (2016) Lévy copulas: review of recent results. In: The fascination of probability, statistics and their applications, Springer, pp 127–151
Wagner S, Wagner D (2007) Comparing clusterings: an overview. Universität Karlsruhe, Fakultät für Informatik Karlsruhe
Acknowledgements
The authors are indebted to two anonymous reviewers for comments and suggestions that improved the paper. Jiang Wu is grateful for the support from the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project No. 10YJC790280).
Appendices
A The simulation procedure of multidimensional Lévy processes
The simulation procedure of multidimensional Lévy processes heavily relies on one key lemma and the improved algorithm given by Tankov (2006b).
Lemma 1
(Tankov (2006b)) Let \(F(u_1,\dots ,u_d)\) be a d-dimensional Lévy copula and
for nonempty \(I\subseteq \{1,2,\dots ,d\}\). Define \(F_{u_1}(u_2,\dots ,u_d)\) as
where \(\zeta \) has the same sign as \(u_1\). There exists a set \(N\subseteq \mathbb {R}\) of zero Lebesgue measure such that, for \(u_1\in \mathbb {R}\backslash N\), \(F_{u_1}(u_2,\dots ,u_d)\) is a probability distribution function of \((u_2,\dots ,u_d)\) conditional on the first component being equal to \(u_1\).
Theorem 2
(Tankov (2006b)) Let \(v(\cdot )\) be a Lévy measure on \(\mathbb {R}^d\) satisfying
with marginal tail integrals \(U_i\), \(i=1,\dots ,d\) and Lévy copula \(F(u_1,\dots ,u_d)\) such that (10) is satisfied, and \(F_{u_i}(\cdot )\) be the corresponding conditional distributions.
Fix a truncation level \(\tau \). Let \(\{V_k\}_{k\ge 1}\) and \(\{W_k^i\}_{k\ge 1}^{1\le i\le d}\) be independent sequences of i.i.d. standard uniform random variables. Introduce \(d^2\) random sequences \(\{\gamma _{k}^{ij}\}_{k\ge 1}^{1\le i,j\le d}\), independent of \(\{V_k\}_{k\ge 1}\) and \(\{W_k^i\}_{k\ge 1}^{1\le i\le d}\), such that
- For \(i=1,\dots ,d\), \(\sum _{k=1}^{\infty }\varDelta _{\{\gamma _{k}^{ii}\}}\) are independent Poisson random measures on \(\mathbb {R}\) with Lebesgue intensity measures.
- Conditional on \(\gamma _k^{ii}\), the random vector \((\gamma _k^{i1},\dots ,\gamma _k^{i,i-1},\gamma _k^{i,i+1},\dots ,\gamma _k^{id})\) is distributed on \(\mathbb {R}^{d-1}\) with law \(F_{\gamma _k^{ii}}(\cdot )\).
For each \(k\ge 1\) and each \(i=1,\dots ,d\), let \(n_k^i=\#\{j=1,\dots ,d: |\gamma _k^{ij}|\le \tau \}\). Then the process \(\{Z_t^{\tau }\}_{0\le t\le 1}\) with components
is a Lévy process on [0, 1] with characteristic function
where \(\langle \cdot ,\cdot \rangle \) denotes the inner product and
The above theorem provides a robust algorithm for simulating multidimensional Lévy processes given the Lévy copula; however, it still requires an efficient algorithm to sample the random vector \((\gamma _k^{i1},\dots ,\gamma _k^{i,i-1},\gamma _k^{i,i+1},\dots ,\gamma _k^{id})\) given the cumulative distribution function \(F_{\gamma _k^{ii}}(\cdot )\). While it remains unknown whether such an algorithm exists in general, for the particular case of the Lévy-Clayton copula considered in our simulation study we found a method that serves this purpose well.
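As a rough illustration of the first ingredient of Theorem 2, the following Python sketch draws the atoms of a Poisson random measure on \(\mathbb {R}\) with Lebesgue intensity, restricted to \([-\tau ,\tau ]\). It assumes that only atoms within the truncation level are needed; the function name `poisson_points` is hypothetical and not taken from the paper.

```python
import numpy as np

def poisson_points(tau, rng):
    """Atoms of a unit-rate Poisson random measure on R that fall in [-tau, tau]."""
    # The number of atoms in an interval of length 2*tau is Poisson(2*tau).
    n = rng.poisson(2.0 * tau)
    # Given the count, the atoms are i.i.d. uniform on the interval.
    return np.sort(rng.uniform(-tau, tau, size=n))

rng = np.random.default_rng(0)
gamma_ii = poisson_points(50.0, rng)  # candidate values for gamma_k^{ii}
```

Each returned value would then serve as a conditioning value \(\gamma _k^{ii}\) for the conditional law \(F_{\gamma _k^{ii}}(\cdot )\).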
For the two-directed Lévy-Clayton Copula F defined in (9), we have
whenever \(u_1=0\). Hence by denoting \(I_{d-1}:=(-\infty ,u_2]\times \cdots \times (-\infty ,u_d]\in \mathcal {B}(\mathbb {R}^{d-1})\), we have
Thus, given the signs of \(u_1,u_2,\ldots ,u_d\), interchanging the partial differentiation and the integrals yields
where
Notice that according to Lemma 1, \(F_{u_1}\) is a probability distribution function. In fact, a random vector having the CDF \(F_{u_1}\) is a mixture of \(2^{d-1}\) random vectors with the following joint CDF:
for all \(i=2,\ldots ,d\). Hence to construct such a random vector, consider a random sign vector \(\begin{pmatrix} S_2&\cdots&S_d\end{pmatrix}^{\intercal }\in \{-1,1\}^{d-1}\) satisfying
for arbitrary \(s_2,\ldots ,s_d\in \{-1,1\}\). Thus, by generating \(\begin{pmatrix} Y_2&\cdots&Y_d\end{pmatrix}^{\intercal }\) from \(G_{u_1}\) independently of \(\begin{pmatrix} S_2&\cdots&S_d\end{pmatrix}^{\intercal }\), we obtain a random vector \(\begin{pmatrix} S_2Y_2&\cdots&S_dY_d\end{pmatrix}^{\intercal }\) with joint CDF \(F_{u_1}\).
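The sign-mixture construction can be sketched in Python as follows. This is only a minimal illustration: the probabilities of the sign patterns are passed in by the caller (`sign_probs`, a hypothetical argument), and the magnitude sampler `sample_magnitudes` is a placeholder standing in for a draw from \(G_{u_1}\). The uniform sign weights and exponential magnitudes in the usage lines are illustrative assumptions, not the distributions used in the paper.

```python
import itertools
import numpy as np

def sample_signed_vector(sample_magnitudes, sign_probs, dim, rng):
    """Draw (S_2 Y_2, ..., S_d Y_d): a random sign pattern times independent magnitudes."""
    # All 2^dim sign patterns in {-1, 1}^dim (dim plays the role of d - 1).
    patterns = np.array(list(itertools.product([-1, 1], repeat=dim)))
    # sign_probs[j] is the (assumed, user-supplied) probability of patterns[j].
    idx = rng.choice(len(patterns), p=sign_probs)
    # Magnitudes are drawn independently of the signs.
    y = np.asarray(sample_magnitudes(rng), dtype=float)
    return patterns[idx] * y

rng = np.random.default_rng(1)
# Placeholder inputs for illustration only.
probs = np.full(4, 0.25)
z = sample_signed_vector(lambda r: r.exponential(size=2), probs, 2, rng)
```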
The remaining key step in generating random vectors from \(F_{u_1}\) is to generate random vectors from \(G_{u_1}\). Because \(|u_1|\) acts as a scale parameter in \(G_{u_1}\), it suffices to consider the cumulative distribution function \(G_1\). Since the copula uniquely associated with \(G_1\) is the Clayton copula
a random vector from \(G_1\) could be written as \(\begin{pmatrix} G^{-1}_{1,2}(U_2)&\cdots&G^{-1}_{1,d}(U_d)\end{pmatrix}^{\intercal }\) where \(\begin{pmatrix} U_2&\cdots&U_d\end{pmatrix}^{\intercal }\) is generated from the Clayton copula C and
for \(i=2,\ldots ,d\) are the marginal distributions of \(G_1\) (all of these marginal distributions are Dagum distributions of type I). Therefore, \(\begin{pmatrix} Y_2&\cdots&Y_d\end{pmatrix}^{\intercal } = \begin{pmatrix} |u_1|G^{-1}_{1,2}(U_2)&\cdots&|u_1|G^{-1}_{1,d}(U_d)\end{pmatrix}^{\intercal }\) is a random vector from \(G_{u_1}\). Consequently, the simulation algorithm could be summarized as follows:
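A minimal Python sketch of the magnitude step is given below, under stated assumptions. The Clayton copula is sampled with the standard Marshall-Olkin (gamma frailty) method for a parameter `theta > 0`, and the marginals are inverted with the generic Dagum type I quantile function for the CDF \((1+(x/b)^{-a})^{-p}\). The parameters `theta`, `a`, `b`, `p` and all function names are hypothetical; their relation to the Lévy-Clayton copula parameters must be taken from the marginal distributions \(G_{1,i}\) defined above.

```python
import numpy as np

def sample_clayton(dim, theta, rng):
    """One draw from a dim-dimensional Clayton copula (Marshall-Olkin method)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0)  # gamma frailty variable
    e = rng.exponential(size=dim)                # i.i.d. unit exponentials
    return (1.0 + e / v) ** (-1.0 / theta)

def dagum_ppf(u, a, b, p):
    """Quantile function of the Dagum type I CDF (1 + (x/b)^(-a))^(-p), x > 0."""
    return b * (u ** (-1.0 / p) - 1.0) ** (-1.0 / a)

def sample_magnitudes(u1, dim, theta, a, b, p, rng):
    """Draw (Y_2, ..., Y_d): Clayton-dependent Dagum variates scaled by |u1|."""
    u = sample_clayton(dim, theta, rng)
    return abs(u1) * dagum_ppf(u, a, b, p)

rng = np.random.default_rng(2)
# Illustrative parameter values only.
y = sample_magnitudes(u1=-3.0, dim=2, theta=1.5, a=2.0, b=1.0, p=1.5, rng=rng)
```

Multiplying `y` componentwise by an independently drawn sign vector then gives a draw from \(F_{u_1}\), as described above.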
B Details of the clustering algorithm
C Descriptive statistics of the log-returns of the dataset
See Table 9.
Yang, C., Jiang, W., Wu, J. et al. Clustering of financial instruments using jump tail dependence coefficient. Stat Methods Appl 27, 491–513 (2018). https://doi.org/10.1007/s10260-017-0411-1