Dynamics of HDI Index: Temporal Dependence Based on D-vine Copulas Model for Three-Way Data

The HDI (Human Development Index) is a widely used index based on the average of measures of health, education, and income. It assesses the progress of countries worldwide. The publicly available data set associated with the HDI can be seen as a table with 3 dimensions (three-way table): countries, indexes regarding progress, and years (from 2010 to 2018). Thus, modeling the serial dependence structure of this type of intricate three-way tables is a challenge. D-vine copulas are a special class of multivariate copulas that are particularly suited for modeling serial dependence. This work aims to assess the evolution of the dependence relationship between the indexes of the HDI data set over time through D-vine copulas, which has not been fully used before in the area, as far as we are concerned. We tested our approach to European and African countries and compare their results.


Introduction
The human development approach is based on expanding the richness of human life, rather than simply the richness of the economy in which human beings live. It is an approach that is focused on people and their opportunities and choices (see http:// hdr. undp. org/ en/ human dev). There are well-known indexes to measure human development in terms of economic factors such as the Gross Domestic Product (GPD) per capita, the index of innovation and ICT index. However, it is more difficult to capture the human development in terms of noneconomic factors (McGillivray 2005). There have been several efforts to describe human welfare based on non-economic facts. The probably more used and accepted is the Human Development Index (HDI). It was introduced in the first Human Development Report, presented by the United Nations Development Program in 1990 (Undp 1990). The use of the HDI and its association with other index has been largely studied. For instance, Alijanzadeh, Asefzadeh, and Moosaniaye (2016) studied the association between the HDI and the infant mortality rate, Almasi-Hashiani et al. (2016) analyzed the association between the HDI and mortality rates in other ages and Yakunina and Bychkov (2015) computed the correlation among the HDI and other indexes such as the index of innovation, ICT index, Gross national income and life expectancy at birth (Alijanzadeh et al. 2016).
The HDI is based on the average over three main pillars: health, education, and income. Health is measured by life expectancy. Education is based on mean years of schooling for adults aged 25 and expected years of schooling for children of school age. Finally, income is based on the GDP per capita (see please here for technical notes of the computation of the HDI: http:// hdr. undp. org/ sites/ defau lt/ files/ hdr20 20_ techn ical_ notes. pdf). There is an assumption of minimum values for each of the variables. Thus, 0 years is assumed for education, 20 years for life expectancy, and 163 US dollars for income. The HDI provides a measure ranging between 0 and 1. Thus, the higher is the score the more is human development. For instance, Germany and New Zealand scored the highest on the HDI in 2018, with 0.946 and 0.923, respectively. On the other hand, Niger and Eritrea had the lowest HDI with 0.247 and 0.269, respectively.
The HDI data set (we will refer to the data set in this way in this article from now on) can be seen as a table with 3 dimensions (three-way table, see Fig. 1). A first dimension (rows) is described by 188 countries. The second dimension (columns) would be the 4 indexes: the HDI, life expectancy, income and education. The third dimension (layers) would be the years ranging from 2010 to 2018. The first two dimensions, i.e., countries and indexes are labeled as the domain indicators. Additionally, we cannot assume normality in the variables regarding the indexes. Moreover, the variable time (years) is involved in the data and, therefore, we deal with non-normal multivariate longitudinal data, which is quite common in this type of framework.
As we described above, the HDI index is formulated as a linear combination of the other three indexes. Therefore, one of the main challenges of working with this type of intricate 3-way data sets is dealing with modeling a very complex dependence structure. Modeling dependence structures has become more and more popular in all areas of applications over the last two decades. Particularly, copulas have gained large popularity since they allow to model marginal distributions and the dependence structure separately (Sklar 1959). Consequently, copulas were also applied for modeling longitudinal data. This approach was first used by Lambert and Vandenhende (2002) who developed a model for longitudinal data, where the dependence is described by the copula, although their work only applied the Gaussian copula. Shen and Weissfeld (2006) modelled serial dependence for continuous  In the  HDI data set, N represents countries, R represents indexes, and  T represents years, from 2010  to 2018 longitudinal data with a non-ignorable non-monotone missing-data process using a Gaussian copula. Other examples are Lindsey (1999), who used the Gaussian copula among other multivariate models with correlation matrices for non-linear repeated measurements. Further, Sun et al. (2008) argued that elliptical copulas are better suited than Archimedean copulas (Joe 1997;Nelsen 2007) for modeling serial dependence in the context of longitudinal data.
D-vine copulas are a special class of multivariate copulas that are particularly suited for modeling serial dependence (Bedford and Cooke 2002;Aas et al. 2009). Smith et al. (2010) used them to model longitudinal data in a Bayesian approach. Multivariate time series are considered in Smith (2015) and Nai Ruscone and Osmetti (2016). In Joe (2014) (Chapter 7.5) discrete longitudinal count data are modeled using D-vines. Shi and Yang (2018) used a mixed D-vine to model semi-continuous longitudinal claims. The main advantage of this approach is to be able to overcome the problem of the multivariate copulas (Nelsen 2007), which suffer from rather inflexible structures in high dimensions. Therefore, it is a reasonable approach to model our data.
The main aim of this article is to assess the evolution of the dependence relationship between the domain indicators (i.e., countries, indexes) over time (years) through a copula method. Thus, we present a model based on D-vine for longitudinal data, which allows us to represent a unified and general strategy for jointly modeling the variable and the temporal dependence. This approach allows us to get insights into the data via a new flexible way of modeling the dependence between the components of the non-normal multivariate longitudinal data.
The paper is organized as follows: Sect. 2 briefly introduces the D-vine copula model approach for a three-way data set. The dependence relationships between domain indicators over time in the HDI data set are described in Sect. 3. Finally, Sect. 4 contains the conclusions and an outlook on future research.

D-vine-based dependence model: a brief review
As mentioned in the previous section, we will apply a D-vine copula model approach to the HDI data set. In this section, we briefly introduce this kind of model. For a more thorough introduction, we refer to Stöber and Czado (2012) and Joe (2014). Sects. 2.1 and 2.2 introduce the general setting of copula and Vine copula, respectively. The particular case of characterizing the dependence relationship for a three-way data set is described in Sect. 2.3.

Copulas
Copulas provide a great deal of flexibility in modeling multivariate distributions, allowing the researcher to specify the models for the marginal distributions separately from the dependence structure (copula) that links them to form a joint distribution. In addition to flexibility, this helps to expose and understand the various fallacies associated with correlation. The mathematical description of the copula is as follows: let F be a d-dimensional distribution function of the random vector = (X 1 , ..., X d ) with univariate marginals F 1 , ..., F d . A d-variate copula is a multivariate cumulative distribution function (cdf) C ∶ [0, 1] d → [0, 1] with d ∈ N (with N set of natural numbers) and a specific set of uniformly distributed marginals on the interval [0, 1], i.e. U(0, 1). The Sklar's Theorem (Sklar 1959) provides a link between an arbitrary joint distribution and its marginal distributions and dependence structure. This result has been very important for applications since it allows the marginals and the dependence structure to be modeled separately.
The Sklar's theorem (Sklar 1959) states that every multivariate distribution F with marginals F 1 , ..., F d can be written as: for some appropriate d-dimensional copulas C. For an absolutely continuous F with strictly increasing continuous marginals F 1 , ..., F d the joint density function f is given by: where c(⋅) denotes the copula density. More details on copulas theory can be found in Nelsen (2007) and Joe (1997) contains a thorough overview of copulas.
While there is exhaustive literature on bivariate copulas families for the bidimensional case (see Table 1 and Fig. 2), their extension to the multivariate case is not straightforward (Joe 1997). Standard multivariate copulas, such as elliptical or Archimedean (Nelsen 2007), lack of flexibility and accuracy in modeling the dependence structure in a higher dimension. These multivariate extensions imply additional restrictions on the parameters that limit their flexibility. To overcome these problems, vine copulas are proposed by (Joe 1997) and developed in more detail by several authors (see please e.g. Bedford and Cooke 2001a, b;Kurowicka and Cooke 2006;Aas et al. 2009;Berg and Aas 2009).

Vine copulas
Vine copulas (Bedford and Cooke 2002) is a rich and flexible class of multivariate copulas based on the idea of Joe (1997) to decompose the dependence into a cascade of dependencies between (unconditional/conditional) pairs and permits to construct flexible highdimensional copulas by using only bivariate copulas as a building blocks (see please Aas et al. (2009)).
Bedford and colleagues (Bedford and Cooke 2002) introduced a graphical model, called regular vine (R-vine), that organizes all valid decompositions and Kurowicka and Cooke (2006) described it in more detail. It involves the specification of a sequence of trees where each edge corresponds to bivariate copulas, the so-called pair-copulas. Then these paircopulas constitute the building blocks of the joint regular vine distribution.
( Table 1 Four copulas functions and ranges of the dependent parameters. Note that x 1 and x 2 are uniform from [0, 1], Φ G (x 1 , x 2 ) is the standard bivariate normal distribution with correlation parameter , Φ is the distribution function of the standard normal distribution
• (proximity condition) If two edges in tree T i are to be joined by an edge in tree T i+1 they must share a common node.
To develop a statistical model on R-vine trees with the node set ℕ ∶= {N 1 , ..., N d−1 } and the edge set ∶= {E 1 , ..., E d−1 } , one associates each edge e = j(e) , k(e)|D(e) in E i with a bivariate copula density c j(e),k(e)|D(e) . The nodes j(e) and k(e) are called the conditioned nodes, while D(e) is the conditioning set. More generally, a d-dimensional R-vine is a set of d − 1 trees such that the first tree comprises d nodes, identifying d − 1 pairs of variables and also d − 1 corresponding edges. Therefore each subsequent tree is derived such that all the edges of the tree T i turn into nodes of the tree i + 1 ; furthermore, two edges in T i , which become nodes in T i+1 are joined by an edge in T i+1 only if these edges share a common node in T i . In Fig. 3 a graphical representation of a regular vine tree sequence of five variable is given. The node in the first tree represent the random variables X 1 , X 2 , ..., X 5 . The edges are identified with a bivariate copulas (called pair-copulas), where edge (j(e), k(e)) described the dependence between a five-dimensional R-vine copulas with five random variables, four trees and ten edges. The nodes in the first tree correspond to the five random variables that are being modeled and each edge corresponds to a bivariate unconditional or conditional pair-copulas X j(e) and X k(e) . In the second tree, the nodes are just the edges of the first tree. The edges are annotated by (j(e), k(e)|D(e)) 1 and describe the dependence between X j(e) and X k(e) conditional on X D(e) . In subsequent trees, the number of conditioning variables increases.
In Theorem (4.2) of Kurowicka and Cooke (2006) it is proven that the joint density of is uniquely determined and given by where x D(e) denotes the sub-vector of x = (x 1 , ..., x d ) � determined by the indices in D(e). The rightmost part of equation (3), which involves the product of d marginal densities f k and d(d − 1)∕2 bivariate copulas densities evaluated at the conditional distribution functions F(⋅|⋅) , is called an R-vine copulas.
One well-known specifications of such representations of R-vine were identified by (Bedford and Cooke 2002): the drawable vine ( D-vine, in short). Particularly, an R-vine is called a D-vine if each node in T 1 has a degree at most 2, where the degree of a node denotes the number of connections or edges the node has to other nodes (path structure).
For distinct indices i, j, i 1 , ..., i k with i < j and i 1 < ... < i k we use the abbreviation Using this notation the D-vine density is given by See Aas et al. (2009) for more on D-vine copulas. Fig. (4) depicts the hierarchical nature of the R-vine copulas by plotting the tree structure of D-vine model.
Fitting an R-vine copulas specification to a given datatset requires the following separate tasks: (a) selection of the R-vine (structure), i. e., selecting which unconditioned and conditioned pair to use, (b) choice of a bivariate copulas family for each pair selected in (a), (c) estimation of the corresponding parameters for each copula.
Since all the three step are needed for an R-vine copulas specification, one way of finding the "best" model is to accomplish steps (b) and (c) for all the possible R-vine constructions.
To select one possible R-vine for a given dataset it is necessary to decide for which pairs of variables we want to specify copulas. We proceed sequentially, starting by defining the first tree T 1 = (N 1 , E 1 ) for the R-vine, continuing with the second tree, and so on. The trees are selected in such a way that the chosen pairs model the strongest pairwise dependencies present. The copulas families specified in the first tree of the R-vine have the greatest influence on the model fit. Later, we will refer to this method as the sequential method. This sequential approach is reasonable because the copulas families specified in the first tree of the R-vine have the greatest influence on the model fit (Joe et al. 2010). Similarly, D-vines are also constructed by choosing a specific order of the variables (time ordered). Then in the first tree, the dependence of the first and second variable, of the second and third, of the third and fourth, and so on, is modeled using pair-copulas, i.e., if we assume the order 1, 2, ...d, we model the pairs (1, 2), (2, 3), (3, 4), etc. In the second tree, conditional dependence of the first and third given the second variable (the pair (1, 3|2)), the second and fourth given the third (the pair (2, 4|3)), and so on, is modeled. In the same way, pairwise dependencies of two variables a and b are modeled in subsequent trees conditioned on those variables which lie between the variables a and b in the first tree, e.g., the pair (1, 5|2, 3, 4). That is each D-vine tree has a path structure. Given the tree model structure we used the Dißann's algorithm Dißmann et al. (2013) to fit the pair-copulas families and parameters. The bivariate copulas for each pair is selected based on the Akaike Information Criterion (AIC) (Akaike 1998) and the copulas selection is performed sequentially (Dißmann et al. 2013) from the lower to the higher trees. Dißmann's algorithm starts with the estimation of the first tree and estimates the unconditional pair-copulas (and their parameters) via maximum-likelihood estimation. Then the observations are transformed into pseudo-observations needed for the estimation of the second tree using the estimated pair-copulas of tree 1. Continuing this way the vine is built up tree-by-tree.

Application: D-vine for the HDI data set
The empirical evidence shows that normality is not a rule in practice, thus alternatives to the multivariate normal model must be found. With the aim of relaxing the normality assumption, the procedure based on D-vine copulas theory will be applied here to infer the dependence relationships between the HDI indexes over the years (2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018).
This relationship follows dynamics that are not homogeneous over the times and among the HDI indicators as they are characterized by a structurally different underlying phenomena. The D-vine approach allows the relaxation of the assumption of multi-normality that is typical of the usual model for multivariate longitudinal data and it can easily accommodate complex dependence structures such as asymmetric dependence or strong joint tail behavior Joe et al. (2010).
Let's suppose the existence of dependence among the HDI indexes invariant over time. Therefore, the change over time of the distribution of the indexes is due only to a change of the marginal conditional (to the past) distributions of each index and not to the change of the dependence structure across the indexes. The proposed model considers two different levels of analysis. In the first level, a multivariate copulas describes the relations of the indexes observed at a specific time. In the second level of analysis, each longitudinal series, which corresponds to a given index over time, is modeled separately using a pair copulas decomposition to relate the distributions of the variables describing the observation given in different times. The model that we consider starts from a given C a multivariate copulas with parameter of the multivariate index variable such that the joint cumulative distribution function (cdf) is and the joint density function (df) is ( (1) , (2) , ..., (R) ) (6) F( (1) , (2) , .., (R) ) = C (F 1 ( (1) ), F 2 ( (2) ), ..., F R ( (R) )) Since longitudinal data are considered each index is observed over times. Therefore, for each index (r) we observe n independent observations on a dependent time series vector ( ) = (y (r) 1 , y (r) 2 , ..., y (r) T ) � We model each continuous process which generates the longitudinal data using a pair copulas decomposition ( as in (Smith and Khaled 2012), (Nai Ruscone and Osmetti 2016)). In so doing we decompose the distribution of the process at point in time, conditional to the past, into the product of a sequence of bivariate copulas density and marginal density. The advantage is that the marginal distribution of the process at each point can be modeled arbitrarily, while the dependence over time is capture by a sequence of bivariate copulas.
Let (y 1 , y 2 , ..., y T ) (a simplification of the notation dropping the index r) be the univariate serie, the joint density function can be decomposed in a product of the conditional (to the past) distributions: By using a pair copula decomposition we have: where F(y t ) and f (y t ) are the cdf and the df of the marginal Y t and c t,j = c t,j|t−1,t−2,...,j+1 are the pair copulas with parameters t,j . Therefore, the joint distribution of the process becomes a D-vine copulas model of order T, which is a product of T marginal densities and T(T − 1)∕2 pair copulas densities: where u t|j+1 = F(y t |y t−1 , ..., y j+1 ) and u j|t−1 = F(y j |y t−1 , ..., y j+1 ).
By substituting (9) and the correspondent cumulative distribution function in (7) we obtain the model for multivariate longitudinal data: In (10) describes the dependence between the indexes and (r) i,j describes the dependence between the r-th index at time t and the one at time j.
where u t|j+1 = F(y * (r) t |y * (r−1) . The df f t can be also defined by a PCC (between the indexes). In (11) t describes the dependence between the indexes at time t and (r) i,j describes the dependence between the r-th index at time t and the one at time j.
Note that in both models we suppose that the response at time t is independent from the past of the other variables.
The code of the algorithm is based on functions in the R packages CDVine (Schepsmeier et al. 2015) and VineCopula (Schepsmeier et al. 2015) and the MLE (Maximum Likelihood Estimation) used here therefore corresponds to the maximum pseudo likelihood method (MPL) (Genest et al. 1995).

Dependence relationships among indicators in the HDI data set
As we described in Sect. 1, the HDI data set used to apply the D-vine copulas methods described in 2.2 is composed of 4 indexes in a study period of 9 years (2010-2018): Education, Income, Life Expectancy, and the HDI itself (see Appendix Figs. 5,6,7,8,9,10,11,12,13). The region distribution over the 5 continents is as follows: 53 countries from Africa, 35 countries from America, 48 from Asia, 41 from Europa, and 11 from Oceania. Thus, we will find estimates for the set of pair-copula families in Eq. (11) and the relative set of parameters. Additionally, in the case of the HDI data set, we also want to jointly assess the evolution of the dependence relationship between the four domain indexes over time. Therefore, this copula approach will allow us to model the dependence structure between the indexes and the temporal structure from 2010 to 2018 at the same time.
For fitting the model in Eq.(11) with flexible lower/upper tail dependence, we considered the following copulas families, which are classified according to their different strengths of tail behavior in the estimation process of the copulas: • Gaussian/Normal (tail-symmetric, with no tail dependence at all) • Student-t (tail-symmetric,with upper and lower tail dependence) • Clayton (tail-asymmetric, with only lower tail dependence) • Gumbel (tail-asymmetric, with only upper tail dependence) • Frank (tail-symmetric, with no tail dependence) for more details see please Nelsen (2007). The suitable copulas approaches in the case of positive dependence, are the Gaussian, Student-t, Gumbel and Frank copulas. Otherwise, Clayton copula is the appropriate when we observed negative dependence see Tab. 1 and Fig. 2. Given those different copulas candidates, the next step is to fit the model shown in Eq.(11) and select the best candidate for the HDI data set. In the first step, each longitudinal series of data corresponding to a given index is modelled separately using a D-vine copulas to relate the marginal distributions of the indexes at each time of observation. In other words, we assume that we can explain the effect of the past indexes on the rth index by the history of the rth index and by the current values of the other indexes. In the second step, at each observation time, the conditional (on the past) distributions of each index are related using another D-vine copulas describing the relationship between the corresponding variables.
As we want to test whether there are differences between countries according to location, we stratified the HDI data set into 2 data sets: countries with the lowest HDI (African countries), countries with the highest HDI (European countries). Thus, we compare the dependence between these two data sets, but also comparing both data sets with the whole group of countries.
Following the decomposition shown in the Eq.(11), the 4 D-vine copulas (each one with 8 trees) is fitted.
Tables 5-2 listed the (r) i,j estimates, which describe the dependence between the rth index at time t and the one at time j. Additionally, Figs These contour plots graphically show all the conditional/ unconditional copulas estimates in every tree D-vine (t the bottom of every picture the first tree, and on the top of every copulas the label as in Fig. 4.) As we can observe the dependence structure in all these D-vine cases (World, Europe, and Africa) is slightly different. Table 5 shows in the first two trees all symmetric dependence in all the three cases (World, Europe, and Africa) but in the Table 2 and  the Table 3 we can observe upper tail dependence (Gumbel copula) in the first two trees in the case of Europe and Africa. This means that indexes emphasize dependence among the higher level of two consecutive years. For instance in Africa, c 89 (Table 2) we got among HDI of 2017 and 2018 a Gumbel (18.36) copula, therefore there is dependence in the case of the higher values among the indexes HDI in those two years. One of the main results we observed in African countries is that there is a significant upper tail dependence in the case of both the life expectancy index and the HDI. This implies that there has been a positive tendency and improvement in living conditions in Africa over the last years. In the case of the European countries, we observed a quite significant right tail dependence in the HDI, which in this case might mean that the level of HDI is keeping slightly a positive tendency over the studied years. This makes sense as Europe has less to improve than in Africa. From the copulas families selected, we see evidence of different types of asymmetric dependence. This demonstrates that the choice of D-vine was appropriate, since it guarantees enough flexibility to model the dependence structure. According to Joe et al. (2010) the copulas families in the first tree c 12 , c 23 , c 34 , c 45 , c 56 , c 67 , c 78 , c 89 of the model have the greatest influence on the model fitting.

Discussion
The HDI is one of the more used measures to rank countries into three tiers of human development: lifespan, education level and gross national income per capita. Its associated data set includes a temporal structure that motivates a dependency modeling, which has not been fully used before in the area, as far as we are concerned. Our research work employed a D-vine copulas approach to jointly modeling the dependence among the four HDI indexes (HDI itself, life expectancy, income, and education) at a given time and across time. This modeling approach allows us to describe a complex pattern of dependence structure in the HDI data set using a flexible high-dimensional model based on a D-vine copulas. Thus, it overcomes the problem of modeling simultaneous dependence between non-normal responses over time. Particularly, this allows modeling nonlinear and asymmetric cross-sectional and serial dependence, using a D-vine copulas. The use of the methods described in this article may be advantageous for practitioners in the field because they can jointly model cross-section dependence and serial dependence in a really flexible way. As the joint estimation of the D-vine copulas might be computationally slow and time consuming in high dimensional spaces, future research direction would include exploring possibilities of speeding up the estimation process. One option would be to implement a fast sequential alternative based on a simplified D-vine. Another possibility would be to use parallel computing approach via the R package parallel, which is included in the R base.
Another future research direction we would like to intent is to develop a modelbased clustering approach, which will take into account both cross-sectional and serial dependence for a comprehensive class of continuous three-way data sets. This modeling with five random variables, four trees and ten edges. The nodes in the first tree correspond to the five random variables that are being modeled and each edge corresponds to a bivariate conditional or unconditional pair-copulas strategy will determine clustering structures of countries according to low-, medium-, high-, and very high-HDI levels based on both cross-sectional and serial dependence and not only taking into account cross-sectional dependence.
The R code to fit the models is available upon request from the first author.
1 3    Table 4 D-vine parameter estimates for the Income in all the 8 trees ( T 1 the first tree, T 8 the last tree). From the left to the right: the whole countries, only European countries, and only African countries. The selected copulas family (Cop: Gauss, t-Student (labeled as t), Clayton, Frank, and Gumbel copula) and the copulas parameters (par.1 and par.2) can be one or two according to the type of copulas. N/A identifies absence of one parameter) Table 5 D-vine parameter estimates for the Education in all the 8 trees ( T 1 the first tree, T 8 the last tree).
From the left to the right: the whole countries, only European countries, and only African countries. The selected copulas family (Cop: Gauss, t-Student (labeled as t), Clayton, Frank, and Gumbel copula) and the copulas parameters (par.1 and par.2) can be one or two according to the type of copulas. N/A identifies absence of one parameter) 1 3

Fig. 14 Contour plots of D-vine
Trees (see Table 2 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.