Choquet-Integral Aggregation Method to Aggregate Social Indicators to Account for Interactions: An Application to the Human Development Index

Composite well-being and sustainability indices are usually obtained as arithmetic or geometric means of their sub-dimensions. However, the arithmetic mean does not consider potential interactions across the dimensions of an index, and the geometric mean does not penalize unbalanced achievements across dimensions strongly enough. This paper uses a flexible non-additive aggregation model, the Choquet integral, to account for potential synergies and redundancies among the dimensions used to construct indices, and uses the Human Development Index (HDI) as an example to illustrate the flexibility of the aggregation procedure. The paper draws on multiple theoretical and empirical studies that indicate mutually strengthening relationships (positive interactions) among the three HDI dimensions. To showcase how positive interactions among the three HDI dimensions can be taken into account, the paper uses five hypothetical weight sets and simulates 500 weight sets that allow varying positive interactions among the three dimensions. The analyses of HDI data suggest that geometric and arithmetic mean HDI scores are roughly the same for most countries, even when variations across the three dimensions are relatively large. By contrast, countries with balanced (unbalanced) achievements across dimensions rank higher (lower) under Choquet integral aggregation. The illustrations show that the Choquet integral is a flexible aggregation method that can take into account varying positive interactions across the HDI dimensions and detect unbalanced achievements.


Introduction
Well-being consists of many components and cannot be measured by income alone (e.g., Fleurbaey, 2009; Fleurbaey & Blanchet, 2013, among many others). This realization has led to the inclusion of new dimensions beyond income and to different frameworks for measuring multidimensional well-being, especially in the context of imperfect markets and public provision of some well-being components (e.g., public health and education; see Sen, 1985, 1987). The recent political agenda also considers improvements in many well-being dimensions beyond GDP when assessing social progress. 1 Different sets of well-being dimensions are combined into composite well-being (sustainability) indices that allow policymakers to measure and monitor overall societal well-being (Ness et al., 2007). These composite indices are calculated through simple aggregation of the well-being (sustainability) dimensions. For instance, the Human Development Index (HDI) of the United Nations Development Programme (UNDP) is based on the geometric mean of the health, income and education dimensions (UNDP, 2010). Similarly, the Environmental Performance Index (EPI) is based on a weighted average of a set of indicators. 2 This paper proposes a methodology for obtaining composite well-being and sustainability indices and uses the particular case of the HDI to demonstrate its application and benefits. The methodology could be adapted to a range of other indices.
Most composite indices are aggregated through weighted averages in which each dimension is given a relative weight reflecting its intrinsic importance (Alkire & Santos, 2014). This rests on the assumptions that there is no interaction among dimensions and that the marginal rates of substitution among dimensions that keep the composite index unchanged are constant (see Decancq & Lugo, 2013, for a detailed analysis of the issue). However, the explicit tradeoffs among dimensions (i.e., the weights attached to dimensions) need not coincide with the implicit tradeoffs that the aggregation function actually generates (Pinar et al., 2015; Ravallion, 2012). Ravallion (2012) offered an alternative aggregation function based on the generalized aggregation formula of Chakravarty (2003, 2011), which leads to more sensible tradeoffs across dimensions than the geometric mean (see Pinar, 2019, for a recent application of the generalized aggregation method to the OECD's regional well-being index). Additionally, multiple studies have analyzed the robustness of weight allocations to HDI dimensions, using linear programming tools to assess the precision of rankings under alternative weights (Athanassoglou, 2015; Cherchye et al., 2008; Foster et al., 2013; Pinar et al., 2017, 2020; Rogge, 2018). 3 Even though this literature focused on the impact of alternative weight allocations across HDI dimensions on ranking precision, it did not consider potential interactions among the dimensions of the HDI. This paper aims to fill this gap by allowing positive interactions (synergies) among the dimensions of the HDI through the Choquet integral aggregation method. In a related approach, Mazziotta and Pareto (2016) obtain a non-compensatory index by penalizing unbalanced indicator values through the standard deviation across indicators. Another, more extreme way of penalizing unbalanced achievements is the minimum operator (i.e., the index outcome equals the lowest achieved dimension). However, neither the Mazziotta and Pareto (2016) methodology nor the minimum operator allows one to set the level of interaction between pairs of dimensions, whereas the Choquet integral lets one choose a different degree of interaction for each pair of dimensions.
1 See e.g. the European Commission's "Going beyond GDP" initiative; Stiglitz et al. (2009), a report commissioned by the French government to propose relevant indicators that measure social progress; and the United Nations' 2030 Agenda for Sustainable Development.
2 Some other indices are also aggregated through weights chosen by experts (e.g., the Fondazione Eni Enrico Mattei (FEEM) sustainability index; see Pinar et al., 2014, for details) or by individuals (e.g., the Better Life Index (BLI) of the Organization for Economic Co-operation and Development (OECD); see Durand, 2015, for details).
3 In a related area, recently developed methods also avoid the use of subjective weights when obtaining index outcomes (see, e.g., Shaker, 2018; Mazziotta and Pareto, 2019; Kraemer et al., 2020, for discussion and applications).
There is an extensive literature suggesting that there is synergy (positive interaction) across the dimensions of the HDI. For example, education and health are considered inputs to the production function that determines income per capita (see, e.g., Mankiw et al., 1992, and Glaeser et al., 2004, for education's effect on economic development and growth; and Bhargava et al., 2001; Bloom et al., 2004; Behrman et al., 2004, for health's effect on economic growth and aggregate output through productivity). Additionally, more educated societies tend to have better health outcomes and healthier behaviors (Brunello et al., 2013; Kemptner et al., 2011; Soares, 2007). In particular, women's educational level is profoundly important in improving health outcomes in the developing world (Chen & Li, 2009). Similarly, healthier students receive more schooling (Weil, 2007) and make better educational and occupational choices (Vogl, 2014). This theoretical and empirical literature highlights that the HDI dimensions are, to some degree, complements. It is therefore essential to incorporate these synergies across HDI dimensions when obtaining composite HDI scores, especially when informing governments and policies that aim to improve citizens' well-being.
In other words, one should choose an aggregation method that is flexible enough to take into account the positive interactions across the HDI dimensions.
To capture synergies (or penalize unbalanced achievements) across the dimensions of the HDI, the UNDP switched in 2010 from the arithmetic mean (AM hereafter) aggregation method to the geometric mean (GM) so that "poor performance in any dimension is now directly reflected in the HDI, and there is no longer perfect substitutability across dimensions. This method captures how well-rounded a country's performance is across the three dimensions it recognizes that health, education and income are all important" (UNDP, 2010, p.15). In other words, the UNDP adopted the GM to penalize countries with relatively unbalanced achievements across the three dimensions: such countries obtain lower composite scores with the GM than with the AM. However, the GM still has shortcomings in capturing complementarity or synergies among the three dimensions, as it does not penalize unbalanced achievements strongly enough. First, some countries have unbalanced achievements across the three HDI dimensions but still obtain relatively similar composite scores with the GM and AM (see Sect. 3 for examples). Second, the GM cannot differentiate between synergies of different strength across pairs of dimensions, because a relatively weak performance in any dimension is reflected in the composite score in the same way. Yet the aggregation method should be flexible enough to allow more interaction between some pairs of dimensions than between others. To allow differing synergies across the dimensions of the HDI, and to choose aggregation parameters that capture unbalanced achievements, this paper offers an alternative aggregation method that is flexible enough to capture a variety of interactions across the HDI dimensions.
This study proposes an aggregation methodology based on the Choquet integral (CI hereafter), which accounts for interactions across dimensions because it evaluates all possible subsets of dimensions rather than single dimensions (Grabisch et al., 2009). This makes it possible to take into account how balanced (or unbalanced) the achievements across dimensions are and to reflect these differences in the composite score. The CI is a general method that allows interactions across dimensions while allocating different relative importance to dimensions (see, e.g., Grabisch and Labreuche (2010) for a review of the CI's use in multicriteria decision making). The use of the CI for obtaining multivariate indices has increased in recent years. For example, Meyer and Ponthière (2011) used the CI and demonstrated that individual preferences could not be captured by an additive model (i.e., weighted average aggregation methods) because of complementarities and redundancies between well-being dimensions (see also Angilella et al. (2016), Oppio et al. (2018), and Gálvez Ruiz et al. (2018) for non-additive models for assessing urban quality). Carraro et al. (2013) applied the CI aggregation method to capture interactions across different sustainability indicators, and other authors have used the CI to construct indices that allow for different interactions among indicators (see also Merad et al., 2013; Bertin et al., 2018; Bottero et al., 2015, 2018; Branke et al., 2016; Campagnolo et al., 2018). Most of the literature that uses the Choquet integral relies on expert elicitation to identify the weights (see, e.g., Grabisch et al. (2008) for a review of methods for identifying weights). In this paper, however, we use five hypothetical weight sets that allow different degrees of positive interaction across the dimensions of the HDI to illustrate the use of the Choquet integral for obtaining composite HDI scores.
Furthermore, as a robustness analysis, we also simulate 500 weight sets for the aggregation, which allow different levels of interaction and yield a range of composite HDI scores for each country. The simulation exercise also provides a feasible range of composite index outcomes for countries when an interaction index varies between lower and upper values set by decision-makers. We also obtain HDI outcomes with the minimum operator (i.e., perfect complementarity across the three dimensions of the HDI) to compare this extreme scenario with the outcomes obtained with the Choquet integral. It should be noted that the hypothetical and simulated weights chosen in this paper are meant to showcase the usefulness of the Choquet integral in penalizing unbalanced achievements across the HDI dimensions and in taking varying interactions into account; they are not the 'actual' or 'true' interactions among the dimensions. 4 The remainder of the paper is organized as follows. Section 2 introduces the CI aggregation method and illustrates its characteristics. Section 3 applies the methodology to obtain HDI scores that allow different degrees of interaction across HDI dimensions, and Sect. 4 offers concluding remarks.
4 The weights chosen in this paper are only for illustrative purposes, as there are no studies identifying the 'true' degree of complementarity among the dimensions. As the contribution of this paper is to illustrate how the Choquet integral can penalize unbalanced achievements and take into account varying interactions compared with the geometric mean, examining the 'true' interactions among the dimensions of the HDI and eliciting weights for the HDI dimensions would be a promising avenue for future research.

The Choquet integral as an aggregation methodology
The CI aggregation method (Choquet, 1953) has a general form that can incorporate a diversity of inputs from policymakers and the public, which allows a wide range of political and personal choices to be considered. For example, this paper applies the CI method to measuring well-being by allowing different degrees of interaction among HDI dimensions while keeping the dimensions equally important.
Let $\{x_1, x_2, \ldots, x_d\}$ be the index values of the well-being dimensions, described by a set $D = \{1, 2, \ldots, d\}$ of dimensions. A capacity is a set function $\mu: 2^D \to [0, 1]$, where $2^D$ is the set of all possible subsets of the criteria, which assigns a weight (measure) between 0 and 1 to every subset of dimensions. The set function $\mu$ has to satisfy the boundary and monotonicity conditions

$$\text{(i)}\ \ \mu(\emptyset) = 0,\ \ \mu(D) = 1; \qquad \text{(ii)}\ \ \mu(S) \le \mu(T) \ \text{ for all } S \subseteq T \subseteq D.$$

The first condition represents the scenarios in which all dimensions are unsatisfactory (i.e., achievements in all dimensions are zero) and satisfactory (i.e., achievements in all dimensions are full), respectively. The second condition (monotonicity) states that adding dimensions to a subset can never reduce its weight. The value $\mu(T)$ represents the capacity (weight) of the dimensions belonging to the subset $T$, for any $T \subseteq D$, and can be interpreted as the weight (importance) that one assigns to fully satisfactory performances in the dimensions belonging to $T$ combined with fully unsatisfactory performances in the remaining dimensions. For example, if a subset contains two of the three HDI dimensions (e.g., health and education), then $\mu(\{\text{Health}, \text{Education}\})$ represents the weight attached to the scenario in which health and education achievements are fully satisfactory and income is fully unsatisfactory. Since $\mu(\emptyset) = 0$ and $\mu(D) = 1$ are fixed, one needs to assign weights $\mu(S)$ to all other $2^d - 2$ subsets $S$ of $D$, where $d = \mathrm{card}(D)$. The CI of $x = \{x_1, x_2, \ldots, x_d\}$ with respect to a capacity $\mu$ on $D$ is defined by

$$C_\mu(x) = \sum_{i=1}^{d} \left( x_{(i)} - x_{(i-1)} \right) \mu\left(A_{(i)}\right),$$

where $(\cdot)$ is a permutation on $D$ such that $x_{(1)} \le \cdots \le x_{(d)}$, $x_{(0)} = 0$, and $A_{(i)} = \{(i), \ldots, (d)\}$ for all $i \in \{1, \ldots, d\}$. A convenient way of presenting the CI is by using Möbius values. Given a capacity $\mu$ on $2^D$, its Möbius representation (Rota, 1964; Shafer, 1976) is the set function $m: 2^D \to \mathbb{R}$ defined by

$$m(T) = \sum_{S \subseteq T} (-1)^{t - s} \mu(S), \qquad T \subseteq D,$$

where $s = \mathrm{card}(S)$ and $t = \mathrm{card}(T)$.
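To make the permutation-based definition of the CI concrete, here is a minimal Python sketch. It assumes that a capacity is stored as a dictionary keyed by `frozenset`s of dimension labels; the labels "II", "HI" and "EI" and the function names `all_subsets` and `choquet` are illustrative choices, not part of the method. Two limiting capacities serve as checks: the additive capacity $\mu(T) = t/3$, under which the CI reduces to the arithmetic mean, and the capacity that is zero on every proper subset, under which it reduces to the minimum.

```python
from itertools import combinations


def all_subsets(dims):
    """Every non-empty subset of dims, as frozensets."""
    return [frozenset(c) for r in range(1, len(dims) + 1) for c in combinations(dims, r)]


def choquet(x, mu):
    """Discrete Choquet integral of achievements x with respect to capacity mu.

    x  : dict mapping a dimension label to its achievement in [0, 1]
    mu : dict mapping every non-empty frozenset of labels to its weight (mu[full set] == 1)
    """
    order = sorted(x, key=x.get)          # permutation with x_(1) <= ... <= x_(d)
    total, prev = 0.0, 0.0                # prev starts at x_(0) = 0
    for k, dim in enumerate(order):
        upper = frozenset(order[k:])      # A_(i) = {(i), ..., (d)}
        total += (x[dim] - prev) * mu[upper]
        prev = x[dim]
    return total


dims = ("II", "HI", "EI")
additive = {T: len(T) / 3 for T in all_subsets(dims)}                 # arithmetic mean
minimum = {T: float(len(T) == len(dims)) for T in all_subsets(dims)}  # minimum operator

country = {"II": 0.200, "HI": 0.500, "EI": 0.800}
print(round(choquet(country, additive), 3))  # 0.5
print(round(choquet(country, minimum), 3))   # 0.2
```

Setting every weight to 1 gives the maximum operator, so the same function spans the whole range from minimum to maximum.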
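The following boundary and monotonicity conditions must be met by the Möbius values:

$$m(\emptyset) = 0, \qquad \sum_{T \subseteq D} m(T) = 1, \qquad \sum_{T \subseteq S,\ T \ni i} m(T) \ge 0 \ \text{ for all } S \subseteq D \text{ and all } i \in S,$$

where the Möbius value of the scenario in which achievements in all dimensions are zero is zero (i.e., $m(\emptyset) = 0$), and the Möbius values sum to one (i.e., $\sum_{T \subseteq D} m(T) = 1$).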
The CI can now be expressed in terms of the Möbius representation $m$ of the capacity $\mu$ as follows:

$$C_\mu(x) = \sum_{T \subseteq D} m(T) \bigwedge_{i \in T} x_i,$$

where the symbol $\wedge$ denotes the minimum operator and $m(T)$ are the Möbius coefficients (Grabisch et al., 2009).
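The Möbius transform, the Möbius-form conditions and the Möbius-form CI can be sketched in the same style. The helper names and the capacity (0.1 on each singleton, 0.4 on each pair) are hypothetical choices for illustration. For the achievements (0.2, 0.5, 0.8) the Möbius form gives 0.35, which is also what the permutation form yields with this capacity ($0.2 + 0.3 \cdot 0.4 + 0.3 \cdot 0.1$).

```python
from itertools import combinations


def subsets(s):
    """All subsets of s, including the empty set, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]


def mobius(mu, dims):
    """Mobius transform: m(T) = sum over S subset of T of (-1)^(t - s) * mu(S)."""
    mu = {frozenset(): 0.0, **mu}             # boundary condition mu(empty set) = 0
    return {T: sum((-1) ** (len(T) - len(S)) * mu[S] for S in subsets(T))
            for T in subsets(dims)}


def is_valid_mobius(m, dims, tol=1e-9):
    """Check the boundary and monotonicity conditions stated on Mobius values."""
    if abs(m[frozenset()]) > tol or abs(sum(m.values()) - 1.0) > tol:
        return False
    # Monotonicity: for every S and i in S, the m(T) with i in T and T subset of S sum to >= 0.
    return all(sum(m[T] for T in subsets(S) if i in T) >= -tol
               for S in subsets(dims) for i in S)


def choquet_mobius(x, m):
    """CI as the sum over non-empty T of m(T) times the minimum achievement within T."""
    return sum(mT * min(x[i] for i in T) for T, mT in m.items() if T)


dims = ("II", "HI", "EI")
# Hypothetical capacity: 0.1 on each singleton, 0.4 on each pair, 1 on the full set.
mu = {T: {1: 0.1, 2: 0.4, 3: 1.0}[len(T)] for T in subsets(dims) if T}
m = mobius(mu, dims)
print(is_valid_mobius(m, dims))                                        # True
print(round(choquet_mobius({"II": 0.2, "HI": 0.5, "EI": 0.8}, m), 3))  # 0.35
```

Lowering one pair's weight below a singleton's, e.g. setting $\mu(\{\mathrm{II}, \mathrm{HI}\}) = 0.05$, produces a negative partial Möbius sum and the monotonicity check fails.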

Characteristics of the Choquet Integral
The CI aggregation method can represent a wide range of preferences in multidimensional well-being analysis. Three important characteristics of the CI are described below to illustrate how flexibly the methodology incorporates decision-makers' preferences on multidimensional well-being.

Relative Importance Index
The relative importance of well-being dimensions can be estimated by the Shapley value (Shapley, 1953) of each dimension, which compares the weight of every subset that includes the dimension with that of the corresponding subset without it. The overall importance of dimension $i \in D$ is thus obtained by averaging its marginal contributions (Grabisch, 1995, 1996):

$$\phi(i) = \sum_{T \subseteq D \setminus \{i\}} \frac{(d - t - 1)!\, t!}{d!} \left[ \mu(T \cup \{i\}) - \mu(T) \right],$$

where $d = \mathrm{card}(D)$ and $t = \mathrm{card}(T)$ represent the cardinalities of $D$ and $T$, respectively. For instance, to obtain the importance of the health dimension in the HDI, one compares the weights assigned to subsets that include health with those of the subsets that do not. This involves four comparisons: (i) the weight of the subset containing health only vs. the weight of the empty subset; (ii) the weight of the subset containing health and education vs. the weight of the subset containing education only; (iii) the weight of the subset containing health and income vs. the weight of the subset containing income only; and (iv) the weight of the subset containing all dimensions vs. the weight of the subset containing education and income. In terms of the Möbius representation of $\mu$, the Shapley value of dimension $i$ can be rewritten as

$$\phi(i) = \sum_{T \subseteq D \setminus \{i\}} \frac{m(T \cup \{i\})}{t + 1}.$$

The relative importances of the dimensions (i.e., the Shapley values) sum to one (i.e., $\sum_{i=1}^{d} \phi(i) = 1$), and higher Shapley values represent higher relative importance.
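A short sketch of the marginal-contribution formula follows. The asymmetric capacity is hypothetical and chosen only so that the three Shapley values differ: with it, II receives an importance of 0.4 and HI and EI 0.3 each, and the values sum to one as stated above.

```python
from itertools import combinations
from math import factorial


def shapley(mu, dims):
    """Shapley importance of each dimension as a weighted average of marginal contributions.

    mu maps frozensets of dimension labels to weights; missing keys (the empty set) count as 0.
    """
    d = len(dims)

    def value(S):
        return mu.get(frozenset(S), 0.0)

    phi = {}
    for i in dims:
        others = [j for j in dims if j != i]
        phi[i] = sum(
            factorial(d - t - 1) * factorial(t) / factorial(d)  # weight for |T| = t
            * (value(set(T) | {i}) - value(T))                   # marginal contribution of i to T
            for t in range(d)
            for T in combinations(others, t)
        )
    return phi


f = frozenset
# Hypothetical asymmetric capacity in which income (II) carries more weight.
mu = {f({"II"}): 0.2, f({"HI"}): 0.1, f({"EI"}): 0.1,
      f({"II", "HI"}): 0.4, f({"II", "EI"}): 0.4, f({"HI", "EI"}): 0.3,
      f({"II", "HI", "EI"}): 1.0}
phi = shapley(mu, ("II", "HI", "EI"))
print({k: round(v, 3) for k, v in phi.items()})  # {'II': 0.4, 'HI': 0.3, 'EI': 0.3}
print(round(sum(phi.values()), 3))                # 1.0
```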

Orness Index
The Choquet integral also allows one to examine whether the decision-maker's choice of weights is optimistic or pessimistic (see Marichal, 2004, for a detailed discussion of the orness index). In other words, the orness index measures whether a decision-maker considers that a good performance in one dimension compensates for a poor performance in another. The orness index ranges between 0 and 1, and higher (lower) values indicate that the decision-maker regards the dimensions as substitutes (complements). For instance, if the orness index equals 1, the decision-maker is fully compensative (i.e., the dimensions are perfect substitutes), and the Choquet integral equals the maximum operator (i.e., the index outcome is the highest value among the dimensions).
On the other hand, if the orness index equals 0, the decision-maker is fully non-compensative, and the Choquet integral corresponds to the minimum operator (i.e., the dimensions are perfect complements): the index outcome is the lowest value among the dimensions. The orness index is computed as follows:

$$\mathrm{orness}(\mu) = \frac{1}{d - 1} \sum_{T \subsetneq D} \frac{(d - t)!\, t!}{d!}\, \mu(T),$$

where $d = \mathrm{card}(D)$ and $t = \mathrm{card}(T)$ represent the cardinalities of $D$ and $T$, respectively.
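The orness formula can be checked against its limiting cases in a few lines of Python. The sketch sums over non-empty proper subsets (the empty set has zero weight), and the three capacities are the standard minimum, additive and maximum capacities rather than any of the paper's cases; the additive capacity, i.e. the arithmetic mean, sits at the neutral value 0.5.

```python
from itertools import combinations
from math import factorial


def orness(mu, dims):
    """Marichal's orness: 1/(d-1) * sum over proper subsets T of (d-t)! t! / d! * mu(T)."""
    d = len(dims)
    total = 0.0
    for t in range(1, d):  # the empty set contributes 0 and the full set D is excluded
        for T in combinations(dims, t):
            total += factorial(d - t) * factorial(t) / factorial(d) * mu[frozenset(T)]
    return total / (d - 1)


dims = ("II", "HI", "EI")
subsets = [frozenset(T) for t in range(1, 4) for T in combinations(dims, t)]
min_cap = {T: float(len(T) == 3) for T in subsets}  # minimum operator
additive = {T: len(T) / 3 for T in subsets}         # arithmetic mean
max_cap = {T: 1.0 for T in subsets}                 # maximum operator

for name, mu in [("min", min_cap), ("AM", additive), ("max", max_cap)]:
    print(name, round(orness(mu, dims), 3))  # min 0.0, AM 0.5, max 1.0
```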

Interaction Index
A key reason for using the CI to construct a composite well-being index is to take into account the interaction between well-being dimensions. Consider two well-being dimensions $i$ and $j$. If $\mu(\{i, j\})$ is greater (smaller) than the sum of $\mu(\{i\})$ and $\mu(\{j\})$, dimensions $i$ and $j$ are complementary (substitutive), whereas if $\mu(\{i, j\}) = \mu(\{i\}) + \mu(\{j\})$, they do not interact. For instance, if the health and education dimensions of the HDI are mutually strengthening (mutually weakening), then the weight given to the subset containing both dimensions should be larger (smaller) than the sum of the weights given to the subsets containing only health and only education. These two dimensions can, however, also join other subsets; an index of interaction between dimensions $i$ and $j$ should therefore take into account all the terms $\mu(T)$, $\mu(T \cup \{i\})$, $\mu(T \cup \{j\})$ and $\mu(T \cup \{i, j\})$ with $T \subseteq D \setminus \{i, j\}$. The average interaction index between the two dimensions $i$ and $j$ is calculated as follows (Murofushi & Soneda, 1993):

$$I(ij) = \sum_{T \subseteq D \setminus \{i, j\}} \frac{(d - t - 2)!\, t!}{(d - 1)!} \left[ \mu(T \cup \{i, j\}) - \mu(T \cup \{i\}) - \mu(T \cup \{j\}) + \mu(T) \right],$$

where $d = \mathrm{card}(D)$ and $t = \mathrm{card}(T)$ represent the cardinalities of $D$ and $T$, respectively. The quantity $I(ij)$ can be interpreted as a measure of the average marginal interaction between $i$ and $j$. An important property is that $I(ij) \in [-1, 1]$ for all $\{i, j\} \subseteq D$, with the value $1$ (respectively $-1$) corresponding to maximum complementarity (respectively substitutivity) between $i$ and $j$ (see Grabisch, 1997). In terms of the Möbius representation, the interaction index between the two dimensions $i$ and $j$ can be rewritten as

$$I(ij) = \sum_{T \subseteq D \setminus \{i, j\}} \frac{m(T \cup \{i, j\})}{t + 1}.$$
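The interaction index can be sketched the same way. The checks use the minimum capacity (each pair interacts at 0.5 when $d = 3$), the additive capacity (no interaction), and a hypothetical capacity with 0.1 on singletons and 0.4 on pairs (interaction 0.25); all three are illustrative rather than taken from the paper's tables.

```python
from itertools import combinations
from math import factorial


def interaction(mu, dims, i, j):
    """Murofushi-Soneda interaction index I(ij) of dimensions i and j under capacity mu."""
    d = len(dims)

    def value(S):
        return mu.get(frozenset(S), 0.0)  # the empty set has weight 0

    rest = [k for k in dims if k not in (i, j)]
    total = 0.0
    for t in range(len(rest) + 1):
        for T in combinations(rest, t):
            S = set(T)
            # second-order difference of mu when i and j join S
            delta = value(S | {i, j}) - value(S | {i}) - value(S | {j}) + value(S)
            total += factorial(d - t - 2) * factorial(t) / factorial(d - 1) * delta
    return total


dims = ("II", "HI", "EI")
subsets = [frozenset(T) for t in range(1, 4) for T in combinations(dims, t)]
min_cap = {T: float(len(T) == 3) for T in subsets}                    # minimum operator
additive = {T: len(T) / 3 for T in subsets}                            # arithmetic mean
hypothetical = {T: {1: 0.1, 2: 0.4, 3: 1.0}[len(T)] for T in subsets}  # illustrative only

for name, mu in [("min", min_cap), ("AM", additive), ("hypothetical", hypothetical)]:
    print(name, round(interaction(mu, dims, "II", "HI"), 3))
# expected: min 0.5, AM ~0, hypothetical 0.25
```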

Current Measurement of HDI and Data
The official HDI is obtained as the GM of three sub-indices (the health, education, and income indices), each of which is normalized to lie between 0 and 1 by setting minimum and maximum goalposts. The health index (HI) is $\mathrm{HI} = \frac{\mathrm{LE} - 20}{85 - 20}$, where LE is life expectancy at birth for a given country, and the minimum and maximum goalposts for LE are 20 and 85 years, respectively. The education index (EI) uses two indicators: mean years of schooling (MYS) for adults aged 25 years and older, and expected years of schooling (EYS) for children of school-entering age. The index values for MYS and EYS (MYSI and EYSI, respectively) use a minimum of zero and maxima of 15 and 18 years: $\mathrm{MYSI} = \frac{\mathrm{MYS} - 0}{15 - 0}$ and $\mathrm{EYSI} = \frac{\mathrm{EYS} - 0}{18 - 0}$. The two indices are then combined into $\mathrm{EI} = \frac{\mathrm{MYSI} + \mathrm{EYSI}}{2}$. The standard of living dimension, the income index (II), uses gross national income (GNI) per capita: $\mathrm{II} = \frac{\ln(\text{GNI per capita}) - \ln(100)}{\ln(75{,}000) - \ln(100)}$, where the minimum and maximum goalposts for GNI per capita are set to 100 and 75,000 US dollars, respectively. Finally, the official HDI is the GM of HI, EI and II, i.e., $\mathrm{HDI} = (\mathrm{HI} \cdot \mathrm{EI} \cdot \mathrm{II})^{1/3}$. 5 This paper uses the EI, HI, and II of the HDI for the years 2008 to 2018 to demonstrate the use of the CI methodology compared with the GM and AM methods. 6
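The normalization steps above translate directly into code. This sketch follows the goalposts stated in the text; it does not cap values that fall outside the goalposts (the text does not describe such capping), and the inputs for the example country are invented for illustration.

```python
import math


def health_index(le):
    """HI = (LE - 20) / (85 - 20)."""
    return (le - 20) / (85 - 20)


def education_index(mys, eys):
    """EI = (MYSI + EYSI) / 2 with MYSI = MYS / 15 and EYSI = EYS / 18."""
    return (mys / 15 + eys / 18) / 2


def income_index(gni_pc):
    """II = (ln GNI per capita - ln 100) / (ln 75,000 - ln 100)."""
    return (math.log(gni_pc) - math.log(100)) / (math.log(75_000) - math.log(100))


def hdi_gm(hi, ei, ii):
    """Geometric mean of the three sub-indices, as in the official HDI."""
    return (hi * ei * ii) ** (1 / 3)


# Hypothetical country (illustrative inputs, not UNDP data).
hi = health_index(75.0)
ei = education_index(10.0, 15.0)
ii = income_index(15_000)
print(round(hi, 3), round(ei, 3), round(ii, 3), round(hdi_gm(hi, ei, ii), 3))
```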

Comparison of Choquet Integral Aggregation with the GM and AM Methods
This subsection briefly compares the Choquet integral with the GM and AM methods to illustrate the usefulness of the Choquet integral. Depending on the interactions among the dimensions, the Choquet integral ranges between the minimum and maximum operators. To compare the aggregation methods, consider three hypothetical countries with the following achievements in (II, HI, EI): Country A = (0.200, 0.500, 0.800), Country B = (0.294, 0.340, 0.800), and Country C = (0.114, 0.700, 1.000). Table 1 reports the Choquet integral capacities, interaction indices and orness index for the minimum operator, four sets of Choquet integral capacities (CI1, CI2, CI3 and CI4), and the maximum operator for this illustrative example. The capacities in CI1 and CI2 treat the three dimensions as complements (the interaction indices are greater than 0), whereas CI3 and CI4 treat them as substitutes. The composite scores of the three countries with the GM are the same (i.e., 0.431), while their AM scores are 0.500, 0.478 and 0.605 for countries A, B and C, respectively. The most (least) balanced achievement across the three dimensions is obtained by country B (C), yet the GM assigns identical composite scores to all three countries. Moreover, even though country C has very unbalanced achievements across the three dimensions, its composite score is the highest with the AM because of its relatively high achievements in HI and EI. Fig. 1 shows the composite scores of countries A, B and C with the minimum and maximum operators (Min and Max, respectively) and the CI1-CI4 capacities.
The figure shows that, when the dimensions are treated as complements, country B (i.e., the country with the most balanced achievements across the three dimensions) obtains a higher composite score than countries A and C under the minimum operator and the CI1 and CI2 capacities, whereas country C obtains the highest score under CI3 and CI4, when the dimensions are treated as substitutes.
In sum, even though the three countries have varying achievements across the three dimensions, the GM assigns them the same composite score, and the AM ignores complementarity (substitutability) across the dimensions. The Choquet integral, by contrast, can capture different levels of interaction (complementarity or substitutability) among the dimensions and reflect them in the composite scores.
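The three-country comparison can be reproduced with the Python standard library (`statistics.geometric_mean` requires Python 3.8 or later). The capacity below, 0 on singletons and 0.2 on pairs, is a hypothetical, strongly complementary one and is not one of CI1-CI4 from Table 1; by the formulas of Sect. 2 its pairwise interaction index is 0.4 and its orness is 0.1.

```python
from itertools import combinations
from statistics import geometric_mean, mean

DIMS = ("II", "HI", "EI")
countries = {
    "A": {"II": 0.200, "HI": 0.500, "EI": 0.800},
    "B": {"II": 0.294, "HI": 0.340, "EI": 0.800},
    "C": {"II": 0.114, "HI": 0.700, "EI": 1.000},
}


def total_abs_deviation(x):
    """Sum of pairwise absolute gaps between dimensions: a simple imbalance measure."""
    return sum(abs(x[a] - x[b]) for a, b in combinations(DIMS, 2))


def choquet(x, mu):
    """Permutation form of the discrete Choquet integral."""
    order = sorted(DIMS, key=x.get)
    total, prev = 0.0, 0.0
    for k, dim in enumerate(order):
        total += (x[dim] - prev) * mu[frozenset(order[k:])]
        prev = x[dim]
    return total


# Hypothetical complementary capacity: 0 on singletons, 0.2 on pairs, 1 on the full set.
mu = {frozenset(T): (0.0, 0.2, 1.0)[r - 1] for r in (1, 2, 3) for T in combinations(DIMS, r)}

results = {}
for name, x in countries.items():
    v = [x[k] for k in DIMS]
    results[name] = {"GM": geometric_mean(v), "AM": mean(v), "Min": min(v),
                     "Max": max(v), "TAD": total_abs_deviation(x), "CI": choquet(x, mu)}
    print(name, {k: round(s, 3) for k, s in results[name].items()})
```

With these inputs the GM is 0.431 for all three countries and the AM is 0.500, 0.478 and 0.605, matching the values reported above, while the hypothetical CI gives 0.303 for B, 0.26 for A and 0.231 for C, so the most balanced country ranks first.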

Identification of Capacities for the Choquet Integral Aggregation
To use the CI methodology, one first needs to identify the capacity (i.e., the set of weights given to the subsets of dimensions). However, eliciting a representative capacity (a monotonic set of weights) for the CI method is non-trivial because of the complexity of the general identification problem. As the number of dimensions increases, it becomes harder for decision-makers (e.g., policymakers, the public, stakeholders) to provide weights for all subsets. Most identification methods proposed in the literature are formulated as optimization problems whose constraints are derived from the actual decision-makers' preferences. Grabisch et al. (2008) review the methods used to identify capacities, including maximum-split, minimum variance, minimum distance, and least-squares-based approaches. All these methods rely on collecting preferences from decision-makers through questionnaires, which is non-trivial and requires careful consideration. 7 Examples of how to elicit capacities representing decision-makers' preferences can be found in many articles, including Marichal and Roubens (2000), Meyer and Ponthière (2011), and Bottero et al. (2018), among others. For instance, Bertin et al. (2018) elicited the weights and other parameters with a computer-based nominal group technique designed to limit strongly dissenting valuations, generate an ex-post consensus, and mitigate potential expert-selection bias. They deemed that the experts lacked consensus if the inter-quartile range of the expert evaluations was 20 or higher, and repeated the stage in that case. The expert elicitation method adopted by Bertin et al. (2018) is a good way to reduce potential expert-selection bias if most of the selected experts have limited bias. However, if most of the selected experts are strongly biased, the method may not mitigate expert-selection bias but instead increase it, as the consensus weights would move closer to those chosen by the majority of the experts.
Since this paper's contribution is to provide examples of how the CI method can capture positive interactions that are not fully captured by the GM, it uses a set of illustrative capacities rather than expert elicitation to identify capacities for the CI aggregation. Given that UNDP policymakers aim to promote balanced achievements across dimensions (see page 15 of UNDP, 2010), and that pairs of dimensions work in synergy, decision-makers can use the CI to decide on the level of interaction across dimensions and choose the degree of penalty applied to unbalanced achievements relative to the balanced case. Furthermore, one can select different interactions for different pairs of dimensions, for example a relatively higher synergy between health and education but less synergy between income and education (or income and health).
To illustrate the meaning and use of different sets of capacities for the CI aggregation, we offer capacities (weights) that allow different degrees of positive interaction among dimensions while giving all dimensions equal importance (i.e., the Shapley value of each dimension is one third). In other words, the CI aggregation keeps equal importance for the three dimensions but, at the same time, allows a mutually strengthening effect between pairs of dimensions.
To use the CI aggregation to obtain HDI scores, one needs to identify six capacity coefficients, $\mu(\{\mathrm{II}\})$, $\mu(\{\mathrm{HI}\})$, $\mu(\{\mathrm{EI}\})$, $\mu(\{\mathrm{II}, \mathrm{HI}\})$, $\mu(\{\mathrm{II}, \mathrm{EI}\})$ and $\mu(\{\mathrm{HI}, \mathrm{EI}\})$, since $\mu(\emptyset) = 0$ and $\mu(D) = 1$ are fixed. Table 2 provides five alternative capacities (i.e., allocations of weights to the different subsets of the three dimensions), all of which satisfy the conditions above. Cases 1 and 2 (C1 and C2, respectively) are symmetric scenarios with positive interaction indices of 0.2 and 0.3, respectively, for every pair of dimensions. In cases 3, 4, and 5 (C3, C4, and C5, respectively), one pair of dimensions is more synergistic (i.e., has a higher positive interaction index) than the other two pairs (0.35 versus 0.15): C3, C4, and C5 allow higher synergy between II and HI, between II and EI, and between HI and EI, respectively. The orness indices of the five cases are 0.3 or lower; since they are below 0.5 and closer to zero, they indicate some degree of synergy (complementarity) among the dimensions. As a final option, we also use the minimum operator, which is a special case of the Choquet integral in which the interaction index of every pair of dimensions is 0.5 and the orness index is 0 (see the respective capacities, interaction indices and orness measure for the minimum operator in Table 2). Even though the minimum operator penalizes unbalanced achievements in the most extreme way, it does not allow varying interactions among the dimensions, as the interaction index of every pair is fixed at 0.5; the Choquet integral with other capacities does allow such variation.
8 These alternative capacities allow us to examine how unbalanced achievements across the three dimensions are reflected in the composite scores under each CI scenario, compared with the GM and AM.
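Capacities with prescribed Shapley values and interaction indices are easiest to construct in Möbius form and then convert back to the six capacity coefficients. The two cases below are hypothetical illustrations, not the entries of Table 2: a symmetric capacity with every pairwise interaction at 0.2, and one in which II and HI interact at 0.35 and the other pairs at 0.15. Both keep each Shapley value at one third, and all their Möbius values are non-negative, which guarantees monotonicity.

```python
from itertools import combinations

DIMS = ("II", "HI", "EI")
f = frozenset


def shapley_m(m):
    """Shapley value from Mobius values: sum of m(T)/|T| over subsets T containing i."""
    return {i: sum(v / len(T) for T, v in m.items() if i in T) for i in DIMS}


def interaction_m(m, i, j):
    """Interaction index: sum of m(T)/(|T| - 1) over subsets T containing both i and j."""
    return sum(v / (len(T) - 1) for T, v in m.items() if {i, j} <= T)


def orness_m(m):
    """Orness from Mobius values: 1/(d-1) * sum of (d - |T|)/(|T| + 1) * m(T)."""
    d = len(DIMS)
    return sum((d - len(T)) / (len(T) + 1) * v for T, v in m.items()) / (d - 1)


def capacity(m):
    """Recover the six free capacity coefficients: mu(S) = sum of m(T) over T subset of S."""
    return {f(S): sum(v for T, v in m.items() if T <= f(S))
            for r in (1, 2) for S in combinations(DIMS, r)}


# Hypothetical symmetric case: every pair interacts at 0.2.
sym = {f({"II"}): 2 / 15, f({"HI"}): 2 / 15, f({"EI"}): 2 / 15,
       f({"II", "HI"}): 0.2, f({"II", "EI"}): 0.2, f({"HI", "EI"}): 0.2}
# Hypothetical asymmetric case: II-HI interacts at 0.35, the other pairs at 0.15.
asym = {f({"II"}): 1 / 12, f({"HI"}): 1 / 12, f({"EI"}): 11 / 60,
        f({"II", "HI"}): 0.35, f({"II", "EI"}): 0.15, f({"HI", "EI"}): 0.15}
# Minimum operator: all Mobius mass on the full set.
minimum = {f(DIMS): 1.0}

for name, m in [("sym", sym), ("asym", asym), ("min", minimum)]:
    print(name,
          {i: round(v, 3) for i, v in shapley_m(m).items()},
          {a + "-" + b: round(interaction_m(m, a, b), 3) for a, b in combinations(DIMS, 2)},
          round(orness_m(m), 3))
print({"+".join(sorted(S)): round(v, 3) for S, v in capacity(sym).items()})
```

For these two cases the orness is 0.3 and about 0.283, and the minimum operator gives an interaction of 0.5 for every pair and an orness of 0.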
This paper uses a limited number of capacities (weights) so that the reader can follow how varying interactions among the dimensions are taken into account with the Choquet integral, where the AM and GM fail. However, there are infinitely many ways to express different degrees of complementarity. We therefore provide a robustness analysis in Sect. 3.6, where we simulate 500 capacities (weights) with interaction indices between pairs of dimensions between 0.1 and 0.3 and an orness index below 0.4, to show how HDI scores vary under alternative interaction and orness values.
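The text does not detail how the 500 capacities are drawn, so the following is only one possible scheme, stated as an assumption: draw each pairwise interaction uniformly in [0.1, 0.3], draw a joint Möbius term for the full set, derive the pairwise Möbius values from them, split the remaining mass over the singletons with a flat Dirichlet draw, and keep a draw only if its orness is below 0.4. The function names, the bound `max_joint` and the random seed are hypothetical. The resulting range of CI scores for a country is the kind of feasible range described in the introduction.

```python
import random
from itertools import combinations

DIMS = ("II", "HI", "EI")
PAIRS = list(combinations(DIMS, 2))


def sample_mobius(rng, i_low=0.1, i_high=0.3, max_joint=0.2):
    """Draw one hypothetical capacity, in Mobius form, with pairwise interactions in [i_low, i_high].

    For d = 3, I(ij) = m({i, j}) + m(D) / 2, so m({i, j}) = I(ij) - m(D) / 2.
    With max_joint <= 2 * i_low every Mobius value is non-negative, hence the capacity is monotone.
    """
    joint = rng.uniform(0.0, max_joint)
    m = {frozenset(p): rng.uniform(i_low, i_high) - joint / 2 for p in PAIRS}
    m[frozenset(DIMS)] = joint
    rest = 1.0 - sum(m.values())                  # mass left for the singletons
    draws = [rng.expovariate(1.0) for _ in DIMS]  # normalised exponentials: flat Dirichlet
    total = sum(draws)
    for dim, w in zip(DIMS, draws):
        m[frozenset({dim})] = rest * w / total
    return m


def orness_m(m):
    """Orness from Mobius values: 1/(d-1) * sum of (d - |T|)/(|T| + 1) * m(T)."""
    d = len(DIMS)
    return sum((d - len(T)) / (len(T) + 1) * v for T, v in m.items()) / (d - 1)


def choquet_m(x, m):
    """Mobius form of the CI: sum of m(T) times the smallest achievement in T."""
    return sum(v * min(x[i] for i in T) for T, v in m.items())


def simulate(n=500, max_orness=0.4, seed=0):
    rng = random.Random(seed)
    caps = []
    while len(caps) < n:
        m = sample_mobius(rng)
        if orness_m(m) < max_orness:  # rejection step on the orness bound
            caps.append(m)
    return caps


caps = simulate()
country_c = {"II": 0.114, "HI": 0.700, "EI": 1.000}
scores = [choquet_m(country_c, m) for m in caps]
print(f"CI range for country C over {len(caps)} draws: {min(scores):.3f} to {max(scores):.3f}")
```

Keeping `max_joint` at most twice the lower interaction bound keeps every Möbius value non-negative, so monotonicity never has to be checked separately; a different sampler (for example, one that also fixes the Shapley values) would need its own checks.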

Composite HDI Scores Obtained with the CI Cases, GM and AM
In this subsection, we briefly present some countries' composite achievements in 2018 when the GM, AM and the five CI cases are used for aggregation. Table 3 shows the achievements in II, EI and HI, together with the HDI scores obtained by aggregating the three dimensions with the GM, the AM and the five CI cases, for six countries (see Appendix Table 7 for the composite HDI scores of all countries in 2018 with the AM, GM, CI cases and minimum operator). To show how countries' unbalanced achievements across the three dimensions are reflected in the composite HDI scores, we also calculated the total absolute deviation across the three HDI dimensions for all countries (i.e., the sum of the absolute differences between the income and health indices, the income and education indices, and the health and education indices). 9 The higher the total absolute deviation, the more unbalanced a country's achievements across the three dimensions. Table 3 also reports the total absolute deviations for these countries. Despite differences in achievements across dimensions, both the GM and the AM of the HDI dimensions generate roughly the same HDI scores for Bulgaria, Mongolia, Trinidad and Tobago, Suriname, Sierra Leone and Micronesia. Even though some countries had relatively weaker performance in one or two dimensions in 2018 (e.g., Trinidad and Tobago and Suriname in education, and Sierra Leone and Micronesia in the income and education dimensions), the GM and AM aggregation methods assign roughly the same composite score to these countries.
9 Total absolute deviation across dimensions is obtained by |II − HI| + |II − EI| + |HI − EI|.
Table 3 HDI scores with the GM and AM (HDI-GM and HDI-AM), five Choquet integral cases and minimum operator for a set of countries
Table 3 provides only some country examples, but across the whole set of countries the average absolute difference between HDI scores obtained with the GM and AM methods is also small, at 0.006.
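As a minimal sketch of the total absolute deviation (footnote 9) and of why AM and GM scores stay close, the code below computes both for two hypothetical (II, HI, EI) profiles that are not taken from Table 3. With three indices, the total absolute deviation always equals twice the range (maximum minus minimum), which the code checks. For the moderately unbalanced profile, the total absolute deviation is 0.4, while the AM exceeds the GM by only about 0.005.

```python
import math


def total_absolute_deviation(ii, hi, ei):
    """Total absolute deviation across the three HDI dimension indices (footnote 9)."""
    return abs(ii - hi) + abs(ii - ei) + abs(hi - ei)


def arithmetic_mean(*x):
    return sum(x) / len(x)


def geometric_mean(*x):
    return math.prod(x) ** (1 / len(x))


# Hypothetical (II, HI, EI) profiles, chosen for illustration only.
profiles = {
    "balanced": (0.75, 0.75, 0.75),
    "moderately unbalanced": (0.80, 0.75, 0.60),
}

for name, x in profiles.items():
    tad = total_absolute_deviation(*x)
    # For three indices the total absolute deviation equals twice the range.
    assert math.isclose(tad, 2 * (max(x) - min(x)), abs_tol=1e-12)
    gap = arithmetic_mean(*x) - geometric_mean(*x)
    print(f"{name}: TAD = {tad:.3f}, AM - GM = {gap:.4f}")
```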
If UNDP policymakers were trying to avoid perfect substitution across dimensions (and to promote balanced achievement across dimensions), using either the GM or the AM would not serve this aim, as both methods mostly result in similar HDI scores for most countries around the world. More importantly, neither the AM nor the GM considers all possible interactions among the three HDI dimensions. This is, however, possible with the CI method. For instance, the AM and GM aggregation methods give the same composite scores to Bulgaria and Mongolia even though their achievements vary across the three dimensions. Yet even this relatively small variation is reflected in the composite scores obtained with C1 and C2, both of which are lower than those obtained with the AM. 10 Moreover, among these six countries the total absolute deviation across the three dimensions is the highest for Micronesia and the lowest for Bulgaria (0.380 vs. 0.094), yet for each of these two countries the GM and AM scores are roughly similar. This difference is, however, reflected in the C2 aggregation: the composite score obtained with C2 is 0.012 lower than the AM score for Bulgaria, but 0.049 lower for Micronesia. In other words, aggregation with C2 reflects the unbalanced achievements across the three dimensions in the composite score proportionately and penalizes countries with relatively more unbalanced achievements. Similarly, the minimum operator gives relatively lower scores to countries with unbalanced achievements across the three dimensions. Furthermore, balanced achievements within a particular pair of dimensions can also be taken into account with the CI, which is not the case with the GM.
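The penalty that a C2-type capacity imposes relative to the AM can be illustrated with a symmetric capacity. The sketch assumes v(singleton) = 0.1 and v(pair) = 0.3, which gives a pairwise interaction index of 0.3 and an orness of 0.2 as stated for C2, but is not necessarily the C2 capacity in Table 2; the two profiles are hypothetical and share the same AM (about 0.733). With the Choquet integral written in its sorted form, the AM-CI gap is about 0.019 for the fairly balanced profile and about 0.083 for the unbalanced one, whereas the AM-GM gap is only about 0.0004 and 0.010, respectively.

```python
import math


def choquet_symmetric(x, a, b):
    """Discrete Choquet integral of three scores w.r.t. a symmetric capacity
    with v(singleton) = a, v(pair) = b and v(all three) = 1.

    With x sorted ascending, C = x1 + b * (x2 - x1) + a * (x3 - x2).
    """
    x1, x2, x3 = sorted(x)
    return x1 + b * (x2 - x1) + a * (x3 - x2)


def am(x):
    return sum(x) / len(x)


def gm(x):
    return math.prod(x) ** (1 / len(x))


# Hypothetical C2-like capacity: pairwise interaction 0.3 and orness 0.2.
A, B = 0.1, 0.3

# Two hypothetical (II, HI, EI) profiles with the same arithmetic mean.
profiles = {"fairly balanced": (0.76, 0.74, 0.70), "unbalanced": (0.90, 0.70, 0.60)}

for name, x in profiles.items():
    print(f"{name}: AM - GM = {am(x) - gm(x):.4f}, "
          f"AM - CI = {am(x) - choquet_symmetric(x, A, B):.4f}")
```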
For instance, among the CI cases, Trinidad and Tobago obtains its highest composite score with C3. This is expected, as the interaction between II and HI is the highest in this case (see Table 2, where the interaction index between II and HI is 0.35) and Trinidad and Tobago's achievements in II and HI are relatively close to each other. Similarly, Sierra Leone and Micronesia had relatively balanced achievements in II and EI, and therefore the composite scores allocated to these countries are relatively higher with C4, where the interaction between II and EI is relatively higher. On the other hand, the minimum operator penalizes unbalanced achievements across the three dimensions but does not differentiate balanced achievements within pairs of dimensions, which C3 captures for Trinidad and Tobago and C4 for Sierra Leone and Micronesia. Overall, whereas the GM aggregation does not distinguish the pair of dimensions in which a country's achievements are relatively balanced, aggregation with the CI methodology allows one to allocate higher interaction indices to different pairs.
10 The composite HDI scores for each Choquet integral aggregation case are obtained by using the capacities (weights) from Table 1.
To show that the CI aggregation penalizes the unbalanced nature of achievements across the three dimensions better than the GM aggregation method, Fig. 2 plots the differences between HDI scores obtained with the AM and HDI scores obtained with the GM, CI cases 1 and 2 and the minimum operator (labelled GM, CI Case 1, CI Case 2 and Min, respectively) (y-axis) against the total absolute deviation across the three dimensions (x-axis). Figure 2 aims to highlight how unbalanced achievements across the three dimensions are reflected in the composite scores obtained with CI cases 1 and 2 and the minimum operator, compared with the GM. If a method reflects unbalanced achievements, its HDI scores should fall further below those obtained with the AM as the total absolute deviation increases. Figure 2 shows that the GM and AM generate roughly the same HDI scores up to a total absolute deviation of 0.4. Since the three HDI dimension indices (II, EI and HI) range between 0 and 1, a total absolute deviation of 0.4 is relatively high, suggesting that the GM fails to reflect poor performance in some dimensions in the composite HDI for most countries. In contrast, there is a clear linear relationship between the total absolute deviation and the HDI score differences for the CI methods and the minimum operator. In other words, as the total absolute deviation increases (i.e., as achievements in the three dimensions become more unbalanced), the differences in HDI scores obtained with the CI and the minimum operator increase relative to the GM. Hence, relatively unbalanced achievements across the three dimensions are captured more clearly with the CI and the minimum operator than with the GM.
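A short derivation suggests why the AM-CI gap grows roughly linearly with the total absolute deviation (TAD), while the AM-GM gap stays small. It is an illustrative sketch, not a result reported in this paper, and it assumes a symmetric capacity (as in C1, C2 and the minimum operator) with $v(\{i\}) = a$ for each single dimension, $v(\{i,j\}) = b$ for each pair and $v(\{\mathrm{II},\mathrm{HI},\mathrm{EI}\}) = 1$. Write $x_{(1)} \le x_{(2)} \le x_{(3)}$ for the sorted dimension indices, $d_1 = x_{(2)} - x_{(1)}$ and $d_2 = x_{(3)} - x_{(2)}$; the Choquet integral then adds each increment weighted by the capacity of the dimensions at or above it.

```latex
\begin{aligned}
\mathrm{CI}_v(x) &= x_{(1)} + b\,d_1 + a\,d_2,
\qquad
\mathrm{AM}(x) = x_{(1)} + \tfrac{2}{3}\,d_1 + \tfrac{1}{3}\,d_2,\\[2pt]
\mathrm{AM}(x) - \mathrm{CI}_v(x) &= \left(\tfrac{2}{3} - b\right) d_1 + \left(\tfrac{1}{3} - a\right) d_2,\\[2pt]
\mathrm{TAD}(x) &= |\mathrm{II}-\mathrm{HI}| + |\mathrm{II}-\mathrm{EI}| + |\mathrm{HI}-\mathrm{EI}| = 2\,(d_1 + d_2),\\[2pt]
\tfrac{1}{2}\min\!\left(\tfrac{2}{3} - b,\ \tfrac{1}{3} - a\right)\mathrm{TAD}(x)
&\le \mathrm{AM}(x) - \mathrm{CI}_v(x)
\le \tfrac{1}{2}\max\!\left(\tfrac{2}{3} - b,\ \tfrac{1}{3} - a\right)\mathrm{TAD}(x),\\[2pt]
\mathrm{AM}(x) - \mathrm{GM}(x) &\approx \frac{1}{6\mu}\sum_{k=1}^{3}\varepsilon_k^{2},
\qquad x_k = \mu + \varepsilon_k,\ \ \mu = \mathrm{AM}(x),\ \ \textstyle\sum_k \varepsilon_k = 0.
\end{aligned}
```

The AM-CI gap is therefore first order in the deviations (bounded between two linear functions of the TAD), whereas for small deviations the AM-GM gap is second order, which is consistent with the pattern described for Fig. 2. For the minimum operator ($a = b = 0$) the bounds become TAD/6 and TAD/3.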
Note that we also compare CI cases 3, 4 and 5 with the GM and the minimum operator, where one pair of dimensions has a higher positive interaction than the other pairs. In those cases, as achievements in the highly interacting pair become more unbalanced, the CI method gives relatively lower HDI scores than the GM. Furthermore, CI cases 3, 4 and 5 systematically penalize unbalanced achievements between II and HI, II and EI, and HI and EI, respectively, compared with the GM and the minimum operator (see Appendix Figs. 8, 9 and 10 for each scenario). 11 In other words, even though the minimum operator penalizes unbalanced achievements at the most extreme level, varying interactions among the HDI dimensions are better captured with the Choquet integral.
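Capacities in which one pair of dimensions is more synergic than the others, as in C3 to C5, can be written directly in Möbius form. The sketch below is a hypothetical construction, not the Table 2 capacities: `pair_synergy_mobius` places a mass t = 0.2 on the full set, sets each pair's mass to its target interaction index minus t/2 (0.35 for the favoured pair, 0.15 otherwise), and chooses the singleton masses so that every Shapley value is 1/3 (the implied orness is about 0.283). For an illustrative profile in which II and HI are close and EI is lower, the C3-like capacity gives the highest score (about 0.715, against about 0.702 for C4-like and 0.704 for C5-like), mirroring the pattern described for Trinidad and Tobago.

```python
from itertools import combinations

DIMS = ("II", "HI", "EI")


def pair_synergy_mobius(strong_pair, strong=0.35, weak=0.15, t=0.2):
    """Hypothetical Möbius masses with interaction `strong` for one pair,
    `weak` for the other two pairs, and equal Shapley values of 1/3.

    t is the mass on {II, HI, EI}; for three dimensions the interaction of a
    pair equals m(pair) + t / 2, and the Shapley value of i equals
    m({i}) + (masses of pairs containing i) / 2 + t / 3.
    """
    m = {frozenset(DIMS): t}
    for pair in combinations(DIMS, 2):
        target = strong if set(pair) == set(strong_pair) else weak
        m[frozenset(pair)] = target - t / 2
    for i in DIMS:
        pair_mass = sum(m[frozenset((i, j))] for j in DIMS if j != i)
        m[frozenset((i,))] = 1 / 3 - pair_mass / 2 - t / 3
    return m


def choquet_mobius(x, m):
    """Choquet integral via the Möbius form: sum over A of m(A) * min_{i in A} x_i."""
    return sum(mass * min(x[i] for i in A) for A, mass in m.items())


# Hypothetical profile: II and HI close together, EI clearly lower.
x = {"II": 0.80, "HI": 0.78, "EI": 0.65}
cases = {"C3-like": ("II", "HI"), "C4-like": ("II", "EI"), "C5-like": ("HI", "EI")}
for name, pair in cases.items():
    print(name, round(choquet_mobius(x, pair_synergy_mobius(pair)), 4))
```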

Ranking Analysis
The rank correlation coefficients between HDI scores obtained with the GM, AM, minimum operator and CI methods are positive and high. 12 This was expected, given the positive and high correlation coefficients among the three HDI dimensions. However, despite the high correlation between rankings, there are clear rank differences, and rankings are sensitive to the method used. Figures 3, 4 and 5 show scatter plots of HDI ranks with the AM, CI case 1 and CI case 2, respectively, against HDI ranks with the GM. The AM and GM methods generate similar HDI ranking positions for most countries (Fig. 3). However, HDI ranks become more sensitive when positive interactions between pairs of dimensions are allowed (Figs. 4 and 5). In particular, rankings are more sensitive when there is a higher degree of interaction among the three dimensions (i.e., the comparison of CI case 2 and GM HDI ranks in Fig. 5 shows more variation than the comparisons in Figs. 3 and 4). Table 4 lists countries that rank in higher positions with CI case 2 than with the GM. 13 For example, South Africa is ranked 97th with the CI versus 113th with the GM. This is because of South Africa's relatively well-rounded achievements across dimensions (II = 0.720; HI = 0.675; EI = 0.721), which allow it to surpass countries with relatively unbalanced achievements. One of these surpassed countries is Dominica, which ranks 100th with the GM despite its unbalanced achievements, owing to its better performance in HI (II = 0.684, HI = 0.894 and EI = 0.620). In other words, the GM allows countries to achieve higher positions in the rankings through relatively good achievements in just one dimension. The CI method, on the other hand, prevents this and rewards countries with more balanced achievements across dimensions with higher positions in the HDI rankings. Table 5 shows a sample of countries that rank lower with the CI method than with the GM because of unbalanced achievements across dimensions. Examples include Kuwait, Qatar, Brunei and Singapore, which have top achievements in income but much lower achievements in education. Other examples in Table 5 include countries with relatively high life expectancy and therefore high HI (e.g., Lebanon, Maldives, Bhutan, Cuba, Dominica) but relatively lower achievements in the other two dimensions, and countries with relatively well-rounded achievements in income and health (e.g., Hong Kong and Andorra) but relatively poor achievements in education. 14

Fig. 5 Scatter plots of CI case 2 HDI ranks vs. GM HDI ranks (axes: CI case 2 HDI rank, GM HDI rank)

13 In the remaining part of the paper, we present comparisons between HDI scores obtained with CI case 2 and those obtained with the GM and AM methods.
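The rank reversal between South Africa and Dominica can be reproduced qualitatively from the dimension indices quoted in the ranking discussion. The sketch assumes a hypothetical symmetric capacity with v(singleton) = 0.1 and v(pair) = 0.3 (pairwise interaction 0.3, orness 0.2), which need not match C2 in Table 2, so the scores differ from those reported in the paper. Under this assumption, Dominica leads under the GM (approximately 0.724 vs 0.705), but South Africa leads under the CI (approximately 0.689 vs 0.660).

```python
import math


def choquet_symmetric(x, a, b):
    """Choquet integral of three scores for a symmetric capacity
    (v = a on single dimensions, b on pairs, 1 on all three)."""
    x1, x2, x3 = sorted(x)
    return x1 + b * (x2 - x1) + a * (x3 - x2)


def gm(x):
    return math.prod(x) ** (1 / len(x))


# (II, HI, EI) values for 2018 as quoted in the text.
south_africa = (0.720, 0.675, 0.721)
dominica = (0.684, 0.894, 0.620)

# Hypothetical C2-like capacity (interaction 0.3, orness 0.2).
a, b = 0.1, 0.3

for name, x in [("South Africa", south_africa), ("Dominica", dominica)]:
    print(f"{name}: GM = {gm(x):.3f}, CI = {choquet_symmetric(x, a, b):.3f}")
```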

Implications for Policy-Making and Temporal Improvements in HDI Dimensions
One of the most important purposes of multidimensional well-being indices such as the HDI is to inform governments about how they may improve their citizens' well-being. The UNDP aimed to promote balanced achievements across HDI dimensions. It was with this purpose that the UNDP moved from an AM to a GM aggregation method, as explained in its 2010 report: "Poor performance in any dimension is now directly reflected in the HDI, and there is no longer perfect substitutability across dimensions. This method [GM] captures how well rounded a country's performance is across the three dimensions are…we should not let changes in any of them go unnoticed" (UNDP, 2010, p.15). However, as demonstrated in Sect. 3.4 and exemplified in Table 5, countries with unbalanced achievements across dimensions can have composite HDI scores calculated with the GM that are similar to those calculated with the AM, suggesting that the move from the AM to the GM has not been successful at addressing the UNDP's original aim. Indeed, many countries have similar composite scores with the GM and AM despite having unbalanced achievements across dimensions (Fig. 2). In contrast to the GM, the CI method meets the aims of the 2010 UNDP report by accounting for balanced and unbalanced achievements across HDI dimensions and allowing different degrees of interaction across dimensions. When the CI method is used, countries with unbalanced achievements across dimensions receive lower composite scores and lower rankings than with the GM and AM methods. Similarly, the CI method rewards balanced achievements across dimensions by ranking those countries higher.
In this subsection, we further explore temporal changes in achievements in the three HDI dimensions to examine whether these changes resulted in more well-rounded achievements and how they were reflected in the HDI scores calculated with the GM and CI methods. 15 Out of 186 countries, 152, 182 and 175 experienced improvements in their income, health and education dimensions, respectively, between 2008 and 2018.
14 With the third, fourth and fifth cases of the CI, we allow higher positive interaction levels between II and HI, II and EI, and HI and EI, respectively. Therefore, the larger (smaller) the differences in a country's achievements in the respective pair of dimensions, the lower (higher) its ranking with the CI aggregation compared with the GM. The countries that rank in higher (lower) positions with Choquet integral cases 3, 4 and 5 and the minimum operator compared with the GM are presented in Tables 8, 9, 10 and 11, respectively. The minimum operator takes into account unbalanced achievements across the three dimensions similarly to Choquet integral cases 1 and 2. However, Choquet integral cases 3, 4 and 5 reward (penalize) balanced (unbalanced) achievements in II and HI, II and EI, and HI and EI, respectively, better than the minimum operator.
15 To preserve space, we only present a comparison between the CI case 2 and GM aggregation methods; comparisons between the GM and CI cases 3, 4 and 5 and the minimum operator are presented in Tables 12, 13, 14 and 15, respectively.
Table 6 shows 10 example countries that experienced improvements in their income, life expectancy and educational outcomes. In five of these countries (i.e., Malawi, Tanzania, Zambia, Uganda and Mozambique), however, these improvements increased the total absolute deviation across the three HDI dimensions, whereas in the remaining five (i.e., Eswatini, Saudi Arabia, Lesotho, Turkey and Angola) they decreased it. Improvements across dimensions were reflected in composite scores obtained with both the GM and CI methods (i.e., composite scores for all these countries were higher in 2018 than in 2008 regardless of the aggregation method), but for some countries the composite score improvements with the GM were higher (or lower) than those obtained with the CI. For instance, composite score improvements with the GM were higher for Malawi, Tanzania, Zambia, Uganda and Mozambique, where aggregate improvements were mainly driven by health. These countries experienced improvements in all dimensions by 2018, and these improvements were reflected positively in composite scores regardless of the method. However, the improvements did not move towards balanced achievements across dimensions, but rather the opposite: the differences in achievements across the three dimensions increased. These larger differences were reflected in smaller improvements in composite scores calculated with the CI, but not in those calculated with the GM, indicating that the CI aggregation method is successful at reflecting the unbalanced nature of improvements across the HDI dimensions. The CI method is, again, successful at rewarding balanced achievements across dimensions.
For example, the increases in the composite scores of Eswatini, Saudi Arabia, Lesotho, Turkey and Angola from 2008 to 2018 (Table 6) were higher with the CI method than with the GM, because improvements across the three dimensions in these countries moved towards balanced achievements. Health was the lowest-achieved dimension for Eswatini and Lesotho in 2008, but it was also the dimension that improved the most between 2008 and 2018. A similar scenario took place in Saudi Arabia and Turkey, where education was the lowest-achieved dimension in 2008 and these countries experienced major improvements in this dimension between 2008 and 2018. In these countries, improvements across all dimensions resulted in a more balanced set of achievements. Hence, increases in composite scores obtained with the CI were higher than those obtained with the GM. Figure 6 compares changes in HDI scores from 2008 to 2018 to illustrate the differences between the CI and GM methods. It plots the difference between ΔHDI-CI (i.e., the change in HDI scores between 2008 and 2018 with the CI) and ΔHDI-GM (i.e., the change in HDI scores between 2008 and 2018 with the GM) (y-axis) against the change in the total absolute deviation across dimensions over the same period (x-axis). A positive (or negative) change in the total absolute deviation indicates that a country's achievements across dimensions became more unbalanced (or balanced) over time.
If improvements across dimensions move towards balanced achievements (i.e., negative values on the x-axis), the change in HDI scores over time with the CI (ΔHDI-CI) is greater than that with the GM (ΔHDI-GM) (upper-left quadrant of the figure). If improvements tend towards unbalanced achievements across dimensions, composite score changes with the CI are smaller than those with the GM (lower-right quadrant of the figure). In other words, the CI method encourages (or discourages) balanced (or unbalanced) achievements across dimensions by rewarding them with relatively larger (or smaller) composite score improvements. This can give valuable information to governments and help promote balanced achievements. Even if the aggregate improvement across dimensions is similar for two countries, the improvement over time in the composite score calculated with the CI method will be higher for the country that works towards balancing achievements across dimensions. 16
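A hypothetical two-country example illustrates the quadrants of Fig. 6. Both profiles below are invented for illustration (they are not the Table 6 data), and both improve in every dimension between 2008 and 2018; the CI uses an assumed symmetric capacity with v(singleton) = 0.1 and v(pair) = 0.3. The country whose gains are concentrated in health becomes more unbalanced (change in total absolute deviation of +0.22), and its CI gain falls short of its GM gain by about 0.019, placing it in the lower-right quadrant. The country catching up in its weakest dimension becomes more balanced (-0.32), and its CI gain exceeds its GM gain by about 0.050 (upper-left quadrant).

```python
import math


def choquet_symmetric(x, a=0.1, b=0.3):
    """Choquet integral for a hypothetical symmetric C2-like capacity."""
    x1, x2, x3 = sorted(x)
    return x1 + b * (x2 - x1) + a * (x3 - x2)


def gm(x):
    return math.prod(x) ** (1 / len(x))


def tad(x):
    """Total absolute deviation |II - HI| + |II - EI| + |HI - EI|."""
    ii, hi, ei = x
    return abs(ii - hi) + abs(ii - ei) + abs(hi - ei)


# Hypothetical (II, HI, EI) profiles in 2008 and 2018, not taken from Table 6.
countries = {
    "health-led, more unbalanced": ((0.45, 0.50, 0.40), (0.48, 0.65, 0.44)),
    "catching up in health": ((0.60, 0.40, 0.55), (0.62, 0.58, 0.58)),
}

for name, (x08, x18) in countries.items():
    d_tad = tad(x18) - tad(x08)
    d_gm = gm(x18) - gm(x08)
    d_ci = choquet_symmetric(x18) - choquet_symmetric(x08)
    # Fig. 6 coordinates: x = change in TAD, y = dHDI-CI - dHDI-GM.
    print(f"{name}: dTAD = {d_tad:+.3f}, dCI - dGM = {d_ci - d_gm:+.3f}")
```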

Robustness Analysis: Obtaining a Feasible Range of HDI Scores When Interaction Levels Vary
Throughout the paper, we used five fixed sets of weights to illustrate how different degrees of positive interaction among the three HDI dimensions can be taken into account, and to show how these weight sets reflect in the composite index how well rounded achievements across the three HDI dimensions are. In this subsection, to check the robustness of the analysis, rather than relying on five hypothetical weight sets, we simulate 500 weight sets that allow a wide range of positive interactions between pairs of dimensions while keeping the relative importance of the three dimensions at a similar level (i.e., roughly equal Shapley values for all dimensions). To obtain the 500 simulated weight sets, we impose the following constraints: (i) the relative importance of each dimension is roughly one-third (i.e., the Shapley values of II, HI and EI take values of 0.33 ± 0.01); (ii) the orness index is less than 0.4; (iii) all Möbius values, except that of the empty set, are positive; (iv) the interaction indices between pairs of II, HI and EI range from 0.1 to 0.3. The first condition suggests that the simulated weight choices allow an overall degree of …
16 Table 16 provides the correlation matrix between the changes from 2008 to 2018 in the differences between II and HI, II and EI, and HI and EI, and in the total absolute deviation (Δ|II-HI|, Δ|II-EI|, Δ|HI-EI| and Δ Absolute deviation, respectively), and the differences between ΔHDI-GM (i.e., changes in HDI scores between 2008 and 2018 with the GM) and ΔHDI-CI (i.e., changes in HDI scores between 2008 and 2018 with the CI cases) and the minimum operator. All of the correlation coefficients are positive and significant, suggesting that differences in HDI scores between 2018 and 2008 are higher (lower) with the GM than with the Choquet integral and minimum operator when achievements become more unbalanced.
Based on the above constraints, we simulate a total of 500 weight sets that satisfy all of them using the Kappalab package (Grabisch et al., 2006, 2008). The software is freely distributed and can be downloaded from the Comprehensive R Archive Network (http://cran.r-project.org). 17 The 500 simulated weight sets are used to obtain 500 composite HDI scores for each country for 2018 (see Appendix Table 17 for the composite HDI scores of the countries in 2018 with the AM and GM, and the minimum, maximum and median HDI scores obtained with the CI aggregation using the 500 simulated weight sets). This exercise also yields a feasible range of HDI scores for each country (the minimum and maximum HDI scores presented in Table 17) when the interaction levels among pairs of dimensions vary between two values that can be chosen by decision-makers. As with the five hypothetical cases, to demonstrate that the CI aggregation penalizes unbalanced achievements across the three dimensions better than the GM aggregation method, Fig. 7 offers scatter plots of the differences between HDI scores obtained with the AM and HDI scores obtained with the GM and the minimum, median and maximum CI scores across the 500 weight sets (labelled GM, Min CI, Median CI and Max CI, respectively) (y-axis) at different levels of total absolute deviation across the three dimensions (x-axis). As discussed earlier, the GM and AM generate roughly the same HDI scores up to a total absolute deviation of 0.4.
Fig. 7 Differences in HDI scores obtained with the AM (AM) and HDI scores obtained with the GM (GM), minimum, median and maximum HDI scores with the CI aggregation using simulated 500 weight sets (Min CI, Median CI, Max CI) at different levels of total absolute deviation across the three dimensions
However, as achievements across the three dimensions become more unbalanced, the HDI scores obtained with the CI fall further below those obtained with the GM. Overall, when we allow the interaction indices of pairs of dimensions to vary between 0.1 and 0.3 and obtain 500 composite HDI scores for each country, the results obtained with the five hypothetical weight sets continue to hold: the CI aggregation method reflects unbalanced achievements across the HDI dimensions in the composite scores.
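A pure-Python sketch of the simulation step is given below for readers without R. It is not the Kappalab routine used in this paper, and its sampling distribution differs, so it will not reproduce Table 17; all function names are hypothetical. It draws the three pairwise interaction indices uniformly in [0.1, 0.3], draws a positive mass t on the full set, and derives pair and singleton Möbius masses so that every Shapley value equals exactly 1/3 (a stricter version of constraint (i)); draws with a non-positive mass or an orness of 0.4 or more are rejected. The orness is computed from the three-dimension identity orness = 0.5 − (mean pairwise interaction). The minimum, median and maximum of the 500 resulting CI scores for one profile (here Dominica's 2018 indices, as quoted in the ranking analysis) give a feasible range analogous to the Min-CI, Median-CI and Max-CI columns.

```python
import random
import statistics
from itertools import combinations

DIMS = ("II", "HI", "EI")
PAIRS = list(combinations(DIMS, 2))


def sample_mobius(rng, lo=0.1, hi=0.3):
    """Draw one hypothetical capacity as Möbius masses with pairwise interaction
    indices in [lo, hi] and Shapley values of exactly 1/3."""
    inter = {p: rng.uniform(lo, hi) for p in PAIRS}
    t = rng.uniform(0.0, 2 * min(inter.values()))  # Möbius mass on {II, HI, EI}
    m = {frozenset(DIMS): t}
    for p in PAIRS:
        # For three dimensions: interaction of a pair = m(pair) + t / 2.
        m[frozenset(p)] = inter[p] - t / 2
    for i in DIMS:
        pair_mass = sum(m[frozenset(p)] for p in PAIRS if i in p)
        # Shapley value = m({i}) + (masses of pairs containing i) / 2 + t / 3 = 1/3.
        m[frozenset((i,))] = 1 / 3 - pair_mass / 2 - t / 3
    return m, inter


def orness(inter):
    """For three dimensions, orness = 0.5 - mean pairwise interaction index."""
    return 0.5 - sum(inter.values()) / len(inter)


def choquet_mobius(x, m):
    """Choquet integral via the Möbius form: sum over A of m(A) * min_{i in A} x_i."""
    return sum(mass * min(x[i] for i in A) for A, mass in m.items())


def simulate(n_sets=500, seed=1):
    """Rejection sampling until n_sets capacities satisfy all constraints."""
    rng = random.Random(seed)
    sets = []
    while len(sets) < n_sets:
        m, inter = sample_mobius(rng)
        if all(v > 0 for v in m.values()) and orness(inter) < 0.4:
            sets.append(m)
    return sets


capacities = simulate()
x = {"II": 0.684, "HI": 0.894, "EI": 0.620}  # Dominica, 2018 (quoted in the text)
scores = [choquet_mobius(x, m) for m in capacities]
print(f"min {min(scores):.3f}, median {statistics.median(scores):.3f}, "
      f"max {max(scores):.3f}")
```

Because all Möbius masses are non-negative and every Shapley value is 1/3, each simulated CI score lies between the lowest dimension index and the AM: each term m(A) times the minimum over A is at most m(A) times the average over A, and these averages add up to the Shapley-weighted mean.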

Conclusions
Most well-being and sustainability composite indices are based on either arithmetic or geometric weighted averages of sub-dimensions (i.e., the AM or GM when the indicators making up the composite index are given equal weights). When weighted averages are used to aggregate indicators, decision-makers (e.g., policymakers, index users) can choose the importance of the indicators that make up the composite index. For instance, the OECD's Better Life Index has a user-friendly interactive website that enables users to decide on the relative importance (weights) of all indicators to produce a composite index. However, one of the main issues with composite indices based on arithmetic and geometric weighted averages is that these methods do not consider potential interactions among the indicators. Decision-makers often assume that certain pairs of indicators have a mutually weakening (mutually strengthening) effect and can substitute (complement) each other. The methodology presented in this article (CI) offers a flexible approach that allows these potential interactions to be considered, with benefits for decision-making, including the possibility of capturing different levels of interaction across pairs of well-being indicators.
The HDI, one of the best-known composite indices, was analyzed in this paper to illustrate the use of the CI as an aggregator that considers potential interactions among well-being indicators. Prior to 2010, HDI scores were officially calculated as the AM of the three HDI dimensions (income, health and education). The UNDP changed its aggregation method to the GM in 2010 to promote well-rounded achievements across dimensions and to acknowledge the literature on the mutual strengthening of these three dimensions. In this paper, rather than using expert elicitation (the method used by most of the existing literature on CI aggregation) to identify the weights, we rely on the theoretical and empirical literature and use five hypothetical weight sets that allow synergies among the three HDI dimensions, thereby avoiding potential expert-selection bias. We also simulated 500 weight sets that allow different levels of interaction between pairs of HDI dimensions to demonstrate the use of CI aggregation.
Overall, this article demonstrates that: (1) the GM adopted in 2010 results in HDI scores that are similar to those generated by the AM; (2) the GM allocates similar HDI scores to countries that show relatively large variation in achievements across dimensions, suggesting that it is not successful at accounting for positive interactions among dimensions; (3) the use of the GM as an aggregation method does not promote well-rounded performance across the three HDI dimensions, because countries can still improve their composite scores without addressing unbalanced achievements.
The CI aggregation method offers several benefits over the AM and GM methods, including: (1) accounting for different degrees of positive interaction (i.e., mutually strengthening effects) among pairs of dimensions (Sect. 3.3); (2) recognizing unbalanced achievements across dimensions by generating lower composite scores, hence encouraging countries to work towards balanced achievements in education, health and income (in the case of the HDI); (3) allowing decision-makers to identify poor performance in particular dimensions (e.g., Table 5 includes countries that rank in high positions with the GM because of good achievements in only one or two dimensions but poor achievements in the third; the same countries rank in lower positions with the CI because this method measures how well rounded achievements across dimensions are); (4) revealing whether temporal changes in aggregate achievement levels were in the direction of balanced achievements, which temporal changes in HDI scores based on the GM (or AM) do not reveal (Sect. 3.6); (5) showing that the minimum operator, like CI cases 1 and 2, also penalizes unbalanced achievements across the three dimensions, but does not allow for varying positive interactions among pairs of dimensions, which is possible with the CI method (see Appendix Tables 8, 9, 10, 11, 12, 13, 14, 15 and 16 for comparisons between CI cases 3, 4 and 5 and the minimum operator); (6) obtaining a feasible range of HDI scores for countries (the minimum and maximum HDI scores presented in Table 17) when interaction levels among pairs of dimensions are allowed to vary between two values chosen by decision-makers.
The analyses presented in this paper concentrate on positive interactions across the three dimensions, based on the intention of UNDP policymakers (2010 report) and the existing literature. The Choquet methodology, however, allows multiple choices to reflect the preferences of policymakers and the public. This flexibility has important benefits because it offers the possibility of adopting different sets of constraints, including a variety of interaction levels across pairs of dimensions (e.g., by considering some well-being dimensions to be substitutes and others to be complements) and different relative importance for the dimensions (i.e., higher or lower Shapley values).
One of the limitations of this study is that the weights were chosen a priori rather than obtained through expert elicitation. However, the weights used in this study were intended to illustrate how the Choquet integral, a flexible aggregation method, can take into account varying positive interactions to penalize unbalanced achievements across the dimensions. Therefore, it should be noted that the weights chosen in this paper are neither the 'best' nor the 'most democratic' ones; they were chosen only to illustrate the usefulness of the Choquet aggregation methodology in capturing varying interactions across the HDI dimensions. Future research avenues include identifying the 'true' interactions among well-being indicators and using expert elicitation to identify more democratic weights, in order to obtain more representative well-being indices.

HDI-AM and HDI-GM columns represent composite HDI scores obtained with the arithmetic mean (AM) and geometric mean (GM), respectively. Case 1, Case 2, Case 3, Case 4 and Case 5 columns represent composite HDI scores obtained with the Choquet integral aggregation using the respective weight sets presented in Table 1. The column titled 'minimum' represents composite HDI scores obtained with the minimum operator.

HDI-AM and HDI-GM columns represent composite HDI scores obtained with the arithmetic mean (AM) and geometric mean (GM), respectively. Min-CI, Max-CI and Median-CI columns represent the minimum, maximum and median HDI scores obtained with the Choquet integral using 500 simulated weight sets, respectively.