1 Introduction

Comparisons of health across populations are often made in the academic literature, and are frequently cited in political discussions. From cross-country evaluations (König et al. 2009), to studies on the socioeconomic gradient of health (Marmot 2005), to research on changes in health over time (Anderson et al. 2000), relative health assessments are among the most studied and hotly debated topics across disciplines. However, despite the importance of this issue, there is still an active debate about how to best operationalize population-level health comparisons (for a review of the literature, see Murray 2002 and Etches et al. 2006).

In this work we focus on the measurement of morbidity and introduce an approach specifically designed to tackle two of the main challenges faced by health indicators, as discussed by Burgard and Chen (2014): differences in disease distributions across groups and problems with self-assessed health measures. Health comparisons are often made across populations that are not necessarily afflicted by the same types of diseases. This issue of comparability is perhaps most obvious in cross-country comparisons among countries in different stages of the epidemiological transition and in health comparisons over time (Burgard and Chen 2014), but it is also present in subnational health assessments (Rivera et al. 2002; Weimann et al. 2016). A further potential problem is that some of the most frequently collected measures of health rely on self-assessments, despite a growing body of evidence suggesting that there is systematic bias in how respondents rate their health. While self-assessed health measures have been validated by their association with objective health indicators such as mortality (Idler and Benyamini 1997, 1999; Van Doorslaer and Gerdtham 2003; Jürges 2008), or health utilization (Humphries and Van Doorslaer 2000), more recent research has found evidence that certain socioeconomic groups suffer from reporting bias (Dowd and Zajacova 2010; Van Doorslaer et al. 2002; Etilé and Milcent 2006). Thus, the reliance on self-rated health measures when making group comparisons has been called into question.

Our strategy for addressing these challenges is to develop an analytical framework that is based strictly on relative prevalences. Comparisons based on disease prevalences have a long tradition in the public health literature, and comparisons that use this approach remain common (Braveman et al. 2010). The appeal of focusing on relative disease prevalences lies in the assumption that these indicators are more objective than self-rated health measures, and are more directly comparable in cross-population studies. Furthermore, as was noted in Burgard and Chen (2014), measures based on disease prevalences are more closely related to the origins of health disparities, and can therefore provide more direct guidance for public health interventions than other methods that aggregate health states in complex ways. We recognize that despite these advantages prevalence-based analyses are not without problems. As noted by Burgard and Chen (2014), diagnosis and reporting bias can lead to an over- or an underestimation of the relative disease burden across socioeconomic groups (Johnston and Propper 2009). While we acknowledge the importance of such issues, addressing such data collection concerns is beyond the scope of this work.

Setting aside data-related challenges, a difficulty that often arises in comparative health assessments based on prevalences is that no group is shown to be universally healthier than the other groups across all health outcomes. For example, it has been documented that in the U.S. white elderly individuals have higher cancer prevalences than Hispanics and blacks, but that the latter groups are more likely to report having diabetes (National Research Council of the National Academics 2004). In such a situation, figuring out how to aggregate this information into a single measure without losing the benefits of prevalence-based indicators can be a challenge.

Our approach addresses this technical difficulty. The basic logic behind our proposal, which we formalize in a set of axioms, is as follows. Suppose we wish to compare two individuals, A and B, who might be suffering from cancer and cardiovascular disease. If individual A has both cancer and cardiovascular disease and individual B only has cancer, then, barring differences in the severity of the conditions, we say that individual A is in a worse health state than individual B. However, if individual A has cancer and individual B has cardiovascular disease, we say that they are not comparable. While we are focused on diseases in this example, the idea can also be applied to other prevalence-based measures, such as limitations in activities of daily living or health behaviors.

Based on this reasoning, which we aggregate to population-level comparisons, we take the following three steps. First, we construct an index of comparability. In a context of multi-morbidity, this indicator captures to what extent the populations can be ranked using our axioms, and is influenced by the overlap in the profiles of conditions affecting the comparison groups. Second, we take the comparability information into account using our main indicator, the partial ordering disparity index (PODI). The PODI is an ordinal measure for population-level pairwise comparisons that ranks groups on the basis of relative prevalences. Third, we show how the PODI can be extended to generate a health ranking in multi-group comparisons. Our aim is to provide an intuitive health index that can be readily implemented using the information commonly collected in health surveys. Given the concerns that have been raised about self-assessed health measures, especially in the presence of multi-morbidity, our health indicators provide conservative estimates of the health disparities between populations that can be used to complement existing analyses. We illustrate the applicability of our analytical framework by drawing on data from the National Health Interview Survey (NHIS) to study racial health disparities. Those wishing to apply this approach in the future can refer to the user-friendly R-code and examples presented in the online appendix of this work.

The paper is organized as follows. In Sect. 2, we introduce our axioms and formalize how they can be used to provide population-level comparisons. Building on the pairwise relations discussed in Sect. 2, we present in Sect. 3 a new health measure, the partial ordering disparity index. We then derive the formal properties of the PODI and illustrate how it can be extended to compare any number of populations. In Sect. 4, we use data from the NHIS in applying the PODI to examine racial differences in health in the U.S. We discuss and compare our measures to others from the literature in Sect. 5. In Sect. 6, we conclude.

2 Definitions and Concepts

2.1 Axioms

Within the scope of our work are health comparisons based on data from health surveys, such as the National Health Interview Survey (NHIS), the Health and Retirement Study (HRS) and related surveys, and the World Health Survey (WHS). In addition to collecting data on socioeconomic variables, these surveys generally gather some information on the burden of disease at an individual level, such as self-reports on the prevalence of diseases or on limitations in daily life activities. In the analysis that follows, we assume that we have information on the prevalence of relatively common diseases, even though our approach can be implemented for any prevalence-based variable (see Sect. 5). The following axioms provide the foundation for our analysis.

Axiom 2.1

Individuals who have no condition are healthier than individuals with at least one condition.

Axiom 2.2

Individuals with the same combination of conditions are all equally (un)healthy.

Axiom 2.3

Individual j is less healthy than individual k if j has the same conditions as k and at least one additional condition.

We expect that these axioms will provide a reasonable basis for comparisons at the population level, assuming that the diseases and the limitations covered in the health survey data used accurately reflect the conditions that afflict the population in question. We wish to emphasize that our goal is to make comparisons at the population level, as we are well aware that the judgments derived from our axioms might not hold in all individual-level comparisons.

In the discussion, we explore the question of how these axioms can be extended to cover more informed assessments. The basic analytical framework is, however, unchanged by those modifications.

2.2 Partial Ordering

Axioms 2.1 through 2.3 provide a partial ordering of any two individuals based on pairwise comparisons. We define those comparisons using two relations, denoted dominance and comparability. Let \({{\mathcal {D}}}=\{D_1,\dots ,D_k\}\) be the set of all conditions under study. The power set \({{\mathcal {C}}}={{\mathcal {P}}}({{\mathcal {D}}})\) gives the set of all possible combinations of conditions, including the case of no condition as the empty set, \(\emptyset \). Elements of \({{\mathcal {C}}}\) will be denoted by \(C_j\). Define \(\succ \) as the irreflexive, antisymmetric, and transitive relation of dominance and \(\sim \) as the reflexive and symmetric relation of comparability. Note that comparability is not transitive in general: \(\{D_1\}\sim \emptyset \) and \(\emptyset \sim \{D_2\}\), but \(\{D_1\}\) and \(\{D_2\}\) are not comparable.

Definition 2.1

(Comparability) \(\forall C_i,C_j\in {{\mathcal {C}}}: C_i\sim C_j\text { iff }C_j\subseteq C_i{{\text { or }}} C_i\subseteq C_j\).

Definition 2.2

(Dominance) \(\forall C_i,C_j\in {{\mathcal {C}}}: C_i\succ C_j\text { iff }C_j\subset C_i\).

While these relations are defined in terms of conditions, they can be easily translated into comparisons between individuals defined by condition profiles. The comparability relation captures the idea that individuals with different medical profiles might not be comparable. In turn, among comparable individuals, the dominance relation establishes a health ranking. Coming back to the example in the introduction, we can say based on Definition 2.1 that if individual A has cancer and individual B has cardiovascular disease, they are not comparable. However, if individual A has both cancer and cardiovascular disease and individual B has only one of the two, then A is comparable to B. Furthermore, since individual B’s conditions are a strict subset of individual A’s conditions, we can say that, barring differences in the severity of the conditions, individual A’s health state dominates that of individual B (Definition 2.2); that is, A is less healthy than B.
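The two relations map directly onto subset tests on sets of conditions. The following is a minimal sketch in Python, assuming each individual is represented by a `frozenset` of condition labels; the function names and the labels `"cancer"` and `"cvd"` are illustrative choices, not part of the original framework.

```python
def comparable(c_i, c_j):
    """Definition 2.1: two profiles are comparable iff one contains the other."""
    return c_i <= c_j or c_j <= c_i


def dominates(c_i, c_j):
    """Definition 2.2: c_i dominates c_j iff c_j is a strict subset of c_i."""
    return c_j < c_i


# Illustrative profiles based on the example from the introduction.
both = frozenset({"cancer", "cvd"})   # cancer and cardiovascular disease
only_cancer = frozenset({"cancer"})
only_cvd = frozenset({"cvd"})
healthy = frozenset()                 # no condition (the empty set)

print(comparable(only_cancer, only_cvd))  # different single conditions
print(dominates(both, only_cancer))       # one additional condition
print(dominates(only_cancer, healthy))    # any condition vs. none
```

Note that "dominates" here means "carries a strict superset of conditions", so the dominating profile is the less healthy one.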

The relations 2.1 and 2.2 imply a partial ordering of the elements of \({{\mathcal {C}}}\). More precisely, a strict partial order can be defined by the set \({{\mathcal {C}}}\) and the irreflexive, antisymmetric, and transitive dominance relation. It can be depicted as a directed, acyclic graph \({{\mathcal {G}}}=(N,E)\), with nodes N given by the elements of \({{\mathcal {C}}}\) and edges E given by the elements of \({{\mathcal {C}}}\times {{\mathcal {C}}}\) for which the conditions stated in Definition 2.2 apply. \({{\mathcal {D}}}\) is the maximal element of \({{\mathcal {G}}}\) and \(\emptyset \) is the minimal element of \({{\mathcal {G}}}\). In a similar way, a non-directed graph \({{\mathcal {G}}}'\) can be constructed that represents comparability relations (relation 2.1). An example for \({{\mathcal {D}}}=\{A,B,C\}\), given in Fig. 1, represents the dominance relation. For instance, the node \(\{A,B\}\) dominates both nodes \(\{A\}\) and \(\{B\}\) and also the minimal element \(\emptyset \) because of transitivity. In turn, it is dominated by the maximal element of the graph. A graphical representation of the comparability relation would look similar, but with directed edges replaced by undirected edges.

Fig. 1

Example of partial ordering with three conditions A, B, and C. Nodes (shown as circles) represent all of the possible combinations of conditions. Edges (shown as arrows) show the dominance relation, e.g., \(\{A,B\}\) dominates \(\{A\}\) and \(\{B\}\) and because of transitivity, also \(\emptyset \). In the literature, partial orderings are often not depicted with directed edges (arrows), but with undirected edges (lines). We chose to use the former for readability
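The graph for three conditions can be enumerated programmatically. The sketch below, written in Python with hypothetical helper names, builds the power set of \(\{A,B,C\}\) and lists every ordered pair that satisfies Definition 2.2. It also extracts the "covering" pairs that differ by exactly one condition; it is an assumption here that a diagram of this kind would draw only those arrows and leave the remaining pairs implied by transitivity.

```python
from itertools import combinations


def power_set(conditions):
    """All combinations of conditions, from the empty set to the full set."""
    items = sorted(conditions)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def dominance_edges(nodes):
    """All ordered pairs (hi, lo) in which hi dominates lo (lo is a strict subset)."""
    return [(hi, lo) for hi in nodes for lo in nodes if lo < hi]


def covering_edges(nodes):
    """Dominance pairs that differ by exactly one condition."""
    return [(hi, lo) for hi, lo in dominance_edges(nodes)
            if len(hi) - len(lo) == 1]


nodes = power_set({"A", "B", "C"})
edges = dominance_edges(nodes)
covers = covering_edges(nodes)

print(len(nodes), len(edges), len(covers))
```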

2.3 Population Level Comparisons

Having formally defined health relations at the individual level, the next logical step is to formulate a strategy to aggregate these relations for the purposes of making group comparisons. To that end, we build on the concept of expected dominance, which can be traced back to Lieberson (1976) and has also been applied to health comparisons in recent work by Herrero and Villar (2013). This intuitive measure reflects the probability that a given relation (2.1 or 2.2) can be established between any two randomly selected individuals from two different groups. Computationally, this measure corresponds to the proportion of pairwise comparisons between individuals of two groups that constitute a relation of comparability or dominance. A possible drawback of using a measure based on the “average” (expectation) is that it is blind to the presence of particular subgroups of observations for which we might observe different patterns in health disparities. However, in Sect. 3.2 we show that it is possible to decompose our health index into the contributions of different subgroups.

In this section we build on this reasoning to define two population level measures: an index of dominance and an index of comparability.

2.3.1 Population Level Comparisons: Comparability

There is no guarantee that any two groups are composed of highly comparable individuals. If relations of comparability can be established across a relatively small number of individuals only, then a health ranking might not be informative. We acknowledge this potential shortcoming in two ways. First, we construct an index of comparability following the reasoning outlined above. This index explicitly measures to what extent the disease profiles of the individuals in the two populations can be ordered using our relations. Based on the value of the index of comparability, the analyst can judge whether the results are meaningful. Second, we use the index of comparability to adjust our health disparity measure. In the following, we formalize the index of comparability; the health disparity measure is introduced later.

Let us define the index of comparability formally. Let \(\Omega _a\) and \(\Omega _b\) denote the sets of individuals from two groups. The proportion of observations individual \(\omega \) is comparable to can be written as

$$\begin{aligned} J_i(\omega )=\sum \limits _{C_j:C_\omega \sim C_j} \mathrm {P}_i[C_j], \end{aligned}$$
(2.1)

where \(C_\omega \) is the set of conditions that afflicts \(\omega \) and \(\mathrm {P}_i[C_j]\) is the proportion of individuals with conditions \(C_j\) in group \(\Omega _i\), the reference group, which can be either \(\Omega _a\) or \(\Omega _b\). That is, the comparability of \(\omega \) can be assessed either relative to her own group or relative to the comparison group. The choice of reference group is not important for our health disparity measure (see Sect. 3), but is needed to define aggregate comparability and dominance. Intuitively, comparability lies between 0 and 1, \(0\le J_i(\omega )\le 1\); i.e., in the most extreme cases individual \(\omega \) might be comparable to everyone (\(J_i(\omega )=1\)) or not be comparable to anyone else (\(J_i(\omega )=0\)). The latter case requires that \(\omega \) is not a member of \(\Omega _i\). If \(\omega \in \Omega _i\), then she is at least comparable to herself, guaranteeing \(J_i(\omega )>0\).
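Equation (2.1) can be illustrated with a few lines of Python. This is only a sketch: a group is assumed to be a list with one `frozenset` of conditions per individual, the function names are hypothetical, and the data are invented for illustration. Exact fractions are used so that the shares can be read off directly.

```python
from collections import Counter
from fractions import Fraction


def comparable(c_i, c_j):
    """Definition 2.1: one profile is a subset of the other."""
    return c_i <= c_j or c_j <= c_i


def proportions(group):
    """Share P_i[C_j] of each observed condition profile in a group."""
    counts = Counter(group)
    return {profile: Fraction(k, len(group)) for profile, k in counts.items()}


def J(profile, reference):
    """Eq. (2.1): share of the reference group that is comparable to `profile`."""
    shares = proportions(reference)
    return sum((p for c, p in shares.items() if comparable(profile, c)),
               Fraction(0))


# Invented reference group with four individuals.
omega_i = [frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")]

j_a = J(frozenset("A"), omega_i)   # comparable to {}, {A}, and {A,B}
j_c = J(frozenset("C"), omega_i)   # comparable only to the healthy individual
# An outsider with a condition nobody shares, and no healthy reference members:
j_none = J(frozenset("C"), [frozenset("A"), frozenset("B")])

print(j_a, j_c, j_none)
```

The last case shows how \(J_i(\omega )=0\) can arise only for an individual outside the reference group.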

Having defined \(J_i\), we only need to aggregate across individuals to arrive at an aggregate measure of comparability.

Definition 2.3

The comparability index \(\mathrm {CI}_i(\Omega _j)\in {[0,1]}\), with reference group \(\Omega _i\), is given by

$$\begin{aligned} \mathrm {CI}_i(\Omega _j)=\frac{1}{n_j} \sum \limits _{\omega \in \Omega _j} J_i(\omega ). \end{aligned}$$

The interpretation of this definition is straightforward; i.e., high CI values mean that most observations can be compared to each other, while the opposite is true when the CI is close to zero. By construction, every observation is comparable to observations with no conditions (minimal element of \({{\mathcal {G}}}\)) and to observations that suffer from all conditions (maximal element of \({{\mathcal {G}}}\)); and vice versa. Thus, the CI is bounded below as follows:

$$\begin{aligned} \mathrm {CI}_i(\Omega _j)\ge \mathrm {P}_i[\emptyset {{\text { or }}} {{\mathcal {D}}}](2-\mathrm {P}_i[\emptyset {{\text { or }}} {{\mathcal {D}}}]), \end{aligned}$$
(2.2)

where \(\mathrm {P}_i[\emptyset {{\text { or }}} {{\mathcal {D}}}]=\mathrm {P}_i[\emptyset ]+\mathrm {P}_i[{{\mathcal {D}}}]\). For example, if \(\mathrm {P}_i[\emptyset {{\text { or }}} {{\mathcal {D}}}]=0.5\), the \(\mathrm {CI}\) is equal to or higher than 0.75, even if the observations with at least one but not all conditions are not all comparable to each other. This means that large proportions of healthy individuals in both groups will increase comparability, even if the conditions of the unhealthy individuals differ substantially between groups. Moreover, the comparability index is symmetric in the sense that \(\mathrm {CI}_i(\Omega _j)=\mathrm {CI}_j(\Omega _i)\).
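A small numerical check can make the lower bound and the symmetry claim concrete. The Python sketch below uses invented data for two conditions, with hypothetical function names, and deliberately picks two groups in which the share of individuals with no or all conditions is 0.5 in both, matching the example in the text.

```python
from collections import Counter
from fractions import Fraction


def proportions(group):
    """Share of each observed condition profile in a group."""
    counts = Counter(group)
    return {profile: Fraction(k, len(group)) for profile, k in counts.items()}


def J(profile, reference):
    """Eq. (2.1): share of the reference group comparable to `profile`."""
    shares = proportions(reference)
    return sum((p for c, p in shares.items() if c <= profile or profile <= c),
               Fraction(0))


def comparability_index(group, reference):
    """Definition 2.3: mean of J over the members of `group`."""
    return sum((J(w, reference) for w in group), Fraction(0)) / len(group)


ALL = frozenset("AB")  # maximal element when only conditions A and B are studied
omega_a = [frozenset(), frozenset(), frozenset("A"), frozenset("B")]
omega_b = [frozenset(), ALL, frozenset("A"), frozenset("B")]

ci_a_of_b = comparability_index(omega_b, omega_a)  # CI_a(Omega_b)
ci_b_of_a = comparability_index(omega_a, omega_b)  # CI_b(Omega_a)

shares_a = proportions(omega_a)
p = shares_a.get(frozenset(), 0) + shares_a.get(ALL, 0)  # P_a[empty or D]
lower_bound = p * (2 - p)                                # right side of Eq. (2.2)

print(ci_a_of_b, ci_b_of_a, lower_bound)
```

In this example the index exceeds the bound because each single-condition profile is also comparable to itself in the other group.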

To account for this, we can decompose the CI into two parts. The first part, \(\mathrm {CI}^D_i(\Omega _j)\), given on the right hand side of Eq. (2.2), gives the proportion of comparability due to the presence of observations with no or all conditions (see above). The second part is given by:

$$\begin{aligned} \mathrm {CI}^N_i(\Omega _j)=\frac{1}{n_j} \sum \limits _{\omega \in \Omega _j:C_\omega \in {{\mathcal {N}}}} J^N_i(\omega ), \end{aligned}$$
(2.3)

where \({{\mathcal {N}}}={{\mathcal {C}}}\setminus \{\emptyset ,{{\mathcal {D}}}\}\) is the set that includes all combinations of conditions except the minimal element (healthy) and the maximal element (all conditions) and

$$\begin{aligned} J^N_i(\omega )=\sum \limits _{C_j:C_\omega \sim C_j} \mathrm {P}_i[C_j] {{\text { for }}}C_j,\quad C_\omega \in {{\mathcal {N}}}. \end{aligned}$$
(2.4)

This gives the contribution to the overall comparability of individuals with at least one but not all conditions; in other words, to the comparability among the sick who might not be comparable. Before turning to dominance, we wish to make a final important remark regarding the interpretation of comparability. All of our measures are aggregations of individual-level comparisons. For this reason, even if two populations have identical distributions of prevalences at the population level, they might have a low comparability index if the individuals constituting the groups are not comparable. Ultimately, this can be the case because our measures of comparability are all based on the concept of dominance, and not on dissimilarity. One individual is not comparable to another individual, not only because each has a different affliction, but because it is not possible to establish a relation of dominance between them. The clearest example of such a situation is a comparison between a completely healthy individual and an individual suffering from several conditions. While these individuals may satisfy the comparability criteria, they clearly have very different medical profiles. At the same time, as a necessary but not sufficient condition for achieving high levels of comparability, populations have to suffer from similar diseases.

2.3.2 Population Level Comparisons: Dominance

The procedure we used to count the proportion of comparable individuals can also be applied when assessing the proportion of individuals linked by a relation of dominance. In addition, we build on this reasoning and include a comparability correction: we propose weighting the dominance relations not by the total possible pairwise comparisons, but rather by the total comparable relations only. In essence, our axioms provide us with tools that can be used to establish rankings among the comparable, while staying neutral among the non-comparable. For this reason, while we still report information on the total proportion of comparable relations, we exclude those individuals for whom we cannot establish an ordering from dominance measures.

Let \(\Omega _a\) and \(\Omega _b\) denote the sets of individuals from the two groups to be compared; let \(\mathrm {P}_i[C_j]\) be the proportion of individuals with conditions \(C_j\) in group \(\Omega _i\). For individual \(\omega \) with conditions \(C_\omega \) the proportion of dominated individuals is given by:

$$\begin{aligned} I_i(\omega )=\sum \limits _{C_j:C_\omega \succ C_j} \mathrm {P}_i[C_j]. \end{aligned}$$
(2.5)

For a given reference group \(\Omega _i\), \(I_i(\omega )\) can be computed for each individual \(\omega \in \Omega _a\cup \Omega _b\).

Note that since \(J_i(\omega )\ge I_i(\omega )\) the expectation of \(I_i(\omega )\) is bounded by comparability. To account for this, we count the number of relations of dominance over the comparable pairwise relations instead of the total. This correction defines the comparability weighted health dominance index.

Definition 2.4

The comparability weighted health dominance index \(\mathrm {CHDI}_i(\Omega _j)\in {[0,1]}\), with reference group \(\Omega _i\), is given by

$$\begin{aligned} \mathrm {CHDI}_i(\Omega _j)=\frac{1}{n_j} \sum \limits _{\omega \in \Omega _j} \frac{I_i(\omega )}{J_i(\omega )}. \end{aligned}$$

Note that the CHDI is not defined when at least one member of \(\Omega _j\) is not comparable to any member of \(\Omega _i\), because \(J_i(\omega )=0\) for that member leads to division by zero.
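Definition 2.4 and the undefined case can be sketched as follows. The Python code is illustrative only: the function names and data are invented, groups are lists of `frozenset` profiles, and raising a `ValueError` when \(J_i(\omega )=0\) is one possible way to surface the undefined case, not something prescribed by the text.

```python
from collections import Counter
from fractions import Fraction


def proportions(group):
    """Share of each observed condition profile in a group."""
    counts = Counter(group)
    return {profile: Fraction(k, len(group)) for profile, k in counts.items()}


def chdi(group, reference):
    """Definition 2.4: mean over `group` of I_i(omega) / J_i(omega)."""
    shares = proportions(reference)
    total = Fraction(0)
    for w in group:
        # J: share of the reference group comparable to w (Eq. 2.1)
        j = sum((p for c, p in shares.items() if c <= w or w <= c), Fraction(0))
        # I: share of the reference group strictly dominated by w (Eq. 2.5)
        i = sum((p for c, p in shares.items() if c < w), Fraction(0))
        if j == 0:
            raise ValueError("CHDI undefined: a member is comparable to no one "
                             "in the reference group")
        total += i / j
    return total / len(group)


omega_a = [frozenset(), frozenset(), frozenset("A"), frozenset("B")]
omega_b = [frozenset(), frozenset("AB"), frozenset("A"), frozenset("B")]

chdi_a_of_b = chdi(omega_b, omega_a)  # CHDI_a(Omega_b)
chdi_b_of_a = chdi(omega_a, omega_b)  # CHDI_b(Omega_a)

print(chdi_a_of_b, chdi_b_of_a)
```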

There are two issues that remain unresolved. First, the values of the comparability index and of the CHDI are dependent on the choice of reference group. Second and more importantly, we require a way to compare health across multiple groups since, as we show in the next section, dominance measures are not always transitive. That is, such measures do not necessarily lead to a unique ranking of populations. Our proposed health ranking indicator addresses both concerns.

3 The PODI Index

Having formally defined the relations of dominance and comparability at the population level, we propose the partial ordering disparity index (PODI). The PODI is constructed through a simple modification of the CHDI that resolves the issue of having to choose a reference population. In the first part of this section, we present the PODI and show that, despite its simplicity, the index has a number of desirable properties: i.e., it is not altered by the sample size or the choice of reference group, and it properly reflects the health improvements within groups. Then, in the remainder of the section, we illustrate how the PODI can be used to establish a health ranking between any number of groups. In conclusion, we demonstrate how it is possible to capture the essence of the axioms we presented in the first part of the paper in a health measure capable of ranking across multiple groups or populations.

3.1 Definition

We define the PODI between two groups as the difference between the CHDIs resulting from alternating the reference group. This essentially captures the difference in expected domination.

Definition 3.1

For two groups \(\Omega _a\) and \(\Omega _b\) the PODI, \(P_{ab}\in {[-1,1]}\), is defined as

$$\begin{aligned} P_{ab}=\mathrm {CHDI}_a(\Omega _b)-\mathrm {CHDI}_b(\Omega _a). \end{aligned}$$

As noted above, the CHDI captures dominance among the comparable individuals. Thus, a value of 1 of the PODI means that among the comparable individuals every member of group \(\Omega _b\) is less healthy than every member of group \(\Omega _a\); the reverse is true when \(P_{ab}=-1\). A value of zero implies that both groups are equally (un)healthy. Intermediary values can also be interpreted directly; values above zero indicate the disadvantage of \(\Omega _b\) relative to \(\Omega _a\), while the opposite is the case for values below zero. Intuitively, the PODI measures the relative expected dominance of one group over the other. For example, \(P_{ab}=0.3\) means that, on average and among comparable individuals, the share of members of \(\Omega _a\) dominated by a member of \(\Omega _b\) exceeds the share of members of \(\Omega _b\) dominated by a member of \(\Omega _a\) by 30 percentage points. A value of \(-0.3\) indicates the reversed situation. This also means that a given value x of the PODI can be due to many different combinations of values of the CHDI.
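To make Definition 3.1 concrete, the following Python sketch computes the PODI for two small invented groups. The function names are hypothetical, groups are lists of `frozenset` profiles, and the sketch assumes that every individual is comparable to at least one member of the other group, so that the CHDI is defined.

```python
from collections import Counter
from fractions import Fraction


def proportions(group):
    """Share of each observed condition profile in a group."""
    counts = Counter(group)
    return {profile: Fraction(k, len(group)) for profile, k in counts.items()}


def chdi(group, reference):
    """Definition 2.4; assumes every member of `group` has J > 0."""
    shares = proportions(reference)
    total = Fraction(0)
    for w in group:
        j = sum((p for c, p in shares.items() if c <= w or w <= c), Fraction(0))
        i = sum((p for c, p in shares.items() if c < w), Fraction(0))
        total += i / j
    return total / len(group)


def podi(group_a, group_b):
    """Definition 3.1: P_ab = CHDI_a(Omega_b) - CHDI_b(Omega_a)."""
    return chdi(group_b, group_a) - chdi(group_a, group_b)


# Group a has two healthy members; group b has one member with both conditions.
omega_a = [frozenset(), frozenset(), frozenset("A"), frozenset("B")]
omega_b = [frozenset(), frozenset("AB"), frozenset("A"), frozenset("B")]

p_ab = podi(omega_a, omega_b)
print(p_ab, float(p_ab))
```

The positive value is in line with the interpretation given in the text: group b carries the heavier burden among the comparable individuals.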

3.2 Properties

The PODI has a number of interesting properties. The choice of the reference group, the sample size, and the definition of the pairwise relations do not alter the results in unexpected ways.

Property 3.1

(Semi-symmetry) If the distributions of conditions are exchanged between groups \(\Omega _a\) and \(\Omega _b\), the sign of the PODI is reversed, i.e. \(P_{ab}=-P_{ba}\); otherwise it remains unchanged.

Property 3.2

(Sample size independence) The PODI does not depend on sample size, only on relative frequencies. That is, if \(n_a\) and \(n_b\) denote the sizes of populations \(\Omega _a\) and \(\Omega _b\) and \(n_c\) and \(n_d\) are the sizes of two other populations \(\Omega _c\) and \(\Omega _d\) for which \(n_a\ne n_c\) and \(n_b\ne n_d\) hold, but the distribution of conditions is the same for \(\Omega _a\) and \(\Omega _c\) as well as for \(\Omega _b\) and \(\Omega _d\), then \(P_{ab}=P_{cd}\).

Property 3.3

(Mirror property) If the dominance relation as given in Definition 2.2 is reversed, i.e. \(\forall C_i,C_j\in {{\mathcal {C}}}: C_i\preceq C_j{{\text { iff }}} C_j\subset C_i\), calculating the “reversed” PODI yields \(1-P_{ab}\).
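Properties 3.1 and 3.2 can be checked numerically on a toy example. The Python sketch below uses invented data and hypothetical function names; it swaps the two groups to check the sign reversal, and replicates each group a different number of times to change the sample sizes without changing the distributions of conditions. It is a check on one example, not a proof.

```python
from collections import Counter
from fractions import Fraction


def proportions(group):
    """Share of each observed condition profile in a group."""
    counts = Counter(group)
    return {profile: Fraction(k, len(group)) for profile, k in counts.items()}


def chdi(group, reference):
    """Definition 2.4; assumes every member of `group` has J > 0."""
    shares = proportions(reference)
    total = Fraction(0)
    for w in group:
        j = sum((p for c, p in shares.items() if c <= w or w <= c), Fraction(0))
        i = sum((p for c, p in shares.items() if c < w), Fraction(0))
        total += i / j
    return total / len(group)


def podi(group_a, group_b):
    """Definition 3.1: P_ab = CHDI_a(Omega_b) - CHDI_b(Omega_a)."""
    return chdi(group_b, group_a) - chdi(group_a, group_b)


omega_a = [frozenset(), frozenset(), frozenset("A"), frozenset("B")]
omega_b = [frozenset(), frozenset("AB"), frozenset("A"), frozenset("B")]

# Property 3.1: exchanging the groups reverses the sign.
p_ab = podi(omega_a, omega_b)
p_ba = podi(omega_b, omega_a)

# Property 3.2: same distributions, different sample sizes.
omega_c = omega_a * 3   # 12 individuals, same relative frequencies as omega_a
omega_d = omega_b * 2   # 8 individuals, same relative frequencies as omega_b
p_cd = podi(omega_c, omega_d)

print(p_ab, p_ba, p_cd)
```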

Furthermore, the PODI is well behaved with respect to changes in the prevalence of conditions that result in an improvement in the health of one of the groups. We consider two changes that unambiguously improve the health of a group and demonstrate that those changes also improve the PODI. First, let one condition be removed from a member of group \(\Omega _a\); call this a cure. The second change consists of adding an individual with no conditions to group \(\Omega _a\); we call this a group improving addition. Let \(\Omega _{a_C}\) be the group resulting from curing one member of \(\Omega _a\), and \(\Omega _{a_A}\) be the group resulting from adding a healthy individual to \(\Omega _a\). Then, the following properties hold (proofs are given in the “Appendix”).

Property 3.4

(Cures) Cures do not decrease the PODI, i.e. \(P_{ab}\le P_{a_Cb}\).

Property 3.5

(Group improving additions) Group improving additions do not decrease the PODI, i.e. \(P_{ab}\le P_{a_Ab}\).

Finally, the PODI between two populations can be decomposed into the contributions of distinct subpopulations. For example, suppose we are interested in examining the relative health of the males and the females in a given country. We might suspect that these differences are not constant across, for example, socioeconomic groups. This hypothesis can be tested directly since, as we show, the PODI is the weighted sum of the relative health across subpopulations.

Formally, suppose populations \(\Omega _a\) and \(\Omega _b\) can each be divided into S subgroups, denoted with the superscript s. For each subgroup s the PODI can be calculated as \(P^s_{ab}=\mathrm {CHDI}^s_a(\Omega ^s_b)-\mathrm {CHDI}^s_b(\Omega ^s_a)\). Then, the aggregate PODI can be decomposed into weighted contributions of each subgroup comparison and an additional cross-group comparison, denoted by \(P^{-s}_{ab}\).

Property 3.6

(Decomposability) The PODI can be decomposed as

$$\begin{aligned} P_{ab}=\sum \limits _{s=1}^S w^s P^s_{ab} + P^{-s}_{ab}, \end{aligned}$$
(3.1)

where \(w^s\) is a weight given by \(\frac{n_a^sn_b^s}{n_an_b}\). \(P^s_{ab}\) equals

$$\begin{aligned} \frac{1}{n^s_b} \sum \limits _{\omega \in \Omega ^s_b} \frac{I^s_a(\omega )}{J^s_a(\omega )}-\frac{1}{n^s_a} \sum \limits _{\omega \in \Omega ^s_a} \frac{I^s_b(\omega )}{J^s_b(\omega )} \end{aligned}$$
(3.2)

with \(I^s_a(\omega )\) defined as \(\sum _{C_j:C_\omega \succ C_j} \mathrm {P}^s_a[C_j]\), where \(\mathrm {P}^s_a[C_j]\) is the distribution of conditions in \(\Omega ^s_a\). \(J^s_a(\omega )\), \(I^s_b(\omega )\), and \(J^s_b(\omega )\) are defined correspondingly. \(P^{-s}_{ab}\) is given by

$$\begin{aligned} \frac{1}{n_b} \sum \limits _{\omega \in \Omega ^s_b} \frac{I^{-s}_a(\omega )}{J^{-s}_a(\omega )}\frac{1-n_a^s}{n_a}-\frac{1}{n_a} \sum \limits _{\omega \in \Omega ^s_a} \frac{I^{-s}_b(\omega )}{J^{-s}_b(\omega )}\frac{1-n_b^s}{n_b} \end{aligned}$$
(3.3)

with \(I^{-s}_a(\omega )=\sum _{C_j:C_\omega \succ C_j} \mathrm {P}^{-s}_a[C_j]\), where \(\mathrm {P}^{-s}_a[C_j]\) is the distribution of conditions in \(\Omega _a\setminus \Omega ^s_a\). \(J^{-s}_a(\omega )\), \(I^{-s}_b(\omega )\), and \(J^{-s}_b(\omega )\) are defined correspondingly.

A proof is given in the “Appendix”. This property shows that a decomposition by subgroups as described above is relatively straightforward.

3.3 A Health Ranking

In applying any measure of health, we often aim to draw conclusions based on comparisons that involve more than two groups. Unfortunately, the pairwise measures defined in the previous sections cannot be applied directly to these scenarios since they are not always transitive.

Property 3.7

(Non-transitivity) The PODI is non-transitive in the sense that if \(P_{ab}>0\) and \(P_{bc}>0\) (or \(P_{ab}<0\) and \(P_{bc}<0\)) then \(P_{ac}\lessgtr 0\).

In our case, this issue can arise when there is a high degree of non-comparability across groups (an example is provided in the “Appendix”).

This well-known shortcoming has been addressed in the literature on the optimal aggregation of pairwise comparisons (e.g. Gavalec et al. 2015), and on pairwise comparisons of health (Herrero and Villar 2013). A common strategy for overcoming non-transitivity is to summarize all pairwise comparisons \(P_{ij}\) for each group i with all other groups j into a single index \(I_i\). By construction, this index is transitive and thus provides a health ranking across groups.

Here we follow the approach developed in Crawford and Williams (1985) and Barzilai and Golany (1990). Intuitively, the strategy is to obtain individual group indexes whose pairwise differences are as close as possible to the values of each of the possible pairwise comparisons. Since this cannot be fully achieved, the method specifies a penalty function given by the sum of squared differences between each pairwise value \(P_{ij}\) and the corresponding difference in group indexes, \(I_i-I_j\).

Let \(\Omega _1,\Omega _2,\dots ,\Omega _G\) be the groups for which we know \(P_{ij}\) and want to derive \(I_i\). The individual \(I_i\) are normalized to sum to zero, i.e., \(\sum I_i=0\). Moreover, we want the individual indexes \(I_i\) to be as close as possible to the pairwise PODI between any pair of populations. That is, pairwise differences of index values should mimic pairwise comparisons between populations given by \(P_{ij}\) as closely as possible. To achieve this, we minimize

$$\begin{aligned} \min \limits _{I_{1},\dots ,I_{G}}\sum \limits _{i=1}^{G}\sum \limits _{j=1}^{G}[P_{ij}-(I_{i}-I_{j})]^{2}, \end{aligned}$$
(3.4)

given the constraint that

$$\begin{aligned} \sum \limits _i I_i=0. \end{aligned}$$
(3.5)

An optimal solution to the problem is given by:

$$\begin{aligned} I_i=\frac{1}{G}\sum \limits _{j=1}^G P_{ij}, \end{aligned}$$
(3.6)

which is simply the mean of all pairwise comparisons for group i.
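The form of this solution can be verified with a short derivation, sketched here under the assumption that the pairwise values are antisymmetric, \(P_{ij}=-P_{ji}\) and \(P_{ii}=0\), which follows from Definition 3.1. Writing S for the objective in (3.4), a symbol introduced here only for illustration, and differentiating with respect to \(I_k\) gives

$$\begin{aligned} \frac{\partial S}{\partial I_k}&=-2\sum \limits _{j=1}^{G}[P_{kj}-(I_k-I_j)]+2\sum \limits _{i=1}^{G}[P_{ik}-(I_i-I_k)]\\ &=-4\sum \limits _{j=1}^{G}P_{kj}+4G\,I_k-4\sum \limits _{j=1}^{G}I_j. \end{aligned}$$

Under the constraint (3.5) the last term vanishes, so setting the derivative to zero yields \(I_k=\frac{1}{G}\sum _{j=1}^{G}P_{kj}\), the mean of the pairwise comparisons for group k over the G groups. This candidate also satisfies the constraint, because \(\sum _k\sum _j P_{kj}=0\) by antisymmetry; the Lagrange multiplier attached to (3.5) is therefore zero.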

This solution guarantees that higher values of \(I_i\) imply that a given group i has a relatively low health status. Furthermore, Barzilai and Golany (1990) prove that the solution is unique; i.e., that there are no other optimal solutions to (3.4), and it does not depend on the ordering of groups; i.e., that relabeling groups does not change the result. Note that the additive restriction (3.5) is required for a unique solution, but that the choice of normalization does not affect the ranking. In the next section we apply our concepts to a comparison of health across races in the U.S.
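The aggregation step can be illustrated with a few lines of Python. The sketch below assumes a hypothetical antisymmetric matrix of pairwise PODI values for three groups (the numbers are invented), computes each group index as the mean of its row, and compares the least-squares objective in (3.4) at that solution with the objective at a nearby point that also sums to zero.

```python
from fractions import Fraction as F


def ranking_index(P):
    """Group index I_i: mean of the pairwise comparisons in row i."""
    G = len(P)
    return [sum(row, F(0)) / G for row in P]


def objective(P, I):
    """Sum of squared deviations between P_ij and I_i - I_j, as in Eq. (3.4)."""
    G = len(P)
    return sum((P[i][j] - (I[i] - I[j])) ** 2
               for i in range(G) for j in range(G))


# Invented pairwise values with P_ij = -P_ji and zeros on the diagonal.
P = [
    [F(0), F(3, 10), F(1, 2)],
    [F(-3, 10), F(0), F(1, 10)],
    [F(-1, 2), F(-1, 10), F(0)],
]

index = ranking_index(P)

# A nearby candidate that still satisfies the sum-to-zero constraint (3.5).
perturbed = [index[0] + F(1, 100), index[1] - F(1, 100), index[2]]

print(index, sum(index))
print(objective(P, index) < objective(P, perturbed))
```

Sorting the groups by their index values then gives a single ordering, even when the underlying pairwise comparisons are not transitive.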

4 Empirical Application

We use our health indexes to study racial health differences in the US. Drawing on data from the National Health Interview Survey (NHIS) for the year 2014, we study the relative prevalence of 15 conditions across four races for 3570 individuals. The comparison is narrowed to ages 60 to 65 to control for the differences in age distributions across races. The objective of this exercise is to illustrate that while simple comparisons of prevalences across groups are appealing, they are often not sufficient to rank health across populations or groups.

4.1 Descriptive Statistics

We report the prevalences of the full set of 15 diseases in the NHIS for our age and racial groups (Table 1). Prevalences are measured by survey questions that ask respondents whether they have ever been told by a doctor that they suffer from a given condition. Because they are relatively few in number, individuals with missing values are removed from the sample (see Footnote 2). In addition, we restrict our exercise to the individuals who reported belonging to one of the four main racial categories, and exclude those who reported belonging to the “other” category, as this category is difficult to interpret. The data should be read with some caution since the narrow age group results in small samples for some racial groups. For example, the Asian group is composed of 163 individuals. As a consequence, the estimated prevalences of some of the less common diseases might be subject to small-sample bias.

Table 1 Prevalence by race

A first look at the relative prevalences across the races reveals that some conditions have a much steeper race gradient than others. For example, black Americans have a much higher prevalence of hypertension (69.6%) than other races (at around 50%). In contrast, the differences in the prevalence of heart attacks across races appear to be much smaller (between 4.9% for Asian individuals, and 6.1% for Hispanic and white individuals). Nevertheless, the table shows a wide range of prevalences across races for each disease. Given that no racial group has unambiguously lower prevalences than the other groups across all conditions, it is not possible to establish a relative health ordering without further analysis.

Table 2 Number of diseases per person by race

In addition, Table 2 highlights the empirical relevance of multi-morbidity. Only 15.3% of the sample reported having none of the 15 diseases, while 20.8% indicated that they have one condition. The remaining 63.9% of the individuals across all races reported suffering from at least two conditions. While the prevalence of multi-morbidity differs across races, in each of the racial groups more than half the individuals said they have at least two conditions, and a high proportion indicated that they have several more. A further difference across the races is that, among the individuals suffering from multi-morbidity, the most frequent combinations of conditions vary in prevalence as well. For example, among those individuals with two conditions, the combination of hypertension and cholesterol is nearly twice as prevalent among blacks (30.8%) as among Hispanics (15.9%) (Table 3).
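
The kind of tabulation behind Tables 2 and 3 can be sketched as follows. This is a minimal Python illustration on hypothetical data, assuming that each individual is recorded as a row of 0/1 indicators, one per condition; the function name and the toy profiles are not taken from the NHIS.

```python
from collections import Counter
import numpy as np

def morbidity_summary(profiles):
    """Shares of people with 0, 1 and 2+ conditions, plus the relative
    frequency of each combination among people with exactly two conditions."""
    profiles = np.asarray(profiles, dtype=int)
    counts = profiles.sum(axis=1)          # number of conditions per person
    shares = {
        "none": float(np.mean(counts == 0)),
        "one": float(np.mean(counts == 1)),
        "two_or_more": float(np.mean(counts >= 2)),
    }
    pairs = Counter(tuple(int(v) for v in row) for row in profiles[counts == 2])
    n_pairs = sum(pairs.values())
    pair_shares = {combo: n / n_pairs for combo, n in pairs.items()}
    return shares, pair_shares

# Hypothetical group of eight people and three conditions.
toy_group = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
shares, pair_shares = morbidity_summary(toy_group)
```

Running the same function separately for each group would give the group-specific distributions that are compared in the text.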

Table 3 Most prevalent combinations of diseases by race

Our health measures are designed to provide a health ranking for precisely this type of complex information on prevalences. First, we assess the comparability of the group-specific profiles of diseases. Then, once we have established whether a prevalence-based comparison is reasonable, the PODI provides a health ranking.

4.2 Health Measures

4.2.1 Comparability

Overall, the comparability index across races (Table 4, panel A) ranges from .51 (between blacks and whites) to .59 (between Asians and Hispanics). This means that it is possible to establish relations of comparability for over half of the possible pairwise comparisons across individuals of different races. In this example, we have purposely kept diseases at the lowest possible level of aggregation, which makes comparability more difficult. Hence, we treat this as a lower bound to the comparability levels. To illustrate the impact of the level of aggregation in conditions, we have calculated the comparability index after merging all of the circulatory diseases into one category (following the ICD-10 category definitions). Grouping circulatory conditions has a very noticeable impact on comparability, raising the lowest comparability index to .69 and the highest to .79 (Table 4, panel B). Results on decomposed comparability for the disaggregated example are shown in the “Appendix” (Tables 8 and 9).
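
The effect of aggregation on comparability can be illustrated with a small Python sketch. It rests on a simplifying assumption that should be checked against the formal definitions given earlier in the paper: two individuals are treated as comparable when the set of conditions of one is contained in that of the other (identical profiles included), and the index is the share of comparable pairs among all cross-group pairs. The data and function names are hypothetical.

```python
import numpy as np

def comparability(A, B):
    """Share of cross-group pairs (a in A, b in B) with nested profiles.

    Assumption: 'nested' means one person's conditions are a subset of
    the other's; this stands in for the paper's comparability relation.
    """
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    a = A[:, None, :]                      # shape (n_A, 1, n_conditions)
    b = B[None, :, :]                      # shape (1, n_B, n_conditions)
    a_in_b = np.all(~a | b, axis=2)        # every condition of a is also in b
    b_in_a = np.all(~b | a, axis=2)        # every condition of b is also in a
    return float(np.mean(a_in_b | b_in_a))

def merge_conditions(X, cols):
    """Replace the columns in `cols` by one indicator: has any of them."""
    X = np.asarray(X, dtype=bool)
    keep = [c for c in range(X.shape[1]) if c not in cols]
    merged = X[:, cols].any(axis=1, keepdims=True)
    return np.hstack([X[:, keep], merged])

# Hypothetical columns: hypertension, heart attack, diabetes.
group_a = [[1, 0, 0], [0, 1, 1]]
group_b = [[0, 1, 0], [1, 0, 1]]

disaggregated = comparability(group_a, group_b)

# Merge the two circulatory conditions (columns 0 and 1) into one category.
aggregated = comparability(merge_conditions(group_a, [0, 1]),
                           merge_conditions(group_b, [0, 1]))
```

Under this nesting rule, merging conditions can never turn a comparable pair into a non-comparable one, which is consistent with the increase reported between panels A and B of Table 4.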

Table 4 Comparability

4.2.2 PODI and Dominance

Table 5 shows both the pairwise PODI indexes (3.1) across races and the summary index (3.6) that aggregates the pairwise comparisons; results on the underlying values of the CHDI are given in the “Appendix” as an example for the disaggregated case. We include both the PODI for the full set of conditions and the PODI for the aggregation of circulatory diseases. In this particular case, we find that the pairwise PODIs satisfy transitivity, but this need not be the case in other scenarios. Across both scenarios, our results indicate that the Asian group is the healthiest, and the black group is the least healthy. However, the intermediate positions depend on the modeling of circulatory diseases. In our preferred model, with a single category for circulatory diseases, the white group is ranked higher than the Hispanic group. The opposite outcome is found when we use multiple circulatory disease categories and, hence, when comparability is low.

Table 5 PODI

4.2.3 Comments

The results of our application are generally in line with existing findings on racial/ethnic disparities in health in the US (for a recent overview, see Braveman et al. 2010). It is well-established that morbidity levels are highest among the black population (for a recent reference, see Ward 2013). In addition, the results highlight the importance of taking comparability into consideration. The Hispanic group is ranked as healthier than the white group in the low comparability case only. This finding corresponds with recent research indicating that the so-called Hispanic paradox (see Footnote 3) might not apply to morbidity. However, the prevalences found among the Asian population should be interpreted with caution given the relatively small sample size for this group. For an in-depth study on racial differences in health among the elderly in the U.S. that includes such measures, see, for example, Hummer et al. (2004).

5 Discussion

5.1 Methodological Considerations

In this paper we have explored a specific application of a partial order approach, using condition prevalence as a more robust, objective measure of health than self-reported assessments. Our current implementation, which focuses on disease prevalences, allows the health index to be computed readily from most health surveys. However, it would be possible to extend our analysis to include expert knowledge on the relative severity of different diseases. Such a refinement would entail modifying the definitions of the relations of dominance and comparability to include additional criteria, but the analytical framework would remain unchanged. Other extensions and applications to different kinds of data can also be considered in our framework, since the theoretical backbone of our measures can be adapted to those scenarios. For example, we could apply the PODI as an alternative index to aggregate the information of health surveys on multiple domains of health, such as the EQ-5D (Devlin and Brooks 2017). In that vein, the PODI can be applied to the types of data that are used to construct other more general multidimensional indexes of well-being that are not restricted to health. Examples include, but are not limited to, the Better Life Index developed by the Organization for Economic Co-operation and Development (OECD), or the Human Development Index sponsored by the United Nations (UN). The analytical structure of our indicators can support ordinal variables with multiple categories, as these applications would require, although our R functions would need to be modified (see Footnote 4).

In addition to the issues related to data collection (diagnosis bias and reporting bias) already discussed, a potential drawback of the simple criteria on which our approach is based is that they could result in low comparability; i.e., in health rankings that reliably capture only a fraction of the population. Our recommendation for potential users is to avoid including too many individual conditions or dimensions in the analysis. Aggregations across typologies of diseases are a straightforward way to increase comparability. Moreover, if the focus of the analysis is not on individual conditions, using coarser definitions for disease profiles might result in only a small loss of information. In other contexts, limiting or aggregating the number of ordinal categories per dimension could be an option. These are just two examples, and ultimately the best approach depends on the application.

5.2 Comparison to Other Approaches

A variety of alternative measures of population health and well-being exist, each emphasizing different issues. In order to clarify the contribution of the measure we have developed in this article, we discuss our proposal with regard to three closely related literatures. First, we discuss our treatment of multi-morbidity compared to the clinical literature, which explores the consequences of disease combinations on specific health outcomes. Second, we relate our approach to existing research on the measurement of multi-dimensional well-being that focuses on the aggregation problem, most frequently using weighting approaches. Finally, we review alternative analyses that also utilize partial order approaches to generate rankings in multi-attribute situations as a response to the criticisms levelled at weighting schemes.

5.2.1 Clinical and Public Health

Our focus on disease prevalences is perhaps closest to clinical and public health research. The clinical literature on multi-morbidity has explored measures that evaluate combinations of diseases based on their influence on highly specific health outcomes of interest (de Groot et al. 2003). For example, the Charlson index (Charlson et al. 1987) is a widely used health indicator which is computed as the weighted combination of an array of conditions, and it is validated by its capacity to predict the future mortality of patients. Much like the Charlson index, many other clinical indicators are designed to assess multi-morbidity in specific clinical applications. For example, the Hallstrom index (Hallstrom et al. 1996) focuses on the prediction of cardiac episodes, and the Hurwitz index (Hurwitz and Morgenstern 1997) assesses how multi-morbidity influences the healthcare choices of patients with back pain.

A limitation of these measures is that the strong influence of the application on the design of the indexes can result in a poor fit when they are used to study other issues. Brilleman et al. (2014) evaluate the capacity of different multi-morbidity indexes to predict patient health care costs, and conclude that the Charlson index is outperformed by simple disease counts. Furthermore, these methods also have strong data requirements, such as mortality follow-ups, that go beyond the information collected in most national health surveys. In some cases, the methods need to be calibrated to each specific population, which results in further data demands (see Footnote 5).

Perhaps in an effort to avoid the specificity of the multi-morbidity measures in clinical research, the public health literature has often opted for simpler approaches. Some prominent studies rely on disease-by-disease prevalence comparisons without further elaboration (Banks et al. 2006). Other studies, more focused on multi-morbidity, use disease counts as their primary index (Marengoni et al. 2008; Ward 2013). In the context of these studies, the PODI allows us to establish a more systematic approach to population level health assessments. In addition, we can generate measures of comparability that provide meaningful summaries of the overlap of the condition profiles between the populations analyzed. This is important given that, as we have shown in our empirical application and is recognized in the clinical literature, multi-morbidity contains a wide diversity of complex disease profiles that often have a socioeconomic gradient (Schiøtz et al. 2017).

5.2.2 Health Economics

In the field of health economics there have been several proposals to generate indexes that aggregate across multiple dimensions of health on the basis of self-reported health data. A well-known example of this approach is the EQ-5D methodology, developed by the EuroQol Group (Devlin and Brooks 2017; see Footnote 6). The basis of the EQ-5D approach is a survey in which individuals are asked to self-assess their health state in a variety of dimensions, ranging from the purely physical (pain) to the limitations (or lack thereof) on their daily life activities as a consequence of their health. These measures are then aggregated into a single index.

Setting aside the implementation requirements for now, the tradeoff of utilizing such measures is the introduction of possibly contentious assumptions. First, these indexes rely on self-assessed measures, and as such have been shown to be vulnerable to reporting heterogeneity, a form of bias. This issue occurs when, for a similar objective state of health, individuals report systematically different self-assessments depending on their socioeconomic characteristics or the country to which they belong (Bago d’Uva et al. 2008). Second, aggregating the self-assessed limitations into a single index requires attributing preference weights, or relative importance, to the different dimensions of limitation. As the developers of the EQ-5D acknowledge (Devlin et al. 2018), there remains considerable debate on the methods used to evaluate preferences over certain health states. In addition, using preference-based weights might not be desirable in all applications; in some cases the analyst might prefer comparisons based purely on objective health differences, and not confounded by individuals’ valuations. Last but not least, such summary measures and questionnaires are not always implemented, whereas disease and limitation prevalences appear in most national health surveys.

5.2.3 Partial Orders

An alternative to weighting schemes consists of a variety of approaches that rely on partial orders. We identify two main existing approaches that, like ours, use partial order notions as an alternative to the more traditional weighting approach to deal with multiple attributes. The research by Fattore, Maggino and coauthors (henceforth referred to as “LE”) uses the concept of linear extensions as a natural partial order ranking (Fattore et al. 2011; Fattore and Maggino 2015; Fattore and Arcagni 2018). A different approach, developed in a series of papers starting with Arndt et al. (2012) (henceforth referred to as “FOD”), is based on an application of the multidimensional version of first order stochastic dominance (Hussain et al. 2016; Permanyer and Hussain 2018).

5.2.4 Linear Extensions

The work by LE builds on the counting method developed by Alkire and Foster (2011), which classifies individuals as poor if they are below the poverty line in more than a given number of well-being dimensions. The authors point out that, from a partial order perspective, it is not clear that all profiles below and above the poverty line are comparable (see Footnote 7). Instead of simple counts, the authors propose establishing a set of deprived profiles, and proceed to compare each individual profile to them using partial order theory. However, note that while LE focus on the measurement of poverty, existing work in other fields has used similar strategies to obtain pure orderings (Fattore and Bruggemann 2017).

The main difference between our approach and the literature that uses linear extensions lies in the treatment of non-comparable profiles. Instead of using the relation for non-comparability, the authors opt for a fuzzy approach to classifying profiles. The main thrust of their and related contributions consists of exploiting the concept of linear extensions, and the fuzziness arises from the fact that one can obtain multiple linear extensions from a partially ordered set. Succinctly put, a linear extension is a linear order that respects the ordering of the original set of profiles; in scenarios with non-comparable profiles, a partial order has multiple linear extensions. The approach in Bruggemann and Carlsen (2011) and others (e.g., Morton et al. 2009) consists of dealing with the multiplicity of orders by condensing the information into summary statistics computed from all the linear extensions. Some methods opt to eliminate the ambiguity from non-comparable profiles altogether by, for example, taking the average position of a profile across linear orders as its final ranking. Others, like LE, report the proportion of linear orders that classify a given profile as poor, hence preserving the inconclusiveness of the linear orders. In our proposal, we have opted instead for excluding non-comparable relations from our rankings, and thus focus only on those cases for which an unambiguous order can be established. Bear in mind that this does not equate to completely eliminating certain profiles from the analysis; rather, the more comparable profiles will have a greater importance in the overall assessment. The robustness of the PODI ranking to the treatment given to unclear comparisons can be evaluated on the basis of the comparability index.
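
To make the notion of a linear extension concrete, the following Python sketch enumerates by brute force all linear extensions of a very small set of profiles and computes the average position of each profile across them. It assumes that profiles are tuples of 0/1 indicators ordered componentwise; it is only an illustration of the concept, not the algorithm used in the cited works, and it is feasible only for a handful of profiles because it loops over all permutations.

```python
from itertools import permutations

def strictly_above(a, b):
    """True if profile a is at least as high as b everywhere and differs."""
    return a != b and all(x >= y for x, y in zip(a, b))

def linear_extensions(profiles):
    """All total orders of `profiles` that respect the componentwise order."""
    extensions = []
    for order in permutations(profiles):
        position = {p: k for k, p in enumerate(order)}
        respects = all(position[a] > position[b]
                       for a in profiles for b in profiles
                       if strictly_above(a, b))
        if respects:
            extensions.append(order)
    return extensions

def average_positions(profiles):
    """Mean position (0 = lowest) of each profile across linear extensions."""
    extensions = linear_extensions(profiles)
    return {p: sum(e.index(p) for e in extensions) / len(extensions)
            for p in profiles}

# Two binary attributes: (1, 0) and (0, 1) are not comparable.
profiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
extensions = linear_extensions(profiles)
positions = average_positions(profiles)
```

With these four profiles there are two linear extensions, which differ only in the order of the two non-comparable profiles, so both receive the same average position.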

5.2.5 First Order Stochastic Dominance

Table 6 Distribution of prevalences across three populations
Table 7 PODI and FOD assessments

The FOD approach is another methodology that imposes strong requirements on the types of populations that can be compared. The contribution by the scholars working with FOD applied to multidimensional indexes consists of operationalizing the multidimensional extension of the concept of first order stochastic dominance in a context with many individuals and attributes. While FOD-based evaluations provide uncontroversial orders, they suffer from two main limitations. The first is a technical limitation: while some progress has been made in this vein, there remain feasibility constraints on the number of dimensions for which FOD tests can be computed. Second, and more importantly, the conditions needed to establish dominance of one distribution over another are so demanding that the approach frequently does not allow us to rank distributions in applications. Permanyer and Hussain (2018) provide examples and simulations that illustrate this point; essentially, FOD applicability diminishes considerably when the number of attributes increases, particularly when the attributes are not highly correlated. In the application provided in that paper, with multi-attribute cross-country evaluations, 47% of the comparisons can be ranked according to FOD for three dimensions, dropping to 31% and 10% for five and ten dimensions, respectively.
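
How demanding the dominance condition is can be seen in a small Python sketch. It uses the standard characterization of multivariate first-order dominance in terms of upper sets: one distribution lies weakly above another if it places at least as much probability on every upper set of outcomes. The brute-force enumeration below is an illustration for two binary conditions only and is not the procedure implemented in the FOD literature; the distributions and function names are hypothetical.

```python
from itertools import combinations, product

def leq(a, b):
    """Componentwise order on outcome tuples."""
    return all(x <= y for x, y in zip(a, b))

def upper_sets(outcomes):
    """All subsets U such that a in U and a <= b imply b in U."""
    result = []
    for size in range(len(outcomes) + 1):
        for subset in combinations(outcomes, size):
            s = set(subset)
            if all(b in s for a in s for b in outcomes if leq(a, b)):
                result.append(s)
    return result

def weakly_more_disease(p, q, outcomes, tol=1e-12):
    """True if p puts at least as much mass as q on every upper set.

    With 1 = 'has the condition', this means q first-order dominates p
    in health terms.
    """
    return all(sum(p[o] for o in U) >= sum(q[o] for o in U) - tol
               for U in upper_sets(outcomes))

outcomes = list(product([0, 1], repeat=2))   # (0,0), (0,1), (1,0), (1,1)

# Hypothetical joint distributions of two conditions in three populations.
p = {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.2}
q = {(0, 0): 0.5, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.1}
r = {(0, 0): 0.6, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 0.4}

p_vs_q = weakly_more_disease(p, q, outcomes)
r_vs_q = weakly_more_disease(r, q, outcomes)
q_vs_r = weakly_more_disease(q, r, outcomes)
```

In this toy example populations p and q can be ranked, whereas q and r cannot be ranked in either direction: r has fewer people with any condition but more people with both. The number of upper sets to check also grows quickly with the number of attributes, which is in line with the feasibility constraints mentioned above.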

Compared to FOD-based analysis, our proposal allows the applied researcher to compare populations across multiple dimensions more frequently, at the cost of relaxing the requisites to establish an order. The basic intuition for our approach, which we have refined and formalized, is that an order can be established if, in expectation, an individual from a given population can be deemed healthier than an individual from another. To illustrate the differences between the two approaches, consider the following example from Arndt et al. (2012). Table 6 describes the prevalence of two conditions across three populations, and Table 7 the orders according to our method (panel A), and following the FOD approach (panel B). Comparability is high in this example (\(\geqslant .85\)), and thus our method covers a very large fraction of the possible relations across populations. The PODI index ranks all populations, whereas the FOD approach is restricted to populations 1 and 3; furthermore, note that our method coincides with the FOD for those particular populations.

6 Conclusion

We propose a new indicator, the PODI, based on the relative prevalence of conditions across populations. Although no statistic can provide a universal solution for the methodological issues that arise when assessing health disparities, our method specifically addresses two of the main difficulties associated with health comparisons. That is, we provide a measure that is robust to self-assessment biases and that takes into consideration the differences in the types of diseases afflicting the comparison groups. We illustrate how our simple framework can be used to analyze complex data on disease prevalences by applying it to current health disparities across racial groups in the U.S.