Comparative Disease Assessment: a multi-causal approach for estimating the burden of mortality

The Comparative Risk Assessment (CRA) framework comprehensively evaluates the impact of exposure to risk factors on population health using the counterfactual causal approach. We propose a framework, Comparative Disease Assessment (CDA), for assessing the impact of exposure to morbidity from some diseases on health outcomes, particularly death from other (relevant) diseases. This framework follows the ideas of the CRA framework and builds on the widely accepted notion that exposure to morbidity is usually a risk factor for health outcomes (morbidity/mortality) related to other diseases. Our framework uses a counterfactual, rather than a categorical, approach when attributing the burden of health outcomes to potential causes. This paper describes the steps and assumptions required to implement the CDA framework, using as an illustrative example diabetes mellitus morbidity as a risk factor for death from heart diseases. One advantage of the CDA framework is that it can be applied using multi-causal death registries. Some assumptions are needed to avoid biases, but the framework can at least provide preliminary estimates of the impact of exposure to diseases as risk factors for death from other diseases. Another major advantage is that the burden of deaths is no longer attributed to a single cause, the underlying cause, as is almost always done. Finally, this framework provides information on the pattern of comorbidity in a (sub)population of subjects who are about to die. These patterns can serve as a reference against which the patterns of the general population, or of other specific subpopulations, can be compared.


Background
For more than two decades, the Comparative Risk Assessment (CRA) framework has been proposed and used as a way of comprehensively assessing the impact of risk factors on populations. The CRA represents a systematic evaluation of the changes in population health that would result from changing/improving the population distribution of exposure to a (group of) risk factor(s) (Murray and Lopez 1999).
"This unified (CRA) framework for describing population exposure to risk factors and their consequences for population health is an important step in linking the growing interest in the causal determinants of health across a variety of public health disciplines from natural, physical, and medical sciences to the social sciences and humanities." (p. 2, Chapter I, Ezzati et al. 2004)
On the other hand, the Burden of Disease (BoD) framework (Murray and Lopez 1996; Murray et al. 2000) quantifies the impact of diseases on populations' health. This impact is quantified using the metric of "years of life lost": years of life lost due to living with a suboptimal quality of life (burden of morbidity) and years of life lost due to premature death (burden of mortality).
One of the main differences between the above two frameworks, BoD and CRA, is how causality is assigned. BoD uses categorical attribution: health outcomes, particularly deaths, are assigned to just one disease/condition. CRA uses the counterfactual approach: causation is assessed by contrasting the observed outcome with the potential outcome under a corresponding alternative scenario of (exposure to) the risk factors, in the same unit and at the same time.
The categorical attribution of mortality to a single cause (the underlying cause, UC) is a limitation of the BoD framework. The counterfactual approach within the CRA framework allows, though, for health outcomes to be multicausal, considering that morbidity and/or mortality for a particular disease is the result of exposures to multiple risk factors.
Because there is consensus that many diseases are risk factors for other diseases (e.g., Koch 2015; Broderick 1955; Cheung and Li 2012; Kazancioğlu 2013), it makes sense to consider a framework where disease i is viewed as a risk factor for (eventual) death from disease j, i ≠ j. Then, (exposure to) risk factors and the corresponding outcomes (morbidity and/or mortality for particular diseases) within the CRA framework could be replaced by exposures to morbidities for particular diseases and (corresponding and eventual) deaths from other diseases, within a new framework that we could call Comparative Disease Assessment (CDA).
In other words, we propose a framework for (partially) describing exposure to diseases in a population and its impact on (eventual) deaths from other diseases (as UCs). This would allow us to change the paradigm for assessing the burden of mortality from "years of life lost due to premature deaths from disease j as the UC" to "years of life lost due to exposure to diseases that are risk factors for (eventual) deaths from disease j" (see Fig. 1), for j ∈ J, J being a relevant group of diseases. This would place the assessment of the burden of mortality under principles similar to those of the CRA framework, thus avoiding the use of categorical attribution.
Some authors have used approaches to assess the frequency of a specific association of causes of death, one "underlying" and the other "associated", as in the case of the Cause of Death Association Indicator (CDAI) (Desesquelles et al. 2012); however, they have not formally considered the associated/contributory causes as risk factors for death from the UC.
We recognize that multiple causes of death also allow for other approaches, for example, a kind of syndemic approach (Lancet Editorial 2017). This would be possible if clusters of secondary causes of death are identified around a particular UC (e.g., diabetes). By combining these clusters with socio-economic indicators such as education and income, we would be able to identify groups of diseases and social conditions that consistently occur together among those dying from the reference UC.
In the next sections, we discuss further basic concepts and some details on the eventual implementation of the proposed CDA framework.

The Comparative Disease Assessment (CDA) framework
Categorical attribution to risk factors overlooks that many diseases have multiple causes (Lopez et al. 2006). Categorical attribution of deaths to diseases has similar problems: many deaths have multiple causes (Dorn and Moriyama 1964).
If it is important to identify "the true effect" of an exposure on disease occurrence (Maldonado and Greenland 2002), it is natural to also be interested in estimating the (true) effect of being exposed to disease i on eventual death (occurrence) from disease j, with i ≠ j and j the UC of death. Exposure to disease i is not exactly a risk factor for eventual death from the same disease i; in fact, it is essentially a precondition. On the other hand, part of the burden due to mortality from disease i should definitely be attributed to exposure to (morbidity from) disease i. We will address this problem later on.
As with the BoD framework, we propose to use death registries for the CDA framework, but now multi-causal ones. Secondary causes on death certificates (in Part I and, particularly, in Part II) could be assumed to be morbidities present before death that potentially lead and/or contribute to death from the UC. This assumption is a minimal precondition for disease i to be considered a risk factor for eventual death from disease j.
We define secondary causes as all causes on the death certificate except the UC. We should consider disease i as a "contributory" disease to cause of death j, not as a "complication" of it. "Complications" appear along the process that begins with the UC and, in this sense, they neither contribute to nor influence death from the UC; they are intermediate diseases (intermediate outcomes; Seuc et al. 2013). The standard practice is that "complications" are diseases that appear in Part I of the death certificate, while "contributory diseases" appear in Part II (Koch 2015; Broderick 1955); in theory, then, we should be able to distinguish between "intermediate diseases" (Part I) and "contributory diseases" (Part II).
In practice, this distinction (1) might not be recorded on the death certificate, (2) might be recorded poorly, or (3) might be recorded on the death certificate but not on the corresponding database. Therefore, we might need to consider all other diseases in the certificate (apart from the UC) as contributing, if we want to use them at all. In other words, we might need to consider all secondary causes as contributing.
As in the CRA approach, we now consider a counterfactual incidence/prevalence distribution of (exposure to) diseases. The current and the counterfactual/reference distributions of diseases would generate corresponding deaths and disabilities (or, alternatively, "loss of quality of life"), both of which can be expressed, for example, in disability-adjusted life years (DALYs). The difference between these two DALY totals would be the burden attributable to the shift in the diseases' distributions from the current to the counterfactual/reference distribution (the "theoretical minimum" or any other reference distribution used in its place).
The CDA approach should be useful because it allows for a more comprehensive assessment of the impact of health programs (HPs). HPs usually affect not only several risk factors but also several diseases, generating a multidimensional counterfactual distribution/scenario as the basis for the impact calculation.
A good summary of the information required for the CDA approach is presented in Table 1 (following Table 4.1 in Lopez et al. 2006). This table presents the following four pieces of information (columns) for a relevant list of disease outcomes (e.g., mortality):
1. Metric of burden (mortality/morbidity) from disease j (e.g., DALYs)
2. Diseases i1, i2, …, ik that are risk factors for mortality/morbidity from disease j
3. Metric of exposure for each disease i1, i2, …, ik
4. Reference ("theoretical minimum") distribution for each disease i1, i2, …, ik
The theoretical minimum incidences/prevalences of different diseases are probably dependent. For example, it might be impossible to reduce heart diseases (HD) without reducing hypertension prevalence, but theoretical minimums for (exposure to) risk factors share the same problem. We might not be able to reduce the reference risk factor to its "theoretical minimum" while keeping other risk factors' levels (of exposure) fixed at arbitrary levels. It might happen, for example, that if we force risk factor 1 to its theoretical minimum, then risk factor 2 necessarily comes (close) to its own theoretical minimum too.
Alternatively, we might need to talk about the vector of theoretical minimum distributions for a corresponding vector of risk factors, or, in the case of CDA, for a corresponding vector of diseases. In these cases, we would no longer be considering a vector of theoretical minimums, but, rather, a vector of practical minimums that accounts for dependence between exposures to diseases.

Estimation procedures
It is common to consider diseases (incidences/prevalences) as consequences/outcomes of exposure to risk factors. In BoD studies, the burden of this morbidity is computed using its severity (Haagsma et al. 2014; Burstein et al. 2015), a value between 0 (perfect health) and 1 (equivalent to death) that allows the burden of morbidity and the burden of mortality to be combined using "years of life lost" (from morbidity and/or mortality) as the common metric.
Considering diseases as causes (risk factors) of health outcomes, such as morbidity and/or mortality from another disease, is not unusual. The necessary ingredients for assessing the impact of a risk factor on a health outcome are the risk factor's exposure distribution and the corresponding relative risks (RRs), since each level/value of exposure has its own associated RR. Note that RRs for disease morbidity and for disease mortality are, in general, different (pp. 245 "Attributable mortality and burden of disease", Lopez et al. 2006), not least because morbidity and mortality are different events.
However, as far as we know, there are no studies systematically considering exposure distributions to diseases and corresponding RRs for other diseases. Lacking this information, we could try to estimate it from multi-causal death registries.
If we take as an illustrative example diabetes mellitus (DM) as a risk factor for eventual death from HD, a possible solution would be to consider exposure to the disease/risk factor DM just as a binary variable; "exposed" to DM would be those with DM as one of the secondary causes in the death certificate and "non-exposed" to DM those without DM within the secondary causes in the death certificate. The corresponding RRs would be estimated as the proportion of death certificates with HD as the UC within the "exposed" (the risk of death from HD within those exposed to DM) divided by the similar proportion within the "non-exposed" (the risk of death from HD within those non-exposed to DM).
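The binary crude-RR calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Certificate` record, the `icd_range` helper and the `crude_rr` function are hypothetical names, and ICD-10 codes are assumed to be stored as plain strings whose first three characters give the category, so that the HD range (I05-I52) and the DM range (E10-E14) can be tested by string comparison.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Certificate:
    """One death certificate (hypothetical record layout)."""
    underlying: str                  # ICD-10 code of the underlying cause (UC)
    secondary: Tuple[str, ...] = ()  # ICD-10 codes of all other causes listed


def icd_range(start: str, end: str) -> Callable[[str], bool]:
    """Predicate: does the 3-character ICD-10 category of a code lie in [start, end]?"""
    return lambda code: start <= code[:3] <= end


def crude_rr(certs: Iterable[Certificate],
             is_outcome: Callable[[str], bool],
             is_exposure: Callable[[str], bool]) -> float:
    """Crude RR = P(UC is disease j | exposed) / P(UC is disease j | non-exposed),
    where 'exposed' means disease i appears among the secondary causes."""
    n = {True: 0, False: 0}       # certificates per exposure group
    deaths = {True: 0, False: 0}  # certificates with disease j as UC, per group
    for c in certs:
        exposed = any(is_exposure(code) for code in c.secondary)
        n[exposed] += 1
        deaths[exposed] += int(is_outcome(c.underlying))
    return (deaths[True] / n[True]) / (deaths[False] / n[False])


is_hd = icd_range("I05", "I52")  # heart diseases, as defined in the text
is_dm = icd_range("E10", "E14")  # diabetes mellitus, as defined in the text

# Toy (invented) data: 10 certificates with DM as a secondary cause, 4 of them with
# HD as the UC; 20 without DM, 4 of them with HD as the UC.
certs = ([Certificate("I219", ("E119",))] * 4 + [Certificate("C349", ("E119",))] * 6
         + [Certificate("I500", ("J189",))] * 4 + [Certificate("C349")] * 16)
print(crude_rr(certs, is_hd, is_dm))  # 2.0  (= 0.4 / 0.2)
```

A certificate counts as "exposed" as soon as any secondary cause falls in the DM range, mirroring the binary definition above; with real registry extracts the records would be read from file rather than built in code.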
The main limitation of this approach is that it (wrongly) assumes that deaths not including DM as a secondary cause correspond to subjects that did not have DM diagnosed before death.
As we want to estimate the risk of death from HD when diagnosed with (exposed to) DM before death (vs. not diagnosed with/not exposed to DM before death), we need two conditions to hold: (1) all those with DM diagnosed before death have DM as a secondary cause on the death certificate, and (2) those without DM diagnosed before death do not have DM as a secondary cause on the death certificate.
Condition (2) is (almost) always true, while condition (1) is definitely not. In all probability, however, the cases responsible for this bias, i.e., subjects diagnosed with DM who do not have it as a secondary cause on the death certificate, are in that situation because the certifying physician judged that, in their case, DM was not a contributing cause. Therefore, the bias in (1) might not be relevant. If it is relevant, its effect would be that our RR estimate overestimates the "true" RR.
Correction for this bias could be obtained by reviewing (a sample of) the death certificates without DM among the secondary causes, identifying those subjects who did have DM diagnosed before death, and checking whether the mismatch is justified.
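One way such a review could feed back into the RR is sketched below. It is a hypothetical correction under our own assumptions: the reviewed sample is taken to be representative of all certificates without DM, only omissions judged unjustified are reclassified as exposed, and the sample proportions are extrapolated to the whole non-exposed group. The function name `corrected_rr` and all numbers are invented for illustration.

```python
def corrected_rr(exp_n: float, exp_deaths: float,
                 non_n: float, non_deaths: float,
                 sample_n: int, sample_missed: int, sample_missed_deaths: int) -> float:
    """Reclassify part of the 'non-exposed' certificates as exposed, extrapolating
    from a reviewed sample of certificates WITHOUT disease i among secondary causes.

    exp_n, exp_deaths    : exposed certificates, and those with disease j as UC
    non_n, non_deaths    : non-exposed certificates, and those with disease j as UC
    sample_n             : non-exposed certificates reviewed (assumed representative)
    sample_missed        : reviewed subjects who had disease i diagnosed before death
                           and whose omission was judged unjustified
    sample_missed_deaths : of those, certificates with disease j as UC
    """
    f = sample_missed / sample_n               # estimated misclassified fraction
    p = sample_missed_deaths / sample_missed   # disease-j share among misclassified
    moved_n = non_n * f                        # certificates moved to 'exposed'
    moved_deaths = moved_n * p                 # ... of which disease j is the UC
    risk_exposed = (exp_deaths + moved_deaths) / (exp_n + moved_n)
    risk_non_exposed = (non_deaths - moved_deaths) / (non_n - moved_n)
    return risk_exposed / risk_non_exposed


# Invented numbers: crude RR = (400/1000) / (2000/9000) = 1.8
print(round(corrected_rr(1000, 400, 9000, 2000,
                         sample_n=200, sample_missed=20, sample_missed_deaths=4), 3))  # 1.359
```

The direction of the correction depends on the disease-j share among the reclassified subjects; in this invented example they rarely have HD as the UC, so the corrected RR falls below the crude 1.8, in line with the overestimation argued above.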
In any case, with or without bias correction, the above procedure could be used to obtain a rough estimate of the required population-attributable fraction (PAF), discussed in the next section, and from it the burden of disease i as a risk factor for death from disease j, using only the multiple causes of death recorded in death registries.
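The step from a PAF to an attributable burden can be written out explicitly. Assuming, as in CRA and here carried over to CDA, that the expected number of deaths from disease j is proportional to the risk-weighted exposure $\sum_k P_k RR_k$ (with $P_k$ and $P'_k$ the current and reference distributions over the levels k of disease i, and $RR_k$ the RR of level k for death from disease j), the deaths expected under $P'$ scale by the ratio of the two sums. The deaths attributable to disease i are then the PAF times the observed deaths $D_j$ (the same factor could be applied to years of life lost from disease j, preferably within age and sex strata):
$$D_j - D_j\,\frac{\sum_k P'_k\,RR_k}{\sum_k P_k\,RR_k} \;=\; D_j\,\frac{\sum_k P_k\,RR_k - \sum_k P'_k\,RR_k}{\sum_k P_k\,RR_k} \;=\; \mathrm{PAF}_{ij}\,D_j$$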

Population-attributable fractions
The contribution of a risk factor to disease or mortality is expressed as the fraction of disease or death attributable to the risk factor in a population and is referred to as the population-attributable fraction (PAF). When exposure is described as a discrete variable with n levels, it is given by the equation (Lopez et al. 2006):
$$\mathrm{PAF} = \frac{\sum_{i=1}^{n} P_i\,RR_i - \sum_{i=1}^{n} P'_i\,RR_i}{\sum_{i=1}^{n} P_i\,RR_i}$$
where P_i is the proportion of the population at exposure level i under the current distribution, P'_i the corresponding proportion under the counterfactual (e.g., theoretical minimum) distribution, and RR_i the relative risk at exposure level i with respect to the reference level.
For example, let us suppose we are interested in the risk factor "smoking" as a cause of lung cancer, with four exposure levels: (1) no smoking, (2) 1-10 cigarettes per day, (3) 11-20 cigarettes per day, and (4) 21+ cigarettes per day. Let us assume that the current exposure distribution is 1/4 for each of the four exposure levels, that the theoretical minimum exposure distribution is 100% in the first (i = 1) exposure level (the whole population non-smoking), and that the RR_i for lung cancer morbidity with respect to the first exposure level are RR_1 = 1, RR_2 = 1.5, RR_3 = 2.5, and RR_4 = 3.5.
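Therefore, the PAF is:
$$\mathrm{PAF} = \frac{\tfrac{1}{4}(1 + 1.5 + 2.5 + 3.5) - 1}{\tfrac{1}{4}(1 + 1.5 + 2.5 + 3.5)} = \frac{2.125 - 1}{2.125} \approx 0.53$$
This means that approximately 53% of current lung cancer morbidity could be avoided if current exposure were reduced to the theoretical minimum exposure (100% non-smoking), as described above.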
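A more realistic (alternative) smoking-prevention HP could change the exposure distribution to: 1st level = 40%, 2nd level = 30%, 3rd level = 20%, and 4th level = 10%. In this case, the PAF would be:
$$\mathrm{PAF} = \frac{2.125 - (0.4 \times 1 + 0.3 \times 1.5 + 0.2 \times 2.5 + 0.1 \times 3.5)}{2.125} = \frac{2.125 - 1.7}{2.125} = 0.20$$
meaning that, in this case, the reduction is just 20%, rather than 53% as before.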
Within the CDA framework, considering the effect of disease i (e.g., DM) on death from disease j (e.g., HD), we would have:
$$\mathrm{PAF}_{ij} = \frac{\sum_{k=1}^{n} P^{i}_{k}\,RR^{ij}_{k} - \sum_{k=1}^{n} P'^{i}_{k}\,RR^{ij}_{k}}{\sum_{k=1}^{n} P^{i}_{k}\,RR^{ij}_{k}}$$
In the above formula, some items need to be redefined:
- $P^{i}_{k}$: the current probability distribution of exposure to disease i. This distribution could be expressed in terms of n levels of severity/evolution/grades of the disease, e.g., k = 1 for "no DM", k = 2 for "DM without complications", and k = 3 for "DM with complications". Let us assume that, in this case, the current exposure distribution is $P^{i}_{1}$ = 60%, $P^{i}_{2}$ = 20%, and $P^{i}_{3}$ = 20%.
- $RR^{ij}_{k}$: the RR of level k of disease i for death from disease j, with respect to the reference level of disease i (so $RR^{ij}_{1}$ = 1); e.g., $RR^{ij}_{2}$ is the RR of "DM without complications" (level 2) vs. "no DM" (level 1) for death from disease j. Let us assume the values $RR^{ij}_{2}$ = 1.5 and $RR^{ij}_{3}$ = 2.5.
- $P'^{i}_{k}$: the theoretical minimum exposure distribution for disease i. Following the DM example with n = 3 exposure levels, it could be assumed to be 90%, 5%, and 5% for "no DM", "DM without complications", and "DM with complications", respectively.
PAF_ij in this case represents the proportional reduction in mortality from disease j (e.g., HD) due to the reduction/change of exposure to disease i (e.g., DM) from the current to the theoretical minimum distribution (or whatever reference exposure we might use). Using the values assumed above, we would have:
$$\mathrm{PAF}_{ij} = \frac{(0.6 + 0.2 \times 1.5 + 0.2 \times 2.5) - (0.9 + 0.05 \times 1.5 + 0.05 \times 2.5)}{0.6 + 0.2 \times 1.5 + 0.2 \times 2.5} = \frac{1.4 - 1.1}{1.4} \approx 0.21$$
meaning that an approximately 21% proportional reduction in mortality from disease j would be observed if we changed/reduced exposure to disease i from the current to the theoretical minimum exposure distribution.
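The discrete PAF formula used in the smoking and DM examples is easy to check numerically. The short Python sketch below (the function name `paf` is ours, not from Lopez et al. 2006) evaluates it with the exposure distributions and RRs assumed in the text; it assumes the first entry of each list is the reference level.

```python
from typing import Sequence


def paf(p: Sequence[float], p_ref: Sequence[float], rr: Sequence[float]) -> float:
    """Discrete PAF = (sum_k P_k RR_k - sum_k P'_k RR_k) / sum_k P_k RR_k.

    p     : current exposure distribution over the levels (sums to 1)
    p_ref : counterfactual/reference distribution over the same levels
    rr    : RR of each level vs. the reference level (rr[0] == 1)
    """
    if not len(p) == len(p_ref) == len(rr):
        raise ValueError("p, p_ref and rr must have one entry per exposure level")
    current = sum(pk * rk for pk, rk in zip(p, rr))
    counterfactual = sum(qk * rk for qk, rk in zip(p_ref, rr))
    return (current - counterfactual) / current


# Smoking -> lung cancer: none, 1-10, 11-20, 21+ cigarettes/day
rr_smoking = [1.0, 1.5, 2.5, 3.5]
print(round(paf([0.25] * 4, [1.0, 0.0, 0.0, 0.0], rr_smoking), 3))    # 0.529
print(round(paf([0.25] * 4, [0.4, 0.3, 0.2, 0.1], rr_smoking), 3))    # 0.2
# DM -> death from HD: no DM, DM without complications, DM with complications
print(round(paf([0.6, 0.2, 0.2], [0.9, 0.05, 0.05], [1.0, 1.5, 2.5]), 3))  # 0.214
```

The DM value follows directly from the assumed inputs: the risk-weighted sums are 1.4 under the current distribution and 1.1 under the reference one, so PAF = 0.3/1.4.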
An important practical issue is how to estimate RR_k, the RR of the kth level of disease i with respect to the reference level, for mortality from disease j. As mentioned in the previous section, one option is to treat exposure to the disease/risk factor DM as a binary variable: "exposed" are those with DM among the secondary causes on the death certificate, and "non-exposed" are those without it. The RR is then the proportion of certificates with HD as the UC among the exposed (the risk of death from HD among those exposed to DM) divided by the same proportion among the non-exposed.
As an illustration, we computed the risks of death from HD (ICD-10 codes I05-I52) as the UC for those with and without DM (ICD-10 codes E10-E14) as a secondary cause of death, and for those with and without hypertension (HT) (ICD-10 code I10X), for the years 2005, 2010, and 2015 in Cuba, using the corresponding death registries. From Tables 2, 3, and 4, the RRs of death from HD for those with DM vs. those without DM (as a secondary cause on the death certificate) are 1.47, 1.55, and 1.63, respectively. From Tables 5, 6, and 7, the corresponding RRs for those with HT vs. those without HT are 1.22, 1.45, and 1.84, respectively. As mentioned before, with this approach we are possibly overestimating the true RRs. The degree of overestimation depends on the proportion of subjects with DM/HT diagnosed before death from HD who did not have DM/HT as a secondary cause on the death certificate. Whether these omissions are "justified" (i.e., whether the certifying physician judged that DM/HT was not a contributing cause) would be key to deciding whether bias is present.
Alternatively, if we want to obtain estimates of (adjusted) RRs, we could carry out a logistic regression analysis, predicting "death from disease j" as the outcome (disease j as UC: no/yes), and using all other relevant diseases as binary predictors, i.e., exposed to disease i (no/yes) for i ∈ I, where I is a group of relevant diseases that are considered as potential risk factors for death from disease j.
Adjusted odds ratios (ORs), as proxies for adjusted RRs, were estimated using logistic regression with card(I) = 2, i1 = DM and i2 = HT as (binary) risk factors and j = "death from HD" as the outcome; the results are presented in Table 8. The adjusted ORs are all statistically significant; in the case of DM they tend to be smaller than the corresponding crude ORs, while in the case of HT the two sets of ORs are quite similar.
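For readers without a statistics package at hand, adjusted ORs of this kind can be obtained by maximum likelihood from grouped certificate counts. The sketch below fits the main-effects model logit Pr(UC = HD) = b0 + b1·DM + b2·HT by Newton-Raphson in plain Python. It is an illustration under our own assumptions (certificates aggregated into the four DM × HT cells, no further covariates such as age or sex), the helper names `solve` and `fit_logistic` are hypothetical, and the cell counts are synthetic, generated from known ORs so that the fit can be checked; they are not the Cuban data behind Table 8.

```python
import math
from typing import List, Sequence, Tuple

# (DM 0/1, HT 0/1, number of certificates, number with HD as the UC)
Cell = Tuple[int, int, float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def fit_logistic(cells: Sequence[Cell], max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood fit of logit Pr(HD as UC) = b0 + b1*DM + b2*HT
    by Newton-Raphson on grouped binomial data; returns [b0, b1, b2]."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3                      # score vector X'(y - n*p)
        info = [[0.0] * 3 for _ in range(3)]  # information matrix X'WX
        for dm, ht, n, y in cells:
            x = (1.0, float(dm), float(ht))
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            w = n * p * (1.0 - p)
            for i in range(3):
                grad[i] += x[i] * (y - n * p)
                for k in range(3):
                    info[i][k] += w * x[i] * x[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Synthetic cells generated from known adjusted ORs: 1.4 (DM) and 1.2 (HT)
true_beta = (-1.5, math.log(1.4), math.log(1.2))
cells = [(dm, ht, 10_000.0,
          10_000.0 / (1.0 + math.exp(-(true_beta[0] + true_beta[1] * dm + true_beta[2] * ht))))
         for dm in (0, 1) for ht in (0, 1)]
beta = fit_logistic(cells)
print([round(math.exp(b), 3) for b in beta[1:]])  # adjusted ORs: [1.4, 1.2]
```

When disease j is a common UC among all deaths, ORs obtained this way will lie further from 1 than the corresponding RRs; they approximate RRs closely only when the outcome is rare.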
As already mentioned at the end of the previous section, contributing diseases reported in death certificates are not necessarily the group of diseases (the comorbidity) that the subjects have before death (while alive). Some diseases might appear in the death certificate that were not diagnosed in the subject while alive (not very probable), and some diseases might have been diagnosed in a subject while alive and not reported as contributing diseases in the eventual death certificate (quite probable). As a consequence, the predictive value of the above-mentioned regression model is uncertain.
It should be noted that it is standard practice to use, when relevant, different PAFs for mortality and for morbidity (pp. 245 "Attributable mortality and burden of disease", Lopez et al. 2006).

Joint effects of multiple risk factors
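From Eide and Heuch (2001), we note that, in the formula
$$\mathrm{PAF} = \frac{\int_{x=0}^{m} RR(x)\,P(x)\,dx - \int_{x=0}^{m} RR(x)\,P'(x)\,dx}{\int_{x=0}^{m} RR(x)\,P(x)\,dx},$$
where m is the maximum exposure level, RR, P, and P′ may represent joint relative risks and exposure distributions for multiple risk factors; that is, x may be (a vector of values from) a vector of risk factors, with the RR for each risk factor estimated at the appropriate level of the remaining ones.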
Let us consider the two risk factors "obesity" and "smoking" for HD mortality, with categories no/yes for each of these two risk factors. The integral in the above formula would go from X = (x_1, x_2) = (x_{1,0}, x_{2,0}) to X = (x_{1,1}, x_{2,1}), where (x_{1,0}, x_{2,0}) is the vector of initial/reference exposure levels for obesity and smoking, respectively, and (x_{1,1}, x_{2,1}) is the vector of final/extreme/maximum exposure levels. In discrete terms, the possible values for X could be:
- (x_{1,0}, x_{2,0}), equivalent to obesity = no, smoking = no
- (x_{1,0}, x_{2,1}), equivalent to obesity = no, smoking = yes
- (x_{1,1}, x_{2,0}), equivalent to obesity = yes, smoking = no
- (x_{1,1}, x_{2,1}), equivalent to obesity = yes, smoking = yes
For each of these four values we should have specific RR values, and specific P and P′ exposure distributions would be assumed over the four categories. RRs for these specific combinations of risk factor levels might be difficult to obtain from the literature, and it would similarly be difficult to obtain the exposure distribution P (and, eventually, P′).
In the case of CDA, we could consider the diseases DM and HT as risk factors for death from HD. As above, we would have:
- (x_{1,0}, x_{2,0}), equivalent to DM = no, hypertension = no
- (x_{1,0}, x_{2,1}), equivalent to DM = no, hypertension = yes
- (x_{1,1}, x_{2,0}), equivalent to DM = yes, hypertension = no
- (x_{1,1}, x_{2,1}), equivalent to DM = yes, hypertension = yes
As above, we need estimates of the RRs and of P, and a reference P′. The RRs could be estimated as follows:
- RR_1 = 1 for the reference category
- RR_2 = (proportion of deaths from HD within the subgroup "DM = no; hypertension = yes")/(proportion of deaths from HD within the reference subgroup "DM = no; hypertension = no")
- RR_3 = (proportion of deaths from HD within the subgroup "DM = yes; hypertension = no")/(proportion of deaths from HD within the reference subgroup "DM = no; hypertension = no")
- RR_4 = (proportion of deaths from HD within the subgroup "DM = yes; hypertension = yes")/(proportion of deaths from HD within the reference subgroup "DM = no; hypertension = no")
The exposure distribution would need to be estimated from epidemiological sources, definitely not from the death certificates.
Alternatively, for n biologically independent and uncorrelated risk factors, the joint PAF is given by the equation (Miettinen 1974):
$$\mathrm{PAF} = 1 - \prod_{i=1}^{n} \left(1 - \mathrm{PAF}_i\right)$$
where PAF_i represents the PAF of individual risk factor i. Of course, risk factors are generally neither independent nor uncorrelated, but this could be a reasonable solution if we are interested in the additivity of the PAFs of several risk factors for the same outcome.
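The four-category calculation described above can be sketched as follows. This is a hypothetical illustration: `joint_rr`, `joint_paf` and `miettinen_joint_paf` are our names, all certificate counts and exposure distributions are invented, and, as stressed above, P and P′ would have to come from epidemiological sources rather than from the death certificates. The last line applies the Miettinen (1974) combination, which assumes independent, uncorrelated risk factors.

```python
from typing import Dict, Sequence, Tuple

Category = Tuple[bool, bool]  # (DM, HT) among the secondary causes
REF: Category = (False, False)


def joint_rr(hd_deaths: Dict[Category, float],
             totals: Dict[Category, float]) -> Dict[Category, float]:
    """RR of each (DM, HT) category vs. the reference (no DM, no HT): share of
    certificates with HD as the UC in the category / the same share in the reference."""
    ref_share = hd_deaths[REF] / totals[REF]
    return {cat: (hd_deaths[cat] / totals[cat]) / ref_share for cat in totals}


def joint_paf(rr: Dict[Category, float], p: Dict[Category, float],
              p_ref: Dict[Category, float]) -> float:
    """Discrete PAF over the joint categories: (sum P*RR - sum P'*RR) / sum P*RR."""
    current = sum(p[c] * rr[c] for c in rr)
    counterfactual = sum(p_ref[c] * rr[c] for c in rr)
    return (current - counterfactual) / current


def miettinen_joint_paf(pafs: Sequence[float]) -> float:
    """1 - prod(1 - PAF_i); only valid for independent, uncorrelated risk factors."""
    remaining = 1.0
    for a in pafs:
        remaining *= 1.0 - a
    return 1.0 - remaining


# Invented certificate counts (1000 per category) and HD-as-UC counts
totals = {(False, False): 1000, (False, True): 1000, (True, False): 1000, (True, True): 1000}
hd = {(False, False): 200, (False, True): 260, (True, False): 300, (True, True): 400}
rr = joint_rr(hd, totals)  # RR_1..RR_4 = 1.0, 1.3, 1.5, 2.0
# Invented exposure distributions (in practice from epidemiological sources)
p = {(False, False): 0.6, (False, True): 0.2, (True, False): 0.1, (True, True): 0.1}
p_ref = {(False, False): 1.0, (False, True): 0.0, (True, False): 0.0, (True, True): 0.0}
print(round(joint_paf(rr, p, p_ref), 3))          # 0.174
print(round(miettinen_joint_paf([0.1, 0.2]), 3))  # 0.28
```

Keeping the four joint categories explicit avoids the independence assumption behind the Miettinen product, at the cost of needing the joint exposure distribution P.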

Conclusions
In this paper, we have demonstrated that it is possible to assess the burden of mortality using a multi-causal counterfactual approach, similar to the one used to assess the impact of risk factors in populations' health [Comparative Disease Assessment (CDA) and Comparative Risk Assessment (CRA) respectively].
One of the advantages of this CDA approach is that it uses multi-causal mortality registries, which are usually, though not always, available in developed and middle-income countries.
The estimation procedure suggested for the CDA approach is based on several assumptions; two of the most questionable are:
- (1) The underlying cause (UC) apart, all other causes are potential risk factors for death from disease j.
- (2) If disease i is not included as a secondary cause on the death certificate, then the corresponding subject did not have disease i diagnosed before death.
To correct for (1), we might need to filter out, a priori, "secondary" causes that cannot be considered potential risk factors, in order to avoid illogical and/or irrelevant results. As a consequence of (2), we might obtain only a rough estimate of the burden of disease i as a risk factor for death from disease j, but this particular limitation should not be an obstacle to a wide application of the approach presented here.
*From logistic regression analysis with binary outcome "death from heart diseases" as underlying cause and binary predictors DM and HT (exposure to morbidity)