Background

Quantifying the health impacts of population interventions by social strata is necessary for the design of effective policies to reduce health inequalities. These analyses are often based on simulation models, which attempt to quantify the future health (and cost) impacts of interventions. For example: what is the health gain of various tobacco [1], diet [2,3,4,5] and other preventive interventions by socioeconomic groups? And, how will the intervention change health inequalities?

Simulating interventions by socioeconomic group is difficult. The heterogeneity of epidemiological parameters (e.g. incidence, mortality, morbidity, case fatality, etc.) across socioeconomic strata and determining their effects on simulation outputs is challenging; both in specifying the relevant inputs correctly, and coherence in the modeling (e.g. ensuring that deaths and other outputs such as prevalence sum across strata to return that in the aggregated population).

For an illustrative example, consider a lifetable simulation for a population of 2000 members. Suppose that the mortality risk of the population is 1 per 100 and increases by 20% per year for 20 years to account for aging. After the first year, there will be an expected 1980 people alive, and by the 20th year, the expected population will decrease to 367.4. The person-years lived by annual cycle are shown in Fig. 1.

Fig. 1
figure 1

Demonstration of the principle issues addressed in our study: Persons years lived by annual cycle for a population of initial size n = 2000, with mortality rate 1 per 100 person years in the first year, and the high and low SES groups (each N = 1000) that make up this total population. The mortality rate in the low SES group is twice that in the high SES group at baseline (i.e. 0.667 and 1.333 per 100 person years, respectively). The mortality rate increase by 20% per annum in the total population, and each SES group

Now suppose that our population is comprised of two distinct groups, e.g., high and low socioeconomic status (SES) groups, each with 1000 people. Suppose also that one group has twice the mortality risk of the other. Setting the mortality risks for the groups to be 4/3 and 2/3 deaths per 100 respectively yields the same expected number of living people after the first year as were obtained previously (i.e., 1980). By applying the annual 20% increase of mortality risk to each group and simulating to the end of 20th year, we generate the number of person years as shown in Fig. 1. In the 20th annual cycle there are 322.2 and 105.0 expected person years lived in the high and low mortality group respectively. This sums to 427.3, which is greater than the 367.4 obtained when the population was modeled as a whole. That is, the mortality parameters are mis-specified in the stratified model, because differing mortality rates by strata change the proportional distribution of people across strata away from 50:50 over time. An intervention scenario imposed on top of these mis-specified lifetables likewise incorrectly estimate the health gains for each stratum and the differences in health gain by stratum (i.e. the inequality impact).

The purpose of our study was to develop methods to disaggregate populations in Markov macro-simulation models, particularly proportional multistate lifetable models (PMSLT) [6, 7], retaining fidelity with the aggregate population in terms of numbers of deaths and other event counts. In so doing, we are implicitly assuming that aggregate epidemiological data (and forecasts for modelling interventions into the future) are more accurate than disaggregated data. For example, we assume that forecasts of future mortality and incidence rates by sex and age are more accurate (and more readily available) than such estimates additionally stratified by socioeconomic status. We are also adhering to a ‘burden of disease approach’ that places and emphasis on ensuring ‘everything adds up’ – at least in the business-as-usual (BAU) epidemiological model. We searched the literature for previous publications on this topic but could find none (see “Appendix 1: Abstracts from literature search”).

A few extensions of multistate models accounting for heterogeneity have been observed in the literature. Such extensions have highlighted, in a disease-free context with independent populations, that mortality is underestimated if heterogeneity is not accounted for [8, 9]. Further, log-linear, hazard, or Bayesian regression [10] methods can be used to calculate differences in multistate life table parameter values by an explicitly defined heterogenous parameter of interest (say, SES or education) for further simulation. Other related works in epidemiology and demography demonstrate methods for obtaining the mortality of diseased and non-diseased cohorts from the overall mortality rate in continuous time PMSLT models [11, 12].

Our work differs from the previous literature in that we use a proportional version of multistate lifetables where rates only vary at discrete time steps. We also treat the aggregate population as the ‘working truth’, with our task being to obtain a unique set of valid rates for the strata in order to quantify the impacts on health inequalities.

First, we outline the mathematics of disaggregating populations, such that separately simulating the transitions of each sub-stratum under BAU through its state transition Markov chain yields death counts and person-years lived that are identical with the aggregate population. Second, we demonstrate intervention scenarios applied to both the aggregate and deprivation-stratified models. We deliberately chose interventions that do not have differing effect sizes by deprivation quintile. The three modeled scenarios are two hypothetical interventions (50% lower all-cause mortality, 50% lower coronary heart disease mortality) and a dietary intervention to substitute 59% of sodium chloride with potassium chloride in the food supply in New Zealand.

We apply this disaggregation method to the disease and all-cause components of a PMSLT model. An intervention model, though, first calculates the intervention effect size as a change in risk factor distribution (now and into the future) to then generate a population impact fraction (PIF; or difference in disease incidence rates between BAU and intervention scenarios). We assume the modellers estimate the PIFs by strata of variable we are disaggregating the lifetables for, and therefore we do not present methods for disaggregation of PIFs.

Methods

We consider two separate Markov processes which describe the mortality and disease lifetables used in PMSLT modeling [6, 7].

The mortality lifetable

The mortality lifetable of a population \(P\) measures metrics such as the number of deaths, life years and life expectancy, capturing snapshots at discrete timesteps (e.g. years). Let \(A_{t}\) and \(D_{t}\) denote the number of people alive and dead at the end of the \(t\)-th timestep. We assume \(A_{0} = N\) and \(D_{0} = 0\), where \(N\) is the total number of people that are observed initially, and for each \(t\) we have a mortality rate \(m_{t}\). Then:

$$\begin{aligned} A_{t} & = A_{t - 1} e^{{{ } - m_{t} }} , \\ D_{t} & = D_{t - 1} + A_{t - 1} \left( {1 - e^{{{ } - m_{t} }} } \right) \\ \Rightarrow D_{t} & = N - A_{t} \\ \end{aligned}$$

In the mortality lifetable disaggregation problem, \(P\) consists of \(n\) underlying sub-populations \(P_{1} , \ldots , P_{n}\), where we assume that members of \(P\) remain in their respective sub-populations for life. The initial population for each \(P_{1} , \ldots , P_{n}\) are known, and the initial mortality rates are known (either given, or solvable using mortality rate ratios between strata). However, whilst \(m_{t}\), the total population mortality rate by future time step, is given, the stratum-specific mortality rates over time are not given. This situation is not uncommon, e.g. we may know the starting disaggregation of a population by socioeconomic strata, and the rate ratios for mortality or disease incidence comparing strata, but not the exact rates by strata over time. The goal is to use the aggregate populations mortality rates, the rate ratios comparing strata, and starting distribution of each sub-population (\(P_{k}\)) to solve the mortality lifetables for each \(P_{1} , \ldots , P_{n}\). The sub-populations lifetables must be consistent with that of the aggregate population: at each time timestep \(t\), the sum of the people in the alive (dead) compartment for each sub-population is equal to the number of people in the alive (dead) compartment for the aggregate population. i.e.,

$$\begin{aligned} A_{t} & = \mathop \sum \limits_{k = 1}^{n} A_{k,t} \\ D_{t} & = \mathop \sum \limits_{k = 1}^{n} D_{k,t} \\ \end{aligned}$$

where \(A_{k,t}\) and \(D_{k,t}\) denote the number of people in the alive and dead compartments respectively for sub-population \(k\) at time \(t.\) Thus, the disaggregation problem reduces to finding mortality rates \(m_{k,t}\) for each sup-population \(k\) at timestep \(t > 0\) such that:

$$A_{t - 1} e^{{{ } - m_{t} }} = \mathop \sum \limits_{k = 1}^{n} A_{k,t - 1} e^{{{ } - m_{k,t} }}$$

If we let \(P_{1}\) be the reference sub-population, then the mortality rate ratios r for timestep \(t\) are given as scalars \(r_{1,t}^{{{\text{mort}}}} , \ldots , r_{n,t}^{{{\text{mort}}}}\), where \(r_{1,t}^{{{\text{mort}}}} = 1\), such that \(m_{k,t} = r_{k,t}^{{{\text{mort}}}} m_{1,t}\) for each sub-population \(k\). By substituting each \(r_{k,t}^{{{\text{mort}}}} m_{1,t}\) into the consistency equations, we obtain a set of equations with a unique solution. By solving these, we are then able to obtain a unique set of sub-population mortality rates for the mortality lifetable disaggregation problem. A method for solving for these rates, and the proof of the uniqueness of the solution is in “Appendix 2: Disaggregation details”.

The mortality/morbidity lifetable

We now extend the mortality lifetable to a mortality/morbidity lifetable that also includes HALYs which incorporate the effects of morbidity. Let \(L_{t}\) denote population life years at \(t\), where

$$L_{t} = \frac{{A_{t} + A_{t - 1} }}{2}$$

Our HALY unit is a rescaling of the life-year using disability rates. Let \(w_{t}\) denote the prevalent years of life with disability (i.e. ‘YLDs’) from a burden of disease study at time \(t\), divided by the population in that strata. Then the formula for HALYs at time \(t\), denoted \(L_{t}^{*}\), is:

$$L_{t}^{*} = L_{t} \left( {1 - w_{t} } \right)$$

For the mortality/morbidity lifetable disaggregation problem, an extension of the disaggregation problem in the previous

section, we are given a mortality/morbidity lifetable for a population \(P\) which includes all parameters from the mortality lifetable, along with morbidity weights \(w_{t}\) and HALYs \(L_{t}^{*}\). As before, \(P\) consists of \(n\) sub-populations \(P_{1} , \ldots , P_{n}\), with their individual population counts, mortality rates, and YLDs given for the first time step. The goal is to use the aggregate lifetable to determine the mortality rates and morbidity rates (\(w_{t}\)) rates for each sub-population \(P_{k}\), and hence obtain the mortality/morbidity lifetables for each \(P_{1} , \ldots , P_{n}\). We have to ensure alive population counts for sub-populations and the aggregate population agree and also total HALYs for the sub-populations agree with the aggregate population HALYs. That is:

$$L_{t}^{*} = \mathop \sum \limits_{k = 1}^{n} L_{k,t}^{*}$$

where \(L_{k,t}^{*}\) denotes the HALYs for sub-population \(k\) at time \(t\). To satisfy the above equations, we must solve values \(w_{k,t}\) for each \(k\) and \(t\) such that:

$$L_{t} \left( {1 - w_{t} } \right) = \mathop \sum \limits_{k = 1}^{n} L_{k,t} \left( {1 - w_{k,t} } \right)$$

where \(L_{k,t}\) denotes life years for sub-population \(k\) at \(t\).

We can assume that the mortality lifetable disaggregation problem has been solved as a subproblem, since it can be independently solved using the method in “The mortality lifetable” section. Then, we have alive population values, such that \(A_{t} = \mathop \sum \nolimits_{k = 1}^{n} A_{k,t}\), which implies that \(L_{t} = \mathop \sum \nolimits_{k = 1}^{n} L_{k,t}\). Hence, we can simplify the HALY constraints to:

$$L_{t} w_{t} = \mathop \sum \limits_{k = 1}^{n} L_{k,t} w_{k,t}$$

To solve the problem, we assume morbidity (morb) for each \(t\) (although in all likelihood ratios vary by age and sex, but are assumed constant over t), which are \(r_{1,t}^{{{\text{morb}}}} , \ldots , r_{n,t}^{{{\text{morb}}}}\), and \(r_{1,t}^{{{\text{morb}}}} = 1\), such that \(w_{k,t} = r_{k,t}^{{{\text{morb}}}} w_{1,t}\) for each sub-population \(k\). After substituting each \(r_{k,t}^{{{\text{morb}}}} w_{1,t}\) into the HALY constraints and solving, we obtain:

$$w_{k,t} = r_{k,t}^{{{\text{morb}}}} \frac{{L_{t} w_{t} }}{{\mathop \sum \nolimits_{j = 1}^{n} r_{j,t}^{{{\text{morb}}}} L_{j,t} }}$$

Thus, we are able to use morbidity ratios to uniquely disaggregate the mortality/morbidity lifetable such that the HALYs in the sub-population lifetables are consistent with the aggregate population lifetable.

The disease lifetable

A PMSLT, described in detail in [6, 7], works through changes in disease incidence or case fatality rates, where each disease is assumed independent of other diseases. Similar to the all-cause mortality and morbidity lifetable (above), the issue here is in ensuring that each disease-specific subsidiary lifetable also returns the numbers and rates or the total population before it is disaggregated by heterogeneity (eg. SES).

We now consider an alternative type of lifetable: the disease lifetable. This lifetable consists of three compartments: a healthy compartment \(S\), diseased compartment \(C\), and dead compartment \(D\). At each timestep, members of the population in \(S\) transition to \(C\) according to the incidence rate, and from \(C\) to \(D\) according to the fatality rate. For some diseases, members can transition from \(C\) to \(S\) as per the remission rate, however, for simplicity, we do not consider this possibility for now.

Let \(S_{t}\), \(C_{t}\) and \(D_{t}\) denote the number of people in compartment \(S\), \(C\) and \(D\) respectively at the end of the \(t\)-th step. We will assume initially that \(D_{0} = 0\) and \(S_{0} + C_{0} = N\), where \(N\) is the total number of people initially observed. Let \(i_{t}\) and \(f_{t}\) denote the incidence and fatality rates respectively at \(t\). The equations for \(S_{t}\), \(C_{t}\) and \(D_{t}\) are given by the system:

$$\begin{aligned} S_{t} & = S_{t - 1} e^{{{ } - i_{t} }} , \\ C_{t} & = C_{t - 1} e^{{{ } - f_{t} }} + S_{t - 1} \left( {1 - e^{{{ } - i_{t} }} } \right), \\ D_{t} & = N - S_{t} - C_{t} \\ \end{aligned}$$

These equations are premised on a simplifying assumption that members of the population cannot die from the disease in the same timestep in which they contract the disease This assumption can, in practice, be mitigated through choosing an appropriately small timestep.

For the disease lifetable disaggregation problem, we are given the disease lifetable for an aggregate population \(P\) complete with incidence rates \(i_{t}\), fatality rates \(f_{t}\) and population counts \(S_{t}\), \(C_{t}\) and \(D_{t}\) at each timestep \(t\). We assume that \(P\) consists of \(n\) separate underlying sub-populations \(P_{1} , \ldots , P_{n}\), each with their own population counts, incidence rates and fatality rates. Let \(S_{k,t}\), \(C_{k,t}\) and \(D_{k,t}\) denote the number of people in the healthy, diseased and dead compartments respectively for sub-population \(k\) at time \(t\). We additionally assume for each sub-population we are given the initial disease prevalence, hence we can obtain \(S_{k,0}\) and \(C_{k,0}\). The objective of the problem is to determine both the incidence rates and fatality rates of each sub-population \(P_{k}\) and hence obtain the disease lifetables for each \(P_{1} , \ldots , P_{n}\). The criteria for consistency for this disaggregation at each time timestep \(t\) are given by:

$$\begin{aligned} S_{t} & = \mathop \sum \limits_{k = 1}^{n} S_{k,t} \\ C_{t} & = \mathop \sum \limits_{k = 1}^{n} C_{k,t} \\ \end{aligned}$$

i.e., we must choose incidence rates \(i_{k,t}\) and fatality rates \(f_{k,t}\) for each timestep \(t\) and sub-population \(k\) such that:

$$S_{t - 1} e^{{{ } - i_{t} }} = \mathop \sum \limits_{k = 1}^{n} S_{k,t - 1} e^{{{ } - i_{k,t} }}$$

and

$$C_{t - 1} e^{{{ } - f_{t} }} + S_{t - 1} \left( {1 - e^{{{ } - i_{t} }} } \right) = \mathop \sum \limits_{k = 1}^{n} \left[ {C_{k,t - 1} e^{{{ } - f_{k,t} }} + S_{k,t - 1} \left( {1 - e^{{{ } - i_{k,t} }} } \right) } \right]$$

We once again assume that we are given rate ratios for the sub-population rates at each timestep \(t\), specifically incidence rate ratios \(r_{1,t}^{{\text{i}}} , \ldots , r_{n,t}^{{\text{i}}}\) such that \(i_{k,t} = r_{k}^{{\text{i}}} w_{1,t}\) for each \(k\), and fatality rate ratios \(r_{1,t}^{{\text{f}}} , \ldots , r_{n,t}^{{\text{f}}}\) such that \(f_{k,t} = r_{k}^{{\text{f}}} w_{1,t}\) for each \(k\).

We can apply the method used in the mortality problem to obtain unique incidence rates \(i_{k,t}\) that satisfy the constraints for the healthy population. After obtaining the sub-population incidence rates, the consistency constraint for the diseased population simplifies to:

$$\begin{aligned} & C_{t - 1} e^{{{ } - f_{t} }} + S_{t - 1} - S_{t} = \mathop \sum \limits_{k = 1}^{n} \left[ {C_{k,t - 1} e^{{{ } - f_{k,t} }} + S_{k,t - 1} - S_{k,t} } \right] \\ & \Rightarrow C_{t - 1} e^{{{ } - f_{t} }} + \left( {S_{t - 1} - \mathop \sum \limits_{k = 1}^{n} S_{k,t - 1} } \right) - \left( {S_{t} - \mathop \sum \limits_{k = 1}^{n} S_{k,t} } \right) = \mathop \sum \limits_{k = 1}^{n} C_{k,t - 1} e^{{{ } - f_{k,t} }} \\ & \Rightarrow C_{t - 1} e^{{{ } - f_{t} }} = \mathop \sum \limits_{k = 1}^{n} C_{k,t - 1} e^{{{ } - f_{k,t} }} \\ \end{aligned}$$

Thus, by using two consecutive applications of the methods described in “Appendix 1: Abstracts from literature search”, first for the healthy compartment and incidence rates and then for the diseased compartment and fatality rates, we can use the rate ratios to obtain a consistent disaggregation of the disease lifetable.

The prototype code for the above methods is provided in a GitHub repository [13].

Case studies of interventions by deprivation strata

Our case studies are applied to the NZ population, which we disaggregate by small area deprivation (a census index at the geographic unit of about 100 individuals).

Intervention 1: 50% reduction in ACMR

Our first intervention is a hypothetical 50% reduction of the all-cause mortality rate (ACMR) for Māori females (Māori being the Indigenous population of NZ and suffering elevated levels of deprivation, and a determinant of SES hence we routinely stratify by sex, age and ethnicity prior to then stratifying by SES). This intervention acts directly upon the mortality/morbidity lifetable by modifying the ACMR and so does not require modeling of any disease lifetables. Additional file 1: Table S1 shows the aggregate ACMR and morbidity values for Māori females at each age and Additional file 1: Table S2 gives the aggregate population counts for each 5-year age group. We apply an annual percentage change in mortality (APC) of -2.5% to the ACMR values for each year after 2011 until 2026 [14].

To uniquely disaggregate the main lifetable for Māori females we use the rate ratios for five categories of deprivation obtained from routine health data for ACMR and morbidity values (Additional file 1: Tables S3 and S4 respectively). The initial proportions for the five strata are set at 20% each.

To apply the intervention, we multiply each business-as-usual (BAU) ACMR rate by 0.5 for both the aggregate and stratified populations.

Intervention 2: 50% reduction in CHD incidence

In our second intervention, we reduce the incidence rate of coronary heart disease (CHD) for Māori females by 50%. This intervention requires the modeling of a CHD disease lifetable as well as the main mortality/morbidity lifetable. The incidence rate, fatality rate, morbidity ratio and initial prevalence values for the aggregate population are given in Additional file 1: Table S5. We assume an APC of -2% for both incidence and fatality rates for each year after 2011 until 2026 (giving a 4% per annum reduction in mortality).

To disaggregate the CHD lifetable, we use the rate ratios for five categories of deprivation from routine health data for CHD incidence and fatality rates given in Additional file 1: Tables S6 and S7 respectively. We also use prevalence ratios for CHD given in Additional file 1: Table S8 to obtain initial prevalence values for the sub-populations.

To apply the intervention, we multiply the CHD incidence rate in BAU by 0.5 for both the aggregate and stratified populations. This change in incidence flows through into changes in CHD mortality and prevalence, and then to changes in all-cause mortality/morbidity in the main lifetable (as described in “The mortality lifetable” section).

Intervention 3: 59% reduction in NZ sodium consumption

In our final example, we apply a real-world dietary intervention. As described in Nghiem et al. [15], we assume that 59% of the dietary sodium in processed foods and table salt is replaced by potassium and magnesium salts. This is estimated to reduce daily sodium intake by 51.5% for the NZ population. We assume that the effect size is the same across sub-populations (consistent with similar sodium intakes across the sub-populations considered here).

This intervention reduces the incidence rates of CHD and stroke (due to a reduction of systolic blood pressure). As such, our modeling involves both CHD and stroke disease lifetables. The CHD lifetables are obtained as per section 2.5. The stroke lifetables are similarly obtained using the aggregate values from Additional file 1: Table S9 and rate ratios for incidence, fatality and prevalence from Additional file 1: Tables S10, S11 and S12 respectively. We again assume an APC = − 2% for the incidence and fatality rates of stroke for each year after 2011 until 2026.

The PSMLT modeling technique we use to apply this intervention to the disease lifetables uses potential impact fractions (PIFs) to scale the BAU disease incidence rates. Our PIFs for each disease \(d \in\) {CHD, stroke}[7] are obtained using the RR shift method described in Barendregt and Veerman [16]:

$${\text{PIF}}_{d} = \frac{{\mathop \sum \nolimits_{{c \in {\mathcal{C}}}} p_{c} {\text{RR}}_{c} - \mathop \sum \nolimits_{{c \in {\mathcal{C}}}} p_{c} {\text{RR}}_{c}^{*} }}{{\mathop \sum \nolimits_{{c \in {\mathcal{C}}}} p_{c} {\text{RR}}_{c} }}$$

where \({\mathcal{C}}\) is the set of risk exposure categories \(c\), \(p_{c}\) is the population fraction in category \(c\), and \({\text{RR}}_{c}\) and \({\text{RR}}_{c}^{*}\) are the relative risks of \(c\) before and after the intervention respectively. The proportions of the female Māori population in sodium risk categories are obtained from our analysis of national nutrition survey data [17] and given in Additional file 1: Table S13. To determine \({\text{RR}}_{c}\) and \({\text{RR}}_{c}^{*}\), we apply the relative risks per 1 g daily sodium increase, from Blakely et al. [5] (to the mean sodium intakes for each category) where the mean intakes for \({\text{RR}}_{c}^{*}\) are obtained from a 51.5% reduction of the BAU mean intakes. Supporting tables are presented in Additional file 1: Tables S14 and S15.

Results

Table 1 shows the PMSLT outputs in BAU for 60–64 year old Māori females (centered on age 62) alive in 2011, for the next 20 years until aged 82. Regarding ACMRs, there is no difference for the total population modeled in aggregate compared to the weighted average across deprivation heterogeneity strata—as there should be given the method described above. Similarly, the total HALYs by cycle and summed to age 80, are identical between the aggregate and disaggregated PMSLT. Also shown in Table 1 are the ACMR for the least and most deprived quintile (with a rate ratio of 1.5812 at age 62 decreasing to 1.1159 by age 82). Whilst the population distribution is 20% in each quintile of deprivation, the HALYs lived by the least deprived are greater than for the most deprived at all ages, and more so with increasing age such that summed from age 62 to 82 the least deprived have 121.9 (18.7%) more accrued HALYs than the most deprived—due to both higher morbidity and higher ACMR.

Table 1 Business as usual 60–64 year-old cohort (centered on age 62) of Māori females, comparing aggregated and deprivation heterogeneity disaggregated proportional multistate lifetables (PMSLT) models in term of all-cause mortality rates and health adjusted life years (HALYs) outputs†

Table 2 shows the three intervention scenarios. For the (extreme) scenario of 50% reductions in ACMRs at all ages, the rate ratios comparing the most and least deprived Māori females are unchanged (as per specification), and the rate differences are halved. By the age of 82, there is a difference in the aggregated population mortality rate (0.0275) and that averaged (weighted) across quintiles (0.0276) at the third meaningful digit. Similarly, summed to age 82 the total HALYs incremental to BAU differ by 30 (0.2%) for the aggregate (13,233) compared to heterogeneity (13,204) models—due to the non-linear association of mortality rates with mortality risks, with mortality rates varying by strata.

Table 2 Intervention impacts for 60–64 year-old cohort of Māori females, comparing aggregated and deprivation heterogeneity disaggregated proportional multistate lifetables (PMSLT) models in terms of all-cause mortality rates and health adjusted life years (HALYs) outputs

Also shown in Table 2 are the measures of interest to assessing health inequality impacts of the interventions. For an intervention reducing CHD incidence by 50% and a ‘real world’ scenario of sodium substitution reducing both CHR and stroke incidence, there are reductions to the null in the ACMR rate ratio and rate differences—due to the higher rates of these diseases in deprived populations that make these interventions inequality reducing. Similarly, there are greater HALY gains for the deprived population in both relative and absolute terms (last two columns of Table 2; specifically, there is both a greater absolute incremental gain in HALYs for the most deprived (last column) and a greater relative gain (ratios in second to last column greater than 1)).

Discussion

We developed algorithms for mathematically disaggregating a population into strata such that the lifetables for the sub-populations under BAU are consistent with the lifetable of the aggregate population. To the best of our knowledge, this method is the first of its kind to perform this disaggregation with mathematical guarantees of consistency for the sub-populations (see discussion in “Background” section). These guarantees allow the lifetables for the sub-populations to be treated as the “working truth” and thus they can be used in simulations to estimate HALY gains by strata without any loss of fidelity.

We find a modest difference in the sum of HALYs gained across strata compared to ‘simply’ applying the intervention to the aggregate population—a difference that arises due to the non-linear association of rates with risks (i.e. risk = 1 – exp[rate]), and where the intervention effect modeled separately across strata then summed is more accurate than modeled simply in aggregate (as heterogeneity is allowed for).

There are two main assumptions that are required for the disaggregated outputs to be reliable. First, we assume that the lifetable for the aggregate population has been correctly parameterized and that the outputs of the BAU simulations for the aggregate lifetables are accurate for the population they model. This may be a strong assumption in practice; lifetable parameters are often obtained through projections and approximations and may involve a significant degree of uncertainty. However, since the sub-populations are mathematically consistent with the aggregate population, the amount of error in the estimated sub-population is only as large as that of the aggregate.

The second assumption is that the rate ratios comparing strata (e.g. disease incidence rate ratios), and the initial proportions in each sub-population accurately reflect reality. This implies that one must have accurate a priori knowledge of the relative differences between sub-populations.

The disaggregation algorithm is not computationally expensive to implement for typical applications (i.e., a manageable number of sub-populations), and should assist researchers aiming to quantify intervention effects by population heterogeneity.

Future research may try to integrate the system of differential equations as outlined by Barendgredt et al. [18] for aggregate data, allowing for cohort members to both enter and leave states in the same cycle. However, this will make the system considerably more complex (we cannot guarantee there is a unique and obtainable solution, requiring optimization methods), which may make it more difficult to implement algorithms efficiently compared to the simpler method we provide in this paper.

Conclusion

We provide a method for disaggregating population by heterogeneity in a PMSLT or markov model, whereby over future time steps total deaths and HALYs across all heterogeneity strata sum to those in the parent whole population model. This method is intended for use in modelling of population interventions by (say) sex and age, but also by strata such as SES and disease risk, for policy making intended to reduce inequalities in health or focus on targeted populations to maximise cost effectiveness.