Examining Approaches to Estimate the Prevalence of Catastrophic Costs Due to Tuberculosis from Small-Scale Studies in South Africa

Background and Objective In the context of the End TB goal of zero tuberculosis (TB)-affected households encountering catastrophic costs due to TB by 2020, estimating the national prevalence of catastrophic costs due to TB is a priority to inform programme design. We explore approaches to estimate the national prevalence of catastrophic costs due to TB from existing datasets as an alternative to nationally representative surveys. Methods We obtained, standardized and merged three patient-level datasets from existing studies on patient-incurred costs due to TB in South Africa. A deterministic cohort model was developed to estimate the national prevalence of catastrophic costs, using national data on the prevalence of TB and the likelihood of loss to follow-up by income quintile and HIV status. Two approaches were tested to parameterize the model with existing cost data. First, a meta-analysis summarized study-level data by HIV status and income quintile. Second, a regression analysis of patient-level data also included employment status, education level and urbanicity. We summarized findings by type of cost and examined uncertainty around the resulting estimates. Results Overall, the median prevalence of catastrophic costs for the meta-analysis and regression approaches was 11% (interquartile range [IQR] 9–13%) and 6% (IQR 5–8%), respectively. Both approaches indicated that the main burden of catastrophic costs falls on the poorest households. An individual-level regression analysis produced lower uncertainty around estimates than a study-level meta-analysis. Conclusions This paper presents a novel application of existing data to estimate the national prevalence of catastrophic costs due to TB. This type of model could be useful for researchers and policy makers looking to inform certain policy decisions; however, some uncertainties remain due to limitations in data availability.
There is an urgent need for standardized reporting of cost data and improved guidance on methods to collect income data to improve these estimates going forward.

Electronic supplementary material: The online version of this article (10.1007/s40273-020-00898-3) contains supplementary material, which is available to authorized users.


Introduction
Tuberculosis (TB) remains the leading cause of death from a single infectious agent worldwide, with 10 million people falling ill and 1.2 million people dying from TB in 2018 [1]. Often those who are most affected by TB are the most vulnerable in society, and affected households can face substantial costs associated with the disease [2]. Globally, costs associated with TB represent an average of 58% and 39% of individual and household income, respectively [3].
In recognition of the impact of the costs of illness on households, the World Health Organization (WHO) has highlighted reduction of catastrophic costs due to TB as one of three priority targets for 2020 [1]. The WHO Global TB Programme defines costs due to TB as 'catastrophic' when they exceed 20% of a household's annual pre-TB income [4]. The focus of this metric is on economic hardship associated with seeking TB care, including direct out-of-pocket medical costs (such as money paid for medicines, diagnostics, consultation fees or informal payments made to health workers), direct non-medical costs (transport and accommodation costs, costs of any special food or supplements taken because of illness) as well as indirect (opportunity) costs of time spent seeking care by people with TB and by guardians or household members accompanying them [5][6][7][8].
To help track country progress against this goal and inform programme planning, the Global TB Programme has developed guidelines for the conduct of nationally representative cross-sectional surveys to estimate the prevalence of catastrophic costs [4]. However, these surveys require substantial resources and time to complete and will not be feasible for all 130 WHO member states to carry out repeatedly, leaving countries searching for another source of estimates.
In many settings, data on patient costs of TB have been collected as part of trials or other smaller-scale projects; however, recent systematic reviews of patient-incurred costs due to TB observed large heterogeneity in the quality of reporting as well as in the methods used to collect cost data and measure income loss [3,10,11]. Given this variation, it is currently unclear to what extent these existing data can be used to inform national estimates of catastrophic costs due to TB. We hypothesize that, with the use of a cohort model, these data could still be a useful resource for countries looking for decision-making support in the absence of a national survey. We aim to investigate approaches to model the national prevalence of catastrophic costs due to TB using the case study of South Africa, which has one of the world's highest TB incidence rates, with an estimated incidence of 520 per 100,000 people in 2018 [12].

Parameterizing the Cohort Model: Population Characteristics
We created an individual-level deterministic cohort model that simulated progression through the TB care cascade in order to estimate the prevalence of catastrophic costs in South Africa (Fig. 1). The model contained a hypothetical cohort of 1000 South Africans with drug-susceptible (DS) TB, with population characteristics mirroring those of the national population of people with DS-TB. Individuals in the cohort were first distributed across national income quintiles 1-5 using data on the national income distribution and the distribution of TB across income quintiles [13][14][15]. We then sampled employment status by income quintile, as well as household size, reflecting the national distributions of each [16]. Individual income was estimated by dividing household income by household size, and was set to zero for individuals who were unemployed or otherwise not earning income. HIV sero-status was modelled for each individual in the cohort based on the national HIV prevalence among individuals with DS-TB [17]. We then estimated the likelihood of loss to follow-up before treatment start based on HIV status, following evidence from Naidoo et al. [17].
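The cohort-construction steps above can be sketched as follows. This is a minimal illustration, not the authors' model: every numeric input (TB share, household income, employment probability, household size and loss-to-follow-up probability by quintile or HIV status) is a hypothetical placeholder, and drawing household size as 1 plus a Poisson variate is an assumption made only so that every household has at least one member.

```python
import numpy as np

# Hypothetical placeholder inputs, NOT the values used in the paper.
TB_SHARE_BY_QUINTILE = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # Q1 = poorest
HH_INCOME_BY_QUINTILE = np.array([1_000.0, 2_500.0, 5_000.0, 10_000.0, 30_000.0])  # US$/year
P_EMPLOYED_BY_QUINTILE = np.array([0.20, 0.30, 0.40, 0.50, 0.60])
MEAN_HH_SIZE_BY_QUINTILE = np.array([5.0, 4.5, 4.0, 3.5, 3.0])
P_HIV_POSITIVE = 0.60          # share of people with DS-TB living with HIV
P_LTFU_HIV_POS = 0.15          # pre-treatment loss to follow-up, HIV positive
P_LTFU_HIV_NEG = 0.10          # pre-treatment loss to follow-up, HIV negative


def build_cohort(n=1000, seed=0):
    """Return a dict of per-person arrays describing a hypothetical DS-TB cohort."""
    rng = np.random.default_rng(seed)
    q_idx = rng.choice(5, size=n, p=TB_SHARE_BY_QUINTILE)          # 0..4
    employed = rng.random(n) < P_EMPLOYED_BY_QUINTILE[q_idx]
    # Household size: 1 + Poisson draw so every household has at least one member
    hh_size = 1 + rng.poisson(MEAN_HH_SIZE_BY_QUINTILE[q_idx] - 1)
    hh_income = HH_INCOME_BY_QUINTILE[q_idx]
    # Individual income = household income / household size; zero if not earning
    indiv_income = np.where(employed, hh_income / hh_size, 0.0)
    hiv = rng.random(n) < P_HIV_POSITIVE
    ltfu = rng.random(n) < np.where(hiv, P_LTFU_HIV_POS, P_LTFU_HIV_NEG)
    return {"quintile": q_idx + 1, "employed": employed, "hh_size": hh_size,
            "hh_income": hh_income, "indiv_income": indiv_income,
            "hiv": hiv, "ltfu": ltfu}
```

Calling `build_cohort()` returns 1000 people with quintiles coded 1 (poorest) to 5, matching the coding used in the regression analysis.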

Identifying and Reconciling Primary Data
We collated all research articles reporting any estimates of patient-incurred costs due to TB in South Africa from the Unit Cost Study Repository (UCSR) of the Global Health Cost Consortium [18]. Patient-incurred costs included any costs paid out-of-pocket by TB patients and their households, and any lost income or productivity due to TB. Eleven studies presenting patient cost data in South Africa were identified [19][20][21][22][23][24][25][26][27][28][29]. Of these, four were excluded due to outdated models of care and one was excluded as a duplicate of previously published data. Corresponding authors of seven eligible studies were invited to participate, and a protocol identifying variables to be included in the pooled dataset was provided. Collaborators from three of the seven eligible studies agreed to participate in the analysis. Due to data availability, the scope of this analysis was restricted to costs whilst on treatment for DS-TB; we did not consider costs for drug-resistant (DR) TB, nor did we consider costs during the diagnostic process [10]. All datasets had obtained ethical approval for their original study. Ethical approval for the pooled analysis was granted by the London School of Hygiene and Tropical Medicine (reference 14486).
We reconciled timeframes for cost data by identifying the treatment start date, interview date and recall period for each participant. Direct out-of-pocket costs incurred in each treatment phase (intensive and continuation) were categorized as direct medical costs (consultation fees, medicines, diagnostics), direct non-medical costs (transportation, accommodation) and food costs (food supplements, special foods). Cost estimates were further distinguished by type of healthcare provider: public healthcare (PHC) facility (study site), another PHC facility (non-study site), private general practitioner, pharmacy, hospital inpatient service, hospital outpatient service, and traditional healer.
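One way to picture this reconciliation is a small lookup that assigns each recall window to a treatment phase and rolls raw cost items up into the analysis categories. This is a sketch under stated assumptions: the 60-day intensive phase approximates the standard 2-month intensive phase of the 6-month DS-TB regimen, classifying a window by its midpoint is a simplification, and the item and provider keys (`COST_CATEGORY`, `PROVIDERS`) are hypothetical names, not variables from the original datasets.

```python
from datetime import date, timedelta

# Assumption: the 2-month intensive phase is approximated as the first 60 days.
INTENSIVE_DAYS = 60

# Hypothetical mapping from raw cost items to the analysis categories
COST_CATEGORY = {
    "consultation_fee": "direct_medical", "medicines": "direct_medical",
    "diagnostics": "direct_medical", "transport": "direct_nonmedical",
    "accommodation": "direct_nonmedical", "food_supplements": "food",
    "special_food": "food",
}
# Hypothetical codes for the seven provider types listed in the text
PROVIDERS = {"phc_study_site", "phc_other", "private_gp", "pharmacy",
             "hospital_inpatient", "hospital_outpatient", "traditional_healer"}


def phase_for(treatment_start: date, interview: date, recall_days: int) -> str:
    """Assign a recall window to a phase using its midpoint (a simplification)."""
    window_start = interview - timedelta(days=recall_days)
    midpoint = window_start + (interview - window_start) / 2
    days_on_treatment = (midpoint - treatment_start).days
    return "intensive" if days_on_treatment < INTENSIVE_DAYS else "continuation"


def categorize(items, treatment_start, interview, recall_days):
    """Sum (provider, item, amount) records by (phase, provider, category)."""
    phase = phase_for(treatment_start, interview, recall_days)
    totals = {}
    for provider, item, amount in items:
        if provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {provider}")
        key = (phase, provider, COST_CATEGORY[item])
        totals[key] = totals.get(key, 0.0) + amount
    return totals
```

For a participant who started treatment on 1 January and was interviewed on 31 January with a 30-day recall period, all reported costs fall in the intensive phase.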
All datasets were collected using adaptations of the Tool to Estimate Patient Costs [30], and thus definitions of out-of-pocket cost variables were homogeneous; however, the Researching Equity in ACcess to Healthcare (REACH) dataset did not contain information on direct non-medical costs or time spent accessing providers other than the main study clinic. As these variables were omitted entirely from data collection, we assumed these values to be missing at random and used imputation to complete them (imputation methods are described in Sect. 2.2.2). In contrast, methods for collecting income data and estimating indirect costs varied widely across datasets and were not reconcilable. To complete the datasets, we therefore took a statistical approach to predict the income quintile of each household. Assuming the income distribution to be the same as the national distribution of income amongst people with TB, we predicted income using regression coefficients for asset holdings, housing quality indicators and basic demographics, estimated in an analysis of the most recent (2015) South African National Income Dynamics Study (NIDS). Full methods to predict household income quintiles are described in Electronic Supplementary Material (ESM) Appendix 2.
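The income-prediction step can be illustrated as applying a fitted log-income equation to each household and then placing the prediction within national quintile cut-points. The actual NIDS-based model is given in ESM Appendix 2; the coefficients, variable names and cut-points below are hypothetical placeholders chosen only to make the sketch runnable.

```python
import numpy as np

# Hypothetical coefficients standing in for a regression of log household income
# on assets, housing quality and demographics (NOT the published NIDS-based model).
COEFS = {
    "intercept": 7.0,
    "owns_fridge": 0.40,
    "owns_tv": 0.30,
    "piped_water": 0.25,
    "brick_dwelling": 0.20,
    "head_years_education": 0.05,
}
# Hypothetical upper cut-points (annual household income, US$) of quintiles 1-4
QUINTILE_CUTPOINTS = np.array([1_500.0, 3_000.0, 6_000.0, 12_000.0])


def predict_income(households):
    """households: dict of equal-length arrays keyed like COEFS (minus intercept)."""
    n = len(next(iter(households.values())))
    log_income = np.full(n, COEFS["intercept"])
    for name, beta in COEFS.items():
        if name != "intercept":
            log_income = log_income + beta * np.asarray(households[name], dtype=float)
    return np.exp(log_income)


def to_quintile(income):
    """Map predicted income to quintiles 1 (poorest) to 5 via the cut-points."""
    return np.searchsorted(QUINTILE_CUTPOINTS, income, side="right") + 1
```

A household with none of the listed assets lands in quintile 1 under these placeholders, while one with all assets and a household head with 12 years of schooling lands in quintile 4.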
All costs are reported in 2017 US dollars. Data collected before 2017 were inflated using the US consumer price index [31]. Prior to generating model parameters using the standardized data, we conducted a descriptive analysis of sociodemographic and cost variables within and across datasets. Variables were summarized using the mean and standard deviation for each individual dataset and across the pooled dataset. We tested for significant differences in categorical variables using a chi-squared test, and tested for significant differences in continuous variables using a one-way analysis of variance (ANOVA) within and between studies.
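The price adjustment and the between-study ANOVA described above reduce to a few lines. This sketch assumes costs are already expressed in US$ for their collection year; the CPI figures are placeholders (the official US CPI series published by the Bureau of Labor Statistics would be used in practice). The F statistic is computed by hand; `scipy.stats.f_oneway` returns the same statistic together with a p-value.

```python
import numpy as np

# Placeholder annual US CPI values; substitute the official series in practice.
US_CPI = {2013: 233.0, 2015: 237.0, 2017: 245.0}


def to_2017_usd(cost_usd, year):
    """Inflate a US$ amount from `year` prices to 2017 prices using the CPI ratio."""
    return cost_usd * US_CPI[2017] / US_CPI[year]


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic comparing a continuous variable across datasets."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    k, n = len(groups), pooled.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For two toy groups `[1, 2, 3]` and `[4, 5, 6]`, the between-group sum of squares is 13.5 and the within-group sum of squares is 4, giving F = 13.5 on (1, 4) degrees of freedom.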

Generating Model Parameters
We tested two approaches to estimate mean and standard error values for direct costs and hours lost due to treatment by household income quintile, HIV status and treatment phase: (1) meta-analysis of study-level summary statistics; and (2) regression analysis of pooled patient-level data. Our first approach used meta-analysis to calculate pooled estimates of the available (study-level) mean values for the above-described cost categories for each treatment phase, by HIV status and household income quintile [32]. Given that patient demographics varied significantly across datasets, and assuming that patient costs vary according to demographics, we used a random-effects meta-analysis, which does not assume that all studies investigate the same population [32]. Data on direct costs, travel time and consultation time were log-transformed for the meta-analysis as they were highly skewed, and results were exponentiated following meta-analysis.
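A random-effects pooling of log-scale study means can be sketched as below. The paper does not state which between-study variance estimator was used; the DerSimonian-Laird estimator here is an assumption, chosen because it is the most widely used. Inputs are study-level means of log-transformed costs or hours and their standard errors.

```python
import numpy as np


def dersimonian_laird(log_means, std_errors):
    """Random-effects pooled estimate on the log scale (DerSimonian-Laird tau^2).

    Returns (pooled_log_mean, pooled_se, tau2).
    """
    y = np.asarray(log_means, dtype=float)
    v = np.asarray(std_errors, dtype=float) ** 2
    w = 1.0 / v                                   # inverse-variance (fixed-effect) weights
    y_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fe) ** 2)               # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)       # between-study variance, floored at 0
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2
```

Usage with three hypothetical studies would be `pooled, se, tau2 = dersimonian_laird([0.74, 1.22, 0.59], [0.10, 0.15, 0.12])`, followed by `np.exp(pooled)` to return to the cost scale. Note that exponentiating a pooled mean of logs yields a geometric-mean-type estimate rather than an arithmetic mean.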

Our second approach was to use a regression model to predict the above-described cost categories for each treatment phase, by HIV status and household income quintile. First, we imputed missing values in the pooled dataset. Where total consultation hours were missing, we used multivariate imputation by chained equations (MICE) to impute these values based on the number of visits by phase and provider type. Total travel hours and total direct non-medical costs were imputed based on the number of visits and transport method, as well as the demographic variables included in the regression analysis. All imputations used predictive mean matching (PMM) as a non-parametric alternative for imputing skewed data. Imputation generated 20 plausible datasets, which were then used for analysis. The number of missing observations by dataset is listed in ESM Table 3 [33].
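The core PMM step for a single incomplete variable can be sketched as follows. This is an illustrative NumPy version, not the software the authors used: it follows the common variant in which observed cases are predicted with the fitted coefficients, missing cases with a draw of the coefficients, and each missing value is replaced by the observed value of a randomly chosen donor among the `k` closest predictions. Because imputed values are always observed values, skewed distributions are preserved. A full MICE run would cycle this step over each incomplete variable, and repeating it with different seeds would give multiple imputed datasets.

```python
import numpy as np


def pmm_impute(y, X, k=5, rng=None):
    """Impute np.nan entries of y by predictive mean matching (one variable, one draw).

    y: 1-D array with np.nan for missing values; X: fully observed predictors.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])          # add intercept
    obs = ~np.isnan(y)
    beta_hat, *_ = np.linalg.lstsq(X1[obs], y[obs], rcond=None)
    # Draw coefficients to reflect estimation uncertainty
    resid = y[obs] - X1[obs] @ beta_hat
    sigma2 = resid @ resid / max(obs.sum() - X1.shape[1], 1)
    cov = sigma2 * np.linalg.inv(X1[obs].T @ X1[obs])
    beta_draw = rng.multivariate_normal(beta_hat, cov)
    yhat_obs = X1[obs] @ beta_hat                       # predictions for donors
    yhat_mis = X1[~obs] @ beta_draw                     # predictions for recipients
    donors = y[obs]
    y_out = y.copy()
    for i, pred in zip(np.flatnonzero(~obs), yhat_mis):
        nearest = np.argsort(np.abs(yhat_obs - pred))[:k]   # k closest donors
        y_out[i] = donors[rng.choice(nearest)]              # copy an observed value
    return y_out
```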
Following imputation, we conducted a series of regression analyses to predict each cost category for each treatment phase. The regression analyses used a generalized linear model (GLM) assuming a gamma distribution and a log link to accommodate skewed data [34]. The specification of each regression was held constant across analyses and included independent variables identified from theory and previously published evidence [35][36][37][38][39][40]. Independent variables were urbanicity (1 = rural), education level (1 = educated to grade 8 and above), employment status (1 = employed), HIV status (1 = HIV positive) and household income quintile (quintiles 1-5). An interaction between employment status and income quintile was also included. Following the regression analysis, marginal estimates for each of the above-described cost variables were obtained by HIV status, household income quintile and employment status, with urbanicity and education held constant at the mean values observed among people with TB in the NIDS dataset (urbanicity = 0.327; education above grade 8 = 0.683). To help interpret the results of the regression analysis of the pooled dataset, we also tested the extent to which cost drivers varied across datasets by running the regression on the unimputed data for each dataset separately. As there were insufficient observations to do this for the intensive phase, we conducted this test only on observations in the continuation phase.
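The regression and marginal-prediction steps can be sketched in plain NumPy. For a gamma GLM with a log link, the iteratively reweighted least squares (IRLS) weights are all equal to 1, so each iteration is an ordinary least-squares fit of the working response. In practice a library routine (for example statsmodels' `GLM` with a `Gamma` family) would typically be used. The column encoding in `design_row` (quintile 1 as reference, employment-by-quintile dummies) is an assumed encoding, and pooling the fits across the 20 imputed datasets (for example with Rubin's rules) is not shown.

```python
import numpy as np

RURAL_MEAN, EDUC_MEAN = 0.327, 0.683   # NIDS means among people with TB (from the text)


def design_row(employed, hiv, quintile, rural=RURAL_MEAN, educated=EDUC_MEAN):
    """Covariate vector: intercept, rural, education, employed, HIV,
    quintile 2-5 dummies (quintile 1 = reference), employed x quintile dummies."""
    q = [1.0 if quintile == k else 0.0 for k in range(2, 6)]
    inter = [employed * d for d in q]
    return np.array([1.0, rural, educated, employed, hiv, *q, *inter], dtype=float)


def fit_gamma_log_glm(X, y, n_iter=50, tol=1e-10):
    """Gamma GLM with log link via IRLS; returns (beta, covariance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                    # start at the overall mean
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                   # working response (weights = 1)
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    n, p = X.shape
    phi = np.sum(((y - mu) / mu) ** 2) / (n - p)  # Pearson dispersion estimate
    return beta, phi * np.linalg.inv(X.T @ X)


def marginal_mean(beta, employed, hiv, quintile):
    """Predicted mean with urbanicity and education fixed at the NIDS means."""
    return float(np.exp(design_row(employed, hiv, quintile) @ beta))
```

A call such as `marginal_mean(beta, employed=1, hiv=1, quintile=1)` would then give the predicted mean cost for an employed, HIV-positive person in the poorest quintile.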

Estimation of Prevalence of Catastrophic Costs in the Cohort Model
We sampled patient-incurred direct costs and total time per individual in the cohort from gamma distributions parameterized with the mean and standard error values estimated in the analyses described above. Individuals who were lost to follow-up before treatment initiation were assumed to incur only the costs of special food and supplements, while those initiated on treatment were assumed to also incur direct medical, direct non-medical and indirect costs.
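Sampling from a gamma distribution given only a mean and a standard error requires choosing shape and scale parameters. The sketch below uses the method of moments and treats the standard error as the standard deviation of the gamma draw, as is common in probabilistic sensitivity analysis; this is an assumption about the paper's exact parameterization. The category names and the example (mean, SE) pairs are hypothetical.

```python
import numpy as np


def gamma_params(mean, se):
    """Method-of-moments gamma (shape, scale) with the given mean and standard deviation."""
    shape = (mean / se) ** 2
    scale = se ** 2 / mean
    return shape, scale


def sample_costs(params, lost_to_follow_up, rng):
    """Draw one individual's value for each cost or time category.

    params: dict category -> (mean, se) for that person's stratum
    (HIV status x income quintile x phase). People lost to follow-up
    before treatment incur only food costs.
    """
    draws = {}
    for category, (mean, se) in params.items():
        if lost_to_follow_up and category != "food":
            draws[category] = 0.0
        else:
            shape, scale = gamma_params(mean, se)
            draws[category] = float(rng.gamma(shape, scale))
    return draws
```

For a mean of 10 and a standard error of 2, the moment-matched gamma has shape 25 and scale 0.4, which reproduces mean 10 and variance 4.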
Following prediction of total time spent accessing care, indirect costs were estimated using a human capital approach to value the opportunity cost of travel and consultation time whilst seeking care. This was calculated as the total hours lost multiplied by an approximation of individual income per hour, estimated assuming 220 working days per year and an 8-h working day. Indirect costs were assumed to be zero for those who were unemployed; however, we undertook a sensitivity analysis valuing costs for unemployed people using an equivalent individual salary for a similar person. Sensitivity analysis also tested inclusion of all treatment costs for those lost to follow-up. Due to a lack of data, we were not able to estimate indirect costs using an output approach (income loss due to illness), nor were we able to include costs incurred by other members of the household such as guardians or lay caregivers.
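The human capital valuation reduces to simple arithmetic: 220 working days of 8 hours give 1,760 working hours per year, so an individual earning US$3,520 per year is valued at US$2 per hour, and 30 hours spent seeking care cost US$60. The sketch below encodes this, including the base-case rule that time is valued at zero for unemployed people and an optional argument mimicking the sensitivity analysis; `value_unemployed_at` is a hypothetical parameter name for the equivalent salary of a similar person.

```python
WORKING_DAYS_PER_YEAR = 220
HOURS_PER_DAY = 8


def indirect_cost(hours_lost, annual_individual_income, employed, value_unemployed_at=None):
    """Human capital valuation of travel and consultation time.

    Base case: zero for unemployed people. Passing value_unemployed_at
    (an annual income for a comparable person) mimics the sensitivity analysis.
    """
    if employed:
        annual = annual_individual_income
    elif value_unemployed_at is not None:
        annual = value_unemployed_at
    else:
        return 0.0
    hourly = annual / (WORKING_DAYS_PER_YEAR * HOURS_PER_DAY)   # 1,760 working hours
    return hours_lost * hourly
```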
Costs were defined as catastrophic where the total patient-incurred cost during treatment exceeded 20% of annual household income [4]. For each model run, the prevalence of households encountering catastrophic costs was estimated per quintile and across the full cohort. The cohort model was simulated 10,000 times to give 10,000 estimates of the national prevalence of catastrophic costs, which were summarized using the median and interquartile range (IQR) across all simulations.
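The catastrophic-cost indicator and the summary across model runs can be sketched as below. `run_simulations` is a hypothetical toy driver: equal quintile shares, the household incomes and the exponential cost distribution are placeholders standing in for the full cohort model, used only to show how per-run prevalence estimates are collected and summarized by their median and IQR.

```python
import numpy as np

THRESHOLD = 0.20   # WHO threshold: 20% of annual pre-TB household income


def catastrophic_prevalence(total_costs, hh_income, quintile):
    """Share of households whose total costs exceed 20% of annual income,
    overall and by income quintile (1-5)."""
    catastrophic = total_costs > THRESHOLD * hh_income
    by_q = {q: catastrophic[quintile == q].mean()
            for q in range(1, 6) if np.any(quintile == q)}
    return catastrophic.mean(), by_q


def summarize(estimates):
    """Median and interquartile range across simulation runs."""
    q25, med, q75 = np.percentile(estimates, [25, 50, 75])
    return med, (q25, q75)


def run_simulations(n_runs=10_000, n=1000, seed=0):
    """Toy driver with hypothetical inputs (equal quintile shares, placeholder incomes/costs)."""
    rng = np.random.default_rng(seed)
    quintile = np.repeat(np.arange(1, 6), n // 5)
    hh_income = np.array([1_000.0, 2_500.0, 5_000.0, 10_000.0, 30_000.0])[quintile - 1]
    overall = np.empty(n_runs)
    for r in range(n_runs):
        costs = rng.gamma(shape=1.0, scale=150.0, size=quintile.size)
        overall[r], _ = catastrophic_prevalence(costs, hh_income, quintile)
    return summarize(overall)
```

Keeping the per-quintile dictionary alongside the overall prevalence mirrors the reporting of results both by quintile and across the full cohort.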

Demographic Characteristics, Direct Costs and Time Accessing Care in Published Studies
Three datasets were obtained and merged [27][28][29], resulting in a total of 1573 observations; 1219 were from the REACH study [27], 148 from the MERGE trial [29] and 171 from the XTEND (Xpert for TB: Evaluating a New Diagnostic) trial [28]. Table 1 shows the demographic data for each dataset, as well as for the pooled dataset. Several demographic variables, including urbanicity, age, education and employment status, differed significantly across datasets. Although each study sample was randomly selected, the pooled dataset was not representative of the TB-affected population in South Africa as measured in the NIDS 2015 dataset. Tables 2 and 3 show the meta-analysis results for direct medical costs, direct non-medical costs, food costs and hours lost (see also ESM Figures 1-5). The considerable uncertainty observed resulted from several factors, including small numbers of observations, wide variation in visit frequency across datasets, and wide variation in costs within and between datasets. Availability of data also varied by dataset and treatment phase. ESM Table 1 lists the mean number of visits per month, direct costs per visit and time spent per visit by provider type and treatment phase for each dataset. The variation in demographics observed across datasets motivated the regression analysis, which allowed other explanatory variables to be included in the estimation of costs and time associated with accessing care. The results of the regression analysis are listed in Tables 4 and 5. Several independent variables were found to have a significant effect on cost. HIV status had a consistently positive effect. Being in a rural setting and having a higher education level both had a negative effect on direct costs and a positive effect on time spent accessing providers.
Our tests of the regression model on the raw unimputed data for each dataset separately (ESM Tables 6-8) found that some regression coefficients were not consistent across datasets. However, there were no substantial differences in significant coefficients across datasets: where multiple datasets had significant coefficients for a given variable, these were in the same direction and of similar magnitude.

Table 6 and Fig. 2 show the cohort model estimates of total costs and hours lost for both approaches, by household income quintile. Model estimates from both the meta-analysis and regression approaches show the majority of the catastrophic cost burden falling on the first (poorest) income quintile (meta-analysis: median 28%, IQR 24-34%; regression: median 14%, IQR 12-17%). Overall, 11% of people with TB nationally were predicted to encounter catastrophic costs using the meta-analysis approach (IQR 9-13%). Using inputs derived from the regression approach, the overall predicted prevalence of catastrophic costs was lower, at 6% (IQR 5-8%). Including all costs for those lost to follow-up, and valuing time for those unemployed, increased the prevalence of catastrophic costs in quintiles 1 and 2 under both the meta-analysis and regression approaches. Our estimate of the overall prevalence of catastrophic costs using the meta-analysis approach was robust in sensitivity analyses; overall estimates for the regression approach increased slightly in sensitivity analysis.

Discussion
We present estimates of the prevalence of catastrophic costs associated with TB, employing an individual-level cohort model using two approaches to parameterize cost estimates: a meta-analysis approach using study-level statistics and a regression approach using individual-level primary data. Overall, the median prevalence of catastrophic costs was estimated at 11% using a meta-analysis approach and 6% using a regression approach. Both approaches confirmed that catastrophic costs had a higher prevalence among the poorest quintile.
Our analysis had several limitations, mostly related to the available data. The scope of the analysis was restricted to costs incurred by DS-TB patients during treatment. We were also unable to include costs before treatment initiation, costs incurred by other household members, or indirect costs associated with income lost due to illness, as there were no data on these costs. These restrictions are likely to result in an underestimation of the true prevalence of catastrophic costs. Furthermore, as all studies collected data at the health facility level, we were unable to include any cost estimates for people unable to access care. This may exclude the most vulnerable households affected by TB, and thus likely results in an underestimate of the economic burden of TB. Finally, we did not include funeral costs for TB-related deaths, as there is limited evidence on these costs.
The meta-analysis approach produced higher estimates and wider uncertainty intervals than the regression approach. This reflects the limitations of parameterizing a deterministic model using the available cost and epidemiological data. We designed the model using the best available information on the TB epidemic in South Africa, using HIV status and income quintile to delineate TB prevalence and loss to follow-up rates. There was no available information on TB prevalence or loss to follow-up by urbanicity, employment status or education level, and thus no way to include these effects in the meta-analysis approach.
In contrast, the better-parameterized regression approach allowed us to obtain cost estimates adjusted for the additional independent variables of urbanicity, employment status and education. As these three variables were found to have significant effects on cost, it is not surprising that the regression approach produced estimates with reduced uncertainty. This is also likely to explain the lower estimated overall prevalence under the regression approach: rural setting and higher education level both had a negative effect on direct costs, and neither was accounted for in the meta-analysis.
Although the regression approach was better parameterized, our analysis indicates that better information is still needed on the household-incurred costs due to TB. Our findings on significant determinants of patient-incurred costs identified in the regression approach are largely supported by the existing literature [28,36,37]; however, these determinants were not consistently identified within the individual datasets, and the predictive power of the regression analyses was low. This may have been a consequence of small sample sizes and varying demographics across datasets; however, it could also reflect substantial differences in models of care in different settings. The extent to which TB and HIV services are integrated varies widely across South Africa [39]; this could lead to variation in the number of visits prescribed for those with both TB and HIV, and thus to a substantial difference in the degree to which positive HIV status is a driver of costs. Similarly, the extent to which directly observed therapy (DOT) is followed, and its modality (facility-based or community-based), varies between settings. This is likely to have produced differences in cost drivers across datasets, for example increasing travel time for rural participants in settings where DOT is prevalent.
To improve deterministic models such as the one presented here, further information is also needed on healthcare pathways and drivers of care-seeking behaviour, including sub-national variations in models of care. These may also evolve as the model of care for TB in South Africa continues to change.
Given these uncertainties, our analysis indicates that access to patient-level data is vital for researchers looking to extrapolate existing cost estimates to national settings, at least until the cost function can be better defined. The importance of improving data sharing is increasingly recognized by publishers and funders [43,44], not only to ensure transparency of research but also to maximize the benefit of the data being collected. Primary data collection is often costly and time-consuming; the nationally representative WHO surveys are typically budgeted between $US27,000 and $US166,000 [4], while the costs of a secondary analysis are typically restricted to the time of the analyst (not including the costs of primary data collection in the original studies). Lengthy interviews about costs related to TB can also impose a substantial burden on patients and their household members; it is therefore important to make the most of the data that are collected. Of course, primary data collection also comes with some degree of uncertainty. Although primary cost data are increasingly being made available, for example through mechanisms such as the Global Health Cost Consortium's UCSR [18], the wide variance in methods used to collect cost data remains a persistent limitation on the feasibility of pooling data. There are several areas where further data collection or better guidance on data collection methods would improve these estimates substantially. First, methods for estimating and reporting income data in patient-incurred cost surveys are currently inconsistent, with limited guidance on methods [41]. Going forward, better guidance on methods to estimate household and individual income is critical for any future attempt to pool data to draw national estimates, as well as for informing policy more generally. Guidance on the appropriate measures of indirect costs in the numerator, and of ability to pay in the denominator (e.g. household income vs. household expenditures), would also improve the theoretical validity of the metric [41].
Despite the above-discussed uncertainties, this type of model could be useful for researchers and policy makers. A cohort model such as the one presented in this paper can estimate the national prevalence of catastrophic costs due to TB and the uncertainty around these estimates, and can identify the comparative impact of TB-related costs on different sections of the population. It also has potential to inform certain policy decisions; for example, Verguet et al. [40] use a similar approach to illustrate the potential number of cases of catastrophic costs averted by a range of TB interventions. The approach presented in this paper improves estimates by using a systematic approach to pool data from multiple studies, and by allowing adjustment for demographics and treatment phase.
The usefulness of the type of analysis presented in this paper depends on the objectives of the analysis. This analysis may be sensitive enough to capture major movement towards the End TB goal of zero catastrophic costs due to TB; however, it is likely not sensitive enough to capture small changes from year to year, especially where the cost function is still unknown or differs substantially between settings. Ongoing primary data collection through national surveys is likely still necessary to facilitate annual reporting and programme management until the availability of cost and epidemiological data improves and the cost function is better identified. However, while probably not providing quite as robust an estimate of catastrophic costs as a national survey, this type of analysis can complement, enrich and add depth to findings from national surveys, especially for certain groups.

Conclusions
This paper presents a novel use of existing data to estimate the prevalence of catastrophic costs due to TB [4]. We find that, in the absence of nationally representative surveys, a deterministic model can provide an alternative for estimating catastrophic cost prevalence and the uncertainty around those estimates, with uncertainty slightly reduced using a regression approach as compared with a meta-analysis approach. A repeat of this analysis with additional primary data from South Africa would test the validity of the main finding. Analyses testing the results of a cohort model against national estimates of catastrophic costs of other conditions would also help researchers to understand the validity of these models and the value of information added as compared with primary data collection through national surveys. Ultimately, to improve estimates from such cost-saving approaches, there is an urgent need for standardized methods to collect income data and standardized reporting of cost estimates.