Key Points for Decision Makers

Our findings suggest the necessity for enhanced rigour and standardisation in the application of TB disability weights and reveal that sensitivity analyses frequently overlook weight values.

Various methods and weights are being utilised to account for aspects of TB-related morbidity that extend beyond the GBD disability weights for active disease and TB-HIV coinfection, suggesting opportunities for future research to formalise these broader aspects.

In response to our review findings, this study offers a set of essential and desirable recommendations for standardising the usage of DALY disability weights in TB-related cost-effectiveness analyses.

Furthermore, decision-makers should consider how tuberculosis DALYs are constructed to ensure they adequately reflect TB-related disability in accordance with their specific objectives.

1 Background

The Sustainable Development Goal (SDG) target of ending the TB epidemic by 2030 is challenging. In their Global Plan to End Tuberculosis, the Stop TB Partnership estimate that US$250 billion is required, with US$40.2 billion for research and development for new TB tools, including medicines, diagnostics and vaccines [1]. These novel health technologies, including those in the 29 clinical trials and implementation research studies underway as of August 2023 [2], will require evaluation of their cost-effectiveness.

When conducting a cost-effectiveness analysis (CEA), certainly in the case of TB interventions, the ability to combine morbidity and mortality estimates into a unified ‘generic’ measure offers clear advantages. The two foremost measures are the disability-adjusted life year (DALY) and the quality-adjusted life year (QALY), differing from one another in their conceptual underpinnings and construction, and in the valuations they yield [3, 4].

The World Health Organization’s Choosing Interventions that are Cost-Effective (WHO-CHOICE) programme provides a structured framework for generalised CEA. The DALY was reaffirmed in 2021 as the metric of choice, justified by both its established prominence in low- and middle-income countries (LMICs) and the lack of “a single database of QALY weights for every disease state and country” [5]. Today, the DALY is increasingly the preferred measure in economic evaluations across the TB care cascade, including vaccination [6], screening [7], diagnosis [8] and treatment [9].

DALYs combine the years of life lost from a condition (YLL) with the years lived in impaired health or ‘disability’ (YLD), the latter calculated by multiplying the duration spent in a specified health state by a numeric ‘disability weight’. These weights can be pivotal in cost-effectiveness calculations, particularly for interventions with low or stable mortality rates, thereby elevating the influence of morbidity factors. Disability weights function such that, for instance, living 4 years in a state with a weight of 0.25 (as compared with 4 years in full health) would in principle be equivalent to losing 1 year of healthy life, both cases resulting in one DALY.
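The arithmetic in the example above can be sketched in a few lines. This is a minimal illustration only: the function names `yld` and `dalys` are hypothetical, and discounting and age weighting, which some analyses apply, are deliberately left out.

```python
def yld(duration_years: float, disability_weight: float) -> float:
    """Years lived with disability: time in the health state times its weight."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie in [0, 1]")
    return duration_years * disability_weight


def dalys(yll: float, duration_years: float, disability_weight: float) -> float:
    """DALYs = YLL + YLD (no discounting or age weighting in this sketch)."""
    return yll + yld(duration_years, disability_weight)


# Four years lived in a state with weight 0.25, no premature death
morbidity_only = dalys(yll=0.0, duration_years=4.0, disability_weight=0.25)
# One year of healthy life lost, no time lived with disability
mortality_only = dalys(yll=1.0, duration_years=0.0, disability_weight=0.0)
print(morbidity_only, mortality_only)  # both equal one DALY
```

Because the weight enters multiplicatively, any error in its value scales the YLD component in direct proportion, which is why the choice of weight matters most when the YLL component is small.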

While there is no gold standard, weights computed by the Global Burden of Disease (GBD) programme are extensively used and endorsed by WHO-CHOICE [5]. Furthermore, GBD methodologies are progressively shaping the weight derivation field, with derivation studies increasingly conforming to the GBD approach [10]. GBD disability weight derivation methods have evolved since their 1993 World Development Report inauguration [11], which saw six broad ‘severity classes’ assigned weights by experts. Since GBD-2010, weight derivation has been through valuation exercises with lay survey respondents [13]. The process purposefully avoids presenting respondents with disease labels (e.g. “tuberculosis”) but utilises short (maximum 35-word) lay descriptions developed through “consultation with expert groups”, intended to “capture the most salient details for each health state” [13].

While the common physical manifestations of pulmonary TB disease have been recognised for millennia (e.g. cough, fatigue, weight loss and night sweats) [14], there is now wide recognition of TB’s broader effects, including on mental health [15]. A review of 131 studies reporting TB-related disability (2000–2019) catalogued a spectrum of TB-related impairments, including respiratory (21% in pooled estimates), auditory (15%), musculoskeletal (17%) and mental health disorders (23%) [16]. Increasing evidence also demonstrates that, for many, the effects of TB continue long after microbiological cure and treatment completion, including psychosocial and socio-economic impacts, and manifesting physically as ‘post-tuberculosis lung disease’ [17]. Furthermore, while treatment is essential, daily drug regimens can be onerous, having a high pill-burden and commonly causing side effects. These difficulties are amplified for those taking drug-resistant regimens [18].

The GBD description assigned to human immunodeficiency virus (HIV)-negative people with TB, both drug-susceptible and drug-resistant, is: “has a persistent cough and fever, is short of breath, feels weak, and has lost a lot of weight.” This has, since GBD 2019, carried the weight 0.333 [19].

This study reviews the use of DALY disability weights in CEAs of TB-related interventions. Unlike broader reviews of cost-effectiveness analyses which examine overall approaches and study quality (including comparators, methodologies and evidence), this review focusses principally on the choice, derivation and application of disability weights and associated weight-related considerations. Through critical examination, we aim to provide valuable insights into this important aspect of TB-intervention assessment, identifying opportunities for enhancement and refinement.

2 Methods

We sought to determine what disability weight values and accompanying methods have been used in the determination of TB-related DALYs within CEAs and whether weights and methods were applied consistently.

We conducted a systematic literature review following, where appropriate and feasible, Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 guidelines [20]. In addition to our review of disability weights in CEAs, we also reviewed relevant GBD publications over time to summarise the evolution of TB weight values. This summary not only serves as a valuable resource but also facilitated our review by enabling accurate cross-referencing of weight usage within the literature.

To our knowledge there are no specific Enhancing the Quality and Transparency of Health Research (EQUATOR) guidelines or PRISMA extensions currently published to guide or appraise systematic literature reviews of DALYs within CEAs, nor guidelines focussed on weight measures and their quality. Establishing a practicable set of recommendations was a secondary objective of this review.

2.1 Data Sources and Inclusion/Exclusion Criteria

Our search was conducted in two parts.

2.1.1 Search 1—Tufts CEA Registry

The Centre for the Evaluation of Value and Risk in Health (CEVR) at Tufts Medical Centre maintains the CEA Registry, a comprehensive database of cost-effectiveness studies published since 1976 which report cost/DALY or cost/QALY ratios. At the time of writing, the database included articles published up to 31 December 2021 [21]. Additionally, as part of their Global Health Initiative, CEVR offers data on cost-per-DALY articles for free download in a single Microsoft Excel database. Details of the Tufts search strategy and inclusion criteria can be found in the registry user manual [https://cear.tuftsmedicalcenter.org/storage/resources/CEA%20Registry%20User%20Manual%202023.docx], with further detail provided in the Electronic Supplementary Material Fig. A5.

We searched the database of studies within both the Title and Abstract fields using the strings ‘tuberculosis’, ‘tb’ and ‘MDR-TB’, and within the Health State field of the Utilities data previously extracted by Tufts. All identified papers were downloaded for full-text screening, excluding only those for which the search string ‘tb’ had returned irrelevant words (e.g. ‘seatbelt’) while no other criteria had been met.

2.1.2 Search 2—Extended Database Search

Adhering to the same Tufts strategy, we conducted an update to the CEVR search (from 1 January 2022 to 30 June 2023) within the same electronic databases, namely PubMed, Scopus and Embase. We added one criterion, requiring one or more of the search strings ‘tb’, ‘tuberculosis’ or ‘mdr’ to appear in the title/abstract or manuscript body (Electronic Supplementary Material—A12). These abstracts were screened independently by two authors, with exclusions made solely on the study type (e.g. protocols, posters and comments).

2.2 Full-Text Review and Data Extraction

After merging studies from both searches and removing duplicates, two authors independently conducted full-text reviews of the studies. In cases of discrepancies, a third author was available for consultation.

To be included for data extraction, all studies had to include at least one cost/DALY ratio (necessarily satisfied by studies from the CEVR database). Furthermore, studies had to have either evaluated at least one TB intervention or included relevant TB-related disability weights. Relevant weights included those for any TB-related state [latent TB infection (LTBI) and active disease, drug resistant/susceptible TB, post-TB sequelae or HIV-positive or HIV-negative TB], or those relating to TB medication or associated side effects.

Data were independently extracted from studies by paired combinations of authors using a standardised Google Sheets form, from either manuscript main texts or appendices/supplementary material.

We contacted authors of studies (by email) only in the instances where manuscripts had referred to appendices or supplementary material and these files could not be located.

2.2.1 Study Data

At the study level, data included publication details (authors, journal and publication year), intervention and country focus, and type(s) of tuberculosis [e.g. drug-susceptible TB (DS-TB), multidrug-resistant (MDR)/rifampicin-resistant (RR)-TB, LTBI, extrapulmonary (EPTB)]. Studies were categorised by their intervention type relating to the patient pathway, with the following categories: diagnosis, treatment, prevention (with subcategories vaccine and preventative therapy), active case finding (ACF) and other, including programmatic interventions, e.g., directly observed therapy (DOT) expansion. HIV-focussed interventions containing TB-weights formed a separate category. We further extracted whether studies had conducted sensitivity analyses on any parameters (recording whether they were probabilistic or deterministic), and each study’s cost perspective (health system/provider, societal or patient). We additionally examined each manuscript for discussions of post-TB disability or false-positive diagnoses.

2.2.2 Disability Weight Data

Where available, for each disability weight instance in a study, we extracted the numeric value (reporting decimal places faithfully), any referenced source(s) for the weight, the weight’s confidence interval or range and the health state(s) to which the weight was applied. If a study had applied a single weight value to multiple health states (e.g. both HIV-positive and HIV-negative TB had been assigned 0.333), we considered this to be one weight in the study, though all states to which the weight had been applied were recorded. For studies that described deriving additional primary weights, these weights were only included in our extraction if they were explicitly stated in the study (either within the manuscripts or supplementary material). In cases where studies had indicated that they had ‘reversed’ their weights (i.e., indicating the value of \(1\) as full health and/or \(0\) as death), the position of the weight in our results table is based on \(1-weight\) (and is clearly signposted). Weights deemed ‘non-viable’ (e.g. not within the range [0, 1]) are included in the main results; studies lacking weights are summarised separately.
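The handling of ‘reversed’ and ‘non-viable’ weights described above can be illustrated with a short sketch. The helper names and the reported values are hypothetical and are not taken from any reviewed study, apart from 0.333, which is the GBD weight quoted earlier.

```python
def to_standard_scale(value: float, reversed_anchor: bool) -> float:
    """Map a reported weight onto the standard scale (0 = full health, 1 = death)."""
    return 1.0 - value if reversed_anchor else value


def is_viable(weight: float) -> bool:
    """A disability weight is only interpretable within [0, 1]."""
    return 0.0 <= weight <= 1.0


# Hypothetical reported values: (value, whether the study reversed the anchor)
reported = [(0.669, True), (0.333, False), (1.2, False)]
for value, reversed_anchor in reported:
    weight = round(to_standard_scale(value, reversed_anchor), 3)
    print(value, weight, "viable" if is_viable(weight) else "non-viable")
```

Applying the transformation before any comparison keeps all extracted values on one scale, so that a reversed weight is not mistaken for an unusually severe one.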

2.3 Audit of Weights

Each disability weight usage instance was evaluated using a set of objective binary metrics, defined a priori:

  • Was the disability weight specified explicitly?

  • Was a reference provided?

  • Did the provided reference support the disability weight value?

  • Did the weight value correspond to a GBD weight?

  • If the weight was taken from the GBD, was the weight up-to-date (allowing a 1-year grace period post-new GBD weight publication)?

  • Were the results of any sensitivity analysis on weight value reported?

For this final question on sensitivity analyses, we identified whether methods used were probabilistic, deterministic or both. We recorded both whether studies had indicated that such analysis had been conducted and whether results had been presented. For additional details see Electronic Supplementary Material Table A5.
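The binary audit above lends itself to a simple record per weight usage. The sketch below is one possible encoding and not the extraction form used in this review: the class and function names are hypothetical, the interpretation of the grace period is an assumption, and of the release years listed only 2020 (for GBD 2019) is stated in the text.

```python
from dataclasses import dataclass

# Assumed publication years of successive GBD weight sets. Only 2020 (GBD 2019)
# is stated in the text; the other years are placeholders for illustration.
GBD_RELEASE_YEARS = (2012, 2015, 2020)


def is_up_to_date(weight_release_year: int, study_year: int, grace_years: int = 1) -> bool:
    """True if no newer weight set was published more than `grace_years` before the study."""
    expected = [y for y in GBD_RELEASE_YEARS if y + grace_years < study_year]
    return not expected or weight_release_year >= max(expected)


@dataclass
class WeightAudit:
    specified: bool             # weight value stated explicitly
    referenced: bool            # a reference was provided
    reference_supports: bool    # the reference supports the stated value
    is_gbd: bool                # value corresponds to a GBD weight
    up_to_date: bool            # GBD weight current, allowing the grace period
    sensitivity_reported: bool  # results of sensitivity analysis on the weight shown

    def meets_first_five(self) -> bool:
        """All metrics except the sensitivity-analysis question are satisfied."""
        return all((self.specified, self.referenced, self.reference_supports,
                    self.is_gbd, self.up_to_date))


# Hypothetical study published in 2018 using a weight set released in 2012
audit = WeightAudit(True, True, True, True, is_up_to_date(2012, 2018), False)
print(audit.meets_first_five())
```

Keeping each metric as a separate boolean, rather than a single score, preserves the ability to report which criterion a given weight usage failed.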

While we initially aimed to extract data concerning the length of time that weights were applied to health states, this component was excluded from the final analysis.

2.4 Risk of Bias and Quality of Studies

Considering the abovementioned audit and our review’s aims and approach, we used no additional discrete validated tool to assess the risk of study bias. While our review concentrates on methodologies associated with disability weighting and does not assign individual scores for overall quality, we offer, for comparative purposes, external quality scores (provided by Tufts and Hao [8]) for a subset of papers alongside weight findings (Electronic Supplementary Material, Table A14).

2.5 Recommendations

We developed a set of best-practice recommendations on the basis of the findings of our review to guide future research using TB disability weights.

3 Results

3.1 History of GBD TB disability weights

In 1996 the first TB-specific disability weights were introduced, departing from the previous framework of general ‘severity classes’ across conditions (Table 1). Since GBD 2019 (published late 2020), five distinct (non-zero) TB-weights have been proposed for analysis, with equal weighting assigned to MDR-TB/extensively drug-resistant TB (XDR-TB) and DS-TB.

Table 1 Evolution of GBD disability weights for tuberculosis-related health states (1996 to present)a

3.2 Review

Of the CEA registry’s 929 studies, 88 met inclusion criteria, while our up-to-date databases search returned 87 unique studies. Combined, this yielded 166 studies for full-text review (one duplicate removed), with 105 studies satisfying criteria for inclusion (Fig. 1).

Fig. 1 PRISMA diagram explaining study selection

3.2.1 Study Characteristics

Among the 105 studies published between 2002 and 2023, 69% (72/105) had been conducted in a single country, of which 93% (67/72) were LMICs. A total of 50 countries appeared across studies, all but 8 of which were LMICs (Electronic Supplementary Material; Table A2). South Africa was the most frequently studied country, appearing in 32 studies, followed by India (23 studies) and Brazil (11 studies); 10 studies were non-country specific (Electronic Supplementary Material; Table A3; Fig. A2). The publication rate has increased from 1.2 studies per year in the decade 2000–2009 to 10.8 studies per year since 2020; 70% had been published in 2015 or later. Publications were spread across 41 journals, with PLOS One (n = 13; 12%), International Journal of Tuberculosis and Lung Disease (IJTLD; n = 12; 11%) and Lancet Global Health (n = 11; 10%) being most common (Electronic Supplementary Material; Table A5).

All but 11 studies had evaluated tuberculosis-focussed interventions, and of these, studies of diagnosis (24/94; 26%), treatment of active disease (20/94; 21%) and prevention (17/94; 17%) were most frequent, followed by active case finding (13/94; 14%; Electronic Supplementary Material Table A2). Other studies had considered programmatic scale-up of interventions [e.g. Directly Observed Treatment, Short-course (DOTS)] or several interventions across categories. Among the 11 non-TB-focussed studies, 10 had focussed on HIV (11%), and one was an environmental study which had calculated TB DALYs [24].

Table 2 Value (range) and provenance of TB disability weights reported in cost-effectiveness studies

For 21 studies (20%) we were unable to extract disability weights: 9 had not calculated YLD; 7 suggested calculations of YLDs but specified no weights; and 5 studies were unclear (Table A6). We were unable to retrieve appendices for two studies [25, 151].

A large majority of studies (98/105; 93%) had performed and documented some sensitivity analyses. Approximately half reported results of both probabilistic sensitivity analysis (PSA) and deterministic sensitivity analysis (DSA; 56/105; 53%), while 35/105 reported DSA alone and 7/105 PSA alone.

Most studies adopted a health-system/provider perspective, excluding patient costs (79/105; 75%).

3.2.2 Overview of Weights

The 165 weights used ranged from 0, used for latent TB [26, 27] and specified Grade 3 MDR-TB Treatment side effects [28], to 0.697 used for HIV-positive individuals receiving ART in a study of people with DR-TB [92] (not considering weight values > 1). The most common values used were the GBD 2010 values for TB (0.331; 19 studies) and TB-HIV (0.399; 17 studies; full breakdown in Electronic Supplementary Material A8). Specified weights for ‘\(well \ state = 0\)’ or ‘\(death = 1\)’ (e.g. [93]) were not extracted. Several studies stated they had reversed the standard disability weight anchor to 1 for no disability, and had adjusted their weights accordingly (e.g. by taking 1 minus the weight) [30, 31, 47, 70, 111]. In several cases a statement was given indicating transformation, e.g., “\(no\ disability = 1\)”, but this appeared not to have been conducted [32,33,34].

3.2.3 Disability Weight Sources

Of the 165 weights extracted, 100 (61%) were identified as GBD weights. Of these 100, only in 47 instances (47%) had the weight value used been explicitly specified (e.g. ‘0.333’ stated in manuscript or appendix) and a corresponding up-to-date reference cited (first five indicator columns of Table 2 = ‘●’). This represents 28% of all 165 weight usages across the studies (47/165). Apart from GBD studies, 17 further studies had been cited as sources of weights (Electronic Supplementary Material; Table-A9). The application of some methods saw a departure from standard DALY methodology, with several studies appearing to adopt a ‘health utility’ approach in DALY calculations [45, 58, 124, 125], including, for example, individuals post-recovery being returned to the ‘health of the general population’, with a EuroQol 5-Dimension (EQ-5D)-based citation [58]. Two studies [124, 125] were excluded from Table 2 for this reason (see Electronic Supplementary Material; Table A11).

3.2.4 TB Health States

Across studies, several types of TB were specified as having had weights applied, including latent TB (TB infection without symptoms), active TB (TB disease causing symptoms and illness) and drug-resistant TB (RR-TB, MDR-TB, and XDR-TB); one study had assigned different weights to smear-positive TB (TB bacilli, whether alive or dead, visualised by microscopy) and smear-negative TB [59], and two other studies had used different weights for culture-positive TB (colonies of TB bacilli grown from sputum samples in the laboratory) and culture-negative TB [57, 58]. Eight studies identified extra-pulmonary TB in their study populations [39, 56, 62, 94, 126,127,128,129], though we found no instances of extra-pulmonary and pulmonary TB being weighted differently. We note, however, that six further studies had accounted for ‘Any WHO stage four condition’ (0.54), which would include EPTB [30,31,32,33,34, 111]. Only one study [28] had accounted for post-TB using a specific weight (0.053).

3.2.5 TB–HIV Coinfection

Two-thirds of studies (70/105; 67%) had indicated that HIV was a consideration for their study population (Electronic Supplementary Material; Table-A4). Of the 53/70 studies published from 2014 onwards (1 year after GBD began publishing dedicated TB weights with and without HIV infection), 36/53 (68%) had made use of the GBD TB-HIV weight (Electronic Supplementary Material, Fig. A4). While in several studies it was unclear why the TB–HIV coinfection weight had not been used [47, 51, 77, 102, 130], Marx and colleagues [27] bypassed the single GBD TB-HIV weight, using the multiplicative method to derive coinfection weights reflecting HIV severity from the TB weight (0.333) and three GBD HIV states (0.012, 0.428 and 0.078). Similar approaches were followed prior to the introduction of dedicated TB-HIV weights within GBD [69]. Our review found only one use of a GBD TB-HIV anaemia weight (0.439, TB-HIV moderate anaemia), though this appeared to have been applied to TB-HIV in general [28].

3.2.6 Disability of Treatment and Wider Effects

In general, those receiving medical care and those not were assigned equal weights, though several studies had used specific values to account for treatment, most frequently 0.132 [50,51,52] and 0.1 [39,40,41, 43, 44]. The 0.132 value first appeared as an assumption in 2008 (half of the GBD weight for TB in the 15–44-year age group; Murray et al. (1996) [22]), while the 0.1 value’s origin traces back to Guo et al. (2008) [131] (Electronic Supplementary Material-B). Further weights assigned to TB treatment included a value of 0.2 used for MDR-TB treatment [39, 43], a value of 0.226 for individuals receiving both ART and TB treatment [56] and a ‘relative disability of 1.2’ applied in one paper for XDR-TB [58]. Induced hepatotoxicity from isoniazid preventive therapy (IPT) was weighted in two studies [41, 44], in each as mild (weight = 0.15) and severe (weight = 0.6; for weight provenance see Electronic Supplementary Material-B). Other studies accounted for “progression to hepatitis from TB drugs” [61] and ethambutol-induced blindness [70]; one study had accounted for hearing loss [105], while Sweeney et al.’s (2022) study of MDR treatment regimens [28] accounted for 11 health states (9 distinct weights), 5 of which applied to treatment effects. Wolfson and colleagues had used weights of 0.08 for ‘MDR-TB patient post-surgery’ (lung resection surgery) and 0.12 for ‘patient with MDR-TB, cured, on treatment’ [38]. One study had used a ‘utility decrement’ for hospitalisation (0.121), a weight for ‘palliative care and loss to follow up’ (LTFU; 0.66), and a weight for surgery (0.49), though it followed health utility methods [45].

3.2.7 Uncertainty and Sensitivity

Only 30% of studies (31/105) reported having conducted sensitivity analysis on disability weight values, with either DSA or PSA results explicitly provided for 26 studies (4 DSA alone, 15 PSA alone and 7 both DSA and PSA). In 11 studies it was unclear whether PSA analyses had incorporated weight values due to unspecified intervals/distributions. Results for DSA on weights were frequently not provided, often due to implied low sensitivity (e.g. studies stated only the most influential parameters shown), though in other cases the reason for omission was unclear (refer to Electronic Supplementary Material Table A5 for a full breakdown). Of the 165 uses of weights, 51/165 (31%) had been included in a PSA, and for 24/165 (15%) results of DSA were shown. Lung et al., who had conducted a sensitivity analysis on weight values, notably remarked: “ICERs were fairly robust to change, suggesting parameter uncertainty had a minor effect on the cost-effectiveness results. Use of the upper and lower 95% CI values for tuberculosis-specific DALY weights resulted in the largest changes, with ICERs per DALY averted ranging from $438 to $744” [76].
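For readers less familiar with the two approaches, the sketch below runs a one-way DSA and a simple PSA on a single disability weight. Every input is an assumption for illustration: the cost, YLL and TB-years figures are invented, the interval of 0.224 to 0.454 around 0.333 is assumed rather than taken from this review, and fitting a beta distribution by the method of moments is only one of several reasonable choices.

```python
import random
import statistics


def icer(incremental_cost: float, yll_averted: float,
         tb_years_averted: float, weight: float) -> float:
    """Cost per DALY averted, with YLD averted = TB-years averted x weight."""
    return incremental_cost / (yll_averted + tb_years_averted * weight)


def beta_params(mean: float, lower: float, upper: float) -> tuple[float, float]:
    """Method-of-moments beta parameters, treating (lower, upper) as a 95% interval."""
    sd = (upper - lower) / (2 * 1.96)
    common = mean * (1 - mean) / sd ** 2 - 1
    return mean * common, (1 - mean) * common


# Hypothetical model inputs
COST, YLL, TB_YEARS = 50_000.0, 40.0, 120.0
CENTRAL, LOWER, UPPER = 0.333, 0.224, 0.454  # assumed interval, for illustration

# Deterministic (one-way) analysis: vary the weight alone across its bounds
dsa = {label: icer(COST, YLL, TB_YEARS, w)
       for label, w in (("lower", LOWER), ("central", CENTRAL), ("upper", UPPER))}

# Probabilistic analysis: draw the weight from a fitted beta distribution
rng = random.Random(2023)
alpha, beta = beta_params(CENTRAL, LOWER, UPPER)
psa = [icer(COST, YLL, TB_YEARS, rng.betavariate(alpha, beta)) for _ in range(5000)]
cuts = statistics.quantiles(psa, n=40)  # cuts[0] is the 2.5th, cuts[-1] the 97.5th percentile

print({k: round(v) for k, v in dsa.items()})
print(round(cuts[0]), round(cuts[-1]))
```

Reporting the interval or distribution alongside the result, as in the final two lines, is what allows a reader to tell whether weights were in fact included in the analysis.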

4 Discussion

Given the sustained underfunding for global TB control programmes and the increased demands for resources towards achieving the End-TB strategy, there is a critical need for accurate, transparent, robust, and reproducible methods for valuing health effects in TB-concerned CEAs. The preponderance of the DALY in TB studies necessitates harmonisation of disability weight procedures. While GBD disability weights are the predominant choice in studies calculating TB DALYs, our review identified weights sourced from an additional 17 studies covering a spectrum of TB-related health states.

Collectively, TB-CEA studies employing DALYs reveal methodological concerns of varying gravity. At one end, we have minor concerns unlikely to materially impact conclusions. These include uses of outdated GBD weights and unclear or omitted referencing, issues easily resolved with increased diligence. At the other end is a subset of studies containing methodological shortcomings that inevitably affect results and conclusions. These include the adoption of inappropriate weights or methodologies (such as weights exceeding 1 and health utility approaches) and seemingly contradictory weighting of states of clinically distinct severity. Between these poles, the most widespread issues observed were omissions in published reports which impede reproducibility, at times reducing confidence in results. These issues include unspecified weight values, insufficient detail on weight application and/or underlying methodological assumptions and the absence of sensitivity analyses on weights. One study which had conducted sensitivity analyses on weight values reported “values for tuberculosis-specific DALY weights resulted in the largest changes” [76]. This is foreseeable, given the GBD’s upper-range estimates for both TB and TB-HIV weights are double their corresponding lower estimates. Lack of sensitivity analyses on disability weights has similarly been highlighted elsewhere [132]. Increased rigour and consistency, aided by our proposed set of recommendations (Table 3), could address many of these issues and provide support in the peer review process.

Table 3 Recommendations for CEAs using tuberculosis disability weights

The influence of disability weight values on overall DALYs within any CEA will be less pronounced when the alternatives being evaluated have larger relative contributions from mortality (YLL). While not articulated, this consideration may potentially explain why some reviewed studies chose to calculate DALYs solely on the basis of YLLs. Looking ahead, however, as TB treatments are expected to become more efficacious and their coverage widens, accurately reflecting non-fatal health outcomes will become increasingly important. Our review provides evidence that many researchers are currently seeking to include broader health effects related to TB beyond those covered by currently available GBD disability weights.

Our findings reveal multiple efforts to include the effects of TB treatment in analyses, yet also point to an absence of established methods or standardised weights for doing so. We document the use of ad hoc values appearing in studies, which not only challenge the precision of results and the decisions they inform but also limit between-study comparisons, as these methods are not followed by all. Nevertheless, the inclusion of treatment effects within analyses may have a strong basis for justification. Side effects from TB treatment are commonplace (even driving some people to discontinue their treatment) [133, 134], and despite a persistent treatment gap, most people who develop clinical TB disease do receive treatment [135]. Ergo, for most, their experience of TB disease is intimately coupled to their experience of TB treatment. Moreover, one might consider it particularly important to reflect treatment effects in CEAs which themselves evaluate treatment interventions (which our review found to be approximately one-fifth of CEAs). Furthermore, as global TB control efforts transition towards the ‘last mile’, progressively more strategies focussed on systematic mass-screening for TB among at-risk populations (i.e. ACF) will likely be devised and implemented. The lower positive predictive value of such interventions will further increase the relevance of representing treatment effects.

A disability weight for TB–HIV coinfection has been provided by the GBD since 2012, yet this weight has been regularly overlooked, again hindering interpretation and inter-study comparability. However, caution surrounding the GBD-provided TB-HIV weight is not without cause. While several GBD weights are available for HIV without TB, ranging from 0.012 (‘early HIV without anaemia’) to 0.582 (AIDS cases not on ART), TB-HIV coinfection is represented by a single weight of 0.408 (see Electronic Supplementary Material, Fig. A1). This can result in a considerable jump in the weight assigned when accounting for TB in HIV-positive individuals, not only upward (e.g. from 0.012 to 0.408) but also downward (from 0.582 to 0.408). Likely for this reason, several authors described having derived their own weights within their studies to allow for a range of TB-HIV severities. Further TB-HIV weighting limitations were evident in two studies considering treatment of LTBI in HIV-positive populations, whereby, somewhat counterintuitively, ‘active TB-HIV coinfection off ART’ was assigned a less severe weight of 0.408 than the weight of 0.582 applied to ‘LTBI-HIV coinfection off ART’ [36, 37]. Johnson et al. remarked on this, referring to the “absence of accepted values” for TB–HIV coinfection [36]. While not limited to GBD weights [136], such incongruities regarding potentially contradictory or undifferentiated weightings, along with a lack of uniformity among studies, complicate comparisons and risk diminishing the accuracy with which the benefits or harms of interventions are assessed, possibly leading to skewed conclusions.

That the average experience of people with MDR-TB differs from people with DS-TB is well recognised and uncontentious [137], a distinction that several studies in our review aimed to reflect using MDR-specific weights. However, in the absence of widely accepted weights or methods, most studies understandably defer to the GBD framework, applying a uniform weight for both drug-susceptible and MDR-TB: “The disability weight for TB of 0.331, which does not differ between different states nor between TB and MDR-TB…” [85]. Undifferentiated weighting will inevitably reduce projected DALY benefits of some MDR-TB-related interventions. Moreover, although a detailed exploration of this was beyond the scope of our study, this approach may also disincentivise data collection on drug resistance patterns in study populations since current weighting practices are insensitive to these distinctions.

Since 2020, five distinct weight values for TB-related health states (plus the value of 0 for latent TB) have been provided by GBD (GBD 2019 update; see Table 1). Three of these values, however—those relating to TB-HIV anaemia classes—are not currently utilised in analyses. Considering the evolving nature of GBD disability weights, our review points to potential gaps, which could be addressed in future updates. More granular TB-HIV weights could be created by extending the methodology for ‘combined weights’ without additional data collection. Furthermore, this same approach could additionally consider the growing body of evidence relating undernutrition and tuberculosis, affecting more than 20% of TB cases globally [138, 139]—a greater number than those with HIV. Our review further suggests that, if/when additional states are added and scored in future GBD weight updates, a distinct weight for ‘MDR-TB’ would likely be welcomed.

While our review catalogues disability weights used in TB CEAs, it cannot verify or provide guidance on the accuracy of weight values in representing the health effects of TB. Nonetheless, the dominant use of GBD-derived weights warrants a brief examination of the health state lay descriptions they are based upon. Lay descriptions prioritise the “major functional effects and symptoms associated with each health state”; in the case of TB, this involves a physical health description of pulmonary disease. However, descriptions often extend to include wider considerations such as ‘daily medication’ and its side effects (specified for HIV/AIDS on ART [19]), while social and emotional impacts are further incorporated into other descriptions [12]. Given that GBD lay descriptions undergo revisions prior to weight elicitation studies—30 modifications were made for GBD 2013 [12]—and with growing recognition of TB’s mental and social implications [15, 140, 141], a revision of the purely physical health description to reflect recent evidence and incorporate up-to-date perspectives may be worthy of consideration. Ongoing comprehensive critiques of the overarching DALY approach can be found elsewhere [142, 143], and while valuable context for this study, such exploration is not our present focus.

A major strength of GBD weights is the clarity of their derivation process. For example, in GBD 2013, 30,000 web responses contributed to the revision of the HIV-negative TB weight (from 0.331 to 0.333). However, one might question the extent to which the respondent pool, drawn from the USA and several countries in Europe, is representative of the populations identified in our review (Electronic Supplementary Material Table A3; Figs. A2, A3). While acknowledging the complexities of conducting GBD studies, and that GBD’s remit extends far beyond TB, it is notable that most DALY CEAs are conducted in infectious diseases, predominantly in LMICs [144]. Given this context, there is an argument for enhanced data collection in key and underserved populations. Furthermore, given the substantial variation in individual TB experiences, particularly disparities between experiences of MDR-TB and DS-TB treatment, our findings support collection of primary health-related quality of life data for CEA when possible.

Our study has several limitations. First, the two-stage process of full-review and partial-review employed by Tufts in compiling the CEA registry omits articles in journals with an impact factor < 2 unless published in one of their priority journals. Any such studies will not have been included in our review. Second, in one instance we were unable to locate a manuscript, and we were unable to locate supplementary material for two further manuscripts, having received no response from the contact authors. Third, while we had set out to collect data on the duration of weight application, this variable was dropped from the analysis owing to widespread ambiguity in reporting. Fourth, we acknowledge that our extended search may have missed some HIV-focussed studies that included TB weights. Fifth, while this study’s focus was on DALYs, future discussions of how we measure TB morbidity will inevitably require a comprehensive understanding of the landscape of TB CEA studies that utilise the QALY; our review revealed that few studies calculate both measures in parallel [38, 65]. While sharing similarities, QALYs and DALYs have structural differences, and further exploration of the mechanisms by which QALY and DALY evaluations differ would be valuable. Finally, given that our review specifically aimed to document the use of TB weight values, we did not examine the impact of these values on cost-effectiveness outcomes within studies. Such investigations could be pursued in a future meta-analysis.

5 Conclusions

Considering the future roadmap for TB research and anticipated investments, high-quality research is essential for informed decision-making. While this study reveals inconsistencies in current disability weight methodologies and reporting, we are confident that our recommendations could aid in their standardisation. Although GBD disability weights are favoured within the TB DALY literature, our findings show that researchers are using additional weights alongside them, extending beyond the scope of GBD weights alone. This suggests an intent to capture the broader non-fatal effects of TB more comprehensively than GBD weights typically accommodate.