Introduction

In the last two decades, there has been a global push to end the tuberculosis (TB) epidemic by setting aggressive targets with the End TB Strategy [1]. Nonetheless, in 2020, there were an estimated 9.9 million TB cases and 1.3 million deaths, of which an estimated 40% went undiagnosed [2]. These missed diagnoses, made worse by the ongoing COVID-19 pandemic, perpetuate transmission and present significant challenges in ending TB [2]. Implementing diagnostic tools that improve detection and reduce diagnostic and treatment delays is critical in overcoming these gaps in TB care [3, 4].

GeneXpert MTB/RIF® and MTB/RIF Ultra® (Xpert) and line probe assays (LPA) are commercial nucleic acid amplification tests (NAATs) that have good diagnostic accuracy with the capacity to diagnose drug-sensitive (DS-TB) and drug-resistant TB (DR-TB) within 1–2 days of sample processing [5, 6]. Anticipating improvements in accurate and timely TB diagnosis, the World Health Organization (WHO) recommended these NAATs [7, 8]. Since then, unprecedented efforts have been made by National Tuberculosis Programs (NTPs) across the globe to scale up these tests and include them as part of the routine TB diagnostic algorithms [9,10,11]. These NAATs have proven to have high accuracy, and research has increasingly focused on studying their actual clinical impact [10, 12,13,14,15,16]. While there are systematic reviews on the diagnostic accuracy of Xpert and LPAs [6, 17, 18], and others that separately describe diagnostic and treatment delays experienced by TB patients [19], no study has summarized the impact of NAATs on reducing time delays in diagnosis and treatment of TB.

Therefore, the main objective of our systematic review was to summarize the available quantitative evidence on the impact of NAATs on diagnostic and treatment delays compared to that of the standard of care for DS-TB and DR-TB. As the secondary objective, we investigated the potential sources of heterogeneity in the effect estimates, including the period the tests were used (pre-2015, post-2015), empiric treatment rate, HIV prevalence, healthcare level, and type of study design (randomized controlled trial, observational study design). We also describe methodological areas of concern in assessing time delays, an aspect that has not been adequately addressed in previous systematic reviews of diagnostic delays in TB.

Methods

Study selection criteria and operational definitions

Prior to the review, we developed a conceptual framework for classification of essential time delay components and definitions [20, 21] (Fig. 1). This framework standardized time delays and provided structural guidance in assessing time delays reported in the studies included in this review. We defined diagnostic delay as the time from initial patient contact with a clinic, or from sputum collection, to reporting of results. Treatment delay was defined as the time between results and initiation of anti-TB treatment. The combination of diagnostic delay and treatment delay was referred to as treatment initiation delay.
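The three delay definitions above can be illustrated with a minimal sketch. The function name, argument names, and example dates below are hypothetical and chosen only for illustration; the sketch assumes that each time point is available as a calendar date.

```python
from datetime import date


def delay_components(first_contact: date, result_reported: date,
                     treatment_start: date) -> dict:
    """Split one patient's care pathway into the three delays (in days)."""
    # Diagnostic delay: initial contact (or sputum collection) to reported result.
    diagnostic = (result_reported - first_contact).days
    # Treatment delay: reported result to start of anti-TB treatment.
    treatment = (treatment_start - result_reported).days
    return {
        "diagnostic_delay": diagnostic,
        "treatment_delay": treatment,
        # Treatment initiation delay is the sum of the two components.
        "treatment_initiation_delay": diagnostic + treatment,
    }


# Hypothetical dates, not taken from any included study.
example = delay_components(date(2020, 3, 2), date(2020, 3, 4), date(2020, 3, 9))
```

With these illustrative dates, the diagnostic delay is 2 days, the treatment delay is 5 days, and the treatment initiation delay is 7 days.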

Fig. 1

Conceptual framework of time delay components in diagnosis and treatment of tuberculosis. The illustrations depicted here are our own

Our review focused on the impact of the World Health Organization (WHO)-recommended rapid diagnostics (WRD), specifically Xpert® MTB/RIF and MTB/RIF Ultra assay (Xpert) and GenoType MTBDRplus and Inno-LiPA RifTB (both referred to as LPA hereafter), because of their rapid uptake at the global level [2]. Several other tests have been recommended since 2020, but we did not include them in our systematic review because data is still limited [22].

We included only peer-reviewed studies that assessed time delays in the process of diagnosis and treatment of DS-TB and DR-TB with the index test as NAAT and a respective comparator test (e.g., smear for Xpert and culture DST for LPAs). We did not restrict our studies based on geography, settings, language, or type of study design. We excluded studies if they: (1) did not include primary data; (2) did not report all data necessary for meta-analysis; (3) were reviews or modelling studies; (4) only reported ‘run-time’ or turnaround time of the test (e.g., “2 h to run” Xpert test); and (5) focused on childhood or extra-pulmonary TB. For conference abstracts, we contacted the authors to see if there was a manuscript in preparation to obtain relevant data. Similarly, we requested original data from the authors when a study did not report time delay estimates as per our study requirements.

Study search strategy, study selection, and data extraction

The present systematic review is an update to the systematic review published in the lead author’s (HS) doctoral thesis in 2016 [23]. The original and updated search were undertaken on January 31, 2015, and October 12, 2020, respectively. We identified eligible studies from MEDLINE, EMBASE, Web of Science, and the Global Health databases that included terms associated with time, like “delay” and “time to treatment” (see Additional file 1 for the complete search strategy). We also consulted references of included articles and previous systematic reviews focusing on the diagnostic accuracy of NAATs, and experts in the fields of TB diagnostics to identify additional studies not included in the database search. After removing duplicates, two reviewers (SGC, ZZQ, or HS—original review; JSL, JHL, or TG—updated review) independently screened titles and abstracts, followed by full-text review for inclusion (HS, SGS—original review; JSL, JHL—updated review). Any discrepancies were resolved by consensus or, in case of the updated review, a third reviewer (HS, TG).

Google Forms (Google LLC, Mountain View, CA, USA) was used for the initial review, but in the updated review, this data was incorporated into Covidence (Veritas Health Innovation, Melbourne, Australia) to manage the review and extract data [24]. The data extraction tools were pilot tested, using five studies in the full text review pool, prior to conducting full data extraction. A set of reviewers (HS—original review; JL, JHL—updated review) extracted the data before it was examined by separate reviewers (SGS—original review, TG—updated review) to resolve any discrepancies in the extracted data. We extracted data on study design, geographic setting, operational context, time delays for both the index and comparator tests, and delay definitions. Units of time were converted into the number of days. An example data extraction tool is available in Additional file 3.
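The conversion of reported time units into days can be sketched as follows. The conversion table and function name are assumptions made for illustration; the review does not state which units the primary studies used, so only hours, days, and weeks are covered here.

```python
# Hypothetical conversion table: how many of each unit make up one day.
UNITS_PER_DAY = {"hours": 24.0, "days": 1.0, "weeks": 1.0 / 7.0}


def to_days(value: float, unit: str) -> float:
    """Convert a reported delay into days; unknown units raise ValueError."""
    try:
        return value / UNITS_PER_DAY[unit]
    except KeyError:
        raise ValueError(f"unsupported time unit: {unit!r}") from None


two_days = to_days(48, "hours")  # a delay reported as 48 hours
one_week = to_days(1, "weeks")   # a delay reported as 1 week
```

Rejecting unknown units rather than guessing a factor keeps ambiguous entries (for example, months of unspecified length) visible for manual review.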

Quality assessment of time delay estimates

Unlike quality assessment tools for diagnostic accuracy studies, there is currently no established method or checklist that can be used to assess the quality of studies investigating time delays or time to event study outcomes [25]. Therefore, we developed a matrix of key methodologic and contextual information necessary to determine the usefulness and comparability of the time delay reported. These included (1) provision of a clear definition of measuring time delay and reporting the time delay estimates (“delay definition”); (2) use of appropriate statistical methods to report and assess changes in time delays (“statistical methods”); (3) evaluating time estimates alongside patient-important outcomes (“patient important outcomes”), which included culture conversion, TB treatment outcomes, infection control and/or contact tracing.

The provision of a clear delay definition was a binary variable with “Yes” and “No” options, where “Yes” indicated that the time delay term was clearly defined, with its start and end time points stated alongside the delay estimate. The other two quality indicators were ranked on a high–medium–low scale. For the statistical method assessment, high-quality studies evaluated the distribution of time delay and used appropriate statistical methods [randomized controlled trial (RCT) design or propensity score methods for observational studies] that adjust estimates for proper comparison of time delays between the index and the comparator test, with a measure of variance. Medium-quality studies evaluated the distribution of time delay with uncertainty estimates but did not use appropriate statistical methods for comparative assessment of time delays. Low-quality studies neither evaluated the distribution nor compared the time delay. For patient-important outcomes, high-quality studies analysed the relative risk or odds of improvement in culture conversion with the amount of time saved in TB treatment initiation. Medium-quality studies reported time estimates alongside patient-important outcomes but without direct analysis, and low-quality studies did not consider patient-important outcomes at all.

Data synthesis and meta-analysis

We calculated overall medians and IQRs of diagnostic and treatment initiation delay for each diagnostic test (Xpert vs. smear, LPA vs. any culture DST methods) from the medians and means reported by the individual studies. Additionally, using the extracted raw data, we applied the Mann–Whitney U test to determine whether the overall median time estimates differed significantly between the index and comparator tests. We assumed no confounding in the primary studies.
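As a rough illustration of this descriptive step, the sketch below summarizes study-level medians and computes the Mann–Whitney U statistic using only the Python standard library. The input values are hypothetical and are not data from the review; the quartile convention (`method="inclusive"`) is an assumption, and the p-value calculation is omitted.

```python
from statistics import median, quantiles


def summarize(study_medians):
    """Overall median and IQR (Q1, Q3) of study-level median delays."""
    q1, _, q3 = quantiles(study_medians, n=4, method="inclusive")
    return median(study_medians), (q1, q3)


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: count of pairs with a > b, ties count 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )


# Hypothetical study-level median delays in days (illustrative only).
smear_delays = [1, 2, 3, 4, 5]
xpert_delays = [0, 1, 1, 2, 3]

smear_summary = summarize(smear_delays)
xpert_summary = summarize(xpert_delays)
u_xpert = mann_whitney_u(xpert_delays, smear_delays)
u_smear = mann_whitney_u(smear_delays, xpert_delays)
```

The two U statistics always sum to the product of the sample sizes, which gives a quick consistency check on the calculation.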

We then conducted a meta-analysis using the quantile estimation (QE) method developed by McGrath et al. to assess the absolute reduction in diagnostic and treatment initiation delay using NAATs [26]. The method involves estimating the variance of the difference of medians of each study and pooling them using the standard inverse variance method. Time-to-event data are non-normally distributed variables that are primarily reported as medians and IQRs. As units of delay measurements (days) were uniform across all studies, the effect size was chosen to be the raw difference of medians in time delay for both diagnostic and treatment initiation delays. We used a random effects model because the studies differed importantly in characteristics that may lead to variations in the effect size [27, 28]. Between-study heterogeneity was estimated by the method of restricted maximum likelihood. Since this method requires complete data on the median (or mean), IQR (or SD), and sample size, studies that did not report all of these data points were excluded from the analysis.
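The pooling step can be sketched as below, assuming that each study already has a difference of medians and an estimated variance for it. This is not the QE variance estimation itself, which is not reproduced here. For brevity the sketch estimates between-study variance with the DerSimonian–Laird moment estimator, which is a simpler substitute for the restricted maximum likelihood estimator used in the review; all names and example numbers are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Inverse variance random effects pooling.

    Returns (pooled effect, 95% CI tuple, tau^2). Uses the
    DerSimonian-Laird estimator of tau^2 (an assumption, not REML).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical differences of medians (days) and their variances.
pooled, ci, tau2 = pool_random_effects([1.0, 3.0], [1.0, 1.0])
```

Because the random effects weights include the between-study variance, high heterogeneity pulls the weights toward equality and widens the confidence interval.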

Given the multifactorial nature of the studies, we also evaluated the heterogeneity based on the I-squared statistic, where a value greater than 75% is taken to indicate considerable heterogeneity [28, 29]. We conducted subgroup analyses to identify possible sources of heterogeneity and to assess key factors (pre-2015 vs. post-2015, RCT vs. observational, etc.) that can variably influence the magnitude of our effect size estimate. We specifically chose 2015 as our cut-off time point not only because this was the cut-off for the original systematic review but also because enough time had passed since the recommendation to see the effects of the implementation of NAATs in research studies. Further, we assessed for “small study effects” and publication bias with funnel plots followed by Egger’s test to determine their symmetry. We managed and analysed the data using Microsoft Excel 16 (Microsoft Corporation, USA) and R version 4.1.1 (R Foundation for Statistical Computing, Austria).
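The two diagnostics named above can be sketched in a few lines. The first function computes I-squared from Cochran's Q; the second fits the regression underlying Egger's test, in which the standardized effect is regressed on precision and an intercept far from zero suggests funnel plot asymmetry. Function names and inputs are hypothetical, and the significance test on the intercept (which needs a t distribution) is left out.

```python
def i_squared(effects, variances):
    """I-squared (%) from Cochran's Q under inverse variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope); the intercept is the asymmetry measure.
    """
    x = [1.0 / se for se in std_errors]
    z = [y / se for y, se in zip(effects, std_errors)]
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope


# Hypothetical inputs, not data from the review.
i2_example = i_squared([1.0, 3.0], [1.0, 1.0])
intercept, slope = egger_regression([2.0, 2.0, 2.0], [0.5, 1.0, 2.0])
```

In the second example every study has the same effect regardless of its precision, so the fitted intercept is zero (no small study effect) and the slope recovers the common effect.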

Results

Search results

After removing duplicates, we identified 14,776 records (original review: 7995; updated review: 6781) for title and abstract screening. Of these, 323 were selected for full text review. A total of 45 studies (26 DS-TB and 20 DR-TB) with relevant time delay estimates were ultimately included in this review (Fig. 2).

Fig. 2

PRISMA diagram. DS-TB drug-sensitive tuberculosis, DR-TB drug-resistant tuberculosis. *One study [30] reported data for both DS-TB and DR-TB

Description of included studies

Of the 45 studies included in this review, 21 (81%) DS-TB and 15 (75%) DR-TB studies were conducted in low- and middle-income countries (LMICs) (Tables 1 and 2). One study had estimates for both DS-TB and DR-TB [30]. Overall, half of the studies (17 DS-TB, 7 DR-TB) were conducted in the African region with over two thirds of those in South Africa (n = 15). HIV prevalence was reported by 31 (19 DS-TB, 12 DR-TB) studies, of which about half (16 DS-TB, 4 DR-TB) reported an HIV prevalence of over 50%. Amongst the DS-TB studies, 7 studies (27%) implemented Xpert as a point-of-care testing (POCT) program, and 15 studies (58%) implemented Xpert on-site, within walking distance of a primary care program or a laboratory.

Table 1 Study characteristics and time delays reported for diagnosis and treatment of drug-sensitive TB
Table 2 Study characteristics and time delays reported for diagnosis and treatment of drug-resistant TB

Quality assessment of time delay estimates

The studies had considerable methodological heterogeneity in the definitions of time delays. When classifying reported time delays according to our operational definitions and by study design, no study reported all sub-components of time delay. All studies evaluating treatment delay used TB treatment initiation time, but start and end points for diagnostic delay varied across studies (Tables 1 and 2). Overall, 13 of the 45 studies did not provide a clear definition of the time delay estimates reported (Table 3). Amongst studies included in the DS-TB analysis, 6 (23%) studies employed a randomized controlled trial (RCT) design, and 2 studies (8%) were quasi-experimental using pre- and post-implementation study designs. One study used a single-arm interventional pilot design (4%), and the remaining 15 studies were observational (58%). All the studies in the DR-TB analysis were observational. In the use of proper statistical methods for measurement and reporting of delay estimates, 18 studies ranked high, 23 ranked medium, and 2 ranked low. In the evaluation of time estimates alongside patient-important outcomes, 7 ranked high, 18 ranked medium, and 18 ranked low.

Table 3 Quality assessment of time delay estimates

In all funnel plots (Additional file 2), several studies fell outside the 95% CI, contributing to the visible asymmetry. This may be due to the considerable heterogeneity (I² > 99%) of the studies. However, Egger’s tests, used to assess whether there are systematic differences between high- and low-precision studies, demonstrated no clear evidence of “small study effects” (p = 0.085–0.462).

Impact of NAATs on delay

For DS-TB analysis, 12 studies were included in the primary analysis for diagnostic delay, and 18 studies were included for treatment initiation delay. The overall median diagnostic delays for smear and Xpert were 3 days and 1.04 days, respectively. The overall median treatment initiation delays for smear and Xpert were 6 days and 4.5 days, respectively. A random effects meta-analysis of the difference of medians showed that, compared to smear, the use of Xpert did not produce a statistically significant reduction in diagnostic delay [1.79 days (95% CI −0.27 to 3.85)] but did produce a statistically significant reduction in treatment initiation delay of 2.55 days (95% CI 0.54–4.56) (Figs. 3 and 4).

Fig. 3

Forest plots of raw median difference in diagnostic delay for Xpert and smear for drug-sensitive TB

Fig. 4

Forest plots of raw median difference in treatment initiation delay for Xpert and smear for drug-sensitive TB

For DR-TB analysis, 13 studies were included for diagnostic delay and 12 studies were included for treatment initiation delay. The overall median diagnostic delays for culture DST and LPA were 54 days and 11 days, respectively. The overall median treatment initiation delays for culture DST and LPA were 78 days and 28 days, respectively. A random effects meta-analysis of the difference of medians showed that, in comparison with culture DST, the use of LPA significantly reduced diagnostic delay by 40.09 days (95% CI 26.82–53.37) and treatment initiation delay by 45.32 days (95% CI 30.27–60.37) (Figs. 5 and 6). I² values of 99.79% and 97.22% for diagnostic and treatment initiation delay, respectively, indicated considerable heterogeneity.

Fig. 5

Forest plots of raw median difference in diagnostic delay for LPA and culture DST for drug-resistant TB

Fig. 6

Forest plots of raw median difference in treatment initiation delay for LPA and culture DST for drug-resistant TB

Comparing the studies from the two different phases of the review (pre-/post-2015), we found no statistically significant reduction in diagnostic delays but a statistically significant reduction in treatment initiation delay, with a median difference of 2.54 days (95% CI 0.45–4.62) for post-2015 studies and 5.04 days (95% CI 0.09–9.99) for pre-2015 studies. Similarly, subgroup analysis based on study design showed a statistically significant reduction in treatment initiation delay in the RCT group [2.85 days (95% CI 1.16–4.55)] but not in the observational group [1.67 days (95% CI −1.70 to 5.05)]. When classifying studies by healthcare system level, Xpert did not provide a meaningful reduction in treatment initiation delay regardless of its placement: 1.27 days (95% CI −1.45 to 4.00) for primary health care centres and 5.27 days (95% CI −1.06 to 11.60) for tertiary hospitals. When grouped by POCT status, Xpert implemented as a POCT service showed statistically significant reductions in treatment initiation delay compared to non-POCT programs. All sub-group analyses with more than two studies showed I² values greater than 89%, suggesting considerable heterogeneity (Tables 4 and 5).

Table 4 Subgroup analyses of reported time delay for TB diagnosis
Table 5 Subgroup analyses of reported time delay for TB treatment

Discussion

Principal findings

While there are several patient-important impact measures for new diagnostic tests [31], time delay estimates provide a direct measure of the timeliness of TB care. To our knowledge, our systematic review of 45 studies is the first to comparatively synthesize and quantify reductions in delays in diagnosis and treatment of DS-TB and DR-TB when the WHO-recommended NAATs are used instead of smear (DS-TB) or culture DST (DR-TB). Our random effects meta-analysis of the differences of median times showed that the use of NAATs reduced treatment initiation delay for patients investigated for both DS-TB and DR-TB; however, this benefit was not seen for diagnostic delay for DS-TB (Xpert vs. smear). We also found that the degree of benefit in reducing delays in using NAATs for TB care was highly variable and dependent on how the tests were implemented (e.g., laboratory-based vs. POCT), differences in study design to evaluate impact of NAATs on TB care delays, and large variations in how delays were defined and quantified.

In principle, Xpert and smear are “same-day” tests; therefore, the expected reduction in diagnostic delays may be limited for Xpert. Accordingly, in our meta-analysis, we did not find a significant reduction in diagnostic delays when using Xpert compared to smear [1.79 days (95% CI −0.27 to 3.85)]. For treatment delays, our analysis of 18 studies showed that Xpert reduced treatment initiation delays for DS-TB by 2.55 days (95% CI 0.54–4.56) compared to smear, but the degree of this effect was highly variable depending on how and where Xpert was deployed within the health care system. In particular, in our sub-group analysis, we found that the use of Xpert as non-POCT (at any level of the health system) did not show meaningful improvement in DS-TB treatment initiation delay. Moreover, the ‘hub-and-spokes’ model for Xpert testing evaluated in earlier studies, in which patient samples from several community health centres (spokes) are referred to a centralized laboratory (hub), has shown limited impact on improving and optimizing the timeliness of TB care due to operational barriers causing further delays [32,33,34], de-prioritization of Xpert use as an initial test in the national algorithms [35, 36], and continued high empiric treatment rates [37, 38] in certain settings.

In contrast to DS-TB, use of LPA for DR-TB care resulted in a large reduction in delays. Our meta-analysis found that use of LPA reduced DR-TB treatment initiation delay by 45.32 days (95% CI 30.27–60.37). This was mainly due to the prolonged delays associated with conventional DR-TB diagnostics (culture DST), which take weeks to diagnose and treat DR-TB patients. However, the reduction in these delays was not due solely to the implementation of the technology. In the earlier phases of LPA implementation in South Africa, use of LPA for DR-TB care was largely restricted and centralized at higher levels of the health and laboratory system, and treatment initiation delays exceeded 50 days [39, 40]. DR-TB care delays gradually improved to 28 days (IQR: 16–40) through the 3-year DR-TB care decentralization program, which included streamlining LPA testing in the clinical practice (years 2009–2011). Moreover, studies from settings with more established healthcare infrastructure (e.g., China and South Korea) also found that operational challenges diminished the potential benefit of rapid molecular testing in improving DR-TB care delays [41,42,43].

Strengths and limitations

For the meta-analysis, we used the Quantile Estimation (QE) method because it had excellent performance in simulation studies that were motivated by our systematic review [26]. One advantage compared to more traditional approaches based on meta-analysing the difference of means is that the QE method uses an effect size that is typically reported by the primary studies (i.e., the difference of medians) rather than one that must be estimated from the summary data of the primary studies (i.e., the difference of means). However, our meta-analysis results should be interpreted with caution because considerable statistical power was lost when restricting to studies that presented all the necessary data for estimating the variance of the difference of medians. Also, the high level of clinical (e.g. participants, outcomes) and methodological heterogeneity (e.g. study design, defining and reporting of time delays) in the studies included in our review translated into high I² values in all of our meta-analysis results, making generalized interpretation of our summary estimates difficult. We also advise caution in the interpretation of our subgroup analyses because confounders often complicate the interpretation of such analyses and can lead to wrong conclusions [44].

Delays in TB care occur due to a wide range of patient and health systems risk factors [46, 48]. Studies included in our review did not comparatively assess and adjust for risk factors associated with time delays for both the index (Xpert or LPA) and the comparator (smear or culture DST). This may be because time delay estimates were not the primary outcomes in most of the studies, which therefore lacked proper analytical assessment of these outcome measures. Therefore, we were limited to sub-group analyses on key study-level attributes (e.g., HIV prevalence, empiric treatment rate, Xpert placement strategy, and study design), which were highly heterogeneous and, in many cases, inconclusive in showing that Xpert improved delays in TB care. Moreover, our findings are subject to potential confounding at both the health systems level (e.g., differences in healthcare system infrastructure, TB care practices, implementation strategies of the index tests) and the patient level (e.g., symptom levels, age, care-seeking behaviours), which may bias our effect estimates (number of days reduced in diagnostic and treatment initiation delays) towards or away from the null. Given these reasons, generalizability of our findings may be limited. Likewise, our review underscores a need for more research investigating health systems and patient factors that can impact delays in TB care during and after the implementation of diagnostic tests and strategies that aim to improve the timeliness and quality of TB care. Lastly, despite carrying out comprehensive searches and considering non-English studies, we may have missed some studies in our review. Therefore, we cannot rule out potential publication bias.

In our study, we also investigated consistency in the defining and reporting of time delays across studies with a framework developed as part of our study (Fig. 1). In our quality assessment of the studies reporting time delay estimates (Table 3), we found considerable heterogeneity in defining time delays, and close to 30% of studies (13) reported delay estimates without providing clear definitions. Many of the studies included in our review used the same terms to define different components of the delay. For instance, “turnaround time”, “time to detection”, and “laboratory processing time” were used by some studies to describe the time from specimen receipt by the lab to test result at the lab, while others employed these same terms to define diagnostic delay, the time from specimen collection to notifying the clinic of the test result. In addition, several studies included in our review did not report uncertainty ranges or reported them inappropriately (e.g., no IQRs or reported means with IQRs). As time data may be highly skewed, standardizing the practice of reporting delay estimates as medians with their variances or other measures of spread (e.g., IQR or range) can help facilitate synthesis of these studies. Many of these issues have been previously reported by other systematic reviews on TB care delays, and our findings re-emphasize the importance of standardizing how TB care delays are defined, measured, and reported [20, 45,46,47,48].

Conclusions

The global rollout of NAATs has dramatically changed the landscape of TB diagnosis in high TB burden settings with improvements in the TB diagnostic infrastructure and the quality of TB prevention and care programs. Our systematic review findings suggest that implementation of NAATs has resulted in a noticeable reduction in delays for TB treatment compared to the conventional methods. However, these improvements did not fully realize the potential benefits of NAATs because of health system limitations [49]. Additionally, we identified methodological concerns in the reporting of time delay estimates and emphasize the need to standardize and promote their consistent reporting.