Key Points for Decision Makers

Because of the upcoming (2022) European “in vitro diagnostic regulation”, more attention to the clinical effectiveness of new diagnostic tests is expected, which also presents an opportunity for economic analyses of diagnostics

This review shows that the methods to assess the cost effectiveness of diagnostic tests for respiratory tract infections vary, making it difficult to make comparisons

Decision makers should consider the application of the reference case for economic evaluation and pharmacoeconomic guidelines to diagnostics and adapt these guidelines if needed

1 Introduction

When diagnosing a patient with common symptoms with various possible pathologies, such as respiratory infections, clinicians historically had to either rely on clinical judgement and empirical therapy, or wait for the results of diagnostics performed in specialised laboratories [1]. Recent developments have brought diagnostic tests to the point of care (POC): these novel diagnostics enable clinicians to rapidly and more accurately diagnose patients and to prescribe more appropriate treatment [2], which in turn improves understanding of the patient’s condition and monitoring of the patient’s clinical course [3].

In light of the COVID-19 pandemic, rapid diagnostic tests are regarded as a fundamental instrument in combating the spread of SARS-CoV-2 [4, 5] and, consequently, have received considerable attention. While societies worldwide are vaccinated at an unprecedented rate, rapid COVID-19 tests are expected to play an important role in reopening the economy. Since the start of the pandemic, many diagnostics have been developed and are entering the market [6,7,8].

The pandemic is a major risk to public health and economies worldwide, but infectious disease poses another threat: in 2018, the World Health Organization declared antimicrobial resistance (AMR) to be one of the ten greatest threats to public health [9]. In Europe, there are estimated to be over 650,000 infections with resistant bacteria every year, causing over 30,000 attributable deaths [10]. Innovative POC testing may have an important role in combating AMR, as it enables clinicians to prescribe antibiotics more accurately [11, 12]. Antibiotic prescriptions related to respiratory infections are especially relevant because this type of infection represents a third of total visits to primary care centres [13] and generates difficulties for medical professionals when diagnosing, as they tend to overestimate the proportion of patients presenting with bacterial infections and, consequently, overprescribe antibiotics [14].

In the field of pharmaceuticals, health technology assessment (HTA) plays an important role in assessing the value for money of new drugs [15], but for diagnostic technologies, this role is less established. A complicating factor is that, unlike pharmaceuticals, which directly influence a patient’s health status, the impact of diagnostic technologies is indirect and only takes effect when diagnostic results change downstream clinical interventions [16]. Until now, the assessment of new diagnostic techniques has often focused on technical capabilities, such as the test’s sensitivity and specificity. However, starting in 2022, the European “in vitro diagnostic regulation” (IVDR) will come into effect, making it mandatory for companies to provide data demonstrating the clinical effectiveness of new diagnostics before they enter the market [17]. These data will enable policy makers, payers and healthcare providers to better estimate the added clinical value of novel diagnostics and can be incorporated into HTAs.

Considering the public health relevance of diagnostics for respiratory infections and the policy changes that may increase the focus on HTAs of diagnostics, we systematically reviewed the methods used in economic evaluations of applied diagnostic techniques, for all patients seeking care for infectious diseases of the respiratory tract (such as pneumonia, pulmonary tuberculosis [TB], influenza, bronchitis, bronchiolitis, sinusitis, pharyngitis, sore throats, group A beta-haemolytic streptococcal infections [GABHS] and general respiratory tract infections). Specifically, we report on the types of economic models used to assess current practice and the implementation of newly developed diagnostic technologies, so that the results can help identify areas for improvement in economic evaluations of diagnostics. Finally, considering the evidence of increasing societal costs of AMR [10], we evaluate how authors have modelled the influence of AMR and under which circumstances diagnostic tools would help reduce antibiotic prescribing.

2 Methods

2.1 Search Strategy

We conducted a systematic review of articles contained in PubMed, Scopus and Web of Science. The search syntax was constructed to include economic evaluations of diagnostic strategies for infectious diseases; see the Electronic Supplementary Material (ESM) for the specific search syntax. The results were not limited to certain countries but, to reflect recent clinical practice, we only included articles published between January 2000 and May 2020.

2.2 Eligibility Criteria

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [18] were used for this study. Economic evaluations of diagnostic strategies for all respiratory tract infections were considered for inclusion. Eligible articles performed an economic analysis comparing both costs and effects, including cost-effectiveness analyses (CEAs), cost-utility analyses (CUAs) and cost-minimisation analyses (CMAs) that incorporated clinical effects. Studies assessing only the test characteristics from a technical (laboratory) point of view were not considered in this review. Patient-relevant outcomes had to be included, such as quality-adjusted life-years (QALYs), disability-adjusted life-years (DALYs), life-years gained or the proportion of correct diagnoses. A further inclusion criterion was that at least two diagnostic strategies for respiratory tract infections were compared. Diagnostic strategies were defined as: “identifying the most likely cause of, and optionally optimal treatment for, a previously undetected disease in a clinically suspect patient who is seeking care” [19]. Population screening, disease monitoring or genotyping of genetic material from patients were therefore explicitly not considered to be diagnostic strategies. Studies focusing on animals, review articles, study protocols, comments on articles and individual case reports were excluded, as were articles in languages other than English, Spanish, Dutch, German or French.

2.3 Study Selection

In phase one, two reviewers (PRG and SvdP) independently screened the titles and abstracts. If no agreement was reached, a third reviewer (ADIvA) was consulted. The full-text screening phase was performed by the same two reviewers, applying the same criteria. This phase was also used to separate the diagnostic and screening strategies, as this distinction was often not clear from the abstract, and to assess whether the article concerned respiratory tract infections.

2.4 Data Extraction

The Consolidated Health Economic Evaluation Reporting Standards (CHEERS) checklist [20] was used as a basis to create a standardised digital (Google) form to extract the relevant data from the articles. Various items relevant to diagnostics were added to this extraction tool. See the ESM for an overview of the items included. The data extraction was divided between three reviewers (PRG, SvdP and ADIvA); 10% of extractions were duplicated to check for consistency between reviewers. Furthermore, the reviewers reached consensus through continuous discussion of the extracted data, repeating the extraction if necessary. To analyse the reporting quality of the studies, a score was calculated based on the presence of the items as reported in CHEERS [20].
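As an illustration of how such a reporting score could be computed, the following minimal Python sketch returns the share of checklist items marked as present for one study. The item names and the equal weighting of items are assumptions made for this example; the exact scoring rule used in the review is not specified here.

```python
def cheers_score(items_reported):
    """Return the share of checklist items reported (0.0 to 1.0).

    items_reported maps a checklist item name to True/False.
    Equal weighting of items is an assumption of this sketch.
    """
    if not items_reported:
        raise ValueError("no checklist items supplied")
    reported = sum(1 for present in items_reported.values() if present)
    return reported / len(items_reported)


# Hypothetical extraction record for a single study
example_study = {
    "comparators": True,
    "time_horizon": False,
    "discount_rate": False,
    "incremental_costs_and_outcomes": True,
}
print(cheers_score(example_study))
```

Averaging such scores per model type, or averaging a single item across studies, would give percentages of the kind reported in the Results section.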

In the general and introduction parts of the extraction tool, the research question, specific disease area and pathogens considered were included. In the methodology, we emphasised the type of model developed and its characteristics in terms of perspective, time horizon, setting, population included and incorporation of uncertainty analysis in parameter values (stochastic or deterministic). A section was included to assess whether the model included AMR and, if so, how. In the results sections, we paid attention to the incremental costs and outcomes and techniques for reporting uncertainty in the model. Finally, for the discussion, the focus was on the main findings, limitations, specific limitations in the assessment of diagnostics, and advantages/disadvantages of the modelling technique discussed by the authors.

2.5 Data Analysis

The data extracted from the articles were analysed using R 3.6.3 [21], categorising the data by the pathogens considered (e.g. influenza, streptococcus) and the type of model (e.g. decision tree, Markov). For data transformation and table creation, the packages dplyr 1.0.0 [22] and gt 0.2.2 [23] were used. The code was made available on GitHub.

3 Results

Figure 1 shows the included and excluded studies in a PRISMA flow diagram. Seventy papers were included in this review. Most studies were CEAs or CUAs, comparing the standard of care, mostly consisting of empirical therapy, clinical judgement or traditional diagnostics (e.g. cultures or microscopy), to the use of rapid diagnostic tests. The most common rapid tests included were Xpert for TB, influenza-specific POC tests, C-reactive protein (CRP) POC tests and procalcitonin (PCT) tests. Other diagnostics included were polymerase chain reaction (PCR), microscopy, X-ray and clinical scoring algorithms. No tests for coronaviruses were included in this review. Most studies (46) included a decision tree model, two included a Markov model in addition to a decision tree, 12 were trial-based analyses, seven were categorised as a dynamic model and three papers were categorised as ‘other’. Table 1 provides an overview of the settings and tests assessed in the studies, while Table 2 provides an overview of the methods used. Key characteristics and results of all studies can be found in the ESM.

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of inclusion and exclusion

Table 1 Overview of included studies
Table 2 Methods of included studies

The reporting quality, in the form of a CHEERS score, can also be found in the ESM. Most items were reported by most studies. All articles reported comparators, model assumptions and incremental costs and outcomes of the model. Background and objectives were reported in all Markov models but in only 90% of trial-based analyses and 93% of decision tree models. The target population was reported in all trial-based analyses, decision trees and Markov models but in only 86% of dynamic models.

3.1 Regression Models and Trial-Based Analyses

When cost and effectiveness data obtained from a trial are used directly to analyse the costs and clinical effects of a new intervention using standard statistical methods, this is regarded as a regression model or a trial-based analysis. Several studies examined cost effectiveness based upon (mostly) a single trial [24,25,26,27,28,29,30,31,32,33,34,35], without the use of a health-economic model. Four studies assessed diagnostics for general respiratory tract infections [24,25,26,27], two studies assessed diagnostics for pneumonia [28, 30] and six assessed diagnostics for TB [29, 31,32,33,34,35].

Two studies by Oppong et al. assessed the cost effectiveness of POC CRP testing [25, 26] and Internet-based training for primary care clinicians [26] using regression models. The first study used data from an observational study in Norway and Sweden [25], while the second study used data from a multinational, cluster-randomised, factorial controlled trial in Belgium, the Netherlands, Poland, Spain and the UK [26]. Both studies incorporated resource use, EQ-5D scores and antibiotic prescribing from the trials [25, 26]. Multilevel modelling was used to model the outcomes of interest: QALYs, antibiotic prescriptions and costs. Both studies found antibiotic prescribing to be less prevalent when a POC CRP test was performed and found no significant differences in health outcomes. The cost effectiveness of POC CRP testing largely depended on the country, even when similar methods were used: while it was a dominant strategy in the Netherlands, it was dominated by usual care in Spain [25, 26]. In the analysis that also considered communication training, this training was the overall dominant strategy [26]. Cost-effectiveness acceptability curves (CEACs) were included in both articles [25, 26], using the net benefit regression framework [36]. The reduction in antibiotic prescriptions was expressed in monetary terms as additional costs per patient prescription avoided [25] or additional costs per percentage reduction in antibiotic prescriptions [26].
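The idea behind a CEAC can be illustrated with a short Python sketch. It is a simplified illustration and not the net benefit regression framework of [36]: it takes hypothetical pairs of incremental costs and incremental QALYs (for example, bootstrap replicates from trial data), computes the incremental net monetary benefit at each willingness-to-pay threshold, and reports the share of pairs in which the new strategy has a positive net benefit. All function names and input values are assumptions for this example.

```python
def incremental_net_benefit(delta_cost, delta_qaly, wtp):
    """Incremental net monetary benefit: wtp * delta_qaly - delta_cost."""
    return wtp * delta_qaly - delta_cost


def ceac(samples, thresholds):
    """Return (threshold, probability cost effective) pairs.

    samples: list of (delta_cost, delta_qaly) draws, new strategy vs comparator.
    thresholds: willingness-to-pay values per QALY.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    curve = []
    for wtp in thresholds:
        favourable = sum(
            1 for d_cost, d_qaly in samples
            if incremental_net_benefit(d_cost, d_qaly, wtp) > 0
        )
        curve.append((wtp, favourable / len(samples)))
    return curve


# Hypothetical draws of (incremental cost, incremental QALYs)
draws = [(100.0, 0.01), (-50.0, 0.0), (200.0, 0.005), (50.0, -0.01)]
for wtp, prob in ceac(draws, [0, 20000, 50000]):
    print(wtp, prob)
```

Reading the probability at a single threshold gives the "probability of cost effectiveness" that trial-based analyses commonly report alongside the full curve.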

Nicholson et al. conducted a randomised controlled trial and health-economic evaluation of diagnostic tests for influenza, respiratory syncytial virus (RSV) and Streptococcus pneumoniae in adults hospitalised for chronic or acute cardiopulmonary illness in the UK [24]. Three strategies were included in the analysis: POC tests, PCR and conventional laboratory diagnostic assessment. The clinical characteristics of the various diagnostic strategies were compared, and the cost effectiveness was assessed using a bivariate model, mainly incorporating costs and QALYs from the trial. The authors performed a Bayesian analysis that used 50,000 replications of a Markov chain Monte Carlo analysis. The probability of cost effectiveness was calculated at the willingness-to-pay threshold of £20,000/QALY. The authors concluded that there were no major differences between the strategies, in either costs or QALYs, but that the PCR-based strategy was the most likely to be cost effective.

Two studies assessed the cost effectiveness of diagnosing community-acquired pneumonia. Böhmer et al. assessed a computer algorithm that aided the prescription of antibiotics in hospital settings compared with usual care [30]. Two groups of 15 patients were followed in a single hospital in Germany and the clinical outcomes considered were days with symptoms, days with antibiotics and hospital length of stay. Differences between trial arms were calculated using t tests. The algorithm was considered to be cost saving, resulting in lower costs and better treatment [30]. Dinh et al. [28] assessed the cost effectiveness of a rapid pneumococcal antigen test for patients with community-acquired pneumonia in the emergency department (ED) setting. Over 3 years, 1224 patients were included; however, no control arm was used. In total, there were 51 positive test results, which led to a change in prescription for seven patients. The authors concluded that the costs of implementing the antigen test (€8748 annually) were too high compared with the benefits [28].

Six trial-based analyses considered specific TB strategies: different culture-based methods in Kenya [35]; Xpert testing in South Africa in laboratories [34] or at POC [29, 31]; automated microscopy in the South African laboratory setting [33]; and a second Xpert test in China [32]. Patient-level clinical and cost data were collected. Cost-effectiveness outcomes were mostly influenced by the number of patients correctly diagnosed, screened or treated [29, 31,32,33,34,35]; one study included TB morbidity measured with a numerical TB score [31]. Differences between the various groups were compared using standard frequentist methods [29, 32], logistic regression [29], univariate analyses [31, 33, 35] and Monte Carlo analyses [31, 33]. Two studies presented a cost-effectiveness frontier to compare the different strategies included in the analysis [32, 33]. The two studies assessing POC Xpert testing in South Africa reached different conclusions compared with current care: cost saving [29] or cost effective (albeit more expensive) [31], while the study in the South African laboratory setting concluded that the higher costs were not matched by the improvement in TB diagnostic efficacy [34]. Wang et al. concluded that the price increase related to performing a second Xpert test is relatively high [32]. Four studies questioned the affordability of the assessed TB diagnostic strategies in low-income countries [32,33,34,35], even if the result of the analysis was that the assessed strategy was cost effective.

There were some common gaps in the reporting quality of the trial-based and regression analyses, mostly as a result of the relatively short time horizons: 20% of trial-based studies reported the time horizon and 30% reported the discount rate. Forty percent of the articles reported the measurement of effectiveness and 20% reported the choice of model. An example of a paper with high reporting quality is that by Oppong et al. [25]. Resource use, clinical outcomes and statistical methods were clearly described, and the regression analysis provided detailed insight into the parameters relevant for the model.

3.2 Decision Trees

Decision tree models are used to calculate the costs and effectiveness outcomes of different clinical interventions, usually over a limited time period, as time cannot be modelled explicitly [37]. A combination of decisions and the probabilities of subsequent events is used to calculate the outcomes for various cohorts in the model.

A total of 46 articles compared diagnostic techniques using a decision tree model. Thirty-two articles focused on the use of diagnostic tests that identify bacterial infections (11 TB-specific tests, seven CRP, four PCT, six GABHS infection and four used other diagnostic techniques). Twelve articles focused on the use of diagnostic tests to detect influenza (FLU OIA, QuickVue and ZstatFlu). Finally, two articles compared tests for both bacterial and viral infection.

The main diagnostic test to detect TB was the Xpert test [38,39,40,41,42,43,44,45], with other articles assessing the lateral flow urine lipoarabinomannan (LF-LAM) assay Alere Determine™ test [46], the IGRA (Interferon-Gamma Release Assay) test [47] and the T-cell detection test [48]. The target population consisted of patients presenting with symptoms of active pulmonary TB disease [38,39,40,41,42,43, 47] and, in some cases, specifically patients with human immunodeficiency virus [44,45,46, 48]. The comparators were sputum smear microscopy [38,39,40,41, 45], culture [42,43,44] and chest radiography [41, 43, 45]. In the case of testing positive, the treatment was a routine TB regimen. Time horizons considered were 6 months [45], 2 months [48] or lifetime [46, 47]. The clinical outcomes used were QALYs [40, 42, 47, 48], DALYs [41, 46], TB cases detected [38, 39, 46, 47], days free from disease [44] and deaths averted [45]. One analysis included multi-drug resistance in the model by switching to a second-line treatment if rifampicin resistance was identified [41]. One study concluded that the delay in diagnosing active TB caused the test strategy not to be cost effective [48]. Other studies showed that the combination of Xpert with LF-LAM was cost effective compared with usual care [40, 42, 46] and that testing remained the most cost-effective strategy while varying the incidence of TB (even when as low as 0.2%) [47]. Uncertainty was included using deterministic [38,39,40,41,42,43,44,45, 47] and probabilistic sensitivity analyses [39,40,41,42,43, 45,46,47,48].

Seven articles assessed a CRP test with a decision tree [49,50,51,52,53,54,55]. Patients were adults in hospital care [49, 50] or children in the ED [51], both with suspected influenza symptoms; adults with symptoms of acute respiratory tract infection attending primary care [52, 53, 55]; and children visiting the paediatric ED of a hospital with meningeal signs [54]. The most common treatment included in the models was amoxicillin, in the case of a positive CRP result [49,50,51,52,53,54]. In the case of a negative result, antiviral therapy was prescribed [49,50,51,52,53]. Other strategies compared were treatment without testing [50, 53], no treatment [50] and intensified communication after targeted training for general practitioners (GPs) [52]. Different time horizons were used: 28 days [52, 53], 15 years [54] and lifetime [50, 51]. Nelson et al. [51] followed patients from the ED through the rest of their lives, with a life expectancy of 78.7 years. Most analyses used QALYs as the clinical outcome [49,50,51, 53, 54] and one analysis used the number of antibiotic prescriptions safely saved [52]. One analysis [53] included the cost of antibiotic resistance by increasing the price of each prescription by the estimated cost of AMR, based on the annual cost of resistance in the USA. Most articles concluded that CRP testing was cost effective, except for one [49]. Notably, this study estimated that performing the test was only profitable with an influenza prevalence of under 2.5%, whereas prevalence was higher during the study period [49]. Uncertainty was included using deterministic sensitivity analyses [49,50,51, 53, 54], probabilistic sensitivity analyses [49,50,51, 53] and CEACs [49,50,51, 53].

Procalcitonin testing was compared to usual care in four studies [56,57,58,59]. Patients were adults and children with suspected acute respiratory infections presenting in primary care [56], the intensive care unit [57] or the hospital in general [58, 59]. Antibiotic treatment was prescribed to all who tested positive. In the usual care arm, empirical antibiotics were prescribed, as judged by the physician. The time horizon was short in all cases: 30 days [58, 59] or one episode of acute respiratory infection [56, 57]. Avoided antibiotic use was the clinical outcome in three articles, expressed as the number of prescriptions saved [56] or as a reduction in the days of treatment [57, 58]. Additionally, QALYs [56] and the number of infections averted [57] were used. All articles included AMR: three studies assumed that the value of a safely avoided antibiotic prescription equalled the health system cost of resistant infections attributable to that prescription [56, 58, 59]. In the other article, the reduction in resistant infections was calculated by multiplying the correlation between the reduction in days of antibiotics and the rate of resistance (estimated in previous publications [60]) by the difference in days of antibiotics between the PCT and usual care groups [57]. Three articles concluded that PCT was cost effective [56, 58, 59]. Uncertainty was included using a sensitivity analysis graph [58, 59], CEAC [56] and tornado diagram [57].
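The two ways of including AMR described above can be sketched in Python as follows. The proportional allocation of an annual resistance cost over annual prescriptions and the linear relationship between antibiotic days and resistant infections are assumptions made for this illustration; the function names and all values are hypothetical and are not taken from the cited studies.

```python
def amr_cost_per_prescription(annual_amr_cost, annual_prescriptions):
    """Allocate a yearly health-system cost of resistance evenly over prescriptions.

    Assumes every prescription contributes equally to resistance.
    """
    if annual_prescriptions <= 0:
        raise ValueError("annual_prescriptions must be positive")
    return annual_amr_cost / annual_prescriptions


def prescription_cost_with_amr(drug_cost, amr_cost):
    """Price of one prescription including its attributed resistance cost."""
    return drug_cost + amr_cost


def resistant_infections_averted(slope_per_day, days_usual_care, days_pct_guided):
    """Reduction in resistant infections under an assumed linear relationship.

    slope_per_day: assumed change in resistant infections per antibiotic day.
    """
    return slope_per_day * (days_usual_care - days_pct_guided)


# Hypothetical values
amr_cost = amr_cost_per_prescription(1_000_000.0, 100_000)
print(prescription_cost_with_amr(5.0, amr_cost))
print(resistant_infections_averted(0.01, 10.0, 7.0))
```

Under either approach, the result is sensitive to the assumed attribution, so varying these inputs in a sensitivity analysis would be a natural complement.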

Six decision-tree studies included a marker to detect GABHS infection [61,62,63,64,65,66]. The population consisted of children and adults presenting to primary care [61,62,63] or hospitals [64] with symptoms of pharyngitis or, in other cases, with a sore throat [65, 66]. The strategy of performing a rapid test was compared with treating all [61,62,63,64,65], treating none [61, 64], using a clinical scoring measure to determine the treatment (triage the diagnosis and treat those with a high score with antibiotics) [61,62,63] or culture [64]. If the diagnostic test was positive, penicillin was prescribed [64,65,66] or two types of antibiotics depending on the severity of the infection. Clinical outcomes were expressed as quality-adjusted life-days [61, 64], proportion of patients cured without complications [62, 63] and rate of appropriate use of an antibiotic per patient treated [66]. The use of the rapid diagnostic test to detect GABHS was cost effective [61,62,63,64, 66] or cost saving [65]. Uncertainty was included using deterministic analyses [63, 65, 66], a tornado diagram [61, 63, 64] or a two-way sensitivity analysis graph [62].

Three articles [67,68,69] used other diagnostic techniques to detect pneumonia: plain chest X-ray and blood count [67], different cultures (BinaxNOW-SP and urinary antigen test) as an add-on to standard cultures [69], bronchoalveolar lavage, mini-bronchoalveolar lavage, or endotracheal tube or bronchoscopy [68]. The included populations varied: Bertrán et al. [67] considered a population of patients with community-acquired pneumonia, under age 65 years, without hospital admission criteria, and patients with acute exacerbation of chronic bronchitis due to a respiratory infection. Ost et al. [68] modelled a hypothetical cohort of immunocompetent patients in the intensive care unit, intubated for 7 days, with evidence of ventilator-associated pneumonia. The target population in Xie et al. [69] was hospitalised patients with community-acquired pneumonia. The three articles used a healthcare centre’s perspective, included uncertainty using a deterministic sensitivity analysis and concluded that testing for pneumonia was cost effective [67,68,69].

In another article, acute sinusitis was identified by an ultrasound or radiographic evaluation [70]. The target population consisted of patients presenting with acute sinusitis in primary care. The analysis took a healthcare centre’s perspective, with a time horizon of 7 days. The clinical outcome was the probability of being cured. The authors suggested that the most rational clinical response to a suspected case of sinusitis was first “to wait and see for one week”, with a “selective strategy by means of structured clinical assessment” as the next step. Uncertainty was included with a deterministic sensitivity analysis and a two-way sensitivity analysis graph.

In 12 articles, a diagnostic test that detects the presence of viruses was studied in a decision tree. The study populations were adults [71,72,73,74,75,76,77], children [78,79,80] and the elderly [81, 82] with influenza-like illness during the influenza season [73,74,75, 77, 80, 82], and, in two papers, only patients without prior influenza vaccination were included [77, 78]. The diagnostic test to detect influenza was included in all decision trees and was compared to clinical judgement [71, 81], empirical treatment [72, 75,76,77, 80], treating none [72,73,74, 78, 79], targeted or universal rapid influenza testing [75], culture [76], as well as other rapid diagnostic tests such as FLU OIA, QuickVue and ZstatFlu [77, 80, 82] or other tests [73]. In the case of a positive influenza test, the treatment was oseltamivir [73,74,75, 77, 78, 80,81,82]. Amoxicillin [72, 75,76,77, 80] and/or oseltamivir [71,72,73,74,75, 78,79,80,81,82] were included in the empirical treatment strategies. The time horizon was 1 year [76, 79] or a single episode of influenza-like illness [72, 73, 75, 77]. The most commonly used clinical outcome was QALYs [73, 74, 76,77,78,79,80,81,82]; others were quality-adjusted life expectancy [77, 80, 82], days free from disease [71, 73] and antibiotic prescriptions saved [75]. Two decision trees considered resistance: oseltamivir resistance was included as a percentage of H1N1 strains [78] and, following the assumption of Michaelidis et al. [56], one analysis assumed a one-to-one linear relationship between antibiotic use and antibiotic resistance to determine the impact of antibiotics used in treating influenza-like illness on overall societal antibiotic resistance [75]. Four analyses were unfavourable to routine testing: two showed that rapid testing for influenza was not cost effective during the influenza season [79, 80], one that testing was only a cost-effective approach early in a pandemic [74] and one that, in unvaccinated patients, antiviral therapy without testing was economically more reasonable [73]. Other authors concluded that the optimal strategy depended on the patient’s vaccination status and the risk of hospitalisation [82]. One study found that testing saved QALYs by reducing the rates of subsequent hospitalisation for influenza and mortality [81]. Uncertainty was included using deterministic [73,74,75,76,77,78,79,80,81,82] and probabilistic sensitivity analyses [73, 75, 77, 80, 82].

Two articles included diagnostic strategies for both viral and bacterial infections [83, 84]. The populations were adults with suspected infections [83] and children with symptoms of acute bronchiolitis [84], both in a hospital setting. In one article, three testing strategies were compared: a comprehensive strategy in which all available diagnostic tests were requested simultaneously; a stepwise strategy in which a limited number of diagnostic tests could be requested, prioritising those for the most prevalent diseases; and a minimalist strategy in which a limited number of diagnostic tests could be requested, prioritising those with the highest sensitivity and specificity [83]. In the case of a positive result, the treatment was 5 days of ceftriaxone [83]. The stepwise strategy was the most cost effective in terms of cost per correct diagnosis. Uncertainty was included using a deterministic analysis. In the other article, several diagnostic techniques were available (blood count, CRP, PCT, chest X-ray and respiratory virus detection tests) and a clinician could order any of these based on clinical suspicion and in adherence to clinical practice guidelines [84]. This formed one branch of the tree (named good practice), which was compared with another branch (named lack of good practice) in which no test was performed or, if tests were performed, they were not the appropriate tests given the patient’s symptoms. In the case of a positive test result, the treatment was antibiotics. Good practice in the diagnosis and management of patients with bronchiolitis was associated with both fewer patients readmitted within 10 days post-discharge and lower costs. Uncertainty was included using a deterministic analysis and a tornado diagram.

Frequent gaps in the reporting of decision tree models are also related to the relatively short time horizons, as is the case with trial-based analyses. The time horizon was reported by 65% of papers and the discount rate was reported by 24%. Sixty-eight percent of the articles reported the measurement and valuation of preference-based outcomes, and 43% the currency, price date and conversion. Michaelidis et al. provide a high-ranking decision tree model, based on the CHEERS score [56]. The authors described an easy-to-understand decision tree to model PCT-guided antibiotic therapy for outpatient respiratory-tract infections over a short time horizon (one episode). Extensive sensitivity analyses showed the uncertainty associated with several model assumptions.

3.3 Static Markov Models

A static Markov model presents a set of mutually exclusive and exhaustive states that describe the progression of a disease for a cohort of patients. In contrast to decision trees, Markov models can be used to incorporate time in the health-economic model [85].
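The mechanics of such a cohort model can be sketched in a few lines of Python. The example below uses two hypothetical states (healthy and respiratory tract infection), a fixed transition matrix, and per-cycle costs and utilities; the function name, parameter values and the end-of-cycle accounting convention are assumptions for illustration rather than a reproduction of any reviewed model.

```python
def run_markov_cohort(initial, transitions, costs, utilities,
                      n_cycles, cycle_years, discount_rate=0.0):
    """Run a static cohort Markov model.

    initial: share of the cohort in each state at the start.
    transitions[i][j]: per-cycle probability of moving from state i to state j.
    costs / utilities: cost per cycle and utility weight for each state.
    Returns (final distribution, discounted cost, discounted QALYs),
    counting outcomes at the end of each cycle.
    """
    n_states = len(initial)
    for row in transitions:
        if len(row) != n_states or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row needs one probability per state and must sum to 1")
    dist = list(initial)
    total_cost = 0.0
    total_qalys = 0.0
    for cycle in range(1, n_cycles + 1):
        # Move the cohort one cycle forward
        dist = [sum(dist[i] * transitions[i][j] for i in range(n_states))
                for j in range(n_states)]
        discount = 1.0 / (1.0 + discount_rate) ** (cycle * cycle_years)
        total_cost += discount * sum(p * c for p, c in zip(dist, costs))
        total_qalys += discount * cycle_years * sum(p * u for p, u in zip(dist, utilities))
    return dist, total_cost, total_qalys


# Hypothetical two-state model with 28-day cycles: [healthy, infection]
final_dist, cost, qalys = run_markov_cohort(
    initial=[1.0, 0.0],
    transitions=[[0.9, 0.1], [0.5, 0.5]],
    costs=[0.0, 100.0],
    utilities=[1.0, 0.8],
    n_cycles=2,
    cycle_years=28 / 365,
)
print(final_dist, cost, qalys)
```

Running the same function once per diagnostic strategy, with strategy-specific transition probabilities or costs, would yield the incremental costs and QALYs needed for a cost-effectiveness comparison.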

Two studies used a static Markov model in combination with a decision tree to model the diagnosis and treatment of respiratory tract infection in the community setting [86, 87]. Balk et al. [86] considered acute bacterial sinusitis in the USA and compared four strategies: not prescribing antibiotics, empirical amoxicillin treatment, amoxicillin treatment based on a set of clinical criteria and amoxicillin treatment based on an X-ray. A combination of a decision tree and a Markov model was used to model a 14-day period: the decision tree was used to model the index consultation, including any tests performed and the treatment decision, while the Markov model was used to model disease development in daily cycles. The Markov model incorporated disease complications, antibiotic side effects and symptom improvements. Antimicrobial resistance was considered by reducing the efficacy of the antibiotic compared with placebo over the 14-day period. The prevalence of sinusitis was varied using deterministic sensitivity analyses. Balk et al. concluded that prescribing antibiotics based on clinical criteria was cost effective for settings where most patients experience mild or moderate symptoms; however, they also concluded that empirical antibiotics were cost effective if a sufficient proportion of the population experienced severe symptoms [86].

Hunter considered the implementation of POC CRP tests in the UK, comparing three strategies with the current standard of care, in which a GP decides whether to prescribe antibiotics based on the GP’s views and the patient’s expectations [87]. The strategies considered were: a CRP test performed by the GP; a CRP test performed by a practice nurse; and a CRP test performed by the GP in combination with communication training for the GP [87]. A combination of a decision tree and Markov model was used, with a time horizon of 3 years using 28-day cycles after the index consultation [87]. The decision tree was used to model the consultation and direct follow-up (up to 28 days), while the recurrence of respiratory tract infections following the initial disease episode was modelled using the Markov model with two states: healthy and respiratory tract infection. The prevalence was not varied in the model. The model was probabilistic and a cost-effectiveness plane and CEACs were reported. Hunter concluded that CRP implementation was cost saving, although the strategy that included communication training was not cost effective [87]. The reporting quality of the study by Hunter was high.
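As an illustration of the two-state structure described above (healthy and respiratory tract infection, 28-day cycles, 3-year horizon), a minimal cohort sketch could look as follows. It is not the model used by Hunter [87]: the transition probabilities, the cost per cycle and the utility values are hypothetical placeholders, and discounting is omitted for brevity.

```python
def run_two_state_markov(p_infect, p_recover, n_cycles,
                         cost_rti_cycle, utility_healthy, utility_rti,
                         cycle_length_days=28):
    """Cohort trace for a two-state (healthy / RTI) Markov model.

    All inputs are illustrative assumptions, not values from any
    of the reviewed studies.
    """
    healthy, rti = 1.0, 0.0                  # whole cohort starts healthy
    cycle_years = cycle_length_days / 365.0
    total_cost = 0.0
    total_qalys = 0.0
    trace = [(healthy, rti)]
    for _ in range(n_cycles):
        # Accrue costs and QALYs for the time spent in each state
        total_cost += rti * cost_rti_cycle
        total_qalys += (healthy * utility_healthy
                        + rti * utility_rti) * cycle_years
        # Move the cohort between the mutually exclusive states
        healthy, rti = (healthy * (1 - p_infect) + rti * p_recover,
                        healthy * p_infect + rti * (1 - p_recover))
        trace.append((healthy, rti))
    return trace, total_cost, total_qalys


# Number of whole 28-day cycles in a 3-year horizon
N_CYCLES = (3 * 365) // 28

# Hypothetical inputs, for illustration only
trace, cost, qalys = run_two_state_markov(
    p_infect=0.05, p_recover=0.9, n_cycles=N_CYCLES,
    cost_rti_cycle=40.0, utility_healthy=1.0, utility_rti=0.8)
```

Because the states are mutually exclusive and exhaustive, the proportions in each row of the trace sum to one; this is a useful sanity check when more states are added.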

3.4 Dynamic Models

Dynamic models are characterised by a changing rate of infection within the population, usually based on the number of infected individuals [88]. Dynamic models can be individual based or cohort based.

One paper considered influenza: Nshimyumukiza et al. compared an influenza rapid diagnostic test followed by antiviral treatment with empirical antiviral treatment in Quebec, Canada, using an individual-based dynamic model [89]. The model consisted of two parts: a “susceptible, infected, recovered” (SIR) model and an economic model, considering a time horizon of 1 year. The compartmental SIR model consisted of three differential equations to model three states using single-day cycles. The economic analytical model was used to simulate infected persons who could decide to seek care if they were symptomatic. Patients who sought care within 48 hours received oseltamivir, reducing the probability of complications such as pneumonia and death. Two outcomes were reported, the costs and life-years saved per 100,000 person-years, and uncertainty was included using deterministic and probabilistic sensitivity analyses. The authors concluded that the testing strategy was dominant (fewer deaths and lower costs) compared with empirical antiviral treatment [89].
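The three-equation SIR structure with single-day steps can be sketched as below. This is a generic textbook SIR model solved with a simple one-day Euler step, not the implementation of Nshimyumukiza et al. [89]; the transmission rate `beta`, the recovery rate `gamma` and the population size are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days):
    """Daily-step SIR model (Euler integration with a 1-day step).

    beta:  transmission rate per day (assumed value)
    gamma: recovery rate per day (assumed value)
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        # The force of infection depends on the current number infected,
        # which is what makes the model dynamic
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters for a population of 100,000 over 1 year
history = simulate_sir(beta=0.3, gamma=0.1,
                       s0=99_990, i0=10, r0=0, days=365)
peak_infected = max(i for _, i, _ in history)
```

An economic model can then be attached to the daily number of new infections, for example by applying care-seeking, testing and treatment probabilities to each day's incident cases.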

Six studies assessed TB diagnostic strategies using a dynamic model [90,91,92,93,94,95], of which five assessed one or more Xpert-based strategies compared with other interventions (e.g. standard care) [91,92,93,94,95], one assessed a public-private mixed programme for TB diagnosis [92] and one a national TB strain typing service [90]. Countries included were various African countries [91, 93, 95], India [92, 94] and the UK [90]. The transmission models incorporated stages such as uninfected/susceptible, latent infection and active infection, in four cases including TB resistance [91, 92, 94, 95] or human immunodeficiency virus status [91, 93, 95]. Included were individual-level models [92, 94] (agent-based modelling) and compartmental models [90, 91, 93, 95]. Langley et al. also incorporated a discrete-event simulation to model the patient (presumptive TB cases, visiting a diagnostic centre) and sputum pathways (samples flowing from the diagnostic centre to the laboratory and undergoing various diagnostic tests) [95]. Clinical outcomes considered were DALYs [91, 93,94,95], QALYs [90, 92] and life-years [91], with all models considering a time horizon of 10 years or more. Five models incorporated a probabilistic sensitivity analysis (PSA) [91,92,93,94,95], four included a CEAC [91, 92, 94, 95], all incorporated a deterministic sensitivity analysis [90,91,92,93,94,95] and two a cost-effectiveness frontier [92, 95]. The conclusions of the papers vary and are dependent on various factors, such as the affordability [91, 92], uncertainty of the input parameters [93] or procedural factors (e.g. number of referrals and cost sharing) [94]. Langley et al. identified three cost-effective strategies, including Xpert testing and two microscopy-based strategies for Tanzania [95]. Mears et al., who described universal strain typing in the UK, concluded that this was unlikely to be a cost-effective strategy [90].
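For readers less familiar with how a CEAC is derived from a PSA, the following sketch shows the general principle: for each willingness-to-pay threshold, the curve gives the share of PSA draws in which the new strategy has a positive incremental net monetary benefit. The draws and thresholds below are invented for illustration and do not come from any of the reviewed studies.

```python
def ceac(samples, thresholds):
    """Cost-effectiveness acceptability curve from PSA draws.

    samples:    list of (incremental_cost, incremental_effect) draws
    thresholds: willingness-to-pay values per unit of effect
    Returns the probability of being cost effective at each threshold.
    """
    curve = []
    for wtp in thresholds:
        # Incremental net monetary benefit: wtp * effect - cost
        n_cost_effective = sum(
            1 for d_cost, d_effect in samples
            if wtp * d_effect - d_cost > 0)
        curve.append(n_cost_effective / len(samples))
    return curve


# Hypothetical PSA draws: (incremental cost, incremental QALYs)
psa_draws = [(100.0, 0.010), (-50.0, 0.005),
             (200.0, 0.002), (300.0, -0.001)]
wtp_thresholds = [0, 20_000, 150_000]
probabilities = ceac(psa_draws, wtp_thresholds)
```

In practice a PSA would use thousands of draws sampled from the input distributions; four fixed draws are used here only to keep the example easy to verify by hand.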

Because of the complexity of dynamic models, they are often accompanied by extensive supplementary material containing the details of the performed analysis. Areas for improvement regarding the reporting quality are currency conversion methods and details regarding the valuation of QALYs or DALYs. Other than that, there were no major differences between the reporting quality of the various papers.

3.5 Other Models

Lee et al. [96] assessed the cost effectiveness of POC testing for TB, including rifampicin resistance, with a new PCR test for India’s public sector (Truenat). The compared strategies were smear microscopy, Xpert and the Truenat test in designated microscopy centres and Truenat at POC. A microsimulation model was used to model a cohort of patients seeking care with TB symptoms, over a lifetime horizon. Tuberculosis prevalence was based on a previous implementation study. The cost-effectiveness measure was costs per life-year saved; a budget impact analysis was also performed, using time horizons of 2 and 5 years. Uncertainty was included using one-way and two-way deterministic sensitivity analyses, varying Truenat sensitivity and linkage to care, and a tornado diagram, varying key model parameters. The authors concluded that implementing Truenat at POC was cost effective [96].

Two CMAs used accounting data to study optimal healthcare resource use. Bogdanova et al. compared two diagnostic algorithms for TB in Russia: a culture-based diagnostic algorithm and line probe assay to detect resistant TB, using costs collected from the government’s accounting systems [97]. Reductions in the number of hospital days to the correct diagnosis and to treatment initiation were considered clinical outcomes. Oostenbrink et al. [98] used hospital data to assess cost savings related to the implementation of a decision rule to diagnose and treat children with meningeal signs visiting a Dutch ED. The considered outcomes were safely avoided lumbar punctures and empirical antibiotic treatment. Both studies estimated and compared the resulting costs of each algorithm. One-way sensitivity analyses were included by changing the cost parameters of the model, and both studies concluded that the investigated strategy was cost saving compared with current care [97, 98].

4 Discussion

In this study, we reviewed 70 economic analyses of applied diagnostic techniques for infectious diseases of the respiratory tract, covering a broad range of illnesses for which individuals seek care including influenza, pneumonia, TB and GABHS. The diagnostic techniques assessed range from POC to laboratory testing in numerous different country settings.

4.1 Advantages and Disadvantages of Different Modelling Methods

Twelve studies assessed the cost effectiveness of a new diagnostic strategy within the context of a single trial. Seven studies were performed in high-income countries (HICs) [24,25,26,27,28, 30, 32] and five in low- and middle-income countries (LMICs) [29, 31, 33,34,35]. The scope of trials of diagnostics is sometimes rather limited. Of the trials included in this review, some had only a few patients [30], only one trial arm [28] or a limited scope (e.g. the cost effectiveness was only assessed from the perspective of the laboratory) [33, 34]. Most trial-based analyses resorted to outcomes related to the direct performance of the diagnostics [28, 31,32,33,34,35]. The generalisability of the studies was affected by all these factors, and the results may not be applicable outside the direct setting where the analysis was performed [28,29,30, 34]. These and other aspects make it difficult to assess the effects of the diagnostic method beyond the trial; no trial-based analysis reported a time horizon longer than a couple of weeks. In addition, only a few studies used generic clinical outcomes that can be compared between various studies and disease areas, such as QALYs [24,25,26].

The type of model most frequently found in the review was a decision tree (46 of 70 articles). One of the reported advantages of using a decision tree analysis is that the technique enables a large number of strategies to be compared and even combined sequentially, most often by using a clinical scoring system to identify patients to be tested [61,62,63, 66]. A decision tree can easily be adapted to different health systems and settings. Twenty-nine studies were performed in HICs [39, 42, 47, 48, 50,51,52,53,54, 56, 57, 59, 61,62,63, 65,66,67,68,69,70, 72,73,74, 76,77,78, 80, 82] and 17 in LMICs [38, 40, 41, 43,44,45,46, 49, 55, 58, 64, 71, 75, 79, 81, 83, 84]. Although authors usually focus on one age group, one decision tree can also be applied to more than one group, as evidenced by some articles, which included both paediatric and adult populations in their analysis [58, 63, 65, 71]. Decision trees are straightforward to model and interpret for researchers and clinicians who may not be familiar with pharmacoeconomic methods. The (computational) simplicity of decision trees also makes it feasible to include several sensitivity analyses, such as calculating the cost effectiveness under various disease incidence values [49,50,51,52,53,54,55]. A disadvantage of using decision trees can be that long-term outcomes are difficult to include, as time is not modelled explicitly. Therefore, many of these studies incorporate only a short time horizon or do not detail a time horizon at all. Yet, some overcame this disadvantage by estimating the life expectancy and applying that as the time horizon [50, 51] or extended a decision tree with a Markov model to be able to model time explicitly [86, 87]. The main disadvantage found when using a decision tree was that several simplifying assumptions are needed [41, 53, 76], which makes it difficult to generalise the results. The testing strategy was cost effective in 32 articles and cost saving in six.

Factors that most affected the results were the prevalence of infection and the patient’s vaccination status. First, below and above certain prevalence percentages, the testing strategy is no longer cost effective, either because there is not a sufficient number of cases or because empiric treatment dominates the other strategies [39, 44, 49,50,51, 62, 64, 69, 71, 76, 78, 83]. Furthermore, the patient’s vaccination status is a key aspect that affected the results, as the vaccination status influences the probability of a patient having the disease of interest: the cost effectiveness of testing can be reduced in vaccinated populations [54, 72, 73, 78,79,80, 82]. The test parameters affected the results of the analysis, and the physician’s judgement was also found to be influential.
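The dependence on prevalence can be made concrete with a deliberately simplified two-branch decision tree. This sketch is not taken from any reviewed study: test accuracy and costs are hypothetical, the only effect measure is the share of patients who are true cases and receive treatment, and harms of overtreatment (including AMR) are ignored, so empirical treatment can never be less effective than testing here.

```python
def evaluate_strategies(prevalence, sensitivity, specificity,
                        cost_test, cost_treatment):
    """Expected cost and treated true cases per patient.

    Compares 'treat_all' (empirical treatment) with 'test_and_treat'
    (treat only test-positive patients). All inputs are assumptions.
    """
    p_positive = (prevalence * sensitivity
                  + (1 - prevalence) * (1 - specificity))
    return {
        "treat_all": {
            "cost": cost_treatment,
            "treated_cases": prevalence,
        },
        "test_and_treat": {
            "cost": cost_test + p_positive * cost_treatment,
            "treated_cases": prevalence * sensitivity,
        },
    }


def empiric_dominates(prevalence, **params):
    """True if treating everyone is no more costly and no less effective."""
    s = evaluate_strategies(prevalence, **params)
    return (s["treat_all"]["cost"] <= s["test_and_treat"]["cost"]
            and s["treat_all"]["treated_cases"]
            >= s["test_and_treat"]["treated_cases"])


def cost_per_treated_case_with_testing(prevalence, **params):
    """Cost of the testing strategy per true case treated (vs doing nothing)."""
    s = evaluate_strategies(prevalence, **params)["test_and_treat"]
    return s["cost"] / s["treated_cases"]


# Hypothetical inputs, for illustration only
PARAMS = dict(sensitivity=0.9, specificity=0.9,
              cost_test=5.0, cost_treatment=20.0)
low = evaluate_strategies(0.1, **PARAMS)
high = evaluate_strategies(0.9, **PARAMS)
```

With these assumed inputs, testing is cheaper than treating everyone at a prevalence of 10%, whereas at 90% empirical treatment dominates. At very low prevalence the cost per treated case rises steeply because the test cost is spread over few true cases, which corresponds to the lower threshold described above.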

Seven studies used a dynamic model to assess the use of a novel diagnostic strategy [89,90,91,92,93,94,95]. These models included transmission of influenza [89] or TB [90,91,92,93,94,95], providing more flexibility compared with most other model types included in this review. For example, the TB models included time horizons with a range from 10 years to a lifetime [90,91,92,93,94,95], and included either QALYs or DALYs [90,91,92,93,94,95]. Two studies were performed in HICs [89, 90] and five in LMICs [91,92,93,94,95]. Dynamic models require more data than more straightforward models and this was mentioned as a disadvantage by some of the included studies [89, 91, 93, 94]. The authors of two papers mentioned that the time to treatment was an important aspect that was not modelled in their analysis [90, 93]. Langley et al. [95] found a solution to this problem, as the authors modelled not only TB transmission, but also the operational process of transporting test samples to external diagnostic laboratories for different types of microscopy and Xpert in Tanzania. Using a discrete-event simulation, the authors modelled the time to start the correct treatment and loss to follow-up [95]. Lee et al. did not include transmission in their TB microsimulation; instead, they used incidence data from a previous study. In this model, continuity of care after a visit to a health centre was improved when diagnosing at POC compared with external laboratories, resulting in better patient outcomes [96].

Three models were CMAs [65, 97, 98] with a main focus on financial outcomes and not on clinical outcomes, which we believe to be a feasible approach when the new diagnostic strategy is at least as effective as current care. As the data requirements of the CMAs included in this review (unit-level accounting data or micro-costing) are greater than those of most other reviewed studies, this type of study may be less reproducible and more difficult to interpret for clinicians.

4.2 Inclusion of AMR

Twenty articles included resistance in some way [26, 32,33,34, 41, 53, 56,57,58,59, 68, 75, 78, 86, 91, 92, 94,95,96,97]. We identified three main methods for including AMR, which were used by more than one paper. Seven models incorporated AMR by adding a ‘societal cost’ to any antibiotic prescription [26, 53, 56, 58, 58, 68], which was based on a paper by Oppong et al. [99]. Six models used a fixed percentage of resistant infections, in some cases varied in sensitivity analyses [32,33,34, 41, 78, 95]. Five studies modelled resistance dynamically, by changing the resistance rate based on the consumption of antibiotics [57, 91, 92], by modelling both a sensitive and resistant strain [94], or by decreasing the efficacy of antibiotics in the future [86]. Clearly, the consequences of increasing AMR can be incorporated in the numerator or denominator of the incremental cost-effectiveness ratio (i.e. included as a cost or a clinical effect). Still, most studies did not include AMR in the analysis at all, even though this is highly relevant for patients, care providers and policy makers [10].
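The two routes for including AMR in the incremental cost-effectiveness ratio can be sketched as follows. The figures are hypothetical and chosen only to show the mechanics; in particular, the societal cost per prescription is an assumed placeholder and not the value reported by Oppong et al. [99].

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio (cost per unit of effect)."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def incremental_cost_with_amr(delta_cost, delta_prescriptions,
                              amr_cost_per_prescription):
    """Numerator route: add a societal AMR cost to each prescription."""
    return delta_cost + delta_prescriptions * amr_cost_per_prescription


# Hypothetical per-patient increments of a testing strategy vs current care
delta_cost = 10.0            # extra cost of testing
delta_qalys = 0.001          # QALYs gained
delta_prescriptions = -0.2   # fewer antibiotic prescriptions per patient
amr_cost = 5.0               # assumed societal cost per prescription

icer_without_amr = icer(delta_cost, delta_qalys)

# AMR in the numerator: avoided prescriptions reduce the incremental cost
icer_amr_as_cost = icer(
    incremental_cost_with_amr(delta_cost, delta_prescriptions, amr_cost),
    delta_qalys)

# AMR in the denominator: avoided resistance adds an assumed health gain
assumed_qalys_from_avoided_amr = 0.0002
icer_amr_as_effect = icer(delta_cost,
                          delta_qalys + assumed_qalys_from_avoided_amr)
```

Under these assumptions both routes lower the ratio relative to the analysis that ignores AMR, which illustrates why omitting AMR may understate the value of diagnostics that reduce antibiotic use.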

4.3 Limitations

This review focused on diagnostic strategies, as defined in the methods. The distinction between a diagnostic test and tests used for other purposes, such as screening, disease monitoring or (pharmaco-)genetic tests, can be difficult to make in certain cases. In many cases, the authors do not clearly specify the population to which the test is applied. We therefore tried to clearly define what we consider to be a diagnostic and made the distinction in the full-text screening round, so that we could decide from the methods, often the model specification, whether the paper should be included. We included five languages in this review, which limited the papers we included (seven papers were excluded based on language). Additionally, we only included papers from the year 2000 onwards, which allowed us to focus on current health-economic methods and diagnostic strategies.

Because of the many different diagnostic strategies that are included in this review, as well as the large number of healthcare systems (over 30 countries), we have not included any comparison of cost-effectiveness results. The quality of reporting was assessed using the CHEERS checklist as described above.

4.4 Opportunities for Further Research

Considering the number of articles included in this review, there is great interest in the cost effectiveness of diagnostics for respiratory infections. However, there are some gaps where further research could be warranted. For example, for POC diagnostics in general practice, we found no study on the cost effectiveness of any test for many HICs. Considering the significant reductions in antibiotic consumption linked to these tests [26], this may be an important opportunity to contain increasing AMR. Even greater reductions in inappropriate antibiotic prescribing may be possible in LMICs [100], but few studies considered the cost effectiveness of diagnostics for respiratory infections other than TB [58, 64, 71, 75, 79, 83, 84].

In the hospital setting, multiplex PCR systems may increasingly play a role in quickly testing for a range of viruses and bacteria, which can provide valuable insight into the local transmission of respiratory pathogens [101]. Only two papers included an analysis of multiplex PCR; both papers considered patient-specific outcomes, but not the broader value of knowing the epidemiology of respiratory tract infections in the community [24, 51]. Although multiplex PCR systems may be major investments for hospitals, the collected data on the aetiology of respiratory-tract infections and AMR could be used to inform prescribing decisions by GPs in the community setting as well.

This review shows that many different methodological approaches are used in the literature to assess the cost effectiveness of diagnostics for respiratory tract infections. While 33 studies used generalisable outcomes such as QALYs [24,25,26, 40, 42, 47, 48, 50, 51, 53, 56, 61, 64, 74, 76,77,78,79,80,81,82, 86, 87, 90, 92, 98] or DALYs [41, 46, 91, 93,94,95], ten studies used outcomes related to the accuracy of the diagnostic test [31,32,33,34,35, 38, 39, 43, 69, 83] (such as the percentage of correct diagnoses) and eight used an outcome related to the prescribed treatment (such as the number of antibiotics saved) [28, 52, 57, 58, 62, 66, 67, 75]. Thus, many studies did not incorporate a generally comparable clinical outcome in their cost-effectiveness analysis. The same variation was seen in the time horizons used: durations varied, and various studies did not report a time horizon at all [27,28,29, 31,32,33,34,35, 38,39,40,41,42,43,44, 48, 61,62,63,64,65,66, 74, 79, 83, 97, 98] or used only a limited time horizon of less than 1 year [24,25,26, 30, 45, 52, 53, 56,57,58,59, 67,68,69,70, 72, 73, 75, 86]. Generalisable outcomes and sufficiently long time horizons are regarded as important principles when performing health-economic analyses [102, 103], and we identify them as important areas for improvement for economic analyses of diagnostics for respiratory tract infections.

4.5 Policy Implications

None of the included articles assessed the diagnosis of a coronavirus. Because of the major economic impact of COVID-19, we expect any testing strategy to be worthwhile here if it means that economies can function normally again. The recent public attention on rapid COVID-19 tests and the knowledge that respiratory tract infections can be diagnosed rather precisely may result in a permanent change in treatment practice. There may be a shift for doctors, who have experienced the value of POC testing during the COVID-19 pandemic, but also a shift for patients, who may demand to be informed regarding the cause of their respiratory complaints.

As the IVDR will come into effect soon, diagnostic companies will need to prove the clinical effectiveness of new products entering the European market [17]. Diagnostic test accuracy alone will not be sufficient to obtain market authorisation and diagnostic companies will need to monitor patient outcomes associated with their tests over a longer term. We recommend that these companies include quality-of-life measurements in their trials, so that QALYs or DALYs can be calculated at a later stage if they want to draw any conclusions with regard to cost effectiveness. Additionally, provided that sufficient clinical outcomes are recorded, standard pharmacoeconomic methods can be used to extrapolate the trial results so that longer time horizons can be included. The increased availability of clinical data on the performance of diagnostics after the introduction of the IVDR will present decision makers with a more evidence-based method of assessing diagnostics [17]. We expect HTA will play an increasingly important role here, as it has with pharmaceuticals [15]. Many European countries have developed guidelines for economic analyses, which are most often tailored to and used for pharmaceuticals [15]. Health-economic guidelines for diagnostics are not as well developed, an issue that has previously been raised by Garfield et al., who looked at the assessment of molecular diagnostics from the perspective of various HTA agencies [104]. We recommend that decision makers consider applying the pharmacoeconomic guidelines to diagnostics and adapt these guidelines if needed.

5 Conclusions

This review shows that the methods used to assess the cost effectiveness of respiratory tract infection diagnostics vary greatly. The main points for improvement in this field are the application of generalisable outcomes and the extrapolation of results beyond the time horizon of the trial.