Background

The evaluation of the role and properties of diagnostic tools has become a priority for global health policy and decision-making, driven mainly by the development of new technologies for well-known diseases and the emergence of new deleterious conditions affecting large-scale populations [1, 2]. Evidence on the accuracy of a test for detecting a target condition of interest can be appraised using systematic approaches following standardized methodologies [3]. Briefly, diagnostic studies focus on estimating the ability of the index tool to identify subjects with or without the condition of interest [3]; evidence synthesis then requires two quantities, test sensitivity and specificity, and the correlation between them [4]. The statistical approach used depends on whether the aim is to estimate accuracy at a common threshold (i.e. an average operating point) or an expected curve across many thresholds (i.e. a summary ROC curve) [5,6,7]; either way, fitting the complex hierarchical models involved has traditionally required commercial software packages with the necessary analytical capabilities.

We recently found that the statistical synthesis of accuracy data was among the methods most frequently omitted during the development of rapid reviews of diagnostic tests [8]; this omission is a potential bottleneck for the widespread evaluation of diagnostic tools [8]. For several years, the Meta-DiSc software has been one of the most widely used statistical programs for the meta-analysis of diagnostic data, with more than 1300 citations in peer-reviewed scientific articles [9]. It is a freely available, easy-to-use tool that enables reviewers to apply statistical methods for the meta-analysis of diagnostic test accuracy (DTA) within an evidence synthesis framework. The software implemented the statistical methods recommended at the time of its development, including the linear model proposed by Littenberg and Moses and the univariate I-squared index to quantify heterogeneity. Hierarchical models are now the method of choice, overcoming the limitations of those earlier statistical approaches [3]. These methodological developments prompted us to update Meta-DiSc to include current statistical methods for the meta-analysis of test accuracy systematic reviews and an enhanced web-based interface to improve the user experience. Our objective was to develop a new version of the Meta-DiSc software as a web application (app) that summarizes DTA results using statistical methods based on hierarchical models.

Implementation

We have developed a web-based app using R Shiny software. Shiny can be used to build interactive R-based applications directly in RStudio, the integrated development environment for R. The application has been deployed on the shinyapps.io platform.

Estimating pooled diagnostic accuracy indices

The app performs the statistical analysis of DTA reviews by fitting a bivariate random effects model [5] with the glmer function of the lme4 package [10] for generalized linear mixed effects models. The model yields summary points (average sensitivity and specificity) and the parameters needed to depict the sROC curve. Positive and negative likelihood ratio (LR) and diagnostic odds ratio (DOR) estimates are obtained from the model parameters, and the delta method, as implemented in the msm package [11], is used to compute the standard errors of these derived estimates. Forest plots and ROC plots are implemented using the functionalities of the meta, ggplot2 and plotly packages [12,13,14].
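To illustrate how the derived indices follow from the bivariate model parameters, the computation can be sketched as below. This is an illustrative Python sketch, not the app's own R code; the function name and inputs are ours, with `mu_se` and `mu_sp` denoting the estimated mean logit sensitivity and specificity and `cov` their estimated 2 × 2 covariance matrix:

```python
import numpy as np

def derived_indices(mu_se, mu_sp, cov):
    """Back-transform bivariate model parameters into pooled accuracy
    indices. Since log(DOR) = logit(Se) + logit(Sp), the delta method
    gives its standard error directly from the covariance matrix."""
    se = 1 / (1 + np.exp(-mu_se))   # pooled sensitivity
    sp = 1 / (1 + np.exp(-mu_sp))   # pooled specificity
    lr_pos = se / (1 - sp)          # positive likelihood ratio
    lr_neg = (1 - se) / sp          # negative likelihood ratio
    dor = lr_pos / lr_neg           # diagnostic odds ratio
    # delta method: var(log DOR) = var(mu_se) + var(mu_sp) + 2*cov
    se_log_dor = np.sqrt(cov[0, 0] + cov[1, 1] + 2 * cov[0, 1])
    return se, sp, lr_pos, lr_neg, dor, se_log_dor
```

A 95% CI for the DOR is then obtained by exponentiating log(DOR) ± 1.96 × se_log_dor.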

The program also offers the possibility of using a univariate random effects model. Although separate pooling is not recommended for DTA meta-analysis since it fails to account for the correlation between sensitivity and specificity, we have included this option because univariate models, in some instances, have a role in DTA reviews. This is the case, for example, when it is difficult to estimate all parameters of a bivariate model or when the focus of the analysis is only on one of the accuracy indices (i.e. sensitivity or specificity) [15].

Quantifying heterogeneity

Meta-DiSc 2.0 implements a thorough analysis of heterogeneity. In addition to the estimates of logit variances of sensitivity and specificity [16], the software calculates a bivariate I-squared index [17], the area of the 95% prediction ellipse using the polyarea function of the pracma package [18], and finally, the median odds ratios for sensitivity and specificity [16].
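For illustration, the two less familiar heterogeneity measures can be computed directly from the estimated variance components. The following Python sketch is ours, not the app's R code, and assumes `tau2` is a between-study logit variance and `sigma` the 2 × 2 between-study covariance matrix of logit sensitivity and specificity:

```python
import numpy as np
from scipy import stats

def median_odds_ratio(tau2):
    """Median odds ratio for a between-study logit variance tau^2:
    MOR = exp(sqrt(2 * tau^2) * Phi^{-1}(0.75))."""
    return np.exp(np.sqrt(2 * tau2) * stats.norm.ppf(0.75))

def prediction_ellipse_area(sigma, level=0.95):
    """Area of the prediction ellipse of a bivariate normal on the
    logit plane: pi * chi2_quantile(level, df=2) * sqrt(det(sigma))."""
    c = stats.chi2.ppf(level, df=2)
    return np.pi * c * np.sqrt(np.linalg.det(sigma))
```

A MOR of 1 indicates homogeneity; larger values indicate greater between-study variability.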

Exploring heterogeneity: subgroup and meta-regression analyses

The app can be used to perform subgroup and meta-regression analyses. For this purpose, additional columns need to be included in the dataset to define dichotomous covariates (one at a time), which are used to split the dataset and obtain accuracy estimates for each subgroup. Exploring these individual results gives the reviewer insights into the between-group differences in sensitivity and specificity and the between-study variances in both indices. The meta-regression option compares the accuracy estimates (i.e., sensitivity and specificity) obtained for these subgroups [19]. The bivariate model includes interaction terms for both sensitivity and specificity, and the statistical significance of these effects is tested using the lmtest package [20]. For simplicity, the meta-regression analysis implemented in Meta-DiSc 2.0 assumes that between-study variances are equal across subgroups. Authors should therefore check how appropriate this assumption is by comparing the between-study variances in each subgroup.
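The comparison amounts to a likelihood ratio test between nested bivariate models, fitted with and without the covariate interaction terms. A minimal Python sketch of the generic test (the function name is ours; the log-likelihoods themselves would come from the fitted models):

```python
from scipy import stats

def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Compare a model without the covariate (null) against one with
    interaction terms on logit sensitivity and logit specificity
    (alternative); df is the number of extra parameters, typically 2."""
    lr_stat = 2 * (loglik_alt - loglik_null)
    p_value = stats.chi2.sf(lr_stat, df)
    return lr_stat, p_value
```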

Results

Meta-DiSc 2.0 is freely available from www.metadisc.es. The user interface is intuitive and easy to use. The left lateral panel organizes the workspace into four main menus: File upload, Graphical description, Meta-analysis, and Summary of findings. The app also includes a short user-guide video showing the practical use of the application.

File upload menu

The app can import data as either comma-delimited (.csv) or Excel (.xlsx) files. The file must include the 2 × 2 table data of the individual studies in four columns named TP, FP, FN and TN, representing the numbers of true positives, false positives, false negatives and true negatives, respectively. The file must also include a unique identifier for each study (ID). It may also incorporate additional columns, which are treated as covariates to explore sources of between-study variability (Fig. 1). Figures 2, 3, 4 and 5 show different app screens for the analysis of a published diagnostic accuracy systematic review on pulse oximetry screening for critical congenital heart defects [21].
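For illustration, the expected layout can be mimicked with a small hypothetical dataset (the study names and the `timing` covariate below are invented), and the per-study indices recomputed in a few lines of Python:

```python
import csv
import io

# Hypothetical file in the expected layout: ID, TP, FP, FN, TN,
# plus an optional covariate column ("timing") for subgroup analyses.
csv_text = """ID,TP,FP,FN,TN,timing
Study1,20,5,4,120,within24h
Study2,15,8,6,200,after24h
Study3,30,2,10,150,within24h
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
for r in rows:
    tp, fp, fn, tn = (int(r[k]) for k in ("TP", "FP", "FN", "TN"))
    r["sens"] = tp / (tp + fn)   # per-study sensitivity
    r["spec"] = tn / (tn + fp)   # per-study specificity
```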

Fig. 1

File upload menu

Fig. 2

Bivariate analysis including summary statistics (A), model coefficients (B) and heterogeneity measures (C) (Meta-analysis menu)

Fig. 3

SROC curves in subgroup analysis (Meta-analysis menu)

Fig. 4

Meta-regression results (Meta-analysis menu)

Fig. 5

Summary of findings menu

Graphical description menu

The app generates forest plots of the sensitivity and specificity of individual studies to evaluate heterogeneity graphically. Studies in the forest plots are ordered as in the uploaded file. The ROC plot represents individual sensitivities and specificities and offers the option of adding error bars as either horizontal or vertical lines. These graphical descriptions can be presented by subgroups defined by the covariates included in the file. All figures are downloadable in .png and .svg formats.

Meta-analysis menu

All analyses are obtained from the meta-analysis menu. The first option of this menu is to fit the bivariate model of sensitivity and specificity. Results are shown in the corresponding tabs: i) statistics, ii) sROC curve, iii) subgroup analysis, and iv) sensitivity analysis.

In the statistics tab, users will find the pooled accuracy estimates (sensitivity and specificity, positive and negative likelihood ratios, diagnostic odds ratio and false-positive rate) along with their corresponding 95% CIs (Fig. 2A). Additionally, the app provides the model parameter estimates (logit sensitivity, logit specificity, standard errors, logit variances, covariance and correlation), which can be easily transferred to the Cochrane Review Manager system (RevMan [22]) (Fig. 2B). Finally, the app shows the heterogeneity statistics, including the variances of the logit sensitivity and specificity along with the corresponding median odds ratios (MOR) [16], the bivariate I-squared [17], and the area of the 95% prediction ellipse [16] (Fig. 2C).

After visualizing these numerical results, users can obtain graphical summary results by moving to the next tab named sROC curve (Fig. 3), where the ROC plane graphic can be visualized and downloaded. Different display options can be selected or omitted, e.g., summary point, confidence and prediction ellipses, summary ROC curve, and individual study results.

The subgroup analysis tab fits a new bivariate model, including additional parameters to assess whether sensitivity and specificity differ between subgroups. After showing the coefficients of the estimated model, a formal comparison between subgroups can be made using the meta-regression tab. The app shows the relative sensitivity and specificity along with 95% confidence intervals (LCI and UCI) and p-values of likelihood ratio tests to compare the subgroups formed according to the selected covariate (Fig. 4).

A final sensitivity analysis tab can be used to restrict the analysis to certain specific studies, by simply selecting the level of the dummy variable that will be employed as the inclusion criterion from the dropdown menu.

If two independent univariate analyses of sensitivity and specificity are selected, the results of both random effects models are displayed in a series of screens showing pooled estimates, heterogeneity statistics, and forest plots.

Summary of findings menu

To describe the absolute impact of a diagnostic test in a population with a given prevalence and a hypothetical sample size, the app calculates the expected numbers of true-positive, false-positive, false-negative and true-negative test results [23]. Users can download a figure that shows these outcomes (TP, FP, FN, TN) (Fig. 5).
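The underlying arithmetic is straightforward, as the following Python sketch shows (the function name is ours, and the prevalence and sample size in the example are arbitrary values chosen for illustration):

```python
def summary_of_findings(sens, spec, prevalence, n=1000):
    """Expected test outcomes per n tested subjects, given pooled
    sensitivity and specificity and an assumed prevalence."""
    diseased = prevalence * n
    healthy = n - diseased
    tp = sens * diseased   # true positives
    fn = diseased - tp     # false negatives (missed cases)
    tn = spec * healthy    # true negatives
    fp = healthy - tn      # false positives
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# e.g., a test with 80% sensitivity and 90% specificity applied to
# 1000 subjects at 10% prevalence
outcomes = summary_of_findings(0.80, 0.90, 0.10)
```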

As a worked example, we used a dataset from a systematic review that evaluated the diagnostic accuracy of pulse oximetry as a screening method for detecting critical congenital heart defects (CCHD) in asymptomatic newborn infants [21]. The published meta-analysis included nineteen studies and was performed using the METADAS macro for SAS, which uses Proc NLMIXED [24]. To assess potential sources of heterogeneity, we performed subgroup analyses and meta-regression. The overall sensitivity of pulse oximetry for the detection of CCHD was 76.3% (95% CI 69.5 to 82.0), while specificity was 99.9% (95% CI 99.7 to 99.9). We measured total between-study variability in sensitivity and specificity through the variances of the random effects for logit(sensitivity) and logit(specificity), and their covariance. We also provided 95% confidence and prediction ellipses.

We replicated the published analysis using Meta-DiSc 2.0, extending the heterogeneity description to include the area of the 95% prediction ellipse [16], the median odds ratios for sensitivity and specificity [16] and the bivariate I-squared [17] (Fig. 2C). We also replicated the subgroup analysis for the covariate "test of timing" (within 24 h of birth vs after 24 h from birth) (Fig. 3). Summary estimates of sensitivity and specificity for studies that performed screening after 24 h were 73.6% (95% CI 62.8 to 82.1) and 99.9% (95% CI 99.9 to 100). For studies that performed screening within 24 h, summary estimates of sensitivity and specificity were 79.5% (95% CI 70.0 to 86.6) and 99.6% (95% CI 99.1 to 99.8). The relative specificity for the detection of CCHD was significantly higher when newborn pulse oximetry was performed more than 24 h after birth (Fig. 4). Validation of the analyses using Meta-DiSc 2.0 produced the same results as those obtained with the METADAS macro [24]. A comparison of the numerical results obtained with Meta-DiSc 2.0 and those obtained with other software (METADAS in SAS [24], METANDI in Stata [25] and MetaDTA [26]) is shown in Table 1. We further evaluated the app by replicating the analyses of four systematic reviews published in the literature [27,28,29,30] (Table 1).

Table 1 Meta-analysis results using Meta-DiSc 2.0, METANDI (Stata), METADAS (SAS) and MetaDTA statistical software

Discussion

Our goal was to update a previous version of the Meta-DiSc software [9]. After this update, we are confident that Meta-DiSc 2.0 stands alongside the other available DTA meta-analysis software. The application unifies the main standard routines for diagnostic accuracy meta-analysis and spares reviewers from having to choose among the variety of R packages available for this purpose, since not all of them implement the currently recommended methods for DTA meta-analysis. Additionally, novice reviewers using Meta-DiSc 2.0 can avoid the steep learning curve associated with R. Another Shiny web application, MetaDTA [26], developed by the United Kingdom National Institute for Health Research (NIHR) Complex Review Support Unit in 2019, is available to conduct DTA meta-analyses. Meta-DiSc 2.0 has an advantage over MetaDTA in its capacity to perform meta-regression analyses and to calculate additional measures to quantify heterogeneity.

The app has several limitations. The meta-regression analysis implemented is based on the assumption of equal variances for the random effects of the logit sensitivities and the logit specificities of the compared subgroups. This assumption may be reasonable in many situations, although it may not be in some reviews. It is worth noting that the bivariate I-squared statistic depends on sample size. For this reason, the comparison of I-squared values among meta-analyses with a different number of studies and a different number of diseased and non-diseased participants is limited.

The app does not allow comparing the accuracy of two diagnostic tests, and the current version does not incorporate the risk of bias assessment using the QUADAS-2 tool [31].

The development of this web application was led by the Clinical Biostatistics Unit of the Ramón y Cajal Research Institute (IRYCIS), a unit that has broad experience in diagnostic test synthesis research focused on supporting informed decision-making in the healthcare area. This constitutes a collaborative project for knowledge transfer between IRYCIS and the Complutense University of Madrid and is supported by an intramural project funded by the Ramón y Cajal Research Institute ("Rapid diagnostic reviews for decision-making in healthcare: analysis of critical points and software development", 2018). This project has also been funded by Instituto de Salud Carlos III through the project "PI19/00481" (Co-funded by European Regional Development Fund/European Social Fund; “A way to make Europe”/"Investing in your future"). The Biomedical Research Networking Center in Epidemiology and Public Health (CIBERESP) funds the subscription to the shinyapps.io platform where the app is hosted.

Conclusion

We developed an updated version of Meta-DiSc for performing diagnostic test accuracy meta-analyses. All computational algorithms have been validated against other statistical tools and published meta-analyses.