Background

In its 2015 report "Improving Diagnosis in Health Care", the National Academy of Medicine identified a better understanding of the performance of diagnostic tests as an urgent priority for patient safety [1]. Systematic reviews, which synthesize findings from multiple primary studies, can increase confidence in our understanding of the accuracy of diagnostic tests in detecting medical conditions or diseases [2]. Systematic reviews and meta-analyses are cited more often than any other study design and are prioritized in clinical practice guidelines [3,4,5]. Consistent with this, the number of systematic reviews, including those on diagnostic test accuracy (DTA), has grown rapidly over the past decade [6, 7].

When systematic reviews and meta-analyses are poorly reported, readers cannot assess the quality of the review and its underlying primary studies or weigh the applicability of its conclusions. Incomplete or inaccurate reports that do not transparently convey review methods and results may therefore mislead readers rather than clarify the true value of a test. This wastes scarce medical research resources [8, 9] and hinders efforts to ensure the reproducibility of research. Previous studies have shown that many published DTA systematic reviews are inadequately reported [10, 11].

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement is a 27-item checklist and flow diagram that provides guidance on complete and transparent reporting of systematic reviews [12]. Use of reporting guidelines such as PRISMA is associated with more informative reporting of medical research [10]. PRISMA was developed primarily for systematic reviews of medical interventions. While DTA systematic reviews share common elements with intervention reviews, there are important differences: some items in the original PRISMA checklist may not apply to DTA reviews, and items essential to reporting DTA systematic reviews may be missing [2, 6, 13, 14]. Existing guidance for reporting DTA systematic reviews is limited to non-systematic "expert opinion" [2, 15, 16], guidance on specific methodologic items [6, 17], or work that is not yet complete [18].

The PRISMA-DTA group is developing an extension for DTA systematic reviews and meta-analyses. As the initial step, we performed a systematic review of existing guidance on reporting of DTA systematic reviews in order to compile a list of potential items that might be included in a reporting guideline for such reviews, the PRISMA extension for DTA (PRISMA-DTA).

Methods

The protocol for this review is available on the EQUATOR network’s website (http://www.equator-network.org/) in “guidelines under development” [19].

Database search

To identify published articles pertaining to reporting of DTA systematic reviews, an experienced medical information specialist (BS) developed a search strategy through an iterative process in consultation with the review team. Prior to execution, the strategy was peer-reviewed by another senior information specialist using the PRESS checklist [20]. Using the Ovid platform, we searched Ovid MEDLINE®, Ovid MEDLINE® In-Process & Other Non-Indexed Citations, and Embase Classic+Embase on May 5, 2016. On the same date, we also searched the Cochrane Methodology Register (Wiley version) in the Cochrane Library, which contains records published in July 2012 and earlier. Strategies used a combination of controlled vocabulary (e.g., “Diagnostic Tests, Routine,” “Review Literature as Topic,” “Publication Bias”) and keywords (e.g., “DTA,” “systematic review,” “reporting”). Vocabulary and syntax were adjusted across databases. No date or language restrictions were applied to any of the searches. Specific details of the search strategies appear in Appendix 1.
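To make the structure of such a strategy concrete, the sketch below assembles an Ovid-style query from controlled-vocabulary and keyword blocks. The term lists and grouping are illustrative only; they are not the strategy in Appendix 1.

```python
# Illustrative only: builds an Ovid-style search string by OR-ing
# synonyms within each concept and AND-ing the concepts together.
# The terms below are examples, not the peer-reviewed strategy.

concept_blocks = {
    "diagnostic accuracy": ['exp "Diagnostic Tests, Routine"/',
                            '"diagnostic test accuracy".tw.',
                            'DTA.tw.'],
    "systematic reviews":  ['"Review Literature as Topic"/',
                            '"systematic review".tw.',
                            'meta-analy*.tw.'],
    "reporting":           ['"Publication Bias"/',
                            'reporting.tw.'],
}

def build_query(blocks):
    # OR together synonyms within a concept, then AND the concepts.
    ored = ["(" + " OR ".join(terms) + ")" for terms in blocks.values()]
    return " AND ".join(ored)

print(build_query(concept_blocks))
```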

Inclusion/exclusion criteria, study selection, and data extraction

We included articles in full-text or abstract form that reported on any aspect of reporting DTA systematic reviews. Specifically, we included studies that evaluated the quality of reporting of any aspect of DTA systematic reviews and studies that provided guidance or suggestions as to how a DTA systematic review should be performed.

Titles and abstracts of all search results were screened independently for potential relevance by two investigators (MA, MDFM). For any citation deemed potentially relevant, the full text was retrieved and assessed independently, in duplicate, for inclusion, with disagreements resolved by consensus (TAM, MDFM). To facilitate extraction, studies were divided into categories pertaining to specific reporting topics: assessment of quality of reporting, general guidance on performing or reporting DTA systematic reviews, guidance on search methods for primary DTA studies, assessment of heterogeneity, pooling and meta-analysis methods, assessment of publication bias, risk of bias, and “other.” A reference list of included sources is provided in Appendix 2.
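As a schematic of this dual-review step (the records and decisions below are hypothetical), disagreements between independent screeners can be flagged for consensus discussion:

```python
# Hypothetical illustration of duplicate screening: records on which
# the two independent reviewers disagree are flagged for consensus.

reviewer_1 = {"rec001": "include", "rec002": "exclude", "rec003": "include"}
reviewer_2 = {"rec001": "include", "rec002": "include", "rec003": "include"}

agreed   = {r: d for r, d in reviewer_1.items() if reviewer_2[r] == d}
conflict = [r for r in reviewer_1 if reviewer_2[r] != reviewer_1[r]]

print(f"Agreed: {agreed}")
print(f"Needs consensus discussion: {conflict}")  # -> ['rec002']
```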

In addition to sources related to DTA systematic reviews, we reviewed the following: reporting guideline organizations’ websites (Enhancing the QUAlity and Transparency of Health Research (EQUATOR) [21]); guidance for reporting systematic reviews and meta-analyses of other types of research (Meta-analysis of Observational Studies in Epidemiology (MOOSE) [22], PRISMA [12], and PRISMA extensions [23,24,25,26,27]); guidance for reporting diagnostic test accuracy studies (STARD 2015 [28] and STARD for abstracts); guidance for, or tools for assessing, the methodologic quality of systematic reviews and meta-analyses (A Measurement Tool to Assess Systematic Reviews (AMSTAR) [29], risk of bias in systematic reviews (ROBIS) [30], and Methodological Expectations of Cochrane Intervention Reviews (MECIR) [31]); and The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (completed chapters) [18]. The following sources, not captured by the initial search, were assessed post hoc: the Agency for Healthcare Research and Quality (AHRQ) Methods Guide for Comparative Effectiveness Research, the Institute of Medicine’s 2011 Standards for Systematic Reviews, and the Centre for Reviews and Dissemination guidance [32,33,34]. No additional items were generated from these sources.

The PRISMA and STARD 2015 checklists were first assessed independently and in duplicate to compile a list of potentially relevant items for the PRISMA-DTA statement; any item deemed possibly relevant to DTA systematic reviews by either investigator was included. Next, all other guidance documents (reporting checklists, The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, etc.) and the full texts of potentially relevant records were similarly assessed in duplicate for additional items (TAM, MDFM), again including any item deemed possibly relevant by either investigator. Included items may have had their wording changed from the original source to make them more applicable to systematic reviews of diagnostic test accuracy and/or been broken into multiple sub-items to facilitate the Delphi process for PRISMA-DTA. All included items were used to generate a comprehensive summary of existing guidance on reporting of DTA systematic reviews.

Results

Database search

The database search yielded 6967 results. After title and abstract screening, 386 results remained. This was further reduced to 203 results after full-text screening (Fig. 1).

Fig. 1 Study flow diagram

Identification of potentially relevant items

After searching the existing literature and guidance documents, a preliminary list of 64 unique items was compiled and divided into the following categories, mirroring the PRISMA statement: title (three items), introduction (two items), methods (35 items), results (13 items), discussion (nine items), and disclosure (two items). The methods section was further divided into eligibility criteria and search strategy (10 items), study selection and data extraction (seven items), primary study data items that should be provided (one item containing 10 sub-items), risk of bias and heterogeneity (six items), and summary measures and statistics (11 items). The identified items, along with citations for the sources from which they were taken, are presented in Table 1; bolded items in the table are specific to diagnostic accuracy systematic reviews, while the remaining items represent more general guidance for systematic reviews.

Table 1 Potentially relevant items for the PRISMA-DTA checklist. Items deemed by the authors to apply specifically to DTA reviews are in bold

Items were taken from 19 unique sources with publication dates between 2007 and 2016, a combination of guidance documents and some of the 203 search results. The 19 sources included the PRISMA statement [12], the PRISMA Explanation and Elaboration document [35], STARD 2015 [28], MECIR [31], AMSTAR [36], QUADAS-2 [14], eight research articles [6, 17, 37,38,39,40,41,42], two reviews [2, 43], two DTA statistical methodology overviews [44, 45], and one conference abstract [46]. Many of the 203 included results contained redundant information; one source was cited per item.

Summary of rationale for relevant items

This section highlights some of the proposed items that are particularly relevant to DTA systematic reviews.

Title: The potential items listed in this section aim to clearly identify “big picture” components of study design; this not only allows immediate reader comprehension but also enhances indexing and searchability. Items 1 and 2 are drawn from PRISMA and STARD 2015 and require that the title identify the study as a systematic review (item 1) and as a study of diagnostic accuracy (item 2). Item 3 requires reporting whether the study design is comparative (one test vs. another) or non-comparative; comparative designs are increasingly important and common, and are associated with distinct methodologic challenges [37].

Introduction: Item 4 requires framing the role of the index test within the existing clinical pathway; understanding the clinical role of a test is essential to the generalizability of findings. For example, if a test evaluation focuses on a “triage” test (e.g., d-dimer to determine pre-test probability prior to CT pulmonary angiography), it may not be appropriate to generalize its use as a “replacement” test (e.g., d-dimer as a replacement for CT). The performance of a diagnostic test varies with the specific clinical scenario [28, 47].
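To illustrate why the clinical role matters, a standard relation (not drawn from the cited sources) links a test’s sensitivity (Se) and specificity (Sp) to post-test probability through the pre-test probability of the setting in which it is used:

```latex
% Positive and negative likelihood ratios, and Bayes' theorem in odds
% form: the same test shifts probability differently depending on the
% pre-test probability of the clinical setting.
\[
\mathrm{LR}^{+} = \frac{Se}{1 - Sp}, \qquad
\mathrm{LR}^{-} = \frac{1 - Se}{Sp},
\]
\[
\frac{p_{\text{post}}}{1 - p_{\text{post}}}
  = \frac{p_{\text{pre}}}{1 - p_{\text{pre}}} \times \mathrm{LR}.
\]
```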

Methods—protocol, eligibility, and search: All items in this section are generalizable to all systematic reviews; none were deemed to be specific to DTA systematic reviews.

Methods—study selection and data collection: Multiple items in this section focus on specific details of the search strategy and aim to enhance reproducibility. None is specific to DTA reviews; however, detail beyond that recommended by PRISMA is listed here because subsequent methodologic recommendations for systematic reviews have suggested its inclusion [31].

Methods—primary study data items: Item 25 focuses on which characteristics from primary studies included in a review should be reported. Several aspects of this item are unique to DTA systematic reviews, such as index test, reference standard, target condition definition, test positivity thresholds, and clinical setting. All this information is vital for readers to make an appropriate assessment of the review.

Methods—risk of bias and heterogeneity: Assessment of study quality and heterogeneity is not unique to DTA reviews. However, quality assessment for diagnostic accuracy studies covers both risk of bias and concerns regarding applicability; the quality assessment tool used in DTA reviews should therefore capture and report these issues (item 24) [14]. Additionally, because sensitivity and specificity are correlated, univariate measures of heterogeneity, such as I², are typically not appropriate for DTA reviews. Heterogeneity may instead be reported either qualitatively or using measures that account for the correlation between sensitivity and specificity (item 28) [2].
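As a minimal sketch of one approach that accounts for this correlation (the between-study layer of the widely used bivariate random-effects model; the notation is illustrative, not taken from the sources above):

```latex
% Between-study layer of the bivariate random-effects model:
% study-specific logit-sensitivity and logit-specificity are modeled
% jointly, so their correlation enters via the off-diagonal term.
\[
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},\;
\begin{pmatrix} \sigma_{Se}^{2} & \sigma_{Se,Sp} \\ \sigma_{Se,Sp} & \sigma_{Sp}^{2} \end{pmatrix}
\right),
\]
where \(Se_i\) and \(Sp_i\) denote the true sensitivity and specificity in study \(i\).
```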

Methods—summary statistics: Multiple readers may interpret an index test; how this is accounted for statistically may affect the results and should therefore be reported (item 33) [17]. An important difference between DTA meta-analyses and intervention meta-analyses is the correlation between sensitivity and specificity. It is therefore very important to report the statistical model used for meta-analysis, such as the bivariate model sketched above, so readers can determine the impact of these methods on the results (item 34) [6].

Results: To facilitate reproduction of analyses and to make clear to readers which data were meta-analyzed, the 2 × 2 data for each study included in a meta-analysis should be made available (item 46) [43, 45].
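For example, with hypothetical counts TP = 90, FP = 20, FN = 10, and TN = 180, the 2 × 2 data determine the study-level estimates directly:

```latex
% Hypothetical 2x2 counts: TP = 90, FP = 20, FN = 10, TN = 180.
\[
Se = \frac{TP}{TP + FN} = \frac{90}{90 + 10} = 0.90, \qquad
Sp = \frac{TN}{TN + FP} = \frac{180}{180 + 20} = 0.90.
\]
```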

Discussion and disclosure: All items in this section are generalizable to all systematic reviews; none was deemed to be specific to DTA systematic reviews.

Discussion

We consulted existing guidance on the reporting of systematic reviews and the published literature on the conduct and reporting of DTA systematic reviews to identify 64 potential items for reporting DTA systematic reviews. This systematic, comprehensive search, categorized by manuscript section, builds on prior work based on non-systematic searches and expert opinion. The identified items will form the basis of a Delphi process to generate the PRISMA-DTA checklist; items have been broken down into single concepts or descriptors to facilitate that process. Suggestions from the PRISMA-DTA group will be incorporated during the Delphi process, so some items may not appear on the final PRISMA-DTA checklist, group members may propose additional items, and the wording of items presented here may be adjusted at the PRISMA-DTA consensus meeting. Readers are therefore advised to consult the final published checklist when reporting systematic reviews of diagnostic test accuracy.

This work is a small but essential step towards a clear reporting guideline for DTA systematic reviews. Future work should include not only creating the PRISMA-DTA checklist but also evaluating “baseline” adherence to PRISMA-DTA, to guide knowledge translation interventions aimed at targeted improvements in the reporting of DTA systematic reviews.

Strengths and limitations

This systematic review benefits from a comprehensive, expert, peer-reviewed search; duplicate extraction; and categorization of potentially relevant items by manuscript section, mirroring the format of the PRISMA checklist. Its limitations are that we did not formally assess the quality of the sources of included items, we provide only a qualitative summary, and we may not have identified potentially relevant items from work yet to be published. We believe many of these shortcomings will be addressed during generation of the PRISMA-DTA checklist, as outlined in our complete study protocol [48].

Conclusions

The reporting of DTA systematic reviews is often incomplete [10, 11, 49], and incomplete reporting has been identified as a preventable source of waste in biomedical research [43]. A reporting guideline specific to DTA systematic reviews is therefore needed to reduce waste and to increase the utility and reproducibility of these reviews. This systematic review is the first step towards gathering the relevant evidence on reporting of DTA systematic reviews, a critical step in the EQUATOR Network’s established guidance for reporting guideline development [50]. This information will serve as the substrate for a PRISMA-DTA extension to guide reporting of DTA systematic reviews, complementing the more than 300 reporting guidelines indexed by the EQUATOR Network [21].