Chapter 2: Medical Tests Guidance (2) Developing the Topic and Structuring Systematic Reviews of Medical Tests: Utility of PICOTS, Analytic Frameworks, Decision Trees, and Other Frameworks
- Cite this article as:
- Samson, D. & Schoelles, K.M. J GEN INTERN MED (2012) 27(Suppl 1): 11. doi:10.1007/s11606-012-2007-7
Topic development and structuring a systematic review of diagnostic tests are complementary processes. The goals of a medical test review are to identify and synthesize evidence to evaluate the impacts of alternative testing strategies on health outcomes and to promote informed decisionmaking. A common challenge is that the request for a review may state the claim for the test ambiguously. Because medical tests affect clinical outcomes only indirectly, reviewers need to identify which intermediate outcomes link a medical test to improved clinical outcomes. In this paper, we propose the use of five principles to deal with these challenges: the PICOTS typology (patient population, intervention, comparator, outcomes, timing, setting), analytic frameworks, simple decision trees, other organizing frameworks, and rules for when diagnostic accuracy is sufficient.
KEY WORDS: diagnosis; review literature as topic; decision trees
“[We] have the ironic situation in which important and painstakingly developed knowledge often is applied haphazardly and anecdotally. Such a situation, which is not acceptable in the basic sciences or in drug therapy, also should not be acceptable in clinical applications of diagnostic technology.”
J. Sanford (Sandy) Schwartz, Institute of Medicine, 1985 (ref. 1)
Developing the topic creates the foundation and structure of an effective systematic review. This process includes understanding and clarifying a claim about a test (how a test might be of value in practice) and establishing the key questions to guide decisionmaking related to the claim. This typically involves specifying the clinical context in which the test might be used. Clinical context includes patient characteristics, how a new test might fit into existing diagnostic pathways, technical details of the test, characteristics of clinicians or operators using the test, management options and setting. Structuring the review refers to identifying the analytic strategy that will most directly achieve the goals of the review, accounting for idiosyncrasies of the data.
Topic development and structuring of the review are complementary processes. As evidence-based practice centers (EPCs) develop and refine the topic, the structure of the review should become clearer. Moreover, success at this stage reduces the chance of major changes in the scope of the review and minimizes rework. While this paper is intended to serve as a guide for EPCs, the processes described here are relevant to other systematic reviewers and a broad spectrum of stakeholders including patients, clinicians, caretakers, researchers, funders of research, government, employers, health care payers and industry, as well as the general public. This paper highlights challenges unique to systematic reviews of medical tests. For a general discussion of these issues as they exist in all systematic reviews, we refer the reader to previously published EPC methods papers2,3. This paper is one of 12 chapters in a JGIM and AHRQ supplement that address all aspects of preparation of systematic reviews of diagnostic tests.
The ultimate goal of a medical test review is to identify and synthesize evidence that will help evaluate the impacts on health outcomes of alternative testing strategies. Two common problems can impede achieving this goal. One is that the request for a review may state the claim for the test ambiguously. For example, a claim about a new medical test for Alzheimer’s disease might fail to specify the patients who may benefit from the test. The intended use could range from screening the “worried well” who show no evidence of deficit to diagnostic testing in those with frank impairment and loss of function in daily living. Similarly, the request for a review of tests for prostate cancer might neglect to consider the role of such tests in clinical decisionmaking, such as guiding the decision to biopsy.
Because of the indirect impact of medical tests on clinical outcomes, a second problem is how to identify which intermediate outcomes link a medical test to improved clinical outcomes compared to an existing test. The scientific literature related to the claim rarely includes direct evidence, such as randomized controlled trial results, in which patients are allocated to the relevant test strategies and evaluated for downstream health outcomes. More commonly, evidence about outcomes in support of the claim relates to intermediate outcomes, such as test accuracy.
PRINCIPLES FOR ADDRESSING THE CHALLENGES
Principle 1: Engage Stakeholders Using the PICOTS Typology
In approaching topic development, reviewers should engage in a direct dialogue with the primary requestors and relevant users of the review (herein denoted “stakeholders”) to understand the objectives of the review in practical terms; in particular, investigators should understand the sorts of decisions that the review is likely to affect. This serves to bring investigators and stakeholders to a shared understanding about the essential details of the tests and their relationship to existing test strategies (i.e., replacement, triage, or add-on), range of potential clinical utility, and potential adverse consequences of testing.
Operationally, the objective of the review is reflected in the key questions, which are normally presented in a preliminary form at the outset of a review. Reviewers should examine the proposed key questions to ensure that they accurately reflect the needs of stakeholders and are likely to be answered given the available time and resources. This is a process of trying to balance the importance of the topic against the feasibility of completing the review. Including a wide variety of stakeholders (such as the U.S. Food and Drug Administration [FDA], manufacturers, technical and clinical experts, and patients) can help provide additional perspectives on the claim and use of the tests. A preliminary examination of the literature can identify existing systematic reviews and clinical practice guidelines that may summarize evidence on current strategies for using the test and its potential benefits and harms.
The PICOTS typology (Patient population, Intervention, Comparator, Outcomes, Timing, Setting), defined in the Introduction to this Medical Test Methods Guide (Chapter 1), is a typology for defining particular contextual issues, and this formalism can be useful in focusing discussions with stakeholders. Furthermore, the PICOTS typology is a vital part of systematic reviews of both interventions and tests, lending them a transparent and explicit structure and influencing search methods, study selection and data extraction.
It is important to recognize that the process of topic refinement is iterative and PICOTS elements may change as the clinical context becomes clearer. Despite the best efforts of all participants, the topic may evolve even as the review is being conducted. Investigators should consider at the outset how such a situation will be addressed.4, 5, 6
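As a rough illustration of how the PICOTS elements can be recorded and checked during discussions with stakeholders, the sketch below stores one draft specification and reports which elements are still blank. The class name `Picots`, its field names, and the example wording are assumptions made for illustration; they are not part of the Methods Guide.

```python
from dataclasses import dataclass, fields


@dataclass
class Picots:
    """One draft PICOTS specification (hypothetical structure)."""
    population: str = ""
    intervention: str = ""
    comparator: str = ""
    outcomes: str = ""
    timing: str = ""
    setting: str = ""

    def unspecified(self):
        """Return the names of the elements that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative draft loosely based on the Alzheimer's disease example;
# the wording of each element is assumed, not taken from a real review.
draft = Picots(
    population="adults with frank cognitive impairment",
    intervention="new test for Alzheimer's disease",
    outcomes="health outcomes following test-guided management",
)

print(draft.unspecified())  # ['comparator', 'timing', 'setting']
```

Because topic refinement is iterative, a list like this would simply be rechecked each time the clinical context becomes clearer.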
Principle 2: Develop an Analytic Framework
We use the term “analytic framework” (sometimes called a causal pathway) to denote a specific form of graphical representation that specifies a path from the intervention or test of interest to all important health outcomes, including intervening steps and intermediate outcomes.7 Among PICOTS elements, the target patient population, intervention and clinical outcomes are specifically shown. The intervention can actually be viewed as a test-and-treat strategy as shown in links 2 through 5. In the figure, the comparator is not shown explicitly, but is implied. Each linkage relating test, intervention, or outcome represents a potential key question and, it is hoped, a coherent body of literature.
The AHRQ EPC program has described the development and use of analytic frameworks in systematic reviews of interventions. Since the impact of tests on clinical outcomes usually depends on downstream interventions, analytic frameworks for systematic reviews of tests are particularly valuable and should be routinely included. The analytic framework is developed iteratively in consultation with stakeholders to illustrate and define the important clinical decisional dilemmas and thus serves to clarify important key questions further.2
However, systematic reviews of medical tests present unique challenges not encountered in reviews of therapeutic interventions. The analytic framework can help users to understand how the often-convoluted linkages between intermediate and clinical outcomes fit together, and to consider whether these downstream issues may be relevant to the review. Adding specific elements to the analytic framework will reflect the understanding gained about clinical context.
In summarizing evidence, studies for each linkage might vary in strength of design, limitations of conduct, and adequacy of reporting. The linkages leading from changes in patient management decisions to health outcomes are often of particular importance. The implication here is that the value of a test usually derives from its influence on some action taken in patient management. Although this is usually the case, sometimes the information alone from a test may have value independent of any action it may prompt. For example, information about prognosis that does not necessarily trigger any actions may have a meaningful psychological impact on patients and caregivers.
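One way to see how linkages map onto key questions is to treat the analytic framework as a small directed graph and list every path from the test to health outcomes. The sketch below is a minimal example under stated assumptions: the node names and the set of links are hypothetical and do not reproduce the numbering of the figure.

```python
from collections import defaultdict

# Hypothetical linkages (source, target); each one is a potential key question.
LINKS = [
    ("test", "health outcomes"),              # direct evidence, e.g. a trial of test strategies
    ("test", "test accuracy"),
    ("test accuracy", "management decision"),
    ("management decision", "health outcomes"),
    ("test", "adverse effects of testing"),
]


def build_graph(links):
    """Index the linkages by their source node."""
    graph = defaultdict(list)
    for source, target in links:
        graph[source].append(target)
    return graph


def all_paths(graph, start, goal, path=None):
    """Return every simple path from start to goal (depth-first search)."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # never revisit a node on the current path
            found.extend(all_paths(graph, nxt, goal, path))
    return found


graph = build_graph(LINKS)
for p in all_paths(graph, "test", "health outcomes"):
    print(" -> ".join(p))
```

With these assumed links, the search returns one direct path and one indirect path through accuracy and management decisions, which mirrors the distinction between direct evidence and a chain of linked evidence.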
Principle 3: Consider Using Decision Trees
An analytic framework is helpful when direct evidence is lacking, showing relevant key questions along indirect pathways between the test and important clinical outcomes. Analytic frameworks are, however, not well-suited to depicting multiple alternative uses of the particular test (or its comparators) and are limited in their ability to represent the impact of test results on clinical decisions, and the specific potential outcome consequences of altered decisions. Reviewers can use simple decision trees or flow diagrams alongside the analytic framework to illustrate details of the potential impact of test results on management decisions and outcomes. Along with PICOTS specifications and analytic frameworks, these graphical tools represent systematic reviewers’ understanding of the clinical context of the topic. Constructing decision trees may help to clarify key questions by identifying which indices of diagnostic accuracy and other statistics are relevant to the clinical problem and which range of possible pathways and outcomes (see Paper 3) practically and logically flow from a test strategy. Lord et al. describe how diagrams resembling decision trees define which steps and outcomes may differ with different test strategies, and thus the important questions to ask to compare tests according to whether the new test is a replacement, a triage, or an add-on to the existing test strategy.9
One example of the utility of decision trees comes from a review of noninvasive tests for carotid artery disease.10 In this review, investigators found that common metrics of sensitivity and specificity that counted both high-grade stenosis and complete occlusion as “positive” studies would not be reliable guides to actual test performance because the two results would be treated quite differently. This insight was subsequently incorporated into calculations of noninvasive carotid test performance.10,11 Additional examples are provided in the illustrations below. For further discussion on when to consider using decision trees, see Paper 10 in this series.
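The carotid example can be made concrete with a small calculation. The sketch below compares accuracy when occlusion and high-grade stenosis are pooled as "positive" with accuracy for high-grade stenosis alone. All counts are invented for illustration and are not data from the cited review; the function name `sens_spec` is likewise an assumption.

```python
# Hypothetical cross-classification: COUNTS[reference result][index test result].
COUNTS = {
    "occlusion":           {"occlusion": 40, "high-grade stenosis": 8,  "lesser disease": 2},
    "high-grade stenosis": {"occlusion": 6,  "high-grade stenosis": 80, "lesser disease": 14},
    "lesser disease":      {"occlusion": 1,  "high-grade stenosis": 9,  "lesser disease": 190},
}


def sens_spec(counts, positive):
    """Sensitivity and specificity when the categories in `positive` count as disease."""
    tp = fn = fp = tn = 0
    for reference, row in counts.items():
        for index, n in row.items():
            if reference in positive:
                if index in positive:
                    tp += n
                else:
                    fn += n
            else:
                if index in positive:
                    fp += n
                else:
                    tn += n
    return tp / (tp + fn), tn / (tn + fp)


pooled = sens_spec(COUNTS, {"occlusion", "high-grade stenosis"})
stenosis_only = sens_spec(COUNTS, {"high-grade stenosis"})

print("pooled positive:     sens=%.3f spec=%.3f" % pooled)
print("high-grade stenosis: sens=%.3f spec=%.3f" % stenosis_only)
```

In this made-up table the pooled definition hides the cases in which an occlusion is read as a stenosis or the reverse, even though the two results would lead to different management.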
Principle 4: Sometimes it Is Sufficient to Focus Exclusively on Accuracy Studies
Once reviewers have diagrammed the decision tree by which diagnostic accuracy may affect intermediate and clinical outcomes, it is possible to determine whether it is necessary to include key questions regarding outcomes beyond diagnostic accuracy. For example, diagnostic accuracy may be sufficient when the new test is as sensitive and as specific as the old test and the new test has advantages over the old test such as fewer adverse effects, is less invasive, is easier to use, provides results more quickly or is lower in cost. Implicit in this example is the comparability of downstream management decisions and outcomes between the test under evaluation and the comparator test. Another instance when a review may be limited to evaluation of sensitivity and specificity is when the new test is as sensitive as, but more specific than, the comparator, allowing avoidance of harms of further tests or unnecessary treatment. This situation requires the assumptions that the same cases would be detected by both tests and that treatment efficacy would be unaffected by which test was used.12,13
1. Are extra cases detected by the new, more sensitive test similarly responsive to treatment as are those identified by the older test?
2. Are trials available that selected patients with the new test?
3. Do trials assess whether the new test results predict response?
4. If available trials selected only patients assessed with the old test, do extra cases identified with the new test represent the same spectrum or disease subtypes as trial participants?
5. Are tests’ cases subsequently confirmed by the same reference standard?
6. Does the new test change the definition or spectrum of disease (e.g., earlier stage)?
7. Is there heterogeneity of test accuracy and treatment effect (i.e., do accuracy and treatment effects vary sufficiently according to levels of a patient characteristic to change the comparison of the old and new test)?
When the clinical utility of an older comparator test has been established, and the first five questions can all be answered in the affirmative, then diagnostic accuracy evidence alone may be sufficient to support conclusions about a new test.
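The conditions described under this principle can be summarized as a simple screening rule. The sketch below is one possible reading, not a definitive algorithm: the function name, the "same"/"higher"/"lower" labels, and the choice to exclude a new test with lower specificity from the third rule are all assumptions made for illustration.

```python
def accuracy_may_suffice(sensitivity, specificity, other_advantages=False,
                         comparator_utility_established=False,
                         first_five_answers=()):
    """Return True when accuracy evidence alone may be enough.

    sensitivity, specificity: "same", "higher", or "lower" relative to the
    comparator test (hypothetical labels).
    first_five_answers: answers (True = yes) to the first five questions.
    """
    # Equal accuracy plus another advantage (less invasive, faster, cheaper...).
    if sensitivity == "same" and specificity == "same":
        return other_advantages
    # Equally sensitive but more specific: avoids further tests or treatment.
    if sensitivity == "same" and specificity == "higher":
        return True
    # More sensitive new test: needs an established comparator and five "yes" answers.
    # Assumption: a drop in specificity is treated as a trade-off and excluded here.
    if sensitivity == "higher" and specificity in ("same", "higher"):
        return (comparator_utility_established
                and len(first_five_answers) == 5
                and all(first_five_answers))
    return False


print(accuracy_may_suffice("same", "higher"))  # True
print(accuracy_may_suffice("higher", "same",
                           comparator_utility_established=True,
                           first_five_answers=[True, True, False, True, True]))  # False
```

A rule like this only flags candidates; the judgment about comparability of downstream management still rests with the reviewers and stakeholders.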
Principle 5: Other Frameworks May Be Helpful
Various other frameworks (generally termed “organizing frameworks,” as described briefly in the Introduction to this Medical Test Methods Guide [Paper 1]) relate to categorical features of medical tests and medical test studies. Lijmer and colleagues reviewed the different types of organizational frameworks and found 19 frameworks, which generally classify medical test research into 6 different domains or phases, including technical efficacy, diagnostic accuracy, diagnostic thinking efficacy, therapeutic efficacy, patient outcome, and societal aspects.13
These frameworks serve a variety of purposes. Some researchers, such as Van Den Bruel and colleagues, consider frameworks as a hierarchy and a model for how medical tests should be studied, with one level leading to the next (i.e., success at each level depends on success at the preceding level).14 Others, such as Lijmer and colleagues, have argued that “The evaluation frameworks can be useful to distinguish between study types, but they cannot be seen as a necessary sequence of evaluations. The evaluation of tests is most likely not a linear but a cyclic and repetitive process.”13
Examples of Initially Ambiguous Claims that were Clarified Through the Process of Topic Development
Full-Field Digital Mammography
FFDM to replace SFM in breast cancer screening (Fig. 3)
- Initial ambiguous claim: FFDM may be a useful alternative to SFM in screening for breast cancer
- Key concerns suggested by PICOTS, analytic framework, and decision tree: Key statistics: sensitivity, diagnostic yield, recall rate; similar types of management decisions and outcomes for index and comparator test-and-treat strategies
- Clarified claim: In screening for breast cancer, interpretation of FFDM and SFM would be similar, leading to similar management decisions and outcomes; FFDM may have a similar recall rate and diagnostic yield at least as high as SFM; FFDM images may be more expensive, but easier to manipulate and store
- Reference: Blue Cross and Blue Shield Association Technology Evaluation Center, 2002 (ref. 15)

HER2 gene amplification assay as add-on to HER2 protein expression assay (Fig. 4)
- Initial ambiguous claim: HER2 gene amplification and protein expression assays may complement each other as means of selecting patients for targeted therapy
- Key concerns suggested by PICOTS, analytic framework, and decision tree: Key statistics: proportion of individuals with intermediate/equivocal HER2 protein expression results who have HER2 gene amplification; key outcomes are related to effectiveness of HER2-targeted therapy in this subgroup
- Clarified claim: Among individuals with localized breast cancer, some may have equivocal results for HER2 protein overexpression but have positive HER2 gene amplification, identifying them as patients who may benefit from HER2-targeted therapy but otherwise would have been missed
- Reference: Seidenfeld et al., 2008 (ref. 16)

PET as triage for breast biopsy (Fig. 5)
- Initial ambiguous claim: PET may play an adjunctive role to breast examination and mammography in detecting breast cancer and selecting patients for biopsy
- Key concerns suggested by PICOTS, analytic framework, and decision tree: Key statistics: negative predictive value; key outcomes to be contrasted were benefits of avoiding biopsy versus harms of delaying initiation of treatment for undetected tumors
- Clarified claim: Among patients with a palpable breast mass or suspicious mammogram, if FDG PET is performed before biopsy, those with negative scans may avoid the adverse events of biopsy with potentially negligible risk of delayed treatment for undetected tumor
- Reference: Samson et al., 2002 (ref. 17)
This case illustrates when a more formal decision analysis may be useful, specifically when a new test has higher sensitivity but lower specificity than the old test, or vice versa. Such a situation entails tradeoffs in relative frequencies of true positives, false negatives, false positives, and true negatives, which decision analysis may help to quantify.
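The tradeoff described above can be sketched numerically. The example below computes expected true positives, false negatives, false positives, and true negatives for two tests applied to the same cohort, along with the negative predictive value. The cohort size, prevalence, and accuracy values are hypothetical, chosen only to show a new test with higher sensitivity and lower specificity; the function names are likewise assumptions.

```python
def expected_counts(sensitivity, specificity, n_diseased, n_not_diseased):
    """Expected 2x2 counts for a test applied to a cohort (rounded to whole people)."""
    tp = round(sensitivity * n_diseased)
    tn = round(specificity * n_not_diseased)
    return {"TP": tp, "FN": n_diseased - tp, "FP": n_not_diseased - tn, "TN": tn}


def npv(counts):
    """Negative predictive value: share of negative results that are truly negative."""
    return counts["TN"] / (counts["TN"] + counts["FN"])


# Hypothetical cohort of 1,000 people with 10% prevalence.
old = expected_counts(0.80, 0.95, n_diseased=100, n_not_diseased=900)
new = expected_counts(0.90, 0.85, n_diseased=100, n_not_diseased=900)
change = {cell: new[cell] - old[cell] for cell in old}

print(old)     # {'TP': 80, 'FN': 20, 'FP': 45, 'TN': 855}
print(new)     # {'TP': 90, 'FN': 10, 'FP': 135, 'TN': 765}
print(change)  # {'TP': 10, 'FN': -10, 'FP': 90, 'TN': -90}
print(round(npv(old), 3), round(npv(new), 3))
```

Under these assumed values, the new test finds 10 more cases at the cost of 90 more false positives. Whether that exchange is worthwhile depends on the consequences attached to each cell, which is the question a formal decision analysis is meant to address.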
The immediate goal of a systematic review of a medical test is to determine the health impacts of use of the test in a particular context or set of contexts relative to one or more alternative strategies. The ultimate goal is to produce a review that promotes informed decisionmaking.
Reaching the above-stated goals requires an interactive and iterative process of topic development and refinement aimed at understanding and clarifying the claim for a test. This work should be done in conjunction with the principal users of the review, experts, and other stakeholders.
The PICOTS typology, analytic framework, simple decision trees, and other organizing frameworks are all tools that can minimize ambiguity, help identify where review resources should be focused, and guide the presentation of results.
Sometimes it is sufficient to focus only on accuracy studies. For example, diagnostic accuracy may be sufficient when the new test is as sensitive and specific as the old test and the new test has advantages over the old test such as fewer adverse effects, is less invasive, is easier to use, provides results more quickly or is lower in cost.
We wish to thank David Matchar and Stephanie Chang for their valuable contributions. Work on this paper was funded by AHRQ.
Conflict of Interest
The authors declare that they do not have a conflict of interest.