Network Meta-analysis: Users’ Guide for Surgeons: Part I – Credibility
Conventional meta-analyses quantify the relative effectiveness of two interventions based on direct (that is, head-to-head) evidence typically derived from randomized controlled trials (RCTs). For many medical conditions, however, multiple treatment options exist and not all have been compared directly. This issue limits the utility of traditional synthetic techniques such as meta-analyses, since these approaches can only pool and compare evidence across interventions that have been compared directly by source studies. Network meta-analyses (NMA) use direct and indirect comparisons to quantify the relative effectiveness of three or more treatment options. Interpreting the methodologic quality and results of NMAs may be challenging, as they use complex methods that may be unfamiliar to surgeons; yet for these surgeons to use these studies in their practices, they need to be able to determine whether they can trust the results of NMAs. The first judgment of trust requires an assessment of the credibility of the NMA methodology; the second judgment of trust requires a determination of certainty in effect sizes and directions. In this Users’ Guide for Surgeons, Part I, we show the application of evaluation criteria for determining the credibility of a NMA through an example pertinent to clinical orthopaedics. In the subsequent article (Part II), we help readers evaluate the level of certainty NMAs can provide in terms of treatment effect sizes and directions.
Keywords: Intramedullary Nailing, Indirect Comparison, Treatment Effect Size, Displaced Femoral Neck Fracture, Periprosthetic Femur Fracture
You are asked to see an active 36-year-old male patient who recently presented to the emergency department with a Gustilo Grade IIIA open mid-shaft tibial fracture. In your practice, you commonly perform unreamed intramedullary nailing for this fracture type. You present the case at morning rounds and several of your colleagues believe that given the amount of tissue destruction, an external fixator is the preferred option. Another colleague cites several studies that report high malunion rates with external fixation and believes that reamed intramedullary nailing leads to better bony stability and biologically enhanced union rates. You perform an uneventful, unreamed intramedullary nailing in the patient, who reports good function at 2-week followup. However, 9 months postoperatively, the patient continues to report residual pain, and radiographs show evidence of nonunion. You consider adjunctive revision procedures and wonder if either reaming or using an external fixator would have been better initial management options. You perform a literature search and come across a recently published network meta-analysis that evaluates outcomes for surgical treatment of open tibial shaft fractures [6]. What approaches can you use to evaluate the credibility of this work?
Network Meta-analysis: Background and Rationale
Systematic reviews use defined strategies for identification, inclusion, appraisal, and reporting of results to summarize the literature addressing a specific clinical question [19, 21]. Meta-analysis refers to explicit statistical methods to generate a single pooled estimate summarizing the results of the included studies. Traditionally, meta-analyses have evaluated the effectiveness of one intervention compared with another intervention using direct (head-to-head) comparisons; these studies sometimes are referred to as conventional or pairwise meta-analyses, and during the past 20 years have played an increasingly important role in orthopaedic science and practice [1, 3, 27].
The advantages of pairwise meta-analyses of randomized controlled trials (RCTs) compared with individual RCTs include: (1) greater precision in estimating treatment effects; (2) ability to assess variability (heterogeneity) in treatment effects among trials; (3) improved power to conduct subgroup analyses; (4) avoidance of unrepresentative results that arise from single studies owing to either random variation or bias; and (5) better informed determination of areas for future research.
Pairwise meta-analyses, however, are limited to addressing the effects of a single treatment versus a single alternative. For many conditions in orthopaedic surgery, several treatment options exist and the number of head-to-head clinical trials is limited. For instance, a systematic review and meta-analysis addressing management of periprosthetic femur fractures evaluated four treatment strategies: nonoperative treatment, nonlocked plating, locked plating, and retrograde intramedullary nailing. Using pairwise meta-analysis to evaluate the relative effects of the four strategies would require six sets of head-to-head clinical trials. Further, the size of the trials would need to be large enough that, when pooled, they could provide sufficient precision for definitive decision-making. Such robust clinical data are unlikely to be available for most clinical questions.
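The combinatorics behind this requirement are easy to make concrete: covering every pair of k treatments head-to-head requires k(k - 1)/2 sets of trials. A minimal sketch (the helper function is ours, not part of the article):

```python
from math import comb

def n_pairwise(k: int) -> int:
    # Unique head-to-head comparisons among k treatments: k * (k - 1) / 2
    return comb(k, 2)

# Four periprosthetic fracture strategies already require six sets of trials;
# the count grows quickly as treatment options multiply
print([n_pairwise(k) for k in (4, 6, 10)])  # [6, 15, 45]
```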
In response to the need to simultaneously evaluate all available treatments, new methods in meta-analysis—known as network meta-analysis (NMA)—have emerged (Box 1) [5, 14, 16, 17]. These are complex studies that should be conducted only with the support of an expert statistician. Also referred to as multiple-treatment-comparison meta-analyses, NMAs involve creating networks of treatments. Authors then apply statistical methods to these networks to estimate the effects of treatments shown through direct comparisons (head-to-head trials, A versus B) and indirect comparisons (making inferences about A versus C by looking at how ‘A’ compares with common comparator ‘B’ and how ‘C’ compares with common comparator ‘B’). Investigators then combine direct and indirect comparisons to provide an overall pooled treatment effect.
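To make the indirect-comparison step concrete, the sketch below applies the standard adjusted (Bucher) indirect comparison on the log odds ratio scale; the odds ratios and standard errors are hypothetical, not values from any of the cited trials:

```python
import math

def indirect_comparison(log_or_ab, se_ab, log_or_cb, se_cb):
    # A vs C inferred via common comparator B:
    # log OR(AC) = log OR(AB) - log OR(CB); the variances add
    log_or_ac = log_or_ab - log_or_cb
    se_ac = math.sqrt(se_ab**2 + se_cb**2)
    return log_or_ac, se_ac

# Hypothetical trials: A vs B gives OR 0.80 (SE 0.15 on the log scale),
# C vs B gives OR 1.25 (SE 0.20)
log_ac, se_ac = indirect_comparison(math.log(0.80), 0.15, math.log(1.25), 0.20)
print(round(math.exp(log_ac), 2))  # indirect OR of A vs C: 0.64
lo, hi = (math.exp(log_ac + z * 1.96 * se_ac) for z in (-1, 1))
print(round(lo, 2), round(hi, 2))  # 95% CI roughly 0.39 to 1.04
```

Note that the indirect standard error (0.25 here) is larger than either direct standard error, which is why indirect evidence alone is usually less precise than head-to-head evidence.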
Indirect evidence, when combined with head-to-head comparisons, may enhance precision by increasing sample size and thus narrowing confidence intervals (CIs). A NMA also enables ranking available treatments and facilitates exploration of subgroup effects (eg, some treatments may work better with less severe fractures and others with more severe fractures) through a process known as meta-regression.
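One way to see the precision gain: under a fixed-effect model, a direct and an indirect estimate can be pooled with inverse-variance weights, and the pooled standard error is always smaller than either input. A sketch with invented numbers (the function is ours, for illustration only):

```python
import math

def pool(est_direct, se_direct, est_indirect, se_indirect):
    # Fixed-effect inverse-variance pooling: weight each estimate by 1 / SE^2
    w_d, w_i = 1 / se_direct**2, 1 / se_indirect**2
    pooled = (w_d * est_direct + w_i * est_indirect) / (w_d + w_i)
    pooled_se = math.sqrt(1 / (w_d + w_i))
    return pooled, pooled_se

# Hypothetical log odds ratios: direct -0.30 (SE 0.20), indirect -0.45 (SE 0.25)
est, se = pool(-0.30, 0.20, -0.45, 0.25)
print(round(est, 3), round(se, 3))  # -0.359 0.156: SE below both inputs
```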
The credibility of a NMA depends on consistency of results between direct and indirect comparisons. Most of the time direct and indirect comparisons yield similar results. Dealing with inconsistency is an issue to which we will return.
Can You Trust the Results of a NMA?
Guide for assessing credibility of the systematic review process
- Did the review explicitly address a sensible clinical question?
- Was the search for relevant studies exhaustive?
- Are selection and assessments of studies reproducible?
- Did the review present results that are ready for clinical application?
- Did the review address confidence in effect estimates?
The first possible difficulty is that the systematic review process underlying the NMA may lack credibility; the questions listed above guide that assessment. A second possible difficulty, even if the NMA is highly credible and has adhered to rigorous methodologic standards, is that there still may be low certainty in the estimates of effect sizes and directions that emerge from the NMA. This is because the underlying evidence (the studies that contribute to the review) may be limited by a high risk of selection, transfer, or assessment bias, imprecision (small sample size and wide CIs), inconsistent results from study to study, and publication bias. How limitations of the underlying evidence decrease certainty in the estimated effects of a NMA is discussed in detail in Part II of the Users’ Guide for Surgeons.
Was the Systematic Review Process Credible?
Did the Review Explicitly Address a Sensible Clinical Question?
Formulating a sensible clinical question is the first step in any research endeavor, including systematic reviews and NMAs. Ideally, the question is framed in the Population, Intervention, Control (or Comparators), and Outcomes (PICO) format. If the patient population is too diverse, questions may be too broad to be useful. For instance, a hypothetical NMA that evaluates internal fixation for all hip fractures, regardless of patient age or severity of trauma, might be problematic. A displaced femoral neck fracture sustained by a 20-year-old man involved in a high-speed motor vehicle collision is likely to have a different response to available treatments than a low-energy fragility hip fracture sustained by an elderly woman with dementia. In the latter case, a hemiarthroplasty may lead to good function and fewer complications, whereas the younger man might do better if his femoral neck were to be fixed and his native femoral head preserved. If the effects of available treatments differ in these two patients, pooling different patient group results will provide misleading inferences, perhaps by resulting in an intermediate effect size that applies to neither young nor old patients.
Selection of patients is not the only study characteristic that requires similarity across trials; interventions also must be similar. For example, if techniques have improved with time, combining older studies of a surgical approach to treatment with more recent studies of the same but improved approach may yield misleading results (an intermediate effect representing neither the old nor the new procedure). The same is true for differences in outcome measurement. For instance, effects may differ with time and including studies with short- and long-term followup again may yield intermediate results representing neither short- nor long-term effects.
Ensuring sufficient similarity in patients, interventions, and outcomes is important in any meta-analysis, but it is particularly important in NMAs, because indirect comparisons are especially vulnerable to such differences. In other words, if we are making deductions regarding our comparison of interest (for example, A vs C) through a common comparator (B), important differences in patients, interventions, or outcomes between the A vs B trials and B vs C trials may bias the indirect comparison. The specific factors that differ, such as patients, interventions, or outcomes, are referred to as effect modifiers. For example, if treatment effects differ between young and old patients, then age is an effect modifier. When the A vs B and B vs C trials differ excessively in patients, interventions, or outcomes, we label the problem as a violation of transitivity.
One way to check whether transitivity has been violated is to ask the question: Could all patients and treatments in this NMA be included as independent arms in a single RCT? If the answer to the question is “no,” then transitivity has been violated and the credibility of the NMA is compromised (Box 2).
Example of Transitivity
In the tibial shaft NMA [6], comparator groups were primary external fixation, reamed intramedullary nailing, unreamed intramedullary nailing, Ender nails, plate fixation, and primary Ilizarov fixation. Each of the procedures could reasonably make up one arm of a six-armed hypothetical clinical trial evaluating treatment options in the primary management of open tibial shaft fractures. Further, all trials enrolled predominantly adult patients (age older than 18 years) who had experienced open fractures of the tibial shaft, and all trials reported unplanned reoperation rates. Therefore the comparators are similar and the NMA meets the transitivity assumption.
Was the Search for Relevant Studies Exhaustive?
As for any systematic review, investigators should conduct a comprehensive literature search. Standard search strategies for randomized trials should include electronic medical databases, such as MEDLINE, EMBASE, and the Cochrane Library; conference proceedings and abstracts; clinical trial registries; and manual searches of reference lists. In general, searching multiple databases is an important strategy to avoid missing relevant articles.
Are Selection and Assessments of Studies Reproducible?
Authors should present evidence that the selection and assessment of studies was reproducible. Duplicate assessment of eligibility and risk of bias, including a measure of agreement (chance-corrected agreement measured by κ often is used and is appropriate), enhances the credibility of a NMA.
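For readers unfamiliar with κ, it is simply observed agreement corrected for the agreement expected by chance. A small sketch with invented screening decisions (neither the data nor the function come from the article):

```python
def cohen_kappa(a, b):
    # Chance-corrected agreement: kappa = (po - pe) / (1 - pe)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n             # each reviewer's 'include' rate
    pe = pa * pb + (1 - pa) * (1 - pb)          # agreement expected by chance
    return (po - pe) / (1 - pe)

# Invented eligibility calls (1 = include) by two reviewers on ten articles
r1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
r2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
k = cohen_kappa(r1, r2)
print(round(k, 2))  # 0.6: substantial, but less rosy than the raw 80% agreement
```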
Did the Review Present Results That Are Ready for Clinical Application?
Results should be presented in a way that they are useful to clinicians and can be applied to patient care. Clinicians should be able to easily find best estimates of relative effect sizes and directions of each paired comparison, with 95% CIs or credible intervals. A credible interval conveys the same information as a CI, but is the term used when authors have used Bayesian rather than the standard frequentist statistical approaches. Bayesian approaches are popular in NMAs because, unlike frequentist approaches, they allow for estimation of effects in terms of probabilities, which are more intuitively understood in the context of multiple treatments. Therefore, readers often will find these articles reporting credible intervals.
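To illustrate why Bayesian output is convenient here, the sketch below works with invented posterior draws (the treatment labels, means, and SDs are ours, not results from any cited NMA) and reads off both a credible interval and treatment-ranking probabilities directly:

```python
import random

random.seed(0)
N = 5000
# Hypothetical posterior draws of log odds ratios of reoperation versus a
# common reference treatment (lower is better); parameters are invented
draws = {
    "reamed nailing":    [random.gauss(-0.40, 0.20) for _ in range(N)],
    "unreamed nailing":  [random.gauss(-0.20, 0.20) for _ in range(N)],
    "external fixation": [random.gauss(0.10, 0.30) for _ in range(N)],
}

# 95% credible interval for one treatment: central 2.5th-97.5th percentiles
s = sorted(draws["reamed nailing"])
cred_int = (s[int(0.025 * N)], s[int(0.975 * N)])

# Probability each treatment ranks best (lowest draw) across posterior samples
best = dict.fromkeys(draws, 0)
for i in range(N):
    winner = min(draws, key=lambda t: draws[t][i])
    best[winner] += 1
prob_best = {t: best[t] / N for t in draws}
print(cred_int)
print(prob_best)
```

Statements such as "the probability that treatment X is best" fall straight out of the posterior samples, which is the intuitive appeal of the Bayesian framing when many treatments compete.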
Authors also should present estimates of absolute treatment effect sizes. A more costly procedure associated with a longer hospital stay may result in a 50% reduction in failure rates, which sounds impressive. The implications are very different, however, if it represents a reduction from 2% to 1% or from 20% to 10%. Thus, presentation of absolute differences is critical to informed decision-making.
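The arithmetic is easy to make explicit; the helper below is ours, applied to the same hypothetical baseline risks as in the paragraph above:

```python
# Same 50% relative risk reduction, very different absolute benefit
def abs_reduction(baseline_risk, relative_reduction=0.5):
    arr = baseline_risk * relative_reduction  # absolute risk reduction
    nnt = 1 / arr                             # number needed to treat
    return arr, nnt

print(abs_reduction(0.02))  # (0.01, 100.0): 1 point gained; treat 100 to help 1
print(abs_reduction(0.20))  # (0.1, 10.0): 10 points gained; treat 10 to help 1
```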
In general, authors of NMAs should present results of each paired comparison in the network. Presentation of results of all paired comparisons, however, can become overwhelming when there are many competing alternative treatments. For instance, four treatments yield a manageable six comparisons, six treatments yield a challenging 15 comparisons, and 10 treatments yield an overwhelming 45 comparisons. One appealing format is a forest plot that presents the best estimates of treatment effect for all agents against one of the least effective options. For instance, an extensive NMA addressing 21 surgical and nonsurgical treatment options for sciatica presented all results compared with an inactive control (conventional care). Even with only a small number of treatments, such a presentation can aid interpretation.
Did the Review Address Confidence in Effect Estimates?
In any NMA, it is almost certain that confidence in estimates will vary from comparison to comparison. Authors need to provide information that helps distinguish between comparisons that warrant strong inferences and those that do not. Issues that compromise the strength of inference and, therefore, the certainty of the evidence include: (1) high risk of bias in the included trials, which is reflected through issues such as lack of concealed randomization, lack of blinding, and high loss to followup. Readers may be familiar with common methods to describe this bias through scales such as the quality assessment scale described by Detsky et al., the scale reported by Jadad et al., or the Cochrane risk of bias tool; (2) imprecision, which is reflected in very wide CIs or credible intervals; (3) inconsistency, which occurs when results differ from study to study, or between direct and indirect comparisons; (4) indirectness, which in a NMA we refer to as intransitivity, typically resulting from differing populations, interventions, or outcomes; (5) publication bias, which occurs when negative trials remain unpublished; because systematic reviews are more efficient at finding published trials than unpublished ones, this can result in overestimates of treatment effect sizes and erroneous conclusions in support of newer treatments.
Credible NMAs ensure transparent reporting of necessary information to enable readers to make an informed assessment of the certainty of the evidence. Until recently, most NMAs dealt with issues of certainty of the evidence either in a cursory way or as inferences across the whole network. Because certainty is likely to differ from comparison to comparison, such an approach is not very useful. Without certainty estimates for each paired comparison, clinicians are left to guess which results they can trust and which they cannot. Ideally, for each comparison, the authors present the estimate from the direct comparisons and its associated certainty, the estimate from the indirect comparison and its associated certainty, and the overall NMA estimate and its associated certainty. Methodology from The Grades of Recommendation, Assessment, Development and Evaluation (GRADE) working group provides a system for making certainty ratings that considers risk of bias, precision, consistency, directness, and publication bias [7, 23].
Revisiting the Clinical Scenario
The NMA addressing management of open tibial shaft fractures [6] has posed a sensible question: patients are well defined and relevant to your clinical dilemma, comparator treatments are comprehensive yet similar, and outcomes are explicitly defined, patient-important, and measured at a consistent and logical time. The authors conducted an exhaustive search and performed duplicate review using predesigned, standardized forms. For the main outcome, the authors used the 1-year unplanned reoperation rate as the critical outcome; they also reported deep and superficial infection, nonunion, and malunion rates. They had planned to report postoperative functional outcomes and health-related quality of life, but data were sparse.
The authors present direct comparisons as forest plots, which include absolute event rates, odds ratios, and 95% CIs, and the indirect and combined direct-indirect comparisons in a concise table, along with odds ratios, 95% credible intervals, and GRADE assessments of confidence. The authors also provide an overall ranking of the four treatments with sufficient evidence. You conclude that the methodology is sufficiently robust to make it a credible NMA and therefore proceed to the results.
A NMA is a relatively new study methodology that provides simultaneous comparisons of multiple treatment options based on direct and indirect evidence. Assessing the credibility of the methodology is an important first step in critically appraising a NMA. As with conventional systematic reviews, assessing credibility involves evaluating the article for a sensible research question, an exhaustive search, reproducible selection and assessment of articles, presentation of clinically applicable results, and addressing certainty in effect estimates (Box 3).
In Part II of the Users’ Guide for Surgeons we discuss the second important step in critical appraisal of a NMA: judging the certainty we can place in the results.
- 6. Foote CJ, Guyatt GH, Vignesh KN, Mundi R, Chaudhry H, Heels-Ansdell D, Thabane L, Tornetta P 3rd, Bhandari M. Which surgical treatment for open tibial shaft fracture results in the fewest reoperations? A network meta-analysis. Clin Orthop Relat Res. 2015 Feb 28. [Epub ahead of print].
- 7. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, Norris S, Falck-Ytter Y, Glasziou P, DeBeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schünemann HJ. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64:383–394.
- 8. Higgins J, Altman DG, Sterne JA; Cochrane Statistical Methods Group and the Cochrane Bias Methods Group. Assessing risk of bias in included studies. In: Higgins JP, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions 5.1.0 [updated September 2011]. Available at: http://handbook.cochrane.org/. Accessed March 23, 2015.
- 12. Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health. 2011;14:417–428.
- 13. Lee P. Bayesian Statistics: An Introduction. 4th ed. Chichester, United Kingdom: John Wiley & Sons, Ltd; 2012.
- 14. Lewis RA, Williams NH, Sutton AJ, Burton K, Din NU, Matar HE, Hendry M, Phillips CJ, Nafees S, Fitzsimmons D, Rickard I, Wilkinson C. Comparative clinical effectiveness of management strategies for sciatica: systematic review and network meta-analyses. Spine J. 2013 Oct 4. [Epub ahead of print] doi: 10.1016/j.spinee.2013.08.049.
- 19. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009 Jul 21. [Epub ahead of print].
- 21. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux PJ, Prasad K, Neumann I, Carrasco-Labra A, Agoritsas T, Hatala R, Meade MO, Wyer P, Cook DJ, Guyatt G. How to read a systematic review and meta-analysis and apply the results to patient care: users’ guides to the medical literature. JAMA. 2014;312:171–179.
- 22. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 3rd ed. Hamilton, Canada: BC Decker Inc; 2008.