Background

Low back pain (LBP) is a global health problem; it is the leading cause of disability in 126 out of 195 countries according to the Global Burden of Disease Study 2017 [1]. Non-specific LBP represents over 90% of cases, in which a pathoanatomical source of pain cannot be reliably identified [2, 3], and is classified by the duration of pain: acute (fewer than 6 weeks), sub-acute (6 to 12 weeks), and chronic (greater than 12 weeks) [4, 5]. The estimated lifetime prevalence of LBP is up to 80%, meaning many adults will experience an episode of LBP at least once [6, 7]. Following onset, most people experience a reduction in pain within the first 6 weeks [8]. However, only 60% are fully recovered at 12 weeks and 70% will experience a recurrence of LBP within 12 months of recovery [9, 10]. Approximately 20% of the global population live with chronic LBP at any time [7].

Despite recommendations for non-pharmacological interventions to play a greater role in the management of LBP [11], analgesic medicines are the most commonly prescribed intervention across a range of primary care settings (e.g. general practice, emergency department) [12,13,14,15,16,17,18]. Analgesic medicines include non-steroidal anti-inflammatory drugs (NSAIDs), opioids, paracetamol (acetaminophen), anti-convulsants, anti-depressants, muscle relaxants, and corticosteroids. These medicines are used in a range of countries [13,14,15,16, 19, 20], but there appear to be no common patterns to prescribing [20]. For example, muscle relaxant medicines are commonly prescribed in the USA and Italy [14, 15, 19], but seldom in Australia [12].

People with LBP (and clinicians) want to know the most effective medicine for their condition [21,22,23,24]. This requires information about comparative effectiveness—the effect of a medicine compared to other medicines—to be available for clinical decision making. Comparative effectiveness has been insufficiently described in syntheses of the literature to date [25,26,27,28,29,30,31,32,33,34,35]. This is understandable, as most of these systematic reviews [28, 29, 31, 32] investigated a single comparison, usually efficacy, the effect of a medicine compared to sham. Several of these reviews [25, 35] additionally examined a limited number of effectiveness comparisons. No quantitative synthesis was made of these data, which was appropriate because methods for single comparisons should not be used across multiple comparisons. Comparative effectiveness data that are limited and lack synthesis have low utility for clinical decision-making.

Network meta-analysis (NMA) can be used to synthesise data across multiple comparisons. NMA provides valid estimates of comparative effectiveness by fitting a single statistical model to a connected network of interventions when there is confidence in the assumptions of transitivity and coherence (see the ‘Assumption of transitivity’ and ‘Assessment of network heterogeneity and coherence’ sections) [36,37,38,39]. The results are relative effect estimates for each comparison between medicines of interest. These data apply to decision scenarios in which adults with LBP, clinicians, and policy makers are determining which of several medicines should be used. The relative effect estimates can also be used to rank interventions based on their effect on relevant outcomes, which may further assist clinical decision-making. Therefore, this NMA will evaluate the comparative effectiveness of analgesic medicines for adults with LBP.

Objectives

The objectives of this review are to:

  1. Determine the effect on pain intensity, safety, acceptability, and effect on function of a single course of an analgesic medicine or a combination of these medicines.

  2. Determine a relative rank for each intervention according to its effect on pain intensity, safety, acceptability, and effect on function.

Methods

This protocol follows the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P) guidelines [40, 41], provided as Additional file 1. The review was registered with the International Prospective Register of Systematic Reviews on 31 October 2019 (PROSPERO ID: CRD42019145257).

Eligibility criteria for inclusion in the review

Study design

We will include randomised controlled trials (RCTs) that provide at least one comparison between two interventions of interest (see the ‘Interventions’ section). There are no restrictions on language or publication status. We will include unpublished data because they may meaningfully alter the relative effectiveness of a medicine [42, 43]. We will include data from parallel group trials. We will also include data from crossover RCTs, but only from the first phase, because of the lack of relative stability in recent-onset LBP and known carry-over effects in different classes of analgesic medicines (e.g. anti-depressants) [44, 45]. This approach has been used in published NMAs [36, 46, 47]. We will exclude cluster RCTs.

We will exclude enriched-enrolment RCTs. Although the single-arm run-in phase of enriched-enrolment trials might provide information on adverse effects, the NMA will not include data from uncontrolled trials. Additionally, the double-blind/randomised phase of enriched-enrolment RCTs has questionable external validity [48] and is limited in detecting adverse effects [48, 49].

Participants

We will include studies that randomised adults with non-specific LBP, defined as a primary area of pain between the twelfth rib and gluteal fold, with or without associated leg pain [4, 5]. We will consider three different pain durations, which we will analyse separately: acute (fewer than 6 weeks), sub-acute (6 to 12 weeks), and chronic (greater than 12 weeks) [4]. We will include studies that randomised participants with heterogeneous pain conditions if separate data are obtainable for the participants with LBP. Participants may be experienced or naïve to the trial intervention, which we will assess for the evaluation of transitivity (see the ‘Assumption of transitivity’ section). We will exclude studies conducted in interventional (surgical/operative) settings. We will also exclude studies in which more than 20% of participants have leg pain that meets the definition of sciatica according to Koes et al. [50] (unilateral leg pain greater than LBP, radiating to the foot or toes, with associated neurological indications), and studies in which LBP is attributable to a specific pathology, such as infection, neoplasm, metastasis, inflammatory disease, or fractures.

Interventions

We will include medicines from the following classes: NSAIDs, paracetamol (acetaminophen), opioid analgesics, anti-convulsants, anti-depressants, muscle relaxant medicines, and corticosteroids. These medicines must be listed in the World Health Organization (WHO) Anatomical Therapeutic Chemical (ATC) system and licensed for current use by at least one of the following agencies: the US Food and Drug Administration (FDA), the Australian Therapeutic Goods Administration (TGA), the European Medicines Agency (EMA), or the UK Medicines and Healthcare products Regulatory Agency (MHRA). These interventions of interest are listed in Additional file 2 with their ATC codes and licensing status. We will include additional analgesic medicines from these classes, identified during the review, provided they are licensed by one of the above agencies. Medicines may be delivered as mono or combination therapy via any systemic route of administration (e.g. oral, intravenous, intramuscular, buccal, sublingual). We will exclude non-systemic administration (such as topical, intraarticular, or epidural administration). We will not exclude studies that assign non-pharmacological co-interventions to one or more of the intervention arms. We will consider these studies in the evaluation of transitivity [36].

We will represent each intervention of interest as a separate node in a network of all possible comparisons between interventions (not shown due to visual complexity). We define the placebo/sham intervention node as any drug intervention that does not contain an active analgesic ingredient (including ‘active’ placebo arms). We consider that no-treatment includes continuation of usual care or being placed on a waitlist. We will further define nodes according to route of administration, which means that a single drug may be represented by multiple nodes in the network (e.g. oral diclofenac and intravenous diclofenac are separate nodes).

We will classify the dose of each medicine as either (i) standard dosing range (SDR), (ii) below the SDR, or (iii) above the SDR. We will source the relevant SDR for each medicine using the following hierarchy: Prescriber’s Digital Reference [51], MIMS [52], or the Australian Medicines Handbook [53]. If the SDR is not provided by any of these sources, we will use the medicine’s licensed dosing range (LDR). Typically, the LDR is equivalent to the SDR used in clinical practice, although the SDR may be lower than the LDR. We will identify a medicine’s LDR using the following hierarchy: FDA, MHRA, EMA, or TGA. Where a study contains two or more intervention arms within the same dosing range, we will combine these arms using formulae in the Cochrane Handbook for Systematic Reviews of Interventions [54].
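The dose classification and the pooling of arms within the same dosing range can be sketched as follows. This is a minimal illustration in Python (the protocol's analyses are run in R); the function names and the example dose limits are hypothetical, and the pooling formula is the standard one for combining two groups into a single sample size, mean, and SD.

```python
import math


def classify_dose(daily_dose, sdr_lower, sdr_upper):
    """Classify a dose against a standard dosing range (SDR).

    The SDR limits are supplied by the caller; the values used in any
    example are hypothetical, not taken from the protocol.
    """
    if daily_dose < sdr_lower:
        return "below SDR"
    if daily_dose > sdr_upper:
        return "above SDR"
    return "SDR"


def combine_arms(n1, mean1, sd1, n2, mean2, sd2):
    """Pool two arms into one group, returning (n, mean, SD)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-arm sums of squares plus the between-arm component.
    sum_sq = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (mean1 - mean2) ** 2
    )
    return n, mean, math.sqrt(sum_sq / (n - 1))
```

For more than two arms in the same dosing range, the function can be applied repeatedly, combining the pooled result with the next arm.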

Outcomes

The primary outcomes for this review are pain and safety.

  • Pain is defined as pain intensity, measured at the time point closest to the end of treatment. Pain intensity may be measured with a continuous self-report scale (e.g. visual analogue scale (VAS) or numeric rating scale (NRS)), a rating scale within a composite measure of pain (e.g. McGill Pain Questionnaire), or an ordinal scale (we will consider such ordinal scales to exhibit continuous properties). We will not exclude studies that use other measurement tools.

  • Safety is the number of participants who experience an adverse effect during the treatment period. We define an adverse effect according to the US FDA as ‘any untoward medical occurrence associated with the use of a drug in humans, whether or not considered drug related’ [55]. We will consider ‘adverse event’, ‘adverse drug reaction’, ‘side effect’, ‘toxic effect’, or ‘complication’ as indicating an adverse effect. No change or an increase in pain intensity is not considered an adverse effect.

The secondary outcomes are function, serious adverse events, and acceptability:

  • Function, defined as low back specific function, measured at the end of treatment, may be measured with a continuous, self-report scale (e.g. Roland Morris Disability Questionnaire (RMDQ) or Oswestry Disability Index (ODI)), a rating scale within a composite measure (e.g. Short Form-36), or an ordinal scale (we will consider such ordinal scales to exhibit continuous properties). We will not exclude studies that use other measurement tools.

  • We will also compare the number of participants who experience a serious adverse effect during the treatment period using the US FDA classification [55, 56], where a serious adverse effect: ‘results in death; is life threatening; requires inpatient hospitalisation or causes prolongation of existing hospitalisation; results in persistent or significant disability/incapacity; may have caused a congenital anomaly/birth defect; or requires intervention to prevent permanent impairment or damage’. A consistent definition of serious adverse effect ensures the different medicines can be compared across the network.

  • Acceptability is defined as the number of participants who leave the trial for any reason before the end of treatment [47].

Search strategy and study selection

We will search the following electronic databases from inception to current:

  • MEDLINE (Ovid) (1946 to current)

  • EMBASE (Ovid) (1980 to current)

  • CINAHL (EBSCO) (1982 to current)

  • Cochrane Central Register of Controlled Trials (CENTRAL) in the Cochrane Library, current issue

  • ClinicalTrials.gov (ClinicalTrials.gov/ct2/home)

  • EU Clinical Trials Register (eudract.ema.europa.eu)

  • WHO International Clinical Trial Registry Platform (apps.who.int/trialsearch/Default.aspx)

Our search strategies incorporate the recommended strategies from the Cochrane Back and Neck Group [4] to identify randomised trials of low back pain and terms for the interventions of interest [36]. The search strategy for MEDLINE is listed in Additional file 3. We will also search previous systematic reviews and the reference lists of included studies to identify any additional trials. Records identified through all searches will be downloaded and managed in a custom relational database.

We will conduct record screening in Covidence systematic review software [57]. We will conduct two stages of screening: (i) title and abstract and (ii) full text. Two reviewers will independently screen studies for eligibility at each stage. Disagreements will be resolved through discussion, with arbitration from a third author (JHM) if required. We will contact a study’s corresponding author up to three times to obtain additional information to determine eligibility, and if no reply is received, we will exclude the study from this iteration of the review. Studies in languages other than English will be translated. We will summarise the literature search using an adapted PRISMA flow diagram [58].

Record management

We will manage the included records in the relational database. We will conduct record linkage to establish unique studies for data extraction, which may consist of multiple records. We will search for the protocols and trial registrations of all included trials. We will use the following hierarchy to prioritise records for data extraction: (i) primary report (typically the journal article reporting the results of the primary analysis of the trial), (ii) secondary report (secondary analysis of the trial), (iii) conference abstract (a report of a secondary analysis), (iv) trial registration, (v) other secondary records, and (vi) other conference abstracts.

Data extraction

Two reviewers will independently extract and enter data from included trials into standardised spreadsheets. Review authors will not extract data from any trial in which they have had any involvement. Data will be taken from previous reviews conducted by the authors when possible. Disagreements between reviewers will be resolved through discussion, with arbitration from a third author (JHM) if required. We will not extract data from interventions that do not meet the eligibility criteria for this review.

We will extract data on:

Trial characteristics: country, setting, and number of trial sites; sample size; and study duration.

Participants: diagnosis, duration of LBP, age, male/female ratio, arm-level pain intensity at baseline (as mean (standard deviation [SD])), experience or naivety with the trial intervention, and co-morbidities, including alternate sites of pain.

Interventions: medicine(s) tested, control; duration of intervention; dosage regimen; routes of administration; and usage of rescue medication.

Outcomes: type and dimensions of the scale/measure used to assess pain or function and the time from randomisation at which the end of treatment data were obtained in individual trials. We will extract the definition of ‘adverse effect’ and ‘serious adverse effect’ used in each study. We will extract data on study results including participant allocation to each intervention group; compliance to the intervention (including the definition of compliance); the number of participants who discontinued due to an adverse event; the event rate and descriptions of all reported adverse effects; and pain intensity and function at the completion of treatment.

If studies report more than one measure for pain, we will prioritise extraction in the following order: 100 mm VAS, 10 cm VAS, 11-point NRS, rating scale for pain intensity from a composite measure of pain (e.g. McGill Pain Questionnaire), and ordinal scale. We will preferentially extract the outcome score and measure of variance at the end of treatment (or closest time point) for each group, followed by the change from baseline and measure of variance. If data are not available for each trial arm, we will extract the between-group statistics at the end of treatment.

If studies report more than one measure for function, we will prioritise extraction in the following order: ODI, RMDQ, rating scale for functional ability from a composite measure, and ordinal scale. We will preferentially extract the outcome score and measure of variance at the end of treatment (or closest time point) for each group, followed by the change from baseline and measure of variance. If data are not available for each trial arm, we will extract the between-group statistics at the end of treatment.
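The extraction priorities for pain and function described above amount to picking the first reported measure from an ordered list. A minimal sketch, assuming each study's reported measures have already been coded with the labels below (the labels and the function name are illustrative, not part of the protocol):

```python
# Priority orders as stated in the protocol; the label strings are assumptions.
PAIN_PRIORITY = [
    "100 mm VAS",
    "10 cm VAS",
    "11-point NRS",
    "composite rating scale",
    "ordinal scale",
]
FUNCTION_PRIORITY = [
    "ODI",
    "RMDQ",
    "composite rating scale",
    "ordinal scale",
]


def select_measure(reported, priority):
    """Return the highest-priority measure a study reports.

    Returns None when the study uses only other tools; such studies are
    not excluded, so the measure would need to be chosen by the reviewers.
    """
    for measure in priority:
        if measure in reported:
            return measure
    return None
```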

Missing data

We will contact a trial’s corresponding author up to three times via email to request missing data, which will be considered unobtainable if no reply is received within 6 weeks. If data for outcomes of pain and function are not presented in an appropriate form for meta-analysis (such as median and range instead of SDs, standard errors, t-statistics, or p values), we will attempt to impute these using established methods [54, 59]. We will conduct sensitivity analyses for pain at the end of treatment and safety if we impute missing data for either of these outcomes.
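Two of the simpler conversions used when an SD is not reported can be sketched as below. This is an illustration only: it covers a single group mean reported with a standard error or a 95% confidence interval, and it uses the large-sample normal critical value (1.96) as an assumption; small samples would call for a t-distribution value instead. Imputation from medians and ranges is not shown.

```python
import math


def sd_from_se(se, n):
    """Recover the SD of a group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, critical_value=1.96):
    """Recover the SD of a group from a confidence interval for its mean.

    critical_value=1.96 assumes a 95% interval and a large sample.
    """
    return math.sqrt(n) * (upper - lower) / (2 * critical_value)
```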

Risk of bias

We will appraise each trial’s risk of bias using the Cochrane ‘Risk of bias’ tool, version 5.1 [54] and recommendations by Furlan et al. [4]. Two reviewers will independently appraise trial-level risk of bias for 13 items across the domains of selection, performance, attrition, detection, reporting, and other sources of bias. If an item is typically rated at outcome level, which may differ between our two primary outcomes (pain intensity and safety), we will use the more conservative rating (e.g. using high risk over unclear risk). Review authors will not appraise risk of bias for any trial in which they have had any involvement (e.g. trial investigator). Risk of bias assessments will be taken from previous reviews of analgesic medicines conducted by our author team, where the same approach was used.

We will determine an overall risk of bias for each trial by adapting the process from Furukawa et al. [47]: low overall risk is determined when three or fewer items are rated ‘unclear’ risk and no domains are rated ‘high’; moderate overall risk is determined if a single item is rated as ‘high’ risk of bias, or no item is rated as ‘high’ risk but four or more are rated as ‘unclear’; and high overall risk otherwise.
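The overall rating rule can be written as a short decision function. This sketch assumes the 13 item-level ratings for a trial are available as the strings 'low', 'unclear', or 'high', and it treats the 'high' condition at item level throughout; the function name is illustrative.

```python
def overall_risk_of_bias(item_ratings):
    """Derive a trial's overall risk of bias from its item-level ratings."""
    n_high = sum(rating == "high" for rating in item_ratings)
    n_unclear = sum(rating == "unclear" for rating in item_ratings)

    if n_high == 0:
        # No 'high' items: low if three or fewer are unclear, else moderate.
        return "low" if n_unclear <= 3 else "moderate"
    if n_high == 1:
        # A single 'high' item gives moderate overall risk.
        return "moderate"
    # Two or more 'high' items fall into the 'otherwise' case.
    return "high"
```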

Data synthesis

We will perform separate analyses for the three classifications of pain duration: acute (fewer than 6 weeks), sub-acute (6 to 12 weeks), and chronic (greater than 12 weeks).

Summary of the network

Within each classification, we will present descriptive statistics for each included trial, including the comparison(s) and a clinical/methodological summary (e.g. year of publication, sponsorship, clinical setting). We will represent the network of trials in a network graph; the size of the node will reflect the total number of participants, the width of each edge will reflect the number of studies presenting direct evidence for the comparison, and the colour of each edge will represent the overall risk of bias (see the ‘Risk of bias’ section).
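The quantities behind the network graph (node size and edge width) can be tallied directly from the trial list. The sketch below assumes each trial is recorded as a dictionary mapping node names to the number of randomised participants; this data layout and the function name are hypothetical. Edge colour by risk of bias is not shown.

```python
from collections import Counter
from itertools import combinations


def summarise_network(trials):
    """Return (participants per node, number of studies per direct comparison)."""
    node_participants = Counter()
    edge_studies = Counter()
    for trial in trials:
        arms = trial["arms"]
        for node, n_randomised in arms.items():
            node_participants[node] += n_randomised
        # A multi-arm trial contributes one study to every pair of its arms.
        for node_a, node_b in combinations(sorted(arms), 2):
            edge_studies[(node_a, node_b)] += 1
    return node_participants, edge_studies
```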

Pairwise comparisons

We will synthesise the data for each comparison using pairwise random-effects meta-analysis in R [60]. We will compare the effects of competing interventions on pain and function using mean differences (MD) with 95% confidence intervals (CIs) and on safety and acceptability using odds ratios (OR) with 95% CIs. For pain intensity and function, we will convert outcome data to common 0- to 100-point scales (mean (SD)), which has been used in reviews of analgesic medicines to enable greater clinical translation of results [25, 29, 33, 36]. This approach is based on evidence that measurement scales within the constructs of pain intensity and function are highly correlated [61, 62] and enables different types of study data to be pooled (e.g. endpoint, change from baseline).
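The two kinds of effect input described above can be illustrated with a short sketch. The rescaling shown is a simple linear transformation to a 0 to 100 scale, which is an assumption about how the conversion is done; the odds ratio uses the usual log-scale standard error and does not handle zero cells. Function names are illustrative.

```python
import math


def rescale_to_100(mean, sd, scale_min, scale_max):
    """Linearly convert a mean and SD to a 0 to 100 scale (assumed method)."""
    factor = 100 / (scale_max - scale_min)
    return (mean - scale_min) * factor, sd * factor


def odds_ratio(events_1, n_1, events_2, n_2, critical_value=1.96):
    """Odds ratio of arm 1 versus arm 2 with a 95% CI.

    Zero cells are not handled here; a continuity correction would be
    needed in that case.
    """
    a, b = events_1, n_1 - events_1
    c, d = events_2, n_2 - events_2
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - critical_value * se)
    upper = math.exp(log_or + critical_value * se)
    return math.exp(log_or), lower, upper
```

For example, a mean of 6.5 (SD 2) on a 0 to 10 NRS maps to 65 (SD 20), and a mean of 12 (SD 6) on the 0 to 24 RMDQ maps to 50 (SD 25).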

We assume that the heterogeneity variance is different for each comparison in pairwise meta-analyses and will estimate this parameter (τ²) for each comparison. We will also test for the presence of statistical heterogeneity within each comparison using the Q statistic. We will calculate 95% prediction intervals and consider intervals spanning greater than 15 points (on a 0- to 100-point scale) on either side to indicate important heterogeneity [36]. We will visually inspect the distribution of effect sizes in the forest plots for each comparison and consider an I² value greater than 50% indicative of important variability across studies that is not due to sampling error [63, 64].
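For a single pairwise comparison, the heterogeneity statistics named above can be computed as in the following sketch. The DerSimonian-Laird moment estimator is used for the heterogeneity variance as an assumption, since the protocol does not name an estimator; prediction intervals are omitted because they need a t-distribution quantile. The function name is illustrative.

```python
import math


def random_effects_summary(effects, variances):
    """Pooled random-effects estimate with Q, tau-squared, and I-squared.

    Uses the DerSimonian-Laird estimator (an assumption for this sketch).
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Moment estimate of the between-study variance, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Percentage of variability not due to sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {"pooled": pooled, "se": se, "Q": q, "tau2": tau2, "I2": i2}
```

An I² above 50 in the returned dictionary corresponds to the threshold for important variability stated above.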

Assumption of transitivity

Transitivity is the key assumption underlying the valid estimation of effects for indirect comparisons in NMA [38, 39]. Transitivity is the assumption that the distributions of effect modifiers (covariates associated with intervention effects) are balanced across comparisons in the network [39, 65]. Given the lack of evidence for robust effect modifiers in LBP trials [66], we have used clinical and methodological experience to identify the following potential effect modifiers:

  • Baseline mean pain intensity (continuous variable)

  • Assigned co-interventions (dichotomous variable), categorised as (i) yes or (ii) no

  • Small sample size [67] (dichotomous variable), categorised as (i) total sample fewer than 50 participants or (ii) total sample of 50 or more participants

  • Experience with test medicine (dichotomous variable), categorised as (i) yes or (ii) no

  • Naivety to test medicine (dichotomous variable), categorised as (i) yes or (ii) no

  • Dose of medicine (trichotomous variable), categorised as (i) SDR, (ii) below the SDR, or (iii) above the SDR

We will represent the distribution of these effect modifiers in a range of covariate contribution plots [68]. The authors will visually assess the distributions of effect modifiers across all treatment comparisons in the network and determine by consensus whether there is sufficient dissimilarity between comparisons in the network to threaten the assumption of transitivity. We will explore the influence of effect modifiers that demonstrate dissimilarity on incoherence/heterogeneity using network meta-regression or subgroup analyses (or both). We will consider not proceeding with NMA, or altering the network structure, if we observe considerable dissimilarity. We anticipate that insufficient reporting of effect modifiers and pairwise comparisons containing few studies will limit the assessment of transitivity [38].

Network meta-analysis

We will perform random-effects NMA within an electrical network and graph theory framework using the netmeta package in R [69, 70]. We will account for multi-arm trials using the weighting method based on back-calculating variances (using the Laplacian matrix and its pseudoinverse) [70]. We will use MDs on a common 0 to 100-point scale for pain and function and ORs for safety and acceptability. We will present the results for each intervention compared to placebo in NMA forest plots for each outcome. We will consider a 10-point difference to constitute the minimal clinically important difference for pain intensity and function [26].

We will rank the effect of all interventions on pain intensity and safety using P-scores, a method for estimating treatment rankings that does not require re-sampling methods [71]. We will present a contributions matrix to indicate the weighting of direct evidence contributions to each NMA effect size, which will also be used to evaluate the confidence in the overall evidence (see the ‘Confidence in cumulative evidence’ section) [72]. We will illustrate the contribution of each design to an NMA effect size using net heat plots [73].
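The idea behind P-scores can be sketched from the NMA's pairwise estimates: for each treatment, average the one-sided probabilities that it is better than each competitor, using a normal approximation. The input layout and function names below are assumptions for illustration; in practice the scores would come from the R package rather than hand-written code.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def p_scores(treatments, diffs, small_values_good=True):
    """Compute a P-score for each treatment.

    diffs[(a, b)] holds (estimate of a minus b, standard error), stored
    once per unordered pair. With small_values_good=True (e.g. pain
    intensity), a more negative difference favours treatment a.
    """
    scores = {}
    for a in treatments:
        total = 0.0
        for b in treatments:
            if a == b:
                continue
            if (a, b) in diffs:
                estimate, se = diffs[(a, b)]
            else:
                estimate, se = diffs[(b, a)]
                estimate = -estimate
            z = estimate / se
            total += normal_cdf(-z if small_values_good else z)
        scores[a] = total / (len(treatments) - 1)
    return scores
```

A P-score near 1 indicates a treatment that is very likely better than its competitors, and the scores across a network average to 0.5.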

Assessment of network heterogeneity and coherence

We will assume a common heterogeneity variance across the network [74]. We will present the estimate for this parameter (network τ²) from the NMA models and the estimated proportion of variability across the entire network that is not due to sampling error (network I²). We will estimate the Q statistics for total network heterogeneity (Qtotal), heterogeneity within designs (Qwithin), and heterogeneity between designs (Qbetween), where designs constitute the individual elements of the set of trial designs.

Coherence is a property of closed loops of evidence, whereby it reflects the agreement between direct and indirect treatment effects [38]. We will evaluate coherence across the entire network using the Q statistics (above); the decomposed Qwithin and Qbetween, an alternative estimate for Qbetween using the ‘design-by-treatment’ interaction model [75, 76]; and the Separating Indirect from Direct Evidence (SIDE; aka node-splitting) approach [77]. We will illustrate the extent of incoherence across the network using net heat plots [73]. We will illustrate local coherence estimates using forest plots grouped into direct and indirect estimates for each available comparison. We will form judgements about important incoherence using all the measures of global and local heterogeneity and coherence.
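The comparison of direct and indirect evidence can be illustrated in its simplest form, a single closed loop with one common comparator. This is a simplified, Bucher-style sketch and not the SIDE procedure as implemented in the R package; function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def indirect_estimate(d_ab, se_ab, d_cb, se_cb):
    """Indirect estimate of A versus C through a common comparator B."""
    return d_ab - d_cb, math.sqrt(se_ab ** 2 + se_cb ** 2)


def incoherence_test(direct, se_direct, indirect, se_indirect):
    """Difference between direct and indirect estimates with a z-test."""
    difference = direct - indirect
    se = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return difference, se, z, p_value
```

A large difference relative to its standard error suggests that the direct and indirect evidence for that comparison disagree, although small amounts of incoherence may arise by chance.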

If we encounter important incoherence, we will examine the dataset for data extraction errors and explore the observed incoherence using pre-specified covariates in network meta-regression and subgroup analyses, provided sufficient studies are available. We may consider not proceeding with NMA if important unexplained incoherence remains. This judgement will involve the clinical and methodological evaluation of transitivity, the approaches to identify incoherence, and the knowledge that small amounts of incoherence may be due to chance [78].

Meta-regression

We aim to perform random-effects network meta-regression within a Bayesian hierarchical framework using the gemtc package in R [79]. The gemtc package will automatically determine uninformative prior distributions for all parameters in our model [80], which are commonly applied in NMA [47, 81]. We will run the Markov Chain Monte Carlo simulation with four chains for each model, using 100,000 iterations, a burn-in of 5000 iterations and extraction of every 10th value. We will assess convergence with the Gelman-Rubin-Brooks plot and Potential Scale Reduction Factor (a threshold of < 1.05 indicates adequate convergence).
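The convergence check can be illustrated with the basic form of the potential scale reduction factor, which compares between-chain and within-chain variance for one parameter. This sketch uses the classic formula without the degrees-of-freedom correction that some software applies, so values may differ slightly from those reported by R packages; the function name is illustrative.

```python
import math
import statistics


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for one parameter.

    chains is a list of equal-length lists of post-burn-in samples,
    one list per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # Mean within-chain variance.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # Between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled_variance = (n - 1) / n * within + between / n
    return math.sqrt(pooled_variance / within)
```

Values below the 1.05 threshold stated above would be read as adequate convergence for that parameter.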

We will investigate baseline pain intensity (continuous variable) and sample size (dichotomous variable) as possible sources of incoherence or heterogeneity by default. We will specify our assumptions (common or exchangeable covariate-comparison interaction) for each network meta-regression model once the data are available, so as to make best use of the available data [82,83,84]. We assume coherent relative treatment effects estimated at the covariate value 0 and coherent regression coefficients for the treatment effect by covariate interaction [84, 85]. We hypothesise that:

  • Increasing baseline pain intensity increases the effect size between intervention and comparator.

  • Increasing sample size reduces the effect size between intervention and comparator.

We may also investigate clinical or methodological factors identified during the review process that may threaten transitivity as sources of incoherence or heterogeneity, or both [86].

Sensitivity analysis

We will conduct sensitivity analyses on pain and safety excluding studies at high risk of bias, provided the original network structure remains the same. We will also conduct sensitivity analyses on pain and safety excluding doses above or below the SDR, provided the original network structure remains the same. If sufficient data are not available for network meta-regression on baseline pain intensity or sample size, we will conduct sensitivity analyses by removing trials with baseline mean pain intensity higher than 70/100 (VAS) or sample size less than 50, respectively. We will also conduct sensitivity analyses for pain at end of treatment and safety if we impute missing data for either of these outcomes. We will consider the effects in the sensitivity analyses to be important when their interpretation differs from that of the primary analysis, for example, if a statistically significant effect becomes non-significant.

Meta-bias(es)

We will assess small study effects in pairwise comparisons using comparison-adjusted [82] and contour-enhanced [87] funnel plots when there are at least 10 studies available. Such plots help to distinguish asymmetry that is due to publication bias from asymmetry due to other factors, such as lower methodological quality. We will interpret an absence of studies in areas of non-significance as suggestive of publication bias for that pairwise comparison.

Confidence in cumulative evidence

We will form judgements of confidence in the effect estimates and rankings for pain, safety, function, and acceptability using the Confidence in Network Meta-Analysis (CINeMA) web application [88, 89]. CINeMA considers six domains: within-study bias, reporting bias, indirectness, imprecision, heterogeneity, and incoherence (Additional file 4) [72]. Initially, judgements will be rated as ‘high’ because all included trials will be RCTs.

Summary of findings

We will present the results of the NMA in adapted ‘Summary of findings’ tables for pain and safety [90, 91]. The tables will contain details of the clinical question, a network geometry plot, relative and absolute effect estimates, certainty of evidence, ranking of treatments, and interpretation of findings.