Spinal manipulative therapy in older adults with chronic low back pain: an individual participant data meta-analysis

Many systematic reviews have reported on the effectiveness of spinal manipulative therapy (SMT) for low back pain (LBP) in adults. Much less is known about the effects of SMT in the older population. The objective was to assess the effects of SMT on pain and function in older adults with chronic LBP in an individual participant data (IPD) meta-analysis. Electronic databases were searched from 2000 until June 2020, along with reference lists of eligible trials and related reviews. Randomized controlled trials (RCTs) that examined the effects of SMT in adults with chronic LBP compared to interventions recommended in international LBP guidelines were eligible. Authors of trials eligible for our IPD meta-analysis were contacted to share data. Two review authors conducted a risk of bias assessment. Primary results were examined in a one-stage mixed model, and a two-stage analysis was conducted to confirm findings. Pain and functional status were examined at 4, 13, 26, and 52 weeks. Ten studies were retrieved, including 786 individuals, of whom 261 were between 65 and 91 years of age. There is moderate-quality evidence that SMT results in similar outcomes to recommended interventions at 4 weeks (pain: mean difference [MD] −2.56, 95% confidence interval [CI] −5.78 to 0.66; functional status: standardized mean difference [SMD] −0.18, 95% CI −0.41 to 0.05). Second-stage and sensitivity analyses confirmed these findings. SMT provides similar outcomes to recommended interventions for pain and functional status in older adults with chronic LBP. SMT should be considered a treatment for this patient population.


LBP, such as complementary health approaches [10]. Finding safe and effective treatments for the older adult with LBP should be a priority. One such non-pharmacological approach is spinal manipulative therapy (SMT), which is a technique used worldwide by a variety of healthcare providers, such as chiropractors, osteopaths, and physiotherapists.
Many systematic reviews and meta-analyses have analyzed the effects of SMT [11]. Their results suggest that it is an effective intervention for both the reduction in pain and the improvement in function, two of the core domains in LBP trials [12]. Systematic reviews examining the effectiveness of various non-pharmacological treatments in older adults with LBP [13,14] identified only three studies that assessed the effect of SMT. Two of the three trials were included in this analysis, and the third trial was excluded due to average age below 55. Therefore, it is difficult to draw valid conclusions for this patient population due to the lack of trials.
Given this, one approach to examine the effectiveness of SMT in older adults with LBP is to perform individual participant data (IPD) meta-analysis. This type of analysis has distinct advantages over traditional aggregate meta-analysis. In an IPD meta-analysis, we can select certain individuals since we have the individual data for each participant. This is more efficient than setting up new trials, particularly if the data are sufficient in order to allow for a meaningful analysis. Additional advantages of IPD include allowing the investigator to analyze the data independently of how the data were reported in the original publication. This is in contrast to the traditional aggregate approach in which meta-analyses extract data at the study level, meaning that the author(s) of the review must rely on how the data were analyzed and presented originally. Additionally, IPD makes it possible to correct for baseline covariates which may influence the results, enabling a more precise, and thereby potentially more valid, calculation of the effect estimates [15].
In short, LBP is a common cause of disability in the older adult [16], and our current knowledge of LBP in this patient group is limited [17]. The objective of this IPD meta-analysis is to assess the effectiveness of SMT versus interventions recommended by the guidelines at 1-, 3-, 6-, and 12-month follow-up in older adults with chronic LBP.

Methods
This study was conducted according to the Preferred Reporting Items of Systematic Reviews and Meta-Analyses for IPD (PRISMA-IPD) guidelines [18] (Appendix 1).

IPD database
A detailed description of the IPD database design and the procedures followed was published previously [19]. The protocol [19] for the original study upon which this analysis is based was registered with PROSPERO (https://www.crd.york.ac.uk/prospero/25714). The database includes the raw data from 21 RCTs, which were published between 2000 and April 2018 [11], examining the effects of SMT on chronic LBP. This study used the IPD database defined above and represents a secondary analysis as defined a priori in PROSPERO. We updated the search from April 2018 to June 2020, identifying one trial that met our inclusion criteria [20].

Study selection
Trials examining the effects of SMT versus recommended therapies in the older age-group with chronic low back pain were included.
Inclusion criteria: Patients with chronic LBP with or without leg pain, defined as LBP of > 12 weeks of duration and not attributable to a recognizable, known specific pathology (e.g., infection, fracture, tumor, radicular syndrome, or herniation) were included. Additionally, trials from primary or secondary care settings were included. When a mixed population was involved (e.g., subacute and chronic), only those participants with > 12 weeks of LBP were included. For this IPD meta-analysis, we selected only those trials that had included participants aged 55 and older.
Exclusion criteria: Studies were excluded if they: (1) used an inadequate randomization procedure (e.g., alternate allocation, allocation based on birth date); (2) included patients with LBP and other conditions, such as pregnancy or postoperative patients; (3) tested the immediate effect of a single treatment only; (4) compared the effects of a multimodal therapy including SMT to another therapy or any other study design whereby the contribution of SMT could not be isolated; and (5) included patients where there was a contraindication to SMT.
Comparison: We addressed the effects on pain and functional status of SMT versus interventions (e.g., exercise therapy, usual care) that are consistently recommended in international guidelines [21][22][23][24], while SMT is not [25]. The determination for recommended therapy was based on Rubinstein et al. [11]. We categorized an intervention into 'recommended' when this was consistently stated in at least two of the guidelines.

Types of outcome measures
Primary outcomes were pain and back-specific functional status, as recommended in the core set of outcome measurements in LBP [12].

Risk of bias assessment
The 13 risk of bias criteria recommended by the Cochrane Back and Neck group were used [26] (Appendix 2). These 13 criteria are used to identify selection bias, performance bias, attrition bias, detection bias, and selective outcome reporting bias.
Data extracted were study characteristics, patient characteristics, types of outcomes, duration of follow-up, and descriptions of experimental and control interventions.

Preparing data for analyses
The original data were compared with the published data to check for completeness. All variables were then harmonized in a data harmonization platform developed for a previous IPD analysis [27].
All outcomes were pooled following a decision rule (Appendix 3). All pain scores were converted to a 0-100 pain scale (a higher score indicates more pain). To allow pooling of different functional status measures, we recoded the individual scores into Z-scores for each separate time point, using pooled standard deviations as the denominator: $Z = (x_i - \bar{x}) / SD$. Analyzing these Z-scores resulted in standardized mean differences (SMDs). To ease interpretation of SMDs, we converted these to a mean difference (MD) for the 24-point Roland Morris Disability Questionnaire (RMDQ) by multiplying the SMD by the population standard deviation (SD) of the studies, $SD = \sqrt{\sum_i (n_i - 1) S_i^2 / \sum_i (n_i - 1)}$, where $n_i$ is the sample size and $S_i$ the standard deviation of trial $i$.
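The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example numbers are hypothetical, and the pooled SD assumes the usual form with a square root over the weighted variances.

```python
from math import sqrt


def pooled_sd(sample_sizes, sds):
    """Pooled SD across trials: sqrt(sum((n_i - 1) * S_i^2) / sum(n_i - 1))."""
    if len(sample_sizes) != len(sds) or not sample_sizes:
        raise ValueError("need one SD per trial")
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sample_sizes, sds))
    denominator = sum(n - 1 for n in sample_sizes)
    return sqrt(numerator / denominator)


def z_score(score, mean, sd):
    """Standardize one participant's functional status score."""
    return (score - mean) / sd


def smd_to_md(smd, sd):
    """Re-express a standardized mean difference on the original scale."""
    return smd * sd


# Hypothetical trial-level inputs (not taken from the included trials).
sd_rmdq = pooled_sd([50, 80, 120], [4.8, 5.2, 5.0])
md_rmdq = smd_to_md(-0.18, sd_rmdq)  # SMD of -0.18 expressed in RMDQ points
```

With an SD of about 5 RMDQ points, an SMD of −0.18 corresponds to roughly −0.9 points on the 24-point scale.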

Data analysis and synthesis
All analyses were based on the intention-to-treat principle. Our primary analyses consisted of one-stage IPD meta-analysis at 4, 13, 26, and 52 weeks of follow-up. We chose these specific intervals as they are standard follow-up moments in the treatment of LBP [11]. We did not examine the effects of SMT post-intervention as there were large inter-study variations in the number and frequency of treatments and, consequently, the duration of therapies and follow-up data for the period immediately following the end of treatment. Lastly, longitudinal analyses were not performed as the models are too complex and do not converge.
Analyses were conducted using a random-effects model that was adjusted for baseline using the restricted maximum likelihood (REML) method, where a separate intercept and a separate residual variance for each study are specified. However, in most analyses, these models did not demonstrate convergence. Instead, we present the results adjusted for baseline and with a random intercept and common residual variance [28].
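As a sketch of the model that was ultimately reported (baseline-adjusted, random intercept per study, common residual variance), one possible specification is given below. The notation is assumed for illustration and does not come from the source: $y_{ij}$ is the outcome of participant $i$ in study $j$, $y^{0}_{ij}$ the baseline value, $T_{ij}$ the treatment indicator (1 = SMT), and $\theta$ the pooled treatment effect.

```latex
y_{ij} = \beta_0 + u_j + \beta_1\, y^{0}_{ij} + \theta\, T_{ij} + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

The specification that failed to converge would replace the common $\sigma^2$ with a study-specific residual variance $\sigma_j^2$ and use a separate intercept for each study.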
The pooled treatment effects of SMT were estimated with a mean difference or Z-score for continuous outcomes, including the 95% confidence interval (CI).

Sensitivity and subgroup analyses
In order to examine whether the RCTs included in this IPD meta-analysis were a representative sample of all RCTs assessing the effects of SMT in older adult patients, we conducted a two-stage sensitivity analysis wherein we examined the effect sizes of RCTs both included in this IPD meta-analysis and those which were eligible for inclusion, but for which no IPD were available. For the latter, we used published aggregate data of those eligible trials that had an average age above 55.
A post hoc sensitivity analysis was performed as the one-stage and two-stage estimates at 4 weeks for the outcome pain were not similar.
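To illustrate the second stage of a two-stage analysis, the sketch below pools study-level estimates with inverse-variance weights. The source does not state which estimator was used, so the DerSimonian-Laird between-study variance here is an assumption, as are the function name and the 1.96 multiplier for the 95% CI.

```python
from math import sqrt


def pool_random_effects(estimates, std_errors, z=1.96):
    """Pool study-level effects; returns (pooled, ci_low, ci_high, tau2)."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and the DerSimonian-Laird estimate of tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = sqrt(1.0 / sum(w_star))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2
```

In practice, each study's estimate would come either from a within-study model fitted to its IPD or from published aggregate data, which is how trials with and without IPD can be combined.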
Lastly, we performed a moderator analysis for age. Age was dichotomized into 55-64 and 65 years and older. This moderator was analyzed using a one-stage random-effects IPD meta-analysis. The baseline outcome, treatment, age, and interaction between treatment and age were included as fixed effects. Study-specific intercepts were also included as fixed effects. Random treatment and interaction effects were added to the model. We performed these analyses for each time point and age separately to facilitate convergence of models. Centering the patient-level covariates about their study-specific means enabled us to separate the within- and across-study interactions [28]. The within-study interaction explained the patient-level variation in treatment response, while the across-study interaction represented the age effect on study level. We present the within-study interactions. A negative interaction coefficient indicates a more positive or less negative estimate of the intervention effects of SMT vs comparison for the group 65 years and older compared to 55-64 years old.
We refrained from presenting stratified results for subgroups of moderator variables, because these included a combination of within- and across-study information due to differences in proportions of persons within the separate subgroups between studies.
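The centering step can be sketched as follows. This is an illustrative assumption about the data layout, not the authors' implementation: each record is a dict with a hypothetical `study` key and an `older` indicator (1 for 65 years and older, 0 for 55-64).

```python
from collections import defaultdict


def center_within_study(records):
    """Add the study mean of `older` and the centered covariate to each record."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in records:
        totals[r["study"]][0] += r["older"]
        totals[r["study"]][1] += 1
    means = {study: total / n for study, (total, n) in totals.items()}
    centered = []
    for r in records:
        m = means[r["study"]]
        # The centered covariate carries within-study information only;
        # the study mean carries the across-study information.
        centered.append({**r, "older_study_mean": m, "older_centered": r["older"] - m})
    return centered


# Hypothetical participants from two studies.
rows = center_within_study([
    {"study": "A", "older": 1}, {"study": "A", "older": 0},
    {"study": "A", "older": 0}, {"study": "A", "older": 0},
    {"study": "B", "older": 1}, {"study": "B", "older": 1},
])
```

Interacting treatment with `older_centered` would then estimate the within-study interaction, while an interaction with `older_study_mean` would capture the across-study one.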

Synthesis of evidence
The overall quality of the evidence for each outcome was evaluated using the GRADE approach [29] (Appendix 4), and assessment of clinical relevance was defined as small, medium, or large effect [26,30]. Results from meta-epidemiological studies suggest that selection bias (i.e., randomization) and performance bias (i.e., blinding) are perhaps the more important forms of bias which influence treatment effects [31]; therefore, we focused on these two aspects when considering 'limitations' as part of the GRADE process.

Results
In total, of the 21 trials in the IPD database, ten RCTs met the inclusion criteria, all of which provided data for the primary analysis [32][33][34][35][36][37][38][39][40][41] (Table 1) (Fig. 1). One trial [42] that did not provide data and had an average age above 55 was used in the second-stage analysis. In total, 786 participants aged ≥ 55 years were examined (403 were randomized to the SMT group and 383 were randomized to the comparison group) (Table 2). Two studies [20,42] fit the inclusion criteria but did not provide individual data (Table 3). Their aggregate published results were used in the two-stage analyses (Table 4) as they had an average age above 55. We identified 261 participants (from a total of 786) older than 65 years of age originating from seven studies [32][33][34][35][36][37][40], representing a third of all cases. Of the 261 patients, three quarters came from three studies [33,34,36] and were evenly distributed between treatment arms.

Description of studies
Of the ten RCTs, nine compared SMT to exercise therapy [32][33][34][35][36][37][38][39]41] and one evaluated the effects of SMT compared to standard medical care [40] (consisting of drug and nondrug therapies). The included trials varied with respect to recruitment method, type of SMT technique, number and duration of treatments, and type of practitioner (Table 1).
Sample sizes ranged from 5 to 220 (median = 78.6; interquartile range [IQR] = 16-132). It should be noted that some trials had multiple arms, and some included non-chronic LBP patients; therefore, the sample size for a given comparison should be considered to be smaller.
The patient characteristics at baseline for SMT versus recommended interventions are presented in Table 2. The average age of all participants was 63 years (standard deviation [SD] 6.7), and slightly more than half (58.4%) were women.

Risk of bias
Approximately 80% of the studies (n = 8/10) reported an adequate random sequence generation and allocation concealment [32,33,35,36,38,40,41]. Two trials provided an adequate overview of withdrawals or dropouts and were able to keep these to a minimum for the subsequent follow-up measurements [34,40].
Missing data for primary outcomes ranged from 12% at 4 weeks to 21% at 52 weeks.

Effects of SMT vs recommended interventions
Pain and function improved by the end of treatment, and this improvement was sustained up to 12 months after randomization for all groups (Table 3).

Pain
There is moderate quality evidence that SMT has similar benefits to recommended interventions at all time points for pain (Table 3). The mean difference (MD) for SMT compared to recommended interventions is −2.56 (95% CI −5.78 to 0.66; scale 0-100) after 1 month, and these effects appear similar over the subsequent 12 months (Table 4). Further analysis of the group of patients 65 and older showed similar effects (MD −2.46; 95% CI −7.41 to 2.48; scale 0-100) after 1 month, which appear similar over the subsequent 12 months (Appendix 5).

Functional status
There is moderate quality evidence that SMT has similar benefits to recommended interventions at all time points for functional status (Table 3).

Sensitivity analyses and subgroup analysis
We identified two trials to be included in the two-stage analysis, one from the original systematic review [42] and the other from our updated search [20]. We included the aggregate results of these studies in the second-stage analysis after a risk of bias assessment. The two-stage analysis showed an MD similar to the one-stage analysis except for pain at 4 weeks (Table 4). The difference at 4 weeks was a result of two studies that included 5 patients, yet had a large effect on recommended therapies. The second-stage analysis confirmed the results of the one-stage analysis at all time points, showing robustness of the effect in both analyses (Appendix 5). A subgroup analysis using age as a moderator showed similar results to a previous IPD meta-analysis [43], namely that age does not moderate the effect of the treatment (Appendix 5) (Tables 5 and 6).

Discussion
These results suggest that SMT has similar effects to recommended interventions, mainly exercise therapy, at the short, intermediate, and long term. This is the first IPD meta-analysis to examine the effects of SMT in older adults with LBP, although admittedly, the majority of subjects (two-thirds) were between 55 and 65 years of age; therefore, these results should perhaps be interpreted with caution. However, if there were big differences in effects, this might have become intuitively obvious from this subgroup analysis [43,44]. Using age as a moderator also did not change the effects at all time points. The importance of these findings cannot be sufficiently underscored. Given the growing aging population and the burden of LBP, there is a need to provide safe, conservative treatments. These data provide support for the use of SMT in this population. These findings have important implications. The recent Lancet series [45] suggests that SMT should be considered a second treatment option, following the more commonly recommended treatments for chronic LBP (e.g., exercise). Our results suggest that SMT produces similar effects to other commonly recommended interventions for older patients with LBP. This is particularly pertinent because prior to this analysis, these effects were unclear. However, a note of caution is perhaps necessary because we did not examine adverse reactions in detail. These data were not registered in any systematic way in the individual studies and were not directly available; therefore, uncommon and potentially serious adverse reactions cannot be ruled out. Importantly, our results appear consistent with recent systematic reviews using aggregate data on the effects of SMT for adults with LBP [11] as well as older adults [13,14].
An important difference of our IPD analysis compared to traditional aggregate meta-analyses is that we could adjust for the baseline pain and functional status and were not dependent upon how these data were reported in the original publications. This adjustment increased the precision of our estimates compared to aggregate data meta-analyses, but did not lead to a different conclusion for the main effects. Adverse events were often not reported by trial authors, and when reported there was no uniformity in how this was done, particularly for older patients; therefore, these data do not provide more information than the adverse events described in our systematic review of aggregate data [11]. The adverse events which were reported are likely to be more serious events for which reporting was required, or were unrelated to SMT [20]. Nevertheless, there may be a theoretically increased risk with SMT (e.g., in patients with osteoporosis), which would need to be examined in future studies and compared to recommended interventions such as exercise therapy. In short, the risk of (major) adverse events is likely to be very low and may reflect adaptation by the therapist for this patient population.

Strengths and weaknesses
These results should be interpreted in light of a few strengths and limitations. The most important strength is that we included 786 patients from ten trials in our analysis. Furthermore, these patients came from 10 of the 11 trials that could have provided data, which minimized selection bias. Additionally, all trials provided data for pain and functional status for all the time points analyzed; lastly, the one-stage estimates were confirmed by the two-stage analysis, suggesting that our effect estimates were robust.

Study limitations
There are, however, some important limitations. Inclusion bias cannot be ruled out. We may have missed some important studies published after 2018. In order to determine whether this might be the case, we performed a cursory search of the literature in PubMed (up to June 2020). We identified 18 potential articles. Upon further analysis, 17 were excluded for various reasons, including younger age, lack of randomization, a protocol, or other type of comparison (e.g., SMT as adjuvant therapy). In short, only one study fulfilled the inclusion criteria which could have been included in an update. We analyzed that trial [20] in the two-stage analysis, and those results were consistent with the one-stage analysis.
Additionally, selection bias cannot be ruled out; 11 of the 21 studies identified in the search did not provide IPD. Of those, three included patients older than 55 [46][47][48], but relatively few subjects would have been included because the average age was under 55 (SDs ranged from 12 to 15). One trial [42] included subjects with an average age over 55 and was examined in our second-stage analysis. Again, those results were consistent with our one-stage analysis. This suggests that our analysis is representative of all subjects that could have been included and, therefore, robust.

Implications for clinicians
SMT appears to be similarly effective to recommended therapies for reducing pain and improving function in older patients with chronic LBP, meaning SMT may be delivered as a stand-alone therapy. Future research should focus on identifying which older adults are best suited for SMT, taking lifestyle factors, comorbidities, and level of physical activity into account.

Conclusion
SMT is as effective as recommended interventions for the treatment of chronic low back pain in the older adult. About two-thirds of the data came from adults aged 55-64, yet sensitivity analysis in the second stage and using age as a moderator showed results were similar across all age-groups. Therefore, SMT should be considered a treatment option in this patient population.

Describe the meta-analysis methods used to synthesize IPD. Specify any statistical methods and models used. Issues should include (but are not restricted to): Use of a one-stage or two-stage approach. How effect estimates were generated separately within each study and combined across studies (where applicable). Specification of one-stage models (where applicable) including how clustering of patients within studies was accounted for. Use of fixed or random effects models and any other model assumptions, such as proportional hazards. How (summary) survival curves were generated (where applicable). Methods for quantifying statistical heterogeneity (such as I² and τ²). How studies providing IPD and not providing IPD were analyzed together (where applicable). How missing data within the IPD were dealt with (where applicable).
Completed

Exploration of variation in effects (A2): If applicable, describe any methods used to explore variation in effects by study or participant level characteristics (such as estimation of interactions between effect and covariates). State all participant-level characteristics that were analyzed as potential effect modifiers, and whether these were pre-specified.

For each study, present information on key study and participant characteristics (such as description of interventions, numbers of participants, demographic data, unavailability of outcomes, funding source, and if applicable duration of follow-up). Provide (main) citations for each study. Where applicable, also report similar study characteristics for any studies not providing IPD.
Completed and Table 1

IPD integrity (A3): Report any important issues identified in checking IPD or state that there were none.
There were none

Risk of bias within studies (19): Present data on risk of bias assessments. If applicable, describe whether data checking led to the up-weighting or down-weighting of these assessments. Consider how any potential bias impacts on the robustness of meta-analysis conclusions.
Article completed

Results of individual studies (20): For each comparison and for each main outcome (benefit or harm), for each individual study report the number of eligible participants for which data were obtained and show simple summary data for each intervention group (including, where applicable, the number of events), effect estimates and confidence intervals. These may be tabulated or included on a forest plot.

Describe sources of funding and other support (such as supply of IPD), and the role in the systematic review of those providing such support.
Article completed page 10

A1-A3 denote new items that are additional to standard PRISMA items. A4 has been created as a result of re-arranging content of the standard PRISMA statement to suit the way that systematic review IPD meta-analyses are reported.
© Reproduced with permission of the PRISMA IPD Group, which encourages sharing and reuse for non-commercial purpose.

Appendix 2: Criteria for a judgment of 'low risk of bias' for the sources of bias
1. A random (unpredictable) assignment sequence. Examples of adequate methods are coin toss (for studies with 2 groups), rolling a dice (for studies with 2 or more groups), drawing of balls of different colors, drawing of ballots with the study group labels from a dark bag, computer-generated random sequence, pre-ordered sealed envelopes, sequentially-ordered vials, telephone call to a central office, and pre-ordered list of treatment assignments. Examples of inadequate methods are: alternation, birth date, social insurance/security number, date in which they are invited to participate in the study, and hospital registration number.
2. Assignment generated by an independent person not responsible for determining the eligibility of the patients. This person has no information about the persons included in the trial and has no influence on the assignment sequence or on the decision about eligibility of the patient.
3. Index and control groups are indistinguishable for the patients or if the success of blinding was tested among the patients and it was successful.
4. Index and control groups are indistinguishable for the care providers or if the success of blinding was tested among the care providers and it was successful.
5. Adequacy of blinding should be assessed for each primary outcome separately. This item should be scored 'low risk' if the success of blinding was tested among the outcome assessors and it was successful or:
   - for patient-reported outcomes in which the patient is the outcome assessor (e.g., pain, disability): the blinding procedure is adequate for outcome assessors if participant blinding is scored 'low risk';
   - for outcome criteria assessed during scheduled visit and that supposes a contact between participants and outcome assessors (e.g., clinical examination): the blinding procedure is adequate if patients are blinded, and the treatment or adverse effects of the treatment cannot be noticed during clinical examination;
   - for outcome criteria that do not suppose a contact with participants (e.g., radiography, magnetic resonance imaging): the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed when assessing the main outcome;
   - for outcome criteria that are clinical or therapeutic events that will be determined by the interaction between patients and care providers (e.g., co-interventions, hospitalization length, treatment failure), in which the care provider is the outcome assessor: the blinding procedure is adequate for outcome assessors if item '4' (caregivers) is scored 'low risk';
   - for outcome criteria that are assessed from data of the medical forms: the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed on the extracted data.
6. The number of participants who were included in the study but did not complete the observation period or were not included in the analysis must be described and reasons given. If the percentage of withdrawals and drop-outs does not exceed 20% for short-term follow-up and 30% for long-term follow-up and does not lead to substantial bias a 'low risk' is scored. (N.B. these percentages are arbitrary, not supported by literature.)
7. All randomized patients are reported/analyzed in the group they were allocated to by randomization for the most important moments of effect measurement (minus missing values) irrespective of noncompliance and co-interventions.
8. All the results from all pre-specified outcomes have been adequately reported in the published report of the trial. This information is either obtained by comparing the protocol and the report, or in the absence of the protocol, assessing that the published report includes enough information to make this judgment. In the absence of a protocol, the authors judged this item to have been met when the expected outcomes (i.e., pain and functional status) were reported.
9. Groups have to be similar at baseline regarding demographic factors, duration and severity of complaints, percentage of patients with neurological symptoms, and value of main outcome measure(s).
10. If there were no co-interventions or they were similar between the index and control groups.
11. The reviewer determines if the compliance with the interventions is acceptable, based on the reported intensity, duration, number and frequency of sessions for both the index intervention and control intervention(s). For example, physiotherapy treatment is usually administered for several sessions; therefore it is necessary to assess how many sessions each patient attended. For single-session interventions (e.g., surgery), this item is irrelevant.
12. Timing of outcome assessment should be identical for all intervention groups and for all primary outcome measures.
13. Other types of biases. For example:
   - When the outcome measures were not valid. There should be evidence from a previous or present scientific study that the primary outcome can be considered valid in the context of the present.
   - Industry-sponsored trials. The conflict of interest (COI) statement should explicitly state that the researchers have had full possession of the trial process from planning to reporting without funding bodies with potential COI having any possibility to interfere in the process. If, for example, the statistical analyses have been done by a funding body with a potential COI, usually 'unsure' is scored.

Analysis plan of outcome pain
All of the RCTs in the repository had asked participants to rate or mark on a numerical rating scale or a visual analogue scale that described either their average pain at the present time or over a defined period of weeks or months. This item was presented either as a single standalone instrument or as an item that was part of a collective pain measurement.

Analysis of average pain and pain intensity separately
For the analyses of average pain, one of the following instruments from each trial, where available, was chosen (in descending order):
(1) Individual visual analogue scale (VAS) on average pain today
(2) Average pain over the past one week
(3) The individual item of the Von Korff pain intensity score that is equivalent to the VAS, if it is available
For the analyses of pain intensity, one of the following instruments from each trial, where available, was chosen (in descending order):
(1) Individual visual analogue scale (VAS) on pain intensity today (0-10 or 0-100)
(2) Pain intensity over the past one week (0-10 or 0-100)
(3) The individual item of the Von Korff pain intensity score that is equivalent to the VAS, if it is available
(4) Roland Morris pain score (0-6), divided by 6 and multiplied by 10
All measures will be scaled to a 0-100 scale.

Combining average pain and pain intensity
For the analyses of pain, one of the following instruments from each trial, where available, was chosen (in descending order):
(1) Individual visual analogue scale (VAS) on average pain today
(2) Average pain over the past one week
(3) The average pain item of the Von Korff pain intensity score that is equivalent to the VAS, if it is available
(4) Individual visual analogue scale (VAS) on pain intensity today (0-10 or 0-100)
(5) Pain intensity over the past one week (0-10 or 0-100)
(6) The summary score of the Von Korff pain intensity score
(7) Roland Morris pain score (0-6), divided by 6 and multiplied by 10
All measures were scaled to a 0-100 scale.
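The selection hierarchy and the rescaling step can be illustrated with a minimal sketch. The instrument keys and function names below are hypothetical labels chosen for illustration, not variable names from the trial repository, and the linear mapping assumes every scale starts at 0 unless stated otherwise.

```python
# Hypothetical instrument keys, listed in the descending order of preference
# described in the text. They are illustrative labels only.
PAIN_HIERARCHY = [
    "vas_average_today",
    "average_past_week",
    "von_korff_average_item",
    "vas_intensity_today",
    "intensity_past_week",
    "von_korff_summary",
    "roland_morris_pain",
]


def rescale_to_100(score, scale_max, scale_min=0.0):
    """Linearly map a score from [scale_min, scale_max] onto 0-100."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * (score - scale_min) / (scale_max - scale_min)


def select_pain_score(available):
    """Pick the highest-priority instrument a trial recorded.

    `available` maps an instrument key to (raw score, scale maximum).
    Returns (instrument key, score on 0-100), or None when the trial
    recorded none of the listed instruments.
    """
    for name in PAIN_HIERARCHY:
        if name in available:
            raw, scale_max = available[name]
            return name, rescale_to_100(raw, scale_max)
    return None
```

For a participant with a Roland Morris pain score of 3 (0-6) and a past-week pain intensity of 4 (0-10), the sketch selects the past-week intensity item because it ranks higher, and returns 40 on the 0-100 scale. Mapping the Roland Morris score directly to 0-100 gives the same result as dividing by 6, multiplying by 10, and then rescaling from 0-10.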

Analysis plan of outcome functional status
All of the RCTs in the repository but one had asked participants to rate or mark their functional status. Different functional status questionnaires were used. For the analyses of functional status, the sum scores of the following instruments were analyzed separately:
• RMDQ: only the sum score of all studies will be analyzed
• ODI: the sum score of all studies will be analyzed

Combined measure of functional status
For the analyses of the combined measure of functional status, one of the following instruments from each trial, when available, was chosen (in descending order):
(1) RMDQ
(2) ODI
(3) Von Korff disability scale
(4) Other functional status questionnaires
One-stage analysis
• The individual scores will be recoded into z-scores by subtracting the mean score at baseline from the individual score, and dividing the result by the mean standard deviation at baseline. Subsequently, the pooled z-scores will be used for further analyses.
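As a rough illustration of the recoding step, the sketch below standardizes scores against baseline values of the same instrument. It uses the conventional form z = (score - baseline mean) / baseline SD, and it assumes the sample standard deviation of the supplied baseline scores stands in for the baseline standard deviation; both the sign convention and that choice of SD are assumptions, and the function name is hypothetical.

```python
from statistics import mean, stdev


def baseline_z_scores(scores, baseline_scores):
    """Standardize scores against the baseline mean and SD of one instrument.

    Assumes the conventional form z = (score - mean) / SD, with the
    sample SD (n - 1 denominator) of `baseline_scores`.
    """
    mu = mean(baseline_scores)
    sd = stdev(baseline_scores)
    if sd == 0:
        raise ValueError("baseline SD is zero; z-scores are undefined")
    return [(x - mu) / sd for x in scores]
```

With baseline scores of 8, 10, and 12 (mean 10, SD 2), follow-up scores of 12, 10, and 6 become z-scores of 1, 0, and -2. Because the result is unitless, scores from different questionnaires can then be pooled in one model.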
Two-stage analysis
• The standardized mean difference will be calculated for each study.

The evidence was graded on the following five domains (i.e. limitations/risk of bias, inconsistency, indirectness, imprecision, publication bias) in the following manner:

Limitations/risk of bias
Limitations in the study design refers to the way in which the various forms of bias may influence the estimates of the treatment effect.

We examined all studies for the following forms of bias:
• Selection bias (random sequence generation, allocation concealment, group similarities at baseline);
• Performance bias (blinding of participants and/or healthcare providers);
• Attrition bias (drop outs and intention-to-treat analysis);
• Detection bias (blinding of the outcome assessors and timing of outcome assessment);
• Reporting bias (selective reporting).
There is evidence that selection bias, specifically concealment of the allocation, and performance bias are most closely associated with treatment effect (Juni 2001; Savovic 2017). Therefore, we considered downgrading the quality of the evidence as follows:
• By one level: when the majority of subjects (> 50%) came from studies in which selection bias (specifically, the allocation concealment was not conducted properly) and performance bias were present;
• By two levels: when the majority of subjects (> 50%) came from studies in which selection bias (specifically, the allocation concealment was not conducted properly) and performance bias were present, and bias was also present in one or more other categories.
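One possible reading of this rule, sketched under stated assumptions: the "majority of subjects" is taken as the share of all participants contributed by studies that meet the stated bias conditions. The dictionary keys and the function name are hypothetical, and the per-study bias judgements are assumed to have been made already.

```python
def risk_of_bias_downgrade(studies):
    """Return 0, 1 or 2 levels of downgrading for limitations/risk of bias.

    Each study is a dict with hypothetical keys:
      n           : number of participants contributed
      selection   : True if allocation concealment was not properly conducted
      performance : True if performance bias was present
      other       : True if bias was present in one or more other categories
    """
    total = sum(s["n"] for s in studies)
    if total == 0:
        raise ValueError("no participants")
    # Participants from studies with both selection and performance bias.
    both = sum(s["n"] for s in studies if s["selection"] and s["performance"])
    # Subset of those studies that also have bias in another category.
    both_plus_other = sum(
        s["n"]
        for s in studies
        if s["selection"] and s["performance"] and s["other"]
    )
    if both_plus_other / total > 0.5:
        return 2
    if both / total > 0.5:
        return 1
    return 0
```

With three illustrative studies of 100, 60, and 40 participants, where the first two have both selection and performance bias and only the second has additional bias, 80% of participants meet the first condition and 30% meet the second, so the sketch returns a downgrade of one level.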

Inconsistency
Inconsistency refers to an unexplained heterogeneity of results. Widely differing estimates of the treatment effect (i.e. heterogeneity or variability in results) across studies suggest true differences in the underlying treatment effect. Inconsistency may arise from differences in the populations (e.g. patients treated for low-back pain in primary care may demonstrate a different treatment response than those treated in secondary or tertiary care; or those with non-specific low-back pain may demonstrate different effects as opposed to those with radiating pain), differences in the interventions (e.g. high-velocity SMT versus low-velocity SMT), or differences in the timing of the outcome measurements. The results of the two-stage analysis will be used, as heterogeneity cannot be elicited from the one-stage analysis. We considered downgrading the quality of the evidence as follows:
• By one level: when the heterogeneity or variability in results was large (e.g. I² statistic value > 50%, representing potentially substantial heterogeneity). The I² statistic value will be collected from the two-stage analysis.
• By two levels: when the heterogeneity or variability in results was large AND there was inconsistency arising from differences in the populations, interventions, or outcomes.
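To make the I² threshold concrete, the sketch below computes I² from per-study effect estimates and standard errors using Cochran's Q with inverse-variance weights, then applies the downgrading rule. This is the standard textbook definition of I² and is offered as an assumption about how the statistic would be obtained from the two-stage analysis; the function names and the `clinical_differences` flag are hypothetical.

```python
def i_squared(effects, standard_errors):
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # I² is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


def inconsistency_downgrade(i2_percent, clinical_differences):
    """Apply the rule above: one level for I² > 50%, two levels when
    differences in populations, interventions or outcomes are also present."""
    if i2_percent > 50.0:
        return 2 if clinical_differences else 1
    return 0
```

For two illustrative studies with effects of 0.0 and 2.0 and standard errors of 0.5 each, Q is 8 on 1 degree of freedom, giving an I² of 87.5% and a downgrade of one level, or two levels if clinical differences between the studies are also judged to be present.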

Indirectness
Indirectness refers to the generalizability of the findings. Indirectness may be a problem and diminish our confidence if the population, type of intervention, comparator, or outcome in the included randomized trials differs broadly from the research question being addressed in this review. In a systematic review, studies with mixed populations (acute/subacute/chronic), studies in which the majority of subjects had radiating pain, and studies in which the majority of subjects were referred from a secondary or tertiary professional (or setting) are included in the analysis. In an IPD meta-analysis, problems of generalizability were in many instances resolved because variables describing these issues were present in the raw data, which allowed, for example, restricting the analyses to chronic patients or performing a moderator analysis of leg pain versus no leg pain. In cases where these data were not available, we considered downgrading the quality of the evidence as follows:
• By one level: when there is indirectness in only one area.
• By two levels: when there is indirectness in two or more areas.

Imprecision
Imprecision refers to limitations in the interpretation of the results when studies include relatively few participants and few events, leading to wide confidence intervals (CIs) surrounding the estimate of the effect, and thus resulting in uncertainty about the treatment effect. For dichotomous outcomes, we considered imprecision for either of the following two reasons:
(a) There is only one study or, when there is more than one study, the total number of events is less than 300 (a threshold rule-of-thumb value) (Mueller 2007).
(b) The 95% CI around the pooled effect includes both (1) no effect and (2) appreciable benefit or appreciable harm. The threshold for 'appreciable benefit' or 'appreciable harm' is a relative risk reduction (RRR) or relative risk increase (RRI) greater than 25%.
For continuous outcomes, we considered imprecision for either of the following two reasons:
(a) There is only one study or, when there is more than one study, the total population size is less than 400 (a threshold rule-of-thumb value) (Mueller 2007).
(b) The 95% CI includes no effect and the upper or lower confidence limit crosses an effect size of 0.5 or a mean difference of 20 mm in either direction.
We considered downgrading the quality of the evidence as follows:
• By one level: when there is imprecision due to (a) or (b) for a continuous or dichotomous outcome.
• By two levels: when there is imprecision due to (a) and (b) for a continuous or dichotomous outcome.
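The rule for continuous outcomes can be sketched as a small function. The function and parameter names are hypothetical, and "crosses" is interpreted here as a confidence limit lying strictly beyond the threshold (0.5 for a standardized mean difference, 20 for a mean difference in mm); that interpretation is an assumption.

```python
def imprecision_downgrade(n_studies, total_n, ci_low, ci_high, threshold):
    """Levels of downgrading (0, 1 or 2) for a continuous outcome.

    threshold: 0.5 for an SMD, or 20.0 for a mean difference in mm.
    """
    # Criterion (a): a single study, or fewer than 400 participants in total.
    criterion_a = n_studies == 1 or total_n < 400
    # Criterion (b): the CI includes no effect AND one limit lies beyond
    # the threshold in either direction.
    includes_no_effect = ci_low <= 0.0 <= ci_high
    crosses_threshold = ci_low < -threshold or ci_high > threshold
    criterion_b = includes_no_effect and crosses_threshold
    # One level if either criterion holds, two levels if both hold.
    return int(criterion_a) + int(criterion_b)
```

With illustrative inputs of 3 studies, 350 participants, and an SMD confidence interval of -0.6 to 0.1, both criteria are met and the sketch returns two levels. With 5 studies, 900 participants, and a mean difference interval of -5.78 to 0.66 mm against the 20 mm threshold, neither criterion is met and it returns zero.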

Publication bias
Publication bias refers to bias introduced as a result of the selective publication of studies, typically because studies demonstrating a 'negative' effect are under-reported, which leads to an overestimation of the treatment effect. We considered downgrading the quality of evidence as follows:
• By one level: when the funnel plot suggests publication bias.