Background

Spinal manipulative therapy (SMT) is a manual treatment where a vertebral joint is passively moved between the normal range of motion and the limits of its anatomic range, though a universally accepted definition does not seem to exist [1]. SMT often involves a high-velocity, low-amplitude thrust, a technique in which the joints are adjusted rapidly, often accompanied by popping sounds [2, 3].

The use of SMT dates back to 400 BCE, but over the centuries, SMT has alternated between being accepted and abandoned by the medical profession [4]. Today, SMT is included in many guidelines for primary care, such as those for the management of non-specific low back pain [5], and several evidence-based guidelines exist on the practice of SMT [6–10]. SMT is widely used; it has been estimated that 12% of adults in the USA and Canada attend chiropractors each year, with 80% of the visits involving SMT [11, 12], and the use of SMT has been increasing over the past several decades [13]. Various professional groups perform SMT, including chiropractors, osteopaths and manual therapists [14]. SMT is used for a wide range of diseases and conditions, with neck and back pain being frequent indications [13]. Patient satisfaction is high [13], but the evidence on the effectiveness of SMT from randomized controlled trials (RCTs) is often unconvincing [14–17].

As with all interventions, there are risks associated with SMT. Possible harmful outcomes of SMT include, but are not limited to, headache, radiating discomfort and fatigue [18], which are often transient, but also more serious events such as death, stroke, paralysis and fractures [19–22]. What patients define as mild, moderate and major adverse events (AEs) depends on the severity of the pain or symptom, the impact on their function, the duration, and whether other causes of the AEs have been ruled out [23]. Currently, the knowledge about the risk of harms associated with SMT is fragmented, since an enormous amount of literature exists on the topic but reaches different conclusions. For instance, two retrospective population-based studies have suggested an association between vertebrobasilar strokes and chiropractic care (which usually involves spinal manipulation), but also a similar association with primary care physician visits [24, 25]. Another study concluded that SMT is independently associated with vertebral artery dissection [26]. Thus, uncertainty arises when single studies are reviewed, and there is a need for an overview of the field. To our knowledge, no one has provided a complete overview of what is known about the safety of SMT. Therefore, we performed an overview of reviews to elucidate and quantify the risk of serious adverse events (SAEs) associated with SMT regardless of the indications for the treatment.

Methods

A brief protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO: CRD42015030068) prior to the initiation of this overview [see protocol in Additional file 1]. This review was reported according to PRISMA harms [27] [see the completed checklist in Additional file 2].

Literature search

We searched Cochrane Database of Systematic Reviews, Cochrane Database of Abstracts of Reviews of Effects (DARE), Cochrane Health Technology Assessment Database (HTA), MEDLINE via PubMed (from 1966) and EMBASE via Ovid (from 1974). The original search was conducted on December 8, 2015 and updated on January 10, 2017, and no date restrictions were used. Our main search terms consisted of the terms spinal adjustment, chiropractic, and spine -, spinal -, lumbar -, back -, neck -, cervical -, thrust -, or osteopath manipulation, in addition to the MeSH term ‘Manipulation, Chiropractic’. Our systematic review filter included the terms Cochrane, CENTRAL, MEDLINE, EMBASE, pubmed, search, systematic review, meta-analysis, comparative effectiveness, indirect - and mixed treatment comparison, and systematic literature [see Additional file 3, showing the search strategy used]. References from relevant reviews, overviews of reviews and relevant national clinical guidelines were checked to identify additional relevant reviews.

Study selection

We included official health technology assessment reports and peer-reviewed reviews of studies of any type (including cohorts, case reports, etc.) that examine individuals receiving SMT. We did not require the SMT to be within a certain definition but relied on the definitions used by the review authors. No restrictions were put on the age, nationality, gender or health status of the population, or length of follow-up of the study. The control could be sham, placebo, any or none. At least an abstract in English, Danish, Swedish or Norwegian had to be available. For inclusion in the synthesis, data on AEs was required.

To ensure that the included reviews were conducted in a systematic manner, reviews had to meet the following two items from A Measurement Tool to Assess Systematic Reviews (AMSTAR): ‘were two or more electronic sources searched?’ and ‘was the scientific quality of the included studies assessed and documented?’ [28, 29], as done by other overview authors [30, 31]. Since no commonly accepted quality assessment tool exists for case reports, case series, cross-sectional studies or surveys, quality assessments of these study types were not required.

One reviewer (SMN) screened titles and abstracts, and subsequently reviewed full texts to identify relevant reviews for the overview. A second reviewer (MH) was consulted when the basis for decision making was not clear. We contacted authors of studies that could not be retrieved in full text.

Data extraction

The same reviewer (SMN) performed the data extraction, and the same second reviewer (MH) was consulted when the basis for decision making was not clear. When a review included other interventions, we extracted, where possible, only the data for patients receiving SMT.

The primary outcome was SAEs defined as conditions requiring hospital admission (or mortality) [32], and the secondary outcome was any AEs reported. AEs were defined as ‘any untoward occurrence that may present during treatment’ [32]. If the severity of an AE was not defined in the review, one reviewer (MH) rated the severity of the reported AEs, and when the basis for rating was unclear, another reviewer (HB) was consulted. No attempt was made to contact authors of reviews or primary studies to obtain missing data.

It was pre-specified in our protocol that the AEs and SAEs should be summarized for each review with a subsequent synthesis and meta-analysis. However, the available data on AEs and SAEs were too heterogeneously and insufficiently reported. Instead, we appraised the communicated opinions of each review concerning the safety of SMT based on their conclusions regarding the AEs and SAEs. This was done by two reviewers independently (SMN, LK), who judged the communicated opinions as either ‘safe’, ‘neutral/unclear’ or ‘harmful’, based on the qualitative impression the reviewers had when reading the conclusions. The reviewers had no opinion about the safety/harmfulness of SMT before commencing the judgements. Cohen’s weighted Kappa was calculated for the agreement between the reviewers, with a value of 0.40–0.59 indicating ‘fair agreement’, 0.60–0.74 indicating ‘good agreement’ and ≥0.75 indicating ‘excellent agreement’ [33]. Disagreements were resolved by a third reviewer (MH).
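The agreement statistic described above can be illustrated with a minimal sketch. The weighting scheme is not stated in the text, so linear weights are assumed here, and the function names and any ratings passed in are hypothetical illustrations rather than the analysis code or data of this overview.

```python
from collections import Counter

# Ordered categories used for the communicated opinions in the text.
CATEGORIES = ["safe", "neutral/unclear", "harmful"]


def weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Cohen's weighted kappa using linear disagreement weights (an assumption)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must judge the same non-empty set of reviews")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Joint counts of rating pairs and marginal counts for each rater.
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marg_a = Counter(index[a] for a in ratings_a)
    marg_b = Counter(index[b] for b in ratings_b)

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Linear weight: 0 on the diagonal, 1 between the two extreme categories.
            w = abs(i - j) / (k - 1)
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * (marg_a[i] / n) * (marg_b[j] / n)
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - obs_disagreement / exp_disagreement


def interpret(kappa):
    """Map a kappa value to the agreement bands quoted in the text [33]."""
    if kappa >= 0.75:
        return "excellent agreement"
    if kappa >= 0.60:
        return "good agreement"
    if kappa >= 0.40:
        return "fair agreement"
    return "below the bands quoted in the text"
```

With quadratic instead of linear weights, the same ratings would generally give a different kappa, so the choice of weights should be reported alongside the value.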

Quality assessment

One reviewer (SMN) assessed the methodological quality of each review using the AMSTAR tool [28, 29]. AMSTAR consists of 11 criteria, where each was given one of the ratings: ‘yes’ (clearly done), ‘can’t answer’ (unclear if completed), ‘no’ (clearly not done) or ‘not applicable’. A second reviewer (MH) was consulted when the basis for decision making was not clear. We calculated a summary score by awarding each ‘yes’ with one point for each review [28]. A score of 0–4 is often classified as low quality, 5–8 as moderate quality and 9–11 as high [34].
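The scoring rule above can be sketched as follows. The function names are hypothetical, and the example ratings in the usage note are invented for illustration only; the rating labels and quality bands are those quoted in the text.

```python
# The four ratings allowed for each AMSTAR item, as listed in the text.
VALID_RATINGS = {"yes", "can't answer", "no", "not applicable"}
N_ITEMS = 11


def amstar_score(ratings):
    """Return the summary score: one point for every item rated 'yes'."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    for rating in ratings:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
    return sum(1 for rating in ratings if rating == "yes")


def quality_band(score):
    """Classify a summary score using the bands quoted in the text [34]."""
    if not 0 <= score <= N_ITEMS:
        raise ValueError("score must lie between 0 and 11")
    if score <= 4:
        return "low"
    if score <= 8:
        return "moderate"
    return "high"
```

For example, a hypothetical review rated ‘yes’ on four items and ‘no’ or ‘not applicable’ on the remaining seven would receive a score of 4 and fall in the low-quality band.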

We did not assess the quality of the evidence presented by each of the reviews. However, if a quality of evidence assessment (such as a GRADE assessment) was reported in the reviews, the approach and result were extracted.

Data analysis

To get an ‘objective’ measure of our confidence in the subjectively judged communicated opinions, we assessed whether a pattern of communicated opinions could be identified according to the methodological quality of the reviews (i.e. AMSTAR). This was done by calculating a risk ratio (RR) of a review communicating the opinion ‘safe’ when meeting the requirements for each AMSTAR item, and an RR of a review communicating the opinion ‘harmful’ when meeting the requirements for each AMSTAR item. The decision to conduct this assessment and the subsequent analyses was, however, made post hoc.
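As an illustration of the RR calculation for a single AMSTAR item, the following minimal sketch compares the proportion of reviews communicating a given opinion among those meeting the item with the proportion among those not meeting it. The confidence interval method is not stated in the text; a log-based (Wald) interval is assumed here, and the function name and all counts in the usage note are hypothetical.

```python
import math


def risk_ratio(events_met, total_met, events_not_met, total_not_met, z=1.96):
    """Risk ratio with an approximate 95% CI on the log scale (assumed method).

    events_met / total_met: reviews communicating the opinion among those
    rated 'yes' on the AMSTAR item; events_not_met / total_not_met: the same
    among reviews not rated 'yes'.
    """
    if min(events_met, events_not_met) <= 0:
        raise ValueError("both groups need at least one event for a log-based CI")
    if events_met > total_met or events_not_met > total_not_met:
        raise ValueError("events cannot exceed the group total")

    risk_met = events_met / total_met
    risk_not_met = events_not_met / total_not_met
    rr = risk_met / risk_not_met

    # Standard error of log(RR) for two independent proportions.
    se_log_rr = math.sqrt(
        1 / events_met - 1 / total_met + 1 / events_not_met - 1 / total_not_met
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

For instance, with hypothetical counts of 30 ‘safe’ opinions among 50 reviews meeting an item and 24 among 68 reviews not meeting it, `risk_ratio(30, 50, 24, 68)` gives an RR of 1.7; an interval that excludes 1 would correspond to statistical significance at the 5% level.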

Risk estimates for SAEs reported in the reviews are presented in a separate table, and a matrix was constructed showing which studies the estimates from each review were based on. All statistical analyses were performed using the statistical software R, version 3.2.3 (R Foundation for Statistical Computing).

Results

Study selection

The reviewer screened 2305 records and identified 841 potentially eligible records (Fig. 1). Thirteen authors were contacted regarding studies that could not be retrieved in full text. Twelve authors responded, of whom nine were able to provide full-text versions. Reviewing full texts resulted in 257 records describing 252 reviews eligible for the overview [see Additional file 4 for a list of the excluded reviews]. From reference lists, we further identified 8 records on 6 eligible reviews. In total, 265 records describing 258 reviews were included in the overview [see Additional file 5 for a list of the 258 included studies]; of these, 110 records describing 104 reviews were included in the synthesis. The updated search resulted in screening of 267 additional records, identifying 68 potentially eligible records. Of these, 26 records describing 25 reviews were eligible for the overview, and 15 records describing 14 reviews were included in the synthesis. In total, 283 reviews were included in the overview, of which 118 reviews were included in the synthesis.

Fig. 1 Flow diagram. AEs = adverse events; DARE = Cochrane Database of Abstracts of Reviews of Effects; HTA = Cochrane Health Technology Assessment Database; RCT = randomized controlled trial; SMT = spinal manipulative therapy; SRs = systematic reviews. *Non-systematic: does not report to have searched at least two electronic databases or does not document an assessment of the quality of the included studies (case reports, case series, cross-sectional studies and surveys were not required to have been quality assessed). † The DARE database stopped updating March 2015. ‡ Four of these protocols resulted in a systematic review which was retrieved in the updated search.

Characteristics of the included reviews

The main characteristics of the 118 reviews included are presented in Table 1 [see Table, Additional file 6, which shows further study characteristics]. The included reviews consisted of 13 Cochrane reviews [14–17, 35–46], 41 other reviews including only RCTs [47–87], 53 reviews including study types other than RCTs [88–140], 3 guidelines [9, 141–143] and 8 health technology assessments [144–154].

Table 1 Summary of findings for spinal manipulative therapy

The vast majority of the reviews investigated SMT (either as the only intervention or as a separate subgroup). Some of these reviews further specified SMT as cervical, thoracic or lumbar SMT (21 reviews [46, 47, 49, 54, 57, 65, 91, 96, 103, 105, 114, 115, 119, 121, 123, 125–127, 134, 136, 150]). Other reviews did not specify the intervention beyond ‘manipulation’ (10 reviews [36, 66, 70–73, 79, 93, 101, 107]), ‘osteopathic manipulative treatment/therapy’ (8 reviews [38, 52, 56, 64, 81, 82, 116, 139]) or ‘chiropractic care/interventions’ (5 reviews [67, 98–100, 137]).

The populations most frequently studied were patients with cervical pain, low back pain or headache (based on a word count after categorization by the authors; Table 2). For 81 of the reviews, the main aim was to investigate efficacy (benefit), for 29 of the reviews, the main aim was to investigate AEs, and for the remaining 8, the aim was to investigate both.

Table 2 The patient populations most frequently studied in the included reviews (listed after frequency shown in brackets)

A word count of the reported AEs and SAEs showed that the most frequently used term describing AEs/SAEs in the reviews was stroke (counted after categorization by the authors; Table 3). However, it should be noted that a very common subject in the discussion sections was the poor reporting of AEs in the primary studies and the possible risk of underreporting. Thirteen of the reviews reported estimates for the incidence of SAEs, and here too, many of the reviews noted that these were rough estimates [see Table, Additional file 6, which includes conclusions extracted from each review].

Table 3 The terms describing the adverse events and serious adverse events most frequently used in the reviews (listed after frequency shown in brackets)

The methodological quality of included reviews

None of the reviews met the requirements for all 11 AMSTAR items (Table 4). The median number of ‘yes’ was 4 (interquartile range, 3 to 6), with a minimum and maximum of 0 and 9 ‘yes’ respectively. Only very few reviews had combined (e.g. in meta-analysis or other means of synthesis) the findings of AEs and SAEs or done this in an appropriate way; hence, item 9 was not applicable in most cases. One of the reviews made an attempt to assess the publication bias specifically for AEs and/or SAEs; hence, this one review met the requirements for item 10.

Table 4 Methodological quality of included reviews assessed with AMSTAR

Furthermore, very few reviews rated the quality of the evidence for AEs and/or SAEs, with GRADE being the most frequently used tool.

Serious adverse events

The estimates for the incidence of SAEs (Table 5) were heterogeneous, as they had different units (e.g. per number of manipulations, per visits or no unit), were based on different patient types, and were obtained from different types of studies [see Table, Additional file 7, showing which studies the estimates for the incidence of SAEs are based on].

Table 5 Estimates for the incidence of serious adverse events following spinal manipulative therapy

When not distinguishing between the different types of SMT treatments, assuming that one treatment or visit equals one manipulation, and leaving out the minority of estimates that do not specify the unit or that use per patient as the unit, the estimates for the incidence of SAEs range from 1 in 20,000 manipulations to 1 in 250,000,000 manipulations (Table 6).
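To make the width of this range concrete, the two bounds can be placed on a common scale. The sketch below is only an illustration: the function name is hypothetical, and it relies on the same assumption as the text, namely that one treatment or visit equals one manipulation.

```python
def events_per_million(one_in_n):
    """Convert an estimate of '1 in N manipulations' to events per million manipulations."""
    if one_in_n <= 0:
        raise ValueError("N must be positive")
    return 1_000_000 / one_in_n


# Bounds of the range quoted in the text (1 event per N manipulations).
HIGHEST_ESTIMATE_N = 20_000
LOWEST_ESTIMATE_N = 250_000_000

highest = events_per_million(HIGHEST_ESTIMATE_N)  # 50 events per million manipulations
lowest = events_per_million(LOWEST_ESTIMATE_N)    # about 0.004 events per million manipulations

# Ratio between the two bounds, computed directly from the denominators.
fold_difference = LOWEST_ESTIMATE_N / HIGHEST_ESTIMATE_N  # 12,500-fold
```

The highest and lowest estimates thus differ by a factor of 12,500, which illustrates why no single reliable incidence estimate can be drawn from them.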

Table 6 Estimates of the incidences of serious adverse events (some scaled for comparability)

Based on the conclusions of the reviews regarding AEs and SAEs, 54 reviews (46%) expressed that SMT is safe, 15 (13%) expressed that SMT is harmful and 49 reviews (42%) were neutral or unclear regarding the safety of SMT, with fair agreement between the two reviewers (Cohen’s weighted Kappa, 0.50).

The calculations of RRs show a higher chance of a review communicating that SMT is safe when it has a higher methodological quality, compared to reviews of lower methodological quality (statistically significant for the AMSTAR items 5, 7 and 8; Table 7). Conversely, there is a lower chance of a review communicating that SMT is harmful when it has a higher methodological quality.

Table 7 The risk ratio of having the opinion that spinal manipulative therapy is safe or harmful, respectively, if a ‘yes’ was obtained in the individual AMSTAR items (118 reviews)

Reviews specifically investigating adverse events

When considering only the subset of reviews whose objective was to investigate AEs (37 reviews), 8 reviews (22%) expressed that SMT is safe, 13 reviews (35%) expressed that SMT is harmful and 16 reviews (43%) were neutral or unclear regarding the safety of SMT. Hence, a larger proportion of these reviews tended to express that SMT is harmful compared to the full sample of reviews. The calculations of RRs did not have enough power to show any statistically significant RRs [see Table, Additional file 8, which shows the calculations of RRs]. The possibility of a causal relationship between SMT and SAEs was specifically investigated in six of the included reviews [89, 90, 118, 124, 127, 133] (Table 8). Five of these had assessed the likelihood of causality for each case report or case series [89, 90, 118, 124, 133]. In none of these reviews was ‘certain’ the single most used rating. Miley et al. [127] used another approach and concluded that there was weak to moderate strength of evidence for a causal relationship between cervical SMT and vertebral artery dissection, and expressed that comprehensive prospective studies are needed to further examine this relationship.

Table 8 Assessments of the likelihood of the causal relationship between spinal manipulative therapy and serious adverse events in reviews based on case reports and case series

Discussion

In this overview, the included reviews did not provide sufficient data for synthesis, and therefore it is currently not possible to provide an overall estimate for the risk of SAEs associated with SMT. Of the few reviews providing estimates for the incidence of SAEs, no reliable single estimate was provided, and it was not possible to identify any agreement regarding the safety of SMT across the included reviews. Interestingly, we found indications that reviews with higher methodological quality generally used language suggesting SMT to be safer (or less harmful). However, when analysing this across the reviews whose objective was to investigate safety, this could not be replicated. In the few reviews assessing the likelihood of a causal relationship between SMT and SAEs, this relationship was not in all cases certain. However, it should be noted that these assessments were based on case reports and case series, which cannot determine causality.

This overview is, to our knowledge, the most comprehensive overview conducted on SMT, including more than 100 reviews on SMT, and the only one with a sole focus on the safety aspects of SMT. Our intention was to provide an overview of all SAEs from SMT regardless of the indications for the treatment, but our overview especially covers patients with cervical pain, low back pain and headache, which were the most frequently studied populations. The most frequently mentioned AEs/SAEs across the 118 reviews ranged from minor events, such as soreness, to significant events, such as spinal cord injury and death. While some of these events may to a large extent be unpredictable [155] and have a major impact on not only the individual but also the SMT provider and society, it is not possible to ascertain the risk-benefit balance based on the current evidence [156]. We strongly encourage efforts to illuminate the risk-benefit ratio reliably, since this would be of value when comparing SMT with other treatment options. Some of our included reviews indicate that NSAIDs involve a substantially higher risk of SAEs (including death) than SMT [114, 150], but they did not take into account the possible benefits.

General limitations of overviews are that recently published primary studies or studies not included in reviews cannot be included, that the included reviews may overlap, and that overviews rely on the methodological quality of the included reviews, which in turn rely on the methodological quality of the primary studies [157]. Considering the low methodological quality of the included reviews, the communicated opinions could possibly be influenced by the background of the authors [158], and by lack of independence between the reviews, i.e. several reviews were written by the same author. A major limitation of this overview was the limited data on AEs and SAEs, which hindered a synthesis. On the level of reviews, poor reporting of AEs is present [159]; however, even high-quality reviews may fail to provide reliable estimates due to poor reporting in the primary studies, and this was frequently highlighted in the discussions of the included reviews. In primary studies, underreporting may be expected for retrospective studies or poorly controlled prospective studies. Including only RCTs would provide an insufficient population size for detecting SAEs reliably, and it has been shown that even in RCTs, AEs and SAEs are poorly reported [126, 160] and underreported [96, 161]. Gorrell et al. [162] found that out of 368 RCTs on SMT, only 140 (38%) reported on AEs. This underreporting will directly affect the reviews including the studies, resulting in an underestimation of the risk. On the other hand, over-reporting may be present, since the different study types (ranging from case reports to RCTs) provide various levels of evidence, and therefore confounding and chance cannot be ruled out as possible explanations for some of the observed SAEs associated with SMT.

Our methodological approach has limitations too. Our inclusion criteria were slightly heterogeneous across reviews. We relied on the definitions of SMT used by the review authors, which varied between the reviews. Some of the reviews mixed SMT with other interventions under a common category such as ‘manual treatment’ or ‘manipulation’ without reporting on only the SMT subgroup. Even when authors describe interventions such as SMT, these may not always include high-velocity, low-amplitude thrusts. In that case, the intervention is less likely to result in SAEs and may influence their and our conclusion about safety by making (high-velocity, low-amplitude thrust-type) SMT appear safer. Further, we did not require a quality assessment to have been conducted for case reports, case series, cross-sectional studies and surveys, which may have facilitated the inclusion of reviews including only these types of studies. Our judgements regarding the expressed opinions in the reviews were not based on any criteria but on subjective interpretation, and are therefore not reproducible, even though there was fair agreement between the reviewers. Other limitations include the absence of a double study selection, data extraction and quality assessment, and a very brief protocol. These methodological compromises were made because of limited time. However, our search strategy was broad, and we applied a thorough study selection, making us confident that we have identified the vast majority of the relevant scientific literature on SMT, and we find it unlikely that more thorough study selection and extraction procedures would result in different conclusions.

Conclusions

This overview has indeed demonstrated how extensive the literature on SMT is. Unfortunately, the majority of reviews are non-systematic and of poor quality. The available evidence showed a broad range of communicated opinions and very variable estimates of SAE incidence. Reviews with fewer methodological flaws typically communicated that SMT may be safe; however, the methodological quality was in general low and the included reviews very heterogeneous. Furthermore, for the subset of reviews whose objective was to investigate safety, this could not be replicated. Research of high quality, with sufficient sample size and an appropriate comparison group, is needed to obtain reliable risk estimates. Furthermore, reviews suggested that a causal relationship between SMT and SAEs was often not certain. However, the types of SAEs reported were indeed significant, confirming that there is some risk present; sometimes SMT may even lead to death or permanent disability.