Introduction

Musculoskeletal disorders are highly prevalent globally, leading to personal suffering and high socioeconomic costs [1]. Neck pain, together with low back pain, is one of the most common musculoskeletal disorders related to years lived with disability according to the Global Burden of Disease studies [1]. The estimated one-year incidence of neck pain is around 20%, with a higher incidence reported among office and computer workers, and is reportedly higher in women [2,3,4]. Furthermore, between 30 and 50% of the adult population have experienced neck pain in the previous year, and a high percentage report recurrent pain [5]. Strong risk factors for developing neck pain or recurrent neck pain include social determinants of health, in particular psychosocial rather than physical factors, such as high muscular tension, depressed mood, role conflict, and high job demand [6]. Factors suggested to contribute to the transition from acute or sub-acute to chronic neck pain include non-modifiable factors (age, gender, and co-morbidity with other disorders) as well as modifiable factors such as psychological problems, sleep troubles, job stress, and work-related positions/posture [7,8,9,10].

Patients with neck or back pain have high levels of healthcare utilization in both primary and specialist healthcare [11]. Several treatments are offered, including pharmacological and non-pharmacological treatments such as electrotherapy [12], manual therapy [13], massage [14], and acupuncture [15], but the evidence for the effectiveness of these treatments varies [16]. Guideline-endorsed treatments for chronic neck pain include advice, education, and manual therapy as well as recommendations for physical exercise programs [17, 18]. Exercise is further suggested to be an intervention with minimal adverse effects [16] and seems to be a cost-effective treatment for chronic neck pain compared to treatments such as manual therapy or massage [19].

Various exercise types are used in the rehabilitation of chronic neck pain and are suggested as potentially beneficial, although the evidence for these effects is low and results are inconsistent [16, 20]. The exercise types are summarized in several systematic reviews reporting various effects for specific exercises such as motor control exercises [21], yoga [22], and Pilates [23] as well as strength and endurance training [20].

The stability of the neck is dependent on several deep and superficial muscles as well as on the posture of the neck and loads transferred via the arms [24, 25]. The exercise types used in chronic neck pain have different aims such as training of the deep neck flexors through motor control exercises, strength training of superficial muscles in the neck and shoulder girdle, or stabilization and endurance training aiming to keep the neck stable during loaded arm movements [26].

Our research group previously conducted a systematic review (SR) of SRs on the effect of various exercise types used in chronic low back pain and concluded that no exercise type seems to have more effect on pain and disability than any other [27]. An SR of SRs can help in summarizing the effect in a specific research area, even if such an SR is itself dependent on the quality of the included SRs [28]. To date, there is no consensus on whether one exercise type is more effective than another in the treatment of chronic neck pain. Such a summary can also support therapists in their dialogue with patients when deciding which exercises to choose, preferably based on the best available evidence. The aim of this SR of SRs was therefore to summarize the literature on the effect of various exercise types used in chronic neck pain and to assess the certainty of the evidence.

Methods

This study followed the PRISMA guidelines for systematic reviews [29] (Additional file 1). The method described in this study is the same as in our previous systematic review of systematic reviews on exercises used in chronic low back pain [27]. A protocol for the review was registered in PROSPERO (CRD42022336014). No deviations were made from the protocol.

Eligibility criteria

We included SRs and meta-analyses (MAs) in which a majority (> 75%) of the included original studies were randomized controlled trials (RCTs). We based the inclusion on PICO: population, intervention, comparator, and outcome (Additional file 2). We did not exclude any of the SRs or MAs in terms of language; treatment duration, frequency, or intensity; comparator intervention; follow-up time; or year of publication. Hereafter, all SRs (with or without MAs) will be referred to as SRs.

Patients

We included SRs mainly (> 75%) based on a working population aged 18 to 70 years, which defined their populations as suffering from chronic neck pain (defined as having neck pain for 12 weeks or more). The rationale for including only SRs on chronic neck pain was to obtain a homogeneous population [30].

Intervention

We included SRs in which the effect of any exercise therapy or training type was studied as the main (single) intervention. Exercise was defined as “a regimen or plan of physical activities designed and prescribed for specific therapeutic goals, with the purpose to restore normal musculoskeletal function or to reduce pain caused by diseases or injuries” [31].

Comparator

We did not set any limitations for comparator interventions.

Outcome

We included SRs that investigated pain and disability as outcomes in short-, intermediate-, or long-term follow-up. We defined the duration of follow-up as short-term (one day to three months), intermediate-term follow-up (three months up to, but not including, one year), and long-term follow-up (one year or longer) [32].
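The follow-up windows defined above can be written as a simple classification rule. The following is a minimal sketch for illustration only: the function name is hypothetical, durations are assumed to be given in months, and assigning a follow-up of exactly three months to the short-term category is an assumption, since the definition leaves that boundary open.

```python
def classify_follow_up(months: float) -> str:
    """Map a follow-up duration in months to a follow-up category.

    Assumption: a follow-up of exactly three months counts as short-term.
    """
    if months <= 0:
        raise ValueError("follow-up duration must be positive")
    if months <= 3:
        return "short-term"
    if months < 12:  # three months up to, but not including, one year
        return "intermediate-term"
    return "long-term"  # one year or longer


if __name__ == "__main__":
    for duration in (1, 6, 12):
        print(duration, "months ->", classify_follow_up(duration))
```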

Database search

In collaboration with librarians at the Karolinska Institutet Library, we (authors ERB and WG) developed a comprehensive search strategy based on previously published search strategies in Cochrane Reviews regarding exercise therapy and chronic neck pain. The following databases were searched: Ovid MEDLINE, Embase, Cochrane Library (the Cochrane Database of Systematic Reviews), Web of Science (Core Collection), and SPORTDiscus. For each search concept, Medical Subject Headings (MeSH terms) and free-text terms were identified and combined in a search strategy developed for Ovid MEDLINE, which was then adapted for the other databases. SRs and MAs were considered in the database searches. Search strategies are presented in Additional file 3. After the original search was performed on 29 April 2022, the search was updated on 28 June 2023, using the methods described by Bramer et al. (2017) [33]. The data were then exported to EndNote (version 20). After removing all duplicates in EndNote using the methods described by Bramer et al. (2016) and comparing the DOIs, the papers were exported to Rayyan QCRI [34, 35].

All papers were alphabetically divided among five teams with two or three reviewers each. The reviewers in each team independently screened the titles and abstracts retrieved from the searches and assessed these for eligibility against the predetermined inclusion criteria (PICO). At this stage of the process, regular reviewer meetings were held to reach a consensus. All titles and abstracts meeting the inclusion criteria were retrieved in full text. In each team, the reviewers independently checked the full-text articles to assess their eligibility for final inclusion in this review. Reasons for exclusion were noted at this stage, and if more than one reason for exclusion applied, the publication was excluded in PICO order; that is, a publication with the wrong intervention, the wrong publication type, and the wrong population was classified only as excluded based on population. We scrutinized the reference lists of the included SRs for additional potentially relevant publications.
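The rule of recording a single exclusion reason in PICO order can be sketched as follows. This is an illustration under stated assumptions, not the authors' screening tool: the function name and the reason labels are hypothetical, and placing "publication type" after the four PICO elements is an assumption, since the text only states that population takes precedence over intervention and publication type.

```python
# Assumed priority order: the four PICO elements first, then publication type.
EXCLUSION_PRIORITY = [
    "population",
    "intervention",
    "comparator",
    "outcome",
    "publication type",
]


def primary_exclusion_reason(reasons):
    """Return the single exclusion reason to record, following PICO order."""
    found = set(reasons)
    for reason in EXCLUSION_PRIORITY:
        if reason in found:
            return reason
    raise ValueError("no recognised exclusion reason given")


if __name__ == "__main__":
    # The example from the text: wrong intervention, publication type, and population.
    print(primary_exclusion_reason(["intervention", "publication type", "population"]))
```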

Overlap

Overlap was defined as the inclusion of the same original study in more than one of the included SRs [36]. We calculated the total overlap (original RCTs in our included SRs) for each exercise type, independent of the outcome, following the formula proposed by Pieper et al. [36]. We present the overlap as the percentage of corrected covered area (CCA). Interpretation of CCA: 0–5% = slight overlap, 6–10% = moderate overlap, 11–15% = high overlap, and > 15% = very high overlap.
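As an illustration of the overlap calculation, the CCA formula of Pieper et al. is CCA = (N − r) / (r × c − r), where N is the number of included publications counted across all SRs (duplicates included), r is the number of unique original studies, and c is the number of SRs. The sketch below uses hypothetical function names, and the handling of values that fall between the integer bands (for example 5.5%) is an assumption. The example inputs are the counts given in the Results for resistance training (74 inclusions, 65 unique studies, 8 SRs); whether the reported CCA was derived from exactly these counts is also an assumption.

```python
def corrected_covered_area(n_inclusions: int, n_unique: int, n_reviews: int) -> float:
    """Corrected covered area, CCA = (N - r) / (r * c - r), as a percentage."""
    if n_reviews < 2:
        raise ValueError("overlap requires at least two reviews")
    if n_unique < 1 or n_inclusions < n_unique:
        raise ValueError("inclusions must be at least the number of unique studies")
    return 100.0 * (n_inclusions - n_unique) / (n_unique * n_reviews - n_unique)


def interpret_cca(cca_percent: float) -> str:
    """Translate a CCA percentage into the overlap categories used in the text."""
    if cca_percent <= 5:
        return "slight"
    if cca_percent <= 10:
        return "moderate"
    if cca_percent <= 15:
        return "high"
    return "very high"


if __name__ == "__main__":
    cca = corrected_covered_area(n_inclusions=74, n_unique=65, n_reviews=8)
    print(f"CCA = {cca:.1f}% ({interpret_cca(cca)} overlap)")
```

With these inputs the result rounds to 2%, which falls in the slight-overlap band.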

Assessment of the methodological quality of the included reviews

We assessed methodological quality using the recommended and updated tool AMSTAR-2 (A MeaSurement Tool to Assess systematic Reviews), which is considered valid and reliable when assessing SRs and MAs [37]. AMSTAR-2 has previously been used to assess the methodological quality in SRs of SRs [32, 38]. Our included SRs were assessed based on their score on AMSTAR-2 and thereafter assigned to one of four levels (critically low, low, moderate, and high), depending on the number of critical flaws and weaknesses as recommended by the designers of the tool [37]. Items 1, 4, 5, 6, 8, and 9 were classified as critical flaws, and all other items were classified as study weaknesses [37]. The two reviewers from each of the five pairs performed their assessments independently and compared them with each other. Disagreements were resolved in a consensus dialogue after the assessors had compared their discrepancies, and were then discussed within the whole research group, guided by ERB and WG.
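The mapping from flaws and weaknesses to an overall rating can be sketched as below. This is a hedged illustration: the decision rule follows the published AMSTAR-2 guidance as commonly summarized (more than one critical flaw gives critically low, one critical flaw gives low, more than one non-critical weakness gives moderate, otherwise high), and the assumption is that the review applied it in this form. The function name is hypothetical, "partial yes" answers are not modeled, and the set of critical items is taken from the text above.

```python
from typing import Iterable

# Critical items as listed in this review's Methods.
CRITICAL_ITEMS = frozenset({1, 4, 5, 6, 8, 9})


def amstar2_rating(unfulfilled_items: Iterable[int]) -> str:
    """Rate an SR from the AMSTAR-2 item numbers it did not fulfil."""
    unfulfilled = set(unfulfilled_items)
    critical_flaws = len(unfulfilled & CRITICAL_ITEMS)
    weaknesses = len(unfulfilled - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if weaknesses > 1:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical SR that lacks a comprehensive search (item 4) and two other items.
    print(amstar2_rating([4, 7, 10]))
```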

Data extraction and synthesis

One reviewer per pair extracted data from the included SRs, and the other reviewer from the same pair checked the extraction for accuracy. We extracted the data into a data extraction form adapted from a Cochrane form [29]. We extracted data primarily from the included SRs. If the data presented in the included SRs were in doubt, the original included RCTs were checked for accuracy. The results of each included SR were separated by outcome (pain and disability) and by follow-up (short-, intermediate-, and long-term).

The results were synthesized based on narrative and quantitative analyses. In the narrative analyses, the results were compared to a control intervention for between-group statistical significance. For each exercise type, outcome (pain/disability), and follow-up time (short- and intermediate/long-term), the overall between-group effects were classified as “positive effects”, indicating significant results in favor of the specific exercise type; “negative effects”, indicating significant results in favor of the control group; “no effects”, indicating no significant differences between the intervention and control groups; or “varying effects”, when different SRs showed different results (i.e., positive, negative, or no effects). The narrative analyses were performed separately for each exercise type, for comparisons with non-exercising controls (usual care, education, etc.) and with exercising controls, and for short and intermediate/long follow-up periods, respectively.
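The narrative classification described above amounts to a simple rule: if all SRs for a given exercise type, outcome, comparator, and follow-up agree, that shared result is the overall classification; otherwise the result is "varying". The sketch below is illustrative only; the function name and the string labels are hypothetical.

```python
from typing import Iterable

VALID_RESULTS = {"positive", "negative", "no effect"}


def classify_overall_effect(sr_results: Iterable[str]) -> str:
    """Combine SR-level between-group results into one overall classification."""
    results = list(sr_results)
    if not results:
        raise ValueError("at least one SR result is required")
    unknown = set(results) - VALID_RESULTS
    if unknown:
        raise ValueError(f"unknown result labels: {sorted(unknown)}")
    if len(set(results)) == 1:
        return results[0]  # all SRs agree
    return "varying"  # SRs disagree


if __name__ == "__main__":
    print(classify_overall_effect(["positive", "positive"]))
    print(classify_overall_effect(["positive", "no effect", "positive"]))
```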

Quantitative analysis using meta-analysis was also performed when at least two SRs provided aggregated data on the same intervention, the same type of control group (non-exercising or exercising), the same outcome (pain, disability), and the same follow-up time (short, intermediate/long-term). If one SR provided multiple results, these were pooled before entering the meta-analysis. Data required for the meta-analysis were extracted from the data presented in the included SRs. The software Review Manager [39] was used, and Standardized Mean Differences (SMDs) were computed using a random effects model for each intervention. The generic inverse variance method was used, which permits a wide selection of data formats in the analyses [40]. For example, for SRs that reported Weighted Mean Differences (WMDs) or Pooled Mean Differences (PMDs), the original data from the included RCTs were used, when necessary, to calculate an SMD for the specific SR before it entered the meta-analysis. For every meta-analysis, statistical heterogeneity (I²) was assessed. Funnel plots were used to assess potential publication bias. When two separate SRs presented the same data from the same original RCTs in their analysis, we chose to include only one of them to avoid double counting.
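To make the pooling step concrete, the sketch below shows a generic inverse variance random effects meta-analysis. It is a minimal illustration and not the Review Manager implementation: the use of the DerSimonian-Laird estimator for the between-study variance, the recovery of standard errors from 95% confidence intervals with z = 1.96, and all function names are assumptions, and the input values in the example are invented for demonstration.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate a standard error from the limits of a 95% confidence interval."""
    return (upper - lower) / (2 * z)


def random_effects_pool(effects, standard_errors):
    """Pool effects with inverse variance weights and a DerSimonian-Laird tau^2.

    Returns (pooled effect, CI lower, CI upper, I^2 in percent).
    """
    k = len(effects)
    if k < 2 or k != len(standard_errors):
        raise ValueError("need at least two effects with matching standard errors")
    w = [1.0 / se ** 2 for se in standard_errors]  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in standard_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, i2


if __name__ == "__main__":
    # Invented SMDs and confidence limits, for demonstration only.
    smds = [-0.40, -0.70]
    ses = [se_from_ci(-0.80, 0.00), se_from_ci(-1.20, -0.20)]
    pooled, low, high, i2 = random_effects_pool(smds, ses)
    print(f"SMD = {pooled:.2f}, 95% CI {low:.2f} to {high:.2f}; I2 = {i2:.0f}%")
```

Working from effect estimates and standard errors, rather than raw group means, is what allows results reported in different formats to be combined once they have been expressed on the SMD scale.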

Assessment of the certainty of the evidence

We used the GRADE approach to evaluate the certainty of evidence for each exercise type and each separate outcome [41]. In short, the first step of GRADE is to choose a starting point for the level of evidence. Because our included SRs only comprised RCTs, we decided to start at the highest level. We thereafter lowered the certainty of evidence by appraising the potential risk of bias due to study limitations (high risk of bias based on the AMSTAR ratings), inconsistency in results (heterogeneity), imprecision (wide confidence intervals), indirectness (generalizability of population and interventions), and publication bias (funnel plots). The certainty of evidence was increased if large effects were presented in the SRs or if a “dose-response” relationship was seen based on the reports of the SRs. In this way, we express our findings together with the certainty of evidence in the results using four levels of evidence: “high” (++++), “moderate” (+++), “low” (++), or “very low” (+) [41].
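The bookkeeping behind the GRADE levels can be sketched as follows. This is a deliberately simplified illustration: GRADE ratings rest on reviewer judgment for each domain rather than on a mechanical count, so treating every downgrade or upgrade as exactly one level is an assumption, and the function and constant names are hypothetical.

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]
GRADE_SYMBOLS = {"very low": "+", "low": "++", "moderate": "+++", "high": "++++"}


def grade_certainty(downgrades: int, upgrades: int = 0, start: str = "high") -> str:
    """Move from the starting level by whole steps, bounded by the four GRADE levels."""
    if downgrades < 0 or upgrades < 0:
        raise ValueError("step counts must be non-negative")
    index = GRADE_LEVELS.index(start) - downgrades + upgrades
    index = max(0, min(index, len(GRADE_LEVELS) - 1))
    return GRADE_LEVELS[index]


if __name__ == "__main__":
    # Evidence from RCTs starts at "high"; two downgrades, e.g. for risk of bias
    # and inconsistency, give "low".
    level = grade_certainty(downgrades=2)
    print(level, GRADE_SYMBOLS[level])
```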

Results

Search results

The search results are summarized in Fig. 1. The literature search returned a total of 1,794 records. After removal of duplicates, the titles and abstracts of the remaining records (n = 1,223) were reviewed, and 82 full texts were screened. Automatic de-duplication was based on the method described by Bramer et al. [35]. After checking against our inclusion and exclusion criteria, we included 25 SRs in the final review, which included a total of 221 randomized controlled trials (RCTs) with 17,321 participants (overlap not accounted for). Taking overlap into consideration, a total of 125 (original) studies were included in the 25 SRs. All included SRs were in English. A list of excluded SRs and reasons for exclusion is included in Additional file 4.

Fig. 1 PRISMA chart for eligible study selection process

Study characteristics

Our included SRs were published from 2010 to 2023. The majority (80%; 20 out of 25) were MAs, and most of the included patients were defined as having chronic neck pain for at least 12 weeks (Table 1).

Table 1 Description of the included systematic reviews; the number of original studies included, population, exercise intervention, and controls

The 25 included SRs were grouped into five exercise types: a) motor control exercise (MCE) with craniocervical flexion, including Pillar exercises, b) Pilates exercises, c) resistance training, d) traditional Chinese exercise (TCE) such as Tai Chi and Qigong, and e) yoga. A description of the exercise types is presented in Table 2. Four SRs [26, 42, 45, 58] studied several exercise types, and their results are reported separately for each exercise type (Table 5). All but one of the included SRs [43] reported effects on pain, and five did not report effects on disability [21, 26, 42, 54, 58]. For short-term follow-up, some SRs deviated from our definition of < 12 weeks and defined short-term as up to 24 weeks [42, 45, 50].

Table 2 Description of the exercise types

Quality of the included SRs

Based on the AMSTAR-2 ratings, we found five SRs with high quality [23, 26, 42, 51, 59], seven SRs with moderate quality [21, 32, 44,45,46,47, 57], eight SRs with low quality [22, 43, 48, 50, 53, 54, 56, 61], and five SRs with critically low quality [49, 52, 55, 58, 60]. The AMSTAR-2 ratings for all included publications are presented in Table 3. Most SRs fulfilled the six items that were identified as critical, except for item 4 (“Did the authors use a comprehensive literature search strategy?”), for which only 13 out of the 25 SRs scored a “yes”. Concerning the remaining items, many SRs did not report the funding of the included studies (item 10; n = 23), did not, or only partially, include a list of excluded studies (item 7; n = 20), or did not establish a protocol before the review (item 2; n = 10).

Table 3 Summary of methodological quality assessment of included studies using AMSTAR-2

Summary results for exercises in chronic neck pain

The narrative analyses of the included SRs showed positive effects for all exercise types regarding pain in the short-term and when compared with non-exercise controls, and either varying or positive effects in the intermediate/long-term. For disability, all showed positive effects in the short-term compared to non-exercise controls, while compared with other exercise interventions there were no, varying, or positive effects. In the intermediate/long-term there were mainly no or varying results for pain as well as disability levels when compared to non-exercise controls as well as other exercise interventions. Our meta-analyses were based on fewer SRs (n = 16) but were mostly consistent with the narrative analyses. For yoga, no results concerning pain and disability in the intermediate/long-term were available.

In all, we found low- to high-quality evidence that the exercise types studied in this SR of SRs are effective for reducing pain and disability in the short-term compared to non-exercise controls, but we found conflicting results when compared to other exercises as well as in the long-term perspective (Table 4).

Table 4 Certainty of evidence (GRADE) for the exercise types (motor control (MCE), resistance training, traditional Chinese exercises (TCE) and yoga) compared with non-exercising and exercising control groups, for the outcomes pain and disability, at short and intermediate/long-term follow-up

Results for various exercise types

MCE and pillar exercises

Eight SRs were included, and these were based on 97 studies (Tables 1 and 5). In these studies, a total of 4,566 participants were included (overlap not accounted for). Taking overlap into consideration, 38 original studies were included. The SRs investigated MCE, mostly using a cranio-cervical flexion hold, in patients suffering from chronic neck pain [21, 26, 42,43,44,45,46,47]. Pillar exercises, which are intended to develop the ability of the spine to maintain a neutral position during load, were investigated in one high-quality SR [26]. The included SRs were published between 2010 [43] and 2023 [45]. The last updated search in the SRs was on September 30, 2022 [45]. Six SRs [21, 42, 44,45,46,47] performed an MA. The quality of the included SRs varied from low [43], to moderate [21, 44,45,46,47], to high [26, 42], and there was a very high overlap with a CCA of 21%.

Table 5 Results of the different exercise types compared to control interventions for pain and disability

Seven SRs reported results on the effect of MCE on pain compared to various control treatments, including general exercises [21, 26, 45,46,47], strength and endurance exercises [21, 26, 42, 44, 45, 47, 53], manual therapy [26, 44], and minimal interventions such as usual care or education [26, 45, 53]. Most SRs investigated MCE in the short/intermediate term, while only one SR investigated the effect of MCE in the long-term [47]. When MCE was compared to manual therapy, one high-quality SR reported that MCE was more effective in the short-term [26], while another SR with moderate quality reported no difference between MCE and manual therapy in the short/intermediate term [44]. One SR with high quality reported that MCE was more effective when compared with usual care in the short/intermediate term [26], and one SR with moderate quality showed positive results when compared to a true comparison group/minimal intervention [45]. Combining the two SRs [44, 45] that provided aggregated data using a non-exercising comparison group, we found significant positive effects for MCE on pain intensity in the short-term (SMD = -1.69, 95% CI -2.73 to -0.64; I² = 5%; Additional file 5). The reported effect on pain for MCE compared to other exercise interventions was inconsistent: four of the SRs with moderate to high quality reported positive effects in the short/intermediate-term [21, 44, 46, 47], while three SRs of moderate and high quality [21, 44, 45] reported no effect. However, combining the six SRs [21, 42, 44,45,46,47, 53] that provided aggregated data comparing MCE with other exercise interventions into a meta-analysis, we found significant positive effects on pain intensity in the short-term (SMD = -0.25, 95% CI -0.38 to -0.13; I² = 0%; Additional file 5). No positive intermediate/long-term effects were reported when comparing MCE to other exercises in one SR with moderate quality [47]. For Pillar exercises, one high-quality SR reported no effect on pain in the short/intermediate term compared with other exercise treatments, but a positive effect compared with education [26].

Six SRs reported results on disability, and of these, three SRs compared MCE with a non-exercising control group in the short-term. One moderate-quality SR did not find significant results [44], while one high-quality SR [26] and one moderate-quality SR [45] found positive results. The meta-analysis based on the two SRs that provided data [44, 45] showed significant positive effects for MCE (SMD = -2.26, 95% CI -3.13 to -1.39; I² = 0%; Additional file 5). Four moderate-quality SRs [44,45,46,47] reported a positive effect of MCE compared to other exercises, while one low-quality SR [43] and one high-quality SR [26] reported no positive short-term effects. Our meta-analysis based on four SRs [44, 45, 47, 62] showed short-term positive effects of MCE compared to other exercises (SMD = -0.36, 95% CI -0.52 to -0.20; I² = 0%; Additional file 5). In the intermediate-term, one moderate-quality SR reported no difference between MCE and other exercises [47], while MCE was found to be significantly inferior to Pillar exercises in a high-quality SR [26]. No effect was reported comparing MCE with manual therapy in the immediate term [44]. Regarding the effect of Pillar exercises, one high-quality SR showed that Pillar exercises had no positive effect compared with other exercises in the short/intermediate term, while a positive effect was reported for Pillar exercises compared to education in the short/intermediate term [26].

The GRADE analyses (Table 4) showed that there is a high certainty of evidence that there are positive effects of MCE but not of Pillar exercises on pain and disability compared to non-exercise controls in the short-term. Compared to other exercise types, there are positive results concerning the effect of MCE but not for Pillar exercises on pain and disability in the short-term. In the intermediate/long-term, there is a high certainty of evidence that MCE is more effective than non-exercise controls concerning disability, but not compared to exercise controls. Moreover, for pain in the intermediate/long-term, we found varying results when MCE was compared with non-exercise controls as well as with other exercise interventions. Downgrading was mainly based on the inconsistency of the results.

Pilates

One high-quality SR (MA), based on 5 original studies, was included [23]. In this SR, a total of 224 participants with chronic neck pain were included. The SR was published in 2022, and the search covered the period up to October 2021. The SR investigated Pilates interventions compared with other exercises such as stabilizing exercises, stretching, or strength training or, in one of the studies, with a pharmacological intervention. Based on its MA, the SR reported, with low certainty of evidence, that Pilates was not more effective for pain than other exercises/treatments in the short term (SMD = -9.29, 95% CI -25.84; 7.26). The same applied to disability in the short term (SMD = -3.20, 95% CI -7.70; 1.30). One of the original studies investigated Pilates in the intermediate term and reported, with moderate certainty of evidence, that Pilates is more effective than a pharmacological intervention for pain (SMD = 3.11, 95% CI 2.05; 0.17) and for disability (SMD = 11.21, 95% CI 5.58; 16.74).

Resistance training

Eight SRs were included, and these were based on a total of 74 studies (Tables 1 and 5). These studies included a total of 8,380 participants (overlap not accounted for) and investigated some form of isometric or dynamic resistance exercises in patients suffering from chronic neck pain [26, 45, 48,49,50,51,52,53]. Taking overlap into consideration, 65 original studies were included. The included SRs were published between 2013 [48] and 2023 [45], and the last updated search in the SRs was performed in September 2022 [45]. Six out of the eight SRs performed an MA [26, 45, 48, 50, 51, 53]. The quality of the included SRs varied from critically low [49, 52] and low [48, 50, 53] to moderate [45] and high quality [26, 51]. There was nearly no overlap for resistance training (CCA = 2%). There was a large range in dosage, e.g., the number of treatment sessions, duration, number of sets and reps, and intensity. In most studies, external resistance such as dumbbells, resistance bands, or body weight was used for training specific neck and shoulder muscles. Six of the SRs included a comparison to a non-exercise control such as no treatment, education, or stretching [26, 45, 48,49,50, 52], and three included a comparison to another exercise-based control such as Tai Chi, aerobics, or general exercises [45, 50, 51].

Concerning pain, all six SRs that compared the effect of resistance training against a non-exercise control reported a positive effect at the short-term follow-up [26, 45, 48,49,50, 52]. Two SRs of moderate and low quality, respectively [45, 48], provided data for a meta-analysis, and our results showed significant short-term effects on pain intensity in favor of resistance exercises compared to non-exercising controls (SMD = -0.75, 95% CI -1.41 to -0.09; I² = 48%; Additional file 5). A low-quality SR [48] also reported positive effects in the intermediate term, but this was not confirmed by a high-quality SR [26]. Compared to a non-exercise control group, three SRs reported on long-term effects on pain, with one low-quality SR reporting positive effects of resistance training [50], one low-quality SR reporting no difference [48], and one critically low-quality SR reporting contradictory results [49]. The results were narratively found to be varying, and our meta-analysis based on two of the included SRs [48, 63] showed no positive results on pain in the intermediate/long-term for resistance training compared with non-exercise controls (SMD = -0.19, 95% CI -0.48 to 0.09; I² = 70%; Additional file 5).

When narratively comparing resistance training to other exercise-based controls such as Tai Chi, aerobics, and general exercises, varying results were found in three SRs [26, 45, 51]. Our meta-analysis based on two SRs of moderate and high quality [45, 51] showed, however, no significant short-term effects for resistance exercises when compared to exercising controls (SMD = -0.48, 95% CI -1.11 to 0.15; I² = 0%; Additional file 5). One high-quality SR reported positive effects in the intermediate term [51].

Concerning disability, two SRs with critically low and low quality that compared the effect of resistance training to non-exercise controls reported no effects at the short-term follow-up [48, 52], while one SR with moderate quality showed positive effects [45]. Our meta-analysis of two of these SRs [45, 48] showed no significant short-term effects on disability for resistance exercises when compared to non-exercising controls (SMD = -0.91, 95% CI -2.22 to 0.39; I² = 70%; Additional file 5). Of the three SRs reporting intermediate/long-term effects, two SRs of low and critically low quality reported positive effects of resistance training [49, 50], and one low-quality SR reported no positive effect, all compared to non-exercising control groups [48]. Moreover, one high-quality SR compared a resistance intervention to other exercise-based controls and reported no positive effects in the intermediate-term follow-up [51]. Our meta-analysis based on two SRs [48, 50] on the effects of resistance training in the intermediate/long-term on disability showed positive results when compared to non-exercise controls (SMD = -0.19, 95% CI -0.33 to -0.05; I² = 0%; Additional file 5).

One low-quality SR (MA) [53] concluded that long-term isometric resistance exercises were effective for lowering pain intensity. However, this SR included mixed control groups and did not report whether the outcomes referred to short- or long-term follow-up, and it was therefore not included in our narrative synthesis or meta-analyses.

The GRADE analyses showed (Table 4) that there is moderate certainty of evidence that, compared to non-exercise controls, resistance training has a positive effect on pain in the short-term and that there is low certainty of evidence for a positive effect on disability in the intermediate/long-term. However, compared to exercise controls in the short- and intermediate/long-term, there is evidence of moderate certainty that resistance training is not better. The certainty of evidence was downgraded due to low study quality and inconsistent results.

TCE

Eight SRs were included, and these were based on 26 studies (Tables 1 and 5). The eight SRs included a total of 2,905 participants (overlap not accounted for) and investigated the effect of TCE (Qigong and Tai Chi) in patients suffering from chronic neck pain [32, 42, 54,55,56,57,58, 61]. Taking overlap into consideration, 7 original studies were included. The included SRs were published between 2015 [54, 57] and 2022 [61], and the last updated search in the SRs was performed in January 2022 [61]. There was a very high overlap for TCE with a CCA of 41%. The quality of the included SRs varied from critically low [55, 58] and low [54, 56, 61] to moderate [32, 57] and high quality [42]. Both Qigong and Tai Chi interventions were included in the SRs. The type of Qigong varied and included Dantian, Neiyanggong, and Biyun Medical Qigong, as well as neck and shoulder exercises and moving and breathing exercises. Three SRs included Tai Chi based on the Yang style, all with different combinations of body posture, movement, breathing, meditation, relaxation, and self-massage [42, 56, 58]. Qigong was compared with other exercise types including softball and TheraBand exercises, strength and endurance training, flexibility/mobility exercises, proprioceptive exercises, neck-specific exercises, and cervical manipulation [55, 57, 58, 61]. However, most studies compared TCE to waiting list controls that received no or only minimal intervention [32, 54, 55, 57, 58, 61].

Six of the included SRs reported results on pain with a focus on Qigong and Tai Chi compared with non-exercising controls [32, 54,55,56,57,58]. TCE showed positive effects compared with wait-list controls in the short-term in three SRs, one (TCE) with low quality [54] and two (Qigong) with moderate quality [32, 57]. One SR (Qigong) with critically low study quality found no difference compared with the non-exercising control [55]. One SR, also with critically low study quality [58], showed varying results, in which Qigong was found to be no better than waiting list control, while Tai Chi showed positive results. Our meta-analysis of the available data in four of the included SRs [20, 54, 57, 61] showed significant positive short-term effects of TCE on pain compared with non-exercising controls (SMD = -0.63, 95% CI -0.95 to -0.32; I² = 30%; Additional file 5). TCE was also found to be superior to non-exercising controls in the intermediate term in four SRs [32, 54, 57, 61] (SMD = -0.54, 95% CI -0.74 to -0.35; I² = 3%; Additional file 5). Five SRs reported varying results on TCE compared with different exercise controls [42, 55,56,57,58]. Tai Chi showed positive effects compared with neck-specific exercises in the short-term in one SR with critically low study quality [58]. Compared with other exercise interventions, TCE did not show any positive effects in four SRs with critically low to moderate quality [55,56,57,58], while one SR with high study quality [42] reported that other exercise interventions were superior to TCE. Our meta-analysis based on data from four of the included SRs [42, 56, 57, 61] found non-significant results (SMD = 0.08, 95% CI -0.09 to 0.26; I² = 19%; Additional file 5).

Five of the included SRs reported results on disability [32, 55,56,57, 61]. Qigong was found to be superior to non-exercising controls in the short-term in two SRs with moderate study quality [32, 57], while one SR [55] with critically low study quality found Qigong to be no better than waiting list controls. Based on data available from two SRs [32, 57], our meta-analysis showed significant positive short-term results on disability for TCE compared to non-exercising controls (SMD = -0.39, 95%CI -0.65 to -0.13; I2 = 0%; Additional file 5). In the intermediate term, Qigong was found to be superior to non-exercise controls in one SR [55], while two SRs found Qigong to be no better than waiting list controls [32, 57]; our meta-analysis based on these SRs showed significant intermediate-term effects on disability for TCE compared to non-exercise controls (SMD = -0.45, 95%CI -0.76 to -0.14; I2 = 52%; Additional file 5). Three SRs reported results on disability where TCE was compared with various exercising controls [55,56,57]. TCE was no better than exercise therapy in the short-term in one SR with low study quality [56], while Qigong was no better than exercise therapy in the short- as well as in the intermediate term [55, 57]. Based on data from two SRs with critically low to moderate study quality [55, 57], our meta-analysis showed no short-term effects of TCE compared to exercise controls (SMD = 0.05, 95%CI -0.26 to 0.35; I2 = 0%; Additional file 5).
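As a worked illustration of how a reported interval relates to statistical significance, the sketch below back-calculates the standard error and an approximate two-sided p-value from the short-term disability estimate reported above (SMD = -0.39, 95%CI -0.65 to -0.13). It assumes a symmetric interval built from a normal approximation with a critical value of 1.96; the function names are hypothetical, and the result is an approximation rather than a value reported in the meta-analysis.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% CI (normal approximation)."""
    return (upper - lower) / (2 * z_crit)


def two_sided_p(estimate, se):
    """Two-sided p-value for the z statistic estimate / se."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


# Short-term disability, TCE vs non-exercising controls (values from the text).
smd, lower, upper = -0.39, -0.65, -0.13
se = se_from_ci(lower, upper)
p = two_sided_p(smd, se)
print(f"SE = {se:.3f}, z = {smd / se:.2f}, p = {p:.4f}")
```

Because the interval excludes zero, the implied p-value falls below 0.05, which is consistent with the effect being described as significant.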

The GRADE analyses showed (Table 4) that there is low to moderate certainty of evidence that TCE with a focus on Qigong and/or Tai Chi has a positive effect on pain in short- and intermediate/long-term and on disability in the short-term compared to non-exercise controls. The level of evidence was downgraded due to low study quality and inconsistency. With low certainty of evidence, we found positive results for intermediate/long-term effects on disability compared to non-exercise controls. There is a moderate certainty of evidence that TCE is not effective compared to exercising controls on pain in the intermediate/long term and for disability in both the short- and long-term. The level of evidence was downgraded due to low study quality. Low certainty of evidence was found for varying results on pain when compared to exercising controls.

Yoga

Four SRs were included, and these were based on 10 studies (Tables 1 and 5). The SRs included a total of 1,246 participants (overlap not accounted for) and investigated some form of yoga in patients suffering from chronic neck pain [22, 58,59,60]. Taking overlap into consideration 10 original studies were included. The SRs were published between 2016 [60] and 2020 [58], and the last updated search in the SRs was in 2018 [22, 58]. Two of the four SRs performed an MA [22, 59]. The quality of the included SRs varied from critically low quality [58, 60], to low quality [22], to high quality [59]. There was a very high overlap with a CCA of 30%. No SR investigated the effect of yoga from a long-term perspective.

The SRs included different yoga styles, and these could include combinations of physical postures, breathing, and meditation with the aim of promoting well-being. The most-studied yoga style was Iyengar yoga (a form of Hatha yoga, which implies a more physical-based style), which uses protocols that focus on postures (asanas) that lengthen and strengthen muscles in the neck and shoulders to improve stability, flexibility, alignment, and mobility in muscles, joints, and tendons combined with breathing regulation (pranayama) and relaxation (dhyana). Some studies included Kriya and Kundalini yoga, in which one relies less on the asanas and more on energy management, meditation, and breathing techniques [59], but also lesser-known programs like the yogic mind sound resonance technique, which relies on relaxation techniques practiced in supine or sitting positions aiming to increase will power, concentration, and deep relaxation [64]. The yoga interventions were heterogeneous not only in style, but also in the length, frequency, and intensity of the sessions. The interventions were given for a period of between 10 days and 3 months, with sessions ranging from 20 min per day to 90 min per week. The control interventions were treatments such as physical therapy [58, 59], exercise [22, 60], or other active non-pharmacological control interventions, Pilates exercises, usual care, self-care information, and supine rest [58,59,60].

The narrative synthesis on pain intensity in the included SRs showed positive short-term post-intervention effects for yoga compared with no or only minimal intervention, and our meta-analysis based on two SRs with low and high quality, respectively [22, 59], also showed positive results (SMD = -1.32, 95%CI -1.84 to -0.80; I2 = 0%; Additional file 5), but there were varying results compared to general exercises [22, 58,59,60]. Narratively, two SRs with low and high quality, respectively [22, 59], showed positive effects. Regarding disability, there were short-term positive effects for yoga compared to no or only minimal intervention in our meta-analysis based on two SRs [22, 59] with low and high quality, respectively (SMD = -1.00, 95%CI -1.47 to -0.54; I2 = 0%; Additional file 5), but not compared to general exercises [22, 58,59,60].

The GRADE analyses showed (Table 4) that there is a low certainty of evidence for positive effects in the short-term of yoga regarding pain and a moderate certainty of evidence for positive effects in the short-term of yoga regarding disability when compared to non-exercise controls. Compared to exercising controls, there was no effect with low to moderate certainty of evidence for pain and disability, respectively. The level of evidence was downgraded due to poor study quality and conflicting results. Long-term effects could not be analyzed due to a lack of studies.

Discussion

This SR of SRs summarized the literature on various exercise types used in treating chronic neck pain. Our results show a low to high certainty of evidence that the exercise types studied demonstrate positive effects when compared to non-exercise controls for pain levels in the short-term. For disability, all showed positive effects in the short-term compared to non-exercise controls except resistance training. Compared to other exercise interventions, MCE showed positive results for pain and disability levels, while the other exercise types showed varying, or no results. In the long-term, there were mainly no or varying results when compared to non-exercise controls as well as other exercise interventions. Only one SR investigating Pilates was found in our database search and reported, based on low-certainty evidence, that Pilates exercises are not better than other exercises in the short-term to reduce pain and disability, thus aligning with our study findings of the other exercises [23]. Our results are based on 25 SRs (including a total of 125 original studies) with varying risk of bias, including five SRs with high quality, seven SRs with moderate quality, eight SRs with low quality, and five SRs with critically low quality.

Our results partly concur with the results from two SRs on the effect of exercise on chronic neck pain [58, 65]. De Zoete et al. (2020) reported on the effectiveness of general physical exercise (individualized physical exercise, yoga and Pilates, and Tai Chi and Qigong) on pain and disability in chronic neck pain and showed that these exercises have a positive effect compared to usual care interventions [58]. Furthermore, in a network meta-analysis including 40 original trials, the authors found low-quality evidence that exercises such as MCE, yoga, and TCE are equally effective in reducing pain and disability [65]. Mueller et al. (2023) showed with very low to moderate certainty evidence that the effects of MCE and resistance exercises increase with increased frequencies and longer duration of sessions [45]. On the other hand, previous studies have shown contradicting results [26]. Additionally, an updated Cochrane review on exercises for mechanical neck disorders further concluded that exercises for neck pain are safe with only minor adverse effects, but no high-quality evidence exists for the effectiveness of these exercises [20]. Even so, exercises, often MCEs and strength/endurance exercises, are recommended and used in the treatment of patients with chronic neck pain, and often together with early advice and education as recommended [18]. The challenge thus remains for the clinician to decide what type of exercise to use and with what dosage.

In our literature search, we found several SRs reporting on the effect of MCE, and all but one were published between 2020 and 2022, indicating that there has been a recent research focus on this exercise type. The results from our meta-analysis show that MCE had a positive effect compared to non-exercise controls as well as exercise controls for pain and disability levels in the short-term. MCE are performed as specific low-load exercises that target postural stability and might therefore be preferred in the short-term before introducing more heavily loaded exercise types [24, 25]. When summarizing the literature, MCE seemingly comprise different methodologies. One approach – craniocervical-flexion hold – is a static approach that was investigated in most of the included SRs, often using a biofeedback device to control the hold of the neck during the exercise [21, 26, 42, 44, 46, 47]. Other exercises used a more functional approach (Pillar exercises, segmental exercises) where the neck was challenged by loaded exercises via the arms, either via resistance by pulleys or by manual resistance by the therapist [26]. A challenge in summarizing the effect of MCE was that in several studies various types of MCE were compared to each other, which could result in a lack of between-group differences.

Summarizing the SRs on resistance training, we found evidence of moderate certainty that resistance training is effective for lowering levels of pain compared to no/minimal intervention in the short-term. These positive short-term effects for pain, when compared to a non-exercising control group, are in contrast to our results from the SR of SRs on the effects of resistance training for low back pain [27], which might indicate that the treatment mechanisms differ for neck compared to back pain. We also concluded that there is moderate certainty evidence that there are no positive effects of resistance training when compared to other types of exercises and inconclusive results were found for the long-term effects. Concerning disability, we found varying results with low certainty of evidence on the effects of resistance training compared to a no/minimal intervention control group in the intermediate/long-term, and there was moderate certainty of evidence for no effects compared to other types of exercises in the short-term. In the present SR of SRs, the resistance training was very heterogeneous, which makes it impossible to give clinical recommendations on the type or dosage. Future studies should investigate different dosages and incorporate progressive loading principles identified in previous research [66].

For TCE, there was low to moderate certainty of evidence that Qigong and/or Tai Chi have positive short- and long-term effects on pain and short-term effects on disability compared to waitlist controls. However, compared to exercise controls, there was low to moderate certainty of evidence for varying/no effects of TCE on pain and disability. The types of TCE varied largely between the different studies, as did the number of treatment sessions, duration of treatment sessions, and duration of the training period. One interesting finding in our current SR of SRs is that for chronic neck pain, we found an extremely high overlap and more SRs on TCE (seven) than there are RCTs on TCE (five). The more than 20-fold increase in the number of SRs during the last 20 years compared to the 2.6-fold increase in other types of publications could have played a role here [67]. Moreover, in 2019, 24% of the SRs available from PubMed were published in China [67] but were excluded from this review due to the lack of competence in reading these SRs, and it is difficult to know if our results would have been similar if we had included these.

In the last decade, there has been an increased interest in exploring yoga’s effectiveness in chronic pain. Yoga combines physical postures, breathing, and meditation intending to enhance physical and mental well-being [22, 58,59,60]. However, the use of yoga in chronic neck pain has been studied to a lesser extent. In our SR of SRs, yoga showed low-to-moderate certainty evidence for positive results on pain and disability in the short-term compared to other exercises as well as to non-exercise interventions. Even if based on a few SRs [22, 58,59,60] of varying quality, this finding is in line with the network analysis of original studies [65]. The evidence of the few and highly overlapping reviews in our SR of SRs yielded a moderate level of certainty that yoga may be effective for neck pain and disability in the short-term. No studies reported long-term effects. The yoga interventions were compared with several mixed interventions, which made it difficult to isolate the effect of yoga against any single comparator. The extreme heterogeneity among the yoga interventions in terms of style, number of sessions, session length, frequency, and intensity, as well as among the comparison groups, was remarkable when summarizing the reviews. This makes the implications of our findings uncertain, and it is difficult to provide clear clinical guidance. Additionally, very few studies examined effects over time; most assessed effects only at post-intervention. In yoga, the mind–body relation is in focus. However, our review included only outcomes on the “body” without considering outcomes concerning the “mind”, such as quality of life and mood. Even if not investigated in the current SR of SRs, the reason for the positive effect reported for yoga might be the “mind” perspective.

The neck is affected in different ways by static positions such as those seen in office workers or by heavy loads on the arms [24, 25]. The neck can also be considered less robust than the lower back with the range of motion between the segments being larger than in the lower back [68]. Even if neck pain differs from lower back pain our results are in line with what we found summarizing the literature in a SR of SRs on various exercise types used in chronic low back pain, reporting that no exercise type seems to have a positive effect compared to any other, while there seems to be a positive effect compared to non-exercise interventions [27]. In addition, studies have shown that chronic, long-lasting, and recurrent pain sensations lead to changes in the nervous system such as increased peripheral and central sensitization that results in decreased motor function [69]. The improvement of disability that some of the included SRs reported after a training period could be explained by increased physiological functioning, such as increased muscle strength and endurance, increased range of motion, increased relaxation of tensed muscles, or lower pain intensity due to increased endorphin production. However, non-physiological concepts of pain treatment could also play an important role because pain is nowadays seen as a homeostatic emotion [70] in which pain is influenced by changes in the nervous system rather than by changes in tissues [71,72,73,74]. Psychological factors – including catastrophizing, anxiety, avoidance behavior, and depression – are also important in the processes of local and central sensitization, and these factors are all positively influenced by exercise and light physical activity [75, 76]. 
Thus, the use of exercise treatment in chronic pain conditions should be seen as a form of cognitive therapy where the goal is to modulate the feeling of pain and to modulate the patients’ thoughts and feelings regarding the pain, and not just to increase muscle strength and endurance [77]. This could explain why the choice of exercise type and dosage seems to be of less importance in the treatment of patients with chronic neck pain.

Strengths and limitations

A strength of the present SR of SRs is that a large group of experienced researchers focused on this research question and followed the PRISMA recommendations as well as the recommendations from the Cochrane Back- and Neck group [29, 30]. We included SRs from several databases and thus base our results on a large study population. Furthermore, we did not limit our search and included SRs without any restrictions on publishing year, comparator group, or language. In addition, to our knowledge, this is the first SR of SRs on the effect of various exercise types used in chronic neck pain. A network analysis was recently published on the same topic but included original studies and thus missed out on the original SRs’ conclusions and the risk of bias affecting the certainty of the evidence [65].

Several limitations should be noted. The first is that our results are based on several SRs with a critically low to low quality (n = 13), which account for more than half of our included SRs. Another limitation is that we used data presented in the included SRs without thoroughly checking if the data from the original RCTs were correct. When randomly checking some of the reported data against the original RCTs and extracting the data for the meta-analyses, we were astonished by the number of errors published in both SRs and MAs. We would therefore like to advocate for the highest accuracy when conducting systematic reviews and meta-analyses. Journal editors might, in addition, consider implementing control mechanisms to avoid these kinds of errors. Our experience is that performing a peer review of SRs is very time-consuming, and we believe that most reviewers trust the tables and the meta-analyses, perhaps without checking them. Moreover, the AMSTAR-2 tool lacks the possibility to lower the quality of the included SRs based on this point. Owing to the transparency of protocols, however, there is little possibility of adding exclusion criteria during this process for these kinds of poorly conducted/reported trials.

Most of our included SRs were downgraded due to low quality and inconsistency in results. It might be considered that to have increased certainty of evidence for exercise types used in chronic neck pain we should have included only SRs with a moderate to high quality, thus reaching different conclusions. However, our aim was to include all SRs and MAs on exercises used in chronic neck pain without limitation. Another limitation, and a challenge when conducting a SR of SRs, is how to interpret the results of the included SRs because not all are clear on how participants and interventions are described or how the effects are measured. Some of our included SRs also included a variety of mixed interventions, for example, various exercise types conducted in the same intervention or exercises together with other interventions [26, 42, 58]. Some SRs also used their own definitions for the exercise types that were not used in the same way in other SRs [26]. Another challenge in summarizing the evidence in our SR of SRs is the large overlap shown for some of the exercise types. Motor control exercises, for example, had a high overlap of 26%, which means that several of the original studies were included in several of the SRs. Even so, the results based on the included SRs were somewhat ambiguous.

Chronic neck pain is a wide and heterogeneous diagnosis that might include, for example, patients who suffer from whiplash-associated disorders. We decided not to include SRs investigating patients suffering from whiplash-associated disorders because these often require a more multimodal approach instead of a single exercise intervention. However, we cannot rule out that some of our included SRs comprise such populations even if all defined their populations as having chronic neck pain.

Furthermore, we cannot say anything regarding the dose and duration of the various exercise types, which were heterogeneous in the various SRs, but defining the optimal dose was not the aim of our study. Based on the mechanisms that affect pain and disability levels as previously discussed, the dose and duration are important factors and should be investigated in future studies. In addition, we cannot rule out that a more pragmatic approach including other modalities in addition to the exercises investigated would have changed our results, because such an approach was not the aim of our study.

Our results show that exercises have an overall positive effect on pain and disability compared with non-exercise interventions, at least in the short-term. However, in the intermediate/long-term there are varying results, and it remains unclear if the exercises studied in this review are effective when compared to non-exercise controls or to other exercises. Overall, the decision on what exercise type to use in the clinic should be in dialogue with the patient, which is the recommended way of working in a patient-centered way [78]. In addition, adherence to the exercises is seemingly important for a successful outcome [79, 80], but this was not investigated in the present SR of SRs.

Going forward, it is important that future SRs follow the recommendations on how to perform a SR with good quality using e.g. the PRISMA or the Cochrane group guidelines [29, 30] and that they also report the certainty of the evidence so that the reader can appraise the results [41]. Moreover, future original RCTs should preferably include larger cohorts and better-defined control groups so that a within-group comparison is feasible. Considering regression to the mean, it is also important that there is a clear contrast between intervention and control group. All interventions and control groups should also be clearly described based on e.g. the Template for Intervention Description and Replication (TIDieR) checklist [81]. Summarizing the literature in a SR of SRs has the advantage of being able to show that within an area of research, there are several SRs and MAs with low to critically low quality that are also based on original studies of lower quality, thus affecting the overall evidence. The aim of a SR of SRs such as ours was to identify and appraise all published reviews in one area of interest and to describe their quality, summarize and compare their conclusions, and discuss the strengths of these conclusions [28]. This is important to highlight as many guidelines base their recommendations on how to manage a specific group of patients on summaries of the results of SRs.

Conclusion

Overall, our findings show low to high certainty of evidence for positive effects on pain and disability of the various exercise types used in chronic neck pain compared to non-exercise interventions, at least in the short-term. Compared to other exercises MCE showed short-term effects on pain and disability levels while no such effects were shown for the other exercise types. What exercises to choose for the individual patient with chronic neck pain cannot be recommended from our results since we found no large differences between the exercise types studied here. Because the quality of the included SRs varied greatly, future SRs need to increase their methodological quality.