Background

Tai Chi is a traditional exercise, martial art, and mind–body practice undertaken by people of different ages and health statuses. Also known as Tai Chi Chuan/Quan or Taiji, Tai Chi originated in China in the seventeenth century A.D. [1]. The practice is of low to moderate intensity, with repetitive, flowing, meditative movements that aim to cultivate and maintain health and wellbeing [2]. There are five major traditional styles of Tai Chi, namely Chen, Yang, Wu, Wu/Hao, and Sun styles, along with numerous newer styles, hybrids, and extensions. Tai Chi integrates the essence of Chinese folk and military martial arts with traditional Chinese medicine theories [3, 4]. The core components of Tai Chi are traditionally described as including sequenced movements, meditative and visualization techniques, and deep, abdominal breathing [3]. In China, Tai Chi is widely taught in high schools and higher education-related organizations [5].

Interest in evaluating the effects of Tai Chi in both healthy populations and people with a wide range of diseases, conditions, and symptoms has steadily increased globally [6, 7]. A bibliometric analysis of clinical studies of Tai Chi published between 1958 and 2013 identified 507 studies, of which 43 (8.3%) were systematic reviews (SRs) of randomized controlled trials (RCTs) and/or non-randomized studies of interventions (NRSIs) [6]. The 2010 to 2020 update identified 987 studies, of which 157 (15.9%) were SRs [7].

Given the large number of SRs of Tai Chi, SRs of SRs (henceforth referred to as overviews) are increasingly being conducted. Some have evaluated multiple interventions for a single condition [8,9,10,11,12,13,14,15,16], whilst others have focused only on Tai Chi interventions for either a single condition [17,18,19,20,21,22] or multiple conditions [23,24,25,26,27]. Limitations of the overviews evaluating only Tai Chi interventions [17,18,19,20,21,22,23,24,25,26,27] were the potential for language bias [17, 18, 22, 23, 25,26,27], reporting bias in which the most favourable results were emphasized [23, 27], and reporting multiple estimates of effects/results for the same or similar outcome and population, with limited or no discussion about conflicting results or overlapping of the primary studies [18,19,20,21,22,23,24,25, 27].

As such, this overview aims to systematically identify and appraise the best available SR evidence reported in the most recent, comprehensive, and/or highest-quality SRs, on the safety and effectiveness of Tai Chi for health promotion and managing disease.

Methods

The methods were guided by the Cochrane Handbook for Systematic Reviews of Interventions [28], in particular Chapter V: Overview of Reviews [29], the Joanna Briggs Institute Manual for Evidence Synthesis: Chapter 10 Umbrella Review [30], the GRADE (Grading of Recommendations, Assessment, Development and Evaluations) Handbook [31], and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 statement [32]. The PRISMA 2020 checklist is presented in Additional file 1.

Protocol and registration

A protocol was registered prior to data extraction at the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42021225708). Deviations from the protocol prior to formal screening and data extraction were as follows: only partial blinding of the reviewers to the results when selecting SRs and outcomes; including important secondary outcomes of a SR; reporting more than three outcomes for some populations; and including SRs of NRSIs.

Populations

All populations were included, regardless of health status, setting, location, and country.

Interventions

All exercise programs described as Tai Chi were included. No limitations were applied to Tai Chi styles (such as Chen, Yang, Wu, Wu/Hao, and Sun style) or forms (such as 6-form, 24-form, 54-form, and 83-form Tai Chi). Exercise programs that combined Tai Chi with other interventions such as Qigong, meditation, or conventional exercise were only included if the reviewers clarified that Tai Chi was the core component. A SR that evaluated Tai Chi and other interventions (e.g. any form of exercise) was excluded if the effects of Tai Chi were not analysed separately.

Comparisons

Any type of control was included, for example, no intervention, waitlist control, usual care, and active control. When the data were available, the pooled effects according to control group categories were extracted to reduce clinical and methodological diversity. Comparisons that included a co-intervention were also eligible if the co-intervention was applied in all arms.

Outcomes

Any outcome was eligible for inclusion. However, as much as possible, the number of outcomes extracted per population/comparison group was limited to three. These were selected to reflect the SR’s primary/main outcome(s), outcomes that align with the reasons why people use Tai Chi and what matters to them, the validity/reliability of the measurement tool, and directness of the outcome measure to health status (e.g. clinical outcomes in preference to risk factors). Core outcome sets and other resources such as those published on the Core Outcome Measures in Effectiveness Trials (COMET) Initiative database [33] were used to inform these decisions. Two senior reviewers (GYY, JH) jointly made these decisions. When estimates of effect were reported for multiple timepoints, the timepoint with the most RCTs was selected. Additional timepoints were only selected if the studies were not included in the first estimate.
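The timepoint rule described above can be illustrated with a minimal Python sketch. The function name, the input format (a mapping from a timepoint label to the set of RCT identifiers pooled at that timepoint), the tie-breaking behaviour, and the reading of "not included in the first estimate" as "no RCT in common" are all illustrative assumptions rather than the procedure the reviewers implemented.

```python
def select_timepoints(rcts_by_timepoint):
    """Pick the timepoint with the most RCTs, plus non-overlapping extras.

    rcts_by_timepoint: dict mapping a timepoint label to a set of RCT
    identifiers (hypothetical format). Ties go to the first label listed.
    """
    # Primary timepoint: the one pooling the largest number of RCTs.
    primary = max(rcts_by_timepoint, key=lambda tp: len(rcts_by_timepoint[tp]))
    primary_rcts = rcts_by_timepoint[primary]
    # Additional timepoints: only those sharing no RCT with the primary one
    # (assumed interpretation of the rule).
    additional = [
        tp
        for tp, rcts in rcts_by_timepoint.items()
        if tp != primary and rcts.isdisjoint(primary_rcts)
    ]
    return primary, additional
```

For example, with hypothetical RCTs A to D, `select_timepoints({"3 months": {"A", "B", "C"}, "6 months": {"A", "B"}, "3 weeks": {"D"}})` keeps "3 months" as the main estimate and adds only "3 weeks".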

Study designs

All SRs of interventions, with or without a meta-analysis of RCTs, quasi-RCTs, and other NRSIs (e.g. cohort studies, case–control studies, controlled before-and-after studies, interrupted-time-series studies, case series and case reports), were included. Whilst SRs of RCTs were likely to provide the most reliable evidence for most estimates of effect, SRs of NRSIs were also included (post protocol, pre-data extraction) in the circumstance when this was the best available evidence.

Literature search

The search strategy built upon a bibliometric analysis of Tai Chi intervention studies published between 1st January 2010 and 31st January 2020 [7]. The search was updated for the purpose of this overview (1st January to 12th December 2020) using the same search terms and databases: PubMed, Cochrane Library, EMBASE, Medline, Web of Science, China National Knowledge Infrastructure (CNKI), Chinese Scientific Journal Database (VIP), Sino-Med, and Wanfang Database (Additional file 2). The search strategies were developed and refined by the team of experts who conducted an earlier bibliometric analysis [6]. Tai Chi search terms included “Taiji”, “Tai Ji”, “Tai-ji”, “Tai Chi”, “Tai Chi Chuan”, “Tai Chi Quan”, or “Taijiquan”. No limitations on language or publication status were applied. Grey literature was included. Database searches were augmented with bibliography searches of other recently published SRs of SRs [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27].

Study selection

The search results from English databases were exported into EndNote (version X9), and those from Chinese databases into NoteExpress (version 3.2). Duplicates were removed before study selection. Following calibration exercises, reviewers (GYY, JH, WLH, HZ) worked in pairs to independently screen the title/abstracts and full texts. Two reviewers (GYY, JH) rescreened the full texts of the 157 SRs (106 published in English, 41 published in Chinese) that were identified in the 2020 bibliometric analysis [7]. Final decisions were made by consensus and involved other reviewers when necessary.

To minimize overlap of primary studies, one SR for each population, condition, or outcome (PCO) was then selected for the final evidence synthesis. A staged approach was applied to selecting this subset of SRs with the aim of identifying the most recent, comprehensive, and highest-quality SR for each PCO. First, SRs with a meta-analysis of RCTs were grouped according to their PCO, from which the publication date and number of RCTs were compared. When multiple SRs were published within 4–5 years of each other and/or the number of RCTs was similar, a single reviewer (GYY or JH) extracted further data about the number of databases searched, any language restrictions, the primary/main outcomes, and the number of RCTs and overlapping RCTs per meta-analysis. An informal appraisal of the SR quality using AMSTAR 2 [34] was also done. Finally, SRs without a meta-analysis were screened, and SRs that included a meta-analysis of NRSIs were rescreened to ensure there were no missing PCO.

Data collection

A pre-defined data extraction form that was an extension of the bibliometric analysis extraction form was designed and piloted by two reviewers (GYY, JH). Data extraction was staged for pragmatic reasons and to partially blind the investigators when selecting the SRs and PCO. For all included SRs, information about the characteristics of the studies (i.e. citation details, authors, study design, number of RCTs and NRSIs, participant characteristics, and types of outcomes) was extracted. For the subset of SRs selected for the final evidence synthesis (and those for which SR selection could not be made based on the preliminary data extraction), additional information about the search strategy, study characteristics of included studies, and the SR quality was also extracted. For each estimate of effect that was selected for the final evidence synthesis, additional information about the participants, settings, estimates of effect, statistical heterogeneity, subgroup and sensitivity analysis, and publication bias was then extracted. Estimates of effect were not extracted for the SRs with no meta-analysis as this would require extracting data from the original publications of the primary studies, nor for a meta-analysis that did not meet the criteria outlined in item 11 of AMSTAR 2. Following calibration exercises, five reviewers (GYY, WLH, FLB, HZ, JH) extracted data into Research Electronic Data Capture (REDCap) [35], and the extracted data were verified by two senior reviewers (GYY, JH). Final decisions were made by consensus with the review team.

Quality assessment

Only the subset of SRs included in the final evidence synthesis was formally assessed for quality using the AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews, improved version) critical appraisal tool and rated as high, moderate, low, or critically low quality [34]. Items 2, 4, 9, 11, 13, and 15 were deemed critical. Item 7, which requires that a list of the excluded articles with the rationale be reported, was introduced to AMSTAR 2 in late 2017. A similar reporting requirement was introduced to the revised PRISMA 2020 statement published in early 2021 [32]. Consequently, for the purpose of this review item 7 was deemed non-critical. Additionally, SRs published before 2019 were not downrated for item 7 if they met the accepted reporting standards for excluded articles as per PRISMA 2009 [36]. For all other items, the AMSTAR 2 guidance was followed. A sensitivity analysis was conducted to compare this modified AMSTAR 2 rating for item 7 with the original guidance.
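As a minimal sketch of how an overall rating could be derived, the following Python function applies the rating scheme published with AMSTAR 2 (no or one non-critical weakness gives "High", more than one non-critical weakness gives "Moderate", one critical flaw gives "Low", and more than one critical flaw gives "Critically low") to the critical items listed above. The function name, the input format (the set of item numbers judged to be weaknesses), and the handling of "partial yes" answers are illustrative assumptions, not the reviewers' implementation.

```python
# Critical items as stated in this overview (item 7 treated as non-critical).
CRITICAL_ITEMS = frozenset({2, 4, 9, 11, 13, 15})


def amstar2_rating(unmet_items, critical_items=CRITICAL_ITEMS):
    """Return the overall AMSTAR 2 rating for one systematic review.

    unmet_items: iterable of AMSTAR 2 item numbers (1-16) judged as
    weaknesses. Whether a 'partial yes' counts as a weakness is left to
    the caller (assumption).
    """
    unmet = set(unmet_items)
    critical_flaws = len(unmet & set(critical_items))
    noncritical_weaknesses = len(unmet - set(critical_items))
    if critical_flaws > 1:
        return "Critically low"
    if critical_flaws == 1:
        return "Low"
    if noncritical_weaknesses > 1:
        return "Moderate"
    return "High"
```

Under these assumptions, a SR with weaknesses in items 3, 7, and 10 is rated "Moderate" with the modified critical item list, and "Low" when item 7 is added to the critical items by calling `amstar2_rating({3, 7, 10}, CRITICAL_ITEMS | {7})`, which mirrors the kind of comparison made in the sensitivity analysis.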

GRADE guidelines [31] and GRADEpro GDT software [37] were used to rate the overall certainty (quality) of the evidence for the extracted effect estimates. Due to pragmatic constraints, assessments of the risk of bias of the primary studies, heterogeneity, and publication bias relied upon the assessments reported in the SR. However, the results of any sensitivity analyses were extracted and considered. Given the large number of SRs evaluating a wide range of populations and outcomes, a pragmatic approach similar to that used by Pollock et al. [38] was applied, in which specific thresholds, ranges, and criteria were established and piloted to optimize consistency and transparency across all the ratings. The details of the rubric used to inform the GRADE assessments are reported in Additional file 6 and summarized below.

For the risk of bias (RoB) assessments, randomization/selection bias, assessor blinding, and missing data were deemed the most important categories. This decision reflected the need to select domains assessed by the RoB assessment tools used in the SRs and that it is not possible to blind Tai Chi study participants. For there to be no serious concerns with RoB, at least 75% of the included RCTs in the SR had a low RoB in each of these three categories.
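The 75% rule above can be expressed as a short Python sketch. The domain labels, the per-trial input format, and the function name are hypothetical, and how the rubric distinguished serious from very serious concerns is not modelled here.

```python
# Hypothetical labels for the three RoB categories deemed most important.
KEY_DOMAINS = ("selection", "assessor_blinding", "missing_data")


def no_serious_rob_concerns(trials, threshold=0.75):
    """Check whether enough RCTs are at low RoB in every key domain.

    trials: list of dicts mapping each key domain to 'low', 'unclear',
    or 'high', as judged by the SR authors (assumed format).
    """
    if not trials:
        return False
    for domain in KEY_DOMAINS:
        low = sum(1 for trial in trials if trial.get(domain) == "low")
        # The rule must hold in each of the three domains separately.
        if low / len(trials) < threshold:
            return False
    return True
```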

Inconsistency was investigated when the I² test for statistical heterogeneity was ≥ 75%. This involved inspecting the forest plot for overlapping 95% confidence intervals (CI) and direction of effects, and the findings from any subgroup or sensitivity analysis reported in the SR. In a post hoc sensitivity analysis, inconsistency was investigated if the I² test was ≥ 30% or τ² test p ≥ 0.1.
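A rough Python sketch of this screening step is given below. It only flags a meta-analysis for investigation on the I² threshold and then summarizes two features that would be looked for on a forest plot. The function names and input format are hypothetical, the τ² criterion is not modelled, and the overlap check (all intervals sharing at least one common value) is one possible operationalization of a visual inspection, not the reviewers' method.

```python
def flag_for_inconsistency(i_squared, threshold=75.0):
    """Return True when I-squared (in percent) reaches the threshold.

    The default reflects the primary analysis; 30.0 would reflect the
    I-squared part of the post hoc sensitivity analysis.
    """
    return i_squared >= threshold


def summarize_forest_plot(studies, null_value=0.0):
    """Summarize CI overlap and direction of effects.

    studies: list of (estimate, ci_lower, ci_upper) tuples (assumed format).
    null_value: 0.0 for MD/SMD; use 1.0 for ratio measures such as RR.
    """
    highest_lower = max(lower for _, lower, _ in studies)
    lowest_upper = min(upper for _, _, upper in studies)
    # Estimates exactly on the null are ignored when judging direction.
    directions = {est > null_value for est, _, _ in studies if est != null_value}
    return {
        "all_cis_overlap": highest_lower <= lowest_upper,
        "same_direction": len(directions) <= 1,
    }
```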

Since all participants, interventions, and outcomes were directly relevant to the research question, all estimates of effect were automatically rated as having no serious concerns with indirectness.

Assessments of imprecision were made according to whether the optimum information size was likely to be met, the width of the 95% CI, and whether important benefits and/or harms could be excluded. Due to pragmatic constraints, unless reported otherwise in the SR, thresholds were set for optimum information sizes [31, 38]. In a post hoc sensitivity analysis, the threshold for the optimum information size for continuous data was increased from 200 [38] to 400 [31]. For standardized mean differences (SMD), the minimal clinically important difference (MCID) for important benefit was set at 0.5, which is considered to be a moderate effect size, and a large effect size was set at 0.8 [39]. For mean differences (MD), the MCID for important benefit was based on studies involving similar populations [40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59]. For relative risks (RR), the cut-off for important benefits was set at < 0.75 or > 1.25. For risk differences (RD), the cut-off for important harm was set at ± 0.1 for non-serious AEs and ± 0.01 for serious AEs.
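The thresholds above lend themselves to a small set of helper functions, sketched here in Python. The function names are hypothetical, and the treatment of values lying exactly on a cut-off (inclusive for SMD, RD, and the optimum information size) is an assumption, since the text does not state it. The MD thresholds are omitted because they were population-specific.

```python
def classify_smd(smd):
    """Label the magnitude of a standardized mean difference."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:  # MCID for important benefit
        return "moderate"
    return "small"


def rr_is_important(rr):
    """Relative risk beyond the cut-offs of < 0.75 or > 1.25."""
    return rr < 0.75 or rr > 1.25


def rd_is_important(rd, serious):
    """Risk difference beyond +/- 0.01 (serious AEs) or +/- 0.1 (non-serious AEs)."""
    cutoff = 0.01 if serious else 0.1
    return abs(rd) >= cutoff


def meets_optimum_information_size(total_participants, threshold=200):
    """Continuous data: 200 in the primary analysis, 400 in the post hoc one."""
    return total_participants >= threshold
```

For instance, an SMD of 0.6 would be labelled "moderate", and a pooled sample of 250 participants would meet the primary threshold but not the post hoc threshold of 400.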

Publication bias was only considered when at least ten RCTs were in the meta-analysis. In instances when the SR did not report on publication bias for one effect estimate yet assessed it for another, the findings from that assessment were applied. If there was no information, at least half of the studies had to have a sample size larger than 100 for there to be no serious concerns about publication bias.
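The decision sequence in this paragraph can be sketched in Python as follows. The function name, the return labels, and the single `sr_reported_concern` argument (standing in for whatever the SR reported for this or another effect estimate) are illustrative assumptions.

```python
def publication_bias_assessment(sample_sizes, sr_reported_concern=None):
    """Return 'not considered', 'serious', or 'not serious'.

    sample_sizes: total sample size of each RCT in the meta-analysis.
    sr_reported_concern: True/False if the SR assessed publication bias
    (for this or another estimate), None if no information was available.
    """
    # Fewer than ten RCTs: publication bias is not assessed at all.
    if len(sample_sizes) < 10:
        return "not considered"
    # Prefer the SR's own assessment when one exists.
    if sr_reported_concern is not None:
        return "serious" if sr_reported_concern else "not serious"
    # Fallback: at least half of the RCTs must have more than 100 participants.
    large = sum(1 for n in sample_sizes if n > 100)
    return "not serious" if large >= len(sample_sizes) / 2 else "serious"
```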

Following calibration exercises, the AMSTAR 2 assessments were independently made by two reviewers in pairs (GYY, JH, FLB) and the GRADE certainty assessments were made by one of these reviewers and verified by a second reviewer. Final decisions were made by consensus with the team.

Synthesis of results

The results are narrated and presented in tables, including a summary of findings table for all estimates of effect. Dichotomous data are presented as RR or RD and number needed to treat (NNT), with 95% confidence intervals (CIs). When available, rates are presented as the number of participants. Continuous data are presented as weighted MD or SMD, with 95% CIs. No further meta-analysis, network analysis, or re-analysis of the results was conducted.
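For readers less familiar with the NNT, the Python sketch below shows the conventional calculation as the reciprocal of the absolute risk difference, rounded up to the next whole person. How the NNT values in this overview were actually derived is not stated, so the function and its rounding rule are illustrative only.

```python
import math


def nnt_from_rd(rd):
    """Number needed to treat from a risk difference given as a proportion."""
    if rd == 0:
        return math.inf  # no difference in risk: NNT is undefined/infinite
    # The sign of the RD only indicates direction (benefit or harm).
    return math.ceil(1 / abs(rd))
```

For example, a risk difference of 0.125 corresponds to an NNT of 8, and a risk difference of -0.3 to an NNT of 4.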

Results

Search results

The literature searches identified 210 eligible SRs (211 articles) of Tai Chi (Fig. 1). The citations with the reason for excluding 100 full-text articles are listed in Additional file 3.

Fig. 1 PRISMA flow diagram

Study selection for evidence synthesis

From the 210 SRs of Tai Chi, 47 SRs [60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106] were selected for the final evidence synthesis and 114 estimates of effect, representing 59,306 adult participants in RCTs, were extracted from 37 SRs [61, 62, 64, 66,67,68, 70, 71, 73,74,75, 77, 79,80,81,82,83,84,85,86,87,88, 90,91,92,93,94, 96,97,98, 100,101,102,103,104,105,106]. Estimates of effect were not extracted, and the GRADE certainty of the evidence was not appraised, for four SRs with unreliable meta-analyses [65, 76, 78, 99] and six SRs with no meta-analysis [60, 63, 69, 72, 89, 95]. No results were extracted from, nor was the AMSTAR 2 quality formally appraised or reported for, 163 SRs (164 articles) because for 79 SRs, a far more recent SR, typically with more primary studies, was identified; for 46 SRs (47 articles), following further consideration, a SR of higher quality and/or with more primary studies in the meta-analysis for the PCO was selected; and for 38 of the SRs that did not conduct a meta-analysis, the PCO were reported by a SR with a meta-analysis. After the analysis for this overview had been finalised, we found that an erratum for an included SR with a meta-analysis on fear of falling had been published on 3rd September 2022 [107]. The erratum corrects an error that led to a misinterpretation of the SR's methodology and findings, because a meta-regression was performed with the SMD as the dependent variable. As a result, the comparison we included from this SR was Tai Chi with versus without supervision by a Tai Chi instructor, which is not eligible for inclusion. The SR was still included as the corrections did not alter the overall assessment of the certainty of the evidence for that outcome. The citations and reasons for excluding the 163 potentially eligible SRs from the evidence synthesis are reported in Additional file 4.

Characteristics of studies

Since 2010, the number of SRs published each year in English and Chinese databases rose exponentially (Table 1). Most were SRs with a meta-analysis of RCTs (78.6%, 165/210) and were published in English (73.8%, 155/210) or Chinese (25.7%, 54/210). The first author of 139 (66.2%) SRs was from a university/institution located in mainland China, Hong Kong, or Taiwan. The median number of participants per SR was 750, ranging from 42 to 9263. Only 18 (8.6%) SRs included studies in which at least some of the study participants were under 25 years of age. Multiple outcomes measuring the effects of Tai Chi in a wide range of populations were evaluated. The most common conditions and their associated risk factors were cardio/cerebrovascular diseases and falls. One SR specifically evaluated the risk of adverse events.

Table 1 Characteristics of systematic reviews of Tai Chi interventions

Table 2 summarizes the characteristics of the 47 SRs (41 SRs with meta-analysis and 6 SRs without meta-analysis) included in the final evidence synthesis. Of note, only two SRs included adolescents [95, 100] and 40 included older adults (≥ 60 years). Almost all study participants were living independently in the community. Most SRs included participants from both Asian and non-Asian countries. Only two SRs were limited to Chinese participants [103, 104].

Table 2 Characteristics of systematic reviews included in the evidence synthesis

Quality of studies

According to the AMSTAR 2 quality rating, two (4%) of the 47 SRs were rated as ‘High’ [82, 105], seven (15%) as ‘Moderate’ [60, 79,80,81, 97, 101, 104], 20 (43%) as ‘Low’ [61, 66,67,68,69,70, 72,73,74,75, 83,84,85,86,87, 89, 96, 98, 102, 103], and 18 (38%) as ‘Critically low’ [62,63,64,65, 71, 76,77,78, 88, 90,91,92,93,94,95, 99, 100, 106] (Table 2, Table 3, and Additional file 5). Notably, only four SRs (9%) clearly stated a rationale for the study design inclusion/exclusion criteria (item 3); five (11%) reported the funding details of the included studies (item 10); five (11%) listed the articles excluded at full-text screening (item 7); and 17 (40%) had registered or published a protocol (item 2). Other common deficiencies were not adequately considering and/or discussing how the risk of bias of individual studies might impact the results (item 12), and/or not adequately considering or examining statistical, methodological, or clinical heterogeneity (item 13). Six SRs used the PEDro Scale [107] and another six used the Jadad scale. Whilst both are well-regarded risk of bias assessment tools, they do not ask about selective reporting bias, which is a requirement for full marks for item 9. However, even if full marks were awarded, a sensitivity analysis confirmed this would not have changed their overall ratings. In contrast, a sensitivity analysis found that if item 7 was added to the critical item list and no concessions for SRs published before 2019 were applied, then despite having met the 2009 PRISMA reporting standards for excluded articles [108], only five (11%) of the systematic reviews would have met the criteria. Consequently, an additional seven SRs would be downrated from moderate to low quality [60, 79,80,81, 97, 101, 104] and 18 from low to critically low quality [61, 66, 68,69,70, 72,73,74,75, 83,84,85,86,87, 89, 96, 102, 103] (Additional file 5).

Table 3 Summary of findings of the health effects of Tai Chi

GRADE evidence certainty

Of the 114 estimates of effect that were extracted, only eight (7.0%) were graded as high certainty evidence; 43 (37.7%) moderate, 36 (31.6%) low, and 27 (23.7%) very low. Serious or very serious concerns with the risk of bias of the individual RCTs were the predominant issue, negatively impacting 92 (80.7%) of the extracted effect estimates. Imprecision in effect estimates was the next most common issue (43 effect estimates, 37.7%), which was a function of the small number of studies in the meta-analysis and/or their small sample sizes. Thirty-seven (32.5%) effect estimates were graded down for inconsistency. Whilst all the meta-analyses had at least one RCT with a small sample size, only three instances of publication bias were identified. However, if the thresholds and criteria from the post hoc sensitivity analyses were applied, then 31 (25.8%) estimates would be further downrated due to serious or very serious concerns with imprecision, and 6 (5.0%) estimates would be rated up from very serious to serious concerns. In this instance, only 6 (5.0%) would be graded as high certainty evidence; 28 (23.3%) moderate, 53 (44.2%) low, and 33 (27.5%) very low. Details of the GRADE certainty assessments can be found in Additional file 6.

Summary of the effects of Tai Chi

Table 3 presents the Summary of Findings of 114 estimates of effect and the GRADE certainty of the evidence of Tai Chi SRs according to population, outcome, and comparison that were extracted from 37 SRs with a meta-analysis. Of the 108 estimates of effect reported for Tai Chi treatment outcomes, 107 favoured Tai Chi. However, 21 estimates were not significant and are interpreted as equivalent to the comparison groups. This included the one estimate that favoured the comparison groups.

Adverse events

Cui et al. [64] evaluated the overall safety of Tai Chi. No significant differences were found in the risk of serious, non-serious, or intervention-related adverse events (AEs) from Tai Chi compared to both physically active and inactive interventions in healthy adults and people with chronic diseases (low to moderate certainty). The most common AEs were non-serious AEs, such as musculoskeletal aches and pains. Serious AEs were found in studies involving patients with heart failure, including death, hospitalization, and worsening heart failure or its co-morbidities. The reviewers reported that no serious AEs were determined to be attributable to Tai Chi or control conditions. The reviewers noted that an important limitation of the evidence was the ongoing underreporting of AEs in many RCTs, with only a few using an AE monitoring protocol.

Twenty of the other SRs included in the evidence synthesis also reported AEs (Table 2). Of these, 18 reported no AEs [62, 70, 72, 76, 81, 83, 84, 86, 89, 90, 93, 94, 96,97,98, 103, 104] and two reported mild, transient musculoskeletal AEs [82, 85].

General health, quality of life, and wellbeing

Whilst most SRs were for adults and older adults with chronic diseases, a SR with no meta-analysis reported various physical and psychological benefits of Tai Chi for students in higher education [95]. Another SR with no meta-analysis reported improved workplace productivity/motivation and work-related stress for healthcare workers [63].

Health-related quality of life (QoL) outcomes were frequently evaluated for adults and older adults, most of whom had one or more chronic diseases. The results from the meta-analyses of QoL outcomes for single conditions are presented in their respective sections below. Disease-specific QoL outcomes are reported for chronic heart failure [66], chronic obstructive pulmonary disease [67], fibromyalgia [61], and Parkinson’s disease [100], and generic QoL outcomes for cancer [83], hypertension [90], and type 2 diabetes mellitus [106]. Other related outcomes are reported for stroke (activities of daily living) [81], rheumatoid arthritis (functional status), and knee osteoarthritis (self-efficacy) [70].

Three additional SRs representing QoL outcomes for other populations were also selected. For women in the perimenopausal life stage, there was moderate certainty evidence of a clinically important effect for some of the Short Form Health Survey 36-item (SF-36) QoL domains (general health, vitality, bodily pain, and mental health) and low certainty evidence of equivalence to other control groups for the physical and social function QoL domains [92]. For older adults with or without chronic diseases, there were clinically important improvements in overall QoL, which was measured using various generic and disease-specific QoL tools (low certainty) [91]. For those with chronic diseases, there were small improvements in both the physical and mental health SF-36/SF-12 QoL domains (moderate certainty) [62]. For the physical QoL domain, two RCTs overlapped with other reported effect estimates, one for hypertension (high certainty, small effect) [90] and one for perimenopause (low certainty, equivalent effect) [92], and for the mental health domain, one RCT overlapped with the perimenopause effect estimate (moderate certainty, small effect) [92].

Cancer

The effects of Tai Chi on QoL, pain, fatigue, and sleep were commonly appraised, particularly for breast cancer survivors. Four SRs were selected [77, 79, 83, 87]; however, none of the SRs were comprehensive and all of them had missed numerous eligible RCTs. Most of the effects from Tai Chi were either small or equivalent to the comparison groups, or there was very low certainty evidence.

For female cancer survivors, the evidence was more mixed. There was low certainty evidence of small improvements in the QoL physical domain [83]. However, the effects of Tai Chi were unclear for both psychological and social QoL domains due to very low certainty evidence [83]. For breast cancer survivors only, there were clinically important improvements in fatigue at 3 months when Tai Chi was added to usual care or rehabilitation (low certainty) [77], yet no difference at 3 or 6 months compared to psychological interventions or sham Qigong (low certainty evidence) [77]. Similarly, compared to usual care or rehabilitation, there were small improvements in pain at 3 months (moderate certainty), yet no difference at 3 weeks compared to rehabilitation only (low certainty) [79].

Cardiovascular diseases, diabetes, and risk factors

For adults and older adults post myocardial infarction, clinically important improvements in VO2-max were found (low certainty) [96], but the effects were unclear for older adults with stable angina due to very low certainty evidence [74]. For those with chronic heart failure, the effects of Tai Chi on left ventricular ejection fraction (LVEF), distance they could walk in 6 min, and disease-specific QoL were also unclear due to very low certainty evidence [66]. However, there was moderate certainty of clinically important improvements in psychological distress for people with chronic heart failure [90].

Clinically important reductions in both systolic and diastolic blood pressure were found for people with essential hypertension (moderate to low certainty) [102] and diabetes mellitus (moderate certainty) [106]. There was probably no effect for normotensive adults; however, the estimates are not reported because some RCTs were excluded from the final meta-analyses and no sensitivity analysis was reported [65]. The antihypertensive effects for people with essential hypertension were greatest when Tai Chi was compared to no intervention or health education (moderate to low certainty, large effect), followed by anti-hypertensive medication (low certainty, moderate effect), and then other exercise interventions (moderate to low certainty, small effect) [102]. Compared to usual care, the effects of Tai Chi on psychological QoL were equivalent (moderate certainty) and there were small improvements in physical QoL (high certainty) [90].

The effects of Tai Chi were mixed for people with hyperlipidemia. Only moderate reductions in triglyceride levels were found (moderate certainty), and there was probably no difference between Tai Chi and usual care or other types of exercise on total cholesterol, high-density lipoprotein cholesterol, or low-density lipoprotein cholesterol (low to very low certainty) [84].

For people with type 2 diabetes mellitus, improvements in glycemic control were small and unlikely to be clinically important (moderate certainty) [106]. However, there were clinically important improvements in the QoL domains of pain and physical function (moderate certainty) [106].

Chronic obstructive pulmonary disease

When Tai Chi was compared to no exercise controls for people with chronic obstructive pulmonary disease, there were clinically important improvements in both lung function and disease-specific QoL (moderate certainty); however, the improvement in the distance walked in 6 min was unlikely to be clinically important (low certainty) [67]. Tai Chi was unlikely to be any more effective than other types of exercise (moderate to low certainty) [67].

Cognitive function and impairment

Clinically important effects on the executive function of people with no cognitive impairment were found when Tai Chi was compared to no exercise and exercise (moderate certainty) [94]. For people with mild cognitive impairment, only the delayed recall test improved (high certainty) [102]. There were no differences between the Tai Chi and control groups’ mini-mental state examination (MMSE) (high certainty) and digit span tests (moderate certainty) [102].

Fatigue, sleep quality, and fibromyalgia

For adults suffering from fatigue, with or without any serious ailments or chronic diseases, there were clinically important improvements in vitality (low certainty) and small improvements in fatigue (moderate certainty) [97].

For healthy adults, there were moderate improvements in sleep quality (low certainty) and small improvements for adults with chronic diseases (low certainty) [86]. Two of the three RCTs in the meta-analysis of sleep quality for cancer survivors (very low certainty, equivalent effect) [83] overlapped with the larger meta-analysis of 15 RCTs for adults with chronic diseases [86].

For adults with fibromyalgia, there were clinically important improvements in activities of daily living after 12 to 16 weeks when Tai Chi was compared to usual care (moderate certainty); however, at 24 to 32 weeks, the effects were unclear due to very low certainty evidence [61]. Whether Tai Chi reduced the pain from fibromyalgia was also unclear due to very low certainty evidence [61].

Immunity

One SR with no meta-analysis reported improvements in cell-mediated immunity (including in people with HIV infections) and antibody levels (including in older adults) [69]. However, none of the studies included in the SR evaluated whether these improvements translated into direct clinical outcomes such as preventing or recovering from infections.

Mental health

Except for schizophrenia and university students with symptoms of depression, the SRs pooled the results of studies of participants who had mental health problems such as depression with studies of participants who had other health conditions in which mental health problems are a common comorbidity.

For adults and older adults with chronic diseases, including those suffering from depression, a 2014 SR reported small improvements in depression outcomes (high quality) and anxiety outcomes (moderate quality). Both estimates of effect were stable after adjusting for the severity of participants’ baseline symptoms, health status, age, and ethnicity, and for whether depression or anxiety was the primary outcome of the RCT [98]. The findings were congruent with more recent SRs that reported depression outcomes for stroke survivors (low certainty, small effect, one overlapping RCT) [80], fatigue from any cause (very low certainty, moderate effect, no overlapping RCTs) [97], knee osteoarthritis (moderate certainty, small effect, one overlapping RCT) [40] and older adults (moderate certainty, small effect, three overlapping RCTs) [62], and also psychological distress associated with chronic heart failure (moderate certainty, moderate effect, no overlapping RCTs) [90]. However, in another SR with no overlapping RCTs, it was unclear whether stress or mood outcomes improved in those with chronic diseases because the evidence was of very low certainty [93].

Improvements in depression outcomes were found when university students with depression or depressive symptoms used Tai Chi compared to no intervention or other exercise; however, the effect estimate was not extracted due to a probable data transformation error [99].

Clinically important improvements in negative symptoms (low certainty), but not positive symptoms (moderate certainty) of schizophrenia, were found when Tai Chi was added to usual care; however, it was unclear if discontinuation rates were lower (very low certainty) [104].

Multiple sclerosis

A SR with no meta-analysis reported improvements in fatigue, as well as balance, gait, flexibility, depression, and quality of life in adults with multiple sclerosis [89]. However, despite this positive trend, in a subgroup analysis of fatigue for any condition, the findings from two RCTs (one overlapping) were not statistically significant (SMD −0.77, 95% CI −1.76 to 0.22) [97].
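The non-significance of the subgroup estimate can be read directly from its confidence interval, which spans zero. As a minimal sketch, assuming a normal approximation and a symmetric 95% CI (the function name `summarize_ci` is hypothetical and not taken from the cited SR), the standard error, z statistic, and an approximate two-sided p-value can be back-calculated from the reported SMD and interval:

```python
import math


def summarize_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate SE, z and a two-sided p-value from a symmetric 95% CI.

    Assumes the interval was built as estimate +/- z_crit * SE under a
    normal approximation; this is an illustrative assumption, not the
    method documented in the cited review.
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided p-value for a standard normal statistic
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"se": se, "z": z, "p": p, "ci_includes_zero": lower <= 0 <= upper}


# Values reported in the text: SMD -0.77, 95% CI -1.76 to 0.22
result = summarize_ci(-0.77, -1.76, 0.22)
print(result)
```

The point estimate is sizeable, but with only two RCTs the interval is wide, so the data are compatible with both a large benefit and no effect.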

Musculoskeletal conditions and pain

Most of the SRs and their included primary studies were for older adults with knee osteoarthritis. There were clinically important improvements in pain (moderate certainty), stiffness (low certainty), physical function (moderate certainty), and depression outcomes (moderate certainty), as well as small improvements in self-efficacy (moderate certainty) [70]. Similar findings were also reported in the most recent SR for any type of osteoarthritis [78]. However, the effect estimates were not extracted due to probable data transformation errors and/or extensive overlap with the meta-analyses reported for knee osteoarthritis.

The effects of Tai Chi on knee flexor and extensor muscle strength were also evaluated in adults with or without osteoarthritis. The effects favoured Tai Chi, especially when Tai Chi was only compared to non-exercise controls (low or moderate certainty) [88].

For people with rheumatoid arthritis, whilst the results were promising, there was only very low certainty evidence about the effects of Tai Chi on pain, disease activity, and function [82].

The findings were mixed for people with osteoporosis or osteopenia. Compared to usual care, there were clinically important improvements in spine bone mineral density (BMD) (low certainty) and possibly femur BMD (very low certainty) [101]. Compared to no-treatment controls, the improvements in spine BMD (moderate certainty) and femur BMD (low certainty) were small and probably clinically unimportant [101].

Regarding pain outcomes, there were clinically important improvements in bodily pain for perimenopausal females with or without osteopenia/osteoporosis (moderate certainty) [92] and low back pain when compared to usual care (moderate certainty) or inactive controls (low certainty) [85]. However, due to very low certainty evidence, it was unclear if Tai Chi reduced pain caused by tension headaches [68]. No SRs were identified that synthesized results for neck or shoulder pain.

Stroke, Parkinson’s disease, and falls

There was low certainty evidence of a 77% reduced risk of fatal stroke and an 89% reduction in the risk of nonfatal stroke over 1 to 2 years, in healthy older adults and people with diabetes and/or hyperlipidemia [103]. For stroke survivors, the addition of Tai Chi to their rehabilitation program resulted in clinically important improvements in upper limb function (low certainty) and balance (low certainty). The effects on lower limb function were unclear due to very low certainty evidence and there were only small improvements in timed up-and-go tests (low certainty) [81]. Compared to rehabilitation, there was low certainty evidence of improvements in disease-specific activities of daily living [81] and depression outcomes [80]. However, the improvements in depression were small and unlikely to be clinically important.

Clinically important improvements in the overall motor function of people with Parkinson’s disease (moderate certainty), balance (high certainty), and timed up-and-go tests (high certainty), as well as their disease-specific QoL (high certainty), were found [100].

Falls prevention and associated risk factors such as balance, mobility, and fear of falling were commonly reviewed. Tai Chi was found to reduce the risk of falling by at least 20% (NNT: 11) for older adults with or without a history of falling, including adults with Parkinson’s disease and stroke survivors (moderate certainty) [73]. Subgroup analysis suggested there might be a dose-response relationship between the number of times Tai Chi was practiced per week and falls risk, but the findings were not statistically significant [73]. Falls risk factors also improved for older adults; however, the effects were unlikely to be clinically important (moderate or very low certainty) [71]. Mixed findings for falls risk factors in prefrail and frail older adults were also reported in a SR with no meta-analysis [60]. It was unclear if Tai Chi reduced the fear of falling due to very low certainty evidence [75].
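The relationship between a relative risk reduction and a number needed to treat (NNT) depends on the baseline risk in the comparison group, which is not stated here. The following sketch uses the standard definition NNT = 1 / (baseline risk × relative risk reduction). The function names are hypothetical, and the baseline risk it derives is an illustration of the arithmetic only, not a figure reported in the cited SR, which may have derived its NNT differently.

```python
def nnt(baseline_risk, relative_risk_reduction):
    """Number needed to treat from baseline risk and relative risk reduction."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1 / absolute_risk_reduction


def implied_baseline_risk(nnt_value, relative_risk_reduction):
    """Baseline risk that would reconcile a given NNT with a given RRR."""
    return 1 / (nnt_value * relative_risk_reduction)


# Figures from the text: NNT of 11 and a risk reduction of (at least) 20%
implied = implied_baseline_risk(11, 0.20)
print(f"Implied baseline risk of falling: {implied:.1%}")

# The same relative effect in a lower-risk population gives a larger NNT
print(f"NNT at 25% baseline risk: {nnt(0.25, 0.20):.0f}")
```

Under these assumptions, the same relative effect yields a larger NNT in populations with a lower baseline risk of falling, which is worth keeping in mind when applying the pooled figure to a specific group.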

Vestibular disorders

A SR with no meta-analysis of Tai Chi for vestibular rehabilitation reported improvements in dynamic balance, gait, and postural performance [72].

Discussion

This critical overview comprehensively identified SRs of Tai Chi published in English, Chinese, and Korean languages that evaluated the effectiveness and safety of Tai Chi for health promotion, and disease prevention and management. Tai Chi was found to be generally safe, even for frail older adults; however, mild, transient discomfort during the first few weeks was reported by some participants. Clinically important benefits were most consistently reported for Parkinson’s disease, falls risk, knee osteoarthritis, low back pain, cardiovascular diseases including hypertension, and stroke.

Despite the large number of SRs, there were gaps in the available SR evidence. For the most part, the conditions most commonly evaluated by SRs matched those most commonly evaluated by primary studies. However, based on the bibliometric analyses of studies evaluating Tai Chi interventions [6, 7], several populations had sufficient RCTs but were yet to be systematically reviewed: people with depression, anxiety, drug dependency, musculoskeletal conditions of the hip, neck or shoulder, sarcopenia/frailty, diabetic neuropathy, or dysmenorrhea. Other evidence gaps included a paucity of SRs examining effects of Tai Chi for disease prevention. Except for stroke prevention, only indirect disease prevention outcomes (i.e. risk factors) such as hypertension, hyperlipidemia, HbA1c, falls prevention, balance, mobility, bone mineral density, and executive cognitive function were identified. Finally, whilst some SRs included healthy participants, with the exception of executive cognitive function [94], only a few evaluated the effects of Tai Chi for health promotion, quality of life, and wellbeing in healthy participants [63, 95]. This is despite well over 100 RCTs [6, 7] evaluating these outcomes in healthy population groups.

It is noteworthy that a rapid search of PubMed, Embase, Cochrane Library, and CNKI databases for SRs published between 1 January 2021 and 5 June 2022 identified 38 potentially eligible SRs. Therefore, some of the identified gaps in the evidence may have been addressed and there may be higher quality, more comprehensive SRs than those included in this synthesis. Given this rapidly growing evidence base, an update of this overview is warranted.

Limitations of the evidence

Rather than relying on the conclusions in the SRs, we appraised the evidence for the included estimates of effects. Notably, the GRADE certainty of the evidence for just over half of the estimates of effect was rated as low or very low. This was despite making a number of concessions according to a pragmatic algorithm developed by Pollock et al. [38] when grading over 100 estimates for a Cochrane overview. Following Pollock et al. [38], the risk of bias assessment for blinding focused on the study investigators rather than participants; the cut-off for the optimum information size for continuous outcomes was set at ≥ 200 participants, rather than the 400 tentatively recommended by GRADE [31]; and the cut-off for the I² statistic when rating statistical heterogeneity was set at ≤ 75%. Additionally, although only a few instances of publication bias were identified, small sample sizes in many studies often reduced the precision of the estimates. Larger, higher-quality studies are therefore required to confirm many of the findings reported in this overview.
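The two numeric cut-offs described above can be expressed as simple threshold checks. This is a minimal sketch only: the class and function names are hypothetical, the example estimates are invented for illustration, and the way these flags combine with the other GRADE domains to downgrade certainty is not reproduced here.

```python
from dataclasses import dataclass

# Cut-offs stated in the text
OIS_CONTINUOUS = 200  # minimum participants for continuous outcomes
I2_MAX = 75.0         # maximum acceptable I-squared, in percent


@dataclass
class Estimate:
    label: str            # hypothetical description of the estimate
    n_participants: int   # total participants contributing to the estimate
    i_squared: float      # statistical heterogeneity, in percent


def threshold_flags(est):
    """Flag imprecision and inconsistency concerns against the stated cut-offs."""
    return {
        "imprecision_concern": est.n_participants < OIS_CONTINUOUS,
        "inconsistency_concern": est.i_squared > I2_MAX,
    }


# Invented examples, not data from the overview
examples = [
    Estimate("small, homogeneous meta-analysis", 150, 30.0),
    Estimate("large, heterogeneous meta-analysis", 900, 82.0),
    Estimate("meets both cut-offs", 200, 75.0),
]
for est in examples:
    print(est.label, threshold_flags(est))
```

An estimate based on exactly 200 participants with an I² of exactly 75% passes both checks, since the cut-offs are inclusive as stated.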

Limitations with the overall quality of the available SRs were another major concern. The majority of SRs were rated as low or critically low quality according to AMSTAR 2. Some of this reflected avoidable deficiencies in reporting. However, there were also numerous methodological deficiencies. Notably, many results were potentially conflated by pooling Tai Chi interventions of different intensity, frequency, and duration; comparison groups, regardless of whether they were likely to be an active or inactive control; and populations who may vary in their baseline severity, risk, prognosis, or clinical responsiveness. The impact of these decisions was often not appropriately investigated with subgroup or sensitivity analyses, or meta-regression. This may have exacerbated statistical heterogeneity and/or led to an over- or underestimation of the effect sizes of Tai Chi. It also limited the ability to assess dose effects and determine how often and for how long Tai Chi needs to be practiced.

Issues with comprehensiveness and missing RCTs were another concern. Notably, during the final selection process, it became apparent that meeting the requirements for a comprehensive literature search strategy (item 4) and the overall AMSTAR 2 rating was no guarantee that all eligible primary studies were identified. For example, neither the high-quality Cochrane review of exercise interventions for falls [109, 110] nor its moderate quality 2020 update [111] was selected as they missed more of the eligible Tai Chi studies, partly due to not searching Chinese language databases. However, even when both English and Chinese language databases were searched, issues with missed studies were also identified in SRs for cancer, Parkinson’s disease, cardio/cerebrovascular diseases, and diabetes. It is highly recommended that reviewers pay greater attention to searching the reference lists not only of the included studies but also published SRs, consulting content experts in the field, and including experienced research librarians if possible to help optimize search strategies [112]. Further, considering Tai Chi originated in China and over half of the primary clinical studies have been published in Chinese [6, 7], it is difficult to justify not searching the major Chinese databases [112].

Strengths and limitations of this overview

Strengths of this overview include the comprehensive literature search, transparent study selection, prioritization of outcomes, low overlap of primary studies, and independent rating of the GRADE certainty of the evidence for each estimate of effect. In addition, we developed a pragmatic GRADE certainty rubric to facilitate a transparent and consistent rating process. However, by not evaluating the primary studies or the variations among the interventions, and by setting thresholds for some decisions, important nuances may have been overlooked that could have justified upgrading or downgrading the evidence [113]. For instance, the post hoc sensitivity analyses applied more rigorous criteria that led to the evidence certainty for 31 of the 120 estimates being downgraded one level. Yet this approach was still blunt, as it did not allow for instances when there are borderline concerns across a few domains that when combined may justify rating down one level rather than two. Indeed, there were numerous instances when the same evidence was given a different GRADE certainty rating by other reviewers [13, 19, 23, 27]. Therefore, whilst the findings provide a general overview of Tai Chi effectiveness and the evidence gaps, an appraisal of the primary studies, involvement of stakeholders, and consideration of context and expert consensus may still be required before making any critical decisions for Tai Chi clinical guidelines or policies [113].

Substantially more SRs were identified than equivalent reviews [23, 26, 27]. This was despite restricting our search to publications from 2010. There were no language limitations, and the major English and Chinese databases were searched. Nevertheless, some SRs are likely to have been missed, including SRs only indexed in databases of another language such as Korean, Japanese, or Thai.

Due to the large number of SRs, most of which were screened using a partially blinded process to help reduce the risk of selective reporting bias, it is possible that some populations and outcomes were also missed. However, we are confident that we have reported the important outcomes also highlighted in other SRs of SRs [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27].

Efforts were made to minimize overlapping among the selected SRs, yet there were still a few instances of overlap (e.g. quality of life, mobility, mental health, and sleep) in which one or two RCTs were included in more than one of the reported estimates of effect. This may have biased results for the same outcome, either positively or negatively. However, unlike similar overviews of Tai Chi [23, 27], these limitations were offset in this overview by not reporting every estimate of effect for every SR and reporting the certainty of the evidence in the main summary of findings table irrespective of the effect size or statistical significance.

Finally, there was the potential for bias to be introduced during the selection and assessment processes, as three of the reviewers (GYY, JL, and PMW) were Tai Chi investigators (see “Competing interests” section). However, only GYY was directly involved in the screening, selection, and appraisal processes and was yet to publish a SR before the completion of this overview. Of the 210 included SRs, four were authored by reviewers of this overview [94, 114,115,116] and only one was selected in the final synthesis [94]. The SR was included despite being published in 2014 and rated as critically low quality because it was the only SR to meta-analyse cognitive performance outcomes for the healthy older adult population group.

Conclusions

This overview comprehensively identified and critically appraised the most recent, best available SR evidence. Tai Chi was found to be generally safe and can be practiced at various levels of intensity by healthy adults, frail older adults, and people with chronic diseases. There was some evidence of beneficial physical, psychological, and quality of life outcomes from Tai Chi for a wide range of conditions. Given its multisystem effects, Tai Chi might be a suitable choice for those seeking a single intervention to help with numerous problems and symptoms.

However, the certainty of the evidence was often limited by the quality of the primary studies and their systematic reviews; clinical, methodological, and statistical heterogeneity; and small sample sizes. Further research, including implementation and cost-effectiveness research, is warranted to support patient decisions, clinical practice, and policies.