Introduction

Osteoporosis is a progressive systemic skeletal disease characterised by low bone mass and microarchitectural deterioration of the bone tissue, resulting in increased bone fragility and susceptibility to fractures [10]. Globally, approximately 200 million people have osteoporosis [11]. The overall prevalence of osteoporosis in Chinese adults aged ≥ 60 years is 37.7%, and the prevalence in older men aged ≥ 60 years, postmenopausal (PM) women aged ≥ 40 years, and older women aged ≥ 60 years is 23.7%, 32.5%, and 48.4%, respectively [25, 40]. The major complication of osteoporosis is an osteoporotic fracture, with more than 2 million patients having osteoporotic fractures in China [17]. China’s economic burden of osteoporosis and osteoporotic fractures could exceed $25.9 billion per year by 2050 [17].

Pharmacological treatments for promoting bone health include bisphosphonates, oestrogen, and selective oestrogen receptor modulators, which can prevent bone resorption (Qaseem et al., 2017; [12]). Calcium and active vitamin D supplements are also recommended as a basic treatment for older patients with osteoporosis, and could lower fracture risk and improve muscle strength and balance function [17]. However, adherence to pharmacological therapy remains poor because of adverse events and frequent dosing [18].

Tai Chi (TC) is a traditional Chinese exercise that combines meditation with slow, gentle movement and deep diaphragmatic breathing (Wayne et al., 2008). Several systematic reviews (SRs) have evaluated the efficacy and safety of TC on bone health in the intervention and prevention of osteoporosis and osteopenia. However, the conclusions of these SRs remain controversial: the aggregated results of two SRs with meta-analyses (MA) revealed that TC might help improve bone mineral density (BMD) values [37, 41], while two other SRs concluded that there was no evidence that TC could attenuate bone loss in PM women (Liu et al., 2017; Xu et al., 2012). Besides the inconsistency of findings in SRs, inconsistent and often inappropriate analytical methods may result in obvious differences in study quality. The majority of published SRs included only one specific population, so a comprehensive review of TC exercise in promoting bone health is lacking. Therefore, we conducted this umbrella review to critically evaluate the quality and reliability of previously published SRs of TC exercise on bone health, assess the certainty of the evidence reported in these SRs, and provide the latest available evidence in various relevant populations.

Methods

This umbrella review followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [20]. The checklist is provided as Supplementary File 2.

Protocol and registration

The protocol of this study was registered on PROSPERO (No. CRD42020173543).

Eligibility criteria

Inclusion criteria

Study design: systematic reviews with or without meta-analyses.

Participants: patients with osteoporosis and osteopenia, participants with high risk of developing osteoporosis and osteopenia (such as older people and perimenopausal (PERIM) and PM women), and healthy people receiving intervention for preventing osteoporosis.

Interventions: Any type of TC exercise

Comparison: no restrictions.

Exclusion criteria

Studies published in duplicate.

Studies whose full reports could not be retrieved.

Outcomes

The primary outcomes of this study were endpoint events of fracture and fall. The secondary outcomes included BMD, quality of life, pain, muscle strength, balance function, and laboratory examinations for alkaline phosphatase (ALP) and serum calcium and phosphorus. The safety outcomes reported for the included SRs were evaluated.

Search strategy

The electronic databases PubMed, EMBASE, Cochrane Library, Web of Science, China National Knowledge Infrastructure, Wanfang Database, Chinese Biomedical Literature Database, and Chinese Scientific Journals Database (VIP database) were searched from inception to April 2022, and the search was updated in March 2023. PROSPERO, the international prospective register of systematic reviews, was searched to find potentially relevant SRs, and the registration numbers of relevant records were then searched manually in PubMed. The database searches were restricted to English and Chinese. Previous research indicated that language restriction may not influence the evidence of systematic reviews [4, 19]. We built a search strategy based on controlled vocabulary and free text terms. The terms “osteoporosis,” “postmenopausal,” “bone density,” “Tai Ji,” and “systematic review” were used to develop the search strategy for PubMed, which is shown in Supplementary File 3. We used modifications of this search strategy to search the other databases.

Study selection

Two authors independently performed the study selection process. The records retrieved from the database searches were imported into EndNote 20. After removing duplicate references, titles and abstracts were assessed, and the full texts of potentially eligible publications were critically scrutinized to determine the included SRs. We contacted the corresponding authors of articles whose full reports could not be retrieved.

Data extraction

Two authors of this study independently performed data extraction. Microsoft Excel (Microsoft Office 2019, Microsoft, Redmond, WA, USA) was used to extract the data. The following data were extracted from the included SRs:

·Identification data (year of publication and name of the first author).

·Primary study data (number of included primary studies and risk of bias of included trials).

·Participant’s data (conditions and number of participants included).

·Details of outcomes reported in SRs (individual outcomes reported in SRs, metrics, effect estimates, and 95% confidence intervals (CIs) of synthesized outcomes).

Quality evaluation

Reporting quality

The reporting quality of included SRs was assessed using the updated PRISMA checklist (Page et al., 2020). Complete reporting in each item in the PRISMA-2020 checklist was scored as “Yes” and counted as 1 point, incomplete reporting in items or sub items was scored as “Partially yes” and counted as 0.5 points, and no reporting as “No” and 0 points [5, 28]. The possible total scores of the PRISMA checklist are 27 points, with a score of 22 to 27 points indicating high reporting quality, 15 to 21.5 points indicating moderate reporting quality, and 0 to 14.5 points indicating low reporting quality [28].
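The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the point scheme and the quality bands stated in the text; the function names and the example ratings are hypothetical and are not data from this study.

```python
# Points per rating, as described in the text.
POINTS = {"Yes": 1.0, "Partially yes": 0.5, "No": 0.0}


def prisma_score(ratings):
    """Sum the points for a list of 27 item ratings."""
    if len(ratings) != 27:
        raise ValueError("expected 27 item ratings")
    return sum(POINTS[r] for r in ratings)


def reporting_quality(score):
    """Map a total score (0 to 27) to the quality bands given in the text."""
    if not 0 <= score <= 27:
        raise ValueError("score must be between 0 and 27")
    if score >= 22:
        return "high"
    if score >= 15:
        return "moderate"
    return "low"


# Hypothetical ratings for one review (illustration only).
example = ["Yes"] * 18 + ["Partially yes"] * 5 + ["No"] * 4
total = prisma_score(example)
print(total, reporting_quality(total))
```

Because scores move in steps of 0.5, the thresholds of 22 and 15 reproduce the bands 22 to 27, 15 to 21.5, and 0 to 14.5 without gaps.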

Methodological quality

The methodological quality of included SRs was assessed using A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR-2) tool [22]. The AMSTAR-2 contains 16 items, including 7 critical domains. Flaws in more than one critical domain lead to critically low methodological quality, flaws in one critical domain lead to low quality, weakness in more than one non-critical item leads to moderate quality, and no weakness or weakness in only one non-critical item indicates high quality [22].
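The rating rule described above can be written as a small decision function. This is a minimal sketch that takes as input the number of flawed critical domains and the number of non-critical weaknesses; the function name and the way the inputs are counted are assumptions made for illustration, and the judgement of whether an item is flawed is still made by the assessors.

```python
def amstar2_rating(critical_flaws, noncritical_weaknesses):
    """Return the overall AMSTAR-2 rating using the rule stated in the text.

    critical_flaws: number of critical domains judged to be flawed (0 to 7).
    noncritical_weaknesses: number of non-critical items judged weak (0 to 9).
    """
    if critical_flaws < 0 or noncritical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Hypothetical example: flaws in two critical domains (such as items 2 and 7).
print(amstar2_rating(critical_flaws=2, noncritical_weaknesses=3))
```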

Descriptive analyses

Descriptive analyses were performed on the characteristics and results of the included SRs. When more than one MA evaluating the efficacy of TC on a given bone health biomarker in the same population was identified, the most recent one was retained for further analyses [26]. The analyses were performed by population subgroup to avoid the influence of clinical heterogeneity.
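The retention rule can be illustrated as follows. This is a sketch under the assumption that each meta-analysis is described by a review label, a publication year, a population, and an outcome; the record layout and the example entries are hypothetical.

```python
def retain_most_recent(meta_analyses):
    """Keep, for each (population, outcome) pair, the MA with the latest year.

    If two MAs share the same year, the one listed first is kept.
    """
    latest = {}
    for ma in meta_analyses:
        key = (ma["population"], ma["outcome"])
        if key not in latest or ma["year"] > latest[key]["year"]:
            latest[key] = ma
    return latest


# Hypothetical records (illustration only, not the included SRs).
records = [
    {"review": "Review A", "year": 2017,
     "population": "PERIM and PM women", "outcome": "lumbar spine BMD"},
    {"review": "Review B", "year": 2022,
     "population": "PERIM and PM women", "outcome": "lumbar spine BMD"},
    {"review": "Review C", "year": 2019,
     "population": "older population", "outcome": "femoral neck BMD"},
]

retained = retain_most_recent(records)
for (population, outcome), ma in retained.items():
    print(population, "|", outcome, "|", ma["review"], ma["year"])
```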

Certainty of evidence

We evaluated the certainty of synthesized evidence reported in included SRs using the Grading of Recommendation Assessment, Development, and Evaluation (GRADE) approach [6]. Because evidence from RCTs and non-randomised studies of interventions (NRSIs) starts at different levels of certainty, we considered it inappropriate to include results from RCTs and NRSIs in the same synthesis [27]; certainty was therefore assessed only when the results had been synthesized separately. The certainty of evidence was downgraded according to study limitations, inconsistency, indirectness, imprecision, and the presence of publication bias. The certainty of evidence was assessed as high, moderate, low, or very low using GRADE.
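The downgrading logic can be sketched as simple bookkeeping. This is a deliberately simplified illustration: it assumes that evidence from RCTs starts at high certainty and evidence from NRSIs at low certainty, that each domain is downgraded by zero, one, or two levels, and it omits the GRADE factors that can raise certainty. GRADE ratings are judgements made by assessors, so the function below only records those judgements; its name and inputs are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
START = {"RCT": 3, "NRSI": 1}  # index into LEVELS
DOMAINS = {
    "study limitations",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
}


def grade_certainty(design, downgrades):
    """Return the certainty level after applying downgrades.

    design: "RCT" or "NRSI".
    downgrades: dict mapping a domain name to the levels removed (0, 1 or 2).
    """
    level = START[design]
    for domain, steps in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        if steps not in (0, 1, 2):
            raise ValueError("downgrade must be 0, 1 or 2 levels")
        level -= steps
    return LEVELS[max(level, 0)]  # certainty cannot fall below "very low"


# Hypothetical judgement for one synthesized result from RCTs.
print(grade_certainty("RCT", {"study limitations": 1, "imprecision": 1}))
```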

Results

We obtained 213 records from the electronic database searches, and after the study selection process, 17 published SRs were included in the present study (Chow et al., 2018; [3]; Hao et al., 2019a; Hao et al., 2019b; [13, 14]; Liu et al., 2017; [24, 31]; Xu et al., 2012; Yang et al., 2019; [16, 35, 36, 37, 39, 41]). By contacting the corresponding author, we also obtained the unpublished full report of one SR whose abstract had been published as a conference paper [38]. Another record could not be retrieved: the EMBASE search result listed an invalid email address for the corresponding author and an irrelevant DOI. The study selection process is illustrated in Fig. 1.

Fig. 1 Flow diagram of study selection

Characteristics of included SRs

Among these 18 SRs, 3 were without meta-analyses, while the other 15 were reported with meta-analyses. Nine SRs were published in English (Chow et al., 2018; [13]; Liu et al., 2017; [16, 24, 31, 37, 39, 41]), seven in simplified Chinese ([3]; Hao et al., 2019a; Hao et al., 2019b; [14]; Xu et al., 2012; Yang et al., 2019; [35]), one in traditional Chinese [36], and the last one was the unpublished report of an abstract [38]. The number of included primary studies ranged from 4 to 25, and the number of participants ranged from 312 to 1,758. After accounting for the overlap of primary studies, there were 49 RCTs involving 3,956 participants and 16 NRSIs with 1,157 participants. Eight SRs evaluated TC in PERIM and PM women (Hao et al., 2019a; Xu et al., 2012; [36]; Liu et al., 2017; [16, 24, 31, 38]), two SRs focused on the older population (Hao et al., 2019b; Yang et al., 2019), and the other eight SRs did not set strict criteria for the included population. The characteristics of the SRs are listed in Table 1.

Table 1 Characteristics of Included Systematic Reviews

Reporting quality of included SRs

Only 1 SR was evaluated as high reporting quality [16], with 7 and 10 SRs evaluated as moderate and low reporting quality, respectively. All 18 SRs reported items of rationale (item 3), objectives (item 4), and study characteristics (item 17). Item 2 of the abstract was fully reported in only 1 study [16], since other studies did not follow the guidance of the PRISMA-2020 statement. Only 2 SRs were registered on PROSPERO; thus, item 24 of the registration and protocol was only reported in these 2 SRs [16, 37]. The details of the reporting quality of the included studies are shown in Table 2 and Fig. 2.

Table 2 Reporting quality assessed with PRISMA-2020 checklist
Fig. 2 Scores of PRISMA-2020 items of included SRs

Methodological quality of included SRs

The methodological quality of the included studies was assessed using the AMSTAR-2 tool. One SR was of low quality [16], and the other 17 SRs were of critically low quality. Flaws in the critical domains of items 2 and 7 led to the critically low ratings. Sixteen SRs neither provided a written protocol nor were registered publicly, and were rated “No” for item 2. Only one SR provided a list of excluded records with reasons [16]; the other 17 studies did not provide a list of exclusions and were rated “No” for item 7. The details of the methodological quality of the included studies are shown in Table 3.

Table 3 Methodological quality assessed with AMSTAR-2 tool

Summary of synthesized evidence

All included results were meta-analyses of continuous outcomes, reported as standardised mean differences (SMD) or weighted mean differences (WMD). To avoid overlap of RCTs, only the most recent MA of each specific result is shown in this review when more than one meta-analysis existed. A summary of the evidence is shown in Table 4.

Table 4 Summary of Findings

Endpoint events

The primary outcomes of our study were endpoint events of fracture and fall. One study adopted the incidence of fracture as its primary outcome [37], but no RCTs included in the SR reported this outcome. One RCT reported the incidence of fracture, but the incidence was relatively low and the study was not designed to compare fracture rates; thus, the data should not be over-interpreted [31]. Another SR concluded from an NRSI that TC could reduce the fracture rate. No falls were reported in the included studies [13].

BMD

The BMD of the lumbar spine, femoral neck, femoral shaft, femoral proximal trochanter, forearm, and Ward’s triangle was reported in 14 SRs. The analyses were performed by population subgroup.

PERIM and PM women

Seven SRs reported BMD in PERIM and PM women (Hao et al., 2019a; Xu et al., 2012; Liu et al., 2017; [16, 24, 35, 38]). For BMD of the lumbar spine, femoral neck, femoral proximal trochanter, and Ward’s triangle, the results of the most recent MA were retained [16], and only one SR reported BMD of the femoral shaft [35]. The results showed that, compared to non-intervention, PERIM and PM participants who practiced TC may benefit in BMD of the lumbar spine [MD = 0.04, 95% CI (0.02, 0.07)] and femoral neck [MD = 0.04, 95% CI (0.02, 0.06)]. TC practitioners may not benefit in BMD of the femoral proximal trochanter [MD = 0.02, 95% CI (0.00, 0.03)], Ward’s triangle [MD = 0.02, 95% CI (-0.01, 0.04)], or femoral shaft [SMD = 0.16, 95% CI (-0.11, 0.44)].
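The reading of these intervals can be illustrated with a short sketch: a pooled difference is treated as indicating a possible benefit only when its 95% CI excludes zero. The helper name is hypothetical, and the bounds are the rounded values reported above, so a bound printed as 0.00 is treated here as touching zero.

```python
def ci_excludes_zero(lower, upper):
    """Return True when the confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# 95% CIs for PERIM and PM women as reported in the text (rounded values).
pm_women = {
    "lumbar spine": (0.02, 0.07),
    "femoral neck": (0.02, 0.06),
    "femoral proximal trochanter": (0.00, 0.03),
    "Ward's triangle": (-0.01, 0.04),
    "femoral shaft": (-0.11, 0.44),
}

for site, (lower, upper) in pm_women.items():
    verdict = "may benefit" if ci_excludes_zero(lower, upper) else "may not benefit"
    print(f"{site}: {verdict}")
```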

Older population

Two SRs reported BMD in older populations (Hao et al., 2019b; Yang et al., 2019). The MA that used an appropriate statistical method was retained (Yang et al., 2019). The results showed that older adults who practiced TC may benefit in BMD of the femoral neck [SMD = 0.28, 95% CI (0.10, 0.45)], femoral proximal trochanter [SMD = 0.39, 95% CI (0.05, 0.73)], and Ward’s triangle [SMD = 0.21, 95% CI (0.05, 0.37)], but may not benefit in BMD of the lumbar spine [SMD = 0.03, 95% CI (-0.22, 0.27)].

General population

Eight SRs reported BMD in the general population, and the results of the three most recent MAs were retained [14, 37, 39]. Compared to individuals who practiced other exercise, TC practitioners may not benefit in BMD of the lumbar spine [SMD = -0.18, 95% CI (-0.51, 0.15)], femoral neck [SMD = 0.12, 95% CI (-0.41, 0.64)], femoral proximal trochanter [SMD = 0.04, 95% CI (-0.49, 0.56)], or Ward’s triangle [SMD = -0.04, 95% CI (-0.56, 0.49)]. Compared to those who received conventional treatment, TC practitioners may benefit in BMD of the lumbar spine [WMD = 0.16, 95% CI (0.09, 0.23)] and femoral neck [WMD = 0.16, 95% CI (0.04, 0.29)]. Compared to non-intervention, TC practitioners may benefit in BMD of the femoral neck [SMD = 0.43, 95% CI (0.17, 0.68)], femoral proximal trochanter [SMD = 0.49, 95% CI (0.23, 0.74)], and Ward’s triangle [SMD = 0.36, 95% CI (0.13, 0.58)], but may not benefit in BMD of the forearm [WMD = 0.16, 95% CI (0.04, 0.29)]. Other results are shown in Table 4.

Serum calcium, phosphorus, and ALP

Three SRs reported the outcomes of serum calcium and phosphorus, four reported the outcomes of ALP, and the results of the two most recent MAs were retained [3, 37]. Compared to non-intervention, participants who practiced TC had lower levels of ALP [WMD = -1.18, 95% CI (-1.66, -0.70)], and no significant difference was observed in levels of serum calcium [WMD = -0.06, 95% CI (-0.13, 0.00)] or serum phosphorus [WMD = 0.02, 95% CI (-0.04, 0.08)]. Other results are shown in Table 4.

Safety outcomes

Six SRs reported safety outcomes of the included primary studies [14, 16, 24, 31, 35, 39]. Four SRs reported no serious adverse effects in the included studies [16, 31, 35, 39]. The two other SRs reported muscle soreness and pain in participants who practiced TC [14, 24].

Certainty of synthesized evidence

We assessed the certainty of synthesized evidence reported by the included SRs. The certainty of the evidence was assessed as low or very low. The main reasons for downgrading were study limitations, clinical and statistical heterogeneity, wide confidence intervals, and a small sample size. Details of the quality of evidence are presented in Table 4.

Discussion

We have low certainty that, for perimenopausal and postmenopausal women, TC could improve BMD of the lumbar spine and femoral neck, and that in the older population, TC practitioners may benefit in BMD of the femoral neck and Ward’s triangle. The results also revealed that participants who practiced TC might not benefit in serum phosphorus, ALP, or BMD of the femoral shaft and forearm. Compared to other exercises, TC exercise may not improve BMD. We failed to obtain definite results for BMD of the femoral proximal trochanter and for serum calcium. Moreover, the results revealed that TC exercise is safe to practice.

The present study has several strengths. (1) We employed explicit eligibility criteria, conducted a comprehensive search of eight electronic databases, assessed the eligibility of potential studies critically, and addressed clinically important outcomes of fracture incidence and BMD to gather the latest available evidence. (2) We assessed the reporting quality of the included SRs using the PRISMA checklist and the methodological quality using the AMSTAR-2 tool. (3) By critically evaluating the available evidence reported in previously published SRs using the GRADE approach, we provided an unbiased collection of evidence evaluating the effect of TC on the intervention and prevention of osteoporosis.

Our umbrella review had several limitations. First, most evidence assessed and re-evaluated in this study was reported without subgrouping; clinical heterogeneity (differences in populations, interventions, and comparisons) prevented us from providing more precise evidence. Second, as this is an umbrella review and not an updated MA, we focused on evaluating available synthesized evidence instead of conducting a novel systematic review of RCTs. We may have omitted some evidence reported by RCTs that were not included in the included SRs. However, the main reasons for the poor quality of evidence were the poor quality of primary studies and the limited sample size, and the most recent SR included in our review was published in 2022 [16]. Therefore, considering the time needed to complete and publish at least one rigorously designed RCT with a large sample size, it was unnecessary to conduct an updated MA. Third, the incidence of fractures and falls, which are clinically important endpoint events in patients with osteoporosis, has been reported in three SRs; however, we still cannot evaluate the quality of evidence because of the limited number of studies. In addition, TC is mainly practiced in China, but it is also practiced in other East Asian countries such as Korea and Japan. Owing to language barriers, we did not search electronic databases in Korean and Japanese. We also need to clarify that the title and objectives of this review have been changed from the original ones in the registration record, because the study selection in the included SRs was not strict, which meant that participants in most SRs did not meet the diagnostic criteria of osteoporosis.

Several methodological flaws in the included SRs should be highlighted. (1) Only two SRs were registered in PROSPERO, and the absence of written protocols and registration records contributed to poor methodological quality. (2) Of the 14 SRs with MA, four used post-intervention values for evidence synthesis. An MA based on changes from baseline is more efficient and powerful than a comparison of post-intervention values, since the measurement errors of BMD are acceptable [9]. Meta-analyses of post-intervention values failed to remove the component of between-person variability; thus, they were considered inappropriate and were not evaluated or presented in the summary of findings of this umbrella review. (3) The results of statistical tests for heterogeneity should never be the reason for choosing a fixed-effect or random-effects model in an MA [9]. However, 9 of the 14 SRs with meta-analyses based their choice of effect model on the value of the I² statistic. (4) Only one SR conducted subgroup analysis based on differences in populations and comparisons [37], and three SRs performed subgroup analysis based on control types. The absence of subgroup analysis in most SRs led to clinical heterogeneity and eventually contributed to the poor quality of evidence. (5) Three SRs included the results from RCTs and NRSIs in the same MA. Evidence obtained from RCTs and NRSIs starts at different levels of certainty, and the mixed analysis made evaluating the certainty of this evidence impossible.
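For readers unfamiliar with the I² statistic mentioned in point (3), the sketch below shows how it is commonly derived from Cochran's Q under inverse-variance weighting. It describes how much of the observed variability exceeds what chance alone would produce; it is not shown here as a rule for choosing between models. The function name and the effect sizes are hypothetical and are not taken from the included SRs.

```python
def i_squared(effects, variances):
    """Return I² (in percent) from study effect estimates and their variances.

    Uses inverse-variance weights: Q = sum(w * (y - pooled)^2),
    I² = max(0, (Q - df) / Q) * 100, with df = number of studies - 1.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # identical estimates: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical effect sizes and variances for three studies.
example_i2 = i_squared([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(example_i2, 1))
```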

Because TC is an exercise that requires active participation, the participants and researchers in RCTs cannot be blinded; however, outcome assessors can be blinded, which would lower the risk of bias in outcome measurement [23]. The effect of TC on alleviating bone loss may take a long time to emerge, and the relatively short duration of the RCTs may have led to an underestimate of the effect of TC [37].

Based on this umbrella review, we offer several considerations for improving the quality of future SRs. Registration of the protocol is recommended, and an amendment of the protocol is needed if there are significant changes in the study design. According to the PRISMA statement, most SRs have achieved good reporting quality, but the methodological quality remains poor. For methodological problems, we recommend consulting the Cochrane Handbook for answers. Furthermore, we should focus more on clinically important outcomes when conducting SRs and RCTs.

Conclusion

We have low certainty that, compared to participants who did not practice TC exercises, TC practitioners in the PERIM and PM populations could benefit in BMD of the lumbar spine and femoral neck. We also have low certainty that, in the older population, TC practitioners may benefit in BMD of the femoral neck and Ward’s triangle, and that TC exercise is safe to practice. There were no definite conclusions for the incidence of fracture and fall; BMD of the femoral proximal trochanter, femoral shaft, and forearm; or levels of serum calcium, phosphorus, and ALP. More rigorously designed, large-sample RCTs of TC are needed to better validate its effect on improving bone health and alleviating bone loss, and its role in the intervention and prevention of osteoporosis.

RCT, randomized controlled trial; CCT, case control trial; CSS, cross-sectional study.

① Alkaline phosphatase; ② Serum calcium; ③ Serum phosphorus; ④ Quality of life; ⑤ Muscle strength; ⑥ Balance function; ⑦ Pain; ⑧ Safety outcomes.