Introduction

Exercise is considered a highly relevant component in the prevention and treatment of osteoporosis and fracture reduction [1, 2]. Consequently, numerous exercise studies (review in [3]) aim to increase bone strength, predominantly assessed by (areal) bone mineral density (BMD), in postmenopausal women, the most prominent and largest cohort at risk for osteoporosis. However, although there are some evidence-based recommendations for exercise protocols [1, 4, 5], the most promising exercise to address BMD still remains unsettled [2]. Apart from exercise parameters and principles, even basic decisions, for example about the type of exercise that should be applied, are still (or once again) controversial [6, 7]. In a recent meta-analysis, Rahimi et al. [6] reported the absence of effects of resistance exercise and negative effects of weight bearing aerobic exercise on BMD at lumbar spine (LS) and femoral neck (FN) in postmenopausal cohorts 60 years and older (n = 16). Provided that these data are reliable and generalizable to the entire cohort of postmenopausal women, all the current exercise recommendations (e.g., [1, 4, 5, 8, 9]) and, even more importantly, the exercise effect on BMD in general are rendered questionable. In order to verify the findings of Rahimi et al. [6], and to estimate the effects of different roughly classified types of exercise on BMD at different regions of interest (ROI), we conducted a sub-analysis based on a recent comprehensive meta-analysis on exercise effects on BMD in postmenopausal women [3]. Similarly to Rahimi et al. [6], we roughly categorized exercises into (dynamic) resistance exercise (DRT), weight bearing (WB) exercise and combined WB&DRT exercise. Our hypotheses were that all types of exercise significantly affect BMD at (1) LS, (2) FN and (3) total hip (TH), and (4) that there are no significant differences between the exercise categories at any BMD-ROI.

Material and Methods

The present study is based on a comprehensive systematic review of the effect of exercise on (areal) BMD in postmenopausal women [3] to which the reader is kindly referred for details.

Data Sources and Search Strategy

We strictly followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [10] and registered the study in PROSPERO (CRD42018095097). Briefly, eight databases (PubMed, Scopus, Web of Science, Cochrane, Science Direct, Eric, ProQuest and Primo) were searched for articles published up to March 1, 2019 without language restrictions.

The search strategy comprised a combination of population, intervention, and outcomes. Databases were systematically searched using the following combination of terms: “Bone Mineral Density”, “Exercise”, and “Postmenopausal”. Following the primary search and duplicate exclusion, one reviewer (MS) screened studies by title and abstract according to the eligibility criteria. A manual search of the reference lists of all included articles was carried out in an attempt to find new relevant studies. Authors of trials that were potentially eligible were contacted by e-mail for any missing data (e.g., mean change of BMD or SD) or clarification of data presented.

Inclusion and Exclusion Criteria

We included studies/study arms with (1) randomized and non-randomized controlled protocols with at least one exercise group versus one control group with sedentary/habitual active lifestyle or placebo exercise; (2) women who were postmenopausal at study start; (3) ≥ 6 months intervention duration; (4) areal BMD of the LS, femoral neck (FN) and/or total hip (TH) region at baseline and follow-up assessment as determined by (5) dual-energy X-ray absorptiometry (DXA) or dual-photon absorptiometry (DPA); (6) ≤ 10% of women on osteoanabolic/antiresorptive, or osteocatabolic (glucocorticoids) pharmaceutical agents, albeit only when the number of subjects was comparable between exercise and control.

We further excluded studies with (1) mixed gender or mixed pre- and postmenopausal cohorts without separate BMD analyses; (2) women undergoing chemo- and/or radiotherapy and (3) women with diseases that relevantly affect bone metabolism. (4) Duplicates from one study and (5) review articles, case reports, editorials, conference abstracts, and letters were not considered. Lastly, exercise study groups that could not be classified according to the intended type of exercise (see below) were also excluded from the present analysis.

Data Extraction

We designed a pre-piloted extraction form to extract relevant data. The form asked for details with respect to publication characteristics, methodology, participant characteristics, exercise characteristics, risk assessment and outcome characteristics. Two reviewers (SvS and MS) independently evaluated full-text articles and extracted data from the included studies; in case of inconsistency, a third reviewer (WK) decided.

Outcome Measures

The primary outcome was change of (areal) BMD at LS-, FN- and TH-ROI as assessed by DXA or DPA between baseline and follow-up. In cases of multiple BMD assessments, we considered only changes between the baseline and final BMD assessments.

Quality Assessment

All included studies were assessed for risk of bias by two independent raters (WK and MV) using the Physiotherapy Evidence Database (PEDro) scale [11]. In case of inconsistency, a third reviewer (SvS) decided.

Data Synthesis

For the detailed procedure to impute missing standard deviations (SD) the reader is kindly referred to the comprehensive meta-analysis of Shojaa et al. [3]. Briefly, if the studies presented a confidence interval (CI) or standard error (SE), these were converted to SD. In cases of missing CI or SE data we first contacted authors (n = 11) to provide corresponding information. When no reply was received or data were not available, the exact p-value of the absolute change of BMD was obtained to compute the SD of the change. If no p-value was reported, we calculated the SD of the change from the pre and post SDs.
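The conversions outlined above can be sketched in a few lines. The following Python sketch is illustrative only: the analysis itself was run in R, the function names are hypothetical, the normal quantile is used in place of the t quantile (an approximation that is reasonable only for larger groups), and the pre-post correlation `r` is an assumed input whose value is not stated here.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def sd_from_se(se, n):
    """SD of the change from its standard error and the group size."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=Z_95):
    """SD of the change from a 95% CI (normal approximation, an assumption)."""
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


def sd_from_p(mean_change, p_two_sided, n):
    """SD of the change from an exact two-sided p-value (0 < p < 1).

    Assumption: the test statistic is treated as a z-value, not a t-value.
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    se = abs(mean_change) / z
    return sd_from_se(se, n)


def sd_from_pre_post(sd_pre, sd_post, r):
    """SD of the change from baseline and follow-up SDs.

    r is the assumed pre-post correlation (hypothetical input).
    """
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
```

The order of the functions mirrors the fallback order described in the text: SE or CI first, then the exact p-value, and pre and post SDs only as a last resort.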

In order to determine the effects of different types of exercise we categorized the studies according to the following approach: (a) dynamic resistance exercise, i.e., any kind of resistance exercise that involves joint movement to develop musculoskeletal strength. Here we focused on studies that applied isolated DRT without any adjuvant exercise component and without bone-specific warm ups (e.g., running, hopping, aerobic dance) with validated effect on bone [1, 4, 5]; (b) weight bearing exercise that involved any kind of aerobic and anaerobic loading of axial skeletal sites due to gravity, i.e., Tai Chi, walking, running, dancing, movement games, heel drops, hopping, jumping; and (c) exercise studies that combined weight bearing and DRT exercise, even if WB exercise was applied only briefly during warm up. The latter approach was selected due to the observation that only a few cycles with high strain rates may induce positive effects on bone [12, 13]. Two raters (WK and MV) independently categorized the data; in case of inconsistency, a third reviewer (SvS) decided.

Statistical Analysis

The statistical analysis was performed using the statistical software R (R Development Core Team) [14]. Effect sizes (ES) were expressed as standardized mean differences (SMDs) with 95% confidence intervals (95% CI). Random-effects meta-analysis was performed by applying the metafor package [15]. Between-study heterogeneity was tested using the Cochran Q test, and the level of heterogeneity was quantified with the I² statistic. For those studies with two different intervention groups, the control group was proportionally split into two groups for comparison against each intervention group [16]. Sensitivity analysis was conducted to check whether the overall result of the analysis is robust regarding the use of the imputed SDs. Funnel plots with the regression test and the rank correlation test between effect estimates and their standard errors, using the t-test and Kendall’s τ statistic respectively, were applied to explore potential publication bias. To adjust the results for possible publication bias, we also conducted a trim and fill analysis using the L0 estimator proposed by Duval et al. [17]. The present subgroup analysis was conducted as a mixed-effect meta-analysis with “type of exercise” as the moderator. A P-value of < 0.05 was considered significant for all tests.
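To make the pooling steps concrete, the following Python sketch computes a bias-corrected SMD (Hedges' g) per study and pools the studies with a random-effects model, returning Cochran's Q and I² as well. It is a minimal sketch under stated assumptions, not the analysis code: the analysis was run with metafor in R, the function names are hypothetical, and the DerSimonian-Laird estimator of the between-study variance is used here only because it has a closed form; the estimator actually used is not stated in this section.

```python
import math


def hedges_g(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Return (g, variance of g) for exercise vs. control change scores."""
    df = n_e + n_c - 2
    sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_e - mean_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances):
    """Random-effects pooling (DerSimonian-Laird, an assumption)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Re-weight each study with the between-study variance added
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "smd": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

For a study with two intervention arms, the shared control group would enter `hedges_g` twice with its sample size divided between the two comparisons, in line with the splitting rule described above.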

Results

Study Characteristics

In total, our search identified 74 eligible studies with 84 exercise groups (Fig. 1), categorized into 18 DRT groups [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32], 30 weight bearing (WB) type exercise groups [31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57] and 36 study groups that scheduled combined exercise protocols [33, 48, 58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88].

Fig. 1 Flow diagram of search process according to PRISMA [10]

The pooled number of participants was 2793 in the exercise and 2319 in the control groups, respectively. In detail, the number of participants in exercise and control was 1344 and 1175 women in the combined WB&DRT group, 1045 and 815 women in the WB and 404 and 329 women in the DRT group. Table 1 gives the anthropometric participant characteristics of the included studies.

Table 1 Participant characteristics of included studies

Sample sizes in the exercise arms ranged from 5 [28] to 125 participants [58] per group (CG: 2 to 125 women). Thirteen studies [22, 33, 41, 45, 47, 61, 63, 68, 73, 75, 77, 87, 89] included women with osteopenia/osteoporosis (DRT: n = 1 vs. WB: n = 4 vs. WB&DRT: n = 8; Table 2). Average age varied among the studies between 51 ± 2 years [18] and 77 ± 3 years [81]. Twelve studies with fourteen exercise groups [18, 24, 28, 38, 55, 57, 69, 72, 75] focused on cohorts of “early postmenopausal women” (1 to ≤ 8 years post). Thirty studies included participants with sedentary/habitually active lifestyles, 28 trials involved participants with exercise activities presumably with minor effects on bone and 16 studies did not provide the corresponding information (Table 2).

Table 2 Exercise prescription characteristics of included studies, categorized according to the type of exercise

Intervention Characteristics

Cholecalciferol, Calcium Supplementation

Vitamin-D and/or calcium supplementation for the exercise and control groups were provided in 21 studies [18, 19, 21, 29, 33,34,35, 40, 41, 43, 44, 49, 51, 65, 71, 73, 76, 77, 79, 80, 84].

Exercise Intervention

Table 2 gives exercise characteristics of the included studies categorized according to the type of exercise.

Dynamic Resistance Exercise (DRT)

Table 2 (midsection) specifies the exercise protocols of the 18 included DRT study groups.

Apart from one exception [30], all studies focused on a supervised group exercise protocol [18,19,20,21,22,23,24,25,26,27,28,29, 31, 32, 60]. Study duration ranged from 6 [18, 20, 22, 26] to 24 months [30]. Except for three studies/study groups [24, 30, 32], the studies focused on all or most main muscle groups. Besides two studies that did not provide sufficient information for the LS-site [23, 32], all the other studies provided exercises for their specified BMD-ROI (i.e., LS and proximal femur). Most studies prescribed a training frequency of three sessions per week (Table 2), with a session length that varied from about 1–2 min (i.e., 10 × back extension [30]) to ≈ 120 min (39 sets × 20 reps, 2–3 min of rest [23]). Most studies applied a multiple set approach [18, 19, 21,22,23,24,25, 27, 29, 43, 60]. Relative exercise intensity ranged between 80% 1RM [18, 23, 25, 27, 29, 43, 60] and ≤ 30% 1RM [26, 30]. Five studies prescribed either work to repetition maximum [27, 28, 80] or work to muscular fatigue [21, 43]; another study [20] referred to 5–6 (i.e., strong-strong +) on the Borg CR10 scale. Reviewing the repetition number and relative exercise intensity (% 1RM), some studies [30] or study arms [18, 29] might have exercised with (too) low effort. None of the studies reported an explosive movement in the concentric or eccentric phase. Progression or at least regular adjustments of exercise intensity were realized by all but one DRT study. Periodization models [90] were not applied by any of the studies.

Weight Bearing Exercise

Table 2 (upper part) lists the exercise protocols of the weight bearing type exercise studies. By nature, the specific exercise was much more heterogeneous compared with DRT. Studies specified (brisk) walking including walking with additional load (n = 11), walking/running (n = 3), Tai Chi (n = 4), jumping or rope skipping (n = 3), heel drops (n = 1), stepping (n = 1), standing on one leg (n = 1) and combined weight bearing types (e.g., heel drops, jumping, skipping, stair climbing; n = 6) (Table 2). Duration of the studies varied between 6 (e.g., [33]) and 30 months [45]. Twelve of 30 study groups applied a supervised group exercise program; 12 study groups specified non-supervised individual exercise [36, 37, 39, 41, 42, 44, 47, 53, 55, 89] or non-supervised exercise in addition to the supervised group exercise [34, 35, 45, 51, 89]; for 6 study groups this information was not listed (Table 2). Site specificity at the LS might be realized by direct muscular insertion of exercises applied in the Tai Chi studies [31, 32, 38, 47] and studies that applied higher ground reaction forces (i.e., jumping, drop-jumps and potentially jogging in older cohorts) [33,34,35, 40, 41, 43,44,45,46, 52, 55, 56, 83]. Net training frequency (considering attendance rate) varied between ≥ 10 and about 2 sessions/week [55]; corresponding net exercise volume varied between ≥ 240 min/week [89] and about 10–15 min/week [55]. Progression of exercise intensity was consciously considered by about half of the WB type exercise studies [40, 41, 43,44,45,46, 49, 52, 56, 83] (Table 2). Periodized exercise models were not applied.

Combined WB and DRT Studies

Most combined WB&DRT studies combined walking, running, stepping, movement games or dancing (as a single session, a session component or during warm up) with DRT on machines or with free weights (Table 2, lower part). At least nine study arms [63, 69, 72, 76, 77, 79, 82, 84, 91] specified exercises with higher ground reaction forces (GRF; e.g., jumping variations, heel drops) during the WB&DRT sessions. With few exceptions [73], the studies scheduled either a consistently supervised exercise protocol or a mix of supervised (DRT) sessions and non-supervised walking/home training sessions (Table 2). Duration of the studies varied between 6 (e.g., [33, 58]) and 26 months [77]. Training frequency varied from ≈ 8 [58] to < 2 sessions/week [70, 82, 83]; net training volume ranged from about 6 h/week [58] to 67 min/week [70]. Because DRT for all or most main muscle groups was applied by all but three studies [58, 72, 88], most studies mechanically addressed the LS-ROI by muscular tension. Apart from one study [66], all studies that adequately described their exercise protocols applied multiple set approaches. Peak relative exercise intensity varied between 90% 1RM [77] and 60–65% 1RM [76]. Progression of exercise intensity was applied by the vast majority of the studies [48, 62, 63, 65,66,67,68,69,70,71, 73, 74, 76,77,78,79,80,81, 83,84,85, 88].

Methodologic Quality

The PEDro scores of the included studies are listed in Table 3. Methodologic quality of the trials ranged from 3 to 9 score points (Table 3), with a mean and SD of 5.44 ± 1.32 score points. Methodologic quality of the DRT studies was on average (6.24 ± 1.30 points) significantly higher (P = 0.024) compared with the other groups.

Table 3 Assessment of risk of bias for included studies listed in alphabetic order

Outcomes

Apart from two studies [28, 30] that applied DPA, all the others used DXA. Furthermore, all studies except two ([23]: hip only; [30]: LS only) determined BMD at both the LS and proximal femur regions of interest.

Effect of Different Types of Exercise on LS-BMD

Sixteen DRT exercise groups, 26 WB exercise groups and 33 combined WB&DRT exercise groups evaluated the effect of exercise on LS-BMD. In summary, the pooled estimate of random effect analysis for DRT was SMD: 0.40, 95% CI 0.15–0.65 (P = 0.009), for WB exercise SMD: 0.26, 95% CI 0.03–0.49 (P = 0.037) and SMD: 0.42, 95% CI 0.23–0.61 (P = 0.001) for the combined WB&DRT exercise. No significant differences between the types of exercise were observed (P = 0.508). All types of exercise revealed a similarly high level of heterogeneity between their trials (I² = 76.3–76.5%) (Fig. 2).
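As a rough plausibility check of the subgroup comparison, one can back-calculate standard errors from the reported 95% CIs and compute a between-subgroup Q statistic. The Python sketch below does this for the three LS-BMD estimates given above. It is only an approximation under stated assumptions: it treats the pooled subgroup estimates as independent and normally distributed, the function names are hypothetical, and it is not the mixed-effects moderator model used in the analysis, so its p-value need not match the reported one.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error recovered from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def q_between(subgroups):
    """Between-subgroup Q for a list of (smd, ci_lower, ci_upper)."""
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in subgroups]
    mean = sum(w * y for w, (y, _, _) in zip(weights, subgroups)) / sum(weights)
    q = sum(w * (y - mean) ** 2 for w, (y, _, _) in zip(weights, subgroups))
    return q, len(subgroups) - 1


def chi2_sf_df2(q):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-q / 2)


# Reported LS-BMD subgroup estimates: DRT, WB, WB&DRT
ls_subgroups = [(0.40, 0.15, 0.65), (0.26, 0.03, 0.49), (0.42, 0.23, 0.61)]
q, df = q_between(ls_subgroups)
p = chi2_sf_df2(q)  # closed form is valid here because df == 2
print(f"Q_between = {q:.2f}, df = {df}, p = {p:.3f}")
```

A non-significant Q in this simplified check would be in line with the reported absence of differences between exercise types at the LS.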

Fig. 2 Forest plot of meta-analysis results at the LS. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in exercise and control groups

Effect of Different Types of Exercise on FN-BMD

Fifteen DRT exercise groups, 23 WB exercise groups and 25 combined WB&DRT exercise groups evaluated the effect of exercise on FN-BMD. In summary, the pooled estimate of random effect analysis for DRT was SMD: 0.27, 95% CI 0.09–0.45 (P = 0.003), for WB exercise SMD: 0.37, 95% CI 0.12–0.62 (P = 0.004) and SMD: 0.35, 95% CI 0.19–0.51 (P = 0.001) for the combined WB&DRT exercise. No significant differences between the types of exercise were observed (P = 0.822). Heterogeneity of the included trials was considerable in the WB group (I²: 82.1%) and substantial in the WB&DRT group (I²: 63.6%), but negligible in the DRT group (I²: 16.5%) (Fig. 3).

Fig. 3 Forest plot of meta-analysis results at the femoral neck. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in exercise and control groups

Effect of Different Types of Exercise on TH-BMD

Ten DRT exercise groups, seven WB exercise groups and 12 combined WB&DRT exercise groups evaluated the effect of exercise on TH-BMD. In summary, the pooled estimate of random effect analysis for DRT was SMD: 0.51, 95% CI 0.28–0.74 (P < 0.001), for WB exercise SMD: 0.40, 95% CI 0.21–0.58 (P < 0.001) and SMD: 0.34, 95% CI 0.14–0.53 (P < 0.001) for the combined WB&DRT exercise. No significant differences between the types of exercise were observed (P = 0.554). Heterogeneity of the included trials was negligible in the WB and DRT groups (I² < 10%) and moderate in the WB&DRT group (I²: 43.8%) (Fig. 4).

Funnel plots for LS, FN and TH did not suggest positive evidence of publication bias. The regression and rank correlation tests for funnel plot asymmetry did not indicate significant asymmetry for LS or FN, but did for TH, with missing studies to the right (positive difference/effects). The trim and fill analysis, which correspondingly imputed three studies, resulted in a slightly higher total SMD (0.43; 95% CI 0.31–0.54) than the non-adjusted results listed in Fig. 4.
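The idea behind a regression test for funnel plot asymmetry can be illustrated with the classical Egger formulation, in which the standardized effect (effect/SE) is regressed on precision (1/SE) and the intercept is tested against zero. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, ordinary least squares is used, and the exact model variant applied in the analysis is not specified in this section and may differ.

```python
import math


def egger_test(effects, std_errors):
    """Return (intercept, slope, t_intercept) of Egger's regression.

    Regresses effect/SE on 1/SE by ordinary least squares. An intercept
    far from zero suggests funnel plot asymmetry. Needs >= 3 studies.
    """
    n = len(effects)
    x = [1 / se for se in std_errors]
    y = [eff / se for eff, se in zip(effects, std_errors)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and standard error of the intercept
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1 / n + x_bar**2 / sxx))
    # t is undefined when the fit is exact (zero residual variance)
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t
```

The returned t-value would be compared against a t-distribution with n − 2 degrees of freedom; the slope estimates the underlying effect and the intercept captures small-study asymmetry.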

Fig. 4 Forest plot of meta-analysis results at the total hip. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in exercise and control groups. HI high intensity, LI low intensity

Discussion

In this sub-analysis of a comprehensive meta-analysis, we clearly confirmed the significant positive effects of different types of exercise on BMD at LS, FN and TH in postmenopausal women. Further, WB type exercises, DRT and a combination of both types of exercise revealed no significant group differences for LS, FN or TH-BMD. Thus, we verified all our hypotheses and in turn now question the data of Rahimi et al. [6]. One possible explanation for the diverging results of the present analysis and the data of Rahimi et al. [6] might be the focus on studies with women 60 years and older, i.e., the advanced postmenopausal status in the latter study. Considering that menopausal transition and early menopausal status are related to considerably increased bone turnover [92, 93], there is some evidence that exercise might be more effective during early than in late post-menopause, at least with respect to trabecular bone loss [76, 94]. The meta-analysis of Shojaa et al. [3] on this issue observed only slight, non-significant differences between exercise during the early vs. late postmenopausal years, be it for LS (SMD “early”: MV = 0.64, 95% CI 0.33–0.95 vs. “late”: 0.39, 0.14–0.55) or total hip ROI (SMD: 0.51, 0.27–0.75 vs. 0.38, 0.20–0.56). Apart from age, both meta-analyses also differ with respect to eligibility criteria, i.e., randomization, language, publication type, medication and diseases, while the limitation to studies ≥ 6 months with healthy postmenopausal women without hormone replacement therapy and previous DRT is common to both studies. The most striking difference, on the other hand, is the low number of studies classified into the exercise categories by Rahimi et al. [6]. Considering that only two studies were analyzed to determine the effect of WB aerobic exercise on LS-BMD (vs. n = 23 in the present study), one should draw definite conclusions from that data with extreme caution.

Although we consistently determined significant positive exercise effects on BMD-ROIs (SMD: 0.26–0.51), SMDs of the individual exercise trials varied substantially, particularly for the LS (I² = 76–77%). Even in the DRT group, which can be considered the most homogeneous group with respect to exercise type classification (see above), the heterogeneity level for LS-BMD effects was substantial (i.e., I² > 75%). This is understandable, however, since considerable differences can be observed between the trials or study groups (Table 2), particularly with respect to exercise parameters (i.e., strain magnitude, rate [5]) and training principles (e.g., progression, periodization [5, 95]).

Revisiting the effects of different types of exercise, it is noteworthy that the effect of the WB type interventions at the LS was considerably less pronounced compared with the DRT group (SMD: 0.26 versus SMD: 0.40). This is not necessarily related to higher effects of DRT-induced direct muscular impact on LS in general, however, but to the large number of WB studies that applied low ground reaction forces (e.g., walking: n = 11) with corresponding axial impact loading that might no longer reach the LS area. Two meta-analyses [96, 97] that reported significant positive “walking effects” at FN-BMD without effects at LS-BMD support this estimation. Another surprising result is that the combined WB&DRT group failed to generate relevantly higher BMD effects compared with DRT (or, apart from LS-BMD, WB type exercise). Recent evidence-based guidelines that focus on bone development [1, 4, 5] consistently recommended exercise protocols that included impact activities and progressive resistance training applied with high strain magnitude and rate. However, at this point at the latest, we have to acknowledge and discuss the limited ability of meta-analyses to derive exercise recommendations [98], largely independent of the outcome [99]. Selecting the adequate type of exercise to address a given training aim is only the first, rough decision within the training process [5, 95]. Much more challenging, particularly when addressing bone, is the question of how to optimally specify the type of exercise in the light of the large variety of exercise parameters (e.g., strain magnitude, rate, duration, frequency, cycle number, rest periods) [5, 95]. Another modifying aspect within the exercise process is the inclusion of exercise principles [5, 95]. Applying, e.g., progression and periodization might not be important within a 10-week exercise intervention; however, considering that the studies included in the present analysis on BMD lasted between 6 and 30 months, their relevance becomes obvious. The fact that even slight differences in exercise parameters, e.g., movement velocity of the concentric phase during DRT, significantly modify the effect on BMD [100] suggests that the high complexity of exercise effects on BMD could conflict with the comprehensive meta-analytic approach. One may assume that the rather high number of study groups included in the present subgroups might even out differences at individual study levels, but this assumption is frequently wide of the mark. This might be confirmed by the considerably higher effects of DRT versus WB for TH-BMD (SMD: 0.51, 95% CI 0.28–0.74 vs. 0.34, 0.14–0.52), however, not for BMD at the adjacent FN-region (SMD-DRT: 0.27, 0.09–0.45 versus SMD-WB: 0.37, 0.12–0.62, Table 2), a constellation for which no serious explanation can be provided.

Furthermore, some limitations and study features of the present analysis may decrease the evidence and generalizability of our findings. (1) Although we placed high emphasis on eligibility and reliable classification of the exercise types, some decisions are certainly debatable. This may be the case for the exclusion of the study of Rhodes et al. [101] that combined non-weight bearing exercise (however only as a warm up) and DRT, while still including others (e.g., [66, 67, 87]) that applied a mixed weight bearing/non-weight bearing & DRT intervention. However, in our defense it should be noted that some studies were very lax in their standards of exercise reporting, and so extracting the relevant information was sometimes challenging. (2) We conducted funnel plots with trim and fill analysis for the entire cohort of included studies for LS, FN and TH (not given). However, it might have been better to conduct separate funnel plots for the effects of the isolated exercise group for each ROI. On the other hand, reviewing the three funnel plots in detail, we did not observe relevant differences between the different exercise groups that might have significantly changed the present result. (3) We failed to generate reliable scores/categories for exercise intensity/strain magnitude across the different types of exercises, in order to conduct a sub-analysis for this crucial exercise parameter. A sub-analysis of our outcome adjusted for “exercise intensity/strain magnitude” might have resulted in more sophisticated results and higher overall treatment effects. (4) The present literature search was conducted up to March 1, 2019, i.e., some more studies might have been published in the meantime. However, considering the large number of studies included in this systematic review and meta-analysis, we feel that the few additional exercise studies will not considerably modify our findings.

In conclusion, we do not share the enthusiasm for basing exercise recommendations or exercise guidelines on meta-analyses, at least in the area of “bone strengthening”. If this is done nonetheless, the acquired data should at least not be accepted uncritically. Accurately designed randomized controlled exercise trials that manipulate a single dedicated aspect while maintaining all other exercise parameters and confounders will be better suited to generate reliable exercise recommendations.