Introduction

Exercise is a promising agent for preventing osteoporosis in postmenopausal women. In their recent comprehensive systematic review and meta-analysis with 75 eligible studies, Shojaa et al. [1] provided definite evidence for the favorable effect of exercise on bone mineral density (BMD). Nevertheless, effect sizes in the predominantly healthy cohorts for lumbar spine (LS) and femoral neck (FN) BMD were moderate at best. At the same time, the substantial heterogeneity between the trials indicates that some studies were much more effective in improving BMD at the LS or FN than others. Several factors might contribute to this heterogeneity, two participant characteristics in particular: bone status and menopausal status. While the early-postmenopausal stage is related to increased bone turnover with negative net effects on BMD in many women [2, 3], a factor that might dilute the exercise-induced effect on BMD, there is some evidence for more favorable exercise effects in people with osteopenia/osteoporosis compared to people with normal BMD [4]. However, the most striking source of heterogeneity among exercise trial findings within comprehensive meta-analyses is the difference between exercise protocols. In this context, supervision of the exercise protocol has far-reaching consequences for setting, exercise composition, feasibility, safety, motivation, and adherence [5, 6]. In a recent systematic review and meta-analysis, we clearly demonstrated the favorable effect of supervised exercise protocols on fracture incidence [7]. Considering the higher complexity of exercise protocols for improving BMD compared to decreasing the number of falls [8], the role of supervision might be even more important in the area of bone strengthening.

In the present comprehensive systematic review and meta-analysis, we thus aimed (a) to provide a 2022 update regarding the effect of exercise on BMD at the LS, FN, and total hip (TH) regions of interest (ROI) using the inverse heterogeneity model (IVhet [9]), which is less susceptible to underestimation of statistical error in heterogeneous studies, and (b) to determine the relevance of the potentially confounding effects of bone status, menopausal status, and supervision of the exercise sessions on exercise effects on BMD at the LS, FN, and TH.

Methods

Data sources and search strategy

Electronic literature searches were conducted using the PubMed, Web of Science, Cochrane, Science Direct, ERIC, and ProQuest databases up to August 09, 2022, without any language restriction. The keywords and MeSH terms used in the search strategy included (“Bone” or “Bone mass” or “Bone status” or “Bone structure” or “Bone turnover” or “Bone metabolism” or “Bone mineral content” or “Skeleton” or “Bone Mineral Density” or “BMD” or “Bone Density” or “Osteoporosis” or “Osteopenia”) AND (“Postmenopause” or “Post-Menopause” or “Postmenopausal”) AND (“Exercise” or “Training” or “physical exercise” or “physical activity” or “exercise training” or “weight bearing” or “strength training” or “resistance training” or “aerobic exercise” or “isometric exercise”) AND (“Clinical trial” or “Randomized controlled trial”). Furthermore, reference lists of the included articles were searched manually to identify additional eligible articles.

The present study was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach [10]. The study was registered in the international prospective register of systematic reviews (PROSPERO ID: CRD42021241407).

Inclusion and exclusion criteria

Articles were included in this meta-analysis if they met the following inclusion criteria: (a) clinical trials with at least one exercise group as an intervention versus one control group with a sedentary/habitually active lifestyle without designed exercise, (b) women with postmenopausal status at study onset, (c) intervention of at least 6 months, (d) areal BMD of the LS and/or the proximal femur regions “TH” and/or “FN” listed as outcome measures at baseline and follow-up assessment, (e) BMD determined by dual-energy X-ray absorptiometry (DXA) or dual-photon absorptiometry (DPA), (f) ≤ 10% of participants on hormone (replacement) therapy (HT or HRT), osteoanabolic/antiresorptive (e.g., bisphosphonate, denosumab), or osteocatabolic (e.g., glucocorticoids) pharmaceutic agents, and only if the number of users was comparable between exercise and control.

The exclusion criteria were as follows: (a) mixed gender or mixed pre- and postmenopausal cohorts without separate BMD analysis for postmenopausal women; (b) women undergoing chemo- and/or radiotherapy; (c) women with diseases that relevantly affect bone metabolism; (d) interventions applying novel exercise technologies (e.g., whole-body vibration), or cycling or swimming/aqua fitness as the only type of exercise training; (e) studies addressing the synergistic/additive effect of exercise and pharmaceutic therapy; (f) double/multiple publications from one study and preliminary data from subsequently published trials; and (g) review articles, case reports, editorials, conference abstracts, letters, and unpublished reports or articles for which only abstracts were available.

Data extraction and quality assessment

Two reviewers (RM and WK) independently evaluated the full-text articles and extracted data from all the eligible publications. If they could not reach a consensus, a third reviewer (SvS) was consulted. The extracted information included author’s name, year of publication, country, population, number of participants, age, years since menopause, BMI, study duration, type of exercise, interventions, frequency, intensity, duration, sets and repetitions, compliance, and BMD values at baseline and study completion.

Two authors (RM and WK) independently assessed the risk of bias using the PEDro (Physiotherapy Evidence Database) scale [11, 12], and any discrepancy was resolved by consulting with a third reviewer (SvS). The categories assessed were randomization, allocation concealment, similarity at baseline, blinding of participants and staff, assessor blinding, incomplete outcome data, intention-to-treat analysis, between-groups comparison, and measure of variability. If a criterion was met, a point was awarded for the study; otherwise, a point was not awarded. For each trial included, a total score ranging from 0 to 10 could be obtained. The methodological quality of the included studies was classified as follows: ≥ 7 = high quality, 5–6 = moderate quality, and < 5 = low quality [13].
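The quality classification described above can be written down as a short function. This is only an illustrative sketch of the stated cut-offs (≥ 7, 5–6, < 5); the function name `pedro_quality` is hypothetical and not part of the review's workflow.

```python
def pedro_quality(score: int) -> str:
    """Map a PEDro total score (0-10) to the quality label used in the text.

    Cut-offs follow the classification quoted above:
    >= 7 high, 5-6 moderate, < 5 low.
    """
    if not 0 <= score <= 10:
        raise ValueError("PEDro total score must lie between 0 and 10")
    if score >= 7:
        return "high"
    if score >= 5:
        return "moderate"
    return "low"


if __name__ == "__main__":
    for s in (4, 5, 6, 7, 8):
        print(s, pedro_quality(s))
```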

Outcome measures and data synthesis

The primary endpoint was the change in BMD at the LS, the femoral neck (FN), and the total hip (TH) region of interest (ROI) from baseline to follow-up. For subanalyses, the intervention was classified for (a) bone status (i.e., cohorts with versus without osteopenia/osteoporosis), (b) menopausal status of the women (i.e., early (≤ 5 years) versus late postmenopausal (> 8 years, or cohort 60 years and older)) [14], and (c) supervision considering the net exercise frequency reported for the study arm. For the latter aspect, we differentiated between predominantly supervised and predominantly non-supervised exercise programs.
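As a rough illustration of rule (b), the menopausal-status allocation could be expressed as follows. The function and parameter names are hypothetical, and two points are assumptions not stated in the text: years since menopause takes precedence over age when both are available, and cohorts between 5 and 8 years postmenopausal that do not meet the age rule remain unclassified.

```python
from typing import Optional


def menopausal_subgroup(
    years_since_menopause: Optional[float],
    youngest_age: Optional[float] = None,
) -> Optional[str]:
    """Allocate a study group to the 'early' or 'late' postmenopausal subgroup.

    early: <= 5 years postmenopausal
    late:  > 8 years postmenopausal, or cohort aged 60 years and older
    None:  not classifiable under these rules (assumption)
    """
    if years_since_menopause is not None:
        if years_since_menopause <= 5:
            return "early"
        if years_since_menopause > 8:
            return "late"
    # Fall back on the age rule when years since menopause is missing
    # or lies in the unclassified 5-8 year range.
    if youngest_age is not None and youngest_age >= 60:
        return "late"
    return None


if __name__ == "__main__":
    print(menopausal_subgroup(3.0))        # early
    print(menopausal_subgroup(12.0))       # late
    print(menopausal_subgroup(None, 65))   # late (age rule)
    print(menopausal_subgroup(6.5, 55))    # None
```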

If the studies presented a confidence interval (CI) or standard errors (SE), these were converted to standard deviation (SD) with standardized formulas [15, 16].
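The conversions mentioned here are not spelled out in the text. A minimal sketch, assuming the usual textbook formulas for a single group mean (SD = SE × √n, and SD = √n × (upper − lower) / (2 × 1.96) for a 95% CI), might look like this; the function names are illustrative only.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation from the standard error of a group mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = Z_95) -> float:
    """Standard deviation from a confidence interval around a group mean.

    Uses the normal quantile; for small groups a t quantile with n - 1
    degrees of freedom would be more appropriate (pass it via `z`).
    """
    if upper < lower:
        raise ValueError("upper limit must not be smaller than lower limit")
    return math.sqrt(n) * (upper - lower) / (2.0 * z)


if __name__ == "__main__":
    # Hypothetical group: n = 25, mean BMD 1.000 g/cm2, SE 0.020 g/cm2
    print(round(sd_from_se(0.020, 25), 3))
    print(round(sd_from_ci(0.9608, 1.0392, 25), 3))
```

Both calls describe the same hypothetical group, so they should return approximately the same SD.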

Statistical analysis

We applied a random-effects meta-analysis using the metafor package [17] for the statistical software R [18]. Effect size (ES) values were presented as standardized mean differences (SMDs) in combination with the 95% confidence interval (95% CI). We applied the inverse heterogeneity (IVhet) model proposed by Doi et al. [9]. Heterogeneity between the studies was checked using the I2 statistic. I2 of 0–40% was considered as “low,” 30–60% as “moderate,” 50–90% as “substantial,” and 75–100% as “considerable” heterogeneity [16]. To check for potential publication bias, we applied the regression test and the rank correlation test between effect estimates and their standard errors (using the t-test and Kendall’s τ statistic, respectively) and also conducted trim and fill analyses using the L0 estimator proposed by Duval et al. [19]. Additionally, we used Doi plots and the Luis Furuya-Kanamori index (LFK index) [20] to check for asymmetry. LFK values within ± 1 were considered negligible, while values between ± 1 and ± 2 were considered as showing minor asymmetry. Values beyond ± 2 indicate major asymmetry. Sensitivity analyses were applied to determine whether the overall result of the analysis is robust to the choice of the imputed correlation coefficient (minimum, mean, or maximum). A p-value < 0.05 was considered as the significance level for all the tests. SMD values of 0.2, 0.5, and 0.8 were interpreted as small, medium, and large effects.
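To make the pooling step more concrete, the following is a minimal Python sketch of an IVhet-type estimate, not the R code used in the analysis. It assumes the formulation usually attributed to Doi et al.: studies keep their fixed-effect inverse-variance weights, while the variance of the pooled estimate is inflated by the DerSimonian-Laird between-study variance τ². The function names and the handling of LFK values exactly at ± 1 and ± 2 are assumptions.

```python
import math
from typing import Dict, Sequence


def ivhet(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """Pool effect sizes with inverse-variance weights and an IVhet-type variance."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]          # fixed-effect weights
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q and the DerSimonian-Laird estimate of tau^2
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Variance of the pooled estimate: normalized weights squared
    # times (within-study variance + tau^2), summed over studies
    var = sum((wi / sw) ** 2 * (vi + tau2) for wi, vi in zip(w, variances))
    se = math.sqrt(var)
    return {
        "smd": pooled,
        "se": se,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "i2": i2,
    }


def lfk_asymmetry(lfk: float) -> str:
    """Label an LFK index using the thresholds quoted in the text."""
    a = abs(lfk)
    if a <= 1:
        return "negligible"
    if a <= 2:
        return "minor"
    return "major"


if __name__ == "__main__":
    # Hypothetical SMDs and sampling variances of three trials
    result = ivhet([0.10, 0.35, 0.60], [0.02, 0.03, 0.05])
    print({key: round(value, 3) for key, value in result.items()})
    print(lfk_asymmetry(1.50), lfk_asymmetry(-0.04))
```

With τ² = 0 the variance reduces to the ordinary fixed-effect variance, which is why the point estimate is unchanged by heterogeneity while the confidence interval widens.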

Results

Study selection

Figure 1 illustrates the search process of the study. After removing 267 duplicates, 2251 articles were screened based on title and abstract. The full texts of 101 potentially relevant articles were screened, and finally, a total of 80 articles were included in this systematic review and meta-analysis [21–100]. Studies were published from 1989 to 2022 (Fig. 1). Three studies included contained English abstracts but with Italian [77, 94], Portuguese [82], and German [58] full texts.

Fig. 1 PRISMA 2020 flow diagram for updated systematic reviews for the present project [101]

Study and participant characteristics

The 80 studies included in this systematic review and meta-analysis comprise 94 training groups and 80 control groups (Table 1). The pooled number of participants was 5581 (intervention group: 3036, control group: 2545), and sample size in individual studies ranged from 5 [77, 94] to 125 [21] participants per group. Participants in the individual studies were on average between 50 [26] and 79 [68, 102] years old. Accordingly, the average time since menopause ranged from 0.5 [91, 97] to 24 years [56]; however, many studies do not provide this important information (Table 1). The mean body mass indexes (BMI, kg/m2) of individual studies indicate that some cohorts were underweight on average (e.g., [103]), while others were obese [90] (Table 1).

Table 1 Study and participant characteristics of the studies

Difficult to rate but highly relevant for the intervention effect, 28 studies included participants with sedentary lifestyles, while 36 trials involved participants with some kind of pre-study exercise activity (up to < 7 h/week [83]; Table 2). Unfortunately, many studies did not provide any information on the health and exercise status of their cohorts (Table 1).

Table 2 Exercise characteristics of the studies

Exercise characteristic description

Program duration varied considerably between the trials, from 6 to 30 months (Table 2). Most studies (n = 42) applied an intervention period of between 9 and 18 months, while 27 trials scheduled a shorter and 11 studies a longer intervention period. A similarly large number of the 92 intervention groups employed either aerobic exercise (predominantly walking and/or jogging) or combined aerobic and resistance exercise as the primary exercise component. Twenty-eight interventions prescribed resistance exercise as the major component. Tai Chi was applied in five training groups [35, 69, 97, 98], and hopping and jumping as the primary intervention was evaluated in six intervention groups [24, 51, 72, 77, 92]. Exercise frequency prescribed by the trials ranged from two [21, 34, 45, 50, 54, 70, 71, 79] to nine sessions/week [55]. The exercise session of eight studies [23, 24, 49, 77, 89, 91, 92] averaged about 10 min or less. Prescription of exercise intensity for aerobic exercise predominantly ranged between 60 and 80% of maximum heart rate (HRmax). Ground reaction forces during dynamic weight-bearing exercise ranged from about 1.5 [48] to about 4 × body mass [24] or potentially higher [72]. Resistance training protocols generally scheduled an exercise intensity of between 70 and 80% of the one repetition maximum (1-RM), although four studies [26, 71, 81, 86] prescribed exercise intensities of 50% 1-RM or lower. During resistance training, 1–21 exercises [42, 81, 91], with up to 108 repetitions [81], structured in 1–5 sets [22, 41, 42, 85, 91], were applied per session. Time under tension (i.e., movement velocity) was reported in only 10 studies [40, 50, 53, 59,60,61, 71, 74, 79, 87] and ranged between 3 and 9 s per repetition, with 3 studies using fast or explosive movements in the concentric part of the exercise [53, 61, 71]. In 57 exercise groups, the exercise intensity was progressively increased during the intervention period [26, 28,29,30,31, 36,37,38,39,40,41,42,43, 45,46,47, 49,50,51,52,53,54,55,56, 58,59,60,61,62,63,64,65,66,67, 71,72,73,74,75,76, 79,80,81,82, 85,86,87,88, 91, 93, 95].

Apart from one study with a very low attendance rate of 39% [84], all the other studies reported attendance rates of > 60%. Four [32, 44, 100] studies listed 100% attendance (Table 2). Unfortunately, 15 studies did not provide information on participant attendance.

Methodological quality

Methodological quality according to PEDro is shown in Table 3. Fifteen trials demonstrated high and 49 studies moderate methodological quality, while the remaining studies were classified as being of low quality (Table 3). Higher scores were frequently hindered by the lack of allocation concealment, participant, caregiver or assessor blinding, and < 85% of subjects assessed for at least one primary outcome. However, given that successful blinding of participants and caregivers (i.e., instructors) is hardly possible in exercise trials, 8 out of 10 score points can be considered an excellent result.

Table 3 Assessment of risk of bias for included studies

Outcome measures

Most of the trials determined BMD at the LS and femoral neck and/or total hip ROI. Ten studies measured BMD exclusively at the LS [46, 48, 52, 54, 55, 57, 75, 91, 95, 100], and seven studies determined BMD only at the proximal femur [25, 49, 62, 66, 74, 89, 94].

Meta-analysis results

Effects of exercise on BMD at the lumbar spine

Eighty-five comparisons addressed exercise effects at BMD-LS (Fig. 2). In summary, the inverse heterogeneity model (IVhet) (Fig. 2) with imputation of the mean correlation demonstrated a significant effect (p < 0.001) of exercise on BMD at the LS (SMD: 0.29; 95% CI: 0.16 to 0.42). Heterogeneity between the trial results (I2 = 68%) can be classified as substantial (Fig. 2). Applying sensitivity analysis with imputation with minimum correlation (i.e., maximum SD, SMD: 0.21; 95% CI: 0.12 to 0.29) or maximum correlation (i.e., minimum SD, SMD: 0.45; 95% CI: 0.21 to 0.69) led to diverging, but consistently significant (p < 0.001) results.
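The link between the imputed correlation and the SD of the change score can be illustrated with a short sketch. The text does not state the formula; the one below is the standard expression for the SD of a difference between two correlated measurements and is assumed here. The function name and the example values are hypothetical.

```python
import math


def sd_change(sd_baseline: float, sd_followup: float, r: float) -> float:
    """SD of the change score for a given baseline/follow-up correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    variance = sd_baseline ** 2 + sd_followup ** 2 - 2.0 * r * sd_baseline * sd_followup
    return math.sqrt(max(0.0, variance))  # guard against tiny negative rounding errors


if __name__ == "__main__":
    # Hypothetical SDs of 0.10 g/cm2 at baseline and follow-up
    for r in (0.5, 0.7, 0.9):
        print(r, round(sd_change(0.10, 0.10, r), 4))
```

A lower correlation yields a larger change SD and therefore a smaller SMD, which matches the pattern of the sensitivity analysis: the minimum-correlation SMD is the smallest and the maximum-correlation SMD the largest.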

Fig. 2 Forest plot of meta-analysis results at the lumbar spine (IVhet model). The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the exercise (EG) versus control groups (CG). Imputation with mean correlation

The IVhet model–based funnel plot analysis with trim and fill suggests significant evidence for a publication/small study bias (Fig. 3). The analysis imputes nine missing studies on the left-hand side (i.e., favors control group). However, even the corrected, i.e., imputation-adjusted, intervention effect remains significant (p = 0.008). The significant asymmetry was confirmed by the LFK Index (1.50), the regression (p = 0.011), and the rank correlation test (p = 0.016).

Fig. 3 Funnel plot with trim and fill on the effect of exercise on BMD at the lumbar spine

Effects of exercise on BMD at the femoral neck

Seventy-two group comparisons determined exercise effects at BMD-FN (Fig. 4). In summary, the IVhet model with imputation of the mean correlation demonstrated a significant effect (p < 0.001) of exercise on BMD at the FN (SMD: 0.27; 95% CI: 0.16 to 0.39) (Fig. 4). Heterogeneity between the trial results (I2 = 58%) was moderate (Fig. 4). Applying imputation with minimum correlation (SMD: 0.20; 95% CI: 0.13 to 0.27) or maximum correlation (SMD: 0.55; 95% CI: 0.24 to 0.86) led to diverging, but consistently significant (p < 0.001), results.

Fig. 4 Forest plot of meta-analysis results at the femoral neck (IVhet model). The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the exercise (EG) versus control groups (CG). Imputation with mean correlation

The funnel plot analysis with trim and fill suggests no relevant evidence for a publication/small study bias (Fig. 5). Inspecting the LFK Index (1.04), we observed minor asymmetry; results of the regression (p = 0.068) and rank correlation test (p = 0.127) were non-significant, however.

Fig. 5 Funnel plot with trim and fill on the effect of exercise on BMD at the femoral neck

Effects of exercise on BMD at the total hip

Thirty comparisons addressed exercise effects for TH BMD (Fig. 6). The IVhet model (Fig. 6) with imputation of the mean correlation revealed a significant effect (p < 0.001) of exercise on BMD at the TH (SMD: 0.41; 95% CI: 0.30 to 0.52). Heterogeneity between the trial results (I2 = 20%) was low (Fig. 6). Applying sensitivity analyses with imputation with minimum correlation (SMD: 0.31; 95% CI: 0.22 to 0.42) or maximum correlation (SMD: 0.64; 95% CI: 0.44 to 0.83) led to diverging, but consistently significant (p < 0.001), results.

Fig. 6 Forest plot of meta-analysis results at the total hip (IVhet model). The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in the exercise (EG) versus control groups (CG). Imputation with mean correlation

The funnel plot for TH BMD suggested evidence for a small study/publication bias (Fig. 7). The trim and fill model imputed three studies at the lower right-hand side (i.e., small-moderate sized studies with positive results). Considering these (imputed) data in the analysis, SMD increased slightly (0.43; 95% CI: 0.32–0.54). Funnel plot asymmetry was not confirmed by the LFK index (− 0.04), regression (p = 0.47), or rank test (p = 0.57).

Fig. 7 Funnel plot with trim and fill on the effect of exercise on BMD at the total hip ROI

Subanalyses on potentially modifying factors

Effect of bone status

Fourteen of 85 comparisons focused on BMD-LS in cohorts with osteopenia, osteoporosis, or a history of fractures (Table 1). The IVhet model determined no significant difference (p = 0.094) between the subgroups with (SMD: 0.54; 95% CI: 0.17 to 0.92) and without (SMD: 0.23; 95% CI: 0.08 to 0.38) osteopenia/osteoporosis on LS-BMD. Heterogeneity was substantial (I2: 63% and I2: 77%) in both subgroups. Ten versus 62 subgroups with and without osteopenia/osteoporosis were compared for FN. In summary, we observed slightly more favorable effects in the cohorts with osteopenia, osteoporosis, or a history of fractures; the between group differences were not significant (p = 0.711), however. The same was true for TH-BMD (p = 0.453) with 6 versus 25 comparisons.

Effect of menopausal status

We compared 7 early postmenopausal with 26 late postmenopausal study groups for LS-, 5 with 26 subgroups for FN-, and 5 with 9 subgroups for TH-BMD. In summary, we detected no significant difference between the subgroups for BMD at the LS (p = 0.901), FN (p = 0.547), or TH (p = 0.824).

Effect of supervision on exercise effects on BMD in postmenopausal women

Fifty-nine study groups that addressed LS-BMD applied a predominantly supervised exercise protocol, while 23 study arms focused on predominantly non-supervised exercise. In summary, the supervised exercise protocols showed only a non-significant tendency (p = 0.37) toward higher effects on LS-BMD compared with the predominantly non-supervised exercise protocols (SMD: 0.19; 95% CI: − 0.03 to 0.41). In parallel, we observed no significant differences (p = 0.549) between predominantly supervised (n = 50) versus non-supervised (n = 19) exercise protocols for FN-BMD. However, the predominantly non-supervised exercise study groups per se did not show a significant exercise effect on BMD at the LS and FN (p = 0.09 each). Finally, we observed no significant differences in TH-BMD (p = 0.798) for predominantly supervised (n = 25) versus predominantly non-supervised protocols (n = 5), with significant exercise effects for both subgroups.

Discussion

In the present systematic review and meta-analysis, we provide further evidence for the favorable effect of exercise on BMD at LS, FN, and TH in postmenopausal women. However, since this 2022 update added only six exercise trials [51, 53, 54, 72, 77, 96] while excluding one (aquatic exercise) trial [104], the result of significant but “small” exercise effects at LS (SMD: 0.29), FN (SMD: 0.27), and TH (SMD: 0.41) did not differ relevantly from our 2020 finding [1]. One main advantage of the present study, though, is the application of the inverse heterogeneity model (IVhet) [9]. The IVhet approach is less susceptible to underestimation of statistical error in heterogeneous studies, i.e., results are more reliable in heterogeneous studies, especially with respect to the coverage probability of confidence intervals [105]. Nevertheless, it would be a misconception to assume that the enormous heterogeneity between the trial results can be adequately addressed by statistical methods. Of course, our own and other meta-analytic results (e.g., [106,107,108,109]) are simply the quintessence taken from a pool of studies with favorable and less favorable results. This is generally the case for meta-analyses; however, the situation in the field of exercise and (particularly) bone strength is much more complex than in the sophisticated pharmacologic area, where this type of analysis was first applied. It is obvious that, in contrast to pharmaceutic studies, the vast majority of exercise studies started immediately with the phase III study approach, without having addressed the general effectiveness of the exercise protocol in earlier pilot studies [110]. In some cases, one gains the impression that less promising interventions were applied to verify their ineffectiveness to favorably affect bone. Since there is no reliable rationale to exclude these “just look what happens trials,” the dilution of the meta-analytic results is predictable.

Another enormous problem of systematic reviews and meta-analyses in the area of exercise and bone strength is the extreme variation among the participant and intervention characteristics of the individual trials. Considering the first aspect, bone [4, 111] and menopausal status [58] might be candidates for moderating the exercise effect on BMD. In summary, we did not, however, observe any significant differences in BMD at LS, FN, and TH between the subgroups, be it for bone or for menopausal status. Of importance, with few exceptions (e.g., menopausal status: subgroup analyses for TH-ROI), the number of studies included in the subgroups was high enough to exclude a predominantly random effect. Nevertheless, there is some evidence that interaction effects between participant and exercise parameters in particular have affected our findings. As an example, it is plausible that more intense exercise protocols were applied in the younger, i.e., early postmenopausal cohorts or participants with a lower fracture risk, i.e., participants without osteopenia/osteoporosis. Reviewing the studies, however, we observed no striking differences in exercise intensity (or training frequency) between the cohorts with diverging bone or menopausal status. Even so, a simple adjustment of the subgroup analysis for exercise intensity is debatable, since the relevance of exercise intensity for a given outcome depends on other variables such as training frequency. The most successful approach for addressing the impact of participant characteristics such as bone or menopausal status on exercise-induced BMD changes might thus be to include participants with diverging characteristics but to apply the identical protocol.

While the effects of bone and menopausal status on exercise effects in middle-aged to older women are “nice to know” for the exercise specialist, supervision of the exercise protocol is an issue with significant implications for various crucial elements of an intervention including setting, personnel, finances, facilities, participants, and exercise protocol. This led us to the present approach of determining the effect of supervision of the exercise program on BMD. In a recent publication, we observed a significant superiority of supervision (versus predominantly non-supervised protocols) on overall and main osteoporotic fracture incidence in middle-aged to older adults [7]. This result was supported by findings [5, 112] which reported that supervised protocols demonstrate significantly higher effects on dynamic balance, strength, and power, i.e., parameters related to fall risk [113, 114] and bone strength [8]. Fisher et al. [5] postulated that the superiority of supervised (resistance) exercise programs might be related to higher adherence, motivation, intensity progression, and safety. Exercise programs for bone strengthening apply intensive resistance, weight-bearing, and impact exercises [115, 116], which also underlines the relevance of supervision for effective and safe exercise protocols, especially for older people. However, our analysis did not detect a significant difference between predominantly supervised versus non-supervised exercise protocols, albeit with slightly higher effects of supervised protocols for BMD at the LS, FN, and TH. Again, we have to admit that exercise parameters related to the need for supervision in particular diluted our result; e.g., some types of exercise (e.g., dynamic resistance exercise) need higher degrees of supervision than others (e.g., jumping protocols).
Nevertheless, accepting this result as a reliable finding would simplify the broad implementation of exercise programs in the field of osteoporosis due to lower demands on personnel and (since most non-supervised protocols applied home exercise) locations. However, the decision about more or less supervision should ultimately be made in the light of the cohort addressed (e.g., fitness and health status), specific preventative/therapeutic aims, and budget considerations in order to generate the safest and most effective/efficient training setting.

Apart from supervision, one may argue that comprehensive meta-analyses or their subanalyses on crucial exercise characteristics (i.e., type of exercise, strain intensity, exercise frequency, etc.) [1, 117] might be a smart solution for generating reliable recommendations on exercise protocols. In the meantime, and in line with Gentil et al. [118], we no longer agree with this idea. In fact, the close and inherent interactions within a given exercise protocol make it difficult to address isolated exercise parameters. For example, exercise intensity plays an important role for bone strengthening [117], but its relevance is (among other things) dependent on training frequency (once per day or once per week?) [119]. Bearing in mind that the effectiveness of continuously unchanged stimuli is limited [120], the aspect of progression is also crucial, in particular when applying year-long exercise interventions [121]. One solution to this problem might be comparative meta-analyses that include trials with two study arms with diverging exercise parameters (e.g., intensity [122] or training frequency [119]) but otherwise identical protocols (… and participants).

At this point, we would like to briefly address some limitations and particularities of the present work. (1) Apart from the general limitation that quite heterogeneous exercise protocols have to be included in the analysis, some of our eligibility criteria might also be debatable. This refers to study length (≥ 6 months) and adjuvant pharmaceutic therapy with impact on bone metabolism (≤ 10% of participants, albeit only if the number of users was similar between exercise and control). Although evidence for additive effects of exercise and hormone replacement therapy [123] or bisphosphonates [124] is low, we tried to exclude such interactions. Furthermore, considering bone remodeling as the primary mode of bone renewal in adults [125, 126] and taking regular changes of exercise intensity into account, exercise studies shorter than 6 months might not have reached the full amount of mineralized bone and thus confound the BMD assessment. (2) In this context, a limitation to be mentioned is that only few data were available on the precision of DXA measurements and the calculated least significant change (LSC). The minimum acceptable precision for an individual technologist would be 1.9% (LSC = 5.3%) for BMD-LS, 1.8% (LSC = 5.0%) for BMD-TH, and 2.5% (LSC = 6.9%) for BMD-FN [127]. (3) Another predominantly biometrical source of error was that SDs of the absolute change in BMD were not consistently available and thus had to be imputed. Although sensitivity analysis on imputation strategy consistently showed significant effects, outcomes for BMD at the LS, FN, and TH varied considerably depending on whether imputation was conducted with mean, minimum, or maximum correlation. On the other hand, inspecting funnel and Doi plots, LFK indices, rank, and regression tests led to the conclusion that the probability of small study effects (publication bias, outcome reporting bias, dissemination bias, etc.) with confounding effects on our finding [128] is limited.
(4) Our subgroup analyses on exercise-induced BMD changes focus on bone and menopausal status and supervision of the exercise session. We think that these are important aspects to be addressed, but other participant or study characteristics, e.g., baseline physical activity/exercise status or site specificity of the exercise, might be equally or even more important and should be addressed by future studies. (5) Our subgroup analysis on menopausal status allocated study groups ≤ 5 years postmenopausal to the early and study groups > 8 years postmenopausal (or cohorts 60 years and older) to the late postmenopausal subgroup. Given the considerable individual variation around the average time of menopause of 51 years [129], including study groups with a lower range of 60 years of age might slightly confound our analysis, since a few subjects of these cohorts might not fulfill the criterion of 8 years postmenopause. Nevertheless, an analysis with an alternative classification (i.e., ≤ 8 years postmenopausal: early versus > 8 years: late postmenopausal) results in similar non-significant group differences. (6) Although we carefully examined the studies, classification as “predominantly supervised” versus “predominantly non-supervised” based on attendance rates might be difficult and not always meaningful. This becomes more apparent where, for example, a study protocol that applied up to two supervised DRT and WB exercise sessions and three non-supervised walking sessions/week was classified as “predominantly non-supervised” by our procedure (5).
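The precision and LSC values quoted under limitation (2) are consistent with the commonly used relation LSC = 1.96 × √2 × precision error (about 2.77 × precision at 95% confidence). The sketch below reproduces that arithmetic; the relation is assumed rather than taken from the text, and the function name is illustrative.

```python
import math


def least_significant_change(precision_pct: float, z: float = 1.96) -> float:
    """LSC in percent for a given precision error (percent CV) at 95% confidence.

    The factor sqrt(2) reflects that a change is the difference of two
    measurements, each carrying the same precision error.
    """
    return z * math.sqrt(2.0) * precision_pct


if __name__ == "__main__":
    for site, precision in (("LS", 1.9), ("TH", 1.8), ("FN", 2.5)):
        print(site, round(least_significant_change(precision), 1))
```

Under this relation, BMD changes smaller than the LSC in an individual cannot be distinguished from measurement error, which is why missing precision data limit the interpretation of small BMD changes.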

By applying robust statistical methods (i.e., the IVhet approach [9] which is less susceptible to underestimation of statistical error in the heterogeneous exercise studies), we provided further evidence for a positive effect of exercise on BMD at LS, FN, and TH in postmenopausal women. However, the average SMD for BMD effects on LS, FN, and TH can be classified as moderate at best (i.e., 0.2 to 0.5). Our subanalysis on bone and menopausal status and supervision effects did not indicate significant differences. Summing up our recent experiences and findings, we conclude that while comprehensive meta-analyses in the area of exercise and bone strength might be useful for a rapid (but rough) overview, their practical application for deriving dedicated and reliable exercise recommendations is rather limited.