Introduction

Physical exercise is the most powerful non-pharmaceutical fracture prevention strategy in postmenopausal women [1, 2]. Dynamic resistance training (DRT), defined as any kind of resistance exercise that involves joint movement and focuses on the development of musculoskeletal strength, is considered an important component of osteoporosis prevention and therapy [3,4,5]. Nevertheless, with respect to areal bone mineral density (aBMD), recent meta-analyses on DRT reported, on average, low exercise-induced effect sizes at best (review in [6,7,8,9]). Reviewing the underlying studies, however, there is some evidence that the meta-analyses did not include isolated DRT protocols only. Further, large variations between the individual study findings can be observed. The main reason for this outcome can be attributed to the complexity of exercise interventions with respect to exercise variables (e.g., exercise intensity, duration, frequency), training principles (e.g., progression, periodization), and training conditions (e.g., supervision, devices) [10]. Previous meta-analyses on DRT did not account for training parameters. At best, there was a separate analysis for “exercise intensity” [6]. Thus, apart from a meta-analysis that properly focuses on isolated DRT protocols, there is a need for research aimed at identifying effective training parameters to generate recommendations for exercise protocols in the area of osteoporosis.

Therefore, in the present systematic review and meta-analysis, our primary study aim was to determine the effect (size) of isolated DRT on BMD at lumbar spine and proximal femur regions of interest (ROI) in postmenopausal women in comparison with control groups and further (secondary study aim) to identify relevant exercise characteristics by sub-analysis to derive recommendations for optimized exercise protocols in clinical practice.

Material and methods

Data sources and search strategy

The present study on DRT was based on a comprehensive systematic review of the effect of exercise on (areal) BMD in postmenopausal women. This systematic review and meta-analysis strictly followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [11]. The study was registered in the international prospective register of systematic reviews (PROSPERO) (ID: CRD42018095097). Briefly, eight electronic databases (PubMed, Scopus, Web of Science, Cochrane, Science Direct, Eric, ProQuest, and Primo) were searched for relevant articles published up to March 1, 2019, without language restrictions. The search strategy related to the population, intervention, and outcomes and was constructed around search terms for “Bone Mineral Density,” “Exercise,” and “Postmenopausal.” Key words and their synonyms were used to sensitize the search by applying the following query: (“Bone” or “Bone mass” or “Bone status” or “Bone structure” or “Bone turnover” or “Bone metabolism” or “Bone mineral content” or “Skeleton” or “Bone Mineral Density” or “BMD” or “Bone Density” or “Osteoporoses” or “Osteoporosis” or “Osteopenia”) AND (“Postmenopause” or “Post-Menopause” or “Postmenopausal”) AND (“Exercise” or “Training” or “Athletic” or “Sport” or “physical activity”) AND (“Clinical trial” or “Randomized clinical trial”). Unpublished reports or congress abstracts were not considered. One reviewer (MS) conducted the search and, following the omission of duplicate publications, screened studies by title and abstract according to the eligibility criteria. In the secondary search, the reference lists of the included articles were reviewed to identify further relevant and eligible studies.

In summary, 42 authors were contacted by e-mail due to the following issues: (a) participants/group allocation, (b) missing drop out data, (c) separation of the pre- or postmenopausal group, (d) result given in a graph only, (e) no mean change of BMD given, (f) missing standard deviation (SD).

Inclusion and exclusion criteria

We included studies (or study arms) if they met the following criteria: (a) randomized and non-randomized controlled trials with at least one exercise group as an intervention versus one control group with sedentary/habitual active lifestyle without exercise or with sham exercise; (b) women with postmenopausal status at study onset; (c) intervention of at least 6 months; (d) areal BMD of the lumbar spine (LS) and/or the proximal femur regions total hip (“TH”) and/or femoral neck (“FN”) listed as outcome measures at baseline and follow-up assessment; (e) BMD determined by dual-energy X-ray absorptiometry (DXA) or dual-photon absorptiometry (DPA); (f) ≤ 10% of participants on hormone (replacement) therapy (HT or HRT), osteoanabolic/antiresorptive (e.g., bisphosphonate, denosumab, strontium ranelate), or osteocatabolic (e.g., glucocorticoids) pharmaceutical agents, and then only when the number of users was comparable between exercise and control groups. For the present analysis on isolated DRT effects, we further included only studies (g) that applied isolated DRT without any adjuvant exercise component and without bone-specific warm-ups with a validated effect on bone (e.g., running, hopping, aerobic dance).

We excluded studies that included (a) mixed gender or mixed pre- and postmenopausal cohorts without separate BMD analysis for postmenopausal women (Footnote 1), (b) women undergoing chemo- and/or radiotherapy, and (c) women with diseases that relevantly affect bone metabolism. Further, (d) double/multiple publications from one study and preliminary data from subsequently published trials and (e) review articles, case reports, editorials, conference abstracts, and letters were not considered. For the present analysis on isolated DRT effects, we likewise excluded studies (f) that reported a pre-study history of resistance training (RT)-type exercise of ≥ 60 min/week in the year prior to the study intervention.

Data extraction

Two reviewers (SvS and MS) independently evaluated full-text articles and extracted data from the included studies. If they could not reach a consensus, a third reviewer was consulted (KW). We designed a pre-piloted extraction form to extract relevant data. This covered the publication characteristics (e.g., (first) author’s name, title, country, and publication year), methodology (i.e., design, objectives, sample size for each group), participant characteristics (i.e., age, weight, BMI, years since menopause), exercise characteristics (e.g., intervention duration, training frequency, exercise intensity, movement/velocity, progression), compliance/withdrawals, risk assessment, BMD, and outcome characteristics.

Outcome measures

The primary outcome in the present study was the change in (areal) BMD at the lumbar spine (LS), total hip (TH), and femoral neck (FN) regions of interest (ROI), as assessed by DXA or DPA, between baseline and follow-up.

Quality assessment

All articles that satisfied the predefined inclusion criteria were assessed for risk of bias by two independent raters (WK and MV) using the PEDro (Physiotherapy Evidence Database) scale [12, 13]. Disagreements were resolved by discussion, including a third assessor (SvS), until a consensus was reached. We classified the methodological quality of the included studies as follows: ≥ 7 = high, 5–6 = moderate, and < 5 = low [14].
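The classification rule stated above can be written as a small helper. This is only an illustrative sketch in Python; the function name `pedro_quality` is hypothetical, and only the three thresholds are taken from the text.

```python
def pedro_quality(score: int) -> str:
    """Map a PEDro total score (0-10) to the quality category used in this review."""
    if not 0 <= score <= 10:
        raise ValueError("PEDro total score must lie between 0 and 10")
    if score >= 7:
        return "high"      # >= 7 points
    if score >= 5:
        return "moderate"  # 5-6 points
    return "low"           # < 5 points
```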

Data synthesis

According to the “Cochrane Handbook for Systematic Reviews of Interventions” [15], standard deviation (SD) can be obtained from the standard error (SE) or confidence interval (CI) by using the following formulas (Footnote 2):

$$ \mathrm{SD}=\mathrm{SE}\times \sqrt{N} $$
$$ \mathrm{SD}=\sqrt{N}\times \left(\mathrm{upper}\ \mathrm{limit}-\mathrm{lower}\ \mathrm{limit}\right)/3.92 $$
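As a minimal sketch, the two conversions can be coded directly. The function names are hypothetical, and the constant 3.92 (2 × 1.96) presupposes a 95% CI based on the normal approximation, as in the formula above.

```python
import math

def sd_from_se(se: float, n: int) -> float:
    """SD of a group mean recovered from its standard error and sample size."""
    return se * math.sqrt(n)

def sd_from_ci(lower: float, upper: float, n: int) -> float:
    """SD recovered from the limits of a 95% CI (width = 3.92 standard errors)."""
    return math.sqrt(n) * (upper - lower) / 3.92
```

For example, a hypothetical group of 25 women with a 95% CI of −0.98 to 0.98 for the mean change yields an SD of 2.5.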

Further, authors (n = 11) were contacted to provide missing SDs. In cases of no reply or unavailable data (n = 11), the exact p value of the absolute change in BMD was obtained to compute the SD of the mean change. In cases of unreported p values (n = 11), we calculated the SDs using pre- and post-SDs and correlation coefficients according to the Cochrane Handbook for Systematic Reviews of Interventions [15]. Lastly, when the absolute mean difference was not available, it was imputed as the difference between post- and pre-intervention values (n = 7). In cases of multiple BMD assessments, we considered only changes between the baseline and final BMD assessments.
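The imputation from pre- and post-SDs rests on the standard formula for the SD of a difference of two correlated measurements, SD_change = √(SD_pre² + SD_post² − 2 · r · SD_pre · SD_post). The sketch below illustrates it; the function name and the correlation values in the loop are hypothetical and are not the coefficients used in the present analysis.

```python
import math

def sd_of_change(sd_pre: float, sd_post: float, r: float) -> float:
    """SD of the pre-to-post change, given both SDs and their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    variance = sd_pre ** 2 + sd_post ** 2 - 2.0 * r * sd_pre * sd_post
    return math.sqrt(max(0.0, variance))  # guard against rounding just below zero

# Hypothetical sensitivity check: a higher assumed correlation gives a smaller SD.
for r in (0.5, 0.7, 0.9):
    print(r, round(sd_of_change(0.12, 0.13, r), 4))
```

Because the imputed SD shrinks as the assumed correlation grows, the choice of r directly affects the standardized effect size, which is why a sensitivity analysis over plausible values is useful.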

To identify potential predictors of successful DRT protocols, we applied several sub-group analyses for the following: (a) intervention period (≤ 8 months vs. 9–18 months vs. > 18 months); (b) type of RT (machines vs. free weight vs. both types); (c) net training frequency (Footnote 3) (< 2 vs. ≥ 2 sessions/week); (d) exercise intensity relative to the one-repetition maximum (1RM) (low (< 65% 1RM) vs. moderate (65–< 80% 1RM) vs. high (≥ 80% 1RM)); and (e) exercise volume per session (exercises × sets × repetitions), structured as low (< 160 reps/session), moderate (160 to < 300 reps/session), and high (≥ 300 reps/session) volume.
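The sub-group coding described above can be sketched as a set of simple classifiers. The function names and category labels are hypothetical; the cut-offs are those listed in the text. How a non-integer duration between 8 and 9 months would be coded is not stated in the source, so the sketch assumes that anything above 8 months falls into the middle category.

```python
def classify_duration(months: float) -> str:
    if months <= 8:
        return "short"
    if months <= 18:  # assumption: values between 8 and 9 months count as moderate
        return "moderate"
    return "long"

def classify_frequency(net_sessions_per_week: float) -> str:
    return "low" if net_sessions_per_week < 2 else "high"

def classify_intensity(percent_1rm: float) -> str:
    if percent_1rm < 65:
        return "low"
    if percent_1rm < 80:
        return "moderate"
    return "high"

def classify_volume(exercises: int, sets: int, reps: int) -> str:
    total_reps = exercises * sets * reps  # repetitions per session
    if total_reps < 160:
        return "low"
    if total_reps < 300:
        return "moderate"
    return "high"
```

A hypothetical protocol with 8 exercises, 3 sets, and 10 repetitions gives 240 reps/session and would therefore be coded as moderate volume.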

Statistical analysis

The statistical analysis was performed using the statistical software R (R Development Core Team) [16]. Effect sizes (ES) were expressed as standardized mean differences (SMDs) with 95% confidence intervals (CIs).
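To make the effect measure concrete, the following sketch computes an SMD with its sampling variance and 95% CI from the mean BMD change, SD, and sample size of an exercise and a control group. The source does not state which SMD variant was used; the sketch assumes the small-sample bias-corrected form (Hedges' g) with the usual large-sample variance approximation, and the function name and input values are hypothetical.

```python
import math

def hedges_g(mean_ex, sd_ex, n_ex, mean_con, sd_con, n_con):
    """Return (g, variance, ci_lower, ci_upper) for exercise vs. control."""
    df = n_ex + n_con - 2
    # Pooled SD of the two groups
    sd_pooled = math.sqrt(((n_ex - 1) * sd_ex ** 2 + (n_con - 1) * sd_con ** 2) / df)
    d = (mean_ex - mean_con) / sd_pooled
    # Small-sample correction factor (approximation)
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    g = j * d
    variance = (n_ex + n_con) / (n_ex * n_con) + g ** 2 / (2.0 * (n_ex + n_con))
    half_width = 1.96 * math.sqrt(variance)
    return g, variance, g - half_width, g + half_width
```

With small groups, as in most of the included trials, the correction factor pulls the estimate slightly toward zero compared with the uncorrected standardized difference.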

Random-effects meta-analysis was performed using the metafor package [17]. Between-study heterogeneity was assessed using the Cochran Q test; as for the other statistical analyses, a p value < 0.05 was considered significant. The level of heterogeneity was quantified with the I² statistic. An I² of 0 to 40% is considered low, 30 to 60% moderate, and 50 to 90% substantial heterogeneity [15]. For studies with two different intervention groups, the control group was proportionally split into two groups for comparison against each intervention group [15]. Sensitivity analysis was conducted to check whether the overall result of the analysis is robust to the choice of the imputed correlation coefficient. Funnel plots, together with the regression test and the rank correlation test between effect estimates and their standard errors (SEs), using the t test and Kendall’s τ statistic respectively, were applied to explore potential publication bias. To adjust the results for possible publication bias, we also conducted a trim and fill analysis using the L0 estimator proposed by Duval et al. [18]. A p value of < 0.05 was considered significant for all tests.
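The quantities named above (pooled random-effects estimate, Cochran's Q, I²) can be illustrated with a compact sketch. It is not the metafor implementation: the source does not state which between-study variance estimator was used, and the sketch swaps in the closed-form DerSimonian–Laird estimator for brevity. The function name and all inputs are hypothetical.

```python
import math

def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study effects."""
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se)}
```

When the studies disagree more than their sampling variances can explain, tau2 becomes positive, the weights become more equal across studies, and the confidence interval of the pooled estimate widens.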

In order to identify potential moderators of exercise, sub-group analyses were performed with the exercise parameters and their corresponding categories as listed above.

Results

Study characteristics and quality assessments

In total, our search identified 17 eligible studies ([19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]; Fig. 1), with 20 exercise and 18 control groups. Table 1 displays participant baseline characteristics of the included studies.

Fig. 1 Flow diagram of search process according to PRISMA [11]

Table 1 Baseline characteristics of participants in included studies (n = 17)

The pooled number of participants in the intervention and control groups was 423 and 373 women, respectively. Sample sizes in the exercise arms ranged from 10 [31] to 35 participants [27] per group. Only one study [24] included women with osteopenia/osteoporosis (Footnote 4); none of the other studies considered bone status (Table 2). The age of the postmenopausal women ranged from 41–60 years [19] to 65–82 years [32]. Time since menopause varied from 0.5 [34] to > 21 years [26]. Three studies [19, 27, 31] focused particularly on cohorts of “early postmenopausal women” (1–≤ 7 years post). Mean body mass index of the groups ranged from 23.1 [28] to 28.7 kg/m² [19]. Seven studies included participants with sedentary/habitually active lifestyles or at least no prior RT exposure [20, 22, 26, 28, 30, 33, 35], eight trials involved participants with exercise activities presumably with minor effects on bone [19, 21, 23,24,25, 29, 31, 32], and two studies did not provide corresponding information [27, 34]. The studies were conducted in Australia [29], Brazil [22, 30], Canada [21, 23, 36], Germany [24], Hong Kong [35], and the USA [19, 20, 25,26,27,28, 31, 32, 34].

Table 2 Exercise characteristics of included studies (n = 16)

Intervention characteristics

Vitamin-D and calcium supplementation

Six studies provided vitamin D and/or calcium supplementation [19, 21, 23, 26, 32] to both their exercise and control groups.

Exercise

Table 2 specifies the exercise protocols of the included studies. Apart from three studies [19, 25, 32], all trials compared a single exercise group (EG) with a single non-exercise control group (CG). In line with the eligibility criteria, all studies applied DRT; the majority of studies used conventional RT machines or free weights, one study [22] applied “Pilates exercises” with specific devices (e.g., reformer, cadillac), and another study [35] used a resistance band of low–moderate strength.

Length of the intervention (or study) ranged from six [19, 22, 24, 29] to 24 months [34]; no study reported a delay between the end of the intervention and the control assessment. Most studies focused on all or most main muscle groups (Footnote 5). Kerr et al. [25] applied a unilateral DRT that stresses the “ipsilateral forearm and hip region,” Sinaki et al. [34] focused on back strengthening in a prone position, Woo et al. [35] predominantly conducted hip and lower limb exercises, and Maddalozzo et al. [27] specified back squats and deadlifts. Apart from two studies that did not provide sufficient information for the LS-site [25, 35], all the studies addressed their specified BMD ROI (i.e., LS and hip) with specific exercises.

Most studies prescribed a training frequency of three sessions per week (Table 2); however, when considering participant attendance, the net training frequency of five studies [24, 26,27,28,29] was on average below two sessions per week. Length of the exercise sessions varied from about 1–2 min (i.e., 10× back extension [34]) to about 120 min (i.e., 36 sets × 20 reps, 2–3 min of rest between the sets). Most studies applied a multiple set approach [19,20,21, 23,24,25,26,27,28, 30, 32, 33]. The protocol of Nicholson et al. [29] scheduled 10 × 4–6 min blocks of one (e.g., squats) or several exercises for the same muscle groups (e.g., chest, back, triceps). As a result, repetitions per set for a single exercise were up to 108 reps (132 reps/block) [29] for the latter study, but most studies applied sets with 7–12 repetitions [19,20,21,22,23,24,25,26,27,28, 30,31,32,33,34] (Footnote 6). Three studies [19, 25, 32], comparing high versus low intensity RT protocols, further scheduled sets of 14–20 reps in their low intensity study arms. Correspondingly, relative exercise intensity ranged between 80% 1RM [19, 20, 25, 26, 28, 30, 32] and ≤ 30% 1RM [29, 34]. Absolute exercise intensity (i.e., “effort”) was rarely prescribed, however. Five studies prescribed either work to repetition maximum [30, 31, 37] or work to muscular fatigue [23, 26] as a set endpoint [38]; another study [22] referred to 5–6 (i.e., strong–strong+) on the Borg CR10 scale. Reviewing the repetition number and relative exercise intensity (% 1RM), some studies [34] or study arms [19, 32] clearly exercised with low or very low effort (Footnote 7). Time under load or velocity during the different sections of the movement (concentric–isometric–eccentric) [39] was also rarely mentioned [24, 27, 33]. Time under load varied from 3–4 s [40] to 5–6 s [36] per repetition. None of the studies reported an explosive movement in the concentric or eccentric phase. Apart from the 24-month study by Sinaki et al. [34], progression or at least regular adjustment of exercise intensity was implemented in all studies. The application of periodization models [41] was not reported by any of the studies.

Most studies focused on a supervised group exercise protocol [19,20,21,22,23,24,25, 27,28,29,30,31,32, 35], while two other studies relied on partially supervised individual gym-based RT [33] or non-supervised exercise training at home [34].

The PEDro score of the reviewed studies ranged from 4 to 8 of 10 total score points (Table 3). The level of agreement between the raters for methodological quality of the studies was 100%. The methodological quality of five studies can be considered high [35] (Footnote 8), 10 studies were of moderate quality, and 2 of low quality. According to the mixed-effects analysis, no significant differences between the PEDro score categories (high vs. moderate vs. low) were detected at LS- (p = 0.639), FN- (p = 0.968), or TH-BMD (p = 0.416).

Table 3 Assessment of risk of bias for included studies

Results for primary outcomes

Apart from two studies [31, 34] that applied DPA, all studies used DXA. Furthermore, all studies except two ([25]: hip only; [34]: LS only) determined BMD at both the LS and proximal femur regions of interest.

Effect of exercise on LS-BMD

Sixteen studies with 18 exercise groups evaluated the effect of exercise on LS-BMD. In summary, the exercise interventions resulted in significant positive effects (p = 0.001). The pooled estimate of the random-effects analysis was SMD 0.54 (95% CI 0.22–0.87), albeit with a substantial level of heterogeneity between trials (I² = 74.8%, Q = 70.1) (Fig. 2). A sensitivity analysis imputing the minimum SD (best case: SMD 0.70, 95% CI 0.27–1.13) or maximum SD (worst case: SMD 0.42, 95% CI 0.17–0.67) yielded significant results in both cases. We expect that the mean value imputation comes closest to the true effect (Fig. 2).
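As a plausibility check, the reported p value can be approximately recovered from the pooled SMD and its 95% CI. The sketch assumes that the interval is a normal-approximation (Wald-type) interval, which the source does not state explicitly; the function name is hypothetical.

```python
from statistics import NormalDist

def p_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Two-sided p value implied by an estimate and its 95% Wald-type CI."""
    z_crit = NormalDist().inv_cdf(0.975)       # about 1.96
    se = (upper - lower) / (2.0 * z_crit)      # CI width = 2 * z_crit * SE
    z = estimate / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# LS-BMD: SMD 0.54, 95% CI 0.22-0.87
p_ls = p_from_ci(0.54, 0.22, 0.87)
print(round(p_ls, 4))
```

Under this assumption the implied p value is on the order of 0.001, in line with the value reported for LS-BMD.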

Fig. 2 Forest plot of meta-analysis results at the LS. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in exercise and control groups

The funnel plot suggested positive evidence of publication bias (Fig. 3). However, the regression (p = 0.920) and rank correlation tests (p = 0.881) for funnel plot asymmetry did not indicate any significant asymmetry. Adjusting for possible publication bias using a trim and fill analysis [18] did not change the results.

Fig. 3 Funnel plot of the DRT studies that address LS BMD

Sub-group analyses for LS-BMD

Intervention duration

Of 18 groups, 6 training groups conducted short-term interventions (≤ 8 months), 11 groups applied a moderate duration (9–18 months) intervention, and one training group scheduled a 24-month intervention (Table 2). According to a mixed-effects analysis, no significant difference was observed between the sub-groups (p = 0.421).

Type of exercise

Of 18 training groups, 10 groups worked with resistance training devices, four with free weights or resistance bands, and four conducted a mix of both types (Table 2). According to a mixed-effects analysis, no significant difference was observed between the sub-groups (p = 0.700).

Training frequency

Of 18 training groups, 5 groups exercised fewer than 2 sessions per week and 13 groups exercised ≥ 2 sessions per week (Table 2). According to a mixed-effects analysis, a significant difference was observed between the two groups (p = 0.002). The sub-group analysis demonstrated a higher effect size for the lower training frequency (SMD 1.26, 95% CI 0.88–1.64) compared to an SMD of 0.24 (95% CI −0.05–0.54) in the sub-group that exercised ≥ 2 sessions per week.

Exercise intensity

Of 18 training groups, 5 groups exercised with low (< 65% 1RM), 7 with moderate (65–< 80% 1RM), and 6 with high relative intensity (≥ 80% 1RM) (Table 2). According to a mixed-effects analysis, no significant difference was observed between the sub-groups (p = 0.404).

Exercise volume/session (exercises × sets × reps)

Of 18 training groups, 7 groups applied a low (< 160 reps/session), 8 a moderate (160 to < 300 reps/session), and 3 a high (≥ 300 reps/session) exercise volume/session (Table 2). The mixed-effects analysis did not demonstrate any significant difference between the sub-groups (p = 0.697).

Effect of exercise on FN-BMD

Fifteen studies with 18 exercise groups evaluated the effect of exercise on FN-BMD. In summary, the exercise interventions resulted in a significant (p = 0.005) but low effect size (SMD 0.22, 95% CI 0.07–0.38) (Fig. 4). There was a negligible level of heterogeneity in estimates of the exercise effect (I² = 0.0%, Q = 13.0). Sensitivity analysis of imputation determined that even in the worst case (i.e., imputing maximum SD), there is a significant effect (SMD 0.17, 95% CI 0.02–0.33, p = 0.027). Results listed in Fig. 4 are based on mean value imputation.

Fig. 4 Forest plot of meta-analysis results at the femoral neck. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in exercise and control groups

The funnel plot indicates evidence for a publication bias (Fig. 5). The regression (p = 0.604) and rank correlation tests (p = 0.601) for funnel plot asymmetry, however, did not indicate significant asymmetry. Visually, the plot suggests missing studies on the lower right-hand side. A trim and fill analysis resulted in a slightly higher effect size (SMD 0.26, 95% CI 0.11–0.41) after adjusting for publication bias.

Fig. 5 Funnel plot of the DRT studies that address femoral neck BMD

Sub-group analyses for FN-BMD

Intervention duration

Of 18 groups, 6 groups applied a short, 12 groups a moderate, and no group a long duration of the exercise intervention (Table 2). A mixed-effects analysis did not reveal significant differences between the sub-groups (p = 0.694).

Type of exercise

Ten groups worked with resistance training devices, two with free weights or resistance bands, and six groups conducted a mix of both types (Table 2). A mixed-effects analysis demonstrated no significant differences between the sub-groups (p = 0.490).

Training frequency

Five groups exercised fewer than 2 sessions per week and 13 groups exercised ≥ 2 sessions per week (Table 2). In contrast to LS-ROI, no significant difference was observed between the two groups (p = 0.260) from mixed-effects analysis.

Exercise intensity

Four groups exercised with low (< 65% 1RM), 7 with moderate (65–< 80% 1RM), and 7 with high relative intensity (≥ 80% 1RM) (Table 2). The mixed-effects analysis determined no significant differences between the sub-groups (p = 0.279).

Exercise volume/session (exercises × sets × reps)

Six groups applied a low (< 160 reps/session), 7 a moderate (160 to < 300 reps/session), and 5 a high (≥ 300 reps/session) exercise volume/session (Table 2). No significant differences between the sub-groups (p = 0.373) were demonstrated by the mixed-effects analysis.

Effect of exercise on TH-BMD

Nine studies with 11 exercise groups evaluated the effect of exercise on TH-BMD (Fig. 6). In summary, the pooled estimate of the random-effects analysis was SMD 0.48 (95% CI 0.22–0.75). The level of heterogeneity between trials was low (I² = 35.8%, Q = 14.7). Sensitivity analysis demonstrated that even in the worst case (i.e., maximum SD), there is a significant effect (SMD 0.44, 95% CI 0.22–0.65, p = 0.001). Results listed in Fig. 6 are based on mean value imputation.

Fig. 6 Forest plot of meta-analysis results at the total hip. The data are shown as pooled standardized mean difference (SMD) with 95% CI for changes in exercise and control groups

The funnel plot suggested positive evidence of publication bias (Fig. 7). The regression test (p = 0.013), but not the rank correlation test (p = 0.218), for funnel plot asymmetry demonstrated significant asymmetry. Comparable to FN-BMD, there is a lack of studies on the (lower) right-hand side. A trim and fill analysis resulted in a considerably higher effect size after adjusting for publication bias (SMD 0.67, 95% CI 0.40–0.93).

Fig. 7 Funnel plot of the DRT studies that address total hip BMD

Sub-group analyses for the TH-BMD

Intervention duration

Of 11 groups, 5 training groups were classified as short-term and 6 groups as moderate, and no training group was categorized as a long-term intervention (Table 2). A mixed-effects analysis indicated no significant difference between the sub-groups (p = 0.835).

Type of exercise

Of 11 groups, 7 groups worked with resistance training devices, three with free weights or resistance bands, and one group conducted a mix of both types (Table 2). A mixed-effects analysis indicated a significant difference between the sub-groups (p = 0.013). The sub-group analysis demonstrated the highest effect size (SMD 0.89, 95% CI 0.59–1.19) for the “free weight” training group (vs. RT devices 0.23, − 0.09–0.55; vs. mix 0.25, − 0.35–0.84).

Training frequency

Only two training groups applied a low training frequency (< 2 sessions/week); nine training groups prescribed a high training frequency (≥ 2 sessions/week). Differences between the groups were significant (p = 0.023); the sub-group analysis yielded a higher effect size for the sub-group with lower training frequency (SMD 0.94, 95% CI 0.57–1.31 vs. high frequency 0.34, 0.07–0.61).

Exercise intensity

Four groups worked with low relative intensity (< 65% 1RM), four with moderate relative intensity (65–< 80% 1RM), and three with high relative intensity (≥ 80% 1RM) (Table 2). The mixed-effects analysis did not determine any significant differences between the sub-groups (p = 0.090).

Exercise volume/session (exercises × sets × reps)

Two groups exercised with low, six with moderate, and three with high exercise volume/session protocols (Table 2). No significant differences between the sub-groups (p = 0.84) were observed.

Discussion

In this systematic review and meta-analysis, we confirmed the significant positive effects of DRT-type exercise on BMD in postmenopausal women. However, effect sizes differed considerably between the regions of interest. While the SMDs for LS-BMD (0.54) and TH-BMD (0.48) can be considered moderate, the corresponding effect for the FN was quite low (SMD 0.22). We hypothesize that different loading configurations during DRT might predominantly explain the different results, not only between LS and FN but also between FN and TH-ROI (Footnote 9). One explanation for the significantly lower effect on the femoral neck region during resistance exercise might be the high stress level from everyday activities at this site. Due to leverage ratios, weight-bearing loads in one-legged stand situations such as walking result in high tensions of the abductor muscles and high stresses, especially at the femoral neck region. Force measurements revealed corresponding loads of approximately three times the body mass (e.g., [42]). Also, RT-induced joint reaction forces might not have exceeded the threshold for bone adaptation, and hence, no exercise effect occurred.

Summarizing the few other meta-analyses [6, 8, 43, 44] that focus on the effect of RT on BMD at LS or proximal femur ROIs, effect sizes vary considerably. Most of them reported negligible to low effects of RT on BMD, be it at the LS (SMD ≤ 0.24 [43]) or femoral neck (SMD ≤ 0.21 [9]). We observed higher effect sizes, particularly for LS-BMD (SMD 0.54) and TH-BMD (SMD 0.48). Due to the longer search period, we included more studies than most other systematic reviews, but the main difference between the present study and previous analyses might be the more careful screening of eligibility [43] related to isolated DRT protocols. Of importance for the generation of exercise protocols, one analysis [6], which divided studies according to their exercise intensity, reported no effect for low “force” RT protocols (LS n = 5, MD − 0.17%; FN n = 3, MD − 0.03%; TH n = 3, MD 0.21%). “High force” RT protocols (≈ 60% 1RM), however, demonstrated significant but low–moderate BMD effects at the LS (n = 8, MD 0.86%) and FN (n = 8, MD 1.03%) but not for BMD at TH (n = 5, MD 0.11%, 95% CI − 0.06–0.29%). On the other hand, neither the study of Martyn-St-James et al. [8], which included only high intensity DRT studies, nor our sub-group analyses confirmed the result that RT with moderate (65–< 80% 1RM) or high intensity (≥ 80% 1RM) is superior to RT protocols with low “force” [6] (Footnote 10). If anything, reviewing the data of the sub-analyses revealed some unexpected findings.

Intervention duration

Considering that RT-induced changes of BMD in adults might be triggered predominantly by remodeling, we hypothesized that studies of less than 9 months would not capture the full amount of new mineralized bone [45] (Footnote 11). Nevertheless, addressing this issue by mixed-effects analysis, no significant BMD difference was observed between studies of longer and shorter durations, be it at the LS-, FN-, or TH-ROI. One may thus speculate that despite the progressive increase of exercise intensity (Table 2; Footnote 12), no relevant further exercise-induced effects occur after initial bone adaptation, be it by modeling or (fast) remodeling [24, 46]. We are unable to reject this conjecture for pure RT protocols; however, previously published studies applying mixed exercise long-term protocols [47, 48] observed an ongoing effect of exercise on BMD at LS and FN for up to 16 years.

Type of exercise

From a pragmatic point of view, it is important to determine whether specific RT devices are needed to generate successful exercise programs. In summary, we observed BMD differences consistently in favor of “free weight training” (vs. “device” and “mixed type” training) at the LS-, FN-, and TH-ROI, albeit significant only for the TH-ROI. This result is very welcome, and not only because of the much lower equipment requirements of free weight training. Of relevance for older people, free weights might be more favorable to increase function [49] and in particular leg extensor strength [50], with its crucial relevance for mobility limitations, disability, morbidity, and mortality [51,52,53].

Training frequency

This parameter might be the most important aspect for designing an exercise protocol. Apart from its direct impact on the outcome addressed, training frequency (TrFr) immediately affects the feasibility of the program and thus participant compliance [54]. In summary, our result indicates that lower net training frequency (< 2 sessions/week, s/w) demonstrated significantly higher effect sizes for BMD changes at the LS-ROI and TH-ROI (Footnote 13) versus higher training frequency (≥ 2 s/w). At this point at the latest, we have to subject our meta-analytical results to a critical review. Although some exercise studies did not detect significant BMD differences after exercising with varying TrFr (Footnote 14) [55,56,57], other studies [58,59,60,61] clearly demonstrated significantly higher effect sizes for BMD changes at LS and hip-ROI when applying TrFr of at least 2 s/w, compared with 1–< 2 s/w. In these studies, lower TrFr was not only less favorable but showed no effects on BMD at LS or hip-ROIs at all. Nevertheless, there might be some explanations for at least similar BMD results after exercising with different TrFr. Firstly, one may speculate that higher intensity might compensate for the effect of lower frequency or (vice versa) that high frequency combined with high intensity might result in incomplete adaptation to exercise [10]. However, a sub-analysis combining training frequency with exercise intensity did not support this hypothesis. From the literature results mentioned above (Footnote 15), it would also be conceivable that, particularly during the early phase of an intervention, each bone-specific exercise protocol might trigger positive effects on BMD largely independent of TrFr. However, combining training frequency and intervention duration in another sub-analysis did not support this hypothesis, either. Finally, the rather low variance within net TrFr (≈ 1.5–3.5 s/w) might confound a proper result on this issue. In summary, however, from a sport-scientific point of view, it is hardly plausible that within this reasonable range of TrFr, a lower TrFr triggers significantly higher effects on BMD changes compared with a higher TrFr.

Exercise intensity

Another key parameter of exercise effects on a given outcome is “exercise intensity.” We categorized relative exercise intensity according to % 1RM (low < 65% vs. moderate 65–< 80% vs. high intensity ≥ 80% 1RM) as listed by the exercise trials. In summary, however, the sub-group analysis did not reveal significant differences between the groups. Although not consistently determined (e.g., [19, 57]), there is a high level of evidence [25, 62,63,64] that high exercise intensity is superior to moderate or low exercise intensity for addressing BMD. The superiority of high intensity RT is strongly supported by basic research [65, 66], which indicates that the higher strain magnitude generated by higher deformation of the bone increases bone formation in linear proportion to the deformation magnitude. Contrary to a fixed “bone adaptation threshold” at 1000 με, as suggested by the Mechanostat theory [67], other authors revealed that loading thresholds for modeling/remodeling vary between different skeletal sites, according to their habitual loading history [68,69,70]. However, as reported, we did not find any evidence for the superiority of high intensity RT at any ROI addressed by this study.
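
The intensity categories used for the sub-group analysis can be written as a simple rule. The following is a minimal sketch, assuming that relative intensity is available as a single % 1RM value per exercise group; the function name `classify_intensity` is hypothetical and does not stem from the analysis software used in this study.

```python
def classify_intensity(percent_1rm: float) -> str:
    """Map relative exercise intensity (% 1RM) to the categories named in the text.

    Cut-offs follow the text: low < 65%, moderate 65 to < 80%, high >= 80% 1RM.
    The function itself is an illustrative assumption, not the authors' code.
    """
    if percent_1rm < 0:
        raise ValueError("percent_1rm must not be negative")
    if percent_1rm < 65:
        return "low"
    if percent_1rm < 80:
        return "moderate"
    return "high"


# Hypothetical example values, not taken from the included trials
for load in (60, 65, 79.9, 80):
    print(load, classify_intensity(load))
```

Note that the boundaries are half-open, so a protocol prescribed at exactly 65% 1RM falls into the moderate category and one at exactly 80% 1RM into the high category.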

Exercise volume/session

Basically, there is a close interaction between exercise volume, in particular cycle number (i.e., number of reps), and strain magnitude (i.e., exercise intensity) [10]. With respect to bone physiology, the number of loading cycles is negligible when applying a high strain magnitude [71]; however, there is some evidence [5, 72] that higher cycle numbers might compensate for low to borderline strain magnitudes. In this context, Cullen et al. [73] demonstrated that 40 repetitions with a strain magnitude of 1000 με did not relevantly affect bone formation rate, while 120 or 400 reps resulted in a significant increase in this parameter. However, when addressing the relevance of exercise volume/session for BMD changes in our sub-analysis, we did not observe any relevant (p ≥ 0.373) effect of this parameter.

Limitations

In summary, our attempt to identify moderators of exercise effects on bone strength by evaluating exercise characteristics with particular relevance for BMD largely failed. Accordingly, we are unable to recommend any promising DRT protocols for bone strengthening. In their critical review, Gentil et al. [74] questioned the relevance and practical application of meta-analytic results in strength training. While this might be going too far, it is nevertheless obvious that the complex interaction of exercise variables, training principles, and training conditions prevents, or at least complicates, a proper analysis of single exercise parameters even when focusing on relatively homogeneous types of exercise (i.e., DRT). This is even more the case when applying exercise regimens in the real world and not in laboratory-based, artificially supported study settings. In addition, the brief reporting on relevant characteristics in research papers often does not provide sufficient information. Thus, meta-analyses might be an appropriate tool for determining the general effects of dedicated exercise on a given outcome, but their ability to distinguish between exercise parameters is more limited.

Some limitations and features of this work should be addressed to allow the reader to adequately interpret our findings and to follow our conclusions.

(1) We set out to determine the effect of preferably isolated DRT. DRT was defined as any kind of resistance exercise that involves joint movement and focuses on the development of musculoskeletal strength; correspondingly, we excluded studies with other types of exercise, be it as training components or (bone) specific warm-ups. However, in reality, our approach might not always be considered consistent. Indeed, we included a study that also applied short bouts of rowing [26]. While accepting that 10 min of intense rowing can no longer count as RT, we did not think that the joint reaction force character of rowing [26] would confound our results. In parallel, studies that applied cycling or stretching, i.e., exercises with no relevant mechanical impact on bone [79, 80], were included. (2) Although, with 17 studies, including 20 exercise and 18 control groups, our sample size of isolated DRT studies was higher than the sample size of recent meta-analyses, the statistical power might have been too low to address some dedicated issues by sub-group analyses. This limitation refers particularly to TH-ROI with considerably lower sample sizes. (3) There is a consistent lack of reporting of relevant DRT exercise parameters in the present literature; correspondingly, we are unable to evaluate all the promising exercise variables. Apart from absolute intensity (“effort”), movement velocity was rarely reported [27]. However, strain rate (corresponding to movement velocity in DRT), defined as the alteration in strain magnitude per second during the acceleration or deceleration of loading (με/s), is an important mechanical parameter. Turner et al. [81], for example, observed a linear increase in bone formation rate with higher strain rates when using a protocol with constant strain magnitude but varying strain rates. Von Stengel et al. [82] confirmed this finding for DRT by comparing fast-explosive vs. slow movement velocity during high intensity DRT. The authors reported that BMD changes in the power training group (i.e., explosive concentric velocity) significantly exceeded the results of the resistance training group (i.e., TUT 4 s–0 s–4 s). Considering the importance of this parameter for bone strengthening and the easy and safe applicability of high strain rates/fast movement velocity generated by joint reaction forces during DRT even in older, more vulnerable cohorts [83], more exercise studies should focus on this exercise variable. (4) Although we did our best to adequately classify our exercise characteristics according to exercise terminology or bone physiology, we admit that some of the categorizations (e.g., exercise volume/session) were made somewhat arbitrarily in order to ensure an appropriate distribution for comparisons. (5) Even after adjusting primary study outcomes (LS-, FN-, TH-BMD) for multiple testing, the significance of the results remained. However, following recent recommendations [84], we did not adjust for multiple testing on secondary outcomes (i.e., sub-analyses). (6) There is some evidence for a publication bias for LS-, FN-, and TH-BMD data. Due to the preference to report positive effects [85], the true effect size of exercise on BMD was in general considered lower for unadjusted data. However, the lack of studies in the (lower) right-hand corner of the funnel plot indicates that small-moderate size studies with positive effects are missing. Indeed, using trim and fill analysis [18], we determined no changes for the LS-BMD but higher effect sizes for FN- and particularly TH-BMD.
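
To illustrate the strain rate definition given under limitation (3), the relation can be written as a short worked equation. This is a sketch under simplifying assumptions: strain is assumed to rise linearly to its peak, and the peak strain and loading times below are hypothetical illustration values, not data from the cited studies. Strain is expressed in microstrain (με).

```latex
% Strain rate: change in strain magnitude per unit time
\dot{\varepsilon} = \frac{\Delta \varepsilon}{\Delta t}

% Hypothetical example with identical peak strain (1000 microstrain)
% Fast, explosive movement: peak reached in an assumed 0.5 s
\dot{\varepsilon}_{\text{fast}} = \frac{1000\ \mu\varepsilon}{0.5\ \text{s}} = 2000\ \mu\varepsilon/\text{s}

% Slow movement: peak reached in 4 s
\dot{\varepsilon}_{\text{slow}} = \frac{1000\ \mu\varepsilon}{4\ \text{s}} = 250\ \mu\varepsilon/\text{s}
```

Under these assumptions, the faster movement yields an eightfold higher strain rate at the same strain magnitude, which is why movement velocity can act as a separate exercise variable independent of load.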

In conclusion, it is difficult to generate exercise recommendations on bone strengthening based on the meta-analytic results of the present exercise trials. Uncritical acceptance of the acquired meta-analytic data is certainly unwarranted in this context. Based on this experience, we conclude that dedicated, accurately designed randomized controlled exercise trials might be the more appropriate tool for addressing single exercise characteristics and thus generating exercise recommendations in the area of osteoporosis prevention and therapy.