The purpose of this meta-analysis was threefold: (1) to investigate the effects of LWS, MWS or HWS of strength training on muscular strength per exercise; (2) to investigate if the magnitude of strength gain differs between multi-joint and isolation exercises; and (3) to provide a perspective on developing muscular strength across stages of training progression.
For novice and intermediate male trainees, the findings suggest that MWS and HWS strength training may lead to larger strength gains than LWS. For more experienced individuals, such as advanced and elite trainees, the sparse data available suggest that MWS and HWS strength training may create greater strength gains compared with LWS. The findings suggest that MWS and HWS strength training may produce marginally greater strength gains compared with LWS in certain contexts. LWS strength training per exercise appears to be less effective in both multi-joint and isolation exercises compared with MWS and HWS strength training.
The available data for exercise-specific 1RM support the conventional view of a graded dose–response relationship, in which strength gains increase with the number of weekly sets included in training. The pooled mean ES estimate for LWS was 0.80 (95% CI 0.47–1.13) compared with HWS (0.97; 95% CI 0.68–1.26). The mean ES for LWS multi-joint-only exercises pre- to post-intervention was 0.81 (95% CI 0.65–0.97) compared with HWS (1.00; 95% CI 0.77–1.23). When LWS training was used as the control group and the HWS group was used as the experimental programme, a trivial ES of 0.18 (95% CI 0.01–0.34; p = 0.04) suggested that HWS training was more effective in producing strength gains. When LWS training was used as the control group and MWS training was used as the experimental group, a trivial ES of 0.15 (95% CI 0.01–0.30; p = 0.04) suggested that MWS training is marginally more effective in producing strength gains. According to Cohen’s [17] classifications for ESs, ≤0.2, ≤0.5, ≤0.8 and ≥0.8 are considered trivial, small, moderate and large, respectively. The pre- versus post-strength training results for combined multi-joint and isolation exercises, multi-joint-only and isolation-only studies showed a pre- to post-intervention graded dose–response relationship in strength gains. The data would thus suggest that MWS and HWS strength training produce marginally superior results compared with LWS.
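The classification bands quoted above can be written as a short lookup. The following Python sketch is illustrative only: the function name `classify_es` is hypothetical, and labelling an ES of exactly 0.8 as large is an assumption made here because the quoted bands overlap at that value.

```python
def classify_es(es: float) -> str:
    """Label an effect size using the bands quoted in the text.

    Assumption: an ES of exactly 0.8 is labelled "large", because the
    quoted bands (<=0.8 moderate, >=0.8 large) overlap at that value.
    """
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude <= 0.2:
        return "trivial"
    if magnitude <= 0.5:
        return "small"
    return "moderate"


# Effect sizes reported in this discussion.
reported = {
    "HWS vs LWS": 0.18,
    "MWS vs LWS": 0.15,
    "HWS vs LWS (isolation exercises)": 0.23,
    "HWS pooled pre- to post-intervention": 0.97,
}

for label, es in reported.items():
    print(f"{label}: ES {es:.2f} -> {classify_es(es)}")
```

Under these bands the between-programme comparisons fall in the trivial or small range, whereas the pre- to post-intervention estimates are large for every programme.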
The mean ES for isolation exercises on pre- to post-intervention strength gains in males was 0.95 (95% CI 0.30–1.60) for LWS compared with 1.10 (95% CI 0.26–1.94) for the HWS training programme. When LWS training was used as the control group and HWS training was used as the experimental programme, a small ES of 0.23 (95% CI 0.06–0.40; p = 0.008) suggested that HWS training is more effective in producing strength gains for isolation exercises. The data for isolation exercises support the conventional belief in a graded dose–response pattern for an increased number of exercise sets and strength gain. Examination of the effects of weekly set volume on isolation exercise-specific 1RM was not feasible because the very small sample sizes would have led to an over-parameterised model.
Examination of the effect of stage of training (beginner through elite) on strength gain was problematic as no available well controlled studies tested such a relationship. The limited available data suggested that comparable strength gains may be produced in earlier stages of training by multi-joint exercises when either MWS or HWS are employed. For advanced and elite trainees, the employment of either MWS or HWS may be considered as there was a small increase in strength in comparison with LWS training. The MWS and HWS pre- to post-strength programme produced marginally greater ESs compared with LWS training for the multi-joint and isolation combined exercise (ES 0.98, 1.01 vs 0.82, respectively). When LWS training was used as a control group and the HWS as the experimental programme, a trivial ES of 0.18 (95% CI 0.06–0.30; p = 0.003) was observed that suggested HWS training is more effective in producing strength gains.
Current Meta-Analysis-Based Recommendations for Strength Development
The existing dogma on the number of sets that best drives strength development has been poorly defined and contentious. Set volume in RT has long been debated, with varying recommendations favouring multiple-set programming on the basis of evidence cited from published meta-analyses [6,7,8,9]. However, these previous meta-analyses of muscular strength gain reported inconsistent and varied outcomes. Of the four meta-analyses conducted [6,7,8,9] on this subject, none provided a clear and consistent trail of evidence identifying a dose–response relationship for maximum strength gains, upon which a determination of the best set scheme can be made. The inconclusiveness of previous research and previous meta-analyses has not altered the general preference for programming multiple sets over single-set training to create strength gains at every stage of training progression, beginner to elite.
Three of the four meta-analyses that are included within the ACSM [26] recommendations [6,7,8] have methodological constraints and the published evidence provided is disputed [11, 14]. Rhea et al. [6], in their meta-analysis, reported that significant differences emerged between the trained and the untrained groups. Rhea et al. [6] reported that 80% of 1RM with a training frequency of 2 days·week⁻¹ and four sets per muscle group elicited superior gains in strength. In their consideration of untrained populations, they recommended RT loading at 60% of 1RM for 3 days·week⁻¹ and employing four sets per exercise. Otto and Carpinelli [11] questioned the inclusion of studies in the Rhea et al. [6] meta-analysis and identified several confounding factors and inaccuracies that may have influenced the reliability of the meta-analysis. These included the reporting of incorrect ESs for advanced trainees, including claims that training at 80% 1RM resulted in an ES (1.8) that was three times greater than using 85% 1RM (ES 0.65). Furthermore, they reported that training each muscle group twice per week had an ES of 1.4, which was two times greater than training three times a week (ES 0.70). The inclusion of the Rhea et al. [13] study may have introduced error or bias as it reported an ES of 2.3 between groups in the bench press and an ES of 6.5 in the leg press when comparing the means of the one-set and three-set groups. The post-training standard deviations for the bench press were two to three times larger than the pre-training standard deviations in both groups and the researchers did not provide confidence intervals for ES. The bench press ES of 2.3 is almost three times larger than what is designated as a large ES (≥0.8) [27]. In addition, the leg press ES of 6.5 is more than eight times greater than a large ES, which presents an extraordinary ES that is not seen in any other related papers in the scientific literature [28]. As such, the inclusion of the Rhea et al. [13] leg press data in a meta-analysis could invalidate the mean ES and spuriously affect the findings by increasing the heterogeneity of the meta-analysis and erroneously favouring multiple-set programming.
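The ratios referred to in this paragraph can be checked with simple arithmetic. The Python sketch below only divides the ES values quoted above; the variable names are illustrative and no additional data are introduced.

```python
LARGE_ES_THRESHOLD = 0.8  # lower bound for a large ES, as quoted in the text

# (numerator, denominator) pairs taken from the ES values quoted above.
comparisons = {
    "bench press ES vs large threshold": (2.3, LARGE_ES_THRESHOLD),
    "leg press ES vs large threshold": (6.5, LARGE_ES_THRESHOLD),
    "80% 1RM vs 85% 1RM": (1.8, 0.65),
    "two vs three sessions per week": (1.4, 0.70),
}

ratios = {label: num / den for label, (num, den) in comparisons.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} times")
```

The bench press and leg press ratios come out just under three and just over eight, in line with the "almost three times" and "more than eight times" descriptions.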
In the Peterson et al. [9] meta-analysis the authors propose that as strength increases so should RT volume. However, the evidence presented within their meta-analysis cannot be used to substantiate such a position. Inferences were made stating that competitive athletes should use eight sets per muscle group to promote strength gains. Such a conclusion is inconsistent with the evidence presented in their results, specifically the small number of ESs for eight sets. They present only six ESs contributing to the mean of the eight sets, and as such any conclusions warrant caution. Although not stated by the authors, the ESs presented could be derived from only one study. In contrast, the mean presented for four sets was accumulated from 199 ESs. Any conclusions drawn about the direct impact of eight sets compared with any other number of sets would be unreliable.
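One way to see why a mean built from six ESs warrants more caution than one built from 199 is to compare how the standard error of a mean shrinks with the number of contributing values. The Python sketch below assumes, purely for illustration, that the ESs are independent and share a common standard deviation; neither assumption is verified by the source, and if the six ESs came from a single study they would not be independent, which would widen the gap further. The function name is hypothetical.

```python
import math


def relative_standard_error(n_small: int, n_large: int) -> float:
    """Ratio of SE(mean of n_small values) to SE(mean of n_large values).

    Assumes independent values with a common standard deviation, so that
    SE = SD / sqrt(n) and the SD cancels in the ratio.
    """
    return math.sqrt(n_large / n_small)


ratio = relative_standard_error(6, 199)
print(f"SE of the eight-set mean is about {ratio:.1f} times that of the four-set mean")
```

Under these simplifying assumptions the eight-set mean is several times less precise than the four-set mean, which is consistent with treating any conclusion about eight sets as unreliable.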
The meta-analysis of Wolfe et al. [7] sought to determine if the number of sets performed and the length of the RT programme affected outcomes. The subgroup analysis identified that programmes that lasted between 17 and 40 weeks did not generate significantly higher ESs in comparison with those lasting between 6 and 15 weeks. Significant interactions were reported for the set numbers and programme length, with multiple-set programmes producing superior increases in strength compared with single-set programmes (p ≤ 0.002). Data analysis indicated that trained individuals had greater increases in strength when using multiple-set programmes (p ≤ 0.001). Single-set programmes were proposed to be best suited to untrained individuals, as similar gains were noted with both single- and multiple-set programmes. These observations led the authors to suggest that as the subject’s progression in strength matures, there should be a concomitant change in programming from single to multiple sets to stimulate continuous strength gain.
Krieger’s meta-analytic review [8] comprised 14 studies (440 participants), with 30 treatment groups and a total of 92 ESs. The results showed that multiple sets were associated with a larger ES than a single set (difference 0.26 ± 0.05; 95% CI 0.15–0.37; p ≤ 0.0001). When the dose–response model was further analysed, two to three sets per exercise were associated with a larger ES than one set (difference 0.25 ± 0.06; 95% CI 0.14–0.37; p = 0.0001). No significant difference was reported between one set per exercise and four to six sets per exercise (difference 0.35 ± 0.25; 95% CI −0.05 to 0.74; p = 0.17) or between two to three sets per exercise and four to six sets per exercise (difference 0.09 ± 0.20; 95% CI −0.31 to 0.50; p = 0.64). The possibility of publication bias was assessed using methods described by Macaskill et al. [29]. A sensitivity analysis, performed by removing each study in turn to investigate the effect on the result of the multiple-sets variable, identified no influential studies, and no publication bias was observed. Krieger [8] concluded that two to three sets per resistance exercise was associated with 46% greater strength gains than one set in both trained and untrained subjects.
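The sensitivity analysis described above, in which each study is removed in turn, can be sketched as a leave-one-out loop. The Python example below is a simplified stand-in: it uses a fixed-effect, inverse-variance weighted mean rather than the model used by Krieger [8], the function names are hypothetical, and the effect sizes and variances are invented for illustration and are not taken from any cited study.

```python
def pooled_es(effects, variances):
    """Fixed-effect, inverse-variance weighted mean effect size."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Pooled ES recomputed with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(pooled_es(kept_effects, kept_variances))
    return results


# Hypothetical data, NOT taken from any cited study.
effects = [0.20, 0.30, 0.25, 1.50]
variances = [0.04, 0.05, 0.04, 0.06]

overall = pooled_es(effects, variances)
loo = leave_one_out(effects, variances)

print(f"all studies: pooled ES {overall:.2f}")
for i, es in enumerate(loo, start=1):
    print(f"without study {i}: pooled ES {es:.2f} (shift {es - overall:+.2f})")
```

A study whose removal shifts the pooled estimate far more than the others, as the fourth hypothetical study does here, would be flagged as influential.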
Strengths and Limitations
There are several strengths of this meta-analysis that set it apart from previous analyses of set configurations. The strict inclusion criteria controlled for confounding variables when comparing the effects of LWS, MWS or HWS on strength outcomes. This meta-analysis also considered the potentially different effects of isolation versus multi-joint (integrated) exercises on strength outcomes under LWS, MWS or HWS strength programmes. The design of this study also differed from others as it did not cluster outcomes; rather, data were pooled across strength measures to enhance external validity.
This analysis restricted its subject pool to male populations to better control for sex and endocrine influences. Factors that are known to affect strength include age, sex, physical activity levels, previous training status and endocrine status. Sex can influence muscle functioning and morphology [30, 31]. Men have been reported to have greater muscle strength and size than women, due to higher levels of anabolic hormones and greater body size. The lower blood androgen levels of women have also been hypothesised to induce less relative muscle hypertrophy in response to RT compared with men [32]. However, several studies have failed to identify any difference between males and females, reporting similar relative improvements in strength adaptations [33,34,35]. Tracy et al. [36] compared the hypertrophic response of the quadriceps of older men and women after nine weeks of training and reported that both males and females had an identical response of 12% in muscle volume. Conversely, results for upper body training have indicated differences in response to RT in men and women [35, 37]. Hubal et al. [38] assessed the variation in muscle size and strength in a large cohort of men and women (243 men, 342 women) after a 12-week unilateral RT programme targeting the elbow flexors of the non-dominant arm. Dynamic strength was assessed by determining the 1RM on the standard preacher curl exercise. Men and women exhibited wide ranges of 1RM strength gains from 0 to +250% (0 to +10.2 kg). In addition, men experienced 2.5% greater gains in cross-sectional area (p ≤ 0.05) compared with women. Although men had greater absolute gains in strength, strength gains relative to baseline were greater in women than in men (+25%).
Limited reliable data exist concerning differences in strength levels between men and women after RT programmes. The available data are from coefficients of variation (CV) of pre- and post-training strength measures. Some studies that analysed published means and standard deviations found equivocal results for variability in muscle size and strength in men and women [33, 39]. Equivocal data also exist on whether there is an effect of sex or RT or an interaction effect between sex and RT. This may be due to possible sex differences in variability. Previous studies found similarities in relative strength and size changes after RT [34, 40]. One factor that may explain these equivocal findings is the small sample sizes that limit the statistical power of these studies to detect significant differences between men and women.
As with previous studies, there were limitations driven by the shortcomings of primary data sources. Although the present meta-analysis endeavoured to include research papers from high-quality sources, the number of suitable studies was small and there remained differences in design and control among included studies. One of the nine included research papers used a randomised controlled design. The other eight did not include a control group; rather, they used a repeated measures design with the baseline measure serving as the control, but baseline measures were not uniformly implemented across those studies. In this meta-analysis, the strength increases may be due to the repeated 1RM testing rather than other physiological adaptations. The exercise loading specificity of the 1RM-tested exercises may have impacted upon individuals’ performance. For example, a leg extension may have impacted upon the leg press performance, but not to the same degree as a leg press itself. Thus, the impact of specific RT loading versus non-specific exercises is not accounted for in this analysis. Programme order and the type of RT exercise were not equivalent between groups in all identified studies and this could impact upon set number and strength gain.
In addition, several sets of tested exercises versus nonspecific exercise can impact on an individual’s 1RM due to the ‘learning’ effect of the specifically tested exercise. This has been demonstrated by Dankel et al. [41] who conducted 1RM and maximal voluntary isometric contraction (MVC) testing on upper body isolation exercise (elbow flexion). One arm performed a 1RM test and MVC elbow extension exercises while the other arm performed 1RM test and MVC, in addition to three sets of exercises (70% 1RM) for 21 days. The results suggested that the increase in the trained subjects’ 1RM may not have been solely related to exercise volume, but was driven by the specificity of the exercise. These short-term adaptations may be due to performing the 1RM test rather than additional sets. The increase in subjects’ 1RM may have been due to a ‘learning effect’ caused by performing repeated testing sessions. This increase in strength, therefore, could be attributed to the principle of specificity as strength improvements may not be augmented by additional volume (sets). The studies by Ostrowski et al. [18] and Marshall et al. [20] included specific and nonspecific exercises that would, therefore, increase training volume. Ostrowski et al. [18] included one, two, or four sets per week of bench press with additional nonspecific assistance exercises, while Marshall et al. [20] included two, eight and 16 weekly sets of squats. The pre- to post-intervention increase in strength in some of the included isolation studies in this analysis may have been due to neurological crossover in the untrained contralateral arm. The results are applicable to isolation exercises involving smaller muscle groups as larger muscles may have different recovery patterns and properties.
Future Development and Research
This meta-analysis demonstrates that potential outliers can affect pre- versus post-intervention strength data analysis [24] and may invalidate or skew the results when evaluating pre- to post-intervention strength difference. Previous observations that lack well controlled screening procedures and include unreliable evidence create difficulties for those attempting to summarise the existing data. The findings here suggest that researchers should be cautious when performing mixed model meta-analyses (mixed-sex subject groups), as this may produce spurious conclusions. There are limitations any time a comparison that combines subject characteristics (male–female or trained–untrained, for example) is conducted and the outcomes may or may not be valid. The body of scientific knowledge would be greatly improved if more RCT investigations were conducted on participants of the same age and sex and of similar training status to clarify the effects of set dose on strength. This would help to establish the optimum set dose–response relationship and provide larger samples for meta-analyses, thus reducing the need to include low-power studies. As has been reported, meta-analyses have limitations when including the comparative outcomes of aggregated effects that do not necessarily assess the same construct [14]. Researchers to date have oversimplified their RT designs and have inadvertently produced data that provide unreliable and confusing guidance regarding set numbers and strength gain. Sampling mixed-sex groups, use of expansive age ranges, use of multiple and different measurements, and the use of different training methods have resulted in a moderately large body of evidence that cannot fully answer the question at hand individually or collectively.