Introduction

Ageing is associated with a variety of biological changes that can contribute to decreases in skeletal muscle mass, strength, and function. Such losses decrease physiologic resilience and increase vulnerability to morbidity and mortality. Countering muscle disuse through structured exercise programs, particularly resistance training, is a powerful intervention to combat physiological vulnerability and its debilitating consequences on physical function, mobility, independence, chronic disease management, psychological well-being, and mental health [1, 2]. However, only 1 in 10 Australians over 50 years of age does enough exercise to ga in any cardiovascular benefit, with estimates among the wider population of one in four people not being sufficiently active [3], with a similarly small proportion (9.6%) performing resistance training consistent with the guidelines [4]. Solutions which slow the natural course of functional decline are needed to promote greater engagement in structured physical activity among older adults.

Hardstyle kettlebell training, which promotes a unique combination of tension and relaxation in its techniques and was popularised by Pavel Tsatsouline throughout the 2000s, claims to improve measures of health-related physical fitness [5]. Trials with younger participants report improvements in upper limb endurance [6], dynamic balance and vertical jump [7], leg strength and trunk endurance [8], standing long jump and grip strength [9], VO2 [10], and 1RM barbell deadlift [11]. A profile of the kettlebell swing in novice older adults shows peak net ground reaction force is higher during a swing with an 8 kg kettlebell than a deadlift with 32 kg [12], highlighting the potential utility of the swing in exercise prescription. Anecdotally, older adults can and do engage in kettlebell training, but there is little data about the effects of kettlebell training in an older population.

Only two longitudinal trials have been conducted to address conditions associated with ageing; sarcopenia in females [13] and Parkinson’s disease [14]. These studies reported improvements in sarcopenia index, grip strength, back strength, respiratory function, and inflammatory markers [13], Timed Up and Go, upper limb strength, and lower limb strength [14]. Although these results are encouraging, a focus on clinical conditions and limited reporting of trial protocols in these studies, offer little guidance for training otherwise healthy older adults. Furthermore, a recent review [15] identified no clinical practice guidelines or recommendations for using kettlebells with older adults. For older adults, clinical experience suggest that kettlebell training may have a risk of muscle strains and bruising, osteoporotic fractures, hypertensive and myocardial events, and pelvic organ prolapse in females. Most studies investigating the effects of kettlebell training are not representative of hardstyle techniques, or its training protocols. Investigation of the effects from highly controlled single kettlebell exercises with young healthy adults, provide policy makers with insufficient information about the safety or effectiveness of community-based group exercise programs for inactive men and women over 60 years of age, thus, a pragmatic assessment of typical training practices was required.

The aim of this study was to measure change in health-related physical fitness following 3-months of moderate- to high-intensity group hardstyle kettlebell training, in insufficiently active men and women over 60, in comparison with 3-months of usual activities of daily living. Additionally, exercise adherence and compliance rates, adverse events, and participant feedback about their experience were also collected to provide some information about the safety and feasibility of the exercise program. It was hypothesised that the intervention would be safe and effective for insufficiently active older adults to engage in group-based kettlebell training, with significant clinically meaningful improvements in measures of healthy ageing compared to control. Our research question was: can group-based hardstyle kettlebell training be used safely and effectively, to engage insufficiently active older adults living in the community, to increase physical activity and promote healthy ageing?

Material and methods

Study design and sample size

The BELL trial was an exploratory, single-centre, single cohort, repeated measures, controlled exercise intervention using hardstyle kettlebell training. Testing was conducted at baseline, week 4, week 13, week 19 and week 29. The study protocol had been published elsewhere https://osf.io/dz96p/. Scheduled test dates were changed due to the COVID-19 pandemic, which resulted in unequal periods between tests. Grip strength was selected as the primary outcome for clinical relevance, with measurements recommended in routine clinical practice and in community healthcare with older adults [16]. Grip strength is a reliable surrogate for more complicated measures of arm and leg strength and a consistent predictor of falls and fractures in both sexes [17, 18], although measures such as knee extension strength are better predictors of falls risk than handgrip strength [19], and handgrip may not be a reliable proxy for overall muscle strength [20]. Low grip strength is a powerful predictor of functional limitations, poor health-related quality of life (HRQoL), strongly and inversely associated with all-cause mortality [21,22,23] and associated with higher annual healthcare costs [24]. Secondary outcomes included a core set of clinical field tests of health-related physical fitness [25]. Sample size was calculated based on the primary outcome from existing data [13] to detect an effect size change of 0.88 [26]. A total of 19 participants were required to detect 95% power (1-ß = 0.05) and test the null hypothesis of equality (α = 0.05). Thirty-two participants were recruited to account for a 25% dropout and 15% for multivariable modelling. The current report comprises the findings and analysis of primary and secondary outcomes. A pragmatic approach was chosen to evaluate the effectiveness of the intervention in usual conditions, to identify variation between individuals, and maximise external validity [27, 28]. An evaluation of the trials’ design was conducted using the The PRagmatic-Explanatory Continuum Indicator Summary 2 (PRECIS-2) tool [29]. Participants were not blinded to the hypothesis and resource constraints prevented blinding of data collection and analysis. A timeline for the study is presented in Fig. 1.

Fig. 1
figure 1

Study timeline

Ethical approval

All research activities were conducted in accordance with the Declaration of Helsinki. The trial was prospectively registered on the Australian New Zealand Clinical Trials Registry (ACTRN12619001177145) and approved by the Bond University Human Research Ethics Committee (BUHREC; Protocol number NM03279). Written informed consent was obtained by the lead investigator from all participants.

Participants

Independently living, apparently healthy but insufficiently active males and females were recruited via print media advertising from the central Gold Coast in south-east Queensland, Australia. Participants were deemed to be insufficiently active if they were not meeting the Australian Physical Activity Guidelines for older adults [30], and self-determined their levels of physical activity prior to recruitment. A minor change was made to the inclusion criteria after trial registration, with the minimum age reduced to 59 years from the originally stipulated “60 years of age”. Exclusion criteria were as follows: medical conditions or medications known to affect musculoskeletal health which limit the capacity to perform moderate-to-high intensity exercise, recent surgery or trauma, uncontrolled cardiovascular or respiratory disease, engaged in a structured exercise program within the past nine months, malignancy, inability to perform a floor transfer independently, inability to comfortably lift the upper extremities overhead, unexplained pain with fundamental movements/activities e.g. sitting, walking, lifting, carrying, pushing, pulling or twisting, and presence of hazards which would prevent body composition analysis. Further exclusion was based on inability or unwillingness to attend three times weekly supervised training for the stipulated 3-month period. Participants completed a standardised pre-exercise screening tool from Exercise & Sports Science Australia [31], and asked to provide medical clearance from their General Practitioner. After commencement of the exercise period, it was noted that one participant was unable to perform a floor transfer unassisted, and one participant disclosed having osteoporosis and diabetic peripheral neuropathy. Participants would complete 12 weeks of usual activity and 12 weeks of kettlebell training.

Exercise intervention

Participants attended three-times weekly (Mon, Wed, Fri), 45-min, group classes at the Bond University High Performance Training Centre, Gold Coast, Australia, supervised by a Physiotherapist and certified (Russian Kettlebell Certification) kettlebell Instructor. Additionally, participants completed twice-weekly (Mon, Thur) prescribed home exercises. Program design was based upon the training principles and practices described by Tsatsouline [5] with exercises and delivery adjusted to meet individual limitations and the group dynamic, to maximise facilitators and reduce barriers to engagement. Classes were conducted face-to-face for six weeks, then remotely thereafter due to COVID-19 restrictions. The training period was preceded by a familiarisation week (2 × 45-min sessions) in which the participants were introduced to a standardised mobility drill (available as supplementary data) and the foundational kettlebell exercises of a swing, clean, military press, goblet squat and unloaded Turkish get-up. Kettlebell weights ranged from 4–80 kg. Participants were provided with a modified Borg Category-Ratio (CR10) scale [32] for reporting Session Rating of Perceived Exertion (sRPE). During the first two weeks, participants were advised to work at a relatively low intensity (2–4/10: “easy” to “somewhat hard”) with a low volume training load to minimise the likelihood of experiencing Delayed-Onset Muscle Soreness (DOMS). From week three onward, participants were encouraged to work up to a sRPE of 5–7/10 (described as “hard” to “very hard”) as tolerated. Maximal effort (9–10/10) was discouraged. Where technique was acceptable and RPE appeared to be < 4/10, participants were encouraged to increase exercise intensity (kettlebell weight). Exercises were modified where necessary to account for physical limitations or emergent biopsychosocial factors likely to affect performance. Home exercise, with an easy to moderate training load target (no upper limit), was performed at the participant’s discretion with an 8 kg kettlebell (provided), or bodyweight only. At the start of each group session, participants were asked to provide feedback regarding DOMS, adverse events, or health/medical conditions since the last training session.

All training sessions (Mon-Fri) commenced with the standardised mobility routine which served as a warm-up. Training sessions were planned based on a) physical capacity of the group, b) participant feedback, c) intent to offer variety, and d) plan to progress skill, intensity, and training load volume throughout the trial. Training load was periodised to promote positive physiological adaptation. Training plans were prepared within the preceding 36 h. Participants were able to self-select weights and change any program variable within the group sessions. Participants completed a daily training record which included the exercise(s), number of sets and repetitions performed, and sRPE. Training records were collected at the end of each session then entered into a database for tracking and analysis. Due to COVID-19, ethics approval for face-to-face training was withdrawn, in effect from 23 March 2020. Training weeks 1–6 were delivered face-to-face; weeks 7–12 were delivered remotely via training videos posted to YouTube and Facebook Live. Participants were not observed at home and trained independently, receiving no individual instruction or feedback. Available kettlebells (8–40 kg) were given to participants to use at home from week 7, with each participant receiving two kettlebells, in addition to the 8 kg provided at the start of the trial. These were roughly distributed based on physical capacity i.e., the stronger participants received heavier kettlebells. Reinstatement of ethics approval delayed final testing from week 26 (7–8 May) to week 29 (27–28 May). Additionally, all participants completed an international challenge to perform 100 kettlebell swings each day for the month of May, extending the planned training period by three weeks. Training material and programming was conceived, delivered, coordinated, and analysed by the lead investigator (NM). The full 3-month program is available as supplementary data.

Control activities

Participants were asked to continue their usual daily routines and refrain from taking up new structured exercise likely to influence the outcomes. Additionally, participants were requested to not make significant dietary changes likely to alter body composition.

Outcomes

The planned schedule for data collection was changed due to the COVID-19 pandemic. The 4th and 5th rounds of testing, scheduled for weeks 22 (9–10 April) and 26 (7–8 May), took place in weeks 19 and 29 respectively, with some measures unable to be collected. Tests at weeks 0, 4, 13 and 19 were conducted at the Bond Institute of Health & Sport (schedule of data collection available as supplementary data). Tests at weeks 0, 4 and 13 were conducted in a 5 h session, with participants completing a series of five 15 min stations, beginning with a fasted body composition scan (DXA). Tests in week 29 were completed in a single morning (n = 11). All DXA scans were conducted by the same licenced operator. Other data collection stations were conducted by a trained volunteer (or the lead investigator) following standard operating procedures prepared by the lead investigator. Tests involving prolonged close physical contact were omitted from the week 19 schedule due to COVID-19 restrictions. At week 29, only GS, 6MWD and 5xFT were collected for participants over 70 years, with testing conducted outdoors; an equidistant 6MWT was conducted on a straight, level, concrete walking track in a shaded area, and the 5xFT was conducted on level grass. All data, including DXA scans, were processed and analysed by the lead investigator (NM). Invalid data were deleted from one participant where outcome measures had been adversely affected by hip pain (5xFT, SC, 6MWT, STS, sit-and-reach, 1RM and CMVJ); the participant withdrew at the end of the control period and did not participate in the training. Invalid grip measures were deleted from week 0 and week 4 from one participant who had sustained a wrist injury and was unable to perform the test as required. A single resting systolic blood pressure reading > 180 mmHg was deleted from baseline data; the participant was subsequently treated by his GP and remained normotensive thereafter. Hip extension RFD could not be calculated at week 13 due to an error in the load cell rate of data capture.

Grip strength

Grip strength was measured by Jamar handheld isometric dynamometer following a modified Southampton protocol [33].

Anthropometrics

Age was obtained from self-report at baseline. Height was assessed without shoes using the stretch stature method with a wall-mounted stadiometer (Model Holtain 602VR, Seritex, New Jersey, USA). Weight in light clothing was measured using a calibrated scale (Model WM204, Wedderburn, Ingleburn, Australia). Body mass index (BMI) was calculated as mass divided by height squared.

Cardiorespiratory endurance

Two tests of physical capacity were performed to examine cardiorespiratory capacity. Individuals with poor fitness and high resting heart rate (≥80 bpm) have the highest risk of cardiovascular disease and all-cause mortality [34]. Designated the “6th vital sign”, walking is used to assess functional status and overall health [35], with reduced lower limb strength in women associated with slower walking speeds [36]. Faster walking speed, and greater walking distance, are also associated with significant reductions in cardiovascular disease and all-cause mortality compared to slower, shorter walks, with a more pronounced dose–response effect in those over 50 years of age [37, 38]. Aerobic fitness was assessed using a 6-min walk [25, 39] conducted according to American Thoracic Society guidelines [40], with 6MWD, HR, and RPE [41] recorded. Predicted distance (6MWDpred) was calculated using a validated equation. [42].

$$6MWD_{pred}\;(m)\:=\:218\:+\:(5.14\:\times\:height)\textemdash(5.32\:\times\:age)\textemdash(1.80\:\times\:weight)\:+\:(51.31\:\times\:sex)$$

where: female = 0; male = 1.

Stair climbing is an essential functional activity for independent community-dwelling older adults, with gait speed strongly predictive of performance [43]. Stair climb performance can be improved with lower limb resistance training and stretching exercise [44], with stair climb time used to estimate aerobic capacity [45]. Submaximal aerobic capacity was measured using a stair climb test [45] with participants ascending a vertical displacement of 13.65 m via seven flights of stairs having a 34° inclination. Four flights had 15 steps, 3 flights had 6 steps (totalling 78 steps), and there were approximately 21 steps on level ground between flights. Step height measured 17.5 cm. Participants were instructed to climb the stairs as quickly as possible, with standardized verbal encouragement at each flight given by the examiner. Participants were asked to not use the handrail unless necessary and ensure one foot was in contact with the floor at all times i.e., walk not running. The time taken to climb the stairs and finish with both feet on the top step was designated the stair-climb time. The test was performed only once. Work performed (W) was calculated using the formula: W = m × g × h, where m is participant mass (kg), g is acceleration due to gravity (9.81 m.s−2), and h is vertical displacement (metres). Power (P) was obtained by dividing work by stair-climb time. Age-predicted VO2maxpred [46] and estimated VO2maxest from SCT [45] were calculated using the formulae:

$$VO_2max_{pred}\:=\:79.9\textemdash(0.39\:\times\:Age)\textemdash(13.7\:\times\:gender\;\lbrack0\:=\:male;\;1\:=\:female\rbrack)\textemdash(0.127\:\times\:weight\;\lbrack lbs\rbrack)$$

Standard error of the estimate = 7.2 ml·kg−1·min−1.

$$VO_2max_{est}\:=\:(5.8\:\times\:m\;(kg)\:+\:151\:+\:(10.1\:\times\:P))/m$$

Muscular strength and power

Knee extension force is a useful measure to evaluate lower limb function in older adults, particularly in relation to functional activities such as rising from a chair and using stairs [47], and reduced hip extension strength may compromise postural balance and increase falls risk in older adults [48]. Assessment of knee extension strength is recommended as part of a comprehensive geriatric assessment [49] and is a better predictor of functional performance than handgrip, in older adults in assisted living facilities [19]. Lower limb RFD may also be a better predictor of mobility in older adults than hand grip strength and sit-to-stand performance [50].

Maximal isometric leg extension [19] and hip extension [51, 52] were performed to examine lower limb strength. Force and RFD were recorded using a strain gauge (DBB Series S-Beam Load Cell, Applied Measurements, Berkshire, UK). Isometric leg extension was performed with the knee supported at 60° of flexion, using a custom-made seat without back or arm rests (Fig. 2). The procedure has been previously described, with high test–retest reliability in older adults (ICC = 0.91–0.95) [19]. Participants upright with forearms crossed at the chest to perform the test. The distance from the tibial plateau to 5 cm above the inferior aspect of the lateral malleolus was recorded, with the strap and foam pad positioned around the ankle at the lowest part of the leg. Hip extension was performed on a firm plinth (Fig. 3) as previously described [51, 52]. Pelvic movement was restrained by a belt, the calcaneus suspended 8 cm above the plinth with the participant’s forearms crossed at the chest to perform the test. Relative lower limb power was measured using the Sit To Stand App [53] using a height adjustable chair. Femur length was recorded as the distance from the superior aspect of the greater trochanter to the lateral femoral condyle. A second seat of fixed dimensions was introduced at week 4 to improve standardisation of the tibia and femur in the start position, allowing a comparison of STS methods. Counter-movement vertical jump was performed as a functional expression of absolute lower limb power [54], with jump height determined from a floor-mounted force flatform (AMTI, Watertown, NY, USA) by applying the Impulse-momentum (IM) relation to the force–time curve [55].

Fig. 2
figure 2

Leg extension strength test

Fig. 3
figure 3

Hip extension strength test

Muscular endurance

Rising from a seated position is an essential functional movement for older adults, which can be impacted by age and age-associated conditions. Performing too few sit-to-stand manoeuvrers throughout a day can contribute to strength impairment and limitations in physical activity, with 45 repetitions recommended as a minimum [56]. Lower limb muscular endurance was examined using the 30 s sit-to-stand test following a standardised procedure [25, 57, 58].

Flexibility

Flexibility was examined by fingertip-to-floor test [59, 60]. A sit-and-reach test (Figure Finder Flex Tester, Novel Products, Rockton, USA) was also introduced at week 4 to limit movement at the knee [61].

Body composition

A DXA scan (General Electric, GE, Lunar Prodigy, Madison, WI, USA) was conducted for each participant to determine body composition (fat and lean mass). The scanner was calibrated each morning using a manufacturer’s “phantom” following quality assurance and quality control procedures. Participants were required to wear only light clothing with metal objects removed, and positioned according to the Nana protocol [62]. Results were analysed using the commercial software provided with the machine (enCORE software, version 17, GE, Lunar Prodigy, Madison, WI, USA) using regions of interest (ROI) recommended by the International Society for Clinical Densitometry official position [63]. All scans were performed by the same technician, with intra-rater reliability and precision of DXA in evaluating BC previously described [64, 65]. All ROI analysis was performed by the lead investigator (NM). Appendicular skeletal muscle mass (ASM) was calculated as the sum of the muscle mass of the four limbs. Appendicular skeletal muscle mass (also described as the Skeletal Muscle Mass Index (SMI) [66]) was adjusted for height, calculated as ASM/height2 [16]. Body composition was also measured by BIA (Tanita MC-980MA PLUS, Tokyo, Japan) following a procedure previously described [67]. The cross-validated Sergi Eq. [68] was used to calculate ASM from BIA data.

Functional capacity and balance

Ability to rise from the floor is a clinical measure of musculoskeletal fitness, proposed to predict all-cause mortality [69], with those unable to rise having a markedly higher risk of having a fall-related injury [70]. Evaluation of floor transfer capacity is a reliable measure of older adults’ functional mobility and recommended as a standardised component of a geriatric physical assessment [71, 72]. Functional capacity was examined using a 5-times floor transfer test [71, 72]. Machine-based 1RM measures have been shown to be safe and highly reproducible for older adults, with minimal detectable changes of 1–3% (< 1 kg) [73], but prediction equations consistently underestimate 1RM [74]. A predicted 1RM kettlebell deadlift was determined using the two-point method [75]. Excursion of centre of pressure during quiet standing balance, was measured on a floor-mounted force flatform (AMTI, Watertown, NY, USA) under two conditions; eyes open, and eyes closed [76, 77].

Quality of life and sense of coherence

Exercise, regardless of type, is associated with lower mental health burden, with aerobic and gym activities, durations of 45 min, and training frequencies of three to five times per week, associated with the largest reductions [78, 79]. Training just twice a week however is likely to improve QoL and Sense of Coherence [80]. Physical activity improves happiness and mood (positive affect), self-efficacy and self-esteem in the long-term [81]. Self-efficacy is a significant mediator for older adults engaging in healthy lifestyle activities [82], with resistance training specifically having been shown to significantly improve HRQoL with greatest effect in the domains of mental health and bodily pain [83]. Health Related Quality of Life (HRQoL) and Sense of Coherence were examined using the self-administered SF-36 questionnaire [84] and 29-item SoC scale [85].

Training load

Exercises and V-TL were not set a-priori as participant’s physical capacity was unknown to the investigator. All training (Mon-Fri) commenced with a standardised mobility routine which was used as a warm-up. At the end of each group training session, participants submitted a training record of the exercises performed, weights used, number of sets and repetitions completed, and sRPE. During intervention weeks 1–6, a paper record was collected by the lead investigator at the end of each session and transcribed to a database for analysis. During intervention weeks 7–12, daily training records (Mon-Fri) were submitted by each participant via Survey Monkey. Session V-TL (kg) was programmed at an individual level during the final two weeks of the training, to enable each participant to achieve a personal best (sessional training load volume) on the final day of the program.

Attendance, compliance, and adverse events

Exercise adherence was recorded, with 100% attendance defined as completion of 36, ‘group’ sessions over the 3-month trial period. Data were collected, entered into a database, and analysed by the lead investigator. Compliance to prescribed home-exercise was also reported. On Wednesday and Friday sessions during intervention weeks 1–6, participants reported whether the prescribed home-exercise had been completed the previous day with a Y/N response. Attendance and compliance were recorded for 12 weeks, with 100% attendance and compliance defined as 60 training sessions (group and individual) over the 3-month trial period. To maximise attendance and compliance, participants received frequent individual and group encouragement, both publicly and privately. Recognition was given to overcoming challenges, extraordinary effort, and achieving a ‘personal best’. Training and communication promoted group engagement to foster a spirit of support, camaraderie, and healthy competition. A private Facebook page was heavily used to provide encouragement, maintain accountability, and foster a community spirit. Participants were encouraged to make use of the on-site coffee-shop after training and provided a limited number of drink vouchers. Adverse events were defined as any undesirable outcomes that may be related to the intervention, which were recorded by the lead investigator for analysis. At enrolment, participants were provided with an information sheet about DOMS. Participants were asked to report to the principal investigator any new, unusual, or worrying physical symptoms, with participants frequently asked about their health and wellbeing before session training commenced.

Concomitant care

Advice and guidance were provided to those who experienced muscle soreness and adverse training-related symptoms.

Statistical analysis

Statistical analyses were undertaken using SPSS statistical software version 26.0 (SPSS Inc., USA). Descriptive statistics, expressed as mean (SD) for normally distributed continuous variables, were generated for participant characteristics and all dependent variables. Normality was checked through a combination of histograms, normal Q-Q plots and the Shapiro–Wilk test. Categorical variables were summarised using frequencies and percentages. Linear mixed models (LMMs) were applied to the 32 participants regardless of withdrawal or attendance, measured at 5 time-points to model the change in quantitative outcomes after adjusting for potential confounders which included age, sex, and previous training history. Time was treated as a fixed factor to enable the assessment of any statistically significant changes between specific time-points. The individual was treated as a random effect. Random intercepts, random slopes, and random intercept-and-slope models were investigated to determine the most suitable models. The final models were fitted with random intercept only, using the restricted maximum likelihood estimation method (REML), as this produced the best fitting models with the lowest Akaike Information Criterion. Residual diagnostics were used to check distributional assumptions. Effect sizes (ES) were calculated and interpreted using Lenhard [86] and Magnusson [87], quantified as trivial < 0.20, small 0.20–0.59, moderate 0.60–1.19, large 1.20–1.99, very large 2.0–3.99, and extremely large ≥ 4.0 [88]. Statistical significance was set at the 0.05 level a priori.

Results

Participant characteristics at baseline

Participant flow through the study is presented in Fig. 4. In total, 32 eligible participants completed all outcome measures at baseline. Three participants withdrew during the control period: medical condition (n = 1), injury (n = 1) and no longer able to attend (n = 1). Twenty-nine participants commenced the intervention. Five participants withdrew during the intervention: unexpected work (n = 1), substance abuse and mental health (n = 1), back pain (n = 1), viral infection (n = 1), uncontrolled hypertension: GP requested (n = 1). Participant characteristics at baseline are presented in Table 1. At baseline, males were significantly taller and heavier than females. Comorbid health conditions at baseline included: obesity, controlled hypertension, depression, diabetes, peripheral neuropathy, osteoporosis, sarcopenia, cancer, osteoarthritis (hip and knee), persistent non-specific low back pain, Ankylosing spondylitis, hypercholesterolemia, immunosuppression, migraines, alcohol dependency, and poor sleep.

Fig. 4
figure 4

CONSORT diagram of participant flow through the BELL trial study. Abbreviation: ETOH, alcohol

Table 1 Participant characteristics at baseline

Sixteen-week change in outcomes from mixed effects modelling are presented in Table 2. Pairwise comparison pre- to post-training (week 13 to week 29) are presented in Table 3. There was a large (8 > 1.4) pre- to post-training change in grip strength of 7.1 kg (95% CI [4.9, 9.3], p < 0.001) in the right hand, and 6.3 kg (95% CI [4.1, 8.4], p < 0.001) in the left hand, exceeding the minimum clinically important difference of 5.0—6.5 kg [89].

Table 2 Estimated regression coefficients from mixed effects modelling to show the effect of training over time (N = 32)
Table 3 Linear mixed modelling results showing the mean differences between pre- and post-training after adjusting for age, sex and previous training history

Adherence and compliance

Attendance rate was 91.5%. Twenty-one participants were absent from 81 of 956 potential Mon/Wed/Fri group training sessions. Eight participants had 100% group session attendance. The reasons for absence were: viral infection (n = 17), hospital/medical (n = 12), not disclosed (n = 7), unwell (n = 7), other infection (n = 6), muscle strain (n = 5), DOMS (n = 5), ‘life admin’ (n = 3), low back pain (n = 3), COVID-19 (not infected) (n = 3), bereavement (n = 2), mental health (n = 2), weather/transport (n = 2). Compliance with home-exercise was 88.7%. Seventy-one of 639 home sessions were reported as not completed. During intervention weeks 7–12, there were 35 recordings of training voluntarily undertaken on a weekend.

Change in secondary outcomes

Cardiovascular endurance

Estimates of fixed effects showed a small (8 = 0.39) but significant reduction in resting HR from baseline of 7.4 bpm, but no significant change in systolic or diastolic blood pressure at any time points. At baseline, there was no significant difference in mean 6WMD (598.5 m) and age-predicted maximum (602.7 m). There was a significant moderate (8 = 0.85) 7% pre- to post-training increase in 6MWD of 41.7 m. There were statistically significant reductions in stair climb time at weeks 13, 19 and 29. Pairwise comparison pre- to post-training revealed a small (8 = 0.55) reduction of 2.5 s. At baseline, estimated VO2 calculated from stair climb time suggested a mean VO2 of 37.7 ml.kg−1.min−1 however, this was 54.2% higher than an age-predicted VO2 of 24.5 ml.kg−1.min−1.

Muscular strength, power, and endurance

There were small (8 = 0.47 and 0.52) increases in knee extension peak force pre- to post-training, of 49.8 N and 61.6 N in the left and right legs respectively, with no significant change in RFD. There were small to moderate changes (8 = 0.38 and 0.60) in hip extension peak force of 10.8 N and 21.0 N in the right and left hips, respectively, with no significant change in RFD. There was no significant change in lower limb power or vertical jump height however, there was a significant moderate (8 = 0.66) 23% increase of 3.3 repetitions performed during the 30sSTS test pre- to post-training.

Flexibility

There was no significant change in flexibility at any time point.

Body composition

DXA-derived appendicular lean mass significantly increased pre- to post-training by 0.65 kg (95% CI [0.08, 1.22], p = 0.016), with a corresponding increase in SMI of 0.23 kg/m2 (95% CI [0.03, 0.42], p = 0.012). There was no significant change in fat mass measured by DXA at any time point, and no significant change in muscle mass or fat mass measured by BIA between baseline and week 29.

Functional capacity and balance

Pre- to post-training, there was a significant moderate (8 = 0.8) reduction in 5-times floor transfer time of 6.0s, (95% CI [9.8, 2.2], p < 0.001) and a significant 23.3% increase (8 = 0.57) in predicted 1RM of 16.2 kg (95% CI [2.4, 30.0], p = 0.013). There was no significant change in quiet standing balance.

Quality of life & sense of coherence

There was a statistically significant 17% increase in the ‘overall health’ domain of the SF-36, but no significant change in any one sub-domain of health status, and no significant change in sense of coherence.

Training load

Change in training load volume (kg and AUs) over time is presented in Fig. 5 Cumulative total training load volume for group sessions (Mon/Wed/Fri) was 1,022,220 kg for weeks 1–6 and 2,567,834 kg at week 12. Training load volume increased by 51.2% following the transition to home-only training. At 6 weeks (n = 28), mean external and internal training load was 36,067 (12,843) kg (range = 16,820 to 69,444 kg), and 3,430 (926) AUs (range = 2,115 to 5850 AUs), respectively. At 12 weeks (n = 24), mean training load was 100,914 (42,449) kg (range = 44,369 to 243,524) and 9,094 (1,727) AUs. External training load volume on the final day of the program was 29% higher than programmed: actual = 197,520 kg, mean = 8,230 (3623), range = 3,200 to 21,120 kg vs programmed = 153,070 kg, mean = 6,123 (2,281), range = 2,810 to 12,510 kg. Training load over the 12-week macrocycle shows a linear increase, however, the weekly mesocycle and daily microcycle changes in training load, were intentionally periodised. Representative periodisation of daily training load for the 12-week intervention, has been published elsewhere: http://ow.ly/QlA750Gctb4.

Fig. 5
figure 5

Cumulative group training load over time: kg (external training load) and arbitrary training unit (internal training load)

Adverse events

As a normal and expected response, DOMS was not considered an adverse event however, 20/28 participants (71%) reported some degree of DOMS on at least one occasion, reported as mild (n = 5, 25%), moderate (n = 12, 60%) or severe (n = 3, 15%). All participants reported that their symptoms had resolved in ≤ 3 days, and 90% (n = 18) recorded that they were unconcerned by it. Four participants missed at least one training session due to muscle soreness. The number of participants experiencing DOMS was high but unsurprising and anticipated given the volume and intensity of the training. There were four non-serious adverse events. One female with a 50-year history of low back pain, experienced an exacerbation of symptoms following the final session in week 2. The participant completed the session without symptoms (no indication of mechanical stress/strain), which were attributed to a large volume of deadlifts (kettlebell ladder). The participant continued in the program in a reduced capacity before withdrawing in week seven. Two females experienced an intercostal strain, one with onset of symptoms reported during kettlebell swings of an undisclosed weight, and one while performing a 1RM kettlebell deadlift. Both participants were able to continue training and avoid aggravating exercises, with symptoms resolving consistent with natural history. One female experienced non-traumatic shoulder pain which limited overhead activity. She was able to continue training, however her symptoms did not resolve before the end of the trial. Imaging revealed no tissue or mechanical explanation for the symptoms. Although the trial was not adequately powered to assess safety as an outcome, the absence of any serious adverse events suggests that kettlebell training is likely to have a risk profile similar to other forms of resistance training.

PRECIS-2 evaluation

The PRECIS-2 score of the BELL trial was 39/45, indicating high external validity. A summary evaluation and visual representation of the domain scores using the PRECIS-2 wheel, are provided as supplementary data.

Discussion

The BELL trial was the first study to examine the effects of 3-months pragmatic hardstyle kettlebell training on grip strength and measures of healthy ageing, in insufficiently active apparently healthy older adults. The program had high engagement with few non-serious musculoskeletal events from moderate to high-intensity training, and high training load volume. We believe that frequent supervised training and personalised programming provided in the initial six weeks were key to the safe and effective implementation of community-based group kettlebell exercise for older adults. Kettlebell training resulted in a large clinically important increase in grip strength, with significant improvements in cardiovascular capacity, lean muscle mass, lower limb strength and endurance, and functional capacity. Consistent with previous observation of no non-response to resistance-type exercise in older adults [90], all participants in the present study demonstrated a positive adaptive response to one or more outcomes.

Grip strength

Handgrip strength decreases with age and is predictive of disability, morbidity, healthcare costs and mortality [24, 33, 91]. At baseline, five females and five males recorded grip strength at or below the 16 kg and 27 kg clinical cut-off values to test for sarcopenia in females and males respectively [16]. For nine of those ten participants, grip strength increased above the clinical cut-off value following training. The estimate of fixed effects pre- to post-training, exceeded the minimum clinically important difference of 5.0 to 6.5 kg [89]. This finding is consistent with findings that the greatest increases in grip strength are observed in higher intensity exercise programs, which involve gripping activities and a high percentage of 1RM [92]. Results from the present study showed improvements in grip strength ≈2 × larger than those from a comparable 12-week kettlebell study in older women with sarcopenia (66.7 ± 5 yrs, ASM 15.4 kg, Sarcopenia index 5.57 kg.m2) [13].

Cardiovascular endurance

HR & blood pressure

The 7.4 bpm reduction in resting HR was significant from baseline to 29 weeks, but differences were non-significant between week 13 pre-training and week 29 post-training. Due to changes made to the testing procedures resulting from COVID-19 restrictions, resting blood pressure at week 29 was taken immediately following the DXA scan, with participants lying supine and rested at the start of the test schedule. At all other time points, resting HR was taken at the end of the test schedule in a seated position. Change in resting HR therefore, cannot confidently be attributed to a training effect. Random effects modelling of dynamic resistance training, show a mean decrease in systolic blood pressure of 1.8 mm Hg 95% CI [3.7 to 0.011] [93], and in some populations the effects may be comparable to or greater than those achieved with aerobic exercise [94]. The present study however found no significant change in SBP from baseline, although participants were normotensive.

6-min walk distance

At baseline, mean 6MWD for the participants in the present study was 599.8 m, farther than the 486.1 (87.2) m reported by Martien [19] from 770 community dwelling older adults of a similar age, but within normal age and sex-based reference standards [95]. Shnayderman [96] reported an improvement in 6MWD of 43.0 m 95% CI [19.6, 68.0] in 26 younger adults (43.6 ± 13.5 years) engaged in 6-weeks of muscle strengthening, a result virtually identical to the 41.7 m (8.7) in the present study. A 7.0% improvement in 6MWD pre-to post-training is suggestive of a training effect, however, only improvements > 50 m are likely to exceed a minimum clinically important difference [97].

Stair climb

In a review, Ozaki [98] reported six of nine studies showing significant improvement in VO2 max in older adults engaged in resistance training, however, such improvements in VO2 may only be likely among individuals with low baseline fitness i.e. a VO2 max under 25 ml.kg.min−1. Kalapotharakos [99] reported improvements in VO2max from 6.6% to 30%, with effects dependent upon a wide range of factors including individual characteristics and program variables. In the present study, there was a large difference at baseline in the mean VO2 estimated from the stair climb (39.8 ml.kg−1.min−1) and the age-predicted maximum (23.8 ml.kg−1.min−1). Estimated VO2max calculated from stair climb time, was deemed to be unreliable, as values far exceeded the estimated maximum expected for inactive healthy older adults [100]. This is most likely due to the difference in mean age of the reference population (52 ± 16 years) being > 15 years lower than the mean age of the participants in the present study. Linear mixed effects modelling demonstrates a change in estimated VO2max of 4.9 ml.kg−1.min−1, or 12.3%. A 12.3% increase applied to the age-predicted VO2max at baseline, suggests a more reliable training effect improvement of 2.9 ml.kg−1.min−1. Although beneficial, a change less than 5 ml.kg−1.min−1 is likely less than the minimum clinically important difference for healthy older adults [101].

Muscular strength, power, and endurance

Knee extension strength

At baseline, knee extension force normalised to bodyweight was 40.0 (10.6) % and 43.6 (12.4) % for females and males respectively. These are comparable to population reference data [19, 102] and higher than functionally relevant cut-off values of 31% and 40% for females and males respectively [19]. Consistent with other studies [103, 104], leg extension force in the present study increased 19.1% and 17.4% in the right and left legs respectively, pre- to post-training.

Knee extension RFD and lower body power

For older adults, RFD is associated with reduced postural balance and impaired balance recovery after tripping. Age-associated reduction in RFD can be attenuated by life-long resistance training [105], although responses to training are highly individual [106]. In healthy older adults, improvements in RFD have been reported from explosive and heavy resistance training [103] and low-repetition power training with a weighted vest [107]. In order to improve RFD, the speed of movement may not be as important as the intent to move rapidly [108]. A distinguishing feature of the hardstyle kettlebell swing, is the intent to perform the exercise rapidly, that is, the ability to develop force quickly during a rapid voluntary contraction from a low or resting level, thus there is merit to the claim that the ballistic hardstyle swing may improve lower limb RFD [5]. Furthermore, a vertical jumping motion is recommended as a prerequisite exercise for learning the movement pattern of a hardstyle swing. One might expect therefore, that training the swing would improve vertical jump performance, having trained the movement with the intent to do so rapidly. If the present study had been powered to cover all variables, these results would not support that hypothesis, with pre- and post-training comparison showing non-significant changes in lower limb power measured by sit-to-stand performance or vertical jump height. The findings of this study, however, are consistent with a temporal and kinetic comparison of the kettlebell swing and vertical jump, indicating that a kettlebell swing may lack the specificity to improve jump performance, at least when performed by novices [109]. The large difference in knee extension RFD, between the left and right leg, was relatively consistent at baseline, week 13 and week 19. This suggests that the order in which the legs were tested, influenced the way in which the test was performed. It appears that the participants simply ‘tried harder’ with the second leg, which was typically the non-dominant limb. The 95% confidence interval, being much larger for the left knee, appears to support this hypothesis, however, this observation does reduce our confidence in these data. Although a difference in knee extension RFD between the right and left leg was observed, there was no significant training effect in either leg.

Hip extension strength and RFD

Mean peak torque of the hip extensors at baseline in the present study, was higher than previously reported [48, 52, 110] however, peak force was lower than more recent data reporting 227.2 (56.7) N [111]. Pre- to post-training comparison showed a 6.7% increase in hip extension peak force in the right hip and 12.7% in the left hip, which is small but encouraging.

30sSTS

Mean 30sSTS in the present study was 14.6 rises, ‘average’ for males and females 70–79 years of age [112], and similar to the 13.97 reported in a similar cohort [113]. The mean difference pre- to post-training was an improvement of 3.3 reps, or 23%, exceeding the minimum clinically important difference of 2.0–2.6 repetitions for patients with hip osteoarthritis [114].

Flexibility

Fingertip to floor

Multi-component exercises intended to improve flexibility, are recommended for older adults, however, the most effective intervention characteristics of exercise type, frequency, duration and intensity are unclear [115]. The efficacy and utility of flexibility as a major component in exercise prescription for most populations has been recently questioned [116], with conflicting information in older adults regarding the relationship between flexibility (and interventions to improve it), and performance in functional activities. A recent systematic review and meta-analysis [117] suggests that static stretching is not necessary to improve flexibility, and resistance training programs might provide similar outcomes. There is also strong evidence that eccentric training can improve lower limb flexibility [118], sufficient to hypothesise that eccentric loading of the hip extensors during a kettlebell swing, might improve hip flexion range. Although not statistically significant, pairwise comparison pre- to post-training in the present study showed a mean difference of 4.1 cm in the fingertip to floor measure, suggestive of an improvement, although this should be interpreted with caution [25].

Body composition (DXA)

At baseline, five females had a height-adjusted SMI below 6.0 kg/m2, with two individuals below the EWGSOP2 sarcopenia cut-off point of 5.5 kg/m2 [16]. Four males at baseline had a SMI below 7.5 kg/m2, with one below the 7.0 kg/m2 cut-off. Muscle mass decreases with age, but age explains less than 25% of the variance in strength [119]. Losses in lower limb lean mass specifically are related to compromised functional activities. Mean appendicular skeletal muscle mass at baseline of participants in the present study (17.17 kg and 25.23 kg for females and males respectively) lay within the reference ranges for age- and sex-matched Australians [120]. Mean skeletal muscle index (ASM/m2) values although marginally lower, were within the expected range; differences might be explained by stature, with shorter individuals more likely to have a lower SMI if not adjusted for height [121]. Magnitude of change in ASM pre- to post-training in the present study (687.9 g), was > 2.5 × the reported effect size from 8 weeks of moderate to hard intensity kettlebell training in sarcopenic women 65–75 yrs (MD = 260 g, 8 = 0.11) [13]. This might be explained by the sarcopenic status of participants in the study by Chen, or perhaps difference in training variables. The findings of the present study are contrary to reports of limited low-quality evidence that resistance training is an effective intervention for improving muscle mass in older adults with sarcopenia [122]. In community-dwelling older adults, fat mass is independently associated with a greater decline in HRQoL [123]. Data from the present study suggests that kettlebell training in isolation would not be an effective intervention for reducing adiposity among older adults however, consistent with previous data [124], results from the present study suggest that frequent moderate to high intensity kettlebell training could be effective for increasing lean mass. While BIA-derived ASM has been reported in the present study, BIA data was deemed to be unreliable.

Functional capacity

5-times floor transfer

The large 14.3% pre-to post-training reduction in 5 × floor transfer time was very encouraging. Due to challenges teaching the Turkish get-up (a structured floor transfer manoeuvre specific to kettlebell training) effectively to a class of 14–16 older adults, far less time was spent practicing the Turkish get-up exercise than had initially been planned. As no other exercises were expected to provide specific transference to the floor transfer test, it is proposed that far larger improvements in floor transfer might be expected with greater emphasis given to teaching the Turkish get-up to older adults.

Predicted 1RM deadlift

High intensity (> 80% 1RM) resistance training programs are recommended to counter sarcopenia and osteopenia in older adults [125, 126]. Knowing or calculating 1RM, may have very limited utility for most people [127] but its use in clinical practice and research is likely to continue [128,129,130]. In the present study, participants’ maximal or predicted maximal 1RM kettlebell deadlift did not influence training loads used during the intervention. Change in predicted 1RM was not used as a proxy for change in strength, rather it was chosen as an evaluation of the participant’s functional capacity to perform a maximal lift [131]. This is of particular interest for healthcare providers working with older adults who may have developed maladaptive cognitions relating to their actual or perceived capacity to safely lift a heavy object. [132, 133]. Performing a large training volume with kettlebells up to 80 kg, it was expected that their physical capacity would significantly change, so the moderate pre- to post-training increase of 16.2 kg (23.3%) was unsurprising. Greater improvements may have been anticipated if the participants had been able to train with the heavier (44–80 kg) kettlebells for the second half of the trial.

Quality of life

SF36 & Sense of Coherence

Handgrip and walking speed are two of the most powerful biomarkers of HRQoL in older adults [134], highlighting the importance of maintaining physical capacity as a key element in successful aging. Exercise, regardless of type, is associated with lower mental health burden, with aerobic and gym activities, durations of 45 min, and training frequencies of three to five times per week associated with the largest reductions [78, 79]. Training just twice a week however is likely to improve QoL and Sense of Coherence [80]. In the present study, overall health change was the only sub-domain of the SF-36 to improve significantly, with no significant change in Sense of Coherence. This may have reflected the disruption and anxiety caused by the concurrent arrival of SARS-CoV2 in Australia, and COVID-19 restrictions which prevented the continuation of face-to-face training mid-way through training.

The salutogenic concept of Sense of Coherence [85], which is the individual’s perceived control over and ability to improve their physical, mental, and social health and wellbeing, has been shown to predict HRQoL [135]. Health promotion strategies (should) create environments which empower people to identify and make use of their own resources to this end. Providing older adults with the knowledge, practical skills, and group training opportunities to use kettlebells appears to be a safe and effective health promotion strategy.

Training load

Hardstyle kettlebell training does not follow all of the traditional resistance training protocols in relation to sets, repetitions, loads, or rest periods. It does however involve a significant within-session training load volume, which is beneficial for strength and hypertrophy [136]. High volume training may be advantageous or necessary in some cases [137, 138], but improvements were still anticipated among participants with the lowest training load volume, as low volume resistance training can still improve muscle strength and functional performance in older adults, with no evidence of non-responsiveness [139]. An unexpected finding of the present study was the linear increase in training load after face-to-face training stopped, when participants were required to train at home with limited access to kettlebells. Training load volume was only planned at an individual level during the final two weeks of the training.

Adverse events

The participant who withdrew in week 7, had not disclosed a 50-year history of persistent non-specific back pain. Imaging revealed only common age-related changes [140], and no evidence of tissue change which could confidently be attributed to the symptoms, or caused by the training. Diagnostic imaging also failed to identify any pathology in the shoulder of the participant with shoulder pain. This was remarkable given the prevalence of rotator cuff pathology in adults over 70 years of age [141]. The two participants who experienced an intercostal strain, regained pain-free function within a typical timeframe, and were able to continue training by avoiding aggravating exercises.

The four non-serious adverse events, represent a period prevalence of 3.07 per 1000 h. This rate falls within the 95% confidence interval for injuries reported from Power yoga [142], and is considerably less than that of novice runners, with an injury rate of 16.7–19.1 per 1000 h [143]. A similar injury rate of 2.2 musculoskeletal injuries per 1000 h, has previously been reported from strength training and endurance training in older adults with arthritis [144]. Given the intentionally high physical demands of the BELL trial intervention, the investigators are confident that kettlebell training does not appear to have a higher risk profile for older adults, than other forms of resistance training performed under similar conditions.

The absence of serious adverse events during a clinical trial, has been sufficient for previous investigators to conclude that a program of high intensity exercise with older adults, is “safe” [145]. The BELL trial intervention was delivered and closely monitored by an experienced kettlebell instructor and physiotherapist. The training was programmed for participants to attain a peak sessional training load volume on the final day of the intervention period, with all participants reporting an sRPE of 9–10/10 [146]. It is unlikely that a community-based program using kettlebells, would replicate the intensity, frequency, training load volume, or rate of progression observed in the BELL trail, thus, the reported period prevalence of muscle soreness and four non-serious adverse, is likely to be considerably lower in community-based kettlebell programs.

Strengths and limitations

There are several strengths to this study including the unique focus on insufficiently active older adults, high engagement with training, and the large numbers of clinically meaningful outcome measures. This is the first study to assess the feasibility of engaging older adults in a pragmatic group-based hardstyle kettlebell training to promote healthy aging in the community. The pragmatic design with comprehensive exercise reporting, replicated a training approach which had been used successfully in a physiotherapy practice with older adults for over 18 months, increasing our confidence that it could be readily replicated with minimal barriers to entry. Functional capacity, experience of and tolerance to structured exercise, was varied among participants at baseline, with several having comorbid health conditions which typically limit involvement in high intensity training. Thus, the participants were representative of the sample population for whom this intervention could be applied. Additionally, participant-determined exposure enabled a more appropriate comparison of dose based on self-perceived ratings of perceived exertion, also typical of normal practice. The cost-effective and time-efficient nature of the training provides an attractive alternative to healthcare providers to promote group-based resistance training for older adults. However, several limitations of the present study must be acknowledged. Firstly, the single cohort, repeated measures design has challenges with interval validity, particularly with respect to the absence of blinding and lack of a separate control group. Secondly, the large number of secondary outcome measures raises the likelihood of some changes being a false positive observation, and results from secondary outcomes for which the study was not powered, should be interpreted with caution. Finally, external validity is enhanced with an intervention which had previously been used in a community-based primary care setting, however, the PRECIS-2 domains of ‘recruitment’ and ‘setting’ scored low. Program delivery from a community or clinic setting, operating within an environment or framework governed or influenced by a myriad of factors, such as clinical traditions, health service organisation, staffing and resources constraints, and funding arrangements, may influence quantitative and qualitative outcomes. Ultimately, the effect and success of a program may be significantly influenced by the local framework of healthcare driver domains, including the choice to participate and the Instructor-participant interaction [147]. Linear mixed effect modelling however, increases our confidence that the results can be generalised to a random sample of participants with similar characteristics.

Conclusion

In conclusion, our findings demonstrate that group-based hardstyle kettlebell training, performed at moderate to high-intensity, can be used safely and effectively to engage insufficiently active older adults living in the community, to increase physical activity and promote healthy aging. Insufficiently active males and females 60–80 years of age were able to train 5 days weekly for 3-months, and maintain a very high level of engagement, with no serious adverse events and very low dropout rate. Participants developed the confidence and skills to train independently at home, with large improvements in grip strength, and small to moderate changes in cardiovascular capacity, muscular strength and endurance, functional capacity, and body composition. Further investigations are warranted to determine optimal exercise prescription for insufficiently active older adults with different functional needs and varying physical capacity.