Introduction

Sarcopenia, the age-related loss of muscle mass and function, is common and clinically important [1]. It is associated with an increased risk of falls, future disability and dependency, hospital admission and earlier death [2,3,4]. Finding new ways to prevent or treat sarcopenia is therefore an important area of research. Conducting clinical trials for sarcopenia is, however, not straightforward. Muscle mass and muscle function are important efficacy outcomes in sarcopenia trials, but health-related quality of life is also a key consideration [5]. Whilst there are many generic tools for measuring health-related quality of life, until recently tools designed specifically for measuring quality of life in people with sarcopenia were lacking.

The SarQoL questionnaire was introduced in 2015 [6] to fill this gap. Originally derived in French and tested with older people from a Belgian outpatient clinic, it has been translated into multiple languages including English [7]. The questionnaire consists of 55 items organised into 22 questions across seven domains: Physical and mental health, Locomotion, Body composition, Functionality, Activities of daily living, Leisure activities and Fears.

Although some validation work has been performed [8,9,10], there is a need to test the internal consistency, responsiveness (how much the measure changes when a real change in health occurs) and concurrent validity (how the measure compares to measures that would be expected to be related to the measure under test) in a UK population of older people with sarcopenia. In particular, there is a need to derive the minimum clinically important difference for SarQoL using currently recommended anchor-based methods [11] to enable sample size calculations for clinical trials, but also to enable interpretation of effect sizes derived from intervention studies.

The aim of this analysis was therefore to test the internal consistency, responsiveness and concurrent validity of the SarQoL tool in a cohort of older people with sarcopenia in the UK.

Methods

Study population

We analysed data from the UK Sarcopenia Network and Registry (SarcNet) pilot study; the study design and baseline data have been reported previously [12]. SarcNet was designed as an observational study with a baseline visit and a six-month follow up. The target population was people aged 65 and over with self-reported impairment in physical function. To ensure that as many potentially eligible participants as possible were included in SarcNet, we used a SARC-F score [13] of 3 or more out of 10 at telephone pre-screening, rather than the more commonly used cutoff of 4; this threshold is in line with our previous LACE randomised controlled trial and with recent data on the ability of SARC-F to detect patients with probable sarcopenia in similar populations [13,14,15]. Participants were recruited from primary care organisations (General Practices) in the UK and assessed at six hospital-based recruitment sites. Exclusion criteria were life expectancy of less than 6 months in the judgement of the local investigator, or participation in an interventional study within the last 30 days. Other exclusion criteria were: presence of a permanent pacemaker with an atrial sensing lead or presence of an implantable cardioverter-defibrillator, peripheral oedema present above knee level, or fever at the baseline visit (all contraindications to bioimpedance testing). Previous analysis of baseline data [12] showed that 94% of those in SarcNet fulfilled the 2019 European Working Group on Sarcopenia in Older People (EWGSOP) criteria [16] for probable sarcopenia.

Measures collected

The SarQoL questionnaire (in its English translation) [7] was administered by the research nurse as part of the SarcNet study at baseline and at six-month follow-up to assess quality of life. Baseline visits were conducted face-to-face (in a research clinic or in the participant’s own home). Due to limitations on face-to-face research activity imposed in the UK as part of the pandemic response to COVID-19, all but eight follow-up visits were conducted by telephone by the research nurses. The SARC-F score was collected by the research nurses by telephone at pre-screening and again at the six-month follow-up encounter (whether face-to-face or by telephone).

At the baseline visit, maximum hand grip strength was measured using a Jamar hydraulic dynamometer (Lafayette Instrument Company, USA) [17]. Three measurements were taken on each hand and the maximum value was used for analysis. Appendicular lean muscle mass was measured using the Akern 101 bioimpedance system (Akern SRL, Pontassieve, Italy). Resistance and reactance were recorded and the Sergi equation [18] was used to derive appendicular skeletal muscle mass index (appendicular skeletal muscle mass divided by height squared). The Short Physical Performance Battery (SPPB) [19] was conducted, comprising side-by-side, semi-tandem and tandem balance tests, gait speed over 4-m walk distance, and five times sit to stand time from a chair without using arms to assist. At the six-month follow-up visit, all participants were asked two questions to assess global change in a) fitness, and b) quality of life. Change in fitness was assessed by the response on a 7-point Likert scale (much worse to much better) to the statement “Since the first visit, my overall fitness is…”. Change in quality of life was assessed by the response on a similar 7-point Likert scale to the statement “Since the first visit, my overall quality of life is…”. Due to only small numbers of participants recording the most extreme changes on the Likert scale, responses for ‘much better’ and ‘much worse’ were amalgamated with ‘better’ and ‘worse’ respectively. Few participants reported minimal global improvement, thus an additional category of ‘any improvement’ (including those reporting slightly better, better or much better) was also derived.
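The recoding of the global change responses described above can be sketched as follows. This is a minimal illustration only: the response labels, the `COLLAPSE` mapping and the function names are assumptions made for the example, since the exact wording of the seven Likert categories is not given here.

```python
# Hypothetical labels for the 7-point global change scale; the exact
# response wording used in the study is an assumption.
COLLAPSE = {
    "much worse": "worse",        # amalgamated with 'worse'
    "worse": "worse",
    "slightly worse": "slightly worse",
    "no change": "no change",
    "slightly better": "slightly better",
    "better": "better",
    "much better": "better",      # amalgamated with 'better'
}

# Responses counted in the additional 'any improvement' category.
IMPROVED = {"slightly better", "better", "much better"}


def collapse_response(response):
    """Map a raw 7-point response onto the collapsed 5-category scale."""
    return COLLAPSE[response.strip().lower()]


def any_improvement(response):
    """Return True if the raw response indicates any degree of improvement."""
    return response.strip().lower() in IMPROVED
```

Keeping 'any improvement' as a separate flag, rather than as a further level of the collapsed scale, allows the same participant to contribute both to their own category and to the combined improvement group.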

Statistical analysis

All statistical analyses were conducted using SPSS v26 (IBM, New York, USA). A two-sided p value of < 0.05 was taken as significant for all analyses. Descriptive statistics were generated for the full baseline group, those that dropped out before 6 months, and those that underwent follow up by telephone and face-to-face at 6 months; means and standard deviations were reported for normally distributed continuous variables, and medians and interquartile ranges were reported for continuous variables that were not normally distributed on visual inspection. Baseline characteristics of those undergoing telephone follow-up were compared with those undergoing face-to-face follow-up and with those who did not undergo assessment at 6 months; Student’s t-test and Mann-Whitney U tests were used for normally and non-normally distributed continuous variables respectively, and Pearson’s chi-squared test was used to compare categorical variables. The internal consistency (a measure of whether individual questionnaire items are related to each other) of SarQoL was analysed using Cronbach’s alpha. The baseline, follow-up and telephone follow-up populations were analysed separately, and internal consistency within each of the SarQoL domains was also assessed. Correlations between each subdomain and both the total baseline SarQoL score and the score without the index subdomain were calculated using Pearson’s correlation coefficient to test whether a single subdomain could substitute for the total score.
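For readers who wish to reproduce this type of analysis outside SPSS, the following is a minimal sketch of Cronbach’s alpha and of a subdomain-versus-remainder correlation. The function names are illustrative, and the use of the unweighted mean of the remaining subdomains as the ‘score without the index subdomain’ is an assumption; the published SarQoL scoring algorithm may weight domains differently.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a participants-by-items score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]                                # number of items
    item_variances = x.var(axis=0, ddof=1)        # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)    # variance of summed score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


def rest_correlation(domain_scores, index):
    """Pearson r between one subdomain and the mean of the other subdomains.

    Assumption: the remainder score is the unweighted mean of the
    other subdomain columns.
    """
    x = np.asarray(domain_scores, dtype=float)
    rest = np.delete(x, index, axis=1).mean(axis=1)
    return np.corrcoef(x[:, index], rest)[0, 1]
```

Correlating a subdomain with the remainder, rather than with a total that already contains it, avoids inflating the coefficient through part-whole overlap.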

Correlation between baseline and six-month follow-up values was assessed using Pearson’s correlation coefficient. This value is a key factor in calculating sample sizes for trial analyses that adjust for baseline values and is more informative for planning trial sample sizes than the intraclass correlation coefficient commonly calculated as part of psychometric assessment of measurement tools. A high correlation between baseline and follow-up values (i.e. a more stable measure) enables greater precision in detecting changes between groups, and the required sample size can be reduced by multiplying it by the factor (1 - r²) for a single time-point follow-up trial [20]. Responsiveness to change was calculated in two ways. Cohen’s d (equivalent to effect size) was calculated as (mean SarQoL difference between baseline and follow up) / (pooled SD of baseline and follow up SarQoL) [21]. Cohen’s d was calculated separately for groups reporting a slight improvement, a slight worsening, or any improvement. Guyatt’s responsiveness coefficient was calculated for the same groups as (mean change in SarQoL score for each group) / (SD of the change in SarQoL score in the group showing ‘no change’ on the Likert scale) [22]. For comparison, responsiveness was also calculated for the SARC-F score in the same way.
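The two responsiveness indices defined above can be expressed in a few lines of code. This is a sketch rather than the analysis code used in the study: the function names are illustrative, and the pooled SD is assumed to be the square root of the mean of the baseline and follow-up variances, which is one common convention when both sets of scores come from the same participants.

```python
import numpy as np


def cohens_d(baseline, follow_up):
    """Mean change divided by the pooled SD of baseline and follow-up scores."""
    b = np.asarray(baseline, dtype=float)
    f = np.asarray(follow_up, dtype=float)
    # Assumption: pooled SD = sqrt of the average of the two sample variances.
    pooled_sd = np.sqrt((b.var(ddof=1) + f.var(ddof=1)) / 2)
    return (f.mean() - b.mean()) / pooled_sd


def guyatt_responsiveness(change_in_group, change_in_stable_group):
    """Mean change in a group divided by the SD of change in the 'no change' group."""
    c = np.asarray(change_in_group, dtype=float)
    s = np.asarray(change_in_stable_group, dtype=float)
    return c.mean() / s.std(ddof=1)
```

The two indices differ only in the denominator: Cohen’s d scales change by between-person variability in the scores themselves, whereas Guyatt’s coefficient scales it by the variability of change among participants who report being stable.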

Concurrent validity was assessed by calculating the correlation between baseline SarQoL scores (total and individual domains), measures of physical performance (maximal grip strength, SPPB score and 4 m walk speed), and function in daily life measured by the SARC-F score. Pearson’s correlation coefficients were calculated as all data were normally distributed. Finally, a range of sample size calculations was performed to show the number of participants that would need to be recruited to detect different Minimum Clinically Important Difference (MCID) values for SarQoL under different assumptions.
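As an illustration of how such sample size estimates can be generated, the sketch below uses the standard normal-approximation formula for a two-arm comparison of means, with optional multiplication by (1 - r²) to reflect adjustment for baseline values. The formula choice, the function name `n_per_group` and the example inputs are assumptions made for illustration; they are not taken from the study, and exact software implementations may give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mcid, sd, power=0.80, alpha=0.05, r=0.0):
    """Approximate participants per arm needed to detect a difference of `mcid`.

    mcid  : difference in means to be detected
    sd    : standard deviation of the outcome
    power : desired power (e.g. 0.80 or 0.90)
    alpha : two-sided significance level
    r     : baseline to follow-up correlation (0 means no baseline adjustment)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    unadjusted = 2 * ((z_alpha + z_beta) * sd / mcid) ** 2
    return ceil(unadjusted * (1 - r ** 2))


# Hypothetical inputs: detect a 5-point difference when the SD is 10 points.
print(n_per_group(mcid=5, sd=10, power=0.80))
print(n_per_group(mcid=5, sd=10, power=0.90))
print(n_per_group(mcid=5, sd=10, power=0.80, r=0.5))
```

Because the required number scales with the square of SD/MCID, halving the difference to be detected roughly quadruples the sample size, which is why a well-justified MCID matters for trial planning.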

Results

Data were available for 147 participants at baseline, and 125 participants underwent six-month follow up. Eight follow-up visits were conducted face-to-face; the others were conducted by telephone. The flow of participants through the study is shown in Fig. 1. The mean time between baseline and follow up was 6.5 (SD 1.1) months. Details of the baseline characteristics of the whole group, those undergoing six-month follow-up by telephone or face-to-face, and those who dropped out before six-month follow-up are given in Table 1.

Fig. 1 Flow of participants through the study

Table 1 Baseline characteristics of study population

Internal consistency and subdomain correlations

Table 2 shows the results of internal consistency testing using Cronbach’s alpha. The full SarQoL score showed acceptable levels of consistency at baseline (alpha = 0.944) but showed less consistency at the follow up visit (alpha = 0.732). No differences in consistency were seen when confining the follow up analysis only to those undergoing telephone assessment at the six-month visit. Although consistency within most subdomains was good, consistency within the body composition and leisure activities domains was poor. Table 3 shows correlations between each subdomain and the total baseline score; the functionality and activities of daily living domains had the highest correlation with the total score.

Table 2 Internal consistency of SarQoL and subdomains (by Cronbach’s alpha)
Table 3 Pearson’s correlation between each subdomain of SarQoL and total score at baseline

Change in SarQoL over time

Follow up SarQoL scores at 6 months were weakly correlated with scores at baseline (r = 0.27, p = 0.03). In contrast, follow-up SARC-F scores were much more closely correlated with baseline SARC-F scores (r = 0.63, p < 0.001). Figure 2 shows the changes in the SarQoL score between baseline and follow-up for each category of global change in fitness or quality of life; details of the changes for each subdomain and for the SARC-F score are shown in Table 4. More participants reported worsening of global fitness or quality of life than reported improvement; ‘no change’ was the most commonly selected category for both global fitness and global quality of life. Point estimates for SarQoL and for SARC-F in those reporting ‘no change’ were close to zero, suggesting no systematic bias in how scores changed between face-to-face assessment at baseline and telephone assessment at follow-up. Of the subdomains, functionality mapped most closely to the total score, both in terms of a stable functionality score in those reporting no global change, and in the degree of improvement or deterioration in scores.
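To put the two baseline to follow-up correlations reported above into context, the short calculation below applies the (1 - r²) factor described in the Methods to each value. The helper name `variance_factor` is illustrative; only the two correlation coefficients are taken from the results.

```python
def variance_factor(r):
    """Multiplier applied to sample size when adjusting for baseline values."""
    return 1 - r ** 2


# Correlations reported above for SarQoL (0.27) and SARC-F (0.63).
for label, r in [("SarQoL", 0.27), ("SARC-F", 0.63)]:
    print(f"{label}: sample size multiplied by {variance_factor(r):.3f}")
# SarQoL: sample size multiplied by 0.927
# SARC-F: sample size multiplied by 0.603
```

On these figures, adjusting for baseline SarQoL would reduce the required sample size by roughly 7%, compared with roughly 40% for SARC-F.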

Fig. 2 Relationship between global change and mean change in SarQoL score between baseline and follow-up

Table 4 Mean change in SarQoL and other measures between baseline and follow-up

Responsiveness

Table 5 shows measures of responsiveness calculated separately for ‘slight improvement’ and ‘slight worsening’ in global fitness and global quality of life. Results for all categories of improvement combined are also presented, as numbers in each individual improvement category were small. The SarQoL tended to be more responsive to change than the SARC-F, and both tools were more responsive to improvement than to worsening.

Table 5 Responsiveness measures for SarQoL and other measures

MCIDs and sample size estimates

Table 6 shows the anchor-based minimum clinically important differences for the SarQoL and SARC-F, and reports the trial sample sizes that would be needed to detect these differences with 80% and 90% power, with and without adjustment for the correlation between baseline and follow-up, given an alpha of 0.05.

Table 6 Sample sizes required to detect minimum clinically important differences

Concurrent validity

Table 7 shows correlations between the total SarQoL score, measures of physical performance and the SARC-F score. The SarQoL score showed moderate correlations with each of the other measures as expected.

Table 7 Correlation between baseline SarQoL subdomains and baseline measures of physical function

Discussion

Our analysis is the first to validate the SarQoL score in a specific group of older patients in the UK with probable sarcopenia, the group who form the target population for the use of this tool in clinical trials and other studies of sarcopenia. We found that the SarQoL tool had good internal consistency, with better consistency at the baseline visit (where SarQoL was administered face-to-face) than at the follow up assessment (where it was administered by telephone in almost all participants). Responsiveness to change was variable, with small numbers limiting the robustness of the analyses, but the SarQoL questionnaire appeared to be more sensitive to improvement than to deterioration, with sample sizes of 25–100 required to detect clinically significant improvements from interventions. Only a weak correlation was seen between baseline and follow up SarQoL scores, suggesting that adjustments for baseline values of the SarQoL in analyses of clinical trial data are unlikely to improve trial power by a large amount. The moderate correlations between SarQoL and measures of physical performance give good evidence of concurrent validity; higher correlations would not be expected given that the constructs of physical performance and daily activities are related to, but distinct from, the construct of sarcopenia-related quality of life.

Our findings complement and extend previous development work on the SarQoL tool. Most previous validation studies for the SarQoL questionnaire have been conducted by the research group that originally designed the questionnaire, and the current analysis is one of the few independent validation studies that have been performed to date. A previous analysis in a UK-based population of healthy older people [7] (not selected for activity limitation or sarcopenia) showed good internal consistency (Cronbach’s alpha 0.88); SarQoL scores were lower in the small number of participants with sarcopenia than those without, and SarQoL scores showed close correlation with related domains (e.g. physical function) of other health status tools including the Short Form 36 (SF-36) questionnaire. Reliability was high (intraclass correlation coefficient 0.95), in part due to the short gap of only 2 weeks between testing and retesting. No attempt was made to test responsiveness to change. Similar results were obtained in older Belgian outpatients using similar methods [8].

One previous study used longitudinal follow-up data to test responsiveness in a small group of patients (n = 43) with sarcopenia [9], but this analysis relied on correlating change in SarQoL over time with change in related health status tools (SF-36 and EuroQoL 5-dimension tool), rather than using currently recommended anchor-based methods. Although these data showed good correlation between change in SarQoL and other measures over the two-year follow up period, they support construct validity of the SarQoL rather than responsiveness per se, and do not enable derivation of a minimum clinically important difference. Pooled data from nine studies [10] were used to derive the smallest detectable change (estimated at 7 points); the test-retest interval in this analysis was short (2 weeks in all studies), and a standard-error based method was used, which is not optimal for deriving the minimum clinically important difference [11].

Our study has a number of strengths. Firstly, we studied a group of older people with levels of physical function representative of those seen in primary and secondary care services, almost all of whom had a diagnosis of probable sarcopenia [12]. This group is the group for which a sarcopenia quality of life tool would be deployed in both research and clinical practice. We used a 6 month follow up interval for both baseline to follow-up correlation and for responsiveness testing. This interval reflects the interval between visits that would be used in clinical trials or between assessments in clinical practice. Such an interval also increases the chances that improvement or deterioration will have occurred. We used an anchor-based approach to determine responsiveness to change and to estimate minimum clinically important differences, reflecting current recommendations in this field.

A number of limitations of our analysis also require highlighting. The number of participants who improved between baseline and follow up was small. This likely reflects the natural history of sarcopenia, but is also likely to reflect the effects of movement restrictions imposed to combat the COVID-19 pandemic [23]. We were unable to administer the SarQoL face-to-face for most participants at follow up because of the pandemic, and the lack of face-to-face visits also precluded collection of some of the other planned outcome measures, particularly physical performance measures. It is possible that the different mode of questionnaire delivery at follow-up introduced more variability in the scores, reducing the observed consistency or responsiveness. Conversely, consistency and responsiveness were still acceptable despite heterogeneity in the mode of administration, suggesting that in research or clinical practice participants could be given a choice of completing the questionnaire face-to-face or by telephone. We relied on natural change in sarcopenia over the six-month follow-up rather than the response to an intervention, and future studies should assess the responsiveness to change for the SarQoL after resistance training (the intervention with the best evidence for improving sarcopenia) [24, 25]. We elected not to measure reliability in this analysis as we did not re-administer the SarQoL within a sufficiently short period after the baseline visit to be confident that participants had remained clinically stable [26].

The responsiveness to change for the SarQoL may enable smaller sample sizes to be used in trials than some generic quality of life tools (the EQ-5D, for instance, typically requires sample sizes of 200–300 to enable detection of the minimum clinically important difference of 0.074 points [27]). However, disease-specific tools such as SarQoL complement, but cannot replace, generic measures of health status or quality of life for older people. Such generic measures are still essential to assess broader health status in older people, who will typically live with multiple long-term conditions [28].

Although previous studies have noted that the SarQoL took participants only 10 minutes to complete, our anecdotal experience in this study suggests that participants who are more functionally impaired may take longer. As the SarQoL is only one of a battery of tests that might be administered during a trial or other study visit, it is worth considering whether the burden on study participants could be reduced by shortening the questionnaire. Of note, the ‘functionality’ subdomain within SarQoL delivers a similar distribution of normalised scores to the full SarQoL questionnaire and correlates highly with the total score; use of this subset of questions could potentially enable less burdensome data capture, albeit with the loss of some aspects of quality of life captured elsewhere in the full SarQoL. These findings parallel those from studies of other health status measures such as the SF-36, where the physical function subdomain shows a close correlation with grip strength [29]. Our results suggest that the SarQoL can be administered face-to-face and by telephone as well as by self-completion as studied previously; this flexibility is important for effective trial delivery both during a pandemic, when face-to-face visits may be impossible, and in non-pandemic times, when remote trial delivery can improve participation and retention [30].

Conclusions

Taken together with previous validation studies of the SarQoL questionnaire, our results suggest that SarQoL has acceptable properties for use in clinical trials of sarcopenia interventions as part of a suite of outcomes. To date, few trials have included SarQoL as a trial outcome, and it is only by using SarQoL in clinical trials that we will be able to fully assess its performance as an outcome measure in this context.