Background

Surviving critical illness to hospital discharge is only the beginning of the journey for patients leaving intensive care. Many patients will experience post-intensive care syndrome [1, 2], with impaired health-related quality of life for ≥ 5 years [3]. Muscle wasting is a major driver of functional disability, with muscle loss of 2–3% per day during critical illness [4]. Despite extensive research, including high-quality trials of physical rehabilitation strategies, no consistent benefit for patients has been demonstrated, even though level 1 evidence exists in other clinical settings [5]. One explanation could be the range of primary outcome measures chosen for rehabilitation trials.

The increasing focus on functional measures as primary outcomes for multicenter trials of physical, nutritional, and metabolic interventions in critical care has led to an increasing number of core outcome sets [6]. While such standardization is important, the clinimetric properties of these outcomes are likely to influence trial results. Single and composite measures currently used in the intensive care unit (ICU) demonstrate both floor effects (≥ 15% of participants with the minimum score) and ceiling effects (≥ 15% of participants with the maximum score) [7]. For example, the 6-min walk test and the Short Physical Performance Battery (SPPB) have floor effects at ICU discharge [8, 9], and the Physical Function in ICU Test-scored (PFIT-s) has ceiling effects at hospital discharge [9]. These measurement limitations could impair our ability to detect intervention effects [9].

In patients recovering from critical illness, physical rehabilitation typically progresses from in-bed lower extremity exercises to standing activities. Outcome measures such as knee extensor muscle strength [10], assistance required for standing [11], and standing repetitions [12] can objectively document patients’ progression. The sit-to-stand (STS) test has been used extensively across a wide spectrum of chronic diseases [13], its measurement properties have been examined, and normative data from healthy age- and sex-matched adults are available [14]. The widespread use and acceptability of the STS test stem from the fact that the ability to stand unaided from sitting is fundamental to independent function and activities of daily living (e.g., getting out of bed or going to the toilet). The STS test maps to more complex measures, including the Barthel Index, the SF-36, and the Functional Independence Measure (FIM), which has been used to measure long-term functional recovery from critical illness [15,16,17]. The movement requires proximal hip muscle strength and power, and this muscle group is more severely affected by ICU-acquired weakness [18]. Interventions targeting the mass, strength, and power of the quadriceps and the muscles at the hip and knee may therefore be appropriately measured using the STS, a test that is functional and patient-centered and that represents an important milestone across the recovery trajectory.

To date, the time-based 30-s STS (30 s STS) has been examined as a patient-centered outcome measure at ICU and hospital discharge in small patient cohorts [19, 20]. The feasibility and responsiveness of the STS as a primary outcome in ICU populations across the recovery trajectory remain unclear. Unknown factors include its clinimetric properties (e.g., quantitative measures of clinical utility) [21] and the statistical behavior of its scores over time.

We therefore investigated the clinimetric properties of three progressive outcomes required for physical functional independence, starting with knee extension strength, progressing to STS assistance, and culminating in the 30 s STS, and documented measurement characteristics of interest to clinicians, researchers, and patients. Two of these measures, knee extension and STS assistance, are components of the PFIT-s, a 4-item performance-based outcome measure [22].

Methods

We report this study in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [23].

Participants

Participants prospectively enrolled in five published critical care rehabilitation studies (I-SURVIVE [20], TryCYCLE [24], CYCLE Pilot RCT [19], eStimCycle [25], the EXERCISE trial [26]) from three countries contributed data. Investigators from each study form the International METRIC Critical Care Data Group (METRIC: estiM cycle Exercise cycle piloT i suRvIve tryCycle). Briefly, participants were adults (≥ 18 years) admitted to the ICU who were mechanically ventilated, were previously independent, and were deemed at greatest risk of future functional disability. Full inclusion and exclusion criteria for each study are included in Additional file 1: Table 1.

In I-SURVIVE, the inter-rater reliability of the PFIT-s and 30 s STS was assessed amongst 42 participants across two Canadian ICUs (enrolled between October 2016 and December 2017) [20]. TryCYCLE assessed the safety of an early in-bed cycling protocol in a single-center Canadian prospective cohort of 33 participants (October 2013–August 2014) [24]. Sixty-six participants were enrolled across seven Canadian ICUs in the CYCLE Pilot RCT, which assessed the feasibility of early in-bed cycling plus routine physiotherapy compared to routine physiotherapy alone (May 2015–June 2016) [19]. The eStimCycle multicenter RCT enrolled 162 participants across four hospitals in Australia and the USA, evaluating the effect of functional electrical stimulation-assisted cycle ergometry on physical and cognitive outcomes (August 2014–December 2018) [25]. EXERCISE, a single-center Australian RCT, assessed the effectiveness of an intensive physiotherapy program spanning ICU admission to the outpatient setting compared to usual care among 150 participants (May 2007–August 2009) [26]. Unlike a meta-analysis of efficacy studies, in which population heterogeneity limits pooling, the clinical heterogeneity across our studies strengthens the clinimetric evaluation of outcome measures.

Outcome measures

We included three physical outcome measures: knee extensor strength, STS assistance, and the 30 s STS test [12, 27] (Additional file 1: Table 2). Knee extensor strength was assessed using manual muscle testing (MMT) and scored using the Medical Research Council (MRC) system. MRC scores range from 0 (no muscle contraction) to 5 (movement against gravity with full resistance) [28,29,30]. In each study, the MRC score was converted to a PFIT-s score ranging from 0 to 3, with higher scores reflecting greater strength [11, 22]: MMT grades 0, 1, or 2 [29, 30] corresponded to a PFIT-s score of 0, grade 3 to 1, grade 4 to 2, and grade 5 to 3. All studies recorded knee extension using the PFIT-s; however, not all studies documented individual MRC scores, so we analyzed the PFIT-s scores. For STS assistance, a PFIT-s score of 0 represented a participant unable to perform the test, 1 a two-person assist, 2 a one-person assist, and 3 no assistance. For the 30 s STS test, participants completed as many full STS repetitions as possible in 30 s, using their arms if needed; higher scores represented greater strength and function [12, 27].
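The MRC-to-PFIT-s recoding described above is a simple lookup. The sketch below is hypothetical (the function and constant names are ours, not taken from the source studies) and shows one way it could be encoded in an analysis script. Note that three MRC grades collapse into a single PFIT-s score of 0, which narrows the range of the recoded scale.

```python
"""Hypothetical sketch: recode MRC knee-extension grades to PFIT-s item scores.

The mapping follows the text (MRC 0-2 -> 0, 3 -> 1, 4 -> 2, 5 -> 3).
Names are illustrative only.
"""

# PFIT-s STS assistance item, coded as described in the text.
STS_ASSISTANCE_LABELS = {
    0: "unable",
    1: "two-person assist",
    2: "one-person assist",
    3: "no assist",
}


def mrc_to_pfits(mrc_grade: int) -> int:
    """Convert an MRC manual muscle test grade (0-5) to a PFIT-s score (0-3)."""
    if not 0 <= mrc_grade <= 5:
        raise ValueError(f"MRC grade must be between 0 and 5, got {mrc_grade}")
    # Grades 0, 1 and 2 all map to 0; grades 3-5 map to 1-3.
    return max(mrc_grade - 2, 0)


if __name__ == "__main__":
    for grade in range(6):
        print(f"MRC {grade} -> PFIT-s {mrc_to_pfits(grade)}")
```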

Procedures

In each study, acute care physiotherapists and/or physiotherapy assistants were trained to perform the outcome assessments and completed them. Additional file 1: Table 2 summarizes outcomes and time points from each study.

Data analysis

From each study’s main dataset, we exported the following data at ICU and hospital discharge: anonymized participant identification code, knee extensor strength (PFIT-s), STS assistance (PFIT-s), 30 s STS repetitions (including whether arms were used), and reasons for missing data. If a participant did not complete an assessment because of a physical limitation, or because the assessor judged that they were unable to do so, we assigned a score of 0 (“unable”) in accordance with the PFIT-s. No identifying data were included in our pooled dataset.

Participants were considered “potentially eligible” for an assessment if they were enrolled in a study that assessed a given outcome at the relevant time point. Participants who died were excluded from the denominator for the respective time point. To reflect function as close as possible to each time point, we included strength and STS assistance assessments completed within three days of the date of ICU or hospital discharge. To maximize our sample size for the 30 s STS, we included the assessment closest in time to each time point. For each measure, we identified paired assessments among participants with completed outcomes at both ICU and hospital discharge.
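The selection rules above (a fixed window around discharge for strength and STS assistance, the closest available assessment for the 30 s STS, and pairing across time points) can be sketched as follows. This is a hypothetical illustration: the record format and function names are ours, and two details are assumptions not stated in the source, namely that assessments on either side of the discharge date count toward the window and that ties between equally close assessments keep the earlier-listed one.

```python
"""Hypothetical sketch of the assessment-window and closest-assessment rules.

Assumptions (not stated in the source): assessments on either side of the
discharge date count toward the window, and ties keep the earlier-listed entry.
"""
from datetime import date
from typing import List, Optional, Tuple

Assessment = Tuple[date, int]  # (assessment date, score)

STRENGTH_WINDOW_DAYS = 3  # strength and STS assistance: within 3 days of discharge


def days_from(assessed: date, discharge: date) -> int:
    """Absolute number of days between an assessment and discharge."""
    return abs((assessed - discharge).days)


def in_window(assessments: List[Assessment], discharge: date,
              window_days: int = STRENGTH_WINDOW_DAYS) -> List[Assessment]:
    """Keep only assessments within `window_days` of discharge."""
    return [a for a in assessments if days_from(a[0], discharge) <= window_days]


def closest(assessments: List[Assessment], discharge: date) -> Optional[Assessment]:
    """Return the assessment closest in time to discharge (30 s STS rule)."""
    if not assessments:
        return None
    # min() returns the first of several equally close assessments.
    return min(assessments, key=lambda a: days_from(a[0], discharge))


def difference_score(icu: Optional[int], hospital: Optional[int]) -> Optional[int]:
    """Hospital minus ICU discharge score; None unless both assessments exist."""
    if icu is None or hospital is None:
        return None
    return hospital - icu


if __name__ == "__main__":
    discharge = date(2016, 11, 10)
    records = [(date(2016, 11, 6), 1), (date(2016, 11, 9), 2)]
    print(in_window(records, discharge))  # only the 9 November assessment
    print(closest(records, discharge))
    print(difference_score(2, 5))
```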

We analyzed participant demographics and baseline characteristics for each study separately using descriptive statistics; some of these data have been previously reported in each study’s main publication [19, 20, 25, 26, 31]. We summarized outcomes descriptively. For each measure and time point, we determined the frequency distribution of scores (counts, percentages); identified floor effects (≥ 15% of participants with the minimum score) and ceiling effects (≥ 15% of participants with the maximum score) [7]; calculated central tendency [mean (standard deviation), or median (1st, 3rd quartiles) for skewed data]; and assessed normality (Shapiro–Wilk test, α = 0.05). For the 30 s STS, we calculated the mean or median time to assessment for each time point, and we defined the “maximum” score as the upper limit of the 95% confidence interval for sex-matched normative values (29 repetitions for women, 32 for men) [32]. For paired assessments, we calculated each participant’s difference score (hospital minus ICU discharge) and the overall group change score [mean (standard deviation), or median (1st, 3rd quartiles) for skewed data], and we compared assessment scores between time points using a paired t test or, for skewed data, a Wilcoxon signed-rank test, with a two-tailed α = 0.05. We also calculated the standard error of measurement (SEM) and the minimal detectable change at the 90% confidence level (MDC90) [20, 33, 34] from paired assessments. In a sensitivity analysis, we removed participants who had been assigned a score of “0” because they were unable to complete an assessment. Outcome assessment data were analyzed using Stata (v. 15.0, College Station, Texas: StataCorp LP).
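The floor and ceiling rule above can be expressed as a short function. The sketch below is hypothetical (names are ours, and the analysis itself was run in Stata, not Python); it takes a per-participant maximum so that the same code covers fixed-range PFIT-s items and the sex-specific 30 s STS “maximum”, and it assumes that scores at or above the applicable maximum count toward the ceiling.

```python
"""Hypothetical sketch of the floor/ceiling rule (>= 15% at the minimum or maximum).

Names are illustrative. The sex-specific 30 s STS "maximum" values are those in the text.
"""
from typing import Dict, Sequence

EFFECT_THRESHOLD_PCT = 15
STS30_MAXIMUM = {"F": 29, "M": 32}  # upper 95% CI limit of sex-matched norms


def floor_ceiling(scores: Sequence[int], minimum: int,
                  maxima: Sequence[int]) -> Dict[str, object]:
    """Report floor and ceiling percentages and whether each meets the 15% rule.

    `maxima[i]` is the maximum that applies to participant i, so the same function
    handles fixed-range items (PFIT-s, maximum 3) and the sex-specific 30 s STS.
    Scores at or above the applicable maximum count toward the ceiling.
    """
    n = len(scores)
    if n == 0 or n != len(maxima):
        raise ValueError("scores and maxima must be non-empty and equal in length")
    n_floor = sum(1 for s in scores if s <= minimum)
    n_ceiling = sum(1 for s, m in zip(scores, maxima) if s >= m)
    return {
        "floor_pct": 100 * n_floor / n,
        "ceiling_pct": 100 * n_ceiling / n,
        # Integer comparison avoids floating-point edge cases at exactly 15%.
        "floor_effect": 100 * n_floor >= EFFECT_THRESHOLD_PCT * n,
        "ceiling_effect": 100 * n_ceiling >= EFFECT_THRESHOLD_PCT * n,
    }


if __name__ == "__main__":
    # PFIT-s item scores (range 0-3) for ten hypothetical participants.
    pfits = [3, 3, 2, 1, 3, 0, 2, 3, 3, 2]
    print(floor_ceiling(pfits, minimum=0, maxima=[3] * len(pfits)))

    # Hypothetical 30 s STS repetitions with sex-specific maxima.
    reps = [0, 2, 5, 9, 12]
    sexes = ["F", "M", "M", "F", "M"]
    print(floor_ceiling(reps, minimum=0, maxima=[STS30_MAXIMUM[s] for s in sexes]))
```

Because the threshold is inclusive, 12 of 80 participants at the minimum score (exactly 15%) counts as a floor effect, which matches how the 30 s STS result at ICU discharge is reported.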

We compared 30 s STS scores at each time point against established thresholds for maintenance of physical independence and normative values for community-dwelling older adults [32, 35] matched to our cohort characteristics.
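A minimal sketch of this comparison, assuming the sex-specific cut-offs reported in the Results (physical independence: 15 repetitions for women and 17 for men; normative values: 21 and 24) and a “meets or exceeds” rule, is shown below. The function and constant names are hypothetical.

```python
"""Hypothetical sketch: count participants meeting sex-specific 30 s STS cut-offs.

Cut-off values are those reported in the Results; names are illustrative.
"""
from typing import Dict, Sequence

INDEPENDENCE_THRESHOLD = {"F": 15, "M": 17}  # physical independence, ages 60-64
NORMATIVE_VALUE = {"F": 21, "M": 24}         # community-dwelling norms, ages 60-69


def count_meeting(scores: Sequence[int], sexes: Sequence[str],
                  cutoffs: Dict[str, int]) -> int:
    """Number of participants whose score meets or exceeds their sex-specific cut-off."""
    if len(scores) != len(sexes):
        raise ValueError("scores and sexes must be the same length")
    return sum(1 for score, sex in zip(scores, sexes) if score >= cutoffs[sex])


if __name__ == "__main__":
    scores = [2, 16, 17, 22, 25]  # hypothetical repetitions
    sexes = ["F", "F", "M", "F", "M"]
    print(count_meeting(scores, sexes, INDEPENDENCE_THRESHOLD))  # 4
    print(count_meeting(scores, sexes, NORMATIVE_VALUE))         # 2
```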

Results

Participant demographics

Data from 451 participants enrolled across five studies were analyzed. Participant demographics and baseline characteristics are presented, by study, in Table 1. Most participants were male (n = 278, 61.6%), and mean age ranged from 60 to 66 years across studies. Across studies, the median duration of mechanical ventilation, ICU length of stay, and hospital length of stay ranged from 4 to 8 days, 7 to 11 days, and 22 to 31 days, respectively, and mean APACHE II score ranged from 19 to 24. Below, we describe results by outcome. Reasons for missing assessments by outcome and time point are reported in Additional file 1: Figs. 1 and 2.

Table 1 Patient demographics and baseline characteristics, by study

Knee extension

Of 387 potentially eligible participants alive at ICU discharge, 330 (85.3%) had a completed assessment (Fig. 1). The median PFIT-s knee extension score was 2 (2, 3), and a ceiling effect occurred in 48.5% (n = 160) (Fig. 2). Of 219 potentially eligible participants alive at hospital discharge, 154 (70.3%) had a completed assessment (Fig. 1). Time points not included in the parent study protocol accounted for 30 (46.2%) of the missing assessments (Additional file 1: Fig. 1). The median PFIT-s score was 3 (2, 3), with a ceiling effect in 74.7% (n = 115) (Fig. 2). In 139 participants with paired data, the median PFIT-s difference score between ICU and hospital discharge was 0 (0, 1) (p < 0.01; Figs. 2 and 3).

Fig. 1

Flowchart of outcome measure assessments for participants enrolled across all five studies. Patients were potentially eligible for an assessment if they were enrolled in an included trial and the trial protocol specified an outcome measure assessment at that time point. The number of potentially eligible patients for knee extension and STS assistance is lower at hospital discharge because these outcome measures were not performed at this time point in the EXERCISE trial. Assessments were excluded across all studies if they were performed more than 72 h from the time of ICU or hospital discharge, respectively. Reasons for no assessment are included in Additional file 1: Figs. 1 and 2. *The 30 s STS was assessed only in the CYCLE Pilot RCT and I-SURVIVE. Pt, patient; Ax, assessment; STS, sit to stand; KE, knee extension; 30 s STS, 30-second sit to stand; d/c, discharge

Fig. 2

Distribution of scores, including individual assessments (left) and difference scores for paired assessments (right). We considered a floor as ≥ 15% of patients with minimum score and ceiling as ≥ 15% of patients with the maximum score. For 30 s STS, we considered the “maximum” score to be the upper limit of the 95% confidence interval for sex-matched normative values (29 repetitions for women, 32 for men) (Tveter et al., 2014). For patients with assessments completed at ICU and hospital discharge, difference scores were calculated by subtracting scores at ICU discharge from hospital discharge. d/c, Discharge; ax, assessment; and STS, sit to stand

Fig. 3

Paired outcome measure scores for participants with assessments at ICU (left) and hospital discharge (right). Each gray line represents one paired assessment. Black diamonds represent median assessment scores at ICU and hospital discharge for the subset of participants with paired data. Vertical red bars represent quartiles; the bottom bar represents the 1st quartile and the top bar the 3rd quartile. For knee extension at hospital discharge, a missing top or bottom bar indicates that the quartile equaled the median. STS, sit to stand; d/c, discharge

STS assistance

Of 387 potentially eligible participants alive at ICU discharge, 327 (84.5%) had a completed assessment (Fig. 1). The median STS assistance PFIT-s score was 2 (1, 3), representing one-person assistance, and a ceiling effect occurred in 45.9% (n = 150) (Fig. 2). Of 220 potentially eligible participants alive at hospital discharge, 102 (46.4%) had a completed assessment (Fig. 1). Time points not included in the parent study protocol accounted for 88 (74.6%) of the missing assessments (Additional file 1: Fig. 1). The median STS assistance PFIT-s score at hospital discharge was 3 (3, 3), representing no assistance, and a ceiling effect occurred in 77.5% (n = 79) (Fig. 2). In 99 participants with paired data, the median difference score between ICU and hospital discharge was 1 (0, 2) (p < 0.01; Figs. 2 and 3).

30 s STS

Of 90 potentially eligible participants alive at ICU discharge, 80 (88.9%) had a completed assessment (Fig. 1), with a median 30 s STS score of 2 (1, 5) repetitions; a floor effect occurred in 15.0% (n = 12) (Fig. 2). The median time to 30 s STS assessment was 1 (0, 3) day after ICU discharge. Thirty-six participants (45%) used their arms during the test. Of 82 potentially eligible participants alive at hospital discharge, 58 (70.7%) had a completed assessment (Fig. 1), with a median 30 s STS score of 6 (3, 9) repetitions. The median time to assessment was 1 (0, 3) day before hospital discharge. Thirty-three participants (57%) used their arms during the test. We did not observe a floor or ceiling effect at hospital discharge (Fig. 2). In 54 participants with paired data, the median difference score between ICU and hospital discharge was 3 (1, 6) repetitions (p < 0.01; Figs. 2 and 3). The SEM was 0.51, and the MDC90 was 1.19 STS repetitions (Additional file 1: Table 3). Sensitivity analyses are presented in Additional file 1: Table 4.
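Assuming the conventional formula for the minimal detectable change at 90% confidence (the SEM multiplied by 1.645, the z value for a two-sided 90% interval, and by √2 to account for measurement error at two time points), the reported MDC90 follows directly from the reported SEM:

```latex
\begin{aligned}
\mathrm{MDC}_{90} &= 1.645 \times \sqrt{2} \times \mathrm{SEM} \\
                  &= 1.645 \times 1.414 \times 0.51 \approx 1.19 \text{ repetitions}
\end{aligned}
```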

We compared 30 s STS scores against physical independence thresholds for older adults aged 60–64 years (females: 15 repetitions, males: 17) and normative values for adults aged 60–69 years (females: 21, males: 24), consistent with the age range of our cohort (Fig. 4). One participant (1.3%) met the threshold for physical independence at ICU discharge, and two participants (3.5%) met it at hospital discharge (Fig. 4). No participant achieved normative values at ICU discharge, and only one (1.7%) achieved normative values at hospital discharge (Fig. 4).

Fig. 4

Distribution of 30-Second STS scores at ICU and hospital discharge. We used thresholds to maintain physical independence for moderately active older adults 60–64 years (Rikli and Jones 2013), and normative values for community-dwelling adults aged 60–69 years (Tveter et al. 2014). Blue represents ICU discharge (n = 80), and orange represents hospital discharge (n = 58). The histogram can be interpreted using the Y-axis. Vertical bars represent the number of patients with each number of STS repetitions. The median scores were 2 (1, 5) STS repetitions at ICU discharge and 6 (3, 9) at hospital discharge. Box plots superimposed upon the histogram represent the median participant score and quartiles. The vertical, black line within the box plot represents the median, while the left side represents the 1st quartile, and the right side represents the 3rd quartile. Tails of the box plot represent the spread of scores, where the left tail represents the minimum, and the right represents the maximum. The horizontal tail lines correspond to the number of patients with the median STS repetitions at each time point

Discussion

Our study includes 451 critically ill participants enrolled in five studies from three countries, with synthesized measures at ICU and hospital discharge and paired assessments between these time points. Our sample was comparable to those of previous ICU rehabilitation trials with respect to age [36, 37] and sex [36] and to clinical characteristics including duration of mechanical ventilation, ICU and hospital length of stay, and APACHE II scores, which enhances the generalizability of our findings. The range of APACHE II scores across studies represents moderate-to-severe illness.

Previous research has highlighted the profound disability experienced by ICU survivors, only 40% of whom could ambulate 7 days after ICU discharge [17]. Consequently, outcome measures in this population are prone to floor and ceiling effects. We identified ceiling effects in knee extension and STS assistance at ICU (~ 50%) and hospital discharge (~ 75%), and a floor effect in the 30 s STS at ICU discharge (15%). Importantly, we did not observe floor or ceiling effects in the 30 s STS at hospital discharge (Fig. 2). This contrasts with other measures of physical function for ICU survivors, such as the de Morton Mobility Index and the PFIT-s, which have known limitations at ICU and hospital discharge, respectively [38]. Our data identify the 30 s STS as a promising performance-based functional measure for future longitudinal ICU studies and clinical trials focused on physical function.

ICU survivors demonstrated profound impairments in physical function, as measured by the 30 s STS, at both ICU and hospital discharge. Only one participant reached normative values, and only two met or exceeded the thresholds required for maintaining physical independence, highlighting the importance of ongoing rehabilitation after hospital discharge. Small changes in the 30 s STS are likely to be highly relevant to patients’ physical function, providing further justification for the 30 s STS as an outcome measure for clinical trials [16, 27].

The PFIT-s was developed to measure function at ICU discharge [22], and previous research has mainly used it at or around ICU discharge [39], although a small study documented good reliability and responsiveness after ICU discharge [20]. The potential to use the PFIT-s to prescribe exercise in the ICU and at discharge is also a unique feature of this test [22]. However, our data show that the knee extension and STS assistance components, used individually, do not have adequate measurement properties for use at ICU or hospital discharge, given their ceiling effects at both time points.

To date, many ICU rehabilitation trials have been single-center studies with small samples [40]. A systematic review and meta-analysis of rehabilitation studies in the ICU summarized 60 RCTs enrolling 5,352 participants [41]. Of these 60 RCTs, 20 measured muscle strength using the MRC scoring system (16 at ICU and 7 at hospital discharge), and 22 reported function (21 at ICU and 15 at hospital discharge; 4 using the PFIT-s); 30 s STS outcomes were not reported in this review. The 20 studies measuring muscle strength enrolled 1,713 participants and conducted 1,335 assessments at ICU discharge and 461 at hospital discharge. The 4 studies reporting the PFIT-s enrolled 316 participants and conducted 167 assessments at ICU discharge and 53 at hospital discharge. Compared with previous work, our study includes the largest cohort of assessments of the PFIT-s components and the 30 s STS at ICU and hospital discharge.

Implications for future studies

Our observations of the clinimetric properties of the 30 s STS test, together with its ease of administration in clinical and research settings without expensive equipment, suggest that it may be an appropriate and feasible measure of function in future ICU rehabilitation studies. Two approaches to evaluating the STS exist: repetition-based (time required to complete a prescribed number of repetitions) [27] and time-based (number of repetitions completed within a prescribed time) [42]. Notably, in a repetition-based approach, participants unable to complete the test cannot be scored (i.e., a floor effect). A time-based approach allows assignment of a score, including a true zero, if a participant is unable to complete the test [27]. In this respect, the 30 s STS is more attractive than repetition-based measures such as the chair stand test of the SPPB, one component of which is the time required to complete five STS repetitions [43]. Thus, for ICU survivors, a time-based approach is more suitable because participants who cannot complete the test still receive a valid score of zero rather than being unscoreable, providing a more accurate measure of physical function.
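The scoring difference between the two approaches can be illustrated with a hypothetical sketch that takes the elapsed time (in seconds) at which each full repetition was completed. The input format and function names are assumptions for illustration, not a published protocol: a participant who completes no repetitions receives a time-based score of 0 but no repetition-based score.

```python
"""Hypothetical sketch contrasting time-based and repetition-based STS scoring.

Input: elapsed times (seconds) at which each full repetition was completed.
"""
from typing import Optional, Sequence


def time_based_score(rep_times: Sequence[float], duration_s: float = 30.0) -> int:
    """Repetitions completed within `duration_s` (e.g., the 30 s STS).

    A participant unable to stand still receives a valid score of 0.
    """
    return sum(1 for t in rep_times if t <= duration_s)


def repetition_based_score(rep_times: Sequence[float],
                           n_reps: int = 5) -> Optional[float]:
    """Time (s) to complete `n_reps` repetitions (e.g., a five-repetition chair stand).

    Returns None when fewer than `n_reps` repetitions are completed: no time
    exists, so the participant cannot be scored.
    """
    ordered = sorted(rep_times)
    if len(ordered) < n_reps:
        return None
    return ordered[n_reps - 1]


if __name__ == "__main__":
    for label, times in [("unable", []),
                         ("three reps", [8.0, 17.5, 29.0]),
                         ("six reps", [3.0, 6.0, 9.0, 12.0, 15.0, 18.0])]:
        print(label, time_based_score(times), repetition_based_score(times))
```

Only the time-based function returns a score for every participant, which is the property that makes a true zero possible.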

Participants in our sample performed a median of 2 (ICU discharge) and 6 (hospital discharge) sit-to-stand repetitions in 30 s. Community-dwelling patients with stable chronic obstructive pulmonary disease completed an average of 13 sit-to-stand repetitions following pulmonary rehabilitation [44], and those with moderate–severe disease completed 10.8 repetitions [45]. Our data were comparable to the average of 5 repetitions performed by male veterans with an average age of 91 years using the modified STS (mSTS) [27]. This level of disability suggests considering the mSTS, which is used with older adults and allows participants to use chair armrests to perform the test [16]. Our data suggest that ICU survivors are closer to the geriatric population in physical function at discharge, with two potential implications: first, the mSTS may be best suited to this population, and second, minimal clinically important differences (MCIDs) may be better extrapolated from geriatric populations. The inability to perform an STS predicts falls across all forms of the test [46,47,48]; for the mSTS, a difference of 1 repetition is associated with an odds ratio of 0.75, reflecting a lower falls risk, and a cutoff of 7 repetitions corresponds to a significant decrease in falls risk [27]. The MDC90 of the mSTS is 0.7, indicating that a change of 1 or more repetitions exceeds what can be attributed to measurement error [16]. In our cohort, the MDC90 of the 30 s STS was 1.19 repetitions; because repetitions are counted in whole numbers, an individual change of 2 or more repetitions exceeds measurement error. On this basis, the median difference score of 3 (1, 6) between ICU and hospital discharge indicates a true change in physical function that is likely to be functionally meaningful for patients.
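Because STS repetitions are whole numbers, the smallest individual change that strictly exceeds an MDC90 can be found by rounding the MDC90 down and adding one. The hypothetical sketch below applies this to the two MDC90 values quoted in this paragraph; it assumes that improvements and declines are judged against the same threshold, and the function names are ours.

```python
"""Hypothetical sketch: interpret whole-repetition changes against an MDC90.

The MDC90 values are those quoted in the text; function names are illustrative.
"""
import math

MDC90_30S_STS = 1.19  # 30 s STS, this cohort
MDC90_MSTS = 0.7      # modified STS, from the cited literature


def smallest_detectable_change(mdc: float) -> int:
    """Smallest whole number of repetitions strictly greater than `mdc`."""
    return math.floor(mdc) + 1


def exceeds_measurement_error(change: int, mdc: float) -> bool:
    """True when an individual change (in repetitions) exceeds the MDC."""
    return abs(change) > mdc


if __name__ == "__main__":
    print(smallest_detectable_change(MDC90_MSTS))       # 1
    print(smallest_detectable_change(MDC90_30S_STS))    # 2
    print(exceeds_measurement_error(3, MDC90_30S_STS))  # True (median change)
```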

Our study has limitations. The 30 s STS test was performed only in the two Canadian studies, and the resulting smaller number of observations may have reduced the precision of our results. Knee extension and STS assistance were not assessed at hospital discharge in the EXERCISE trial, which also limited our sample size. Our decision to use the 30 s STS assessment closest in time to each time point, in order to maximize our sample size, may have introduced representation and selection biases. Approximately half of the participants used armrests when completing the 30 s STS test, introducing variation in the testing protocol. However, we consider it unlikely that this altered our conclusions, as participants still demonstrated profound deficits. Our combined data from five international prospective ICU rehabilitation studies also have several strengths, including detailed reasons for missing data, a continuum of measures, measurements at both ICU and hospital discharge, and change scores between ICU and hospital discharge. We included studies that examined different interventions and somewhat different patient populations, with a range of outcome scores; this clinical heterogeneity enhances the generalizability of our findings.

Conclusion

The 30 s STS is relevant to patient function, has good clinimetric and statistical properties, and can be used across the continuum of recovery after ICU discharge in clinical practice and research. The 30 s STS could be used to assess strength and function at ICU and hospital discharge in moderately to severely ill participants in future studies of physical, nutritional, or metabolic interventions. Until normative values for critically ill patients are developed, our data can serve as reference values for ICU survivors and help clinicians contextualize patients’ recovery.