Background

Stroke remains one of the leading causes of disability worldwide, with the majority of stroke survivors requiring specialized rehabilitation [1]. Inpatient stroke rehabilitation is a program of medical intervention and targeted therapies, which aims to maximize a patient’s functional recovery and facilitate reintegration into the community [2, 3]. To evaluate progress, clinicians use standardized assessment tools or clinical outcome measures such as the Functional Independence Measure [4] (FIM) for level of disability or the Ten-Meter Walk Test [5] (TMWT) for walking ability. Understanding the factors that affect these outcomes may help clinicians to streamline the treatment plan and efficiently allocate rehabilitation resources [6, 7]. Further, clinicians assess a patient’s functional abilities based on performance in these standardized tests, such as classifying patients as household ambulators or limited community ambulators based on walking speed score from the TMWT [8, 9]. Estimating a patient’s future discharge scores early in a rehabilitation program would help clinicians set realistic rehabilitation goals and anticipate needs for additional care or medical equipment at discharge.

Several studies have investigated predictors of clinical outcomes after acute inpatient stroke rehabilitation [10,11,12,13,14,15]. Their main focus was to predict an individual’s ability to perform activities of daily living, as measured by the FIM and the Barthel Index [16], or to predict walking speed as measured by the TMWT [14]. These studies found that the clinical assessment scored at discharge could be predicted based on patient demographics such as age [10,11,12,13, 15] and sex [11], medical information such as the time from stroke onset to rehabilitation admission [11, 13] and the admission score of the predicted outcome [10,11,12,13,14]. However, there are some notable gaps in our knowledge and understanding of these outcomes. First, previous studies have primarily investigated predictors of a single clinical outcome measure, while therapists often use multiple standardized tests to gauge functional abilities. The American Physical Therapy Association highly recommends additional tests [6], including the Berg Balance Scale [17] (BBS), which assesses balance outcomes and fall risk, and the Six-Minute Walk Test [18] (SMWT), which assesses walking endurance and aerobic capacity. Understanding interactions among different clinical outcomes may help identify the tests that provide unique information about specific functional abilities compared to tests that may be redundant or unrelated to those abilities. Second, studies have predicted the discharge score of a clinical outcome using admission scores from a small subset of other clinical outcomes [14, 19]. For example, discharge walking speed has been predicted from admission scores of BBS and the Motor Assessment Scale [20]. Considering additional admission assessments should improve predictive accuracy, while including additional discharge assessments should provide a more comprehensive overview of a patient’s functional outcomes.
Finally, previous studies developed predictive models for clinical outcomes using stepwise methods based on the predictors’ significance level (p-value). However, the ability of the p-value to determine the importance of predictors and to output the optimal set of predictors is limited, especially for small sample sizes, a small ratio of sample size to number of predictors, and correlated predictors [21,22,23,24,25,26,27]. In contrast, certain machine learning approaches aim to reduce model error by selecting a targeted set of predictors based on relative importance [28] and incorporate regularization mechanisms to produce more accurate and generalizable predictions [29].

The objective of this study was to use machine-learning algorithms to develop predictive models for discharge scores of four standardized clinical tests (FIM, TMWT, SMWT, BBS) after inpatient stroke rehabilitation. Potential predictors included patient demographics, stroke characteristics, and the scores of each of the four tests at admission. We also investigated the correlations between the clinical outcomes and the predictors, reported the predictors’ significance level, and compared their relative importance in affecting the discharge scores.

Methods

Fifty individuals with stroke admitted to the Shirley Ryan AbilityLab (formerly, the Rehabilitation Institute of Chicago) for acute inpatient rehabilitation participated in this study. All individuals (or a proxy) provided written informed consent prior to participation. Inclusion criteria were: diagnosis of stroke and admitted to the Shirley Ryan AbilityLab; at least 18 years of age, and able and willing to give consent and follow study procedure directions. Exclusion criteria were: diagnosis of neurodegenerative pathology as a co-morbidity (e.g., Alzheimer’s disease, Parkinson’s disease, etc.); pregnant or nursing; or utilizing a powered, implanted cardiac device for monitoring or supporting heart function (i.e., pacemaker, defibrillator, or LVAD). Medical clearance was obtained from each patient’s primary physician for study participation. The study was approved by the Institutional Review Board of Northwestern University (Chicago, IL; STU00205532) in accordance with federal regulations, university policies and ethical standards regarding research on human subjects.

After consent, and within the first week of admission, a battery of clinical tests, including the TMWT, SMWT, and BBS, was administered by a licensed physical therapist. These tests were performed in a non-standardized order based on the availability of equipment and space in the therapy room. During the inpatient rehabilitation program, patients received, on average, 180 min of therapy per day, five to six days a week. Based on the needs of the patient, this time was divided among physical, occupational, and speech-language therapy. This rehabilitation program follows the requirements of Medicare, a major health insurance provider, which sets standards for inpatient stroke rehabilitation in the United States [30]. Within a week of discharge from the hospital, the same battery of clinical tests was again administered to determine the clinical outcomes after inpatient rehabilitation. FIM scores at admission and discharge were compiled from individual FIM items recorded in the patient’s electronic medical records in accordance with the Inpatient Rehabilitation Facility Patient Assessment Instrument guidelines (IRF-PAI, regulated by the United States Centers for Medicare & Medicaid Services). As per hospital standards, the FIM was also administered by licensed physical therapists and performed within 72 h of admission and within the 24–48 h window prior to discharge.

Patient demographics and stroke type were obtained from the Electronic Medical Record (EMR). Diagnoses of dysphagia, cognitive-communication deficit, and other speech/language impairments were made by experienced speech/language pathologists in the hospital and also collected from the EMR as additional stroke characteristics. Finally, patients (or their proxies) completed a study intake form regarding lifestyle and education.

Dependent and independent variables

The dependent variables were the discharge assessment scores of four commonly used clinical tests: FIM, TMWT, SMWT, BBS.

The independent variables (predictors) included demographic information, stroke characteristics, and scores of the clinical tests from the admission assessment. Demographic information included the patient’s sex, age, body mass index (BMI), race, years of education, and pre-stroke activity levels (defining sedentary as less than 3 h of exercise per week, moderately active as 3–6 h of exercise per week, and highly active as greater than 6 h of exercise per week). Stroke characteristics included time from stroke onset to rehabilitation admission, stroke type (hemorrhagic or ischemic), and diagnoses at admission: dysphagia (i.e., difficulty or discomfort in swallowing), cognitive-communication deficit (i.e., frontal lobe disorders), speech impairments (e.g., aphonia, dysphonia or dysarthria), and language impairment (i.e., aphasia). For analysis, these diagnoses were coded as binary variables (present or absent). The clinical tests at admission included the patients’ FIM, TMWT, SMWT, and BBS scores. Patients who could not walk during a given assessment received a score of 0 for the TMWT or SMWT, in accordance with clinical practice guidelines [31] and similar to previous discharge prediction models [14, 32].

Data analysis

All statistical analyses were performed using Python version 3.7.3. Normality was evaluated for each dependent variable (i.e., FIM, BBS, TMWT, and SMWT) using the Shapiro-Wilk test. For normally distributed variables, correlations between continuous variables were measured using the Pearson product-moment coefficient (r), and correlations between continuous and categorical variables were measured using the point-biserial coefficient (rpb). For non-normally distributed variables, correlations were measured using Spearman’s rank correlation coefficient (rs). For all procedures, we considered a coefficient with an absolute value below 0.3 to express a weak correlation, 0.3 to 0.5 a moderate correlation, and above 0.5 a strong correlation, as recommended by Cohen [33]. The significance level (α) was set to 0.05 and was used to determine which predictors significantly affected each clinical outcome score at discharge.
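The choice of correlation coefficient described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: it assumes that normality of the outcome (dependent) variable alone decides between the parametric and rank-based coefficients, the function names `correlate` and `strength` are hypothetical, and the arrays are made-up illustration data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the text


def correlate(outcome, predictor, predictor_is_binary=False):
    """Pick a coefficient from the normality of the outcome (an assumption of this sketch)."""
    outcome = np.asarray(outcome, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    # Shapiro-Wilk: a p-value above ALPHA means normality is not rejected.
    outcome_is_normal = stats.shapiro(outcome)[1] > ALPHA
    if not outcome_is_normal:
        r, p = stats.spearmanr(outcome, predictor)
        return "spearman", r, p
    if predictor_is_binary:
        r, p = stats.pointbiserialr(predictor, outcome)
        return "point-biserial", r, p
    r, p = stats.pearsonr(outcome, predictor)
    return "pearson", r, p


def strength(r):
    """Label a coefficient using the thresholds quoted in the text."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"


# Made-up illustration data, not study data.
near_normal = stats.norm.ppf(np.linspace(0.05, 0.95, 20))  # evenly spaced normal quantiles
skewed = np.array([0.0] * 15 + [10.0, 20.0, 30.0, 40.0, 50.0])  # many zeros, far from normal
binary = np.array([0, 1] * 10)  # a diagnosis coded as absent/present

print(correlate(near_normal, 2 * near_normal + 1))
print(correlate(near_normal, binary, predictor_is_binary=True))
print(correlate(skewed, np.arange(20)))
```

Scores with many zeros, such as walking tests for patients who cannot walk, would typically fail the normality test and fall through to the rank-based coefficient in this sketch.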

Predictive models for the discharge scores of each clinical outcome were developed using cross-validated Lasso regression [29]. Lasso regression is a type of linear regression that includes a regularization term. This term penalizes the model in proportion to the sum of the absolute values of its coefficients (the L1 norm), which shrinks some coefficients to exactly 0. Therefore, it encourages the development of simpler models (fewer predictors) and reduces the risk of overfitting [34,35,36,37]. The relative strength of the regularization is determined by the value of its parameter λ, wherein λ = 0 produces the same coefficients as linear regression and higher values of λ produce sparser models by forcing more coefficients to 0. In this study, we developed the prediction equations and evaluated their performance using a two-stage, nested, leave-one-out cross-validation (LOOCV) procedure [38, 39]. The outer LOOCV stage was used for evaluating the ability of the model to predict the outcome of a new patient, while the inner stage was used to optimize the parameter λ. In each iteration of the outer stage, the data were divided into train and test sets. Then, the train set was sent to the inner stage and divided again for optimizing λ. This procedure ensured that the test set was only used to evaluate the model’s performance and never used for development of the model or optimization of the λ parameter. To quantify the goodness-of-fit of each predictive model, we calculated the percentage of variance explained (\( R^2 \)) and the Mean Absolute Error (MAE). To evaluate model performance while accounting for the number of predictors, we also computed the adjusted \( R^2 \) (\( {R}_{adj}^2 \)). To compare model performance across the different dependent variables, we normalized the MAE of each model by the range of observed values (MAEn).
To evaluate the model’s ability to predict outcomes both for patients who experience small recovery and for patients who experience large recovery, we used Spearman’s rank correlation coefficient (rs) to calculate the correlation between the patient’s response to therapy (i.e., change in outcome from admission to discharge) and the model’s error.
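The nested procedure and the fit metrics described above might look like the following sketch using scikit-learn, where the regularization parameter λ is called `alpha`. Several things here are assumptions rather than details from the study: the function names, the use of `LassoCV` for the inner stage, the omission of predictor scaling (not described in the text), the standard definition of adjusted R², and the synthetic demo data.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut


def nested_loocv_lasso(X, y):
    """Predict each patient from a model that never saw that patient."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    predictions = np.empty_like(y)
    # Outer stage: hold out one patient at a time for evaluation.
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Inner stage: LOOCV on the training patients only selects lambda.
        model = LassoCV(cv=LeaveOneOut(), max_iter=10000)
        model.fit(X[train_idx], y[train_idx])
        predictions[test_idx] = model.predict(X[test_idx])
    return predictions


def fit_metrics(y_true, y_pred, n_predictors):
    """R^2, adjusted R^2 (standard definition), MAE, and range-normalized MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    mae = mean_absolute_error(y_true, y_pred)
    mae_n = mae / (y_true.max() - y_true.min())
    return {"r2": r2, "r2_adj": r2_adj, "mae": mae, "mae_n": mae_n}


# Synthetic demo: 20 hypothetical patients, 3 predictors, only the first matters.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=20)

pred_demo = nested_loocv_lasso(X_demo, y_demo)
metrics_demo = fit_metrics(y_demo, pred_demo, n_predictors=3)
print(metrics_demo)
```

Because λ is re-selected inside every outer iteration, the held-out patient influences neither the coefficients nor the choice of λ, which is the property the two-stage design is meant to guarantee.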

We applied permutation importance analysis based on a Random Forest model [28, 40] to measure the relative importance of the independent variables for each clinical outcome score. Relative importance was established from the contribution of each variable to reducing the prediction error. The permutation importance analysis assigned an importance score (IS) to each variable, ranging from 0 to 1. The relative importance (RI) of a predictor (%) was calculated by dividing the predictor’s score by the sum of all predictors’ scores, as follows:

$$ {RI}_{i,j}=\frac{{IS}_{i,j}}{\sum \limits_{k=1}^n{IS}_{k,j}} \tag{1} $$

where \( {RI}_{i,j} \) is the relative importance of predictor i to clinical outcome j; \( {IS}_{i,j} \) is the importance score of predictor i to clinical outcome j assigned by the Random Forest model; and n is the number of predictors for clinical outcome j. Only variables with \( {RI}_{i,j} > 0.01 \) were considered in the analysis.
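A minimal sketch of Eq. (1) is given below. It assumes scikit-learn's `RandomForestRegressor` and `permutation_importance`, which may differ from the implementation used in the study. Clipping negative scores to 0, scoring on the training data, the function name, the predictor names, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def relative_importance(X, y, names, seed=0):
    """Normalize permutation importance scores so they sum to 1, as in Eq. (1)."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    result = permutation_importance(forest, X, y, n_repeats=20, random_state=seed)
    # Permuting an uninformative variable can give a slightly negative score;
    # treating those as 0 is an assumption of this sketch.
    scores = np.clip(result.importances_mean, 0.0, None)
    ri = scores / scores.sum()
    return dict(zip(names, ri))


# Synthetic demo: the outcome depends almost entirely on the first predictor.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=40)
names_demo = ["admission_score", "age", "bmi"]  # hypothetical predictor names

ri_demo = relative_importance(X_demo, y_demo, names_demo)
# Keep only predictors above the 1% threshold mentioned in the text.
ri_kept = {name: value for name, value in ri_demo.items() if value > 0.01}
print(ri_kept)
```

Dividing by the sum puts the scores of every outcome on the same 0 to 1 scale, so the contribution of a predictor can be compared across the four models.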

Results

Summary statistics of the patient demographics, stroke characteristics, and clinical test scores are presented in Table 1. The scores of all four clinical outcome measures significantly improved from admission to discharge (p < 0.05). On average, from admission to discharge, FIM scores increased by 47.5% (26.6 points), walking speed from TMWT increased by 61.7% (0.29 m/s), walking endurance from SMWT increased by 82% (185 m), and BBS scores increased by 43% (9 points).

Table 1 Demographic information, stroke characteristics, and clinical tests of study participants (N = 50)

Correlations between clinical outcomes

The results show a strong correlation (0.61 < rs < 0.92) among all clinical outcomes both at admission and at discharge (Table 2). The strongest correlation was found between the TMWT and SMWT at admission (rs = 0.92). All correlations were significant (p < 0.05) and positive, such that higher scores in one test indicated higher scores in the other tests.

Table 2 Correlations between clinical test scores, at admission and discharge

Predictors of clinical outcomes at discharge

All clinical outcomes at discharge (FIM, TMWT, SMWT, BBS) were strongly correlated with the scores of the FIM, TMWT, SMWT, and BBS at admission (0.69 < rs < 0.88; p < 0.05), meaning that a high score in one clinical test at admission indicated high scores in all clinical tests at discharge. Time from stroke onset to admission marginally affected the BBS and TMWT (rs = − 0.24; 0.05 < p < 0.1), meaning that shorter time from stroke onset to admission indicated improved clinical outcomes at discharge. The FIM score was moderately correlated with the patient’s sex (rpb = 0.3; p < 0.05), with females having higher FIM scores at discharge, and with diagnoses of dysphagia at admission (rpb = 0.32; p < 0.05), where dysphagia was related to lower FIM scores at discharge. The BBS score was also moderately correlated with diagnoses of dysphagia (rs = 0.38; p < 0.05), where dysphagia was related to lower BBS scores at discharge. Finally, the patient’s age significantly affected the BBS score (rs = − 0.32; p < 0.05), and marginally affected the SMWT (rs = − 0.26; 0.05 < p < 0.1), where younger patients had greater SMWT and BBS scores at discharge.

Predictive equations for clinical outcomes at discharge

Predictive models for discharge scores of each clinical outcome were developed using cross-validated Lasso regression (Table 3). The resulting models explained 70–77% of the variance in discharge scores, and the average normalized error ranged from 10 to 13% for the study participants and from 13 to 15% for new patients. The generalizability of each model was evaluated using a two-stage nested LOOCV procedure, testing its ability to predict scores of patients that did not participate in the model’s development (Table 3). The LOOCV results show that the MAE increased by an average of 19% when predicting the outcomes of a new patient in comparison to the prediction error for the study’s participants. For predicting clinical outcomes of new patients, the average error was 9.5 points for the FIM model (range 0–23), 0.3 m/s for the TMWT model (range 0.01–0.9), 80.8 m for the SMWT model (range 7–256), and 7.4 points for the BBS model (range 0–23).

Table 3 Predictive models for the discharge clinical outcomes, including coefficients of each predictor and model goodness-of-fit (\( R^2 \), \( {R}_{adj}^2 \), MAE, and MAEn)

We used Spearman’s coefficient to measure the correlation between the patient response to therapy and the model’s error. The results show a weak (rs ≤ 0.3) and non-significant correlation (p > 0.05) for all clinical tests, though there is a trend of greater error for individuals with large change in clinical scores in the TMWT and SMWT (Fig. 1). Patients with a change of 0 in the TMWT and SMWT were unable to complete these tests at both Admission and Discharge due to insufficient ambulation ability. Average MAE for these patients was 0.16 ± 0.10 m/s in the TMWT (n = 7; Fig. 1b) and 80.7 ± 23.6 m in the SMWT (n = 3; Fig. 1c). On the other hand, some patients were unable to complete these tests at Admission but gained sufficient ambulation ability to attain a score at Discharge. Average MAE for these patients was 0.27 ± 0.25 m/s in the TMWT (n = 9; Fig. 1b) and 56.7 ± 32.9 m in the SMWT (n = 10; Fig. 1c).
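The comparison between response to therapy and model error reported above could be computed along the following lines. This is a hedged sketch: the function name and the three short arrays are made up for illustration and are not study data, and the subgroup masks simply encode "scored 0 at both assessments" and "scored 0 at admission only".

```python
import numpy as np
from scipy import stats


def error_vs_response(admission, discharge, predicted):
    """Correlate change in score with absolute prediction error, plus subgroup errors."""
    admission = np.asarray(admission, dtype=float)
    discharge = np.asarray(discharge, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    change = discharge - admission            # patient response to therapy
    abs_error = np.abs(discharge - predicted)  # per-patient model error
    rs, p = stats.spearmanr(change, abs_error)
    no_score = (admission == 0) & (discharge == 0)  # unable to walk at both assessments
    gained = (admission == 0) & (discharge > 0)      # gained walking ability by discharge
    return {
        "rs": rs,
        "p": p,
        "mae_no_score": abs_error[no_score].mean(),
        "mae_gained": abs_error[gained].mean(),
        "n_no_score": int(no_score.sum()),
        "n_gained": int(gained.sum()),
    }


# Made-up walking speeds in m/s for five hypothetical patients.
admission_demo = [0.0, 0.0, 0.2, 0.5, 0.8]
discharge_demo = [0.0, 0.4, 0.5, 0.9, 1.0]
predicted_demo = [0.1, 0.2, 0.5, 0.8, 1.0]

summary_demo = error_vs_response(admission_demo, discharge_demo, predicted_demo)
print(summary_demo)
```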

Fig. 1 Relationship between patient recovery and model performance. Spearman’s rank correlation between change in clinical score and mean absolute error (MAE) for the (a) FIM, (b) TMWT, (c) SMWT, and (d) BBS. Red circles represent patients who scored a 0 at both Admission and Discharge assessments (did not achieve sufficient ambulation ability to complete the test by the end of inpatient rehabilitation); yellow circles represent patients who scored a 0 at Admission but gained sufficient functional ability to complete the test at Discharge

The relative importance of the models’ predictors for each clinical outcome at discharge is illustrated graphically in Fig. 2. The most important predictor for the discharge score of the FIM, TMWT, and BBS was their own score at admission. The most important predictor for the SMWT at discharge was the TMWT score at admission. The scores of the clinical tests at admission contributed 80–90% of the relative importance, while demographics and stroke characteristics together contributed the remaining 10–20%.

Fig. 2 Relative importance of independent variables for discharge clinical outcomes. (a) FIM Discharge model; (b) TMWT Discharge model; (c) SMWT Discharge model; (d) BBS Discharge model. Within each model, predictors relating to clinical outcomes tests (FIM, TMWT, SMWT, BBS) are scores from those tests at admission. TSA = time from stroke onset to admission; EDU = education in years; BMI = Body Mass Index; HIS = Hispanic; SI = speech impairment; LI = language impairment; Age = patient’s age in years

Discussion

This study presents a machine learning approach for the prediction of clinical outcomes at discharge after inpatient stroke rehabilitation. The equations developed in this study considered scores of clinical tests at admission, patient demographics, and stroke characteristics as possible predictors, which explained 70–77% of the variance in clinical scores at discharge. The normalized errors ranged between 0.10 and 0.13 for the study’s patients and between 0.13 and 0.15 for new patients. The permutation analysis found that the most important variables for prediction of the discharge outcomes were the admission scores of the clinical tests. The importance of clinical test scores at admission for predicting discharge scores was also shown in previous studies focusing on prediction of FIM [10] and walking speed [14]. Our predictive equations may help clinicians estimate a trajectory of recovery for their patients during inpatient rehabilitation, using measures that are often available following admission. These results are especially relevant for rehabilitation programs similar to that of the current study (i.e., following the requirements of Medicare in terms of therapy types and dosage).

We investigated the correlation between the clinical outcomes and found that the TMWT and SMWT were strongly correlated (rs = 0.92), as previously observed by several studies for patients with stroke, spinal cord injury, and multiple sclerosis [41,42,43,44,45]. These correlations could explain why only one of the walking tests is included in the FIM, BBS, and TMWT models, since the Lasso regression tends to choose a single variable in a set of correlated predictors [29].

In the current study, apart from the admission scores, additional variables with at least 1% of relative importance for at least one clinical outcome included the time from stroke onset to admission, age, BMI, race, education, dysphasia, and language impairment. Each of these predictors was found to affect clinical outcomes in at least one previous study [10, 13, 14, 46, 47]. The contribution of the current study is in providing a more comprehensive investigation of the clinical tests and set of predictors, in which we found that the relative importance of these variables was much smaller (10–20%) than the importance of the scores of clinical tests at admission (80–90%).

The predictive equation for the FIM discharge score explained 76% of the variance. This model explained more variance than the models presented in all previous studies for predicting FIM at discharge [9, 13, 48], except Ferriero et al. [48] whose model explained 82% of the variance. However, the model of Ferriero et al. [48] included medical comorbidities and complications, which were not considered in the current study. The TMWT discharge predictive equation in the current study explained 70% of the variance, outperforming previous models [15, 19] except for Bland et al. [14] whose model explained 81% of the variance. The model in Bland et al. [14] might have explained more variance because it considered the FIM walk item, which focuses more on elements affecting gait velocity compared to the total FIM score used in the current study. To the best of our knowledge, the current study is the first to develop predictive models for the BBS or SMWT values at discharge.

We applied a machine learning approach to develop predictive models of clinical outcomes at hospital discharge (using cross-validated Lasso regression). Previous studies that predicted discharge scores of clinical outcomes used the p-value as a criterion for determining relative importance or selecting features [11, 13, 14, 19]. However, this criterion is prone to overfitting and may not select the most important features, especially in cases where the predictors are strongly correlated [21,22,23,24,25,26,27]. In the current study, the feature selection process was performed using the cross-validated Lasso regression, which includes a regularization mechanism (L1) to reduce the risk of overfitting. Since Lasso regression may rule out important variables due to collinearity with other variables, we investigated the relative importance of the independent variables using permutation importance analysis considering all independent variables. The importance of each variable was evaluated by its ability to reduce the error of the Random Forest model, which provides a more comprehensive, non-linear analysis of the relative contributions of each variable to the clinical outcome.

The ability to predict clinical outcomes during stroke rehabilitation remains a meaningful yet challenging task. Clinical test scores at discharge are informative when assessing the patient’s level of independence, ambulation, and risk of falling. Forecasting a patient’s discharge scores early in a rehabilitation program can help clinicians, patients, families, and insurance companies better prepare for the patient’s care needs after leaving the hospital (e.g., to plan discharge location such as skilled nursing facility or home, to estimate the level of assistance the patient will require, to order equipment such as a wheelchair or orthosis, or to evaluate the expected medical costs or insurance coverage). One of the ongoing disputes in the field is the “proportional recovery” rule in stroke recovery [49,50,51,52]. Assuming that most stroke patients follow the rule and recover approximately 70% of their functional loss, many studies have developed prediction models of stroke recovery based on admission data [51]. However, recent work has raised important questions regarding the validity of the proportional recovery rule, citing conditions for which models based on this rule might be over-optimistic [49, 50, 52]. In the current study, we tried to avoid this potential pitfall by directly predicting the scores of clinical outcomes at discharge instead of the relative changes in those scores. We acknowledge that our \( R^2 \) results might be over-optimistic and thus base our claims on the MAE results. Our models did not identify non-responders in the TMWT and SMWT (individuals who did not attain sufficient ambulation ability to complete these tests by hospital discharge), which is an important area of improvement for clinical prediction models.

Predicting clinical outcomes at the time of admission has been shown to improve therapy efficiency, increase therapists’ confidence, and help prepare for a probable discharge location [51, 53, 54]. However, the type of rehabilitation program or engagement of the patient could also affect the discharge outcomes. The rehabilitation program in this study is based on the requirements of Medicare, which drives the inpatient rehabilitation structure in the United States, and is expected to be similar to other national inpatient programs. Therefore, the results of this study should be relevant for other U.S. hospitals as well. Future work should consider including objective measures of the rehabilitation program and even measures of patient attitude or engagement during the rehabilitation process in order to further refine the model predictions and improve generalization to alternative rehabilitation programs.

Standard clinical tests alone may not have the prognostic resolution to determine later functional ability. Wearable sensors are an emerging technology that can allow precise, fine-scale measurement of biomechanical and physiological markers during rehabilitation [7, 55]. Such technologies may improve prediction of clinical outcomes by capturing objective, high-resolution data signatures of post-stroke impairment and informing efficient, patient-specific rehabilitation strategies [53]. However, because a sensor-based approach is still in a preliminary research phase and not yet readily available in clinical settings, the models presented in the current study could provide a practical, accessible tool for clinicians to estimate a patient’s recovery trajectory during inpatient rehabilitation.

Limitations

This study included a relatively small sample size of 50 patients from a single inpatient rehabilitation hospital, which may result in bias, overfitting, and limitations for generalization to other populations. To minimize the effect of small sample size and the potential for overfitting, we used Lasso regression [34,35,36,37]. Furthermore, the patients who participated in this study had a wide range of demographic characteristics and impairments at admission (Table 1), suggesting that there is moderate variation in the sample for generalization to new patients. Nevertheless, future research could expand the current study by predicting clinical outcomes using a larger sample size from different rehabilitation settings to increase generalizability. The current study included the four clinical outcomes that are highly recommended for evaluation of inpatient stroke rehabilitation by the American Physical Therapy Association [6]. However, additional recommended measures could include outcomes such as the Fugl-Meyer Assessment [56] and the Dynamic Gait Index [57], and future research could focus on their prediction.

Conclusions

We investigated the factors affecting clinical outcomes during inpatient stroke rehabilitation and developed predictive models for their scores at discharge.

All the measured outcomes (FIM, TMWT, SMWT, BBS) were strongly correlated with each other, with the highest correlation found between the TMWT and SMWT (rs = 0.92). The SMWT was not selected as a predictor in the models for the FIM, BBS, or TMWT. Therefore, while the SMWT contributes unique information regarding the patient’s walking endurance, it might be redundant with the TMWT for predicting walking speed (TMWT), balance (BBS), and overall disability (FIM).

The most influential factors for the outcome scores at discharge were the scores of the clinical tests at admission. Therefore, even if clinicians use only one clinical outcome in their evaluation (e.g., FIM), we recommend performing additional clinical tests at admission and using their scores as predictors.

The machine learning approach used in this study resulted in the development of predictive models with a relatively high percentage of explained variance in comparison to previous studies. Since this approach aims to avoid overfitting, we think these models could be used for other patients as well.