Introduction

Health related quality of life (HRQOL) refers to the way an individual feels and reacts to his/her health status affected by medical conditions1. Compared to quality of life that covers all aspects of well-beings of human life, HRQOL focuses more on well-beings related to health domains such as physical, functional and mental health and it has been regarded as an important outcome of treatments1,2.

Stroke remains a leading cause of long-term disability3. It has a wide-ranging impact not only on physical and daily function but also on HRQOL4. Most patients still suffered from deteriorated HRQOL even in the chronic phase of stroke, which makes HRQOL an crucial target for stroke rehabilitation4,5. To improve patients’ HRQOL, healthcare professionals have to provide rehabilitation interventions that are most effective for each patient based on his/her responses to that rehabilitation therapy. Building accurate prediction models for forecasting patients’ HRQOL improvements after rehabilitation interventions and identifying predictors relevant for HRQOL improvements in stroke patients are thus imperative for providing insights to healthcare professionals on making accurate clinical decision.

Machine learning (ML) has become a popular prediction analytic approach. Machine learning uses automatic computerized algorithms to discover patterns in the data and builds prediction models to forecast future events. Machine learning is particularly suitable for predicting health outcomes because it can process large volumes of data, analyze the complex relationship between various different features/variables and easily incorporate new variables into prediction models without re-adjusting the preprogrammed rules6. In addition, the feature selection procedure can be incorporated into machine learning procedures to help identify important predictors7. These advantages make machine learning a potentially ideal tool for realizing accurate outcome prediction in patient populations.

In stroke, machine learning has been primarily used for predicting motor and activities of daily living (ADL) recovery and has achieved an overall positive result8,9,10,11,12. However, to our knowledge, only one study to date has applied machine learning algorithms in predicting stroke-specific HRQOL recovery13. In that study, the authors incorporated six demographic factors into machine learning models and built a preliminary system to forecast HRQOL changes of chronic stroke patients. Small prediction errors (i.e., the root mean square errors) were found between the data derived from the prediction model and the actual data collected from the patient, suggesting that machine learning might be feasible for predicting HRQOL changes in chronic stroke patients13.

Despite this positive evidence, the previous study only included demographic attributes into the machine learning prediction model13; nevertheless, HRQOL has been shown to be affected by factors across multiple domains including demographic as well as health-related domains such as physical and functional domains4,5,14. Including only demographic attributes in the machine learning model may not be sufficient for optimizing prediction accuracy. In addition, the previous study only examined prediction errors (e.g., the mean squared error) of the machine learning model13. Important clinical performance metrics such as prediction accuracy and the ability of machine learning models to distinguish between responders and non-responders to rehabilitation interventions remain largely unexplored15. A comprehensive examination of machine learning prediction performance along with factors across health domains is required for determining the efficacy of machine learning on predicting HRQOL recovery of stroke patients after rehabilitation interventions.

Stroke sensorimotor rehabilitation interventions including the robot-assisted therapy (RT), mirror therapy (MT) and transcranial direct current stimulation (tDCS) have become popular approaches for improving stroke recovery in the recent decade. These three approaches (i.e., RT, MT and tDCS) use modern equipment/modalities (e.g., robotic arms, mirror boxes and electrical stimulators) to modulate peripheral and/or central sensorimotor systems (e.g., visuomotor and sensorimotor systems and cortical areas) to augment stroke recovery16,17,18. Several studies have demonstrated that these three sensorimotor interventions (i.e., RT, MT and tDCS) not only facilitated functional recovery but also improved participation and HRQOL in stroke patients19,20,21,22,23,24,25. The rationale of why these three sensorimotor interventions (i.e., RT, MT and tDCS) could improve HRQOL is that these interventions could reduce arm/hand impairment, restore arm/hand function, which would allow stroke patients to participate in daily activities and accomplish essential daily tasks19,20,21,22,23,24,25. Most daily tasks such as bathing, dressing, dining and grocery shopping all involve use of the arm/hand to manipulate objects to accomplish tasks. Good arm/hand function would lead to successful participation in daily tasks and subsequently may increase stroke patients’ subjective feeling of well-beings and satisfaction toward daily life19,20,21,22,23,24,25. Thus, these three interventions (i.e., RT, MT and tDCS) may have potentials to be incorporated into current clinical practice to facilitate not only functional recovery but also HRQOL in stroke patients. Machine learning may be a potentially useful tool for predicting HRQOL changes after these three interventions, which may help identify responders to these three interventions and facilitate clinical application6,7.

Therefore, the purpose of this study was to determine the performance of machine learning algorithms on predicting clinically significant HRQOL improvements of chronic stroke patients after stroke sensorimotor rehabilitation interventions including the RT, MT and tDCS. We examined the performance of five commonly used machine learning algorithms and identified important predictors for building machine learning prediction models.

Methods

Study design

This study was an observational cohort study that used secondary analysis of data from our previous randomized controlled or cluster-controlled trials and ongoing projects24,26,27,28. Data screening was done by three investigators (Liao WW, Wu CY and Hsieh YW). The three investigators determined the eligibility and completeness of the data. Patients that completed the interventions and outcome measurements at pre- and post-intervention were included for analysis.

Participants

One hundred and thirty-two chronic stroke patients (N = 132) were included. Participants were recruited from three hospitals in the northern part of Taiwan. Table 1 outlines the characteristics of participants. The inclusion criteria were (1) a first-ever unilateral ischemic or hemorrhagic stroke, (2) more than 6 months post stroke, (3) Fugl-Meyer assessment scale of upper extremity (FMA) scores between 18 and 60, suggesting mild to moderate arm hemiparesis29, (4) no excessive spasticity in upper limb joints (Modified Ashworth Scale, MAS ≤ 3)30, (5) ability to follow study instructions (Mini-Mental State Examination ≥ 22), and (6) no concomitant neurological disorders (e.g., brain tumor and dementia). The exclusion criteria were (1) participation in any drug or rehabilitation projects/experiments in the past 6 months, (2) had Botulinum toxin injections in the past 3 months, (3) severe vision or visual perception impairments (e.g., neglect and poor visual field) as assessed by the National Institutes of Health Stroke Subscale, and (4) any contradictions to non-invasive brain stimulation (for participants receiving tDCS)31. The institutional review boards of participating hospitals including the Linkou Chang Gung Memorial Hospital and Taipei Tzu Chi Hospital approved the trials. All participants provided written informed consents before enrolled into clinical trials. All study procedures were conducted in accordance with the Declaration of Helsinki.

Table 1 Clinical characteristics of participants.

Stroke sensorimotor rehabilitation interventions

All participants received interventions for 1.5 to 2 h per session with a total of approximately 30 h of training across 3 to 4 weeks. The frequency and duration of training were similar to those of most rehabilitation interventions studies19,20,21,22,23,24,25. Participants received interventions at the hospitals where they were recruited from. The interventions were administered by certified occupational therapists that were properly trained by the senior therapists and the principal investigators (Wu CY). Among these participants, 70 received RT, 32 received MT and 30 received tDCS/MT.

For the RT, participants practiced unilateral paretic movements by using the InMotion robotic systems (the InMotion ARM and InMotion WRIST)28. For the MT, participants imagined that the mirror reflection of the non-paretic arm was the paretic arm and performed bilateral movements as simultaneously as possible26,27. For the tDCS/MT, participants received 2 mA anodal tDCS on the ipsilesional primary motor cortex for 20 min followed by another 20 min of MT24. For all trainings (i.e., RT, MT and tDCS/MT), participants performed an additional 15–30 min of functional task training in each session. Participants were assessed within 1 week before and after interventions by the evaluators that were blinded to the study purpose and treatment allocation of participants.

Classification of HRQOL improvement

The Stroke Impact Scale (SIS) 3.0 was selected as the major outcome for classifying HRQOL improvements32. It is a self-reported questionnaire used for evaluating HRQOL in stroke patients. The reliability and validity of SIS have been well established33,34. The SIS consists of two parts. The first part is the main scale of SIS that assesses multidimensional HRQOL including the strength, hand function, ADL/instrumental ADL, mobility, communication, emotion, memory/thinking, and social participation32. It has 59 items and each item was scored subjectively by stroke patients based on the difficulty they perceived in that item during the past two weeks. The scores of each domain were transformed to a score out of 100 and the mean scores of all domains were used to represent multidimensional HRQOL of stroke patients35,36. The second part is a global rating scale that evaluates stroke patients’ self-perceived HRQOL recovery. The scores range from 0 (no recovery) to 100 (full recovery). Both parts were included in this study to comprehensively represent multi-dimensional HRQOL and global self-perceived HRQOL changes of chronic stroke patients.

To facilitate clinical use of our ML prediction models, the minimal clinical important differences (MCID) were selected as the criterion to classify participants into high and low responders. The MCID is the smallest change in scores that were considered clinically important and meaningful in health status perceived by the patient37. Previous studies have set the MCID as 10–15% of total scores in patient populations and received clinically beneficial results38,39,40. As a result, based on the literatures, we defined the MCID as 10% of changes in the scores of main SIS scale and global rating scale. Participants that had SIS change scores greater than or equal to 10 were classified as high responders and participants with SIS change scores less than 10 were classified as low responders to stroke sensorimotor rehabilitation interventions.

Candidate predictors

We selected thirty-two potential predictors based on stroke HRQOL literatures and the International Classification of Functioning, Disability and Health (ICF) framework to include “Body function and structures”, “Activity” and “Participation” attributes41,42,43. These predictors included (1) personal characteristic attributes: age, gender, education, time since stroke and side of hemiplegia, (2) baseline cognitive and motor function attributes: The Montreal Cognitive Assessment (MOCA) scores44, FMA proximal/distal/total scores45, Wolf Motor Function Test (time and functional ability scale) scores46, Medical Research Council (MRC) muscle strength scale scores (paretic shoulder, elbow, wrist, and finger muscle strength scores and MRC mean scores)47, MAS scores (paretic shoulder, elbow, forearm, wrist, finger scores and MAS mean scores)47, Motor Activity Log (MAL) amount of use (AOU) and quality of movement (QOM) scores48 and Box and Block test scores49, (3) baseline sensory function attributes: the revised Nottingham Sensory Assessment (RNSA) scores (paretic side tactile sensation, proprioception and stereognosis)50, (4) baseline ADL and instrumental ADL attributes: Functional Independence Measure (FIM) scores51 and the Nottingham Extended Activities of Daily Living (NEADL) scale scores52, and (5) baseline HRQOL attributes: the SIS mean scores and global rating scores32. These attributes are commonly used in research and clinical settings to represent the motor/sensory impairment, functional ability and participation of stroke patients, and therefore were selected as potential predictors41,42,43.

Machine learning algorithms

Five ML algorithms, which were the random forest (RF), k-nearest neighbors (KNN), artificial neural network (ANN), support vector machine (SVM), and logistic regression (LG) were used for developing prediction models. The RF uses the ensemble learning method for outcome prediction. It combines the results of multiple decision trees and generates a final overall result to augment prediction accuracy53. It is a flexible method that can be used in categorical and continuous data. In addition, the RF has a low probability of overfitting and is therefore suitable for use in the clinical setting53. The KNN is a distance-based method. It predicts that similar objects would exist in close proximity. As a result, it labels the class of the target based on the majority of classes of its surrounding k neighbors54. The KNN classification pattern is similar to the clinical decision-making process made by the clinicians/therapists, where similar treatments would be prescribed to patients with similar responses and characteristics55. The ANN is inspired by the neurological network of the brain56. It consists of several neurons/nodes in layers including the input, hidden and output layers. The input layer receives the data, and transfers it to the hidden layer, where the computations (e.g. the activation function) are primarily taken places. After the computations are done, the hidden layer generates the final output to the output layer and finishes the prediction model. The ANN can process complex health informatics data and is therefore a potentially useful tool for outcome prediction in stroke patients57. The feedforward back propagation method was used in this study. The SVM uses the binary classification technique, where it projects data onto a high-dimensional plane by using the kernel function first, and then finds the maximum-margin hyperplane that best separates data into two classes58. The SVM is efficient in high dimensional planes and suitable for modeling complicated medical data. The LG is a binary classification technique that uses the logistic sigmoid function to predict the probability of observed data that would belong to one of the two possible classes59. It is a commonly used algorithm for building prediction models in stroke patients.

These 5 ML algorithms were selected because they are widely used modeling techniques for outcome prediction in patients and have been shown to have good prediction performance6.

Feature selection procedure

The feature selection procedure was performed to remove redundant attributes and identify the essential ones for prediction accuracy7. A popular feature selection method called “information gain ratio” was implemented11. This feature selection method evaluates the influence (i.e., the information gain) of each attribute to the output classes (i.e., the SIS classes) using the ranker search method60,61,62. A higher gain ratio of the attribute indicates a greater contribution of this attribute to prediction accuracy62,63. In this study, attributes with gain ratio greater than zero were used for developing ML prediction models.

Model development and testing

Figure 1 shows the model development and testing process. Data were randomized and divided into a training data set (70%) and a test data set (30%)64. The training data set was used for training and developing the model. The test data set was used for final evaluation of model performance. The tenfold cross validation procedure was performed to train the models65. During the tenfold cross validation process, the training data set was split into 10 groups, where 9 of them were used for training the model while the remaining one was used for validating the model. This process was repeated until all groups of data had been trained and validated. After the model was built, the testing data set was entered into the model to assess the model performance.

Figure 1
figure 1

The flow chart of model development and validation process. Subject data were randomized into a training set and a test set. The training set was 70% of the data and the test set was 30% of the data. For the training data set, the tenfold cross validation procedure was used to train and build 5 machine learning models (i.e., the RF, KNN, ANN, SVM and LG) in which the data was randomly split into 10 groups (9 groups for training and 1 group for validation). The tenfold cross validation process repeated until all 10 groups of data were trained and validated. The tenfold cross validation process was performed for all 5 machine learning models. After the 5 models were built, the test data set was entered into the 5 models to determine the model performance.

The hyper-parameters of the prediction models were determined according to the procedures used in the ML literatures. For the RF model, the numbers of trees to build were 100 and the numbers of features to consider at a node were the first integer less than log2M + 1 (M is the number of inputs)66. For the KNN and ANN models, the tenfold cross validation was employed to tune the value of hyper-parameters (i.e., the k value of the KNN; the numbers of hidden neurons in the hidden layer of the ANN)10. We found that k = 5 and the numbers of hidden neurons = 3 in one hidden layer (the main SIS scale) and k = 9 and the numbers of hidden neurons = 2 in one hidden layer (the SIS global rating scale) had the best prediction accuracy. As a result, these hyper-parameters were used in the KNN and ANN models. For the SVM model, the polynomial kernel function was employed because it provided the best prediction accuracy58,59.

Model performance metrics

The performance of ML models was evaluated using the standard ML performance metrics including (1) accuracy, (2) recall, (3) precision, (4) F1 scores, and (5) area under the receiver operating characteristic curve (AUC-ROC)15. Accuracy is an overall index of prediction performance. Accuracy was computed as the sum of true positive (TP) and true negative (TN) divided by the sum of TP, TN, false positive (FP) and false negative (FN). Recall is the ratio of participants that were correctly identified as positive by the model to those whom were actually positive. Recall was computed as the TP divided by the sum of TP and FN. Precision is the ratio of participants that were correctly identified as positive by the model to those were labelled as positive by the model. Precision was calculated as TP divided by the sum of TP and FP. F1 scores is a combined index of precision and recall. It was calculated as the harmonic mean of precision and recall. The AUC-ROC is the ratio of area under the ROC curve to the total area. It represents the ability of the model to distinguish between classes.

Statistical analysis

The continuous variables were standardized and the categorical variables were coded before developing the ML models. The Waikato Environment for Knowledge Analysis (Weka) 3.8.3 developed by the University of Waikato, New Zeeland was employed for model development, training and testing67. The Weka has been extensively used for constructing ML prediction models in various fields and in different patient populations in the medical field11,68,69.

Results

Five most important attributes were identified by the feature selection procedure for the SIS HRQOL main scale, which were the baseline SIS mean scores (gain ratio = 0.1), baseline MRC finger metacarpophalangeal (MP) extensors (gain ratio = 0.15) and flexors (gain ratio = 0.06) scores, baseline MAS wrist flexors (gain ratio = 0.1) scores and baseline WMFT time (gain ratio = 0.14). The gain ratio of the other 31 attributes was 0. As a result, these 5 attributes were used for developing the SIS multidimensional HRQOL prediction model.

Four most important attributes were identified by the feature selection procedure for the SIS global rating scale, including age (gain ratio = 0.16), gender (gain ratio = 0.06), baseline RNSA stereognosis (gain ratio = 0.1) and proprioception (gain ratio = 0.07) scores. The gain ratio of the other 32 attributes was 0. Therefore, these 4 attributes were used for developing the stroke global self-perceived HRQOL recovery prediction model.

Table 2 summarizes the performance metrics of the five ML models. For the SIS multidimensional HRQOL scale, the prediction performance was the best in the RF model. The accuracy of the RF model was 85%, precision was 0.88, recall was 0.85 the F1 scores were 0.85 and the AUC-ROC was 0.86. The prediction performance was similar between the other 4 models (KNN, ANN, SVM and LG). The accuracy ranged from 72 to 75%, the precision was from 0.73 to 0.77, the recall was from 0.73 to 0.75, the F1 scores were from 0.72 to 0.75 and the AUC-ROC was from 0.71 to 0.87.

Table 2 Performance metrics of SIS prediction models.

For the SIS global rating scale, the prediction performance was the best and similar between the RF and KNN model. The accuracy of the RF model was 80%, the precision was 0.78, the recall was 0.8, the F1 scores were 0.78 and the AUC-ROC was 0.75. The accuracy of KNN model was 82.5%, the precision was 0.82, the recall was 0.83, the F1 scores were 0.81 and the ACU-ROC was 0.76. The prediction performance was similar between the other three models (ANN, SVM and LG). The accuracy was all 77.5%, the precision was from 0.77 to 0.78, the recall was all 0.78, the F1 scores were from 0.77 to 0.78 and the AUC-ROC was from 0.68 to 0.75.

Discussion

Our results demonstrated that machine learning could accurately predict HRQOL improvements after stroke sensorimotor rehabilitation interventions in chronic stroke patients. In particular, the RF and the KNN models had better performance than the other three algorithms (i.e., ANN, SVM and LG). The RF model had 85% accuracy on predicting multidimensional HRQOL changes and 80% accuracy on forecasting global self-perceived recovery. It could also accurately distinguish between high and low responders on the multidimensional HRQOL outcome with 86% chances and on the global self-perceived recovery with 75% chances. The KNN model had good prediction performance on global self-perceived recovery only, where it had 82.5% prediction accuracy and could distinguish between high and low responders with 76% chances. Furthermore, we identified important attributes for predicting multidimensional HRQOL improvements, which were the baseline HRQOL, baseline paretic finger muscle strength, wrist muscle tone and arm movement efficiency, and also key attributes for forecasting global self-perceived recovery including age, gender, baseline hand stereognosis and limb proprioception.

To our knowledge, this study is the first to comprehensively evaluate the performance of ML algorithms on predicting HRQOL improvements after stroke sensorimotor rehabilitation interventions in stroke patients. In addition, we identified the two more effective algorithms, which were the RF and KNN among 5 commonly used ML algorithms. The RF algorithm had good prediction performance on both multidimensional HRQOL and global self-perceived recovery. This good prediction performance may be contributed by the unique ensemble method that it employed in the modeling process. During the ensemble modeling process, the RF creates as many base models (i.e., the decision trees) as many as possible and combines these base models into a final one66. Therefore, instead of creating one model and hoping this model would be the best, the RF takes a myriad of multiple models into account to optimize its prediction accuracy. Furthermore, these base models (i.e., the decision trees) are designed to be uncorrelated with each other, thus reducing the probability of overfitting66. Our finding of the superior performance of RF was in line with several previous studies demonstrating that ML algorithms with ensemble methods such as the RF and the Adaboost algorithms outperformed other types of ML algorithms, for example the decision trees or LG63,70,71,72,73. Future studies could examine the performance of ML algorithms with different types of ensemble methods such as the boosting, bagging and stacking on HRQOL outcome prediction to determine the best one for use in stroke patients.

To our surprise, the KNN algorithm has similar and slightly better prediction performance than the RF algorithm on predicting global self-perceived HRQOL recovery in chronic stroke patients. Compared to the RF, the KNN is a simpler and more straightforward model because it is a distance-based method and does not involve ensemble learning processes54,55. Our study revealed that despite its simplicity, the KNN may be as powerful as the RF model when predicting a single-item outcome such as the SIS global self-perceived HRQOL recovery scale in stroke patients. In contrast, the KNN may not be the best algorithm for processing multidimensional health outcome data due to its weaker prediction performance on multidimensional than single item SIS HRQOL outcomes. As a result, we recommend using the KNN model for predicting simple HRQOL outcome changes in chronic stroke patients. Future studies could compare and contrast the performance of KNN on single domain and multidimensional health outcome data to validate findings of this study.

Our study showed that 5 attributes related to initial HRQOL, muscle function (the muscle strength and muscle tone) and movement efficiency were important predictors for forecasting multidimensional HRQOL improvements after sensorimotor rehabilitation interventions. Indeed, studies have found that baseline HRQOL was associated with HRQOL restoration in stroke patients74,75. It is thus not surprising to find baseline SIS scores important for HRQOL improvements after interventions. In addition, studies have shown that paretic arm/hand muscle strength and muscle tone significantly affected functional recovery after stroke43,76,77. The movement efficiency of the paretic arm was also associated with HRQOL recovery post stroke5. Similarly, in the present study, we found that muscle function (i.e., baseline MRC finger MP extensors/flexors, MAS wrist flexors) and the movement efficiency (i.e., baseline WMFT time scores) of the arm associated with prediction of HRQOL improvements. However, compared to previous studies, we further identified the most important components, which were the muscle function of “the wrist/hand” and the movement efficiency of “the whole arm”. These components are highly involved in daily routines. For example, stroke patients have to be able to appropriately open/close their hands, grasp/release objects and move their arms efficiently in time to perform most essential daily tasks such as dressing, bathing and cooking. The inability to accomplish essential daily tasks may result in a source of distress and consequently affect participation and life satisfaction after stroke78. Therefore, even though muscle function and movement efficiency are commonly regarded as the basic level (i.e., body functions and structures) of the ICF model, they could substantially affect the multi-dimensional HRQOL recovery after stroke sensorimotor interventions and therefore should be considered when assigning these interventions to stroke patients.

In addition, our study also found another four attributes imperative for forecasting global self-perceived HRQOL recovery, which were age, gender and the paretic hand stereognosis and limb proprioception. Age and gender have been demonstrated to be associated with HRQOL recovery post stroke in previous studies5,79. Our results support theses previous findings and further suggest that additional somatosensory impairments, especially the sensory components including the hand stereognosis and limb proprioception are also important for predicting global self-perceived HRQOL recovery. Indeed, reduced sensation has been found to be related to slower recovery, decreased motor function (e.g., motor control and activity) and lesser rehabilitation outcomes43,80,81. Sensory deficits, such as impaired hand stereognosis and limb proprioception may cause difficulties in sensing arm/hand position and recognizing objects in the hand and hard to control arm/hand movements. This may result in decreased confidence in using the arm/hand in daily activities, fear of safety, thus affecting stroke patients’ participation and satisfaction in daily life80,81,82. Our results were also in line with one previous study that found paretic limb proprioception associated with reduced HRQOL and increased feeling of social isolation in stroke patients14. Nonetheless, most studies did not assess the contributions of sensory impairments on HRQOL outcome prediction after rehabilitation interventions in stroke patients83. Our study suggested that sensory impairment may contribute to global self-perceived HRQOL recovery post stroke intervention and could be considered in the data collection and HRQOL model building process.

Taken together, we found that attributes in the three major categories: the personal characteristics (i.e., age and gender), initial life satisfaction (baseline SIS mean scores) and arm/hand sensorimotor components (i.e., muscle function, movement efficiency, hand stereognosis and limb proprioception) were important predictors for HRQOL restoration after stroke sensorimotor rehabilitation. Based on our findings, healthcare professionals could at least assess these attributes in the three categories before assigning the three sensorimotor interventions to stroke patients. In addition, these attributes may have potential to be used as indicators to determine which stroke patient may be suitable for receiving stroke sensorimotor rehabilitation interventions to improve HRQOL. This may help to improve clinical rehabilitation efficacy and potentially save workload in the hospitals/clinics.

Study limitations

Six limitations should be considered. First, our HRQOL outcome prediction was focused on stroke sensorimotor rehabilitation interventions that share similar rehabilitation principles. Future studies could examine whether the identified predictors, such as the sensorimotor components could generalize to HRQOL outcome prediction in other types of rehabilitation interventions. Second, our predictions were based on changes immediately after interventions. Future studies could investigate the prediction performance of ML algorithms in the follow-up period. This will help identify patients that could retain improvements after sensorimotor interventions. Third, we examined 5 commonly used ML algorithms. Future studies could evaluate the performance of other types of ML or deep learning algorithms such as those with ensemble methods (e.g., the Adaboost)84 or the deep learning algorithms (e.g., the deep neural network)85 and compare the results to our findings to determine the optimal method for predicting HRQOL restoration after rehabilitation interventions in stroke patients. Fourth, this study included thirty-two potential predictors based on the results of previous prediction model studies. Nevertheless, some stroke-related factors were not included, for example, the size/severity of lesion, due to the incapability to retrieve those data from some of our patients. Future studies could examine if these stroke related factors would also be important predictors for forecasting HRQOL recovery in chronic stroke patients, in addition the predictors identified in this study. Fifth, we used SIS to assess multi-dimensional HRQOL and self-perceived global HRQOL changes in stroke patients. Although SIS is a widely used HRQOL assessment in stroke rehabilitation field, it is an ordinal questionnaire that may still have measurement errors. In addition, the SIS may not cover all aspects of HRQOL related factors such as environmental factors (e.g., the context and time), personal factors (e.g., personality) and social indicators (e.g., economic status). We encourage future researchers to include the above factors into machine learning prediction models and examine whether inclusion of these additional factors would optimize prediction accuracy on HRQOL. Sixth, the machine learning models built in this study were preliminary models that focused on predicting HRQOL recovery. Therefore, these models should not be used for excluding patients from receiving the three stroke sensorimotor interventions at this moment. Future studies could include more participants and develop machine learning models that comprehensively predict motor, functional and HRQOL recovery in stroke patients.

Conclusion

Machine learning may predict clinically significant HRQOL improvements after stroke sensorimotor rehabilitation interventions in chronic stroke patients. In particular, the RF and the KNN algorithms may be more effective than the other 3 ML algorithms (DT, LG and SVM) and could be considered for use in clinical settings. We suggest including at least three categories of predictors, which are age/gender, initial HRQOL and sensorimotor components including sensory function, muscle function and movement time into the ML HRQOL prediction model for optimizing prediction accuracy. Future studies with a different sample of stroke patients are warranted to validate our findings and improve model generalizability.