Introduction

Essential tremor (ET) and Parkinson's disease (PD) are two common movement disorders in older populations [1,2,3] and have some overlapping clinical features, e.g., rest and postural tremors, making it difficult to differentiate ET and PD in their early stages [4,5,6,7].

Early-stage PD and ET can be distinguished by the following clinical symptoms: (1) Tremor features: rest tremor is usually an early sign in PD, while action tremor is usually an early sign of ET, and rest tremor may be present years after disease onset in ET. Re-emergent tremor can be present in PD but absent in ET. (2) Bradykinesia: bradykinesia is the prerequisite for PD diagnosis, and bradykinesia in patients with PD often manifests as poor hand flexibility in the early stages. However, ET is not usually associated with bradykinesia. (3) Rigidity: PD is associated with rigidity, whereas ET usually is not. (4) Gait: early-stage PD may manifest as reduced stride length, slow speed, poor symmetry, and reduced amplitude of arm swing on the affected side. However, gait disorder is usually not present in ET [8].

However, despite the differences in the above symptoms between PD and ET, it is still challenging to distinguish early-stage PD from ET. First, rest tremor, postural tremor, or action tremor can be present in both ET and PD patients at the early stage. Second, some PD patients are only with tremor and other motor symptoms are very mild or not obvious, which are difficult to be detected by subjective observation and physical examination. Therefore, in recent years, some researchers have begun to use devices to quantitatively identify the above-mentioned motor symptoms, such as bradykinesia, tremor, and gait, hoping to achieve early diagnosis of PD.

In previous studies, researchers have mainly focused on the characteristics of the patient’s tremors to differentiate the diseases [9,10,11,12]. Long-term EMG recordings with/without combined accelerometers were used to differentiate the two disorders [9, 11]. EMG analysis might help differentiate the two disorders, but it is an invasive examination and is limited for application when patients have mixed types of tremors.

Gait impairment and bradykinesia have been reported in early-stage PD [13]. However, it is not easy to distinguish the gait impairment of early-stage PD from ET by subjective assessment, since these symptoms are minor. Therefore, advanced technologies, such as wearable sensors and motion capture systems, have been used for clinical differentiation of the diseases according to gait and balance parameters [14, 15]. Moon et al. [14] obtained some gait and balance characteristics from inertial measurement unit (IMU) sensors during the instrumented stand and walk test to discriminate PD and ET (average durations (years), PD: 8.2, ET: 13.83). The results showed that the cross-validation F1 score was 0.61 for the best model. However, it was not clear if this classification system could be applied to differentiate early-stage PD from ET. The Time Up and Go (TUG) test has been widely used as an assessment for gait and balance problems in movement disorders, including PD and ET [16, 17], and its use in combination with IMU sensors was recommended for capturing the raw kinematic signal to quantitatively analyze the gait [18, 19].

Segmentation of the TUG test into phases provides additional parameters [20] that might be helpful for differentiating PD from ET. Therefore, we separated the entire TUG test into four components (standing, straight walk, turning, and sitting) and integrated them into a final weighted average ensemble classification model. In our diagnostic study, we primarily examined whether wearable sensor-based gait and postural transition parameters obtained from the TUG test could be used as input features in machine learning algorithms to differentiate early-stage PD from ET.

Materials and methods

Participants

This study was approved by the Ethics Committee of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. Written informed consent was obtained from all the participants. Eighty-four subjects with PD (age: 58.13 ± 10.43) and eighty age-matched ET subjects (age: 58.7 ± 13.9) participated in this study at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine between October 2019 and November 2021. PD and ET subjects were diagnosed by two movement disorder specialists according to the Movement Disorder Society (MDS) PD criteria [21] and ET criteria [22]. Only early-stage PD [Hoehn and Yahr (H&Y) stage 1–1.5)] [23] and ET patients who had limb tremor symptoms were recruited into this study. The exclusion criteria were as follows: (1) a history of cerebrovascular disease (e.g., infarction, hemorrhage), brain tumor, head trauma or any psychiatric disorders; (2) a history of medication known to cause parkinsonism or affect clinical assessment; (3) orthopedic impairment or other disease that likely contributed significantly to gait disturbance; (4) MMSE < 24 or cognitive disorder that likely contributed significantly to gait disturbance; and (5) participants who had both PD and ET. The demographic data and clinical characteristics of participants are provided in Table 1.

Table 1 Demographic characteristics of the early-stage PD subjects and ET subjects

Protocol and materials

A wearable motion and gait quantification assessment system, MATRIX (GYENNO SCIENCE, Shenzhen, China), which is commercially available, was utilized in this study [24]. It is approved by the National Medical Products Administration (NMPA), U.S. Food and Drug Administration (FDA), and Conformitè Europëenne Medical (CE Medical). All participants were equipped with 10 IMU sensors, with a sampling rate of 100 Hz (Fig. 1A). Each IMU provided inertial sensing results via a (1) tri-axial accelerometer (range = ± 16 g, sensitivity = 16,384 LSB/g) and a (2) tri-axial gyroscope (range =  ± 2000 dps, sensitivity = 131 LSB/dps). Two hand sensors were bilaterally placed on the dorsal side of the wrist. The chest sensor was placed on the sternum of the chest, and the waist sensor was attached to the fifth lumbar vertebra. Two thigh sensors were bilaterally placed 7 cm above the knee, while two shank sensors were bilaterally placed 7 cm below the knee. Two-foot sensors were bilaterally placed at the instep (dorsal side of the metatarsus) of each foot. All sensors were tightened to designated locations by straps (Fig. 1B). The TUG test was performed (Fig. 1C). During the TUG test, participants were instructed to stand up from a chair, walk in a straight line for 5 m at a comfortable speed, turn 180° around at the end of the 5 m marker, walk back to the start point, turn 180° around in front of the chair and sit down on the chair. The raw kinematical signals of participants during TUG tests were captured using the ten wearable sensors in real time and were transmitted to the host computer via a Bluetooth link for further analysis. A total of 184 gait and postural transition parameters (Appendix A, Table 4) based on the raw kinematical signals were calculated automatically by our prebuilt MATLAB algorithm [24]. An introduction of the gait cycles and segmentation of the TUG test is shown in Appendix C.

Fig. 1
figure 1

A Sensor overview. B Sensor locations. C TUG test procedure

Data analysis

Data were split into a training dataset and testing dataset

The entire dataset included 164 recordings (PD: 84, ET: 80), of which 80% of the recordings (PD: 67, ET: 64) were used for training, whereas the remaining 20% (ET: 13, PD: 20) were used for independent testing. In the training dataset, we ensured that the age, sex, and height between the PD and ET groups were matched. Leave-one-out cross-validation (LOOCV) was performed to fine-tune the model parameters in the training data. The final model was selected among different candidate models based on model LOOCV performance. Independent testing was used to provide an unbiased evaluation of the final model (Fig. 2).

Fig. 2
figure 2

Model training and independent test data evaluation

Weighted average ensemble classification model construction

Several feature selection methods were tried (Appendix D) and generated three different feature groups, FG I, FG II and FG III. We trained support vector machine (SVM) and random forest (RF) models on the training subset with features from different feature groups. For each feature group, we trained the models for four components, straight walk, sitting, standing and turning, with corresponding component features separately using LOOCV and obtained 4 probabilities of having PD for each subject. These probabilities were used as input variables for building logistic regression models. The logistic regression coefficients were combined into weights used in a linear combination of the previous 4 probabilities, resulting in an ensemble learning prediction probability score (\(P\_{\text{val}}\)) for each subject in the training set. If \(P\_{\text{val}}\) was > 0.5, the subject would be classified as having PD using our model; otherwise, they would be classified as having ET. This kind of model was called the weighted average ensemble classification model. We selected the weighted average ensemble classification model, which had the best model performance among all models, as our final model. Independent test data were used to evaluate the final model. We calculated the ensemble learning prediction probability score of each subject in the test data \(P\) by multiplying the weights obtained during the above training process with the test data predicted scores obtained from four different component models. If P was > 0.5, the subject in the test data would be classified as having PD using our model, otherwise they would be classified as having ET (details about ensemble classification model construction are in Appendix E).

Performance evaluation

The classification models were evaluated with accuracy, kappa, sensitivity, specificity, and AUC. An ROC curve is a graph showing the classification model performance at all different classification thresholds. AUC is the area under the ROC curve. In our case, we set PD as a positive case and ET as a negative case. The accuracy, sensitivity, specificity, and kappa [25] were calculated as follows (TP = true positive, TN = true negative, FP = false positive, FN = false negative):

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
$${\text{Kappa}} = \frac{{2 \times ({\text{TP}} \times {\text{TN}} - {\text{FN}} \times {\text{FP}})}}{{({\text{TP}} + {\text{FP}}) \times ({\text{FP}} + {\text{TN}}) + ({\text{TP}} + {\text{FN}}) \times ({\text{FN}} + {\text{TN}})}}$$
$${\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
$${\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}}$$

Results

Feature comparisons of gait and transition parameters between early-stage PD and ET

Sixty-six out of 214 features were significantly different between early PD and ET (Table 2). Importantly, we found that some feature parameters that differed between early PD and ET were consistent with clinical observations. For example, the Arm–Symbolic Symmetry Index was higher in PD than in ET by approximately 16.0%. This parameter was used to describe the symmetry of the arms’ movements during the TUG test. The lower the parameter is, the better the symmetry of the arms is. Our results showed that Arm–Symbolic Symmetry Index was higher in PD than in ET, indicating that PD presents with worse arm symmetry than ET. This finding was consistent with the clinical cases in which early PD usually reduces the arm swing range on the affected side, resulting in bilateral asymmetry, while ET arm swing was not affected. Stand To Sit–Trunk–Min Lean Angle was lower in PD than in ET by approximately 5.38%. It was defined as the sagittal projection of the maximum backward tilt angle during the sitting process (backward: positive value, forward: negative value). The higher the parameter was, the larger backward tilt the participant had during the sitting process. Our results showed that Stand To Sit–Trunk–Min Lean Angle was lower in PD than in ET, indicating that PD has a smaller backward tilt range than ET. This finding was consistent with the clinical cases that PD usually has a smaller range of motion. Sit To Stand–Trunk–Max Sagittal Angular Velocity was smaller in PD than in ET by approximately 18.9%. This parameter is defined as the absolute value of the sagittal projection of the maximum angular velocity of the trunk during the standing process. The higher the parameter is, the faster the participants stand from the chair. Our results showed that Sit To Stand–Trunk–Max Sagittal Angular Velocity was smaller in PD than ET, indicating that PD stand slower from the chair during TUG testing compared to ET on average. The 180° Turn–Max Angular Velocity was smaller in PD than in ET by approximately 13.0%. This parameter is defined as the maximum value of angular velocity during the turning process. The higher the value is, the faster the participants turn. Our results showed that the 180° Turn–Max Angular Velocity was smaller in PD than in ET, indicating that PD turned slower than ET on average. These two features, Sit To Stand–Trunk–Max Sagittal Angular Velocity and 180° Turn–Max Angular Velocity, were consistent with the clinical cases in which bradykinesia was the main symptom in patients with PD. Although these parameters differ between early PD and ET, box plots of these four features showed overlaps between early PD and ET (Fig. 3), so the combination of more features was needed to achieve a more accurate discrimination.

Table 2 Significant gait and postural transition features obtained from the TUG test
Fig. 3
figure 3

Representative feature comparison between early-stage PD and ET. The box plot represents the following data: the central line represents the median, the top and bottom line of the box represents the 75th quantile (Q3) and 25th quantile (Q1), the top and bottom of the error bar indicates the “Maximum” (Q3 + 1.5 × (Q3 − Q1)) and “Minimum” (Q1 − 1.5 × (Q3 − Q1)), dots represent outliers (outside the “Maximum” and “Minimum”). p: P value was estimated using the Mann‒Whitney U test for exploring feature discrimination ability between the PD and ET groups

Feature selection

Several feature selection methods were tried in the training data, and the feature combinations were organized into 3 groups based on the feature selection method results.

Method 1: Sixty-six out of 214 features were significantly different between early PD and ET (Table 2); thus, FG I contained 66 features. In addition, 56, 6, 2, and 2 features were obtained from the straight walk, turning, sitting, and standing components, respectively.

Method 2: 34 out of 66 features were kept (eTable 1 in the Online Resource); thus, FG II contained 34 features. In addition, 29, 3, 1, and 1 were obtained from the straight walk, turning, sitting, and standing components, respectively.

Method 3: Out of the 66 significant features, 33 individual features were most discriminative in differentiating early PD from ET, with a fivefold cross-validation ROC AUC ≥ 0.6 (Table 2, above the blue line); thus, FG III contained 33 features. Among these 33 most discriminative features in differentiating early PD from ET, 25 were obtained from the straight walk component, and 5, 2, and 1 were obtained from the turning, sitting, and standing components, respectively.

Model LOOCV performance comparisons

The segmented model LOOCV performance results and weights of four different components (straight walk, turning, standing, and sitting) in all model (SVM or RF) and feature group (FG I, II, III) combinations are shown in Appendix B Table 5. We found that straight walk achieved the highest weights among the other components in all model and feature group combinations. An example of how to calculate the ensemble learning prediction probability score is shown in Appendix F.

The average ensemble classification model validation performance result was obtained as mentioned in the Methods. The validation results of SVM and RF with FG I, II, and III are shown in Table 3. SVM with FG III outperformed all the other models and feature group combinations [accuracy: 84%, kappa: 0.68, sensitivity: 85.9%, specificity: 82.1%, AUC: 0.912]. Therefore, SVM with FG III was selected as our final weighted average ensemble classification model. Corresponding to this best ensemble model, the validation accuracy of the straight walk component (25 features) for discriminating early PD and ET was 78.6%, which achieved the highest accuracy among all four components since turning (5 features), standing (1 feature), and sitting (2 features) were 67.9%, 76.3% and 61.1%, respectively. Most of the information came from the straight walk component, such as bradykinesia, arm swing range, arm symmetry, etc. This may explain why such a component makes the main contribution to the classification process.

Table 3 LOOCV performance of the weighted average ensemble classification models

Independent clinical evaluation

In our study, the entire dataset had 164 recordings (PD: 84, ET: 80), in which 80% of the recordings were used for training, whereas the remaining 20% were used for independent testing. The selected final weighted average ensemble classification model (SVM with FG III) was evaluated on these 20% recordings (13 ET, 20 PD). The test data performance of the Weight Average Ensemble Classification Model was as follows: accuracy: 75.8%, kappa: 0.492, sensitivity: 80%, specificity: 69.2%, and AUC: 0.823.

Discussion

The weighted average ensemble classification model with the basic SVM model and FG III outperformed the other ensemble classification models. Our final ensemble model was evaluated in independent test data and achieved 75.8% accuracy in discriminating between early-stage PD and ET.

Consistency of discriminative parameters and clinical manifestations

The 33 most classifying gait parameters and postural transition parameters (Table 2, above the blue line), such as Arm–Symbolic Symmetry Index, Stand To Sit–Trunk–Min Lean Angle, Sit To Stand–Trunk–Max Sagittal Angular Velocity, and 180° Turn–Max Angular Velocity, were consistent with clinical manifestations. PD patients are characterized by rest tremor, bradykinesia, rigidity and postural instability [26]. They showed speed slow-down and amplitude reduction in turning, arm swing, cadence, and trunk rotation compared with ET [15, 27]. When PD patients were at an early stage, motor symptom asymmetry was especially prominent. Thus, slow velocity of turning/sit/stand, increased arm swing asymmetry, and reduced amplitude of trunk rotation were typical clinical features of early-PD patients [28]. Previously, PD was differentiated from ET simply by physical examination and clinical experience. To our surprise, wearable sensors make it possible to sensitively detect these differential characteristics in early-stage PD patients. Moreover, wearable sensors also provide richer multidimensional information than subjective assessment, which greatly improves the accuracy of differential diagnosis.

Bradykinesia-related gait parameters correlated with MDS-UPDRS bradykinesia scores

In our study, the bradykinesia scores were calculated based on the MDS-UPDRS motor score according to previous criteria (sum of items 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, and 3.14) [29]. The results showed that several gait parameters that could reflect bradykinesia during gait performance were associated with clinically subjective assessed bradykinesia scores. For example, 180° Turn–Mean Angular Velocity (r = -0.44, p = 0.00027), Sit To Stand–Trunk–Max Sagittal Angular Velocity (r = − 0.41, p = 0.0008) and 180° Turn–Max Angular Velocity (r = − 0.33, p = 0.008) had a negative correlation with bradykinesia scores, while 180° Turn–Duration had positive correlation (r = 0.44, p = 0.00027) with bradykinesia scores. Bradykinesia is the main symptom in PD. Bradykinesia is usually measured according to the UPDRS part III (motor section), but such measurement suffers from low reliability [30, 31]. Interestingly, our study revealed that bradykinesia manifestation could also be presented by our gait analysis system via velocity and time duration parameters. These bradykinesia-related gait parameters could also be used to differentiate between early PD and ET (Table 2, above the blue line).

The advantage of our study

First, our study separated the entire TUG test into four components and integrated them into a final weighted average ensemble classification model with different weights. The key reason for this separation was considering the different levels of sensitivity to PD/ET for the four components, which should correspond to different weights while building the ensemble classification model. Second, to evaluate our model performance, we kept 20% of our whole dataset as our independent test data. Third, unlike previous studies that focused on PD with H&Y 1–4 [11], we included PD subjects with H&Y 1–1.5, and both PD and ET patients had limb tremor symptoms, which was more meaningful and challenging for the differentiation between early PD and ET in clinical practice. Fourth, our dataset and final model are highly stable. We added a process at the end to verify the stability of our data and the selected final ensemble model (eDiscussion in the Online Resource).

Comparison between our model and other methods

Other studies have used EMG or sensors to distinguish between PD and ET. Ghassemi et al. [9] utilized features that were extracted from the tremor component of the hand movement signal obtained from EMG and accelerometer while participants performed standardized upper extremity movement tests to distinguish PD from ET (13 PD and 11 ET) and achieved a LOOCV accuracy of 83%. Although Ghassemi et al.’s study was comparable in accuracy to our study (LOOCV accuracy of 84%), the EMG they utilized is an invasive examination, and when participants had mixed types of tremors, the technology’s applications were limited. Moon et al. [14] utilized gait measures collected from wearable sensors combined with machine learning methods to distinguish PD from ET, which is similar to our research. However, the F1 score of their best model was 0.61. To make the comparison to their study, we calculated the F1 score based on our model and found that the F1 score of our best model was 0.84 for LOOCV and 0.8 for independent test data, suggesting that the accuracy of our model is better than theirs. Additionally, the average disease duration for PD in their study was 8.2 years; therefore, it was not clear if their study could be applied to discriminate between early-stage PD and ET. However, the differential diagnosis of early-stage PD and ET is exactly the clinical challenge, and that is what we have worked on solving.

Limitations and future study

Our proposed early-stage PD and ET classification model (LOOCV accuracy: 84%, independent test accuracy: 75.8%) is good with proven feasibility and potential to differentiate early-stage PD from ET, but it is not outstanding. Some limitations and future extension need to be considered to make the current study better. First, there are individual differences in gait and postural parameters even in participants who have the same disease. To better represent the population data, future research should include more participants to make the samples more representative and the research more reliable and accurate. Second, in addition to the basic models SVM and RF, other models should also be taken into consideration, which may make the classification framework more accurate. Third, the assessments from our current study were all performed in the clinic with a gait quantitative evaluation system. It is affordable and readily available for the clinic; however, it may not be suitable for personal use at home. In addition to gait parameters obtained from a short and standard test such as the TUG test in the clinic, future work is required to extend this approach to real-world gait assessments with more flexible devices and continuous monitoring and compare it with our current classification framework. Fourth, in addition to gait features, other measurements, such as the measurement of hand tremor variation acquired through tremor signal analysis from rest, postural and kinetic tasks [32] and the measurement of the temperature of participants before and after cold stimuli acquired through the cold stress test [33], could be integrated into our current study. Fifth, although no participants complained about the number of sensors that they needed to wear during the test, sensor number minimization can be considered in our future work and compared with our study to see if we could balance the number of sensors and the overall accuracy of our classification model.

Conclusion

Our study showed that simple wearable sensors combined with machine learning algorithms and instrumented TUG test had the potential to differentiate early-stage PD from ET.