Introduction

Traumatic cervical spinal cord injury (TCSCI) is associated with high lethality and disability1,2, seriously affecting the quality of life and psychological health of the patients and imposing a huge economic burden on the families and society3,4. Spinal cord injury (SCI) affects conduction of sensory and motor signals across the site of lesion, as well as the autonomic nervous system5,6. After TCSCI, not only the cervical nerve roots that innervate the upper limb muscles may be damaged, but also nerve fibers passing through the cervical SCI site and below the injury may be affected, resulting in motor dysfunction of the lower limb muscles innervated by the lumbar nerves7.

The International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) is the international standard for characterizing sensorimotor impairments after SCI; it describes both the examination and the classification, including the American Spinal Injury Association (ASIA) Impairment Scale5,6,8. Within it, the ASIA motor score (AMS), which comprises a separate upper-extremity motor score (UEMS) and lower-extremity motor score (LEMS), is used to track changes in motor function more clearly and to evaluate the degree of motor impairment of the key muscle groups corresponding to the cord segments5,9,10. Accurate prediction of motor function recovery in TCSCI is crucial for developing targeted diagnosis, care and rehabilitation strategies in the very early stage of injury: it helps clinicians design treatment courses, set reasonable rehabilitation goals, and establish objective prognostic expectations for patients and their families9,10,11. In addition, accurate prediction facilitates resource allocation and long-term care planning, ultimately contributing to better therapeutic outcomes, including post-injury quality of life and mental health9.

However, given the heterogeneity of the injury and the inconsistency among individual studies, prediction of motor function recovery after TCSCI is often complex9,10,12. One of the most difficult tasks that clinicians face is discussing neurological recovery and prognosis with patients and/or families13. Most current studies classify TCSCI patients into 5 levels according to the initial ASIA Impairment Scale to predict motor function recovery9,14,15, which is a relatively crude method. To date, no study has achieved accurate prediction of the UEMS and LEMS for all key muscles after TCSCI.

In recent years, machine learning and ensemble learning have gradually been applied clinically16,17,18,19,20, providing new ideas for predicting motor function recovery in TCSCI. In this study, we developed a nested ensemble algorithm that predicts motor function recovery from the initial AMS (including UEMS and LEMS) of the key muscle groups in TCSCI patients. It uses the concepts of "wisdom of the crowds" and "elites" to interpret the unbalanced and complex clinical data18.

Materials and methods

Data collection and processing

Data collection and processing largely determine predictive power, so the process used in this study is described in detail below. Inclusion criteria were patients aged ≥ 18 years with a clear diagnosis of TCSCI who underwent AMS (including UEMS and LEMS) assessment at ≤ 24 h post-injury and again at 6 months post-injury. Exclusion criteria were patients without TCSCI, patients with other neurologic disorders or other past histories that may affect motor function, patients with histories of surgery that may affect the recovery of motor function, and patients with serious underlying medical conditions.

A total of 315 TCSCI patients treated at our institution from 2015 to 2023 were retrospectively included according to the above inclusion and exclusion criteria, and patient information was de-identified under strict ethical guidelines. This study was approved by the Ethics Committee of Shanghai Changzheng Hospital, and written informed consent was waived for this retrospective study of de-identified data. This study adhered to the principles of the Declaration of Helsinki and complied with the regulations of the Health Insurance Portability and Accountability Act (HIPAA). All patients in this study were anonymous.

The very early UEMS and LEMS within 24 h were obtained by three senior specialists with 30 years of experience in cervical SCI at our Spinal Cord Injury Treatment Center, who comprehensively assessed the key muscle groups in the same time window and at the same place. The UEMS and LEMS results were recorded in a database. The choice to perform the baseline ISNCSCI examination within 24 h in all subjects was based on recent international guidelines, which recommend that patients with traumatic SCI undergo surgery within 24 h regardless of their initial neurological status21. The TCSCI patients in our hospital were likewise operated on within 24 h after injury22, and a large number of studies have shown that surgical decompression within 24 h can improve the neurological outcome of TCSCI patients23,24,25,26. The initial UEMS and LEMS we collected were preoperative data.

The longitudinal follow-up motor function assessment of the patients was performed in the same way at 6 months post-injury, and the results were also entered into the database. The six-month period for follow-up was based on recommendations used in the NASCIS and Sygen trials27,28, which was also consistent with some studies24,29.

To ensure data quality, incomplete records were deleted during processing. The resulting data set, composed of the AMS within 24 h and at 6 months after TCSCI, was then normalized and divided into a training set (80%) and a test set (20%).
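The normalization and 80/20 split described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the source does not state the normalization method or whether the split was random, so dividing each grade by 5 and a seeded shuffle are assumptions, and the data are randomly generated placeholders.

```python
import random

def normalize_scores(rows, max_grade=5):
    """Scale each muscle grade from the 0-5 range to [0, 1] (assumed scheme)."""
    return [[grade / max_grade for grade in row] for row in rows]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Hypothetical data: 315 patients x 20 key-muscle grades (0-5), randomly generated.
rng = random.Random(42)
patients = [[rng.randint(0, 5) for _ in range(20)] for _ in range(315)]

scaled = normalize_scores(patients)
train_rows, test_rows = train_test_split(scaled)
print(len(train_rows), len(test_rows))
```

With 315 records, a 20% hold-out gives 63 test patients, which matches the test-set size quoted in the Results.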

The nested ensemble prediction algorithm

The first-stage model

Support Vector Classification (SVC)30 works by finding an optimal hyperplane in a multidimensional space that maximizes the margin between different classes of data points. It is well suited to identifying nonlinear and complex relationships in high-dimensional data, and it can use a kernel function to map data that are not linearly separable into a higher-dimensional space where they are easier to separate. Because data on TCSCI patients are complex and unbalanced, SVC is a good choice for TCSCI data processing. As one of the primary learning models, SVC adds a level of advanced analytic capability that allows complex inferences to be made from all patient-specific factors to predict motor recovery.

$$\begin{gathered} \mathop {\min }\limits_{\omega ,b,\zeta } \frac{1}{2}\omega^{T} \omega + C\sum\limits_{i = 1}^{n} {\zeta_{i} } \hfill \\ \text{subject to}\,\,y_{i} (\omega^{T} \Phi (x_{i} ) + b) \ge 1 - \zeta_{i} , \hfill \\ \zeta_{i} \ge 0,\,\,i = 1, \ldots ,n. \hfill \\ \end{gathered}$$

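where x_i is the i-th training vector, y_i is its class label with value − 1 or 1, ω and b are the weight vector and bias of the hyperplane, ζ_i are the slack variables, C is the penalty parameter, and Φ is the feature mapping induced by the kernel.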
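To make the soft-margin objective concrete, the sketch below evaluates it for a fixed hyperplane. It assumes Φ is the identity map (a linear kernel), which the source does not specify; the function name, data points and parameter values are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Evaluate 0.5 * w.w + C * sum(zeta_i) using the smallest feasible slacks.

    For a fixed (w, b), the smallest slack satisfying both constraints is
    zeta_i = max(0, 1 - y_i * (w.x_i + b)).
    """
    margins = [
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        for xi, yi in zip(X, y)
    ]
    slacks = [max(0.0, 1.0 - m) for m in margins]
    value = 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)
    return value, slacks

# Hypothetical two-dimensional points with labels in {-1, +1}.
X = [[2.0, 0.0], [0.0, 0.0], [0.5, 0.0]]
y = [1, -1, 1]
value, slacks = soft_margin_objective(w=[1.0, 0.0], b=-1.0, X=X, y=y, C=1.0)
print(value, slacks)
```

The third point lies on the wrong side of the hyperplane, so it is the only one with a non-zero slack; a larger C would penalize that violation more heavily.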

Dummy31 is a classifier that uses simple rules to make predictions, and its main role is to establish a baseline level of performance against which the effectiveness of more complex models can be assessed, thus providing predictions based on basic rules and ensuring that the model is not overly complex to maintain the balance of the prediction methodology.

$$y_{t} = \beta_{0} + \beta_{1} x_{t} + \beta_{2} D + u_{t}$$

where y_t and x_t are quantitative variables, D is a qualitative (dummy) variable, β_0, β_1 and β_2 are coefficients, and u_t is the error term.
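The baseline role described for the Dummy classifier can be illustrated with a rule that always predicts the most frequent training label. The source does not say which simple rule was used, so the most-frequent strategy, the class name and the example labels are assumptions.

```python
from collections import Counter

class MostFrequentBaseline:
    """Baseline that ignores the inputs and predicts the commonest training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

# Hypothetical 6-month muscle grades for five training patients.
train_X = [[0], [3], [5], [4], [1]]
train_y = [5, 5, 5, 1, 0]

baseline = MostFrequentBaseline().fit(train_X, train_y)
predictions = baseline.predict([[2], [4]])
print(predictions)
```

On an unbalanced data set such a rule can score well on accuracy alone, which is why it is a useful reference point for the more complex models.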

Weak-learner32 excels at capturing basic trends and patterns in data that more sophisticated algorithms may ignore, requires fewer computational resources, and is computationally fast and efficient.

$$\begin{gathered} H(x) = \frac{1}{n}\sum\limits_{i = 1}^{n} {\omega_{i} h_{i} (x)} \hfill \\ \text{s.t.}\,\,\omega_{i} \ge 0,\,\,\sum\limits_{i = 1}^{n} {\omega_{i} } = 1. \hfill \\ \end{gathered}$$

Adaboost33: Adaptive Boosting (Adaboost) is an iterative ensemble method for building powerful predictive models. It assigns larger weights to more accurate learners to enhance the overall predictive power. It also automatically adapts to the “hardness” of the data by focusing on the more challenging cases, which makes it well suited to changing data patterns.

$$H(x) = {\text{sign}}\left( {\sum\limits_{t = 1}^{n} {a_{t} h_{t} (x)} } \right)$$

where a_t is the weight coefficient of the t-th basic classifier h_t(x), and H(x) is the final classifier, obtained by applying the sign function to the linear combination of the basic classifiers.
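The final Adaboost decision rule, the sign of a weighted sum of basic classifiers, can be sketched as below. The threshold stumps and the weight values are hypothetical and chosen by hand; in a trained model they would come from the boosting iterations, which are not shown here.

```python
def stump(threshold):
    """Return a hypothetical basic classifier h(x) in {-1, +1} on a scalar input."""
    return lambda x: 1 if x >= threshold else -1

def adaboost_predict(x, alphas, learners):
    """H(x) = sign(sum_t a_t * h_t(x)); a zero sum is mapped to +1 here."""
    total = sum(a * h(x) for a, h in zip(alphas, learners))
    return 1 if total >= 0 else -1

learners = [stump(1.0), stump(3.0), stump(4.0)]
alphas = [0.4, 0.7, 0.2]  # illustrative weights, not fitted values

for x in (0.0, 2.0, 3.5):
    print(x, adaboost_predict(x, alphas, learners))
```

At x = 2.0 the first stump votes +1 but is outweighed by the other two, so the combined prediction is −1; the learner with the largest weight dominates close calls.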

The second-stage model

After considering the viewpoints of each primary model, Adaboost was chosen as the second-stage model to make more comprehensive and accurate prediction decisions, summarizing the training experience on all tasks.

Model architecture convergence

The nested ensemble algorithm developed in this study was implemented mainly through a two-stage architecture. It is a more accurate predictive modeling method for predicting motor function recovery 6 months after TCSCI. The first-stage architecture was constructed by SVC, Adaboost, Weak-learner and Dummy. Each module provided its own potential prediction results based on the input data, and the results from all the modules were then fused and passed to the second-stage model (Adaboost) to make the final prediction decision. The principal architecture is shown in Fig. 1.

Figure 1

The principle framework diagram of the nested ensemble prediction model.

In Fig. 1, the ASIA motor scores of the key muscle nodes within 24 h are input into each first-stage model, which outputs a potential prognostic result for each key muscle node. These first-stage results for each key muscle node are then used as the input of the second-stage model, which outputs the final predicted value for that node.

Training, validation and testing

The dataset was divided into a training set (80%) and a test set (20%). The training set was divided into k folds (k = 5) for cross-validation training of the 4 first-stage models. Specifically, each of the 4 models was trained 5 times in the first layer, each time holding out one of the 5 folds for validation, and the predictions of each model on the held-out folds were concatenated. These concatenated validation predictions were then passed to the second layer for training and re-prediction to obtain the final fused results. Once training of the entire nested ensemble prediction model was completed, predictions were made on the test set to obtain the final test results of the model.
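The two-layer training procedure can be sketched as out-of-fold stacking. This is a minimal illustration under stated assumptions: the three learner classes are simple stand-ins rather than the SVC, Adaboost, Weak-learner and Dummy models of the study, the folds are contiguous rather than shuffled, and the data for one key muscle are invented.

```python
from collections import Counter

def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous validation folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

class MajorityLearner:
    """Stand-in first-stage model: predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, X):
        return [self.label_] * len(X)

class CopyInputLearner:
    """Stand-in first-stage model: predicts the baseline grade unchanged."""
    def fit(self, X, y):
        return self
    def predict(self, X):
        return [row[0] for row in X]

class LookupMetaLearner:
    """Stand-in second-stage model: majority label per tuple of first-stage outputs."""
    def fit(self, meta_X, y):
        votes = {}
        for row, label in zip(meta_X, y):
            votes.setdefault(tuple(row), Counter())[label] += 1
        self.table_ = {key: c.most_common(1)[0][0] for key, c in votes.items()}
        self.default_ = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, meta_X):
        return [self.table_.get(tuple(row), self.default_) for row in meta_X]

def out_of_fold_predictions(X, y, learner_classes, k=5):
    """Column j holds learner j's predictions, each made on a fold it never saw."""
    meta_X = [[None] * len(learner_classes) for _ in X]
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        for j, cls in enumerate(learner_classes):
            model = cls().fit([X[i] for i in train], [y[i] for i in train])
            for i, pred in zip(fold, model.predict([X[i] for i in fold])):
                meta_X[i][j] = pred
    return meta_X

# Hypothetical data for one key muscle: baseline grade -> 6-month grade.
X = [[0], [1], [5], [5], [4], [0], [5], [3], [5], [2]]
y = [1, 2, 5, 5, 5, 0, 5, 4, 5, 3]

meta_X = out_of_fold_predictions(X, y, [MajorityLearner, CopyInputLearner], k=5)
second_stage = LookupMetaLearner().fit(meta_X, y)
final = second_stage.predict(meta_X)
print(meta_X[:3], final[:3])
```

Training the second stage only on held-out predictions keeps it from seeing first-stage outputs that were fitted on the same samples, which is the usual motivation for this design.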

Model performance evaluation metrics

In order to verify the performance and effectiveness of the nested ensemble prediction model, it is necessary to evaluate the results. The metrics used in an artificial intelligence project will have a substantial influence on what the artificial intelligence tool actually does34. We have chosen accuracy, precision, recall rate, F1 score, and confusion matrix as the model performance evaluation metrics and calculated their results34.
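The listed metrics can all be derived from a confusion matrix, as in the sketch below. The source does not state which averaging scheme was used for the multi-class precision, recall and F1 score, so both micro- and macro-averaged F1 are shown as assumptions, and the labels are invented. For single-label multi-class data, micro-averaged precision, recall and F1 all equal accuracy.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def classification_metrics(matrix):
    """Accuracy plus micro- and macro-averaged scores from a square matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = [matrix[i][i] for i in range(n)]
    fp = [sum(matrix[r][i] for r in range(n)) - tp[i] for i in range(n)]
    fn = [sum(matrix[i]) - tp[i] for i in range(n)]
    micro_p = sum(tp) / (sum(tp) + sum(fp))
    micro_r = sum(tp) / (sum(tp) + sum(fn))
    per_class_f1 = []
    for i in range(n):
        denom = 2 * tp[i] + fp[i] + fn[i]
        # Classes that never occur and are never predicted score 0 here.
        per_class_f1.append(2 * tp[i] / denom if denom else 0.0)
    return {
        "accuracy": sum(tp) / total,
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": 2 * micro_p * micro_r / (micro_p + micro_r),
        "macro_f1": sum(per_class_f1) / n,
    }

# Hypothetical true and predicted muscle grades (0-5) for eight patients.
labels = [0, 1, 2, 3, 4, 5]
y_true = [5, 5, 5, 5, 1, 1, 0, 3]
y_pred = [5, 5, 5, 4, 1, 5, 0, 3]

matrix = confusion_matrix(y_true, y_pred, labels)
metrics = classification_metrics(matrix)
print(metrics)
```

The macro-averaged F1 weights every grade equally and therefore drops when rare grades are predicted poorly, which is the behavior that matters on unbalanced data.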

Results

Because TCSCI data are extremely complex and unbalanced, the hierarchically optimized nested ensemble algorithm we developed combines the predictions of all models in both stages to predict motor function recovery at 6 months post-injury. The accuracy of the first-stage models SVC, Adaboost, Weak-learner and Dummy was 80.6%, 80.3%, 80.5% and 80.6%, respectively. Recall rate was used to guard against misrepresentation due to data imbalance, and was 80.6%, 80.3%, 80.5% and 80.6% for SVC, Adaboost, Weak-learner and Dummy, respectively. Precision, which measures how many of the samples predicted as positive were true positives, was 80.6%, 80.3%, 80.5% and 80.6%, respectively. The average precision (AP), the mean of the precision at different recall points, was important for minimizing the error of each primary learning model, and was 81.7%, 77.1%, 82.9% and 72.3%, respectively. The F1 score, which balances precision and recall under the unbalanced distribution of motor function recovery data for TCSCI, was 80.6%, 80.3%, 80.5% and 80.6%, respectively.

The second-stage model (Adaboost) showed an accuracy of 80.6%, a recall rate of 80.6%, a precision of 80.6%, an AP of 74.3%, and an F1 score of 80.6%. The details are shown in Table 1.

Table 1 Visualization of performance at each level.

The F1 score served as the main indicator of performance in predicting the outcome of every feature, because it balances precision and recall, which supports the use of the algorithm in the context of TCSCI. Fig. 2 shows the prediction performance of the nested ensemble model across features. The comprehensive performance metrics and comparative analyses showed that the algorithm performed well in predicting motor function outcome in TCSCI patients.

Figure 2

F1 score of each feature in the nested integration algorithm.

The overall performance of the model for each feature is summarized by the F1 score, with details presented in Fig. 2. Because relying on accuracy alone can be misleading in an unbalanced dataset, we also visualized the predictions of the nested ensemble model as confusion matrices, which show both the correct and the incorrect predictions. Taking the left elbow extensors innervated by C7 as an example, 43 of the 63 patients in the test set were predicted correctly, of which 41 were correctly predicted as level 5 muscle strength and 2 as level 1 muscle strength, and the remaining 20 were predicted incorrectly. Details are shown in Figs. 3, 4, 5 and 6.
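As a worked check that uses only the counts quoted for the left C7 elbow extensors, the correct predictions are the diagonal entries of the confusion matrix, so the accuracy for this muscle on the test set is:

$$\text{accuracy} = \frac{41 + 2}{63} = \frac{43}{63} \approx 68.3\%, \qquad \text{error rate} = \frac{20}{63} \approx 31.7\%$$

This is a per-muscle figure and need not match the metrics reported across all features.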

Figure 3

Confusion matrix of key muscle nodes in the right upper limb.

Figure 4

Confusion matrix of key muscle nodes in the left upper limb.

Figure 5

Confusion matrix of key muscle nodes in the right lower limb.

Figure 6

Confusion matrix of key muscle nodes in the left lower limb.

Discussion

Model performance analysis

The results of this study demonstrated the validity and generalizability of the nested ensemble prediction algorithm in accurately predicting motor function 6 months after TCSCI27,28. Ensemble learning over multiple machine learning models can increase model performance, handle linear and nonlinear data, and avoid overfitting, which improves the accuracy and precision of the predictions compared to a single model17,18,20,35,36,37. The nested ensemble prediction algorithm we developed introduced the concept of nesting on top of multiple existing ensemble learning models and was implemented in two stages to better process the input data and achieve more accurate predictions, which is the novelty of this study.

Multidimensional performance metrics34 such as the high F1 score (80.1%) and accuracy (80.1%) demonstrated the robustness of the prediction algorithm in processing the quantitative UEMS and LEMS data of TCSCI patients. At the same time, the confusion matrix was important for evaluating model performance in terms of missed and false predictions. These metrics are crucial for evaluating the potential for practical clinical application and help guard against judging the algorithm on accuracy alone, which can mask overfitting34.

Comparison with existing prediction methods

Previous prediction-related studies placed more emphasis on the factors affecting prognosis and remained at the level of cause38,39,40,41,42, often using standard statistical methods. With such methods it is difficult to obtain high prediction accuracy, since functional prognosis and explanatory variables are not necessarily linearly related.

The rapid development of artificial intelligence technology, especially machine learning and ensemble learning, provides useful means for accurate prognosis prediction in TCSCI patients. For relatively large amounts of data, machine learning can capture complex nonlinear relationships in a given data set, and its predictive accuracy can be evaluated. Machine learning has been widely used to predict the prognosis of SCI patients43,44,45,46,47. Some researchers used the extreme gradient boosting (XGBoost) approach, a nonlinear regression prediction model that is superior to traditional linear methods, to predict the neurological function recovery of SCI patients16,48. The core idea of ensemble learning is to obtain better overall prediction performance by combining the predictions of multiple machine learning models49, and it has also been applied to predicting the prognosis and functional outcome of SCI43,50. Kato et al. used ensemble machine learning combined with ridge regression to predict the Spinal Cord Independence Measure total scores at discharge after SCI50. In contrast, the nested ensemble prediction model we developed selected Adaboost33, itself a strong classifier, as the second-stage model, and our prediction of functional outcome indicators for TCSCI patients was more detailed and direct, namely the specific UEMS and LEMS corresponding to the 20 key muscles at 6 months after TCSCI. In short, our model moves from overall to detailed predictions.

Practical clinical contribution

The nested ensemble algorithm developed in this study is based on clinical need and ensemble learning, and predicts motor function outcomes in TCSCI patients at 6 months post-injury using only the very early AMS within 24 h21,23,24,25,26. The results showed the superiority of the nested ensemble algorithm and confirmed its potential as a powerful and reliable tool for predicting motor function in TCSCI patients. It contributes to the development of personalized treatment and rehabilitation for TCSCI patients by providing more nuanced and reliable predictions51.

The prediction algorithm can predict motor function recovery at 6 months post-injury from first-hand quantitative scoring data obtained at the very early stage. This can help clinicians make reliable diagnoses, design appropriate treatment plans, and customize personalized rehabilitation programs as early as possible, and can help patients and their families form objective prognostic expectations.

Limitations

Although our findings are clinically significant, the study has certain limitations.

Firstly, we chose the time point ≤ 24 h based on recent international guidelines, which recommend that patients with traumatic SCI undergo surgery within 24 h regardless of their initial neurological status21. The TCSCI patients admitted to our hospital were also operated on within 24 h, and surgical decompression within 24 h can improve the neurological outcome of TCSCI patients23,24,25,26. However, if factors affecting the patient's cognition or communication are present early after injury, the early ISNCSCI examination5,6 may be unreliable52. In this case, an ISNCSCI examination performed 72 h after injury is better than an earlier assessment53,54,55, and such data should be collected in the future to further predict recovery of the key muscles innervated by the injured area.

In addition, the outcome measure of our study was the AMS at 6-month follow-up, based on recommendations used in the NASCIS and Sygen trials27,28, which was also consistent with some studies24,29. Fawcett et al. showed that the rate of recovery eventually reaches a plateau at 12–18 months after SCI56. In the future, we will follow up patients for longer to enable prediction of longer-term neurological outcomes.

Moreover, the size of the dataset and the number of clinically relevant parameters, including the ASIA Impairment Scale, need to be increased to train the model to more robust performance. In the future, we will incorporate multi-center data to test the algorithm on larger and more diverse data sets. We will also focus on predicting the neurological level of injury to achieve a more comprehensive and accurate prediction of post-injury recovery in patients with TCSCI.

Conclusions

We have developed a nested ensemble algorithm whose performance and effectiveness were evaluated through accuracy, precision, recall, F1 score, and confusion matrix. The model not only showed good overall accuracy in predicting patients’ motor function recovery but could also effectively identify patients at high risk of poor prognosis with a low false positive rate. This may change the method and way of thinking in predicting motor function recovery in patients with TCSCI.

The algorithm can use very early clinical indicators in a timely and effective manner to provide objective recommendations to clinicians at all levels, assist them in making early intervention plans, and set objective expectations for patients and their families regarding the severity of injury and the prognosis for recovery.