Key Summary Points

Why carry out this study?

Firstly, there is an urgent need to predict precisely, at an early stage of the illness, the risk that severe fever with thrombocytopenia syndrome (SFTS) progresses to encephalitis or death, since encephalitis is closely associated with the high mortality rate of SFTS (12–50%).

Secondly, the use of artificial intelligence is spreading rapidly in many areas owing to its excellent performance, but its effectiveness in assisting the diagnosis of SFTS is unclear.

Our models contribute to the early accurate prognosis of SFTS, even with limited medical resources in underdeveloped areas.

What was learned from the study?

The reservoir computing with boosted topology (RC–BT) model that predicts the risk of developing encephalitis in patients with SFTS in our study contains nine clinical parameters collected at admission and shows an area under the curve of 0.899 (95% confidence interval [CI] 0.882–0.916), a sensitivity of 0.855 (95% CI 0.824–0.886), and a specificity of 0.859 (95% CI 0.831–0.887).

The RC–BT model for the prediction of the fatality of patients with SFTS in our study includes seven parameters and demonstrates an area under the curve of 0.917 (95% CI 0.902–0.932), a sensitivity of 0.913 (95% CI 0.902–0.924), and a specificity of 0.884 (95% CI 0.851–0.917).

Introduction

Severe fever with thrombocytopenia syndrome (SFTS), first identified in China in 2009 and subsequently reported in Korea and Japan, is considered one of the top ten blueprint priority diseases by the World Health Organization [1]. SFTS is an emerging hemorrhagic fever caused by a tick-borne virus (SFTS virus, i.e., SFTSV) and carries a high fatality rate [2]. SFTSV is a phlebovirus from the family Phenuiviridae and the order Bunyavirales [3, 4]. Additionally, SFTSV is genetically closely related to heartland virus (HRTV), a tick-borne virus reported in the United States that causes an illness clinically similar to SFTS [5,6,7,8].

SFTS presents with a spectrum of clinical severity, ranging from mild illness to multiorgan failure, disseminated intravascular coagulation (DIC), and death. The typical clinical presentations include fatigue, severe fever, thrombocytopenia, leukocytopenia, and gastrointestinal symptoms. In severe and fatal cases, encephalitis, DIC, or multiorgan failure occurs. The mortality rate of SFTS ranges from 12% to 50% [9]. In particular, it is reported that 19.1% of patients with SFTS develop encephalitis, and a fatal outcome occurs in 44.7% of the patients with encephalitis [10]. Therefore, it is urgent to precisely predict the risk of encephalitis and mortality at the early stage of SFTS. Although there is currently no effective therapy for SFTS [11], emerging evidence suggests that some medical interventions, such as calcium channel blockers and favipiravir, might ameliorate disease progression and reduce the fatality of patients with SFTS [12, 13]. Precise prognostic models that predict the fatality of SFTS or its progression to encephalitis are therefore needed. Our previous study reported a scoring model, which, however, is only applicable to fatality risk prediction and needs to be improved [14].

Identifying predictors associated with encephalitis or fatality used to be fraught with challenges; however, it is now easier with artificial intelligence. Through machine learning, rules can be learned from one dataset and then used to make predictions for another dataset. Machine learning has been found to improve the predictive modeling of clinical outcome in various diseases [15,16,17]. Nevertheless, the existing machine learning models rely on large datasets, and their generalization performance might degrade with a small sample size because unnecessary parameters are taken into account. Therefore, it is critical to develop an optimization algorithm that can improve performance with a small sample size. In addition, since patients with SFTS usually reside in rural areas with limited medical resources, it is not feasible to access clinical parameters that rely excessively on medical testing. It is ideal to predict the clinical outcome of the disease based on as few medical test data as possible, which places higher demands on the feasibility and the generalizability of the prediction model.

Herein, we aim to develop a novel reservoir computing with boosted topology machine learning algorithm that can predict the clinical outcome of patients with SFTS, achieving accurate classification prediction of fatality and encephalitis of patients with SFTS.

Methods

Study Design and Ethics Statement

In this retrospective and multicenter study, data are retrieved from three large tertiary hospitals in Jiangsu Province of China, including Nanjing Drum Tower Hospital, Zhongda Hospital, and the Second Hospital of Nanjing. In this study, all SFTS cases are diagnosed according to the following two conditions: (1) acute fever of > 38 °C with thrombocytopenia [platelet (PLT) count < 100 × 10⁹/L], and (2) laboratory-confirmed SFTSV infection by a certified real-time polymerase chain reaction (RT-PCR) kit. Clinically diagnosed SFTS-related encephalitis is defined as disordered mental status (including decreased or altered level of consciousness, lethargy, or personality change) lasting ≥ 24 h in a patient with SFTS, with no alternative cause identified [18].

We collect the demographic factors, clinical presentation, clinical outcome, and laboratory parameters from 327 patients with SFTS hospitalized between 27 October 2010 and 23 August 2022. For all the patients, the parameters used in the model are those collected at first admission, at the time of diagnosis, to allow early prediction of the potential progression of SFTS. Data are collected retrospectively by trained investigators using a uniform form and reviewed by research staff. Our study is approved by the human research ethics committees of Nanjing Drum Tower Hospital, Zhongda Hospital, and the Second Hospital of Nanjing. Patient consent is waived by the ethics committees because of the retrospective nature of the study. This study is performed in accordance with the Helsinki Declaration of 1964 and its later amendments.

Data Description and Processing

Our study workflow includes the following three parts (Fig. 1): (1) data description and preprocessing, in which our SFTS dataset is categorized into encephalitis versus nonencephalitis and fatal versus survived; (2) optimization of the original reservoir computing (RC) model with a boosted topology structure, which generates the RC with boosted topology (RC–BT) model, whose performance is validated in terms of information with graph theory; and (3) two predictive classification tasks, for encephalitis and for fatality, performed by the newly developed RC–BT model. In addition, the RC–BT model is compared with commonly used models.

Fig. 1

Workflow for the development and validation of the RC–BT model for SFTS prognosis. Part I represents the data processing. Part II shows the RC–BT optimization algorithm. Part III represents prediction of SFTS-related encephalitis and prediction of SFTS fatality. In part III, the training cohort and testing cohort are randomly divided (267 samples from Nanjing Drum Tower Hospital). In the validation cohort, 38 samples are from Nanjing Drum Tower Hospital, 8 from Zhongda Hospital, and 14 from the Second Hospital of Nanjing

The admission data for patients with SFTS, including demographic information, clinical presentation, and laboratory parameters, are collected (Fig. S1A). Unstructured text-based clinical data, which are obtained from physician–patient communications (including but not limited to disease history and symptoms), are converted by one-hot encoding, a method commonly used in natural language processing. Predictive parameters are then selected from both the numerical and the encoded text-based data by the least absolute shrinkage and selection operator (LASSO). The parameters selected by LASSO are imported into the model with equal weights, and any nonlinearity is handled by artificial intelligence (AI).
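The one-hot encoding step can be illustrated with a minimal sketch. It assumes each text-based item has already been reduced to a categorical label; the field name `symptom`, the helper `one_hot_encode`, and the record values are hypothetical and are not taken from the study dataset.

```python
def one_hot_encode(records, field):
    """Replace a categorical field with one binary column per category."""
    # Sort the categories so the column order is reproducible.
    categories = sorted({rec[field] for rec in records})
    encoded = []
    for rec in records:
        # Keep every other field unchanged.
        row = {k: v for k, v in rec.items() if k != field}
        for cat in categories:
            row[f"{field}={cat}"] = 1 if rec[field] == cat else 0
        encoded.append(row)
    return encoded


# Hypothetical admission records (illustrative values only).
patients = [
    {"calcium": 2.1, "symptom": "dry cough"},
    {"calcium": 1.9, "symptom": "headache"},
    {"calcium": 2.0, "symptom": "dry cough"},
]
encoded_patients = one_hot_encode(patients, "symptom")
```

Each encoded row then contains only numerical columns, which is the form a LASSO-type selection step requires.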

The Architecture of Reservoir Computing Algorithm

Reservoir computing (RC) is a recently developed machine learning algorithm that has been applied in areas such as temporal information processing [19] and complex system dynamics prediction [20]. It is suggested that RC outperforms other large-scale models in the setting of small-size datasets [21, 22]. The classification function of RC relies on an activation function added to the output layer of the original model. The RC structure consists of the input matrix, the intermediate layer network, the output matrix, the state-iterating equation, and the output function (Fig. S2). To improve the model prediction performance, RC is enhanced by topological changes, resulting in RC–BT. The detailed description is provided in Supplementary Methods.
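To make the listed components concrete, the following is a minimal sketch of a generic reservoir classifier, not the RC–BT model of this study: the boosted-topology optimization is described only in the Supplementary Methods and is not reproduced here. The reservoir size, sparsity, weight ranges, number of state iterations, learning rate, and toy data are all assumptions chosen for illustration.

```python
import math
import random


def make_reservoir(n_inputs, n_nodes, density=0.2, scale=0.5, seed=0):
    """Build a fixed random input matrix and a sparse intermediate-layer network."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_nodes)]
    w_res = [[rng.uniform(-scale, scale) if rng.random() < density else 0.0
              for _ in range(n_nodes)] for _ in range(n_nodes)]
    return w_in, w_res


def reservoir_state(u, w_in, w_res, n_steps=5):
    """State-iterating equation: x <- tanh(W_in u + W_res x), starting from x = 0."""
    n = len(w_res)
    x = [0.0] * n
    for _ in range(n_steps):
        x = [math.tanh(sum(w_in[i][j] * u[j] for j in range(len(u)))
                       + sum(w_res[i][k] * x[k] for k in range(n)))
             for i in range(n)]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_readout(states, labels, lr=0.1, epochs=200):
    """Fit only the output weights by gradient descent on the logistic loss."""
    w = [0.0] * (len(states[0]) + 1)  # the last entry is the bias
    for _ in range(epochs):
        for x, y in zip(states, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = p - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def predict_proba(x, w):
    """Output function: sigmoid activation applied to the linear readout."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Toy two-feature data (hypothetical, not study data).
inputs = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
labels = [0, 1, 0, 1]
w_in, w_res = make_reservoir(n_inputs=2, n_nodes=10)
states = [reservoir_state(u, w_in, w_res) for u in inputs]
weights = train_readout(states, labels)
probs = [predict_proba(x, weights) for x in states]
```

In this sketch the input matrix and the intermediate network stay fixed after random initialization and only the output weights are trained, which is the usual reason given for the suitability of reservoir models for small datasets.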

Results

Training and Validation of SFTS RC–BT Model

To train and test the prediction classification of SFTS-related encephalitis or fatality, the training cohort and the test cohort are randomly divided using 267 samples from Nanjing Drum Tower Hospital, on the basis of our clinical criteria of patients with SFTS during hospitalization. To verify the generalization performance of RC–BT, data from the three hospitals are utilized as the validation cohort, including 38 samples from Nanjing Drum Tower Hospital, 8 from Zhongda Hospital, and 14 from the Second Hospital of Nanjing. Using a five-fold cross-validation approach, the 267-sample cohort is divided randomly into a training cohort and a testing cohort at a ratio of 8:2 (Fig. 1). By applying the first-order (L1) regularization method of LASSO, we set the weights of the lower contributing parameters to zero. Specifically, in the task to predict SFTS-related encephalitis, nine parameters are extracted, namely calcium, cholesterol, muscle soreness, dry cough, smoking history, body temperature at admission, troponin T, potassium, and thermal peak. For the task to predict fatality of patients with SFTS, seven parameters are extracted, including calcium, cholesterol, history of drinking, headache, field contact, potassium, and dyspnea (Fig. 2A, C).
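The relation between five-fold cross-validation and the 8:2 ratio can be sketched as follows. This is an illustration that assumes a plain shuffled split without stratification, since the text does not state whether the folds were stratified by outcome; the function name `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and yield (train, test) index lists for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 267 samples, as in the training and testing cohort of this study.
splits = list(k_fold_indices(267, k=5))
fold_sizes = [(len(train), len(test)) for train, test in splits]
```

With 267 samples, each fold holds out 53 or 54 samples for testing and keeps 213 or 214 for training, which is approximately the 8:2 ratio.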

Fig. 2

Key indicators analysis. Indicator weight map for the prediction of SFTS-related encephalitis (nine indicators) in A and for the prediction of SFTS fatality (seven indicators) in B. The length of the horizontal column represents the importance of the corresponding indicator. Note that the indicator with a larger weight is more important. Indicator correlation heat map for the prediction of SFTS-related encephalitis in C and for the prediction of SFTS fatality in D. Note that the correlation between most pairs of indicators is small. Although some of the indicators are highly correlated, this combination is the best in the prediction

The correlations among these parameters are also analyzed. Pearson correlation coefficients between these parameters range from −0.18 to 0.22 (Fig. 2B, D), suggesting that most parameters are relatively independent and are essential in our prediction model.
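For reference, the Pearson correlation coefficient between two indicators can be computed as in the sketch below. The input vectors are toy values chosen to give a perfect positive and a perfect negative correlation; they are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

Coefficients close to zero, such as the reported range of −0.18 to 0.22, indicate little linear dependence between a pair of indicators.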

Performance of SFTS RC–BT Model for Encephalitis and Fatality

The clinical characteristics of the patients with SFTS are summarized in Table 1. The predictive performance of the SFTS RC–BT model with respect to the clinical outcome of patients with SFTS is evaluated on the basis of independent validation cohorts (Table 2). To verify the effectiveness of the RC–BT model, we test the difference in area under the curve (AUC) between RC–BT and RC using the same cohort (Fig. S2). For the encephalitis prediction task, the accuracy of the RC–BT model for the validation cohort is 0.897 (95% CI 0.873–0.921), and the sensitivity and negative predictive value (NPV) are 0.855 (95% CI 0.824–0.886) and 0.904 (95% CI 0.863–0.945), respectively. The AUC for the validation cohort is 0.899 (95% CI 0.882–0.916), indicating excellent predictive performance of the RC–BT model. Meanwhile, the specificity and positive predictive value (PPV) are 0.859 (95% CI 0.831–0.887) and 0.876 (95% CI 0.854–0.898), respectively. For the fatality risk prediction task, the RC–BT model achieves an accuracy of 0.903 (95% CI 0.881–0.925), a sensitivity of 0.913 (95% CI 0.902–0.924), a specificity of 0.884 (95% CI 0.851–0.917), a PPV of 0.809 (95% CI 0.802–0.816), an NPV of 0.946 (95% CI 0.917–0.975), and an AUC of 0.917 (95% CI 0.902–0.932).
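The metrics reported above follow from the confusion matrix and from the ranking of predicted scores. The sketch below shows the standard definitions on toy labels and scores; the data, the 0.5 decision threshold, and the function names are illustrative assumptions and do not reproduce the study results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


def auc_by_ranking(y_true, scores):
    """AUC as the probability that a positive case scores above a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy outcome labels and predicted scores (hypothetical).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc = auc_by_ranking(y_true, scores)
```

Note that the AUC depends only on the ordering of the scores, whereas the other five metrics depend on the chosen decision threshold.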

Table 1 Demographic and clinical information of training, testing and validation cohorts of patients with SFTS in our study
Table 2 The performance metrics of our developed RC–BT models to predict SFTS encephalitis and fatality

The Comparative Prediction Performance of SFTS RC–BT Model Versus the Machine Learning Algorithms

To test whether our newly developed SFTS RC–BT models improve prediction performance compared with traditional machine learning algorithms, we compare the two SFTS RC–BT models with models built using Lightgbm, support vector machine (SVM), Xgboost, Decision Tree, and Neural Network (NN). For the SFTS encephalitis prediction task, the best values among these models are an accuracy, specificity, and PPV of 0.723, 0.863, and 0.651, respectively, with SVM; a sensitivity and NPV of 0.631 and 0.812, respectively, with NN; and an AUC of 0.627 with Xgboost (Table 3 and Fig. 2). Accordingly, the difference between the RC–BT model and the best value among the machine learning models is +0.174 (accuracy), +0.224 (sensitivity), −0.004 (specificity), +0.225 (PPV), +0.092 (NPV), and +0.272 (AUC). Therefore, the SFTS RC–BT model outperforms all tested machine learning algorithms on every metric except specificity, for which SVM is marginally higher (Fig. 3).
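The differences quoted for the encephalitis task can be checked directly from the values given in the text. The short sketch below subtracts the best baseline value for each metric from the corresponding RC–BT value; the dictionary names are illustrative.

```python
# RC–BT values for the encephalitis task, as reported for the validation cohort.
rc_bt = {"accuracy": 0.897, "sensitivity": 0.855, "specificity": 0.859,
         "PPV": 0.876, "NPV": 0.904, "AUC": 0.899}

# Best value per metric among the compared models (SVM, NN, or Xgboost), from the text.
best_baseline = {"accuracy": 0.723, "sensitivity": 0.631, "specificity": 0.863,
                 "PPV": 0.651, "NPV": 0.812, "AUC": 0.627}

# Difference per metric, rounded to three decimals as in the text.
differences = {m: round(rc_bt[m] - best_baseline[m], 3) for m in rc_bt}
```

Specificity is the only metric with a negative difference, and its magnitude (0.004) is small compared with the gains in the other five metrics.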

Table 3 Performance comparisons between the RC–BT model with traditional machine learning algorithms
Fig. 3

Performance of RC–BT models and the traditional machine learning algorithms. The prediction of SFTS-related encephalitis (A) is tested under different machine learning models (Decision Tree, Lightgbm, SVM, Xgboost, and NN) using the same cohort. The machine learning models and the binary regression model (scoring model) are used in the prediction of SFTS fatality (B). The performance of the RC–BT models is the best in the above two prediction experiments

Next, we examine the prediction performance for SFTS-related fatality using the RC–BT model versus alternative machine learning models. We previously reported a scoring model to predict SFTS fatality, which is also included in this comparison [14]. We find that the accuracy, sensitivity, and specificity of the previous scoring model are 0.81, 0.745, and 0.871, respectively. The PPV of SVM is 0.721, and the NPV and AUC of the previous scoring model are 0.855 and 0.708, respectively. Relative to these values, the difference for the RC–BT model is +0.104 (accuracy), +0.178 (sensitivity), +0.018 (specificity), +0.088 (PPV), +0.092 (NPV), and +0.219 (AUC). Our data suggest that, when a small dataset is tested, the RC–BT model for the prediction of SFTS-related fatality remarkably outperforms our previously reported scoring model, as well as the baseline machine learning models.

Nonlinearity of SFTS RC–BT Model Versus Linearity of Multiple Regression

Since machine learning algorithms are capable of identifying nonlinear laws in multiple indicators, we further compare the prediction performance of the two RC–BT models and the traditional binary regression model. For both tasks, we remove one indicator from the N-dimensional parameters, so that (N−1)-dimensional parameters remain in the model. Firstly, in the RC–BT model to predict SFTS-related encephalitis (Fig. 4A), removing “smoking history” gives slightly better performance than removing any other indicator (AUC, 0.699), whereas the AUC is 0.899 when the SFTS RC–BT model is trained with all nine indicators. Similarly, in the RC–BT model to predict SFTS fatality (Fig. 4B), the maximum AUC among the models using six indicators is 0.799, whereas the AUC reaches 0.917 for our final RC–BT model, which consists of seven indicators. Therefore, no model with (N−1) parameters improves on the AUC of the model with N parameters. Of note, the sum of the above differences is greater than the AUC of our RC–BT model, which includes all the indicators. Therefore, the contribution of each indicator in the prediction task is in a nonlinear relationship, and our RC–BT model can identify laws that cannot be demonstrated by traditional models.

Fig. 4

Receiver operating characteristic (ROC) curve after removing an indicator in our SFTS RC–BT model. In the two subgraphs, the “X curve” indicates that after the “X” indicator is removed, the remaining indicator data are used for model experiments. For example, “thermal peak curve” in A means that eight indicators, other than this one, are used for experiments. The AUC on the right side of the two subgraphs can be observed as greater than the AUC of all parameters (nine indicators in the prediction of SFTS encephalitis, seven indicators in the prediction of SFTS fatality) after differential summation. This demonstrates that there is a nonlinear relationship between the indicators

Discussion

In this study, we develop a machine learning algorithm, the reservoir computing with boosted topology model, to predict the risk of encephalitis or fatality among patients with SFTS. Our SFTS RC–BT models show improved prediction performance, with a classification accuracy of approximately 90% (0.897 for encephalitis and 0.903 for fatality).

Our prediction model for SFTS-related encephalitis comprises nine indicators: calcium, cholesterol, muscle soreness, dry cough, smoking history, temperature at admission, troponin T, potassium, and thermal peak. Our prediction model for SFTS fatality comprises seven indicators: calcium, cholesterol, history of drinking, headache, field contact, potassium, and dyspnea. The indicators included in these two prediction models are easy to collect, so the models can serve as promising tools for early clinical prognosis of patients with SFTS. Additionally, several infectious diseases, including heartland virus infection, Crimean–Congo hemorrhagic fever, and granulocytic anaplasmosis, share similar clinical signs and laboratory findings. It will be interesting to determine whether our prediction models can also be applied to these similar infectious diseases.

Previous studies that dissected the interaction of SFTSV and host immune responses strongly support the key parameters recognized by our RC–BT models. For example, calcium channel blockers inhibit SFTSV replication in vitro, which is consistent with our identification of calcium concentration as an indicator [23]. Cholesterol synthesis pathways regulated by site 1 protease (S1P) are required for efficient replication of SFTSV [24]. NLRP3 inflammasome activation in SFTSV infection may be stimulated by calcium influx or potassium efflux [25]. Our previous study also found that cough, wheezing, and dyspnea are significantly more frequent among the clinical manifestations of the encephalitis group than among those of the nonencephalitis group [18].

The superior prediction performance of our RC–BT model can be explained by (1) the ability of the RC–BT model to capture nonlinear correlations between the input parameters in the dataset, and (2) further optimization of the middle layer topology for information processing. The nonlinear processing capability of the RC–BT models also confirms a nonlinear association between the indicators, suggesting that the machine learning-based algorithm improves prediction performance compared with traditional linear models. We further implemented a validation test using multicenter cohorts to ensure that the RC–BT models can also be generalized to alternative study centers. Therefore, our proposed RC–BT models can be applied in prediction tasks for clinical cohorts of relatively small size, while offering high accuracy, specificity, and sensitivity of prediction classification.

Prognosis of the clinical outcome of patients with SFTS at an early stage allows prompt intervention for patients at risk of life-threatening outcomes, such as encephalitis or death. Effective interventions include antiviral therapy, dehydration and cranial pressure lowering therapy, and timely attention to and maintenance of nutritional and electrolyte balance. In addition, our RC–BT-assisted prognosis models can not only improve the diagnostic accuracy of less experienced physicians, but also be applied in developed areas with overloaded medical needs or rural areas with limited medical resources. This can facilitate clinical judgement on the SFTS treatment strategy and reduce the treatment expense during hospitalization.

Our study also has several limitations. First, this is a retrospective study, whereas a study with a prospective cohort might be more convincing. Second, the size of the current validation cohort is relatively small. Further evaluations are necessary to validate these findings and to evaluate the clinical utility of these algorithms in a prospective cohort of patients with SFTS.

Conclusion

Our developed machine learning algorithm, reservoir computing with boosted topology, provides promising prediction models to predict either encephalitis or fatality of patients with SFTS at hospital admission. Our models can not only greatly improve the early prognosis accuracy of SFTS, but also be widely applied in underdeveloped areas with limited medical resources.