Introduction

Dyslipidemia is a medical condition characterized by abnormal lipid metabolism, including high levels of total cholesterol (TC), triglyceride (TG), and low-density lipoprotein cholesterol (LDL-C), and low levels of high-density lipoprotein cholesterol (HDL-C) [1, 2]. Dyslipidemia is recognized as a prominent risk factor for cardiovascular diseases (CVD) [3,4,5,6] and one of the primary independent modifiable factors of diabetes [7] and stroke [8]. Clinical observations and epidemiological studies have shown that dyslipidemia increases the risk of cardiovascular disease events [9,10,11]. Therefore, the effective prevention and treatment of dyslipidemia are of great importance [12].

Statins can significantly improve the prognosis of dyslipidemia. However, their side effects, including liver dysfunction [13], statin-induced myopathy [14], and elevated creatinine and creatine kinase levels, cannot be ignored. Hence, effective treatment for dyslipidemia with few side effects has become a major field of interest [15].

Traditional Chinese medicine (TCM) has been used in clinical practice for more than 2000 years in China. Clinical and laboratory research has shown that TCM has distinctive traits and some strengths in treating dyslipidemia [16,17,18]. In diagnosis, a TCM doctor first determines the patient's syndrome (that is, the classification or pattern of disease in TCM) based on diagnostic factors such as symptoms and tongue and pulse manifestations, and then treats according to that syndrome. The essence of TCM diagnosis is therefore a classification problem. Previous research [19, 20] has shown that mutual obstruction of phlegm and stasis (MOPS) is the most common dyslipidemia classification. However, which diagnostic factors should serve as diagnostic rules, and how to quantify their importance, depend entirely on the doctor's personal experience; unified standards and verification on large-sample data are lacking. Thus, we aimed to develop a new type of prediction tool based on these factors that can diagnose objectively without relying on personal experience.

In recent years, great progress has been achieved in applying deep learning to medical research. As an approach to deep learning, the artificial neural network (ANN) is a highly parameterized, non-linear model [21]. It can approximate observed outcomes with minimal error [22] or approximate any continuous function [23], and it supports complex non-linear classification problems. ANN has been widely used in medical research, for example in providing decision support in cancer [24], predicting clinical deterioration in adult patients with hematologic malignancies [25], and tumor biology [26, 27].

To our knowledge, there is no similar research on classification and prediction models for dyslipidemia in TCM. In this study, we used the deep learning frameworks TensorFlow [28, 29] and Keras [30, 31] to train and construct an effective predictive model for dyslipidemia with MOPS. The TCM diagnosis process was converted into a process of multi-factor classification, and a large amount of clinical data was used to develop the models. The research flow for data processing and construction of the predictive model is illustrated in Fig. 1. Standardized and objective diagnostic patterns buried in the data were then revealed, paving the way for more efficient TCM treatment of dyslipidemia (see Fig. 2).

Fig. 1

Research flow for data processing and construction of the predictive models. After a preliminary screening of the data collected from the data source, 1019 cases of patients with dyslipidemia were obtained. The baseline table, type distribution, and 89 diagnostic factors were obtained through data statistics and processing. The 89 diagnostic factors were screened using multiple linear regression and sorted by their correlation with the output parameter to obtain the 36 most important diagnostic factors. These 36 factors were used as inputs to the models, with whether or not the patient has dyslipidemia with MOPS as the output. The data set was randomly divided into a training set, validation set, and test set in a ratio of 60:20:20, and the models' performance was evaluated. Finally, Model-11 was determined to be the optimal model based on the comprehensive evaluation results, and the importance distribution of the diagnostic factors was calculated.

Fig. 2

Schematic diagram of the predictive models. The nodes of the neural network are represented by hollow circles, and the weight of each connection is represented by the width of the edge. The 36 nodes of the input layer on the left represent the input parameters, corresponding to the 36 diagnostic factors in TCM. The hidden layers are in the middle. On the right is the output layer, consisting of one node that represents the output parameter, that is, whether the model predicts dyslipidemia with MOPS.

Methods and materials

Source of data

This study was based on a cross-sectional data collection. The data came from 1,019 cases of confirmed dyslipidemia collected from 2013 to 2016 by three hospitals, namely Guang'anmen Hospital of Chinese Academy of Traditional Chinese Medicine, Beijing Traditional Chinese Medicine Hospital Affiliated to Capital Medical University and Dongzhimen Hospital of Beijing University of Chinese Medicine. All patients signed informed consent.

Diagnostic criteria

The diagnostic criteria of dyslipidemia in this study are based on the Guidelines for the Prevention and Treatment of Dyslipidemia in Chinese Adults (2016 Revision) [32], that is, TC ≥ 6.2 mmol/L and/or TG ≥ 2.3 mmol/L and/or LDL-C ≥ 4.1 mmol/L and/or HDL-C < 1.0 mmol/L.
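The thresholds above can be read as a single "any of" rule. The following is a minimal sketch of that reading, not code from the study; the function name and the example lipid panels are hypothetical.

```python
def meets_dyslipidemia_criteria(tc, tg, ldl_c, hdl_c):
    """Return True if any threshold quoted above is met (all values in mmol/L)."""
    return (
        tc >= 6.2        # total cholesterol
        or tg >= 2.3     # triglyceride
        or ldl_c >= 4.1  # LDL cholesterol
        or hdl_c < 1.0   # HDL cholesterol (low values are abnormal)
    )


# Hypothetical lipid panels, for illustration only
print(meets_dyslipidemia_criteria(tc=5.0, tg=1.5, ldl_c=3.0, hdl_c=1.3))  # False
print(meets_dyslipidemia_criteria(tc=5.0, tg=2.6, ldl_c=3.0, hdl_c=1.3))  # True
```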

Diagnostic criteria of TCM in this study are based on the differentiation standard of dyslipidemia with MOPS in the Clinical Diagnosis and Treatment Terminology of Traditional Chinese Medicine—Syndrome Part [33]. The study referred to the Terms of Traditional Chinese Medicine [34] for standardization for syndromes that are unclear in the above documents.

Inclusion and exclusion criteria

Inclusion criteria: (1) The patient meets the diagnostic criteria of the modern clinic and the syndrome differentiation criteria of TCM; (2) The patient is physically and mentally stable; (3) The patient is between 20 and 90 years old; (4) The patient agrees to sign informed consent.

Exclusion criteria: (1) Patients with secondary hyperlipidemia; (2) Patients with chronic consumptive diseases such as malignant tumors and tuberculosis; (3) Patients with serious heart, liver, kidney, hematopoietic system and other primary diseases; (4) Patients with recent surgery or trauma; (5) Patients with mental diseases; (6) Patients who have recently taken hypolipidemic drugs; (7) Patients with other metabolic abnormalities; (8) Patients whose observation data are incomplete and may affect the result evaluation.

Syndrome type distribution

Two cardiovascular experts with senior professional titles were responsible for determining the syndrome of each patient based on the disease and syndrome differentiation standard of TCM adopted in this study. A total of 1019 cases of dyslipidemia were included, of which 255 cases (25.02%) were identified as MOPS syndrome. The remaining 764 cases (74.98%) were non-MOPS syndromes. Among them were 96 cases of Yin blood deficiency syndrome, 90 cases of Qi deficiency and blood stasis syndrome, 76 cases of phlegm stagnation syndrome, and 69 cases of spleen deficiency and Qi stagnation syndrome.

Data preprocessing

We used the EpiData 3.1 software to create a database after preliminarily sorting the basic information from the patients' clinical medical records, the data on diagnostic factors in TCM, and the data on syndrome differentiation and classification of diseases. Two doctors entered the data independently on two computers to reduce data error. TCM symptom terms were standardized according to the Study on Standardization Status of TCM Terms [35]. A value of 1 was assigned to any reported symptom and a value of 0 to any symptom that did not appear. Invalid data were cleared out and cases with incomplete records were removed, so that a total of 89 diagnostic factors in TCM without missing items were identified, including chest tightness, wheezing, and dark purple tongue. We then used the R package "stats" (version 3.6.3) [36, 37] to run a multiple linear regression analysis on these 89 diagnostic factors, rank them by feature importance, and screen out 36 important factors. These factors were then used as input parameters for the prediction models.
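The 0/1 coding step described above can be illustrated with a short sketch. It covers only the symptom encoding, not the regression-based screening, and the helper name and the example patient are hypothetical.

```python
def encode_case(reported, factors):
    """Map the symptoms reported for one patient to a 0/1 vector ordered like `factors`."""
    reported = set(reported)
    return [1 if factor in reported else 0 for factor in factors]


# Three of the 89 diagnostic factors named in the text; the others are omitted here.
FACTORS = ["chest tightness", "wheezing", "dark purple tongue"]

# Hypothetical patient record
vector = encode_case({"chest tightness", "dark purple tongue"}, FACTORS)
print(vector)  # [1, 0, 1]
```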

The architecture of the models

The fully connected ANN for all models was built with TensorFlow and Keras, which are popular deep learning frameworks. Because a large number of input parameters may prevent the model diagram drawn automatically by the R package from being fully displayed, schematic diagrams were used to describe the architecture of the models in this paper.
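To make "fully connected" concrete, the sketch below runs one forward pass through a small network in plain Python rather than TensorFlow. It assumes a ReLU hidden layer and a sigmoid output node; the hidden width of 16, the random weights, and the input vector are hypothetical and are not the study's trained model.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every output node sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
N_INPUTS, N_HIDDEN = 36, 16  # 36 inputs as in the text; 16 is a hypothetical width
w1, b1 = init_layer(N_INPUTS, N_HIDDEN, rng)
w2, b2 = init_layer(N_HIDDEN, 1, rng)


def predict(x):
    hidden = dense(x, w1, b1, relu)
    return dense(hidden, w2, b2, sigmoid)[0]  # probability-like score in (0, 1)


x = [rng.choice([0, 1]) for _ in range(N_INPUTS)]  # hypothetical 0/1 factor vector
p = predict(x)
n_params = N_INPUTS * N_HIDDEN + N_HIDDEN + N_HIDDEN * 1 + 1
print(f"score = {p:.3f}, trainable parameters = {n_params}")
```

With these sizes the network has 36 × 16 + 16 weights and biases in the hidden layer and 16 + 1 in the output layer, 609 in total.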

Parameters setting of the models

The activation function of the hidden layers is “ReLU” [38] and the activation function of the output layer is “sigmoid” [39]. The weights of the neural network were determined by minimizing the loss value through gradient descent in a process called standard back propagation [40]. The gradient descent algorithm employed was "Adam" [41], and the learning rate was set to 0.001. Binary cross-entropy (BCE) [42], which is often used for binary classification problems [43], was set as the loss function (Formula 1).

$$Loss = - \frac{1}{output\_size}\sum_{i = 1}^{output\_size} \left[ y_{i} \cdot \log \hat{y}_{i} + \left( 1 - y_{i} \right) \cdot \log \left( 1 - \hat{y}_{i} \right) \right]$$
(1)
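A minimal sketch of Formula 1 in plain Python, reading the sum as covering both terms (the standard definition of binary cross-entropy). The clipping constant `eps` and the example labels and predictions are assumptions added for illustration.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps keeps predictions away from exactly 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical labels (1 = MOPS, 0 = non-MOPS) and predicted probabilities
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.1, 0.6, 0.4]
loss = binary_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.3081
```

Confident, correct predictions contribute little to the loss, while the two less certain predictions (0.6 and 0.4) account for most of it.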

The “sklearn” library [44] in Python was used to randomly divide the data set of 1019 patients: 60% of the data (611 cases) were assigned to the training set, 20% (204 cases) to the validation set, and 20% (204 cases) to the test set. The training set was used to tune the model parameters. The validation set was used to test the model during training. The test set was used to test the model after training to evaluate its accuracy.
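The 60:20:20 split can be reproduced in outline with the standard library alone. This sketch is a stand-in for the sklearn call, which the text does not spell out; the function name and the seed are hypothetical.

```python
import random


def split_indices(n, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_indices(1019)
print(len(train), len(val), len(test))  # 611 204 204
```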

The model.fit function (https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit) in TensorFlow was used to train models. The parameters were set as follows.

A class weighting scheme [45] was introduced in the training process to handle the imbalanced data. TensorFlow offers a parameter called class_weight in the model.fit function that allows the user to specify a weight for each target class, which helps maintain predictive performance on imbalanced data. The batch_size in training was set to 128, and the number of training epochs was set to 50. To prevent overfitting, the "early stop" method was used, i.e., training stopped when the accuracy on the validation set did not increase for ten consecutive epochs.
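The two training controls described above can be sketched without TensorFlow. The weighting formula is one common choice (weights inversely proportional to class frequency) and is an assumption, since the text does not give the weights actually used; the counts are the whole-cohort figures from the text, and the accuracy curve is hypothetical.

```python
def balanced_class_weights(n_negative, n_positive):
    """Weights inversely proportional to class frequency, so both classes
    contribute equally to the total loss."""
    total = n_negative + n_positive
    return {0: total / (2.0 * n_negative), 1: total / (2.0 * n_positive)}


def early_stopping_epoch(val_accuracy, patience=10):
    """Return the 1-based epoch at which training stops, given the validation
    accuracy after each epoch."""
    best = float("-inf")
    epochs_without_gain = 0
    for epoch, acc in enumerate(val_accuracy, start=1):
        if acc > best:
            best = acc
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    return len(val_accuracy)


# 764 non-MOPS and 255 MOPS cases, as reported for the full cohort
weights = balanced_class_weights(n_negative=764, n_positive=255)
print({k: round(v, 3) for k, v in weights.items()})  # {0: 0.667, 1: 1.998}

# Hypothetical 50-epoch validation curve that plateaus after epoch 3
history = [0.70, 0.80, 0.85] + [0.85] * 47
print(early_stopping_epoch(history))  # 13
```

In Keras, a dictionary of this shape is what the `class_weight` argument of `model.fit` accepts, and the stopping rule corresponds to an `EarlyStopping` callback monitoring validation accuracy with `patience=10`.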

Data statistics of the models

The performance of the constructed models was evaluated on the test set. The statistical indicators were the number of true positive samples (TP), false positive samples (FP), true negative samples (TN), and false negative samples (FN), together with loss, accuracy, precision, and recall. Based on these data, the confusion matrices, PR curves, and ROC curves were drawn, and the values of PRC (area under the PR curve) and AUC (area under the ROC curve) were calculated.
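Accuracy, precision, and recall follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts are hypothetical values chosen to sum to a 204-case test set and are not taken from the paper's figures.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration
metrics = classification_metrics(tp=40, fp=12, tn=140, fn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```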

Calculating the factor importance of the diagnostic factors

The importance of the 36 previously screened factors was calculated and visualized using the permutation feature importance (PFI) method [46, 47]. The R packages "pheatmap" [48] and "RColorBrewer" [49] were used to normalize the values and plot the heat and cluster diagram of factor importance.
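The idea behind PFI is to shuffle one input column at a time and measure how much the model's score drops. The sketch below shows this with a toy rule in place of a trained network; the scoring metric (accuracy), the number of repeats, and the synthetic data are all assumptions for illustration.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Toy stand-in for a trained network: predicts 1 when the first factor is present."""
    return row[0]


data_rng = random.Random(1)
X = [[data_rng.randint(0, 1), data_rng.randint(0, 1)] for _ in range(200)]
y = [row[0] for row in X]  # the label depends only on the first factor

imp = permutation_importance(toy_model, X, y)
print([round(v, 3) for v in imp])
```

Shuffling the first column breaks the link between input and label, so its importance is large, while shuffling the second column, which the toy model ignores, leaves accuracy unchanged.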

Results

Screening of the diagnostic factors in TCM

As described in the Methods, we ran a multiple linear regression analysis on the 89 diagnostic factors in TCM with the R package “stats” and screened out 36 significant factors. These 36 factors were used to train and develop the models.

Baseline situation of base variables and screened diagnostic factors

The mean age of participants at baseline was 64.4 years. There were 525 males (51.52%) and 494 females (48.48%), a male to female ratio of 1.06:1. A total of 403 (39.55%) had a smoking history, 272 (26.93%) had a drinking history, 640 (63.94%) had a family history, 680 (66.80%) had coronary heart disease, and 795 (78.10%) had hypertension; 445 (43.71%) were complicated with diabetes and 392 (38.55%) with cerebral infarction. See Table 1 for details (Table 1: Baseline situation of the 1019 cases with dyslipidemia). (In Table 1, the variable column for smoking has values ranging from 0 to 6. 0 represents smoking within 6 months, 1 represents smoking within 6 months to 5 years, 2 represents smoking within 5–10 years, 3 represents smoking within 10–15 years, 4 represents smoking within 15–20 years, 5 represents smoking for over 20 years, and 6 represents an unspecified time period.)

Table 1 Baseline situation

Model construction for the prediction of dyslipidemia with MOPS

The number of neurons in each model is shown in Table 2. We calculated performance indicators of the models, including accuracy, precision, recall, AUC, PRC, and time cost for ten training sessions (timeCost10). Each model was trained ten times, with the data set randomly divided each time. The indicators of each model were averaged to evaluate the impact of the number of hidden layers and neurons on the models’ performance, as shown in Fig. 3.

Table 2 Structure of the predictive models. The number of hidden layers was set to 0, 1, 2, 3, or 4, and the number of neurons in each hidden layer was set to 0, 8, 16, 32, 64, 128, etc.

Fig. 3

The training results of the 20 predictive models. The curves in the figure represent the loss, accuracy, precision, recall, AUC, PRC, and timeCost10 for model-1 to model-20, respectively

As shown in Fig. 3, from model-1 to model-5, accuracy, precision, recall, AUC, and PRC kept increasing while loss kept decreasing. From model-5 to model-20, these indicators remained steady. The time cost for ten training sessions remained mostly steady across models. From model-1 to model-5, the number of hidden layers increases from 0 to 1 and the number of neurons in the hidden layer also increases. This indicates that a model with no hidden layer does not predict as well as a model with one hidden layer and that, within a certain range, increasing the number of neurons in the hidden layer improves prediction. However, as the number of hidden layers and neurons continues to increase, performance no longer improves but plateaus at a high level. In general, for model training using TensorFlow, the number of hidden layers and neurons needs to match the training sample size: complex network structures are not advisable for small samples, and simple network structures are not advisable for large samples. Figure 3 shows that models 5–20 are the more appropriate choices for the sample size of 1019 cases used in this study. We therefore evaluated and selected the optimal model from model-5 to model-20.

Performance analysis of the prediction models for dyslipidemia with MOPS

The models were divided into five groups based on their number of hidden layers (0, 1, 2, 3, 4). We then selected the model with the highest precision in each group: model-1, model-8, model-11, model-16, and model-18. Using the training run with the optimal loss value over 50 epochs of training, we evaluated the performance of the five models on the test set.

We calculated the confusion matrix of each model, as shown in Fig. 4. The sum of TN and TP samples is the number of correct predictions made by the model. Model-1, with no hidden layer, had 143 TN and TP samples (Fig. 4a). Model-8, with 1 hidden layer, had 180 TN and TP samples (Fig. 4b). Model-11, with 2 hidden layers, had 180 TN and TP samples (Fig. 4c). Model-16, with 3 hidden layers, also had 180 TN and TP samples (Fig. 4d). Model-18, with 4 hidden layers, had 178 TN and TP samples (Fig. 4e). These results indicate that models with at least one hidden layer have better prediction performance than the model with no hidden layer.
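The counts of correct predictions can be converted into test-set accuracy. The short calculation below assumes that each confusion matrix covers all 204 test cases described in the Methods.

```python
TEST_SET_SIZE = 204  # 20% of the 1019 cases, as stated in the Methods

# TN + TP per model, as reported in the text
correct_predictions = {
    "model-1": 143,
    "model-8": 180,
    "model-11": 180,
    "model-16": 180,
    "model-18": 178,
}

accuracy_by_model = {name: n / TEST_SET_SIZE for name, n in correct_predictions.items()}
for name, value in accuracy_by_model.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, model-1 is correct on about 70% of test cases, while the four models with hidden layers are correct on about 87% to 88%.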

Fig. 4

Confusion matrices of the five models. a Confusion matrix of model-1. b Confusion matrix of model-8. c Confusion matrix of model-11. d Confusion matrix of model-16. e Confusion matrix of model-18. (Note: The upper left corner of the confusion matrix represents the number of TN predicted by the model, the lower right corner represents the number of TP, the upper right corner represents the number of FP, and the lower left corner represents the number of FN.)

We plotted PR curves and ROC curves to evaluate the prediction performance of the models (Fig. 5a, b). The closer the PR curve is to the upper right corner, the better the model's performance; the closer the ROC curve is to the upper left corner, the better the model's performance [50]. Furthermore, the discrimination ability of the models was evaluated by the area under each curve: PRC is the area under the PR curve, and AUC is the area under the ROC curve. Values of PRC and AUC greater than 0.5 were taken to indicate good performance, and the closer these values are to 1, the better the model's performance [50, 51]. The PRC and AUC values show that all five models performed well, that models with hidden layers performed better than the model with no hidden layer, and that Model-11 performed best.
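The area under the ROC curve has a useful equivalent reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC that way in plain Python; it is an illustration of the metric, not the TensorFlow implementation used in the study, and the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = MOPS) and model scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75
```

A model that scores cases at random yields an AUC near 0.5 under this definition, which is why 0.5 serves as the reference level.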

Fig. 5

a PR curves (PRC) of the five models. b ROC curves (AUC) of the five models

The significance analysis of diagnostic factors for dyslipidemia with MOPS

The five models above were used to evaluate the significance of diagnostic factors for dyslipidemia with MOPS. We calculated the significance values of the diagnostic factors with PFI. The results are shown in Fig. 6. The heat diagram shows that diagnostic factors such as dark purple tongue, slippery pulse, slimy fur, expectoration, and petechiae of the tongue were of high significance. In the hierarchical cluster diagram, dark purple tongue, slippery pulse, slimy fur, and expectoration, among others, sit at the higher levels of the hierarchy, indicating greater importance. On this basis, we believe that standardized and objective diagnostic rules for dyslipidemia with MOPS can be constructed from dark purple tongue, slippery pulse, slimy fur, expectoration, and petechiae of the tongue.

Fig. 6

Heat and cluster diagram of diagnostic factors for the five screened models. Note: the number in each box is the value obtained by normalizing the importance value of the diagnostic factor calculated using PFI; the value ∈ [−4, 4)

It is worth mentioning that in the clinical diagnosis scale of MOPS compiled by Fang Ge et al. [52], dark purple tongue, slippery pulse and slimy fur are also in the high-frequency vocabulary of this syndrome. It means that from the perspective of clinical observation, dark purple tongue, slippery pulse and slimy fur are also of great significance in the differentiation and classification of dyslipidemia with MOPS, which is highly consistent with the prediction results of the models.

Discussion

Dyslipidemia has drawn extensive attention due to its important clinical significance. There is no disease term for dyslipidemia in TCM, but it is currently considered to be close to the concepts of “phlegm turbidity” and “cream fat” in TCM. Although dyslipidemia has a variety of classifications in TCM, its basic pathological characteristics are phlegm and blood stasis as the manifestation, deficiency as the root cause, and mutual obstruction of phlegm and blood stasis as the core problem [19]. This description is consistent with this study's finding that MOPS is the most frequent of all the classifications.

The process of diagnosis in TCM is essentially a classification problem. However, TCM diagnosis relies heavily on the doctors' experience, which is highly subjective. Standardized diagnostic rules cannot be formed, which makes it difficult to standardize and promote the characteristics of TCM diagnosis and treatment.

This study used deep learning to train and construct a prediction model of dyslipidemia based on the clinical data in TCM, so as to confirm the feasibility of inputting the diagnostic factors in TCM (such as dark purple tongue, chest tightness, etc.) into the model to predict whether the patient has dyslipidemia with MOPS, simulating the process of clinical syndrome classification of dyslipidemia in TCM.

One advantage of this model is that it can help solve the problem of the lack of objectivity of TCM in clinical diagnosis. Another significant advantage of this model is the high accuracy, which will reduce the workload caused by clinical misdiagnosis. In addition, the model efficiently uncovers and utilizes the hidden rules and patterns buried in a large amount of clinical data so that standardized and objective diagnostic rules for dyslipidemia with MOPS can be constructed.

To sum up, we constructed prediction models for dyslipidemia with MOPS using a deep learning method and a large amount of multi-center clinical data, and we evaluated the performance of the models. The results show that the models performed well in predicting dyslipidemia with MOPS and that model-11 is the optimal model. In addition, diagnostic factors in TCM such as dark purple tongue, slippery pulse, and slimy fur were screened out as significant factors and diagnostic rules for the diagnosis of MOPS. The study is an early attempt to introduce deep learning into TCM research, contributing to the standardization and objectivity of TCM diagnosis for dyslipidemia.

Strengths and limitations

As far as we know, this study is the first to use a diagnostic model based on deep learning to predict whether patients have dyslipidemia with MOPS, so as to guide the corresponding treatment in TCM. Although this is just a small step ahead, it reveals the great potential in applying the deep learning method to clinical data mining and diagnosis, which will help clinicians reduce subjectivity and improve stability in clinical diagnosis.

Unlike traditional linear models, the prediction model based on deep learning is a nonlinear “black box”. Although the “black box” can provide more accurate prediction results, its opaqueness and lack of clinical interpretability may restrict the application of deep learning. In addition, owing to limited personnel and funding, we could not collect more data for external verification. In future research, we plan to collect clinical data from larger samples to further verify, improve, and modify our prediction model for even stronger prediction performance.

Conclusions

This study demonstrated the feasibility of constructing a diagnostic prediction model based on deep learning to predict whether patients have dyslipidemia with MOPS and thereby guide the corresponding TCM treatment. Model-11 is the optimal model, with a high level of accuracy, and it provides clinicians with a more objective and stable guide for diagnosing and treating dyslipidemia with MOPS in TCM.