Introduction

According to estimates by the World Health Organization, stroke is the second leading cause of death and is projected to account for 7.8 million deaths and 23 million first-time ischemic stroke events by 2030 [1]. Many risk factors for stroke have been identified, including hypertension, dyslipidemia, diabetes, smoking, and alcohol consumption [2]. With rising prosperity and an aging population, the prevalence of hypertension in China increased from 23.4% in 1991 to 28.6% in 2011 (affecting approximately 300 million adults), placing a heavy burden on public health resources [3]. Acute ischemic stroke is common in hypertensive patients, especially in elderly patients with multiple risk factors.

Given the high fatality and disability rates of stroke, we aimed to develop a practical prediction model that integrates risk factors commonly observed in the clinic. Estimating the risk of acute ischemic stroke in geriatric patients with primary hypertension allows appropriate preventive measures to be taken. Owing to their ease of use, nomograms have been widely used in recent years for medical diagnosis and prognostic evaluation [4, 5]. Our aim was to provide physicians with an individualized clinical decision tool.

Materials and methods

Study design and data source

This retrospective review extracted information from the electronic medical record database of the Affiliated Hospital of Guangdong Medical University on geriatric patients older than 60 years [6] who were diagnosed with primary hypertension between October 2018 and May 2020, regardless of whether they had suffered an acute ischemic stroke. Patients with detailed clinical information and biochemical and imaging examination results were included. Acute ischemic stroke was diagnosed by neuroimaging.

In total, the records of 1367 patients were analyzed. The patients were randomly divided into a training set and a testing set at a ratio of 70:30.
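
A minimal Python sketch of a reproducible random 70/30 split follows, assuming patients are indexed 0 to n - 1. The function name `split_indices` and the seed value are illustrative assumptions, not taken from the study. A plain 70% cut of 1367 patients yields 957 training and 410 testing patients, whereas the Results report 959 and 408, so the study's exact splitting procedure (for example, any stratification) is not reproduced here.

```python
import random


def split_indices(n, train_frac=0.7, seed=2020):
    """Randomly partition patient indices 0..n-1 into a training and a testing set.

    Hypothetical helper: a plain shuffled cut without stratification.
    """
    rng = random.Random(seed)        # fixed (hypothetical) seed makes the split reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(train_frac * n)  # 0.7 * 1367 = 956.9 -> 957
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train_idx, test_idx = split_indices(1367)
print(len(train_idx), len(test_idx))  # 957 410
```

Fixing the seed keeps the training and testing sets identical across reruns, which is what allows the later model comparisons to use the same held-out patients.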

Study variables

A total of 15 stroke-related risk factors were included based on the literature [1, 7, 8, 9]; they are listed in Table 1. All of them are indicators that can be easily assessed in clinical practice. All risk factors were converted into categorical variables to build the nomogram. For a model of this type, the sample size should be at least ten times the number of variables [11].
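
The ten-times rule is simple arithmetic and can be checked directly. The helper names below are hypothetical. A widely used stricter variant applies the factor to the number of outcome events (events per variable) rather than to the total sample; with 437 strokes in the total cohort, that variant is also satisfied for 15 candidate variables.

```python
def min_sample_size(n_variables, factor=10):
    """Smallest sample size allowed by the rule n >= factor * number of variables."""
    return factor * n_variables


def meets_rule(n, n_variables, factor=10):
    """True if n (patients, or events under the stricter variant) satisfies the rule."""
    return n >= min_sample_size(n_variables, factor)


print(min_sample_size(15))  # 150
print(meets_rule(959, 15))  # True: size of the training set reported in Results
print(meets_rule(437, 15))  # True: strokes in the total cohort (events-per-variable view)
```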

Table 1 Risk factors and their definitions in this study

Statistical analysis

All variables are expressed as counts (%). Statistical analysis was performed using R software 3.6.1 (http://www.R-project.org/). Risk factors with P < 0.05 in the chi-square test were regarded as statistically significant. Multivariable logistic regression analysis was then used to identify the optimal variables for constructing the prediction model, and the results were expressed as odds ratios (ORs) with 95% confidence intervals (CIs) and P-values. The area under the receiver operating characteristic (ROC) curve (AUC) and calibration curves were used to assess model performance. A nomogram was developed to present the prediction model in a user-friendly format [12, 13].
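
For a binary risk factor, the chi-square screening step reduces to a Pearson test on a 2 × 2 table of factor status against stroke status. The dependency-free sketch below is illustrative only: the function names and the counts in the usage line are hypothetical. It omits the continuity correction, whereas R's `chisq.test` applies Yates' correction to 2 × 2 tables by default (`correct = TRUE`), which gives slightly larger P-values; multi-level categorical factors would need an r × c table with more degrees of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for the table

                     stroke   no stroke
        factor yes     a          b
        factor no      c          d

    Returns (chi-square statistic, P-value with 1 degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 df the statistic is a squared standard normal, so
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def screen(tables, alpha=0.05):
    """Keep the factors whose chi-square P-value is below alpha."""
    return [name for name, counts in tables.items()
            if chi_square_2x2(*counts)[1] < alpha]


chi2, p = chi_square_2x2(30, 20, 20, 30)  # hypothetical counts
print(round(chi2, 3), round(p, 4))        # 4.0 0.0455
```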

Furthermore, we applied four machine learning classifiers (random forest, support vector machine with a polynomial kernel, support vector machine with a radial basis function kernel, and backpropagation neural network) in JupyterLab 1.2.6 (https://jupyterlab.readthedocs.io/en) and compared their results with those of the multivariable logistic regression model. For each algorithm, the combination of hyperparameters with the highest average log-likelihood over five repetitions of fivefold cross-validation was selected [14].
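
The hyperparameter search described above (five repetitions of fivefold cross-validation, scored by average log-likelihood) can be sketched in plain Python as follows. The study does not state which libraries it used; `fit_predict` is a hypothetical callback standing in for any of the four classifiers, and the toy model in the usage lines is illustrative only. With scikit-learn, a comparable setup would be `GridSearchCV` with `cv=RepeatedKFold(n_splits=5, n_repeats=5)` and `scoring="neg_log_loss"`, since the negative log loss equals the mean log-likelihood.

```python
import math
import random


def repeated_kfold(n, k=5, repeats=5, seed=0):
    """Yield (train, valid) index lists for `repeats` rounds of k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                        # new random partition each round
        folds = [idx[i::k] for i in range(k)]   # k folds of near-equal size
        for i in range(k):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, folds[i]


def mean_log_likelihood(y, p, eps=1e-12):
    """Average Bernoulli log-likelihood of 0/1 labels y under probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)      # avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


def select_params(candidates, fit_predict, X, y, k=5, repeats=5, seed=0):
    """Return (best candidate, its score): the highest mean validation log-likelihood.

    fit_predict(params, X_train, y_train, X_valid) is a hypothetical callback that
    trains one model and returns predicted probabilities for X_valid.
    The same seed gives every candidate the same folds, for a fair comparison.
    """
    best, best_score = None, -math.inf
    for params in candidates:
        scores = []
        for tr, va in repeated_kfold(len(y), k, repeats, seed):
            p = fit_predict(params, [X[i] for i in tr], [y[i] for i in tr],
                            [X[i] for i in va])
            scores.append(mean_log_likelihood([y[i] for i in va], p))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = params, mean_score
    return best, best_score


# Toy usage: a "model" predicting the training prevalence shrunk toward 0.5 by weight w.
def shrunk_prevalence(w, X_tr, y_tr, X_va):
    prev = sum(y_tr) / len(y_tr)
    return [(1 - w) * prev + w * 0.5] * len(X_va)


y = [1] * 30 + [0] * 70
X = [[0.0]] * 100
print(select_params([0.0, 1.0], shrunk_prevalence, X, y)[0])  # 0.0
```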

Results

Baseline characteristics and optimal risk factors identification

Of the 1367 patients diagnosed with primary hypertension between October 2018 and May 2020, 437 had suffered an acute ischemic stroke. A total of 959 patients were assigned to the training set and 408 to the testing set. The characteristics of the patients in the total cohort and in the training set are shown in Tables 2 and 3, respectively.

Table 2 Baseline characteristics of the total cohort
Table 3 Baseline characteristics of the training set

Nine variables (gender, smoking, alcohol abuse, blood pressure management, history of stroke, diabetes, carotid artery stenosis (CAS), total cholesterol, and LDL-cholesterol) differed significantly (P < 0.05) between patients with and without acute ischemic stroke in the chi-square test. Six of these variables (smoking, alcohol abuse, blood pressure management, stroke history, diabetes, and CAS) remained statistically significant (P < 0.05) in the multivariable logistic regression analysis, the results of which are displayed as a forest plot in Fig. 1.

Fig. 1

Forest plot of the risk factors in the multivariable logistic regression analysis. Notes: OR = odds ratio, CI = confidence interval

Construction and assessment of the prediction nomogram

The prediction model was constructed by multivariable logistic regression based on the six identified variables (smoking, alcohol abuse, blood pressure management, stroke history, diabetes, CAS). The nomogram in Fig. 2 visualizes the model in a user-friendly manner.

Fig. 2

The nomogram for estimating the risk of acute ischemic stroke

Nomogram interpretation: each predictor value is assigned a number of points by drawing a vertical line up to the points scale at the top. The points for all predictors are summed, and the total corresponds to the individual risk of acute ischemic stroke. For example, consider a geriatric patient with a history of ischemic stroke, smoking, and poor blood pressure management, but without alcohol abuse or carotid stenosis. The patient's score is smoking (68 points) + history of ischemic stroke (54 points) + poor blood pressure management (100 points) + no alcohol abuse or carotid stenosis (0 points) = 222 total points. Drawing a vertical line from 222 on the total points scale down to the risk scale at the bottom gives a predicted probability of acute ischemic stroke of about 75%.
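
The points arithmetic in this example can be sketched as follows, assuming rms-style scaling for 0/1 predictors: the predictor with the largest coefficient spans 0 to 100 points, the others are scaled in proportion to their coefficients, and total points map back to the logistic linear predictor. The point values are those quoted in the text; the coefficients and intercept in the usage lines are hypothetical because the model's coefficients are not reported, so this sketch cannot reproduce the 75% read from Fig. 2.

```python
import math


def nomogram_points(coefs):
    """Points for 0/1 predictors with positive coefficients (rms-style scaling).

    The largest coefficient maps to 100 points; returns (points, scale).
    """
    scale = max(coefs.values())
    return {name: 100.0 * b / scale for name, b in coefs.items()}, scale


def total_points(points, patient):
    """Sum the points of the predictors present (True) in this patient."""
    return sum(points[name] for name, present in patient.items() if present)


def points_to_probability(total, intercept, scale):
    """Convert total points back to the linear predictor, then to a probability."""
    lp = intercept + total * scale / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


# Point values quoted in the worked example (absent predictors contribute 0).
points = {"smoking": 68, "stroke_history": 54, "poor_bp_management": 100}
patient = {"smoking": True, "stroke_history": True, "poor_bp_management": True,
           "alcohol_abuse": False, "carotid_stenosis": False}
print(total_points(points, patient))  # 222

# Hypothetical coefficients, to illustrate the scaling only.
pts, scale = nomogram_points({"a": 2.0, "b": 1.0})
print(pts)  # {'a': 100.0, 'b': 50.0}
```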

The AUC of the prediction model was 0.736 in the training set, 0.730 after 1000 bootstrap resamples, and 0.725 in the validation using the testing set (Fig. 3). The calibration curves showed close agreement between the predicted probabilities and the observed frequencies of acute ischemic stroke in both the training set and the testing set (Fig. 4).
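
The AUC can be computed as the probability that a randomly chosen stroke patient receives a higher predicted risk than a randomly chosen patient without stroke. The paper does not state how the bootstrap AUC was obtained; a value slightly below the apparent AUC (0.730 versus 0.736) matches the pattern of Harrell-style optimism correction, as reported by R's `rms::validate`, but that is an assumption. The sketch below implements this correction; `fit_predict` is a hypothetical callback that should rerun the whole modeling procedure on each resample.

```python
import random


def auc(y, scores):
    """Rank-based (Mann-Whitney) AUC for 0/1 labels; tied scores get mid-ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, y) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def optimism_corrected_auc(X, y, fit_predict, n_boot=1000, seed=0):
    """Apparent AUC and bootstrap optimism-corrected AUC (assumed Harrell-style method).

    fit_predict(X_train, y_train, X_eval) is a hypothetical callback that fits the
    model on the training data and returns risk scores for X_eval.
    """
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(y, fit_predict(X, y, X))
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample patients with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if not 0 < sum(yb) < n:
            continue                                 # skip resamples with a single class
        auc_boot = auc(yb, fit_predict(Xb, yb, Xb))  # model scored on its own resample
        auc_orig = auc(y, fit_predict(Xb, yb, X))    # same model scored on original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimism) / len(optimism)
```

Because each resample refits the model and compares its in-sample AUC with its AUC on the original data, the average difference estimates how much the apparent AUC overstates performance on new patients.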

Fig. 3

ROC curves of the nomogram. Notes: ROC curves of the training set and the testing set. The AUC is 0.736 in the training set and 0.725 in the testing set

Fig. 4

Calibration curve of the nomogram. Notes: The x-axis represents the risk predicted by the nomogram. The y-axis represents the observed proportion of patients diagnosed with acute ischemic stroke. The diagonal dotted line represents perfect prediction by an ideal model. The apparent line represents the performance of the nomogram
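
A calibration curve compares predicted risk with observed frequency. The grouped version below is a simple sketch: patients are sorted by predicted risk and split into equal-size bins, and each bin's mean predicted probability is paired with its observed stroke rate, so points close to the diagonal indicate good calibration. The "apparent" line in Fig. 4 suggests a smoothed, bootstrap-based curve such as the one produced by R's `rms::calibrate`, but the paper does not say so; the binning and the function name here are assumptions.

```python
def calibration_table(y, probs, n_bins=10):
    """Return (mean predicted risk, observed event rate, bin size) for each risk bin.

    Patients are sorted by predicted probability and cut into n_bins groups of
    (nearly) equal size; empty bins are skipped.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    table = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        mean_pred = sum(probs[i] for i in members) / len(members)
        observed = sum(y[i] for i in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table
```

Plotting `observed` against `mean_pred` for each bin, together with the diagonal, gives a coarse version of Fig. 4.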

Multivariable logistic regression analysis and machine learning

Using the same variables, we constructed prediction models with all five algorithms and evaluated them in the testing set. Multivariable logistic regression and the support vector machine with a radial basis function kernel both achieved an AUC of 0.71, higher than the other three models (Fig. 5).

Fig. 5

ROC curve of the machine learning and multivariable logistic regression. Notes: LR = logistic regression, RF = random forest, Poly SVM = support vector machine with polynomial kernel, RBF SVM = support vector machine with radial basis function kernel, BPNN = backpropagation neural network

Discussion

This study developed a practical nomogram based on six variables that are easily obtained in the clinic, which can help physicians identify patients at high risk of stroke and implement preventive measures as early as possible.

Blood pressure management was the most influential variable in the nomogram, with poor management contributing the most points to stroke risk. With aging, vascular elasticity decreases as a consequence of atherosclerosis; accordingly, a systolic blood pressure target of < 150 mmHg is recommended for elderly patients [15]. A meta-analysis reported a 41% reduction in stroke for every 10 mmHg reduction in systolic or 5 mmHg reduction in diastolic blood pressure [16]. Although hypertension guidelines specify blood pressure targets, few large-scale clinical studies have focused on hypertension or stroke in very elderly patients. Clinicians should be aware of this practical problem and individualize blood pressure management in elderly patients [17], without ignoring the symptoms and subjective experience of very elderly patients.

In addition to the absolute blood pressure level, blood pressure variability deserves attention, and an excessive early-morning rise in blood pressure is a classic example. Using ambulatory blood pressure monitoring and magnetic resonance imaging, Kario demonstrated that an exaggerated early-morning blood pressure surge was independently associated with stroke in elderly hypertensive patients: the risk of stroke in patients with a morning surge > 55 mmHg was 2.7 times that in patients with a surge < 55 mmHg. Pierdomenico reached a similar conclusion, finding that stroke was associated with an exaggerated early-morning blood pressure surge independently of the 24-h average blood pressure [18, 19].

Smoking and alcohol abuse are modifiable risk factors for stroke. Both played an important role in our prediction model, and more than 90% of the patients with these risk factors in our cohort were male. Numerous clinical studies in different races and populations have confirmed the strong association between smoking and stroke, and exposure to secondhand smoke should also be noted. Current smokers are two to four times more likely to have a stroke than never-smokers or those who quit smoking 10 years earlier [20]. Epidemiological studies have shown that the effect of drinking on stroke risk depends on the amount consumed: a small amount of red wine may reduce the risk of cardiovascular disease and stroke, whereas alcohol abuse (> 60 g/day) is associated with an increased long-term risk of stroke [21, 22].

CAS is a marker of systemic atherosclerosis that can be easily detected by ultrasound. Studies from the 1980s reported an annual risk of ipsilateral stroke of 3% in patients with ≥ 50% stenosis, rising to 5.5% in those with > 75% stenosis. With the widespread use of preventive drugs, contemporary studies report an annual stroke risk of only 0.34% in patients with ≥ 50% stenosis [23, 24].

Other risk factors not included in our nomogram, such as age, total cholesterol, and LDL-cholesterol [25,26,27], have been associated with stroke in many clinical studies and should still be considered by clinicians. Notably, elderly patients usually have multiple chronic diseases, such as hypertension, diabetes, and coronary heart disease, and the organ damage caused by these diseases may increase the risk of ischemic stroke more than physiological aging itself [28]. Additionally, elderly patients often do not adhere to prescribed treatments; the direct visual display of the nomogram can help educate them and improve their adherence to treatment.

In the era of artificial intelligence, machine learning has become a popular method of data analysis; it uses mathematical models and training data to make predictions [29, 30]. Random forests, support vector machines, and backpropagation neural networks are three representative machine learning algorithms that are increasingly used to predict adverse events in clinical practice and in tumor biology research [31, 32]. Although these algorithms have attracted much attention with the availability of increasingly large datasets (such as electronic medical records), their internal workings resemble a “black box” that is difficult to interpret and visualize, which limits their practical application.

Several reports have compared multivariable logistic regression, the classic reference standard, with machine learning algorithms. In our study, the machine learning algorithms offered no obvious advantage over multivariable logistic regression for this binary classification problem (whether or not a patient will suffer an acute ischemic stroke). This finding is consistent with several recent studies [14, 33].

Our prediction model based on multivariable logistic regression not only achieves reasonable discrimination (an AUC of 0.725 in the testing set) but can also be visualized as a nomogram, which facilitates its clinical application.

Limitations

This was a single-center retrospective study, which limits its generalizability, and selection bias cannot be excluded given the retrospective design. Furthermore, numerous other stroke-related risk factors, such as body mass index, dietary habits, and physical exercise, were not analyzed because they were not recorded in the patients' electronic medical records.