Background

Low-density lipoprotein cholesterol (LDL-C) has long been regarded as one of the major pathogenic risk factors for cardiovascular diseases (CVD) and cerebrovascular diseases [1]. LDL-C lowering therapy has been demonstrated to reduce atherosclerotic disease risk substantially [2]. An LDL-C concentration of 70 mg/dL has been considered an appropriate target for optimal lipid management in people at high risk of CVD [3,4,5]. However, emerging observational evidence suggests that the risk of ischemic stroke [6, 7] and hemorrhagic stroke [8, 9] remains high among those with low LDL-C concentrations. Little is known about the effect of long-term cumulative exposure to very low LDL-C concentrations (e.g., < 40 mg/dL).

It is thus of clinical significance to understand the factors related to the risk of cerebrovascular diseases in the population with low LDL-C concentrations. Whether other metabolic abnormalities (e.g., hypertension, diabetes mellitus, and obesity) contribute to the risk of stroke within the context of a low LDL-C concentration remains unclear. Importantly, LDL-C concentrations and metabolic and lifestyle risk factors covary and may together have a synergistic or antagonistic effect on stroke-related outcomes.

Recently, machine learning techniques have been widely used for developing risk stratification algorithms due to their intuitive graphical representation [10]. A conditional inference tree is a fundamental machine learning method that recursively partitions participants into homogeneous groups with similar outcome probabilities [11] and can identify variable importance in the context of high-dimensional interactions [12]. We thus sought to prioritize the strongest predictors of ischemic stroke and hemorrhagic stroke risk using a survival conditional inference tree (SCTREE) in a community-based cohort including 9327 participants with LDL-C concentrations < 70 mg/dL during an 8.5–9.0-year follow-up period. We further validated our findings in another independent cohort including 1753 participants with LDL-C concentrations < 70 mg/dL.

Methods

Study populations

We analyzed data from two independent ongoing cohorts: the Kailuan I study was used as the training dataset to develop the risk stratification algorithm, and the Kailuan II study was used as the validation dataset. The study design of these two cohorts has been described in detail previously [13, 14]. Briefly, both cohorts were conducted in 11 hospitals affiliated with the Kailuan community in Tangshan city, China. The Kailuan I study was initiated in 2006–2007 and consisted of 101,510 Chinese adults (81,110 men and 20,400 women) aged 18 years or older living in the Kailuan community in 2006. The Kailuan II study was initiated in 2008–2010 and included 35,856 adults who lived in the Kailuan community but did not participate in the Kailuan I study. In both cohorts, participants completed a questionnaire on demographic details, lifestyle behaviors (e.g., smoking and drinking habits), medication history, and medical comorbidity; underwent clinical and laboratory examinations at baseline; and were followed every 2 years with the same strategy to update their health and lifestyle status. We included 9327 participants in the training dataset and 1753 participants in the validation dataset based on the following criteria: (1) baseline LDL-C concentrations < 70 mg/dL, (2) cumulative average LDL-C concentrations < 70 mg/dL during the follow-up period (mean follow-up duration 9.0 and 8.5 years in the Kailuan I and II studies, respectively), (3) no CVD or cancer at or prior to baseline, and (4) no use of lipid-modifying drugs at baseline or during the follow-up period (Supplementary Fig. 1).

Assessment of outcomes (incident cases of ischemic stroke and hemorrhagic stroke)

The primary outcome was the first occurrence of ischemic stroke or hemorrhagic stroke. As previously described [9, 15, 16], all potential fatal and non-fatal cerebrovascular disease cases were identified using the relevant International Classification of Diseases (ICD)-10th Revision codes [17, 18] from the Municipal Social Insurance Institution (covering all study participants), the Hospital Discharge Register data, and self-report questionnaires during the biennial follow-up surveys. Medical records for all the potential stroke cases were reviewed by 3 cardiologists and neurologists who served on a committee of experts. The mortality information was obtained from Hebei Provincial Vital Statistics Offices or by directly contacting the participants’ family members. Study clinicians reviewed death certificates and coded the main cause of death according to the ICD-10. Ischemic stroke and hemorrhagic stroke were defined as a neurological deficit of cerebrovascular cause that lasted more than 24 h or a significant lesion detected by computed tomography or magnetic resonance imaging [19].

Assessment of potential predictors

Potential predictors include age, sex, smoking, alcohol intake, physical activity, body mass index, estimated glomerular filtration rate, urine protein, high-sensitivity C-reactive protein, lipid profiles, heart rate, blood pressure, and blood glucose control status (Supplementary Table 1). To take advantage of biennially repeated assessment of predictors, we used cumulative average values of LDL-C concentrations and other continuous variables calculated from all available measures since the baseline survey, as previously described [9, 13, 20]. For instance, the average of 2006 and 2008 LDL-C concentrations was used to predict stroke events occurring during 2008–2010; and the average of 2006, 2008, and 2010 measures was used to predict stroke occurring during 2010–2012. This approach allowed us to reduce random within-person variation and capture the long-term effects of studied stroke risk factors.
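The cumulative averaging described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `cumulative_averages`, the dictionary layout, and the LDL-C values are hypothetical, and missing surveys are assumed to be skipped rather than imputed.

```python
from typing import Dict, Optional


def cumulative_averages(measures: Dict[int, Optional[float]]) -> Dict[int, float]:
    """Map each survey year to the mean of all available measures up to that year."""
    result: Dict[int, float] = {}
    total = 0.0
    count = 0
    for year in sorted(measures):
        value = measures[year]
        if value is not None:  # assumption: a missing survey is skipped, not imputed
            total += value
            count += 1
        if count:
            result[year] = total / count
    return result


# Hypothetical LDL-C values (mg/dL) from three biennial surveys.
ldl = {2006: 60.0, 2008: 50.0, 2010: 40.0}
averages = cumulative_averages(ldl)
# averages[2008] would be the exposure for events in 2008-2010,
# averages[2010] the exposure for events in 2010-2012.
```

Each interval's exposure uses only measurements taken before the interval starts, so an event cannot influence its own predictor value.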

Information on demographic data, lifestyle factors, and use of medications (e.g., hypoglycemic agents and antihypertensives) was collected using a structured questionnaire [21]. Fasting (8–12 h) blood samples and random midstream morning urinary samples were collected at the baseline and biennial face-to-face interviews and analyzed in the Central Laboratory of Kailuan General Hospital. Serum concentrations of 6 traits (LDL-C, high-density lipoprotein cholesterol, triglyceride, glucose, high-sensitivity C-reactive protein, and creatinine) were measured by an auto-analyzer (Hitachi 747; Hitachi, Tokyo, Japan) using commercially available kits, as previously described [13, 20, 21]. The intra-assay and inter-assay coefficients of variation of all the traits were less than 10%. Proteinuria status was assessed using a dry-chemistry method and standard urinary sediment examination within 2 h (H12-MA test strips; Changchun Dirui Medical Technology Co., Ltd., Changchun, China) and measured by a urine analyzer (N-600; Changchun Dirui Medical Technology Co., Ltd.). The results were semi-quantified as negative (< 15 mg/dL), trace (15–29 mg/dL), 1+ (30–300 mg/dL), 2+ (300–1000 mg/dL), or 3+ (> 1000 mg/dL) [22]. The estimated glomerular filtration rate (eGFR) was calculated according to the Chronic Kidney Disease Epidemiology Collaboration equation considering creatinine, sex, and age [23].
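For orientation, the eGFR calculation can be sketched in Python. The text does not state which version of the Chronic Kidney Disease Epidemiology Collaboration equation was applied; the sketch assumes the 2009 creatinine-based form without a race coefficient and serum creatinine expressed in mg/dL. The function name `egfr_ckd_epi_2009` is hypothetical.

```python
def egfr_ckd_epi_2009(creatinine_mg_dl: float, age_years: float, sex: str) -> float:
    """eGFR (mL/min/1.73 m^2), 2009 CKD-EPI creatinine equation, no race term (assumed)."""
    if sex == "female":
        kappa, alpha, sex_factor = 0.7, -0.329, 1.018
    elif sex == "male":
        kappa, alpha, sex_factor = 0.9, -0.411, 1.0
    else:
        raise ValueError("sex must be 'female' or 'male'")
    ratio = creatinine_mg_dl / kappa
    return (
        141.0
        * min(ratio, 1.0) ** alpha    # applies when creatinine is at or below kappa
        * max(ratio, 1.0) ** -1.209   # applies when creatinine is above kappa
        * 0.993 ** age_years
        * sex_factor
    )
```

A value below 60 mL/min/1.73 m² from this function would correspond to the reduced-eGFR exclusion used in the sensitivity analyses.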

Weight and height were measured by trained field workers (nurses and physicians) during the survey, and body mass index was calculated as weight (kg) divided by height squared (m²).

Systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured twice in the seated position using a mercury sphygmomanometer, and the mean of the two readings was used for the analyses [14, 24]. Blood pressure control status was classified as follows: (1) well-controlled, SBP < 140 mmHg and DBP < 90 mmHg without treatment; (2) well-controlled, SBP < 140 mmHg and DBP < 90 mmHg with certain or uncertain information on drugs; (3) poorly controlled, SBP ≥ 140 mmHg or DBP ≥ 90 mmHg without treatment; and (4) poorly controlled, SBP ≥ 140 mmHg or DBP ≥ 90 mmHg with certain or uncertain information on drugs. Heart rate was measured by electrocardiogram at baseline and during follow-up surveys, as described previously [13, 24].

Given the long-term effect of hyperglycemia on cerebrovascular disease occurrence, we classified all participants into 4 categories according to their glycemic control levels: (1) well-controlled, fasting blood glucose (FBG) < 126 mg/dL without administration of glucose-lowering drugs; (2) well-controlled, FBG < 126 mg/dL with certain or uncertain information on drugs; (3) poorly controlled, FBG ≥ 126 mg/dL without treatment; and (4) poorly controlled, FBG ≥ 126 mg/dL with certain or uncertain information on drugs (Supplementary Table 1).
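The four-level control categories for blood pressure and fasting blood glucose share the same structure, which the following Python sketch makes explicit. The function names and the boolean `drug_info` flag (true when drug use is certain or uncertain, false when there is no treatment) are hypothetical simplifications of the questionnaire data.

```python
def control_status(above_threshold: bool, drug_info: bool) -> int:
    """Return category 1-4 as enumerated in the text."""
    if not above_threshold:
        return 2 if drug_info else 1  # well-controlled
    return 4 if drug_info else 3      # poorly controlled


def bp_control(sbp_mmhg: float, dbp_mmhg: float, drug_info: bool) -> int:
    """Blood pressure control: poorly controlled if SBP >= 140 or DBP >= 90 mmHg."""
    return control_status(sbp_mmhg >= 140 or dbp_mmhg >= 90, drug_info)


def glucose_control(fbg_mg_dl: float, drug_info: bool) -> int:
    """Glycemic control: poorly controlled if FBG >= 126 mg/dL."""
    return control_status(fbg_mg_dl >= 126, drug_info)
```

For example, `bp_control(150, 80, False)` falls in category 3 (poorly controlled without treatment).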

Statistical analysis

The person-time for each participant was calculated from the date of the baseline survey to the date of any stroke event diagnosis, loss to follow-up due to migration or other reasons (8.53%), death, or the end of follow-up (31 December 2016), whichever came first.
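The "whichever came first" rule can be written as a short Python sketch. The function name `person_years`, its argument names, and the 365.25-day year are assumptions made for illustration; only the end-of-follow-up date comes from the text.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2016, 12, 31)  # end of follow-up stated in the text


def person_years(
    baseline: date,
    stroke: Optional[date] = None,
    lost: Optional[date] = None,
    death: Optional[date] = None,
    end: date = STUDY_END,
) -> float:
    """Years from baseline to the earliest of stroke, loss to follow-up, death, or study end."""
    candidates = [d for d in (stroke, lost, death, end) if d is not None]
    exit_date = min(candidates)
    return (exit_date - baseline).days / 365.25  # assumption: 365.25 days per year


# Hypothetical participant with a stroke four years after baseline.
years = person_years(date(2006, 7, 1), stroke=date(2010, 7, 1))
```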

The SCTREE model was used to develop a risk stratification algorithm for stroke risk in the Kailuan I study with 15 candidate attributes (Supplementary Table 1). SCTREE recursively partitions the dataset into smaller subsets, selecting at each step the top predictor and corresponding cutoff value with the largest weighted Kaplan-Meier estimate. The incidence of stroke cases and survival time within each terminal node were calculated to generate an associated risk stratification tree.
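The tree-growing procedure itself is not reproduced here, but the per-node survival summary can be illustrated. The following Python sketch computes a Kaplan-Meier curve for the participants of a single terminal node; the function name `kaplan_meier` and the toy data are hypothetical, and this is a plain product-limit estimator rather than the authors' R implementation.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if participant i had a stroke at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:  # group ties at the same time
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored participants both leave the risk set
    return curve


# Hypothetical node with four participants; the second is censored at year 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```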

The hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated to compare the stroke risk across the risk groups generated by SCTREE. The performance of the SCTREE-developed risk stratification algorithm was further evaluated and compared using the validation dataset (the Kailuan II study).

We also conducted multivariate Cox regression and propensity score-matched Cox regression including the same predictors that were identified by the SCTREE analysis. The predictive ability and accuracy of the SCTREE and multivariate Cox models were compared using the area under the receiver operating characteristic curve (AUC) [25] and the Brier score [26]. A higher AUC together with a lower Brier score was considered to indicate better prediction performance.
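The two performance metrics can be sketched in Python for a binary outcome (stroke versus no stroke). This is a simplification: the text does not specify whether time-dependent versions for survival data were used, and the function names and toy predictions are hypothetical.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes for four participants.
predicted = [0.9, 0.8, 0.3, 0.1]
observed = [1, 0, 1, 0]
model_auc = auc(predicted, observed)
model_brier = brier_score(predicted, observed)
```

The AUC measures discrimination only, whereas the Brier score also penalizes poorly calibrated probabilities, which is why the two are reported together.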

A Cox proportional hazards model was used to assess the association between stroke risk and the cumulative average values of LDL-C, categorized by predefined groups with clinically meaningful cutoffs (≤ 40 mg/dL, 40–55 mg/dL, and 55–70 mg/dL) [27] and by quartiles. We also conducted sensitivity analyses by excluding participants with eGFR < 60 ml/min/1.73 m² or a 10-year Framingham risk score > 30% [28]. Because malnutrition, as suggested by low BMI, is associated with low LDL-C concentrations and increased risk of stroke, we performed a sensitivity analysis by excluding participants with BMI < 18.5 kg/m² [29]. To remove the confounding effect of treatment of hypertension or diabetes mellitus, we further performed sensitivity analyses by excluding those who used blood pressure-lowering drugs or glucose-lowering drugs.

We used the random survival forests algorithm on all 15 variables to validate the risk precision of the SCTREE model. The variables at a higher rank had a smaller minimal depth of a maximal subtree (a shorter distance from the root node to the parent node of the closest maximal subtree). We extracted the variable importance (VIMP) of each individual predictor to reflect the predictive abilities of the variables identified by the random survival forests algorithm [30]. Since VIMP is the increase of prediction errors after permuting the variable under consideration, a positive VIMP value indicates the variable improves the prediction accuracy, and a negative VIMP value indicates the variable leads to overfitting [31].
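The permutation idea behind VIMP can be illustrated with a generic Python sketch. It is a simplification under stated assumptions: random survival forests compute VIMP on out-of-bag data with a survival-specific error measure, whereas this sketch uses mean squared error on a toy regression model. The function names, the `seed` parameter, and the data are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def mean_squared_error(
    predict: Callable[[List[float]], float],
    rows: Sequence[List[float]],
    targets: Sequence[float],
) -> float:
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def permutation_vimp(
    predict: Callable[[List[float]], float],
    rows: Sequence[List[float]],
    targets: Sequence[float],
    column: int,
    seed: int = 0,
) -> float:
    """Increase in prediction error after randomly permuting one variable."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the permuted value in the chosen column only.
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mean_squared_error(predict, permuted, targets) - baseline


# Toy model that depends only on the first variable.
rows = [[float(i), float(i % 3)] for i in range(10)]
targets = [r[0] for r in rows]
model = lambda r: r[0]
vimp_used = permutation_vimp(model, rows, targets, column=0)
vimp_unused = permutation_vimp(model, rows, targets, column=1)
```

Permuting a variable the model never uses leaves the error unchanged, so its VIMP is zero; permuting an informative variable cannot reduce the error of this toy model, so its VIMP is non-negative.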

All statistical analyses were conducted using R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria) and Stata 12.0 (Stata Corporation, College Station, TX, USA). All statistical tests were 2-sided, with a P value < 0.05 regarded as significant.

Results

The cumulative average LDL-C concentrations of the training and validation datasets were similar. In contrast, participants in the validation cohort were younger; had a higher proportion of women, smokers, and drinkers; had lower BMI, higher heart rate and eGFR, higher concentrations of triglyceride and LDL-C, and lower concentrations of high-density lipoprotein cholesterol; engaged in less physical activity; and more often had well-controlled blood pressure and blood glucose (Table 1). We identified 388 ischemic stroke cases and 145 hemorrhagic stroke cases in the Kailuan I study over a mean follow-up of 9.0 years, and 20 ischemic stroke cases and 8 hemorrhagic stroke cases in the Kailuan II study over a mean follow-up of 8.5 years.

Table 1 Baseline characteristics of Kailuan I study (training dataset) and Kailuan II study (validation dataset) participants with low-density lipoprotein cholesterol concentrations < 70 mg/dL

Of the 15 variables that were examined, the first risk factor identified was blood pressure control status, followed by age and LDL-C concentrations for ischemic stroke. Participants (i) with poorly controlled blood pressure and LDL-C concentrations ≤33.2 mg/dL and (ii) with well-controlled blood pressure, aged > 64.9 years, and LDL-C concentrations ≤32.0 mg/dL had the highest ischemic stroke risk among the 9 sub-groups identified by the SCTREE model (Fig. 1). The HRs for these high-risk sub-groups, compared with the sub-group with the lowest stroke risk (well-controlled blood pressure and age ≤ 54.1 years), were more than 20 (P < 0.001 for all) (Table 2).

Fig. 1 Conditional inference tree for ischemic stroke in individuals with low-density lipoprotein cholesterol concentrations < 70 mg/dL in the Kailuan I study. The terminal nodes show the Kaplan-Meier curves. BP, blood pressure; LDL-C, low-density lipoprotein cholesterol; UPRO, urine protein

Table 2 Adjusted hazards ratios and 95% confidence intervals for the risk of ischemic stroke and hemorrhagic stroke across terminal nodes in the Kailuan I participants with low-density lipoprotein cholesterol concentrations < 70 mg/dL

Poorly controlled blood pressure and low LDL-C concentrations were identified as the top discriminators for hemorrhagic stroke (Fig. 2 and Supplementary Table 2). Participants with poorly controlled blood pressure and LDL-C concentrations ≤32.8 mg/dL had the highest hemorrhagic stroke risk compared to those with well-controlled blood pressure, LDL-C concentrations > 40.2 mg/dL, and aged ≤64.8 years (HR 41.7, 95%CI 17.2–101.6). Similar results were observed after excluding participants with eGFR < 60 ml/min/1.73 m², 10-year Framingham risk score > 30%, or BMI < 18.5 kg/m², or those who used blood pressure-lowering drugs or glucose-lowering drugs (Table 2).

Fig. 2 Conditional inference tree for hemorrhagic stroke in individuals with low-density lipoprotein cholesterol concentrations < 70 mg/dL in the Kailuan I study. The terminal nodes show the Kaplan-Meier curves. BP, blood pressure; LDL-C, low-density lipoprotein cholesterol; eGFR, estimated glomerular filtration rate

The Kailuan I study participants were stratified into high (> 5% developed ischemic stroke; n = 4548), intermediate (3–5% developed ischemic stroke; n = 2840), and low (< 3% developed ischemic stroke; n = 1939) risk groups. The ability of the derived risk tree to stratify participants into these groups was tested in the validation dataset (the Kailuan II study). A similar dose-response trend across the 3 risk groups for ischemic stroke was observed: the HRs for the high- versus low-risk groups were 7.03 (95%CI 5.01–9.85) in the training dataset and 4.68 (95%CI 1.58–13.9) in the validation dataset. The HRs for the high-risk group (> 2% developed hemorrhagic stroke, n = 5468) versus the low-risk group (< 2% developed hemorrhagic stroke, n = 3859) were 3.94 (95%CI 2.54–6.11) in the training dataset and 4.73 (95%CI 0.81–27.6) in the validation dataset (Fig. 3). The SCTREE model had a similar AUC and Brier score relative to the multivariate Cox model (Supplementary Table 3). The random survival forest analysis showed that blood pressure control and LDL-C concentrations were among the top predictors for both ischemic stroke and hemorrhagic stroke, and age was a strong predictor for ischemic stroke, which was consistent with the results of the SCTREE model (Supplementary Fig. 2). When we recast the outcome as a binary classification (stroke versus non-stroke event), the results did not materially change. Poorly controlled blood pressure and low LDL-C concentrations remained the top predictors of ischemic stroke and hemorrhagic stroke (Supplementary Figs. 3 and 4).

Fig. 3 Percentages of participants who developed ischemic stroke and hemorrhagic stroke during follow-up and adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) across the risk groups in individuals with low-density lipoprotein cholesterol concentrations < 70 mg/dL in the Kailuan I study and the Kailuan II study. IS, ischemic stroke; HS, hemorrhagic stroke

There was a linear association between cumulative average LDL-C concentrations and stroke risk. Very low LDL-C concentrations (< 40 mg/dL) were significantly associated with increased risk of ischemic stroke (HR 2.07, 95%CI 1.53–2.80) and hemorrhagic stroke (HR 2.70, 95%CI 1.70–4.30) compared to LDL-C concentrations of 55–70 mg/dL. These results were further confirmed using a quartile-based analysis (Ptrend < 0.01 for both) (Table 3).

Table 3 Adjusted hazards ratios and 95% confidence intervals for ischemic stroke risk and hemorrhagic stroke risk according to low-density lipoprotein cholesterol clinical cutoffs and quartiles in Kailuan I study participants with low-density lipoprotein cholesterol concentrations < 70 mg/dL

Discussion

Using data from 2 community-based cohorts and a machine learning approach, we found that in participants with LDL-C concentrations < 70 mg/dL who were not receiving lipid-lowering therapy, the major attributes of stroke risk were very low LDL-C concentrations and poorly controlled blood pressure. The highest risk group, characterized by the presence of 2–3 of these risk factors, was at high risk of developing stroke during the 8.5- to 9-year follow-up period relative to the lowest risk group, whether predicted using the training or the validation dataset. There was remarkable consistency between the two datasets. We further confirmed the association between low LDL-C concentrations and stroke risk using the traditional Cox proportional hazards model. For the primary prevention of stroke, these findings highlight the need for a better understanding of the influence of potential confounders in individuals with very low LDL-C concentrations in the absence of therapy.

The predictive models indicated that individuals with very low LDL-C concentrations without the influence of lipid-lowering drugs were still at elevated risk for stroke. One possible interpretation of these findings is that a very low LDL-C concentration is a marker of a chronic metabolic disorder and its associated adverse sequelae, such as a high inflammatory burden. Systemic chronic inflammation may lead to very low blood LDL-C concentrations by exacerbating cholesterol accumulation in macrophages [32]. Therefore, non-treatment and on-treatment low LDL-C concentrations may have different associations with cerebrovascular disease. Whether the observed results could apply to intervention trials remains to be elucidated.

Our study suggested that the LDL-C concentration below which stroke risk was highest was approximately 33 mg/dL. Interestingly, this level is similar to neonatal LDL-C concentrations at birth (21–39 mg/dL) [33]. Cholesterol is a constituent of cell membranes and is hence essential for maintaining cellular structural integrity; it also serves as a precursor for bioactive compounds ranging from steroid hormones to vitamin D. Plasma LDL-C concentrations of 21–39 mg/dL have been suggested to be the lower limit that will sustain normal cellular function [34,35,36]. In the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study, participants with high LDL-C (≥70 mg/dL) and low hs-CRP (< 2 mg/L) had a lower risk of stroke [37]. A recent randomized trial reported that high-dose atorvastatin significantly reduced the overall incidence of stroke and CVD but increased the risk of hemorrhagic stroke [38]. Three recent large-scale observational studies reported that LDL-C concentrations < 70 mg/dL were positively associated with hemorrhagic stroke risk [8, 9]. The causal relevance of these observed associations between low LDL-C concentrations and hemorrhagic stroke was confirmed in a meta-analysis of LDL-C-lowering interventions and a Mendelian randomization analysis [39]. Interestingly, another Mendelian randomization analysis reported that lowering LDL-C concentrations may lead to decreased CVD risk but increased DM risk [40].

Of note, some guidelines [2, 41], although not consistently [27], comment on the potential adverse effect of very low LDL-C concentrations, in the range of 25 to 70 mg/dL, achieved with lipid-lowering therapy. The recent 2019 European Society of Cardiology/European Atherosclerosis Society lipid guidelines recommended a lower LDL-C goal (e.g., < 55 mg/dL) than the previous guidelines for individuals at very-high CVD risk [27]. The authors of the guidelines indicated there are no known adverse effects of LDL-C concentrations < 40 mg/dL [27].

Our results suggested that poorly controlled blood pressure contributed to the risk of stroke in individuals with very low LDL-C concentrations. Poorly controlled blood pressure or glucose conferred a 1.5–2-fold increased risk of stroke [42]. Individuals of Asian descent have a higher prevalence of metabolic syndrome than Caucasians [41]. LDL-C lowering alone may be less effective in reducing atherosclerotic risk in Asians than in Caucasians. The Evaluation of Cardiovascular Outcomes After an Acute Coronary Syndrome During Treatment with Alirocumab study demonstrated that LDL-C lowering with alirocumab significantly reduced the primary CVD outcomes in North Americans (HR 0.78, 95%CI 0.65–0.94), but not in Asians (HR 1.03, 95%CI 0.76–1.38) [43].

Our study has several strengths, including its large sample size of participants with LDL-C concentrations < 70 mg/dL. The SCTREE analysis, beyond the traditional statistical analyses, provides a robust framework for testing attributes that are predictive of stroke risk taking the complex high-order interactions into account. We excluded people using lipid-modifying drugs to reduce the sources of potential confounding related to these medications. The ability to use cumulative average values for all continuous variables in the SCTREE model reduced the possibility of “regression dilution.”

Our study has several limitations. It is based on two Chinese cohorts, which limits the generalizability of our findings to other ethnic groups. Further, the mean age of the validation cohort (43.6 years) was much lower than that of the training cohort (57.3 years), which implies that a smaller proportion of high-risk participants was included in the validation analysis. The sample size of our study (n = 9327) was relatively small because of our strict inclusion criteria, and we thus identified a small number of incident ischemic stroke events (n = 388) and hemorrhagic stroke events (n = 145) during follow-up. This limited the detection of some potential weak-to-moderate predictors because of inadequate statistical power in each terminal node, the stopping rules, or the competitive importance of the variables in the pruning procedure. The small number of incident ischemic stroke events (n = 20) and hemorrhagic stroke events (n = 8) limited the detection of significant predictive combinations in the validation set, and we could not swap the training and validation sets to confirm the results. However, similar associations were observed in both cohorts. We did not measure hemoglobin A1c because of its high cost as a screening test in the general population, and some individuals with poorly controlled blood glucose could have been misclassified. The proportion of hemorrhagic stroke in our cohort with LDL-C concentrations < 70 mg/dL was higher than that in the general population [18] because the ischemic stroke events attributed to high LDL-C concentrations were excluded. Further studies or publicly available datasets with repeated LDL-C measurements may help to validate our results.

Conclusions

In a Chinese population with LDL-C concentrations < 70 mg/dL, very low LDL-C concentrations, together with poorly controlled blood pressure and older age, significantly predicted the occurrence of ischemic stroke and hemorrhagic stroke. Additional data are required to confirm our findings in populations with different ethnic and socioeconomic backgrounds.