Introduction

Cardiovascular disease and oculomics

Cardiovascular disease (CVD) is a profound global public health challenge and ranks among the primary contributors to the worldwide disease burden [1, 2]. Despite advances in preventive strategies and therapeutic techniques, CVD remains a leading cause of both mortality and morbidity worldwide, claiming an estimated 17.9 million lives each year. The assessment of cardiovascular risk is therefore of paramount importance in global public health.

Patients with CVD frequently present with metabolic abnormalities such as insulin resistance, hyperglycemia, and dyslipidemia [3, 4]. Recently, composite indicators calculated from blood glucose and serum lipids, such as the triglyceride-glucose index (TyG-index) and the atherogenic index of plasma (AIP), have been shown to correlate more strongly with CVD than single biomarkers [5]. Furthermore, the TyG-index and AIP are good indicators of cardiovascular risk [5, 6], and different cut-off values can reflect the cardiovascular risks of different groups of people.

Previous studies have confirmed that ophthalmic diseases are closely related to CVD. The eye is an organ that directly reflects microvascular changes [7], and the pathogenesis of many CVDs also leads to ocular changes [8]. Therefore, it is feasible to use ocular pathology to predict CVD. Oculomics is a concept proposed in recent years [9]; it refers to the combination of big data, artificial intelligence (AI), and ocular imaging to identify retinal biomarkers of systemic disease [10, 11]. AI has been extensively employed in medicine for several years, automatically uncovering intrinsic patterns and connections between data variables and related diseases [12,13,14,15]. In the present study, we leveraged data from the Korean National Health and Nutrition Examination Survey (KNHANES) to develop and validate a machine-learning model for predicting cardiovascular risk from variables encompassing clinical questionnaires and oculomics, as shown in Fig. 1. The model predicts the range of the TyG-index and AIP, which reflect the level of cardiovascular risk.

Fig. 1
figure 1

Central illustration. Machine-learning-based cardiovascular risk prediction using data extracted from oculomics and clinical variables in the Korean National Health and Nutrition Examination Survey (KNHANES)

Materials and methods

Dataset

This study used a comprehensive health examination dataset from the Korean National Health and Nutrition Examination Survey (KNHANES IV and V; available online at https://knhanes.kdca.go.kr/knhanes/eng) conducted from 2008 to 2012. Comprehensive ophthalmologic examinations and dual-energy X-ray absorptiometry (DEXA) were included in the KNHANES only during this period. The study protocol was approved by the Institutional Review Board of the Korean Center for Disease Control and Prevention (No. 2008-04EXP-01-C, 2009-01CON-03-C, 2010-02CON-21-C, 2011-02CON-06-C, and 2012-01EXP-01-2C), and data collection was approved by the Institutional Review Board of the Korean National Institute for Bioethics Policy. All participants signed consent forms for the use of their health information in the KNHANES data collection. The KNHANES is a nationwide, population-based, cross-sectional survey conducted by the Division of Chronic Disease Surveillance of the Korea Centers for Disease Control and Prevention [16]. Participants were randomly selected from 200 (2008-2009) and 192 (2010-2011) enumeration districts using stratified sampling that considered population, sex, age, regional area, and type of residential area. The KNHANES comprises health records based on health interviews, health examinations, and nutrition surveys. Each participant completed a questionnaire covering information such as age, household income, alcohol use, smoking status, hypertension, and diabetes [17].

Fig. 2
figure 2

Schematic depiction of the data screening flow of the KNHANES from 2008 to 2012. Arrows indicate the data screening process; the experimental process is described in the “Results” section. Details of the specific experimental workflow are shown in Supplementary Fig. S3

We conducted data curation based on the KNHANES dataset. Initially, individuals lacking either clinical or oculomics variables (N=13,136), as well as those with missing TyG-index or AIP values (N=553), were excluded. After this filtering, 32,122 participants were included in the study. We then used the KNHANES data from 2008 to 2011 for model development and internal validation, exploring and verifying the associations between the input variables and the TyG-index and AIP. Model construction employed five-fold cross-validation. The training set, comprising a randomly selected 80% of the 2008-2011 data, was used for machine-learning model development, while the internal validation set comprised the remaining 20%. The external validation set, drawn from the year 2012, was used to evaluate the model’s ability to predict the AIP and TyG-index for previously unseen cases (Fig. 2). The code for this research is open source and available at https://github.com/RickyZhang901/ML-Based-AIP_TYG.
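To make the data partition concrete, the sketch below illustrates one way the year-based development/external split and the five-fold cross-validation could be set up; the file name, the `year` column, and the `high_risk` label are hypothetical placeholders rather than actual KNHANES field names.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical curated export of the KNHANES variables used in this study.
df = pd.read_csv("knhanes_curated.csv")

# 2008-2011 records form the development data; 2012 is held out for external validation.
dev = df[df["year"].between(2008, 2011)]
external = df[df["year"] == 2012]

# 80% of the development data for training, 20% for internal validation.
train, internal = train_test_split(
    dev, test_size=0.20, random_state=42, stratify=dev["high_risk"]
)

# Five-fold cross-validation is applied to the training portion during model development.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
```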

Key variables definition

The TyG-index was calculated as ln [TG (triglycerides) (mg/dL) \(\times\) FBG (fasting blood glucose) (mg/dL)/2], as in previous studies [18, 19]. The AIP is the logarithmically transformed ratio of TG to HDL-C (high-density lipoprotein cholesterol) in molar concentration (mmol/L), mathematically derived as log (TG/HDL-C) [20]. Regarding the outcomes, we selected different cut-off points for the TyG-index and AIP, whose values reflect the cardiovascular risk of different groups of people. For the TyG-index endpoint, we chose 8.0 [21], the upper one-third value of TyG values (8.75) [22], and the upper one-fourth value of TyG values (8.93) [23]. For the AIP endpoint, 0.318 [20] and 0.34 [24] were chosen as the cut-off points. Values above these cut-off points were deemed to indicate high cardiovascular risk, while values below were considered low risk.
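As a worked example of these definitions, the snippet below computes both indices and binarises them against the cut-off points listed above; the input values are illustrative and not drawn from the KNHANES data.

```python
import numpy as np

def tyg_index(tg_mg_dl: float, fbg_mg_dl: float) -> float:
    """TyG-index = ln(TG [mg/dL] x fasting blood glucose [mg/dL] / 2)."""
    return float(np.log(tg_mg_dl * fbg_mg_dl / 2))

def aip(tg_mmol_l: float, hdl_mmol_l: float) -> float:
    """AIP = log10(TG / HDL-C), both in mmol/L."""
    return float(np.log10(tg_mmol_l / hdl_mmol_l))

TYG_CUTOFFS = (8.0, 8.75, 8.93)   # endpoints used for the TyG-index
AIP_CUTOFFS = (0.318, 0.34)       # endpoints used for the AIP

# Illustrative values: TG = 150 mg/dL (about 1.69 mmol/L), FBG = 100 mg/dL, HDL-C = 1.2 mmol/L.
tyg = tyg_index(150.0, 100.0)     # ~8.92
aip_value = aip(1.69, 1.2)        # ~0.15
print({c: int(tyg > c) for c in TYG_CUTOFFS})        # 1 = high cardiovascular risk
print({c: int(aip_value > c) for c in AIP_CUTOFFS})
```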

Ophthalmic examination

In this study, we analyzed oculomics measurements graded by ophthalmologists. A comprehensive eye examination was conducted by the Korean Ophthalmological Society (KOS) using a vehicle equipped with ophthalmic devices. Trained ophthalmologists measured the eyelid positions of all participants. Marginal reflex distance 1 (MRD1) was defined as the distance from the upper eyelid margin to the corneal light reflex in the primary position (Supplementary Table S1). MRD1 values were obtained, and blepharoptosis was defined as an MRD1 of less than 2mm for either eye [25]. Ophthalmologists made a differential diagnosis of blepharoptosis with particular attention to avoiding misdiagnosis of pseudoptosis and dermatochalasis. The levator muscle function test was also performed by measuring the upper eyelid excursion from downgaze to upgaze (Supplementary Table S1), excluding any influence of frontalis muscle function, and sorted into normal (\(\ge\)12mm) and decreased levator function (\(\le\)11mm). Standardized slit-lamp examinations were performed to diagnose pterygium and cataracts. Fundus photographs were obtained using a digital fundus camera (TRC-NW6S; Topcon, Tokyo, Japan). For each eye of each participant, a 45\(^{\circ }\) digital retinal fundus image centered on the macula and fovea was obtained. All fundus photographs were subjected to preliminary and detailed grading. Preliminary grades for retinal diseases and optic discs were assigned to the retinal images by trained ophthalmologists, and multiple retinal specialists performed detailed grading. After grading the fundus photographs, glaucoma was defined according to a previous study [26]. When the conditions of the two eyes from one participant differed, the worse eye was chosen for the analysis. The KOS National Epidemiologic Survey Committee periodically trained ophthalmologists to control the quality of the survey.

Statistical analysis

Population baseline table

Statistical analyses were performed with SPSS version 24.0 (IBM Corp, Armonk, NY, USA), Python 3, and R version 4.2.2 (www.R-project.org). Continuous variables were described as mean ± standard deviation (SD) and compared by independent t-test or ANOVA. Categorical variables were described as numbers (percentages) and compared using chi-square tests.

Processing missing value

In this study, we employed polynomial interpolation to handle missing values in the dataset. Polynomial interpolation, a commonly used technique, fills in missing values by fitting curves through known data points, preserving the overall trend of the data without introducing excessive noise and thereby maintaining the relative smoothness and continuity of the data. We deleted variables with more than 30% missing values and applied missing-value filling to the remaining variables only in the development dataset, i.e., the 2008-2011 KNHANES data.
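A minimal sketch of this step, assuming the development dataframe `dev` from the split above: drop variables with more than 30% missing values and interpolate the remaining numeric gaps with pandas' polynomial interpolation (order 2 is an illustrative choice, as the polynomial order is not specified here).

```python
import pandas as pd

def handle_missing(dev: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    # Drop variables with more than 30% missing values.
    kept = dev.columns[dev.isna().mean() <= max_missing]
    dev = dev[kept].copy()
    # Fill remaining gaps in numeric variables with polynomial interpolation.
    numeric_cols = dev.select_dtypes("number").columns
    dev[numeric_cols] = dev[numeric_cols].interpolate(method="polynomial", order=2)
    return dev

dev_filled = handle_missing(dev)  # applied only to the 2008-2011 development data
```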

Spearman correlation analysis

In this study, we used Spearman correlation analysis to identify variables strongly correlated with the target variable. Spearman correlation analysis is a nonparametric statistical method that measures the monotonic relationship between two variables, making it particularly suitable for data that do not satisfy the assumption of a linear relationship. A p-value < 0.001 was considered statistically significant. Through Spearman correlation analysis, we effectively reduced the dimensionality of the dataset, thereby enhancing modeling efficiency.
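A sketch of this screening step, under the assumption that the target column is the continuous TyG-index (the AIP screening would be analogous); the column name "tyg_index" is a placeholder, not the actual KNHANES field name.

```python
import pandas as pd
from scipy.stats import spearmanr

def spearman_screen(data: pd.DataFrame, target: str, alpha: float = 0.001) -> list:
    """Keep numeric variables whose Spearman correlation with the target has p < alpha."""
    selected = []
    for col in data.select_dtypes("number").columns.drop(target):
        rho, p_value = spearmanr(data[col], data[target], nan_policy="omit")
        if p_value < alpha:
            selected.append(col)
    return selected

feature_cols = spearman_screen(dev_filled, target="tyg_index")
```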

Features construction

In addition to the selected features, we employed AutoFeat [27] for feature construction. This automated feature engineering tool extracts more informative features from the raw data through a sequence of feature engineering steps, producing both directly derived features and interaction features. This augmentation provides the model with a richer source of information, contributing to improved generalizability and predictive accuracy. From the missing-value processing step through the feature construction step, we used only the 2008-2011 KNHANES data, in keeping with established data science best practices (no information from the external validation year was used). Supplementary Fig. S3 details the model-building process and the different uses of the datasets from different years.
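The snippet below shows how AutoFeat [27] might be applied at this point; the classification wrapper, the choice of `feateng_steps=2`, and the example endpoint are illustrative assumptions, since the exact configuration is not reported here.

```python
# AutoFeat generates non-linear transformations and interaction features
# from the screened input variables (2008-2011 development data only).
from autofeat import AutoFeatClassifier

# Example binarised endpoint (TyG-index cut-off 8.0); column name is a placeholder.
y_dev = (dev_filled["tyg_index"] > 8.0).astype(int)

afc = AutoFeatClassifier(feateng_steps=2)
X_dev_aug = afc.fit_transform(dev_filled[feature_cols], y_dev)

# The same learned transformations are later applied to the external set.
X_external_aug = afc.transform(external[feature_cols])
```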

Metrics for evaluation

To evaluate the performance of the machine learning models, we computed the area under the curve (AUC), accuracy, precision, recall, and F1 score for predicting the TyG-index and AIP separately. The AUC summarizes model performance across the full range of classification thresholds and is a pivotal metric of the model's ability to distinguish positive from negative samples. Accuracy, the proportion of correctly predicted samples among all samples, is the most basic measure of classification performance; higher accuracy generally indicates a better model, although accuracy alone is not informative when the class distribution is imbalanced. Precision is the proportion of true positives among all samples predicted as positive; higher precision indicates fewer false-positive predictions. Recall is the proportion of true positive samples that the model correctly identifies, reflecting the model's ability to detect all positive cases. The F1 score, the harmonic mean of precision and recall, serves as a comprehensive measure of overall performance, with higher values indicating better performance. The formulas for computing these metrics are provided in the supplementary materials (Supplementary Fig. S1).
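These metrics can be computed directly with scikit-learn, as sketched below for a single binary endpoint; `y_score` denotes the predicted probability of the high-risk class.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score) -> dict:
    """AUC, accuracy, precision, recall, and F1 for one binary endpoint."""
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
    }
```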

Machine learning algorithms

Regarding the machine learning algorithms, we adopted a variety of models to conduct predictive analysis, aiming to explore the effectiveness of different models for the cardiovascular prediction problem. These models encompass Random Forest [28], Extra Trees [29], Bagging [30], Decision Tree [31], Extra Tree [29], XGBoost [32], LGBM [33], Gradient Boosting [34], Support Vector Machine (SVM) [35], AdaBoost [36], Label Propagation [37], Label Spreading [37], Linear SVM [35], Logistic Regression [38], Ridge Regression [39], Ridge CV [39], Multi-Layer Perceptron (MLP) [40], K Neighbors [41], Stochastic Gradient Descent Classifier (SGD) [42], Bernoulli NB [43], Perceptron [40], Passive Aggressive [44], Quadratic Discriminant Analysis [45], Gaussian NB [43], and Dummy. By using these diverse machine learning models, we aim to evaluate their performance on the dataset comprehensively, clarifying which models perform best on our specific problem and what their strengths and weaknesses are. We conducted experiments using a five-fold cross-validation approach and compared the models with multiple evaluation metrics, including AUC, accuracy, precision, recall, and F1 score. These evaluation metrics assist in selecting the most suitable machine learning model for our specific problem and provide guidance for future improvements and optimizations.
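As a sketch of this comparison, the loop below scores a handful of the listed classifiers with five-fold cross-validation; library-default hyperparameters are an assumption, and the full 25-model benchmark follows the same pattern.

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["roc_auc", "accuracy", "precision", "recall", "f1"]

# X_dev_aug / y_dev: engineered features and binarised endpoint from the previous steps.
for name, model in models.items():
    scores = cross_validate(model, X_dev_aug, y_dev, cv=cv, scoring=scoring)
    summary = {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring}
    print(name, summary)
```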

Results

Correlation analysis of data exploration

Initial processing of the raw dataset was conducted as illustrated in Fig. 2 and Supplementary Fig. S3. After excluding variables with no statistically significant association (P>0.001), we removed variables without clinical value and those from which the outcome indicators could be directly calculated. The remaining variables were used as input to our model. To illustrate the associations among the variables incorporated in the model, we generated a heatmap of Spearman coefficients (Fig. 3). Detailed descriptions of the input variables for the machine learning models are given in Supplementary Table S1.

Table 1 Baseline characteristics of participants, stratified by gender, used to develop a machine learning model identifying the TyG-index or AIP ranges from clinical questionnaires and oculomics
Fig. 3
figure 3

Heatmap of Spearman correlations between oculomics and clinical variables and the TyG-index and AIP. Detailed descriptions of the input variables are given in Supplementary Table S1. The standard names in the dictionary corresponding to the codes in the heatmap are listed in Supplementary Table S2. TyG-index, Triglyceride-glucose index. AIP, Atherogenic index of plasma

The findings are displayed in the form of a heatmap, where the color gradient from dark to light indicates the correlation from high to low. Due to space constraints, the variables in Fig. 3 are represented in the form of codes. Each code or character has a specific meaning, which can be found in the dictionaries of KNHANES for the years 2007-2009 and 2010-2012.
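A matrix of this kind can be reproduced with standard Python plotting tools, as in the sketch below, which uses the screened development variables; the colour map and the outcome column names are arbitrary, illustrative choices.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Spearman coefficient matrix over the screened variables plus the two outcomes.
corr = dev_filled[feature_cols + ["tyg_index", "aip"]].corr(method="spearman")
sns.heatmap(corr, cmap="viridis", center=0, square=True,
            cbar_kws={"label": "Spearman correlation"})
plt.tight_layout()
plt.show()
```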

Demographic analysis

In this study, a total of 32,122 participants (14,254 males and 17,868 females) from the KNHANES were included in the final dataset. We gathered 36 input features comprising 11 demographic variables, 4 anthropometric parameters, and 21 ophthalmological measurements. These data were used to identify potential factors that may affect the TyG-index or AIP, and the calculated values of the TyG-index or AIP were used as the outcomes for constructing the machine learning model. Demographic characteristics of the participants are shown in Table 1. Baseline oculomics measurement details are provided in Supplementary Tables S3 and S4 (Fig. 3).

Fig. 4
figure 4

ROC curves and AUC of models using both oculomics and clinical variables as input. a-c The performance of our model using a TyG-index cut-off point of 8.0, 8.75, or 8.93 in the internal validation dataset. d-f The performance of our model using a TyG-index cut-off point of 8.0, 8.75, or 8.93 in the external validation dataset. AUC, Area under curve. TyG, Triglyceride-glucose index. ROC, Receiver operating characteristic

Our findings revealed significant differences in the outcome values of the TyG-index or AIP across all age and gender subgroups (p<0.001). Additionally, more than half of the participants were non-smokers (n=20,223, 62.96%), and more than half were within the normal weight range (n=20,110, 62.61%).

Model performance and verification

We employed 25 machine learning algorithms for model establishment. After screening by correlation coefficient, the input consisted of 36 variables: 15 clinical indicators and 21 ophthalmic indicators.

In terms of performance evaluation, based on the clinical questionnaire and oculomics measurement variables, we tested the overall model performance for five endpoints and obtained promising results (Tables 2 and 3). Moreover, because the KNHANES database is collected independently in different years, using data from different years as internal and external validation sets is a common practice [46, 47]. In the predictive model employing all input factors, with TyG-index cut-off values of 8.0, 8.75, and 8.93, the model’s AUC in the internal validation set was 0.812, 0.873, and 0.911, respectively, while in the external validation set the AUC was 0.809, 0.863, and 0.901. Model performance in the internal and external validation datasets with the different TyG-index cut-off points as predicted outcomes is plotted in Fig. 4. For the AIP endpoint set at 0.34, the AUCs in the internal and external validation sets were 0.849 and 0.842, respectively. Slightly poorer performance was observed for the 0.318 cut-off value, with AUCs of 0.844 and 0.836, but it remained reliable. The results for the AIP endpoints are presented in Fig. 5.
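ROC curves of the kind shown in Fig. 4 can be generated along the following lines; `best_model`, the internal/external feature matrices, and the label vectors are placeholders for objects produced by the earlier preprocessing and training steps, not variables defined in the released code.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

fig, ax = plt.subplots()
RocCurveDisplay.from_estimator(best_model, X_internal_aug, y_internal,
                               name="Internal validation", ax=ax)
RocCurveDisplay.from_estimator(best_model, X_external_aug, y_external,
                               name="External validation (2012)", ax=ax)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
ax.set_title("TyG-index cut-off = 8.0")
plt.show()
```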

Fig. 5
figure 5

ROC curves and AUC of models using both oculomics and clinical variables as input. a-b The performance of our model using an AIP cut-off point of 0.318 or 0.34 in the internal validation dataset. c-d The performance of our model using an AIP cut-off point of 0.318 or 0.34 in the external validation dataset. AIP, Atherogenic index of plasma. AUC, Area under curve. ROC, Receiver operating characteristic

Fig. 6
figure 6

ROC curves and AUC in the internal validation set for models using both oculomics and clinical variables as input. a-c, g-h The five cut-off points of the AIP and TyG-index in the male subgroup. d-f, i-j The five cut-off points of the AIP and TyG-index in the female subgroup. AIP, Atherogenic index of plasma. AUC, Area under curve. ROC, Receiver operating characteristic. TyG-index, Triglyceride-glucose index

In addition, because gender is an independent risk factor for CVD [48], and to eliminate overfitting caused by gender bias in the overall predictive model, we conducted subgroup analyses for males and females separately. Results of the gender subgroups are shown in Fig. 6. The AUCs of the male and female subgroups were closest only when the TyG-index cut-off point was 8.93 (AUC=0.907 versus 0.906). Larger differences were observed when the TyG-index cut-off point was set at 8.0 or 8.75 and when the AIP cut-off point was set at 0.318 or 0.34, with corresponding AUC values in males and females of 0.832 versus 0.790, 0.874 versus 0.862, 0.853 versus 0.825, and 0.858 versus 0.831, respectively.
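A hedged sketch of the sex-stratified analysis: the model is refit and evaluated separately within each sex subgroup. The `sex` column name, the example endpoint, and the Gradient Boosting choice are placeholders for the study's actual configuration.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

for sex, subset in dev_filled.groupby("sex"):           # e.g. "male" / "female"
    X_sub = subset[feature_cols]
    y_sub = (subset["tyg_index"] > 8.93).astype(int)     # example endpoint
    clf = GradientBoostingClassifier(random_state=42)
    auc = cross_val_score(clf, X_sub, y_sub, cv=5, scoring="roc_auc").mean()
    print(f"{sex}: cross-validated AUC = {auc:.3f}")
```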

Table 2 The model performance of 25 algorithms on the validation set and external validation set when TyG-index takes three different cut-off values
Table 3 The model performance of 25 algorithms on the validation set and external validation set when AIP takes two different cut-off values

Regarding specific models, the Gradient Boosting method demonstrated the best performance across most experiments (Tables 2 and 3). The LGBM method outperformed Gradient Boosting only in the external validation sets for the AIP cut-offs of 0.318 and 0.34, by a slight margin of 0.001 in AUC. At the other three cut-off points, the LGBM method was slightly inferior to Gradient Boosting, although the results were similar. Of the 25 methods employed, the top 7 are shown for illustration purposes (Tables 2 and 3, Supplementary Tables S5, S6, S7, and S8).

Discussion

In this study, we used oculomics and clinical variables to predict the range of the TyG-index and AIP with machine learning algorithms. The model demonstrated outstanding predictive performance on both the internal and external datasets.

CVD is one of the most critical and dangerous chronic diseases in the world [49]. Early prevention and risk assessment of CVD are vital for patients and healthy adults alike, and many studies have therefore examined CVD risk factors. The TyG-index and AIP combine traditional risk factors such as blood lipids and/or blood glucose and can reflect cardiovascular risk from different aspects. It has also been pointed out that the TyG-index and AIP are better indicators of CVD than single biomarkers such as TG and HDL-C [50, 51]. Recently, a systematic review documented that a greater TyG-index might be independently linked to a greater incidence of atherosclerotic CVD among asymptomatic individuals. Several studies have confirmed the cross-sectional correlation between AIP and cardiovascular risk [52, 53]. Therefore, in primary screening of the population, and especially in clinical practice, the TyG-index and AIP can be used to identify patients at high risk of CVD.

As previously reported, the eye provides a unique and transparent medium through which various biological markers can be observed and measured without invasive procedures. Since oculomics was proposed in 2020, it has been widely discussed as a concept that combines AI, big data, and ocular images [9]. Oculomics can be used for the prediction of many diseases, including sarcopenia and schizophrenia [46, 54, 55]. For CVD, a review has discussed the possibility of cardiovascular risk assessment based on oculomics [56, 57], and other studies have combined oculomics and genomics to reveal aneurysm-related biomarkers [58]. In our study, we used oculomics and clinical features to predict cardiovascular risk, selecting different thresholds for the TyG-index and AIP. The TyG-index cut-off point of 8.0 comes from a study involving 150,000 Koreans [21], in which a graded positive association was observed between the TyG-index and CVD hospitalization: each 1-unit increase in the TyG-index was associated with a 16% increase in CVD hospitalization. For the cut-off point at the upper one-third of TyG values, we calculated 8.75 in our study population [22]; compared with the lowest tertile of the TyG-index, the highest tertile (tertile 3) was associated with a greater incidence of the composite outcome, myocardial infarction, stroke, and incident type 2 diabetes. For the cut-off point at the upper one-fourth of TyG values, we calculated 8.93 in our study population [23]; during a mean follow-up of 8.2 years, participants in the highest TyG-index quartile were at higher risk of stroke (HR=1.259; 95% CI 1.233-1.286), MI (HR=1.313; 95% CI 1.280-1.346), and both (HR=1.282; 95% CI 1.261-1.303) compared with participants in the lowest quartile.

As for the threshold of the AIP, we chose 0.318 and 0.34. In type 2 diabetic subjects undergoing percutaneous coronary intervention, the AIP plays an important role in predicting prognosis: a recent study showed that when the AIP exceeds 0.318, the prognosis of the high-AIP group is significantly worse than that of the low-AIP group [20]. Another threshold for the AIP was determined to be 0.34 in the corresponding study population [24], where a high AIP (\(\ge\)0.34) was associated with the highest risk of cardiovascular death in patients with type 2 diabetes mellitus. Remarkably, in our study, the predictive efficacy was most conspicuous when the TyG-index cut-off equalled 8.93 or the AIP cut-off reached 0.318, suggesting that individuals exceeding these particular thresholds should be classified as the cohort at heightened risk for subsequent cardiovascular disease.

Gender differences in CVD have been discussed in recent studies [59]. A recent study showed that the absolute incidence of CVD in men is significantly higher than in women across all age groups [60]. Some researchers have observed that diabetes is more likely to be associated with ischemic heart disease in women than in men, and women are more likely to present with ischemia with no obstructive coronary arteries (INOCA) [61, 62]. On the treatment side, new treatments that are effective in men have not led to significant reductions in CVD mortality in women [63, 64]. It is therefore necessary to assess gender-specific cardiovascular risk. In the subgroup analysis of this study, the predictive efficacy in women exceeded that in men only in the external validation sets at TyG-index cut-offs of 8.75 (AUC 0.849 versus 0.867) and 8.93 (0.881 versus 0.908). For all remaining cut-off values of the TyG-index and AIP, the AUC of the male subgroup was higher than that of the female subgroup. Our model has good predictive value for both men and women. Furthermore, this result suggests that the role of the TyG-index in predicting cardiovascular risk in women may be worthy of further exploration.

In this study, we used oculomics and non-invasive clinical data to predict the TyG-index and AIP, which are good indicators of CVD. Our research shows that machine-learning-based models can successfully enhance the predictive performance for detecting abnormal ranges of the AIP and TyG-index using clinical questionnaires and oculomics. This conclusion holds promise for early identification of heightened cardiovascular risk by forecasting atypical TyG-index and AIP values in patients.

Our study therefore has several strengths. First, we achieved good prediction of the TyG-index and AIP using variables from oculomics and clinical questionnaires. Both the oculomics and the clinical questionnaire variables were obtained non-invasively and are well recorded in the KNHANES database, so data acquisition and model construction were achieved at low cost. Low cost and non-invasiveness are important advantages for widespread application in real-life clinical scenarios [65]. Furthermore, although in its early stages, artificial intelligence (AI)-based oculomics may be a valuable non-invasive tool in primary care settings, enabling accurate diagnosis more cost-effectively and providing immediate results [66, 67]. Thus, our study has the advantage of enabling early detection and early intervention to prevent the progression of CVD in individuals at low cost. In addition, owing to the easy availability of ocular biomarkers, our study may also help reduce barriers caused by uneven medical resources among regions with different economic levels.

Limitation

Nevertheless, this study has several limitations. First, whether the results can be applied to populations of other ethnicities that differ significantly from Asians remains to be validated; the KNHANES dataset is composed primarily of an East Asian population, and previous studies have indicated differences in upper eyelid anatomy between Asians and Caucasians [68]. Second, we included only a limited number of oculomics measurement variables because of data limitations. The original fundus photographs in the KNHANES database were inaccessible, preventing us from establishing an image-based predictive model. Incorporating imaging data such as ocular photographs and retinal fundus images could yield more high-throughput information and potentially improve the prediction of cardiovascular risk [10].

Conclusion

In this study, we integrated two classes of prognostic determinants, clinical and oculomics text-based parameters, into subject selection and model establishment. Throughout the experiment, we independently predicted the TyG-index and the AIP, yielding encouraging results. These two variables jointly portray metabolic status and may influence the prediction of cardiovascular risk.