Introduction

Atrial fibrillation (AF) is an emerging cardiovascular epidemic representing the most common clinically meaningful arrhythmia. Worldwide, around 5 million new diagnoses are made every year [6]. AF is associated with a lower quality of life, higher subsequent risk of stroke, heart failure (HF), and mortality [16]. AF-associated burden measured as disability-adjusted life-years increased by 19% from 1990 to 2010 [6], and available treatments convey relevant risks [12]. Thus, the best option is prevention and, consequently, a deeper understanding of its pathophysiology and etiological determinants is needed.

Metabolomic profiling techniques can help provide more comprehensive insights into AF causes, especially in relationship with nutritional exposures, as this condition has been previously modulated by dietary interventions and exposures [25]. Metabolic profiles reflect gene and protein functional activity and are simultaneously sensitive to lifestyle, in addition to results of gene-regulated metabolic pathways [5].

Although conceptually appealing, prospective studies on metabolomics and AF are scarce [1, 13, 17, 18, 33]. Two studies have been conducted within two observational cohorts, the Atherosclerosis Risk in Communities (ARIC) Study and the Framingham Heart Study, reporting no significantly AF-associated single metabolites after multiple testing adjustment [1, 17]. On the other hand, in the Cardiovascular Health Study, ceramides and sphingomyelins with palmitic acid—Cer-16 and SM-16—were associated with an increased risk of AF, whereas ceramides and sphingomyelins with very-long-chain saturated fatty acids—Cer-20, Cer-22 and Cer-24 and SM-20, SM-22, and SM-24—were inversely associated with AF risk [13]. Additionally, in the ADVANCE trial, GM3(d18:1/41:1) and GM3(d18:1/18:0) were inversely associated with the risk of developing AF [33]. Interestingly, in the latter study, the authors also performed a cluster analyses in which a cluster containing GM3(d18:1/24:1) and closest related lipid species was relevant to prediction of future AF. Recently, three Swedish cohorts on this topic reported that only sphingomyelin (28:1) was significantly and inversely associated with AF risk in a secondary analysis after multiple testing adjustment [18]. Beyond observational studies, analyses of nutritional intervention randomized trials can elucidate the effects of diet on metabolites and their relationships with the subsequent risk of new-onset AF. This approach can provide stronger causal inferences with the potential to be translated to preventive strategies. In fact, some randomized trials have shown sizable effects of nutrition-related interventions on the risk of AF [22].

In the context of the PREDIMED (PREvención con DIeta MEDiterránea) trial, a large nutrition intervention trial, some multimetabolite lipid patterns or lipid network-clusters were previously reported to be prospectively associated with either major cardiovascular disease (coronary heart disease, stroke or cardiovascular death), heart failure, or type 2 diabetes [29, 30, 34, 37, 38], suggesting that lipidomics networks may also be related to AF risk. Moreover, the incidence of AF was significantly reduced in the PREDIMED arm using a Mediterranean diet (MedDiet) supplemented with extra-virgin olive oil (EVOO) [22], suggesting that modification of the dietary lipid content may mitigate AF risk, especially because an inverse dose–response effect was found between the actual percentage of caloric intake provided by EVOO and the subsequent risk of AF. However, comprehensive information on the association of lipidome profiles with future AF incidence is still scarce, and evidence on the joint association of interrelated lipid metabolites with new-onset AF has seldom been assessed [33]. We aimed to assess if participants’ baseline lipidome profile in the PREDIMED trial was associated with incident AF and whether the dietary intervention conducted in the PREDIMED trial could modify this association.

Methods

The present case–control study was nested in the PREDIMED trial [10, 21]. Briefly, the PREDIMED trial was a multicenter randomized primary prevention trial designed to assess the effect of the MedDiet on CVD. Participants were recruited during 2003–2009. Participants were 7447 men (aged 55 to 80 years) and women (aged 60 to 80 years) without prior CVD but at high cardiovascular risk due to the presence of either type 2 diabetes or at least three of the following classical cardiovascular risk factors: current smoking, overweight/obesity, high LDL-cholesterol, low HDL-cholesterol, family history of early coronary heart disease, or hypertension. Participants were randomly allocated in a 1:1:1 ratio to a MedDiet supplemented with EVOO (MedDiet + EVOO), to a MedDiet supplemented with mixed tree nuts (MedDiet + nuts) or to a control group (advised to follow a low-fat diet and to reduce all types of fat). Participants received individual and group dietary educational sessions by a trained dietitian at the baseline visit and quarterly thereafter. Participants in the control group also received dietary education promoting a low-fat diet with the same intensity and frequency as the two MedDiet groups [21].

The trial was stopped because of early benefit based on endpoints documented through December 1, 2010, after a median intervention time and follow-up of 4.8 years.

Out of the 7447 participants in the PREDIMED trial, 7343 had no AF at study inception and 639 incident cases were registered (Fig. 1).

Fig. 1
figure 1

Flowchart of participants in the PREDIMED trial

For the present study, we used a nested case–control design, where controls were matched by recruitment center, year of birth (± 5 years), and sex. We selected 1 to 3 matched controls per case, depending on availability of controls with the matched characteristics. Finally, we were able to include 512 incident cases of AF and 735 matched controls. We used the incidence density sampling (risk-set sampling) with replacement for selecting controls: controls were randomly selected from all eligible participants at risk at the time of the incident case occurrence, and selected controls could be selected again as a control for another index case and they could become later a case [28].

All participants completed a detailed battery of questionnaires at baseline which included information on sociodemographic characteristics and medical conditions, a validated Spanish translation of the Minnesota leisure time physical activity questionnaire [9] and a validated 14-item screener on their adherence to the traditional MedDiet [32].

The Institutional Review Board of the coordinating center approved the PREDIMED extended follow-up protocol (HCB/2019/0629), and of the Harvard TH Chan School of Public Health (00,002,642) and of the University of Navarra (2017.154_2012.104) approved the case–control subproject. All participants gave written informed consent.

At baseline, participants underwent a blood draw after an overnight fast (> 8 h) by trained technicians. Samples were processed by the study personnel according to the study protocol. EDTA plasma samples were coded and kept refrigerated until they were stored at − 80 °C in freezers.

For the present work, stored samples were shipped on dry ice to the Broad Institute for metabolomics analysis in 2018. Samples from cases-control participants were randomly distributed before being shipped to the Broad Institute in Boston, MA, for metabolomics assays.

Metabolomic analyses of plasma

A detailed description on the determinations of plasma lipids can be found in the supplemental materials.

Endpoint ascertainment

AF was a pre-specified secondary outcome in the PREDIMED trial as previously described elsewhere in detail [22] (see supplemental material).

Statistical analysis

Missing values for the individual lipids in the PREDIMED trial were imputed with half of the minimum detected value. Baseline individual lipid values were normalized and scaled in multiples of 1 SD with Blom’s rank-based inverse normal transformation in the PREDIMED trial [3].

Association between individual lipids and atrial fibrillation

We estimated the association for each individual lipid with a conditional logistic regression model adjusted for recruitment site, age, sex, smoking (3 categories), body mass index, type 2 diabetes, hypertension, family history of premature coronary heart disease, leisure time physical activity, educational level, statin use and intervention group. p Values were adjusted for multiple hypothesis testing with the procedure described by Benjamini and Yekutieli [2].

Network and clustering

We used the conditional independence-based PC-algorithm for causal structure learning to generate the lipidomics network in the PREDIMED trial [15, 20]. This algorithm was applied to the same lipidomics data in another nested case–control study in the PREDIMED-trial on HF [38]. We retained edges with a partial correlation > 0.1 which were robust across the two case–control studies. We applied the walktrap algorithm in the igraph R-package (http://igraph.org/) to the robust lipidomics network to detect densely connected lipid clusters.

Random forest-based evaluation of cluster importance

We selected the most important lipid clusters for AF prediction with a machine-learning-based selection in the PREDIMED trial. We prepared the lipidomics data based on deviations calculated from matching-strata-specific means to keep the matched design. We grew a random forest for AF-prediction based on the transformed lipids (500 trees, sampling rate of 2/3). The importance of lipid clusters for AF prediction was then evaluated in the out-of-bag sample. Hitherto, we assessed the random forest model’s predictive performance based on information on the full lipidomics data compared to the predictive performance based on lipidomics data with the joint permutation of lipid variables in each of the clusters. Clusters were ranked by the extent to which omitting the information in their variables hampered the predictive performance, with the largest increase in prediction error corresponding to the highest cluster importance.

Determining atrial fibrillation-predictors within important clusters and definition of a lipid score

To select the accountable lipid pattern for high cluster importance, for each of the most important clusters, we mutually included all cluster-variables in a regression model. We ran a tenfold cross-validated elastic net regression to select the robust AF predictors within each of the clusters in the PREDIMED trial.

Finally, we estimated a weighted score combining all the metabolites described in the last step applying the leave-one-out cross-validation approach in conditional logistic regression models adjusted for age, sex, smoking habit, body-mass index, type 2 diabetes, hypertension, family history of coronary heart disease, leisure-time physical activity, education (3 levels), statin use, and intervention group (3 groups). We addressed the possible interaction between the weighted score and the intervention (3 groups) with the Wald test.

Principal component analysis

We ran a principal component analysis (PCA) with the 216 lipid metabolites among the controls in the PREDIMED trial. Those factors with an eigenvalue higher than 2 were retained. The varimax rotation was applied. Sixteen uncorrelated factors were extracted. The extracted factors explained 83% of the total variance. Individual metabolites with absolute loadings > 0.40 were considered relevant components of the identified factors. Loadings obtained among controls were then applied to the cases. To analyze the association between the extracted factors and incident AF, factors were categorized into quartiles based on the distribution among controls. We also calculated the p for trend by estimating quartile-specific medians and treating the resulting variable as continuous. We addressed the possible effect modification by the intervention (3 groups) by testing interaction product-terms (Wald test).

Analyses were performed with Stata SE 15.1 (College Station, TX) and R 3.6.3.

Data and code sharing

After submission of a brief proposal and statistical analysis plan to the corresponding author and upon approval from the Predimed Steering Committee and Institutional Review Boards, the data and the code for data analyses will be made available to them using an onsite secure access data enclave.

Results

Baseline characteristics of AF cases and controls are displayed in Table 1. In both cases and controls, mean baseline age was 68.5 years and 49% of the participants were women. The average BMI was 30 kg/m2. Nearly half of the participants had type 2 diabetes, and more than 80% had hypertension at baseline.

Table 1 Baseline characteristics of cases and controls

Association between individual lipids and atrial fibrillation

The associations between each baseline individual lipid and incident AF are displayed in Supplemental Table 1. Forty-one lipids were nominally (p < 0.05) associated with incident AF, but none remained associated after multiple testing adjustment (all FDR > 0.05).

Lipid networks and their association with atrial fibrillation

Figure 2 shows the robust data-driven lipid network based on the conditional independence structure, which integrated 216 lipids. Generally, the data-derived clusters included lipids from the same or metabolically closely related lipid classes.

Fig. 2
figure 2

Lipid network and color-coded walktrap-clusters

Table 2 shows the most important lipid clusters identified in the random-forest-based evaluation. They were ranked according to the extent to which omitting the information in their variables hampered the predictive performance. For example, omitting the information on MG 16:1, MG 18:0, and palmitoyl-EA (cluster 1-lipids) resulted in the largest increase in prediction error.

Table 2 Most important lipid clusters for AF-prediction identified in the random-forest based evaluation

We selected within-cluster AF predictive lipids for each important cluster with cross-validated elastic net regression (Supplemental Table 2). Then, we selected network-wide independent AF predictors by tenfold cross-validated elastic net regression with all selected within-cluster predictors as potential explanatory variables. Table 3 shows the network-wide, mutually independent AF predictors. The selected lipids included fourteen plasmalogens (8 with positive weights, 6 with inverse weights), as well as palmitoyl-EA and CE 16:0, which were positively weighted, and PC 36:4;O, cholesterol, and TG 53:2 and 53:3 which were inversely weighted.

Table 3 Mutually independent robust independent atrial fibrillation predictors across all selected network clusters and their median values in the leave-one-out cross-validation

Finally, we summarized the selected lipid predictors’ association with AF risk in a weighted sum score. The weights were generated in a leave-one-out cross-validation procedure based on the selected lipid predictors. The multivariable-adjusted OR for AF incidence per SD increase in the lipid score was 1.32 (95% CI 1.16 to 1.51; p < 0.001). There was no evidence of an effect modification by the dietary intervention of the PREDIMED trial (p value > 0.05).

Results from the PCA and the association between the identified factors and atrial fibrillation

Out of the 16 extracted factors, 3 factors (factors 5, 11, and 15) showed associations (nominal p values < 0.05) with incident AF and the individual lipids with a factor loading > 0.40 in these factors are displayed in Supplemental Table 3. Supplemental Table 4 shows the ORs between the quartiles of each of the 3 factors identified in the PCA and AF. The strongest association was an inverse association observed for factor 5 (labeled as “low in cholesterol esthers and high in di- and triacylglycerols”): OR5th vs. 1st quartile 0.64 (95% CI 0.45–0.90) (p for trend: 0.027). This factor was poor in rather saturated cholesterol esters and rich in diacylglycerols and triacylglycerols with a rather long chain and an intermediate degree of unsaturation.

None of these factors showed an interaction with the intervention group (p value > 0.05).

Discussion

In this nested case–control within the PREDIMED trial, after adjusting for multiple testing, no single lipid was associated with AF incidence. However, a network-based multilipid combined score identified using a robust data-driven lipid network was associated with AF risk. In this combined score, several PC-plasmalogens and PE-plasmalogens were positively weighted whereas others were inversely weighted; in addition, palmitoyl-EA and were positively weighted, and cholesterol, PC 36:4;O, and TG 53:3 were negatively weighted.

Our results are consistent with results from the ARIC Study and the Framingham Heart Study [1, 17] in which no single lipid molecule was found to be prospectively associated with AF after adjustment for multiple testing. In the ARIC Study, only a secondary bile acid (glycocholenate sulfate), pseudouridine and acisoga were directly associated with AF onset and uridine was inversely associated with AF. Notwithstanding, SM 16:0 was directly associated with the risk of developing AF and this lipid was also found to be directly associated with AF risk in a previously published cohort [13]. In that study, ceramide with palmitic acid was also associated with AF risk but this association was not replicated in our study.

Despite the lack of associations between individual lipids and AF, we combined network analysis and machine learning to select a subset of lipids jointly associated with AF risk in PREDIMED. We summarized the joint association of these lipids with AF risk in a weighted sum score, using weights from a leave-one-out cross-validation procedure to avoid model overfitting. The estimated score included information on 17 lipids, out of which 7 were PC-plasmalogens and 5 were PE-plasmalogens. The short- and medium-chain and less unsaturated PC-plasmalogens with positive coefficients and medium-chain and more unsaturated PC-plasmalogens with negative coefficients. Similarly, shorter PE-plasmalogens showed positive coefficients and medium-chain PE-plasmalogens showed negative coefficients. These results are in line with the results of a case control study on AF, in which they observed in partial least square-discriminant analysis that a lower degree of unsaturation of LPC, LPE and PC levels was associated with a higher odds of AF [14]. Also, in the latter study, a relatively high degree of unsaturation was associated with a lower odds of AF. On the other hand, the other prospective study whose authors conducted a cluster analyses was the ADVANCE trial. They found that a cluster containing GM3(d18:1/24:1) and closest related lipid species was relevant to prediction of future AF. Unfortunately, we could not assess the consistency with their results since those lipids were not available in our platform [33].

Lipoproteins transport plasmalogens, and nearly one-third of the human heart’s phospholipids are plasmalogens. Plasmalogens could play a role in AF prevention through different pathways [8, 36]. Plasmalogens have antioxidant properties, since their free radical-scavenging chemical structure may decrease oxidative stress sensitivity [4, 24]. These properties support our findings’ biological plausibility—shorter and more saturated plasmalogens were positively associated with the multilipid score and inverse weights were observed for longer and more unsaturated plasmalogens—given that atrial myocardial inflammation and oxidative stress may play a crucial role in AF development [4, 8, 24, 36].

Plasmalogens have also been suggested to be PPARγ agonists [7], which may prevent cardiac fibrosis through different pathways, according to experimental models [19]. First, PPARγ can regulate inflammation through the NF-κB pathway reducing the risk of cardiac fibrosis and AF. In addition, PPARγ agonists could decrease the number of myofibroblasts, collagen I and brain natriuretic peptide (BNP) expression, matrix metalloproteinases-2 (MMP-2) activity, and protein level of connective tissue growth factor (CTGF). PPARγ agonists could also attenuate atrial structural remodeling, regulate cardiac telomere biology, alleviate MMP-9 activity, and reduce vascular endothelial dysfunction by inhibition of cardiovascular NADPH oxidase, among others. Additionally, plasmalogens may contribute to the stability of lipid raft microdomains and improved vesicular function [27, 35] and they may be required for cholesterol transport from the plasma membrane to the endoplasmatic reticulum [23]. In the identified multilipid score, some plasmalogens were associated with a higher risk of AF whereas others showed an inverse association. In our study, shorter and more saturated PC-plasmalogens showed positive coefficients and medium-chain and more unsaturated PC-plasmalogens showed negative coefficients. Thus, not only plasmalogens as a whole but also the fatty acid composition of plasmalogens may play an etiological role in AF development as it is the case for other lipid species [14]. Nevertheless, more evidence is needed for the potential association between plasmalogens as a whole and specific plasmalogens and atrial fibrillation.

In the multilipid score, palmitoyl-EA showed a positive association with incident AF. This observation does not easily align with experimental evidence, suggesting possible anti-inflammatory properties of palmitoyl-EA [26]. A possible explanation may be compensatory upregulation of anti-inflammatory factors, which has been observed to precede chronic disease onset. Nevertheless, we have no clear explanation for this finding.

In the multilipid score, free cholesterol was inversely associated with incident AF. Evidence on the association between free cholesterol and incident AF is scarce. Nevertheless, in a recent meta-analysis, total cholesterol was inversely associated with this condition [11].

In PREDIMED, participants allocated to the MedDiet + EVOO group showed a 38% lower risk of AF than those allocated to the control group, whereas participants allocated to the MedDiet + nuts group showed no reduction in the risk of AF [22]. We observed no effect modification by the interventions for the association between the multilipid score and AF. This result suggests that the intervention did not counterbalance the possible deleterious association between the multilipid score and incident AF and that the underlying mechanisms in the association between the intervention and AF may be different from the lipid profile modification, possibly suggesting that some nonlipid constituents of EVOO, such as phenolic compounds with potent antioxidant and anti-inflammatory properties, may be important for disease prevention [31].

We acknowledge that our study had some limitations. First, we could not distinguish between different isomers of a given lipid formula in our primary analyses, and different isomeric forms of some lipids may show a differential association with AF. Second, participants in the PREDIMED trial were at high cardiovascular risk and mainly Caucasians, limiting the generalizability of our results to other populations. Third, despite the multivariable adjustment, residual confounding cannot be ruled out.

Despite the abovementioned limitations, our study also shows some strengths. We used different dimension-reduction methods to disentangle potential lipidome patterns associated with the risk of developing AF. To our knowledge, this is one of the few prospective studies on AF risk that have addressed the assessment of interplays between different lipids and has not only considered individual lipids in isolation [33]. Also, we have applied machine learning methods to combine different metabolites to avoid overfitting. This methodological advance increases the robustness and reliability of our results.

In conclusion, no individual lipids were prospectively associated with the risk of AF after penalization for multiple comparisons. However, a weighted multilipid score, primarily made up of plasmalogens, was associated with AF risk. Further studies are needed to examine our observations’ generalizability to other populations and elucidate the underlying biological mechanisms.