Background

Cardiovascular diseases (CVDs) are major causes of death worldwide, resulting in an extensive burden of early mortality, reduced quality of life, and other socioeconomic and health impacts on populations and societies [1, 2]. Alterations in normal lipid profiles, i.e., dyslipidemia, are risk factors significantly associated with CVD through mechanisms linked to the pathophysiology of atherosclerosis [3, 4]. However, comprehensive evidence on the relationships between dyslipidemia and other CVD risk factors is lacking, because only part of the variance in lipid traits is explained by traditional risk factors (e.g., lifestyle, demographic and socioeconomic characteristics, and biochemical mechanisms, among others). Heritability, candidate-gene, and genome-wide association studies (GWASs) have been performed to fill this gap in the literature, revealing a considerable genetic influence on lipid traits [5,6,7].

However, most genetic investigations have been conducted in populations of European descent, which hinders the extrapolation of findings to groups of admixed ancestry [8]. In fact, a recent analysis showed that the power of GWASs might be increased using data from admixed populations [9]. A family-based study in the Brazilian population, which comprises a mix of multiple ancestries, estimated moderate heritabilities for LDL-c, HDL-c, total cholesterol, and triglycerides (TGL) [10]. Other studies in Brazil have identified links between single nucleotide polymorphisms (SNPs) and fatty acid profiles and serum lipid traits under the candidate-gene framework [11,12,13].

The Sao Paulo Health Survey with Focus on Nutrition (ISA-Nutrition) represents one of the pioneering initiatives in Brazil investigating the relationship between dyslipidemia and CVD risk factors, including SNPs [14]. While the studies performed using ISA-Nutrition data provided initial insights, the genetic contribution to dyslipidemia and its underlying mechanisms remain to be fully understood [7]. For instance, these previous studies relied on a priori hypotheses about a limited set of markers, which limits the discovery of potential novel variants throughout the genome [15]. Therefore, the present study aimed to perform a genome-wide association study (GWAS) to detect SNPs linked to blood lipid traits in individuals participating in the ISA-Nutrition study, assuming a linear additive genetic model. The study hypothesis is that diverse genetic contributions to lipid traits exist within highly admixed populations, which would represent novel evidence regarding the role of genetic information from individuals in underexplored ethnic groups.

Materials and methods

Study design and population

The present study is part of the cross-sectional population-based Sao Paulo Health Survey with Focus on Nutrition study (ISA Nutrition), conducted in 2015, which aims to investigate the associations of lifestyle, sociodemographic, economic, biochemical, and genetic information with cardiometabolic diseases in the city of São Paulo. The present study was conducted in accordance with the principles of the Declaration of Helsinki, being approved by the Research Ethics Committee of the School of Public Health from the University of São Paulo (43838621.7.0000.5421 and 30848914.7.0000.5421). The details of the study are described elsewhere [14].

Data initially comprised information collected from 901 residents in São Paulo municipality during 2015. Participants were distributed in three groups according to age: adolescents (corresponding to individuals ≥ 12 to 19 years old), adults (individuals ≥ 20 to 59 years old), and older adults (individuals ≥ 60 years old). Questionnaires were administered by trained personnel, including information on socioeconomic, demographic, anthropometric, lifestyle, and health status of individuals, among other characteristics. Blood pressure, anthropometric data and blood samples were collected from the participants in the households by trained nurses for the identification of biochemical and genetic markers. Further details on the sampling procedure and summary statistics of this dataset were previously described in other publications [7, 14, 16].

Phenotypic data

Previously, lipid traits were modeled as a function of variables belonging to six comprehensive classes: inflammation, which comprises the inflammatory biomarkers interleukin (IL)-1β, IL-6, IL-10, C-reactive protein (CRP), monocyte chemoattractant protein 1 (MCP-1) and tumor necrosis factor-alpha (TNF-α); insulin, fasting blood glucose levels, and absence or presence of insulin resistance according to the homeostasis model assessment of insulin resistance (HOMA-IR); anthropometric characteristics (body mass index, BMI; waist circumference, and waist circumference to height ratio); socioeconomic and demographic variables (sex, age, educational attainment); systolic and diastolic blood pressure; and lifestyle characteristics (alcohol and tobacco use, diet quality and physical activity) [7]. Lipid traits were converted through rank-based inverse normal transformation to meet statistical modeling assumptions.
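The rank-based transformation of the lipid traits can be illustrated with a minimal sketch. The source does not state which rank offset was used, so the Blom offset (c = 3/8) below is an assumption, as are the function name `rank_inverse_normal` and the toy triglyceride values.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation.

    The offset c = 3/8 (Blom) is an assumption; the study does not report it.
    """
    n = len(values)
    # Sort indices by value so that ranks can be assigned (1 = smallest).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and give each one the average rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to the corresponding quantile of the standard normal.
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# Hypothetical, right-skewed triglyceride values (mg/dL), for illustration only.
tgl = [85.0, 110.0, 140.0, 95.0, 420.0]
tgl_int = rank_inverse_normal(tgl)
```

The transformed values keep the ordering of the original trait but follow an approximately standard normal distribution, which limits the influence of extreme observations such as the largest value in the toy example.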

BMI was estimated using information of height and weight of participants, and categorized into presence or absence of overweight (including overweight and obesity), according to age group. Twelve dietary components were evaluated and combined into the Healthy Eating Index Revised and adapted for the Brazilian population (BHEI-R) to assess diet quality: dark green and orange vegetables, total vegetables, whole fruits, total fruits, legumes, whole grains, total grains, meats, eggs and legumes, milk and dairy products, saturated fat, oils, sodium, and the component corresponding to calories from solid fat, alcohol and added sugar (SoFAAS). Dietary data were obtained from two 24-hour dietary recalls, adjusted for usual intake distributions using the Multiple Source Method. The International Physical Activity Questionnaire (IPAQ)-Long Form, adapted to Portuguese and validated for the Brazilian population, was adopted for assessment of the physical activity level. Details on the phenotypic data collection and calculation of indicators are described elsewhere [17, 18].

Genetic markers and quality control

DNA was quantified from blood samples using the Qubit™ dsDNA BR DNA Quantification Kit in a Qubit® 2.0 fluorometer (Thermo Fisher Scientific, Waltham, USA). Samples from 864 free-living healthy individuals were genotyped with the Axiom™ 2.0 Precision Medicine Research Array (Affymetrix Inc, Santa Clara, CA), and 681 individuals were considered unrelated after exclusion of individuals with genomic relatedness matrix (GRM) estimates > 0.125 [19]. Global ancestry was assessed with the SNPRelate package in R software v4.1.0 and PLINK 2.0 using 393,284 markers from the array that were also present in the 1000 Genomes Project phase 3 (1 KGP) after quality control pruning [20] (Table S1).
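As an illustration of relatedness filtering with a GRM cutoff of 0.125, the sketch below computes a GRM from allele counts and keeps a set of mutually unrelated individuals. The estimator (standardized genotypes averaged over SNPs) and the greedy selection are assumptions made for illustration; the source does not describe the exact procedure, and the function names and simulated genotypes are hypothetical.

```python
import numpy as np

def genomic_relatedness_matrix(genotypes):
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                     # allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    # Center by 2p and scale by the binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]

def select_unrelated(grm, threshold=0.125):
    """Greedily keep individuals whose GRM value with every
    previously kept individual does not exceed the threshold."""
    kept = []
    for i in range(grm.shape[0]):
        if all(grm[i, j] <= threshold for j in kept):
            kept.append(i)
    return kept

# Simulated genotypes (illustration only); individual 1 duplicates individual 0.
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(50, 500))
geno[1] = geno[0]
grm = genomic_relatedness_matrix(geno)
unrelated = select_unrelated(grm)
```

The greedy pass depends on the order of individuals and does not guarantee the largest possible unrelated set; dedicated tools typically use more careful selection rules.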

GWAS

After exclusion of individuals with missing phenotype data, a GWAS was performed for 667 unrelated individuals, with SNPs filtered by Hardy-Weinberg equilibrium (P ≥ 10⁻⁵) and minor allele frequency (MAF > 0.05); genetic information from 330,656 SNPs was used. The GWAS approach used the traditional polygenic model of additive effects:

$$y_{i}=\mu +\beta^{\prime}\times X_{i}+\beta^{\prime}_{SNP_{i}}\times X_{SNP_{i}}+\epsilon_{i} \tag{1}$$

where $y_{i}$ is the response variable of the $i$th individual; $\mu$ is the trait mean; $\beta^{\prime}$ is the transposed vector of covariate effects; $X_{i}$ is the vector of covariates; $X_{SNP_{i}}$ is the vector with genotype information for the $i$th individual; $\beta^{\prime}_{SNP_{i}}$ is the transposed vector of SNP effects; and $\epsilon_{i}$ is the residual term associated with the $i$th individual.
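The GWAS step described above, marker filtering followed by the additive model of Eq. (1), can be sketched for a single SNP as follows. This is a simplified illustration rather than the analysis code of the study: the chi-square form of the Hardy-Weinberg test, the ordinary least squares fit, and the normal approximation used for the P-value are assumptions, and all function names and simulated data are hypothetical.

```python
import math
import numpy as np

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    p = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(p, 1.0 - p)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(n_aa, n_ab, n_bb, hwe_min_p=1e-5, maf_min=0.05):
    """Apply the thresholds stated in the text (HWE P >= 1e-5, MAF > 0.05)."""
    return (maf(n_aa, n_ab, n_bb) > maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_min_p)

def snp_association(y, covariates, snp):
    """Least squares fit of y = mu + covariates * beta + snp * beta_snp + e."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    design = np.column_stack([
        np.ones(n),                                          # mu (intercept)
        np.asarray(covariates, dtype=float).reshape(n, -1),  # X_i
        np.asarray(snp, dtype=float),                        # X_SNP (0/1/2)
    ])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    beta, se = coef[-1], math.sqrt(cov[-1, -1])
    # Two-sided P-value; normal approximation to the t test (large n assumed).
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value

# Simulated data (illustration only): one covariate and one SNP with effect 0.8.
rng = np.random.default_rng(0)
n = 500
covariate = rng.normal(size=n)
snp = rng.binomial(2, 0.3, size=n)
y = 1.0 + 0.5 * covariate + 0.8 * snp + rng.normal(scale=0.5, size=n)
beta, se, p_value = snp_association(y, covariate, snp)
```

In a genome-wide scan the same fit is repeated for every SNP that passes the filters, and each P-value is compared with the chosen significance threshold.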

The GWASs under the linear model approach were performed using the 10⁻⁵ significance level for the HDL-c, LDL-c, TGL, HDL-c/LDL-c, total cholesterol, VLDL-c, and non-HDL-c phenotypes, according to their respective selected models.

The baseline adjustment covariates age, sex, age-sex interaction, age², and presence of overweight were commonly used across phenotypes in previous association analyses to avoid confounding, as in other studies [8]. The first two principal components of global ancestry (PC1 and PC2) were included to account for the highly admixed population characteristics [21,22,23]. The selected models, which include PC1, PC2, and the covariates significantly associated with each lipid trait, are shown in Table 1. A summary of the variables in the dataset is presented in Table 2.
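A minimal sketch of how the baseline covariates and ancestry components could be arranged into a covariate matrix is given below. The 0/1 coding of sex and overweight, the column order, and the function name `baseline_design` are assumptions; trait-specific covariates from Table 1 are not included because they are not listed in the text.

```python
import numpy as np

def baseline_design(age, sex, overweight, pc1, pc2):
    """Covariate matrix with columns:
    age, sex, age x sex, age^2, overweight, PC1, PC2.

    sex and overweight are assumed to be coded 0/1.
    """
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    overweight = np.asarray(overweight, dtype=float)
    return np.column_stack([
        age,
        sex,
        age * sex,          # age-sex interaction
        age ** 2,           # quadratic age term
        overweight,
        np.asarray(pc1, dtype=float),
        np.asarray(pc2, dtype=float),
    ])

# Hypothetical values for three participants, for illustration only.
X = baseline_design(
    age=[15, 42, 67],
    sex=[0, 1, 1],
    overweight=[0, 1, 0],
    pc1=[0.012, -0.034, 0.005],
    pc2=[-0.008, 0.021, 0.017],
)
```

Centering age before squaring it is a common way to reduce collinearity between the linear and quadratic terms; whether this was done in the study is not stated.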

Additionally, linkage disequilibrium analysis for the significant SNPs that were associated with more than two lipid traits and three common FTO SNPs (rs1421085, rs17817449 and rs9939609) was performed. Both GWAS and linkage disequilibrium analysis were performed using R 4.3.0.
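Pairwise LD between two SNPs is often summarized as r². The sketch below uses the squared Pearson correlation between genotype dosages, which is a common approximation when phased haplotypes are unavailable; the source does not specify the LD estimator used, so this choice and the function name `ld_r2` are assumptions.

```python
def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNP dosage vectors (0/1/2)."""
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        return float("nan")   # r2 is undefined for a monomorphic SNP
    return cov * cov / (var_a * var_b)

# Hypothetical dosages for four individuals, for illustration only.
snp_1 = [0, 0, 2, 2]
snp_2 = [0, 2, 0, 2]
r2_same = ld_r2(snp_1, snp_1)     # a SNP compared with itself
r2_indep = ld_r2(snp_1, snp_2)    # dosages with zero covariance
```

Values close to 1 indicate strong LD, whereas values close to 0 indicate linkage equilibrium between the two markers.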

Table 1 Selected models of serum lipid traits used for GWAS.
Table 2 Descriptive statistics of the ISA-Nutrition dataset

Results

GWAS - linear regression model

Nineteen SNPs were significantly associated with lipid traits, most of which were intron variants. Three variants (rs1562012, rs16972039, and rs73401081) were associated with two phenotypes, and two variants (rs8025871 and rs2161683) were associated with three phenotypes. Non-HDL-c had the highest number of associations, whereas VLDL-c and the LDL-c/HDL-c ratio had the lowest. Among the associations, 14 had positive coefficients and 12 had negative coefficients (Table 3).

Table 3 SNPs significantly associated with lipid traits according to the polygenic additive model

Manhattan plots with SNPs above the significance threshold are shown in Fig. 1 and Figures S1-S6.

Fig. 1 Manhattan plot of the significant SNPs associated with non-HDL-c

Discussion

The GWAS under the polygenic additive model identified 19 SNPs with novel significant associations with lipid traits in the present study. Some of the SNPs were consistently associated with two or three lipid traits, which is in line with the well-established understanding of the metabolism and physiology of lipoproteins. Among the SNPs identified in the present study, only rs7591899 had been previously investigated in the literature, in relation to glucometabolic traits and with conflicting evidence [24, 25].

A recent GWAS performed through Mendelian randomization to evaluate circulating lipoproteins, including HDL, LDL, and triglyceride levels, using data from the UK Biobank, identified more than one thousand associated SNPs. However, none of those results were replicated in the present investigation [26]. Similarly, findings from other GWASs that included data from underrepresented populations also lacked correspondence with the present results [27,28,29,30]. While a recent multi-ancestry meta-analysis incorporated a sample from the Brazilian population, its focus was on the interaction effects of LDL-c, HDL-c, and triglycerides with physical activity, rather than on the lipid profiles alone [31]. Also, there was no correspondence between the results of that study (significant SNPs on the CLASP1, LHX1, SNTA1, and CNTNAP2 genes) and those of the present investigation. It should be noted, however, that findings from these studies should not be directly compared with the present ones because of several methodological differences, including sample size, evaluated trait, genetic ancestry, genotyping platform, and significance level, among others.

The results of the present study showed that phenotypic lipid traits were significantly associated with SNPs linked to the genes CDH12, COL26A1, DACH1, ENOSF1, FAM81A, LINC02778, LOC105374505, LOC105379358, POMC, PTPRD, WIPF1, WRN, and ZFHX3. Some genes have been previously investigated due to links with lipid metabolism (FAM81A) [32], low-density lipoprotein cholesterol and obesity (ZFHX3) [33, 34], myocardial infarction (CDH12) [35], nonalcoholic fatty liver disease (PTPRD) [36], cardioembolic stroke risk (WIPF1) [37], satiety and obesity (POMC), fasting lipids and insulin in children (POMC) [38], and atherosclerosis (DACH1) [39].

In the present study, the majority of the variants linked to two or more phenotypes were present in intronic regions, particularly within the genes FAM81A, ZFHX3, PTPRD, and POMC. Except for POMC (proopiomelanocortin), two variants were found in each of these genes, which suggested that the significant variants within a given gene might be in linkage disequilibrium (LD) with each other. This was examined by additional LD analysis, which showed that the ZFHX3 SNPs and the FAM81A SNPs were in strong and weak LD, respectively, while the PTPRD SNPs were in linkage equilibrium (Table S2 and S3).

POMC encodes a preproprotein subjected to extensive, tissue-specific, post-translational processing, resulting in up to ten possible different active peptides involved in several cellular processes. One of the main peptides is lipotropin beta, which is responsible for the mobilization of fat from adipose tissue [40]. Variants in POMC have been linked to obesity and hyperphagia, likely through (a) leptin-dependent sympathetic innervation of adipose tissue, which then decreases the mobilization of lipids within the white adipose tissue (WAT), and (b) impaired MC4R signaling in the hypothalamus because of the lack of α-MSH and diacetyl-α-MSH, which leads to increased appetite [41,42,43].

Regarding FAM81A, no function has been assigned to either rs4775168 or rs8025871, the latter being linked to both LDL-c and non-HDL-c. However, it should be noted that rs8025871 is near the rs17302400 variant within the same gene, which has been previously associated with visceral adipose tissue [44]. In a previous GWAS performed on multi-ancestry participants from the Million Veteran Program, variants in other FAM genes were shown to be associated with several lipid traits, e.g., FAM13A with HDL-c; FAM136A with both LDL-c and total cholesterol; and FAM117B with both LDL-c and total cholesterol [28]. In addition, an association with FAM241B was detected in a study with a smaller sample of the underrepresented Indian population [29].

Furthermore, the two variants in ZFHX3, which encodes the zinc-finger homeobox 3 protein, are present in intronic regions and have not been described in other studies. Nonetheless, the ZFHX3 gene acts as a transcription regulator, and some of its polymorphisms have been associated with risk of atrial fibrillation [45, 46]. Considering that ZFHX3 is located on chromosome 16, the same chromosome on which several SNPs in the obesity-related FTO gene are found, a possible hypothesis for the significant associations identified is that they might be in linkage disequilibrium with FTO and FTO-related genes [47].

For instance, in comparison to FTO, ZFHX3 is approximately 1 million base pairs closer to Iroquois homeobox protein 3 (IRX3), which is known to mechanistically interact with the genetic variation of FTO to influence obesity and related metabolic disorders [48]. Importantly, the effects have also been observed in admixed Latin populations and might be connected with hepatic lipid metabolism, as shown by negative correlations of the transcription factor with serum triglycerides, LDL-c, uric acid, and total cholesterol levels [49, 50]. This hypothesis was confirmed in this study, as shown by low, albeit significant, LD values between the ZFHX3 SNPs and three main SNPs in the FTO gene (rs1421085, rs17817449 and rs9939609) (Table S2 and S3).

Concerning PTPRD (protein tyrosine phosphatase receptor type D), neither of the two variants had been associated with lipid traits in previous studies, and, accordingly, its gene product, which is a signaling peptide involved in several cellular processes, has no reported involvement in lipid metabolism.

Most of the significant associations with single phenotypes were in genes that have broader ranges of cellular functions (e.g., cell adhesion, cell growth, differentiation, organization of the cytoskeleton), with no clear implication for lipid metabolism or any cardiometabolic-related outcome. Notably, variants were identified in two noncoding RNA (ncRNA) genes (LOC105374505 and LINC02778) that have not been characterized thus far. It is widely recognized that ncRNAs have important regulatory functions in several diseases and health conditions, including cancer, metabolic disorders, diabetes, and inflammation [51, 52].

Interestingly, a novel ncRNA has been reported to reprogram lipid metabolism, leading to the accumulation of lipids inside the cell and promoting hepatocellular carcinoma progression [53]. However, the roles of the ncRNAs in the onset of dyslipidemia or other phenotypes in the Brazilian population have yet to be determined by further investigation.

Furthermore, the novel evidence identified in the present study may contribute to advances in precision medicine applied to the treatment of cardiometabolic diseases, including dyslipidemia and metabolic syndrome. The identification of genetic features linked to lipid traits may support pharmacogenomic investigations for the prediction of treatment responses, helping to avoid adverse effects and improve therapies through integrated approaches for dyslipidemia at the individual level, in addition to supporting disease prevention strategies that may reduce treatment costs in national health systems [54,55,56].

Study strengths and limitations

The study presents numerous strengths. The GWAS was performed in a Brazilian cohort of free-living individuals from a study with a sample that is representative at the population level in the largest city of the country, adopting strict methodological rigor regarding data collection and analysis. In addition, the population evaluated has admixed ancestries and is underrepresented in genetic research, which may contribute to the understanding of genes and lipid-related outcomes. The availability of numerous GWASs in multi-ancestry populations may contribute to research progress in this field, with the ultimate goal of improving lipid profiles and reducing CVD risk [57].

Importantly, certain limitations should be considered in the interpretation of the aforementioned results. First, the dataset had a small sample size, which decreases the study power for detection of significant associations. Second, the genetic data included a limited set of SNP genotype data, which might lack information on other important markers with possible clinical relevance. Third, there was lack of specific information on other lipids, like LDL-c fractions and apolipoproteins usually associated with risk for CVD (e.g., apoB48, apoB100, apoC-III). Finally, the use of cross-sectional data imposes challenges for interpretation of the clinical significance of SNPs using data from a single population due to limitations in the establishment of causality; thus, additional research is required on the associations between SNPs and lipid profiles identified in the present study.

Conclusions

The GWAS results offer insights regarding the genetic structure underlying lipid traits in an underrepresented population with high ancestry admixture. The associations identified in the study were robust across multiple lipid phenotypes, and some of the associations were significant for two or more variants. Furthermore, the findings raise important questions about the role of ncRNAs in lipid metabolism, which remains a relatively unexplored subject.

Nevertheless, comparisons with other populations should be approached with caution, and further replication in larger datasets and in other populations with admixed backgrounds should be conducted. Thus, the present findings may guide follow-up investigations aiming to replicate the results and to enhance interpretability by identifying credible or causal variants involved in the metabolism of lipoproteins, which may facilitate the identification of novel targets for therapies that improve lipid profiles. Further evidence may be obtained using fine-mapping, functional annotation, and causal inference approaches, as well as candidate-gene experiments focused on the genes FAM81A, ZFHX3, PTPRD, and POMC.