Background

Abnormal lipid and lipoprotein levels are a major risk factor for coronary heart disease (CHD) [1], the leading cause of death worldwide [2]. Elevated low-density lipoprotein cholesterol (LDL-C) levels and decreased high-density lipoprotein cholesterol (HDL-C) levels are correlated with the development of CHD. There is a strong genetic basis for lipoprotein-lipid levels, with heritability estimates of 40–80 % [3]. A large number of genes and genetic variants associated with lipid traits have been discovered in genome-wide association studies (GWAS) [4–6]. Most of the common variants (minor allele frequency [MAF] ≥5 %) identified by GWAS have modest effects on lipid levels and together account for only a small part of the total genetic variance of lipid traits (~25–30 % of the heritability) [4–8]. A portion of the missing heritability of lipid traits could be explained by low-frequency (LoF)/rare variants (MAF <5 %), as suggested by recent studies [9–11].

HDL, the smallest and densest (d = 1.063–1.21 g/mL) class of lipoprotein particles, has a variety of antiatherogenic properties [12]. One mechanism by which HDL protects against CHD is reverse cholesterol transport (RCT) from peripheral tissues back to the liver [13]. Scavenger receptor class B member 1 (SCARB1, protein; SCARB1, gene) serves as an HDL-C receptor in RCT that mediates selective uptake of HDL-C cholesteryl esters (CE) by the liver and free cholesterol efflux from cells to HDL-C [14]. SCARB1 is also implicated in the metabolism of apolipoprotein B (ApoB)-containing particles [15–21].

The SCARB1 gene (Entrez Gene ID: 949) is located on human chromosome 12 and is abundantly expressed in liver and steroidogenic tissues [22, 23]. The role of SCARB1 in the metabolism of HDL-C and ApoB-containing lipoproteins has been established in animal studies. Disruption of SCARB1 is associated with increased HDL-C levels and decreased CE uptake [24–26], whereas overexpression of SCARB1 reduces levels of HDL-C, ApoA-I, very low-density lipoprotein cholesterol (VLDL-C), LDL-C, and ApoB [15–17, 19] and promotes the hepatic uptake of CE as well as the biliary secretion of HDL-C [15, 27]. SCARB1 expression is also significantly associated with hepatic VLDL-triglycerides (TG) and VLDL-ApoB production. Hepatic VLDL cholesterol production together with VLDL clearance is enhanced in response to SCARB1 overexpression [21]. In contrast, reduced hepatic VLDL-TG and VLDL-ApoB production is associated with SCARB1 knockout status [18, 20, 21].

In humans, three SCARB1 mutations (rs397514572 [p.Ser112Phe], rs187831231 [p.Thr175Ala], and rs387906791 [p.Pro297Ser]; MIM: 601040) have been reported to be associated with significantly increased HDL-C levels [28, 29]. Moreover, several genetic studies have demonstrated the association of common SCARB1 variation with lipoprotein-lipid levels [5, 28–39] and subclinical atherosclerosis [40].

To our knowledge, no genetic study has exclusively investigated the association between SCARB1 and lipid traits in native African populations to date. The objective of this study was to resequence all 13 exons and exon-intron boundaries of SCARB1 in 95 African Blacks from Nigeria with extreme HDL-C levels for variant discovery and then to genotype selected variants in the entire sample of 788 African Blacks, followed by genotype-phenotype association analyses with five major lipid and apolipoprotein (Apo) traits (HDL-C, LDL-C, TG, ApoA-I and ApoB). Because our initial gene-based analysis demonstrated evidence of association with HDL-C and ApoA-I, our subsequent analyses focused on these two traits.

Methods

Study population

The present study was carried out on 788 African Black subjects from Benin City, Nigeria, who were recruited as part of a population-based epidemiological study on CHD risk factors. Detailed information on the study design and population description is provided elsewhere [41]. In brief, 788 recruited subjects were healthy civil servants (37.18 % females) from three government ministries of the Edo state in Benin City, Nigeria, aged between 19 and 70 years, including 464 junior staff (non-professional staff with salary grades 1–6), and 324 senior staff (professional and administrative staff with salary grades 7–16). The summary features, including biometric and quantitative data of the entire sample of 788 subjects are given in Table 1 and Additional file 1: Table S1.

Table 1 Characteristics and lipid profile of 95 individuals with extreme HDL-C levels and of the entire sample of 788 African Blacks

For resequencing, 95 individuals with extreme HDL-C levels (within the upper and lower 10th percentiles of HDL-C distribution) were chosen from the entire sample of 788 African Blacks. The resequencing sample comprised 48 individuals with high HDL-C levels (≥90th percentile, range 68.30–99.00 mg/dL; Table 1) and 47 individuals with low HDL-C levels (≤10th percentile, range 10.30–35.00 mg/dL; Table 1). The University of Pittsburgh Institutional Review Board approved the study protocol. All participants gave their informed consent.

Lipid and apolipoprotein measurements

At least 8-hour fasting blood samples were collected from all participants. Serum specimens were separated by centrifugation of blood samples and then stored at −70 °C for 6–12 months until ready for testing. Lipid and apolipoprotein measurements included total cholesterol, HDL-C, TG, ApoA-I, and ApoB and were done with standard assays at the Heinz Nutrition Laboratory, University of Pittsburgh under the Centers for Disease Control Lipid Standardization Program [41]. LDL-C was calculated with the Friedewald equation [42] when TG levels were less than 400 mg/dL.
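The LDL-C calculation described above can be sketched as follows. This is a minimal illustration of the Friedewald estimate (LDL-C = total cholesterol − HDL-C − TG/5, all in mg/dL), assuming that subjects with TG at or above 400 mg/dL simply receive no calculated value; the function name and the example inputs are hypothetical and are not study data.

```python
from typing import Optional

TG_LIMIT_MG_DL = 400.0  # the estimate is only applied below this TG level


def friedewald_ldl_c(total_chol: float, hdl_c: float, tg: float) -> Optional[float]:
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG/5.

    Returns None when TG >= 400 mg/dL, where the estimate is not used.
    """
    if tg >= TG_LIMIT_MG_DL:
        return None
    return total_chol - hdl_c - tg / 5.0


# Hypothetical values in mg/dL (not taken from the study)
print(friedewald_ldl_c(180.0, 50.0, 100.0))  # 110.0
print(friedewald_ldl_c(220.0, 40.0, 450.0))  # None
```

The TG/5 term approximates VLDL cholesterol, which is why the estimate is restricted to samples with TG below 400 mg/dL.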

PCR and sequencing

Genomic DNA was isolated from clotted blood using the standard DNA extraction procedure. All 13 SCARB1 exons (isoform 1, NM_005505), exon-intron boundaries, and 1 kb of each of 5′ and 3′ flanking regions on chromosome 12 (hg19, chr12: 125,262,175-125,348,519) were polymerase chain reaction (PCR) amplified and sequenced. Specific primers were designed using the Primer3 software (Whitehead Institute for Biomedical Research, http://bioinfo.ut.ee/primer3-0.4.0/) to cover 13 target regions, resulting in 14 amplicons, including two overlapping amplicons for the largest last exon 13. PCR reaction and cycling conditions are available upon request. The primer sequences and amplicon sizes are given in Additional file 2: Table S2.

Automated DNA sequencing of PCR products was performed in a commercial lab (Beckman Coulter Genomics, Danvers, MA, USA) using Sanger method and ABI 3730XL DNA Analyzers (Applied Biosystems, Waltham, MA, USA). Variant analysis was performed using Variant Reporter (version 1.0, Applied Biosystems, Waltham, MA, USA) and Sequencher (version 4.8, Gene Codes Corporation, Ann Arbor, MI, USA) software in our laboratory.

Variant selection for genotyping

Of 83 variants identified in the discovery step (see Additional file 3: Table S3, Additional file 4: Table S4, Additional file 5: Figure S1, and Additional file 6: Figure S2), 78 (28 with MAF ≥5 % and 50 with MAF <5 %) were selected for follow-up genotyping in the entire sample (n = 788) based on pairwise linkage disequilibrium (LD) and Tagger analysis using an r2 threshold of 0.90 (5 were excluded due to high LD) in Haploview (Broad Institute of MIT and Harvard, https://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview) [43]. Because our sequencing was focused primarily on coding regions, we also selected 69 HapMap tag single nucleotide polymorphisms [SNPs] (out of a total of 108 HapMap tagSNPs; see Additional file 7: Table S5 and Additional file 8: Figure S3) based on Tagger analysis (MAF ≥5 % and r2 ≥ 0.80) of HapMap data (Release #27) from the Yoruba people of Ibadan, Nigeria (YRI), in order to cover common genetic variation across the entire gene. Moreover, we selected two SCARB1 variants previously reported to be significantly associated with lipid traits in the literature (Additional file 9: Table S6). In total, 149 variants, comprising 78 sequence variants, 69 common HapMap-YRI tagSNPs, and two relevant associated variants, were selected for follow-up genotyping.

Genotyping

Genotyping of selected variants in the total sample of 788 individuals was performed by using either iPLEX Gold (Sequenom, Inc., San Diego, CA, USA) or TaqMan (Applied Biosystems, Waltham, MA, USA) methods and following the manufacturers’ protocols.

Out of 149 selected variants, two failed assay design and nine failed genotyping runs (see details in Additional file 3: Table S3, Additional file 7: Table S5, and Additional file 9: Table S6). Quality control (QC) measures for successfully genotyped variants were as follows: a genotype call rate of ≥90 %, a discrepancy rate of <1 in 10 % replicates, and no deviation from Hardy-Weinberg equilibrium [HWE] (P >3.62 × 10⁻⁴ after Bonferroni correction). Ultimately, a total of 137 QC-passed genotyped variants were included in genetic association analyses (see Additional file 9: Table S6, Additional file 10: Table S7, Additional file 11: Figure S4, and Additional file 12: Figure S5).
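The HWE threshold quoted above is numerically consistent with a Bonferroni correction of 0.05 over the 138 variants that remained after the two assay-design failures and nine genotyping failures. That reading is an inference from the reported counts rather than something stated explicitly, and the sketch below only illustrates the arithmetic and a simple QC filter; the function names and the per-variant inputs are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def passes_qc(call_rate: float, hwe_p: float, hwe_threshold: float,
              min_call_rate: float = 0.90) -> bool:
    """Apply the call-rate and HWE criteria to one variant.

    The replicate discrepancy criterion is omitted here for brevity.
    """
    return call_rate >= min_call_rate and hwe_p > hwe_threshold


# 149 selected variants minus 2 assay-design and 9 genotyping failures
n_genotyped = 149 - 2 - 9
threshold = bonferroni_threshold(0.05, n_genotyped)
print(n_genotyped)            # 138
print(f"{threshold:.2e}")     # 3.62e-04

# Hypothetical variants: (call rate, HWE P-value)
print(passes_qc(0.97, 0.21, threshold))    # True
print(passes_qc(0.97, 1e-5, threshold))    # False
```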

Statistical analysis

We used the Haploview program to determine allele frequencies, to test HWE for genotype distribution, and to evaluate the LD and pairwise correlations (r2) between variants [43].
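For readers who want to see what these quantities are, the following sketch computes a minor allele frequency, a Hardy-Weinberg test, and a pairwise r2 from genotype data. It is not Haploview's implementation: as loudly stated assumptions, the HWE test here is a 1-degree-of-freedom chi-square goodness-of-fit test, and r2 is approximated as the squared Pearson correlation of genotype dosages (0, 1, 2 copies of an allele) rather than estimated from phased haplotypes. All function names and example counts are hypothetical.

```python
import math
from typing import Sequence


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts for genotypes AA, AB, and BB."""
    q = (2 * n_bb + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(q, 1.0 - q)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg proportions (assumed test)."""
    n = n_aa + n_ab + n_bb
    q = (2 * n_bb + n_ab) / (2 * n)
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def dosage_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


# Hypothetical genotype counts (not study data)
print(minor_allele_frequency(360, 480, 160))
print(hwe_chi2_pvalue(360, 480, 160))   # counts match HWE proportions
print(hwe_chi2_pvalue(400, 400, 200))   # heterozygote deficit
print(dosage_r2([0, 1, 2, 1], [0, 1, 2, 1]))
```

With small samples or rare alleles an exact HWE test is generally preferable to the chi-square approximation used in this sketch.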

Values of each lipid phenotype outside the mean ± 3.5 standard deviations (SD) were excluded from downstream gene-based, single-site, and haplotype analyses. However, the extreme phenotypic values associated with rare variants (MAF ≤1 %) were retained during rare variant analysis, as was the case for the p70201/chr12:125279319 variant (see study workflow in Fig. 1). Values of the five lipid and apolipoprotein traits (HDL-C, LDL-C, TG, ApoA-I, and ApoB) were transformed using the Box-Cox transformation. For each trait, we used stepwise regression to select the most parsimonious set of covariates from the following list: sex, age, body mass index, waist, current smoking (yes/no), minutes of walking or biking to work each day (jobmin), and occupational status (staff: junior [non-professional staff]/senior [professional and administrative staff]). Genetic association analyses, including gene-based, single-site, LoF/rare variant, and haplotype association tests, were performed using linear regression models that included the significant covariates for each trait (Additional file 13: Table S8).
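The outlier rule and the transformation described in this paragraph can be illustrated with a short sketch. Several details are assumptions made only for the example: the sample SD (n − 1 denominator) is used, the Box-Cox parameter λ is supplied by the caller rather than estimated (the text does not say how λ was chosen), and the trait values are invented.

```python
import math
import statistics
from typing import List, Sequence


def exclude_outliers(values: Sequence[float], n_sd: float = 3.5) -> List[float]:
    """Keep values within mean +/- n_sd sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1)
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [v for v in values if low <= v <= high]


def box_cox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical trait values in mg/dL with one extreme observation
trait = [48.0, 50.0, 52.0] * 10 + [200.0]
kept = exclude_outliers(trait)
print(len(trait), len(kept))   # 31 30

# lam = 0.5 is an arbitrary illustrative choice, not the study's value
transformed = box_cox(kept, lam=0.5)
print(len(transformed))        # 30
```

In practice λ is usually estimated per trait (for example by maximum likelihood) before the regression models are fitted.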

Fig. 1

Summary of the study design and flow. Chart presents an overview of the study design and flow, including sequencing and genotyping stages and analysis approaches. ApoA-I, apolipoprotein A-I; ApoB, apolipoprotein B; HDL-C, high-density lipoprotein cholesterol; LD, linkage disequilibrium; LDL-C, low-density lipoprotein cholesterol; LoF, low-frequency; MAF, minor allele frequency; SD, standard deviation; SKAT-O, an optimal sequence kernel association test; SNP, single nucleotide polymorphism; TG, triglycerides; YRI, Yoruba people of Ibadan from Nigeria

The gene-based association analysis was conducted under a linear additive model for the combined evaluation of common and LoF/rare variants (n = 136, excluding p70201/chr12:125279319; see the second paragraph of this section) for five major lipid traits using the versatile gene-based association study [VEGAS] (http://gump.qimr.edu.au/VEGAS/) software [44]. The significance threshold for the gene-based test was set at a P-value of 0.05.

Following gene-based analysis, which primarily implicated SCARB1 in regulation of HDL-C and ApoA-I levels, we further elucidated the association of SCARB1 variants with these two traits using additional tests. In single-site association analysis, P-values for each trait were adjusted for multiple testing using Benjamini-Hochberg procedure [45] to determine the false discovery rate [FDR] (q-value). For common variants (MAF ≥5 %), a nominal P-value of <0.05 was considered to be suggestive evidence of association, and an FDR cut-off of 0.20 was used to define statistical significance. For LoF/rare variants (MAF <5 %), the single-site association results were interpreted separately because of inadequate power of our study to detect individual statistical significance for these variants.
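The Benjamini-Hochberg adjustment referred to above can be sketched as follows. This is a generic step-up implementation applied to invented P-values; it is offered only to show how the adjusted values and the 0.20 cut-off interact, and it is not claimed to reproduce the FDR values reported in this study, which may have been computed with a different routine or settings.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical single-site P-values (not study results)
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.20 for q in qvals]  # FDR cut-off of 0.20
print([round(q, 4) for q in qvals])  # [0.04, 0.0533, 0.0533, 0.2]
print(significant)                   # [True, True, True, False]
```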

We conducted an optimal sequence kernel association test (SKAT-O) [46] to evaluate the association between a total of 43 LoF/rare variants (MAF <5 %) and the two lipid traits (HDL-C and ApoA-I) by using three different MAF thresholds: <5 % (n = 43), ≤2 % (n = 26), and ≤1 % (n = 23). A significant SKAT-O test was set at a P-value of <0.05.

Haplotype association analysis was performed using the generalized linear model. We applied a fixed sliding window approach that included four variants per window and sliding for one variant at a time. For each window, a global P-value was used to assess the association between the haplotypes with frequency >1 % and a given trait. A global P-value threshold of 0.05 was used to define significant haplotype association.
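The window construction can be made concrete with a small sketch. It only enumerates the windows; the haplotype estimation and the generalized linear model fitted within each window are not reproduced here. The marker labels are placeholders, and the count of 136 variants is taken from the haplotype results reported later in the paper.

```python
from typing import List, Sequence, Tuple


def sliding_windows(variants: Sequence[str], size: int = 4,
                    step: int = 1) -> List[Tuple[str, ...]]:
    """Enumerate fixed-size windows of consecutive variants."""
    last_start = len(variants) - size
    return [tuple(variants[i:i + size]) for i in range(0, last_start + 1, step)]


# Placeholder marker names ordered along the gene (not real SNP IDs)
variants = [f"v{i}" for i in range(1, 137)]  # 136 variants
windows = sliding_windows(variants)

print(len(windows))   # 133
print(windows[0])     # ('v1', 'v2', 'v3', 'v4')
print(windows[110])   # window #111: ('v111', 'v112', 'v113', 'v114')
```

With n ordered variants and a window of four sliding by one, there are n − 3 windows, so adjacent windows share three variants and neighbouring signals are strongly correlated.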

All analyses, except for VEGAS, were performed using the R statistical software (http://www.r-project.org/) and relevant R packages (i.e., Haplo.Stats for haplotype analysis and SKAT for SKAT-O analysis).

Results

Identification and distribution of SCARB1 sequence variants in 95 individuals with extreme HDL-C levels

Resequencing of SCARB1 exons and exon-intron boundaries plus flanking regions in 95 African Blacks with extreme HDL-C levels identified 83 variants, of which 51 had MAF <5 % (Additional file 3: Table S3 and Additional file 5: Figure S1). The majority of the 83 variants (n = 73) were previously identified (dbSNP build 139: GRCh37.p10). Most variants (n = 80) were single-nucleotide variations [SNVs] (67 transitions and 13 transversions); the rest (n = 3) were short insertion and deletion variations (indels).

Tagger analysis using an r2 cutoff of 0.9 identified 28 bins for 32 common variants (MAF ≥5 %), of which three included more than one variant (r2 ranging from 0.95 to 1.0) (Additional file 6: Figure S2). One of these three bins contained two variants (rs204901986 and rs34339961) in complete LD (r2 = 1.0). Of 51 LoF/rare variants (MAF between 1 and 5 %, n = 31; MAF ≤1 %, n = 20), 17 were present only in the high HDL-C group (MAF ranging between 0.010 and 0.042) and eight were observed only in the low HDL-C group (MAF ranging between 0.011 and 0.033). In the high HDL-C group, 29 of 48 (~60 %) individuals cumulatively carried at least one LoF/rare variant, ranging from 1 to 7 variants. Similarly, in the low HDL-C group, 27 of 47 (~57 %) individuals carried at least one LoF/rare variant, ranging from 1 to 9 variants.

Most variants (n = 60) from our sequencing were located in intronic regions, of which two (rs113910315, MAF = 0.005 and rs10396210, MAF = 0.138) were within splice sites (defined as ± 20 bp from the start or end of an exon). The former splice site variant was observed only in the low HDL-C group.

Of the total eight coding variants observed, four were common variants (rs2070242 [p.Ser4Ser], rs10396208 [p.Cys21Cys], rs5888 [p.Ala350Ala], and rs701103 [p.Gly499Arg], which lies in the 3′ untranslated region [UTR] in isoform 1 and in exon 13 in isoform 2), and the remaining four were LoF/rare variants (rs4238001 [p.Gly2Ser], rs5891 [p.Val135Ile], rs5892 [p.Phe301Phe], and rs141545424 [p.Gly501Gly]). Of note, two LoF/rare coding variants (rs5891 [p.Val135Ile] and rs141545424 [p.Gly501Gly]) were found only in the high HDL-C group.

Fifteen variants were located in either UTRs (n = 5) or flanking regions (n = 10). One 3′ UTR variant (rs150512235, MAF = 0.006) was very close to a predicted microRNA-145 (miR-145) target site (TargetScanHuman version 6.2, http://www.targetscan.org/). One 5′ flanking variant (rs181338950, MAF = 0.048) was located in the putative promoter region [47].

All 10 novel variants (9 SNVs and 1 insertion) identified in this study have been submitted to the dbSNP database (batch ID: SCARB1_AB; http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=KAMBOH) and were non-coding with MAF <5 % (ranging between 0.005 and 0.011; Additional file 4: Table S4). Of these novel variants, six and four were present only in the high and low HDL-C groups, respectively.

Genotyping of SCARB1 variants in the entire sample of 788 individuals

Since our sequencing was focused primarily on coding regions, we selected additional HapMap tagSNPs from the HapMap-YRI data in order to cover common genetic variation across the entire SCARB1 gene. Altogether we selected 149 variants for genotyping in our entire African Black sample as follows: 78 variants (28 common variants and 50 LoF/rare variants) discovered in the sequencing step (Additional file 3: Table S3, Additional file 5: Figure S1, and Additional file 6: Figure S2), 69 common HapMap-YRI tagSNPs (Additional file 7: Table S5), and two additional variants with reported association in the literature (Additional file 9: Table S6).

Of these 149 variants, 11 (10 from sequencing, including one promoter [rs181338950], one coding (rs4238001 [p.Gly2Ser]), and one novel [p87459/chr12:125262061], and 1 from HapMap tagSNPs [rs4765180]) failed genotyping, and one (rs866793 from HapMap tagSNPs) failed QC measures. Thus, a total of 137 variants (Additional file 9: Table S6 and Additional file 11: Figure S4) that passed QC were advanced into association analyses with five lipoprotein-lipid traits.

The majority of 137 genotyped variants (n = 120) were located in introns, 11 were in exons, and six were in 3′ flanking region (Table 2 and Additional file 12: Figure S5). Ninety-four of 137 variants had MAF ≥5 %, including four coding variants, one UTR variant, two deletions, and one splice site variant. The remaining 43 variants had MAF <5 % (MAF between 1 and 5 %, n = 20; MAF ≤1 %, n = 23), including three coding variants, three UTR variants, one insertion, and one splice variant.

Table 2 Distribution of 137 SCARB1 genotyped variants

Of the 10 novel variants discovered in the sequencing step, nine (8 SNVs and 1 insertion) with MAF <1 % were successfully genotyped (Additional file 4: Table S4). There was one individual with plasma HDL-C levels above the mean + 3.5 SD carrying one novel variant, p70201/chr12:125279319 (MAF = 0.0010). Although this extreme HDL-C value was excluded as an outlier from the gene-based, single-site, and haplotype analyses, it was included in the SKAT-O rare variant analysis considering a possible large effect size of this variant (Fig. 1).

Gene-based association analyses

Gene-based tests revealed a nominally significant association (P = 0.0421; Table 3) of SCARB1 variants with HDL-C levels (best SNP: rs141545424 [p.Gly501Gly], exon 12, MAF = 0.0007, P = 0.0016). A trend for association (P = 0.1016) was also observed for ApoA-I levels (best SNP: rs7134858, intron 6, MAF = 0.1560, P = 0.0052).

Table 3 Gene-based association analysis results

Since the gene-based tests showed evidence of associations with HDL-C and ApoA-I, we primarily focused on these two traits to further examine the SCARB1 variants in the entire sample of 788 African Blacks.

Single-site association analyses of common SCARB1 variants

Of 94 common SCARB1 variants with MAF ≥5 %, 10 showed nominal associations (P < 0.05) with HDL-C and/or ApoA-I (Table 4; see results for each trait in Additional file 14: Table S9 and Additional file 15: Table S10), of which three (rs11057851, rs4765615, and rs838895) exhibited associations with both HDL-C and ApoA-I.

Table 4 Nominally significant single-site associations (P < 0.05) of common SCARB1 variants

The most significant association was found between rs11057851 and HDL-C (β = −0.5924, P = 0.0043, FDR = 0.1465). The second best association was between rs7134858 and ApoA-I (β = 1.7537, P = 0.0052, FDR = 0.2918), followed by the association of rs5888 (p.Ala350Ala) with ApoA-I (β = 2.0962, P = 0.0080, FDR = 0.2918).

Of the 10 variants that showed nominal associations, high LD (r2 > 0.80) was observed for two pairs of variants (Fig. 2): between rs838912 and rs5888 (p.Ala350Ala; r2 = 0.86), and between rs838896 and rs838895 (r2 = 0.84).

Fig. 2

Single-site P-values of 94 SCARB1 common variants for HDL-C and ApoA-I. Top: The -log10 P-values are presented in the Y-axis. A total of 94 genotyped variants with MAF ≥5 % are shown on SCARB1 gene (5′ → 3′; RefSeq: hg19, NM_005505) in the X-axis. The dash line indicates the nominal significance threshold (P = 0.05). Middle: Gene structure of SCARB1. Bottom: Linkage disequilibrium (LD) plot of 10 SCARB1 variants with P-values <0.05. Shades and values (r2 × 100) in each square of LD plot indicate pairwise correlations: black indicating r2 = 1, white indicating r2 = 0, and shade intensity indicating r2 between 0 and 1. Marker names are shown as “SNP name-SNP ID”. SNP ID is based on dbSNP build 139. ApoA-I, apolipoprotein A-I; FDR, false discovery rate; HDL-C, high-density lipoprotein cholesterol; LD, linkage disequilibrium; MAF, minor allele frequency; SNP, single nucleotide polymorphism; UTR, untranslated region

Association analyses of low-frequency/rare SCARB1 variants

The LoF/rare variants (n = 43) were categorized into three groups based on their frequencies for association analysis with HDL-C and ApoA-I using SKAT-O: MAF <5 % (n = 43), MAF ≤2 % (n = 26), and MAF ≤1 % (n = 23). Although no association between LoF/rare variants and ApoA-I was detected, the group of 23 variants with MAF ≤1 % yielded nominal association with HDL-C levels (P = 0.0478; Table 5).

Table 5 Association results for low-frequency and rare SCARB1 variants (MAF <5 %)

We then individually examined the association of the 23 variants with MAF ≤1 % with HDL-C and ApoA-I. Six of these rare variants showed association with either HDL-C levels or both HDL-C and ApoA-I levels (Table 6). While three of them are known variants (rs115604379, rs377124254, and rs141545424 [p.Gly501Gly]), the other three are novel (p52919/chr12:125296601, p54611/chr12:125294909, and p54856/chr12:125294664). Moreover, four of these six rare variants (rs377124254, rs141545424 [p.Gly501Gly], p54611/chr12:125294909, and p54856/chr12:125294664) were present in individuals with extreme phenotypic values (above or below the 3rd percentile). Two of these variants (rs377124254: β = 11.5518, P = 0.0016; rs141545424 [p.Gly501Gly]: β = 11.585, P = 0.0016) were found in a single subject who had a very high HDL-C level, whereas the other two were observed in one individual each, who had extremely low HDL-C levels (p54611/chr12:125294909: β = −9.5243, P = 0.0097; p54856/chr12:125294664: β = −8.4305, P = 0.0215) and ApoA-I levels (p54611/chr12:125294909: β = −19.3821, P = 0.0344; p54856/chr12:125294664: β = −24.0757, P = 0.0082). This rare variant group also included a novel variant (p70201/chr12:125279319) that was observed in one individual with an unusually high plasma HDL-C level (above the mean + 3.5 SD).

Table 6 Characteristics and effects of 6 SCARB1 rare variants of interest

Haplotype association analyses

The 4-SNP sliding window haplotype analyses revealed associations of 32 haplotype windows with HDL-C and/or ApoA-I (global P < 0.05; Table 7; see results for each trait in Additional file 16: Table S11), of which five (windows #47, #72, #111, #112, and #123) were associated with both.

Table 7 Significant haplotype association (global P < 0.05) of 136 SCARB1 genotyped variants with HDL-C and ApoA-I

Overall, a total of 21 haplotype windows showed significant associations with ApoA-I, of which 10 contained seven variants associated with ApoA-I in single-site analysis. Haplotype window #110 spanning introns 10–11 showed the best association signal (global P = 0.0012) and contained the rs838896 variant with a nominal evidence of association with ApoA-I (P = 0.0278) in single-site analysis.

A total of 16 haplotype windows yielded significant associations with HDL-C, of which seven contained three HDL-C-associated variants detected in single-site analysis. The most significant association was found with window #111 (global P = 0.0040) spanning intron 11, which contained the rs838895 variant nominally associated with HDL-C (P = 0.0162) in single-site analysis.

We observed nine regions (5 regions for ApoA-I and 4 regions for HDL-C) harboring consecutive significant haplotype windows (global P < 0.05; ranging from 2 to 6 windows per region; Table 8 and Fig. 3). Seven of those regions contained at least one of the six variants that exhibited nominal associations (P < 0.05) with HDL-C and/or ApoA-I (rs4765615, rs7134858, rs838912, rs838896, rs838895, and rs701106) in single-site analysis.

Table 8 Significantly associated haplotype regions (global P < 0.05) with HDL-C and ApoA-I
Fig. 3

Haplotype association plots for HDL-C and ApoA-I. Top: The -log10 P-values are presented in the Y-axis. A total of 136 genotyped variants are shown in order on SCARB1 gene (5′ → 3′; RefSeq: hg19, NM_005505) in the X-axis. Middle: gene structure of SCARB1. Marker names are shown as “SNP name-SNP ID/chromosome 12 position (for novel variants)”. Bottom: linkage disequilibrium (LD) plot of 136 variants. SNPs with MAF ≥5 % are shown in bold. SNP ID is based on dbSNP build 139. All 10 novel variants identified in this study have been submitted to dbSNP (batch ID: SCARB1_AB): http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=KAMBOH. The dash line indicates the significance threshold (global P = 0.05). Significantly associated haplotype regions are highlighted. The degree of shades and values (r2 × 100) in each square of LD plot represent the pairwise correlations between 136 genotyped variants: black indicating r2 = 1, white indicating r2 = 0, and shade intensity indicating r2 between 0 and 1. ApoA-I, apolipoprotein A-I; HDL-C, high-density lipoprotein cholesterol; LD, linkage disequilibrium; MAF, minor allele frequency; SNP, single nucleotide polymorphism; UTR, untranslated region

Functional evaluation of identified variants

In order to examine the possible regulatory function of all 153 SCARB1 variants (83 variants identified by our sequencing, 68 common HapMap tagSNPs [excluding rs4765180 due to genotyping failure; see Additional file 7: Table S5], and two relevant variants from the literature), we used the RegulomeDB database (version 1.0, Stanford University, http://www.regulomedb.org/) [48]. Although most of the 153 variants (n = 132) had scores ranging from 1 to 6, only 11 were supported by strong evidence for regulatory function (scores of 1f–2b): one promoter, one 5′ UTR, two coding (rs2070242 [p.Ser4Ser] and rs10396208 [p.Cys21Cys]), five intronic, one 3′ UTR, and one 3′ flanking variant. Summary and detailed regulatory functions are provided in Additional file 17: Table S12 and Additional file 18: Table S13.

Of 10 variants associated with HDL-C and/or ApoA-I, only one ApoA-I associated variant (rs5888 [p.Ala350Ala] in exon 8) showed suggestive evidence of regulatory function with a score of 3a (Table 4).

Of 10 novel variants, one insertion variant (p1048insC/chr12:125348472) located in 5′ UTR-exon 1 had a strong potential for regulatory function with a score of 2a (Additional file 4: Table S4).

Comparison of SCARB1 single-site and haplotype association analysis results between African Blacks (this study) and US Non-Hispanic Whites (previous study [49])

We compared SCARB1 single-site and haplotype association results in African Blacks reported in this study to those in US Non-Hispanic Whites (NHWs) reported in our previously published study [49]. In the sequencing stage, the number of variants identified in African Blacks (n = 83) was greater than that in US NHWs (n = 44). Notably, most (~90 %) of the 22 sequence variants that were shared between the two populations differed in minor alleles and/or MAFs. Although our major findings included the associations with HDL-C and ApoA-I in African Blacks, we also sought to replicate four associations observed with ApoB levels in US NHWs [49] (Table 9). The association between rs11057820 and ApoB (P < 0.05) that we previously reported in US NHWs [49] was also observed in African Blacks (US NHWs [G allele]: β = 0.8700, P = 0.0436; African Blacks [A allele]: β = 1.8661, P = 0.0292). In addition, we observed two variants (rs4765615 and rs701106) exhibiting nominal associations (P < 0.05) in both populations, albeit with different lipid traits (in US NHWs, rs4765615 [G allele]: β = 1.2493, P = 0.0059 for ApoB; rs701106 [T allele]: β = 0.0394, P = 0.0066 for HDL-C; in African Blacks, rs4765615 [A allele]: β = −0.4646, P = 0.013 for HDL-C and β = −0.9139, P = 0.048 for ApoA-I; rs701106 [T allele]: β = 1.2967, P = 0.0156 for ApoA-I). Moreover, we noticed that two regions associated with HDL-C or ApoA-I (global P < 0.05; Table 10) in African Blacks spanning intron 2 and intron 3 overlapped with the ApoB-associated region (Region I in Fig. 4) previously reported in US NHWs [49]. Three haplotype regions associated with HDL-C (global P < 0.05) spanning intron 11 and exon 13-3′ UTR in African Blacks also overlapped with a large HDL-C-associated region (Region II in Fig. 4) previously reported in US NHWs [49].

Table 9 Results for 7 SCARB1 lipid-associated variants in US Non-Hispanic Whites (previous study) and in African Blacks (this study)
Table 10 Significant lipid-associated regions (global P < 0.05) that were observed in US Non-Hispanic Whites (previous study) and African Blacks (this study)
Fig. 4

Lipid-associated SCARB1 common variants and haplotype regions identified in US Non-Hispanic Whites (previous study; Ref [49]) and African Blacks (this study). Lipid-associated variants with MAF ≥5 % with P-values <0.05 and haplotype regions with global P-values < 0.05 that were previously identified in US Non-Hispanic Whites (US NHWs; n = 623) are shown in top panel and those identified in African Blacks (n = 788) are shown in bottom panel (see details in Table 9 and Table 10). SCARB1 variants and haplotype regions are shown on SCARB1 gene (5′ → 3′; RefSeq: hg19, NM_005505). All SNP IDs are based on dbSNP build 139. Regions I and II that are defined based on consecutive haplotype windows with evidence of lipid-association in US NHWs (global P < 0.05; see details in Ref [49]) also show some significant associations in African Blacks (global P < 0.05; see details in Table 7 and Table 8). ApoA-I, apolipoprotein A-I; ApoB, apolipoprotein B; HDL-C, high-density lipoprotein cholesterol; MAF, minor allele frequency; NHW, Non-Hispanic White; SNP, single nucleotide polymorphism; UTR, untranslated region

Discussion

Our sequencing identified 83 variants, of which 78 were selected for follow-up genotyping in the total sample of 788 African Blacks. An additional 69 tagSNPs from the HapMap-YRI data, along with two previously reported lipid-associated SCARB1 variants, were also genotyped in the total sample. Of the 149 selected SCARB1 variants, the 137 that passed QC were examined for association with major lipid traits (Table 2). The initial gene-based analyses revealed a nominal association with HDL-C (P = 0.0421) as well as a trend for association with ApoA-I (P = 0.1016; Table 3). Consistent with the gene-based results, single-site association analyses also revealed 10 common variants nominally associated (P < 0.05) with HDL-C (n = 5) and/or ApoA-I (n = 8; Table 4 and Fig. 2). The best association signal was between rs11057851 in intron 1 and HDL-C (P = 0.0043, FDR = 0.1465), followed by two associations with ApoA-I: rs7134858 in intron 6 (P = 0.0052, FDR = 0.2918) and rs5888 (p.Ala350Ala) in exon 8 (P = 0.0080, FDR = 0.2918). Moreover, three variants (rs11057851, rs4765615, and rs838895) exhibited evidence of associations (P < 0.05) with both HDL-C and ApoA-I. These findings are supported by the fact that SCARB1 appears to influence ApoA-I in addition to HDL-C [15, 17]. In our data, there was a moderate correlation between ApoA-I and HDL-C levels (r2 = 0.61).

Except for the previously reported association of rs5888 (p.Ala350Ala) with lipid traits (HDL-C or LDL-C) in non-African populations [30–34, 36, 37, 39], the remaining nine associations with lipid traits (HDL-C and/or ApoA-I levels) observed in this study in the general population are novel and await replication in independent African or African-derived populations. Two of these nine SNPs have previously been shown to have differential effects on cholesterol levels in response to statin (rs4765615) [50] or on HDL-C/TG levels in response to estradiol in post-menopausal women (rs838895) [51]. Another variant (rs838896) was found to be associated with decreased SCARB1 expression in liver [51]. Although the latter SNP was not associated with a low RegulomeDB score (<3), we cannot rule out the possibility that it affects SCARB1 expression in a tissue-specific manner.

The haplotype analysis revealed evidence of significant association (global P < 0.05) of 32 haplotype windows with HDL-C (n = 16) and/or ApoA-I (n = 21; Table 7), and identified nine regions harboring consecutive overlapping haplotype windows significantly associated with either HDL-C (4 regions) or ApoA-I (5 regions; Table 8 and Fig. 3). In addition, six variants with nominal association (P < 0.05) in single-site analysis were contained in seven of these nine significantly associated regions, suggesting the presence of functional variants in these regions. Our findings suggest that haplotype analysis can provide more information than single-site analysis alone.

Our comparison of the single-site and haplotype association results between African Blacks (this study) and US NHWs (previous study [49]) revealed three variants (rs11057820, rs4765615, and rs701106; Table 9) and two regions (Regions I and II; Table 10 and Fig. 4) showing evidence of lipid associations in both ethnic groups. However, there were differences between the two ethnic groups in the associated traits and/or the associated alleles or their directional effects, which reflects the genetic heterogeneity of complex phenotypes like lipid traits among diverse populations. This phenomenon can be explained by different ancestry backgrounds associated with differences in LD structure and genetic architecture, as well as by differences in SNP-SNP, gene-gene, and gene-environment interactions. Nonetheless, the lipid associations observed across different ethnic populations provide convincing evidence that causal/functional variants are present in the SCARB1 gene, which deserves comprehensive sequencing and functional studies to confirm and further characterize the effects of its variants on lipid metabolism.

Rare variant analysis showed significant evidence of association between a group of 23 rare variants (MAF ≤1 %) and HDL-C (P = 0.0478; Table 5). Single-site analysis of these rare variants revealed six (including three novel ones) with effects on HDL-C, of which three also had effects on ApoA-I (Table 6). In addition, four of these six rare variants appeared to be carried by individuals with extreme HDL-C and/or ApoA-I levels (above or under the 3rd percentile). This HDL-C-associated rare variant group also included a novel variant (p70201/chr12:125279319) that was observed in one individual with an unusually high plasma HDL-C level (above the mean + 3.5 SD). Our findings suggest that these rare variants might have functional relevance; thus, screening additional large African samples for these rare variants may help to establish their role in HDL-C and ApoA-I metabolism.
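The "mean + 3.5 SD" criterion mentioned above can be illustrated with a minimal sketch. The function name `flag_high_outliers` and the input values are hypothetical, and the sketch assumes the sample standard deviation is used; it is not the study's actual procedure.

```python
from statistics import mean, stdev

def flag_high_outliers(values, n_sd=3.5):
    """Return indices of values above mean + n_sd * SD (hypothetical helper).

    Assumes the sample standard deviation (n - 1 denominator); the cut-off
    is computed from all values, including any outlier itself.
    """
    cutoff = mean(values) + n_sd * stdev(values)
    return [i for i, v in enumerate(values) if v > cutoff]

# Made-up HDL-C-like values: 29 identical readings and one very high reading.
hdl_values = [50.0] * 29 + [120.0]
high_carriers = flag_high_outliers(hdl_values)
```

Because the outlier contributes to both the mean and the SD, a single extreme value can only exceed such a cut-off when the sample is reasonably large, which is one reason this kind of flag is more informative in a sample of several hundred individuals than in a small one.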

To date, there has been limited information concerning possible functional effects of lipid-associated SCARB1 variants, particularly for those located in non-coding regions. In fact, most of the common and rare HDL-C/ApoA-I-associated variants observed in the current study are non-coding and do not show strong evidence of regulatory function based on the RegulomeDB database. Nonetheless, three of these HDL-C/ApoA-I-associated SCARB1 variants (rs5888 [p.Ala350Ala], rs838885, and rs838886) have previously been demonstrated to influence SCARB1 expression [51–53]. Therefore, additional functional studies are needed to determine the functional nature of the lipid-associated SCARB1 variants and those in LD with them.

Our study has revealed a number of novel findings, although we also acknowledge some limitations. SCARB1 is a large gene, and we sequenced only its coding regions and exon-intron junctions; in addition, our sequencing sample size was small. Thus, we may have missed some functional LoF/rare variants, either because of the small sample size or because they are located in uncovered intronic regions. Moreover, consistent with the generally small effect sizes of lipid-associated variants reported in the literature, most of our single-site associations reached nominal significance (P < 0.05) but did not survive multiple testing corrections. Only the top variant (rs11057851) associated with HDL-C met an FDR cut-off of <0.20 (FDR = 0.1465; Table 4). Therefore, future larger studies in independent African or African-derived populations are necessary to validate all nominal associations observed in this study.
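To make the distinction between nominal significance and an FDR cut-off concrete, the following is a minimal sketch of one common FDR procedure, the Benjamini-Hochberg adjustment. This section does not name the FDR method actually used, so Benjamini-Hochberg is an assumption here, and the function name and input P-values are hypothetical rather than the study's data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    Illustrative only: assumes independent (or positively dependent) tests.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical P-values for four tests (not taken from Table 4).
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
# Tests that would pass an FDR cut-off of <0.20 under this assumption.
passing = [i for i, q in enumerate(example_fdr) if q < 0.20]
```

Under such an adjustment, a variant with a small nominal P-value can still have a large FDR value when many variants are tested, which matches the pattern described above in which nominal associations do not survive correction.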

Conclusions

In conclusion, we report the first comprehensive association study of SCARB1 variants with lipid traits in a native African population, which revealed a number of novel associations in single-site and haplotype analyses. In addition, resequencing allowed us to identify 10 novel rare variants, of which four were in the group of 23 rare variants that showed association with HDL-C levels. The SCARB1-associated common and rare variants observed in our study explained ~11.09 % of the variation in HDL-C levels and ~8.63 % of the variation in ApoA-I levels. Our findings indicate a genetic contribution of SCARB1, through both common and LoF/rare variants, to inter-individual lipid variation in the general African Black population, which warrants further follow-up in independent studies. Insights into HDL-C and related lipid traits may also lead to new potential targets for CHD treatment.