Introduction

The clustering of multiple metabolic abnormalities, such as obesity, dyslipidaemia and hypertension, is common in patients with type 2 diabetes. We have previously reported a locus on chromosome 1q21–25 with significant evidence for linkage to type 2 diabetes, the metabolic syndrome and waist circumference in families from the Hong Kong Family Diabetes Study (HKFDS) [1, 2]. This chromosomal region has also been linked to type 2 diabetes or glucose intolerance, the metabolic syndrome, obesity, and dyslipidaemia or familial combined hyperlipidaemia (FCHL) in other populations [3–10].

Recently, Pajukanta et al. found association and linkage of variation at the gene for the transcription factor upstream stimulatory factor 1 (USF1) on chromosome 1q22–23 with FCHL in a Finnish population [11]. Since the USF1 transcription factor regulates both glucose and lipid metabolism [12], we hypothesised that USF1 might be a candidate gene for susceptibility to type 2 diabetes, the metabolic syndrome or obesity. Our primary goal was to test whether variation at USF1 could explain the observed evidence for linkage of type 2 diabetes and/or the metabolic syndrome in this region in the HKFDS families included in our genome scans. We also wanted to assess the evidence for association of variation at USF1 with disease and metabolic phenotypes in the families, and association of this variation with disease phenotypes in independent samples of hospital cases and normal control subjects.

Subjects and methods

Subjects

All subjects were of southern Han Chinese ancestry, residing in Hong Kong. The study included individuals from three independent samples. The first sample (family cases) included 179 families (897 subjects, average family size 5.0±2.3) recruited from the HKFDS. The details of ascertainment, exclusion criteria and phenotyping are described elsewhere [1, 2]. All families were recruited through a proband selected from the Prince of Wales Hospital Registry. Eighty-three percent of the probands were diagnosed at ≤40 years of age. The second sample (hospital cases) consisted of 1,383 unrelated type 2 diabetic patients randomly selected from the same registry, of whom 34% were diagnosed at ≤40 years of age [13]. The third sample (control subjects) consisted of 454 normal control subjects (fasting plasma glucose [FPG]<6.1 mmol/l) recruited from the general population participating in a community-based cardiovascular risk screening programme, as well as from hospital staff. Control subjects with glucose intolerance (IFG, IGT or diabetes) [14], the metabolic syndrome (National Cholesterol Education Program Adult Treatment Panel III criteria) [15] or a family history of diabetes were excluded. The clinical characteristics of subjects in the three samples are summarised in Table 1 of the Electronic supplementary material. Informed consent was obtained from each participating subject. This study was approved by the Clinical Research Ethics Committee of the Chinese University of Hong Kong.

Table 1 Association of USF1 SNPs and haplotypes with type 2 diabetes and metabolic syndrome using FBAT

Clinical studies

All subjects underwent detailed clinical investigation as described previously [2, 13]. A fasting blood sample was collected for measurement of plasma glucose, insulin, lipids, liver and renal function, and DNA. All family members and 335 of the control subjects who had no history of diabetes underwent a 75 g OGTT. Three fasting blood samples collected at intervals of 5 min were assessed for mean FPG, insulin and C-peptide. Blood samples were also collected at 15, 30, 60 and 120 min during the OGTT for measurement of plasma glucose and insulin. Using the homeostasis model assessment (HOMA), the insulin resistance index (HOMA%IR) was assessed as fasting insulin (mU/l)×FPG (mmol/l)/22.5, and beta cell function (HOMA%β) was assessed as fasting insulin×20/(FPG–3.5) [16].
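The two HOMA indices defined above can be written out as a minimal sketch. The function names are hypothetical, and the guard for FPG ≤3.5 mmol/l is an added assumption (the formula has a zero or negative denominator there), not something stated in the methods.

```python
def homa_ir(fasting_insulin_mu_l: float, fpg_mmol_l: float) -> float:
    """Insulin resistance index: fasting insulin (mU/l) x FPG (mmol/l) / 22.5."""
    return fasting_insulin_mu_l * fpg_mmol_l / 22.5


def homa_beta(fasting_insulin_mu_l: float, fpg_mmol_l: float) -> float:
    """Beta cell function: fasting insulin x 20 / (FPG - 3.5)."""
    # Assumption: reject inputs where the denominator is zero or negative.
    if fpg_mmol_l <= 3.5:
        raise ValueError("HOMA beta cell index needs FPG > 3.5 mmol/l")
    return fasting_insulin_mu_l * 20 / (fpg_mmol_l - 3.5)
```

For a hypothetical subject with fasting insulin 10 mU/l and FPG 4.5 mmol/l, these give an insulin resistance index of 2.0 and a beta cell function index of 200.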

The metabolic syndrome was defined according to the NCEP III guidelines [15]. Subjects who had at least three of the following five risk factors were classified as having the metabolic syndrome: (1) hyperglycaemia with known diabetes or FPG ≥6.1 mmol/l; (2) known hypertension or BP ≥130/85 mmHg; (3) hypertriglyceridaemia with triglyceride ≥1.7 mmol/l; (4) HDL<1.0 mmol/l in men or <1.3 mmol/l in women; and (5) central obesity with waist circumference >90 cm in men or >80 cm in women. The definition of central obesity was modified for Asian populations [17].
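The classification rule above can be illustrated with a short sketch. The `Subject` record and its field names are hypothetical, and reading "BP ≥130/85 mmHg" as systolic ≥130 or diastolic ≥85 is an assumption about how the threshold was applied.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    sex: str                  # "M" or "F"
    known_diabetes: bool
    fpg: float                # fasting plasma glucose, mmol/l
    known_hypertension: bool
    systolic: float           # mmHg
    diastolic: float          # mmHg
    triglyceride: float       # mmol/l
    hdl: float                # mmol/l
    waist: float              # cm


def count_risk_factors(s: Subject) -> int:
    """Count how many of the five risk factors a subject meets."""
    male = s.sex == "M"
    criteria = [
        s.known_diabetes or s.fpg >= 6.1,
        # Assumption: either the systolic or the diastolic threshold suffices.
        s.known_hypertension or s.systolic >= 130 or s.diastolic >= 85,
        s.triglyceride >= 1.7,
        s.hdl < (1.0 if male else 1.3),
        # Waist thresholds as modified for Asian populations.
        s.waist > (90 if male else 80),
    ]
    return sum(criteria)


def has_metabolic_syndrome(s: Subject) -> bool:
    """At least three of the five risk factors."""
    return count_risk_factors(s) >= 3
```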

Genotyping

Among the common single nucleotide polymorphisms (SNPs) observed in USF1 and the adjacent gene F11R by sequencing in a Finnish study [11], we genotyped nine SNPs (rs836, rs790056, rs4339888, rs3737787, rs2073658, rs2516841, rs2516839, rs2516838 and rs1556259), representing each of the linkage disequilibrium (LD) clusters, in 38 unrelated family cases and 51 control subjects (Fig. 1). We estimated pairwise LD between markers using the GOLD-ldmax program [18] and haplotype frequency using the PHASE program (v. 2.1) [19]. Four common haplotypes (frequency >0.01) constituted 99.7% of all haplotypes, and three haplotype-tagging SNPs (rs3737787, rs2516841 and rs2516839) in USF1 were genotyped in all subjects to capture these four common haplotypes ([20]; r²≥0.5, minor allele frequency ≥0.1). All SNPs were genotyped with the TaqMan assay (Applied Biosystems, Foster City, CA, USA). Blind duplicates of 68 samples showed no discordant genotypes. For family data, we removed Mendelian errors and potential genotyping errors identified in 0.4% of the data using PedCheck (v. 1.1) [21] and Merlin (v. 0.10.1) [22], respectively. A total of 0.4% of genotype data were missing. We assessed Hardy–Weinberg equilibrium with the χ² test in each study group separately. The marker rs2516839 showed moderate departure from Hardy–Weinberg equilibrium (p=0.05) in the hospital cases, but a goodness-of-fit test [23] indicated that this departure is consistent with the possibility that variation in the region affects phenotype. Moreover, we could assign the four common haplotypes to most cases with high probability (>0.85) using PHASE. The departure from Hardy–Weinberg equilibrium is therefore less likely to be due to genotyping error and more likely to reflect a departure of this local region from Hardy–Weinberg equilibrium. Thus, all data were included in the subsequent analyses.
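As an illustration of the Hardy–Weinberg check described above, the following is a minimal sketch of a one-degree-of-freedom χ² test on genotype counts for a biallelic SNP. The function name is hypothetical and no continuity correction is applied, which is an assumption. It is not the goodness-of-fit test of reference [23].

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (chi-square statistic, p value) for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Genotype counts that match the expected proportions exactly (for example 25, 50 and 25) give a statistic of 0 and a p value of 1.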

Fig. 1

Structure of F11R and USF1 and the location of the SNPs studied. The LD clusters refer to groups of SNPs with r²≥0.5. Common haplotypes (frequency >0.01) are shown. Haplotype tagging SNPs; *SNPs showing the most significant association with FCHL in the Finnish population [11]

Statistical analyses

Continuous data were transformed by natural logarithm if necessary and expressed as mean±SD or geometric mean (95% CI). Categorical data were compared using the χ² test or Fisher’s exact test as appropriate. All statistical tests were performed with SPSS (SPSS for Windows, v. 11.5; SPSS, Chicago, IL, USA) unless specified otherwise. Given the a priori hypotheses being tested and the correlations in phenotype data, no adjustment for multiple comparisons was made. A p value less than 0.05 was considered significant (two-tailed).

Family-based linkage and association analyses

We used Statistical Explanation for Positional Cloning (STEPC) [24] to determine if the SNPs could fully explain the observed linkage for type 2 diabetes and the metabolic syndrome. The hypothesis was that, if a particular marker is the only site that influences the evidence for linkage for the trait, no residual linkage will remain after conditioning on the genotypes of the marker segregating in the families. We performed covariate analyses to determine whether the SNPs explained the evidence for linkage for waist circumference. We performed variance component linkage analyses for waist circumference using Merlin with adjustment for age and sex, and compared the result with results obtained using each individual SNP (in an additive model) as an additional covariate.

We used subset analyses to assess whether the SNPs were associated with linkage results for type 2 diabetes and the metabolic syndrome. We used Merlin to select the affected individual from each family who shared the most alleles that were identical by descent at the peak linkage region (175 centimorgans) with the other affected family members [22]. These subjects tend to harbour the disease allele in linked families [25]. We divided the families into three subsets based on the genotype of the selected affected individuals. We performed non-parametric multipoint linkage analyses for type 2 diabetes and the metabolic syndrome in each subset and the combined samples using Merlin with the score-pairs option, as described previously [1]. Significance was assessed for the subset with the highest linkage signal in each SNP by performing 1,000 simulations with random selection of the same number of families as in the subset analyses. The empirical p value (p empirical) was calculated as the proportion of simulations with a LOD (log of the odds) score greater than or equal to the observed LOD score.
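The empirical p value defined above reduces to a simple proportion. The sketch below assumes the simulated LOD scores have already been obtained from the linkage software. The function name is hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed_lod: float, simulated_lods: Sequence[float]) -> float:
    """Proportion of simulated LOD scores >= the observed LOD score."""
    if not simulated_lods:
        raise ValueError("at least one simulated LOD score is required")
    hits = sum(1 for lod in simulated_lods if lod >= observed_lod)
    return hits / len(simulated_lods)
```

With 1,000 simulations, the smallest non-zero value this estimate can take is 0.001.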

We used the FBAT program (v. 1.5.5) [26] to test for the association between SNPs or haplotypes and the traits under an additive genetic model. We tested the null hypothesis of no association in the presence of linkage with the −e option, which empirically estimated the variance. In haplotype analyses, the association was assessed with the overall haplotype frequency distribution for the four common haplotypes with at least ten informative families. In quantitative trait analyses, data on BP and glucose traits were not used if the subjects were taking BP- or glucose-lowering medications. Data were regressed on age and sex and residuals with outliers greater than four standard deviations from the mean were removed. The residuals were then standardised to mean zero and unit variance for FBAT analyses.
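The outlier removal and standardisation of the residuals can be sketched as follows. This assumes the regression on age and sex has already been done elsewhere, that outliers are removed in a single pass, and that the sample standard deviation is used. None of these details is specified in the text, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def standardise_residuals(residuals: Sequence[float], max_sd: float = 4.0) -> list[float]:
    """Drop residuals beyond max_sd SDs from the mean, then rescale the rest
    to mean zero and unit variance."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)      # assumption: sample SD
    kept = [r for r in residuals if abs(r - mean) <= max_sd * sd]
    kept_mean = statistics.fmean(kept)
    kept_sd = statistics.stdev(kept)
    return [(r - kept_mean) / kept_sd for r in kept]
```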

Case-control association analyses

We hypothesised that, if a SNP is associated with disease, the risk allele should have the highest frequency in family cases contributing to the linkage signal (linked-family cases), compared with all family cases and hospital cases, whereas control subjects will have the lowest frequency. We tested this hypothesis using three samples. The first sample included family cases in which one subject with either type 2 diabetes or the metabolic syndrome was selected per family. This corresponded to the subject showing the most evidence of allele sharing identity by descent with other affected family members (all family cases). Subsets of cases were also selected based on having a positive family LOD score at the linkage peak for the respective trait (linked-family cases). The second sample included 1,383 unrelated type 2 diabetic patients (hospital cases) for testing association with type 2 diabetes. A subset of 812 type 2 diabetic patients who also had the metabolic syndrome was selected for testing association with the metabolic syndrome. The third sample included 454 subjects without diabetes or the metabolic syndrome (control sample). The allele frequency of each case group was compared with that of the control subjects using a 2×2 χ² table. Odds ratios (OR) with 95% CI are presented. We also compared the overall haplotype frequency distribution in each case group vs. the control subjects using PHASE under the condition of no recombination, and assessed the significance through 100 permutations [19].
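The allele-frequency comparison can be illustrated with a minimal sketch of a 2×2 χ² test and odds ratio on allele counts. The function name is hypothetical. The Pearson statistic without continuity correction and the Woolf (log-scale) 95% CI are assumptions, since the text does not say which variants were used.

```python
import math


def allele_association(case_risk: int, case_other: int,
                       control_risk: int, control_other: int):
    """Return (chi-square, p value, odds ratio, (CI lower, CI upper))."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, 1 degree of freedom.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of log(OR); assumes no zero cells.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p_value, odds_ratio, (lower, upper)
```

With hypothetical counts of 60 risk and 40 other alleles in cases against 40 and 60 in control subjects, the statistic is 8.0 and the odds ratio is 2.25.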

Power of association studies

We estimated the locus-specific sibling relative risk (λs(locus)) by calculating the ratio of expected vs observed probability of identity by descent=0 at the peaks of our linkage signals using ASPEX (v. 2.3) [27]. We then assessed the posterior power of our study to detect associations for genetic models that were consistent with the observed λs(locus) [28, 29].
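The ratio described above can be written out as a one-line calculation. The sketch assumes full siblings, for whom the expected probability of sharing zero alleles identical by descent is 0.25 in the absence of linkage. The function name is hypothetical, and this is not the ASPEX estimation procedure itself.

```python
def locus_specific_lambda_s(observed_p_ibd0: float) -> float:
    """Locus-specific sibling relative risk: expected / observed P(IBD = 0)."""
    if not 0.0 < observed_p_ibd0 <= 1.0:
        raise ValueError("observed P(IBD = 0) must lie in (0, 1]")
    expected_p_ibd0 = 0.25  # full siblings under no linkage
    return expected_p_ibd0 / observed_p_ibd0
```

An observed probability of 0.125, for example, corresponds to a relative risk of 2.0.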

Results

Linkage disequilibrium in the USF1 region in Chinese individuals

We initially typed nine SNPs in the USF1-F11R region (Fig. 1). All SNPs were in strong LD (D′≥0.6) and formed three clusters with r²≥0.5 (cluster 1: rs836, rs4339888, rs3737787, rs2073658; cluster 2: rs790056, rs2516841; cluster 3: rs2516839, rs2516838, rs1556259). Subsequently, three haplotype-tagging SNPs, including rs3737787, rs2516841 and rs2516839, were selected for typing in all samples. The pattern of LD in the USF1 region in Chinese was similar to that reported in the Finnish population [11] and the Han Chinese and CEPH populations of the HapMap project [30]. There were, however, differences in the minor allele frequencies in Chinese family samples compared with the Finnish family samples (allele T, 0.10 vs 0.34; allele T, 0.17 vs 0.28; allele A, 0.29 vs 0.61 for rs3737787, rs2516841 and rs2516839, respectively) (Table 1).

Family-based linkage and association

STEPC analyses showed significant residual evidence for linkage conditioning on any of the individual SNPs for type 2 diabetes (0.0002<p<0.05) and the metabolic syndrome (0.00001<p<0.0009). Comparison of linkage results for waist circumference with or without using USF1 SNPs as a covariate showed no effect on linkage peak (original vs SNP-adjusted LOD scores, 4.79 vs 4.16–4.40 at 178 centimorgans and 3.80 vs 3.45–4.09 at 200 centimorgans). Thus, none of the polymorphisms can explain the observed linkage signals for type 2 diabetes, the metabolic syndrome or waist circumference. We also examined the association of USF1 SNPs with the evidence of linkage for type 2 diabetes and the metabolic syndrome by subset linkage analyses based on the genotype of an affected individual in each family. We did not observe significant associations, but the rs3737787 CC genotype and rs2516841 CT genotype showed a trend of association with the evidence for linkage for type 2 diabetes (p empirical=0.10 and 0.09, respectively) and similarly for the rs2516839 GG genotype with the evidence for linkage for the metabolic syndrome (p empirical=0.08).

We further assessed the evidence of association of USF1 SNPs and haplotypes with type 2 diabetes and the metabolic syndrome in the presence of linkage in our families, using FBAT. We found that the common C allele of rs3737787 and the rare T allele of rs2516841 were significantly associated with higher risk of type 2 diabetes (p=0.004 and 0.014, respectively) with an overall significant haplotype effect (p=0.022). The C allele of rs3737787 also showed a trend to increasing risk of the metabolic syndrome (p=0.057) (Table 1). Moreover, we assessed the association of these SNPs and haplotypes with age- and sex-adjusted metabolic traits in all family members. Significant haplotype associations were observed for waist circumference (p=0.010) and HOMA%β (p=0.034), with a trend to association of rs3737787 with waist circumference (p=0.057) (Table 2).

Table 2 Association of USF1 SNPs with metabolic traits in all family members in HKFDS (n=897); p values are shown

Case-control association

We also examined the association of USF1 with type 2 diabetes and the metabolic syndrome in both unrelated family and hospital cases compared with normal control subjects. There was an overall significant haplotype effect in a comparison of family cases and normal control subjects for type 2 diabetes (p=0.01) but not for the metabolic syndrome. Moreover, the common C allele at rs3737787 was associated with significantly higher risk of type 2 diabetes and the metabolic syndrome in all family cases (OR=1.55–1.56, p<0.05) (Table 3). This association was even stronger in the subset of cases from linked (positive LOD score at 175 centimorgans) families (OR 2.33 for type 2 diabetes and 2.60 for the metabolic syndrome, p<0.05) (Table 3). In contrast to results for the family cases, there was no association of USF1 SNPs or haplotypes with type 2 diabetes and the metabolic syndrome when comparing hospital cases and normal control subjects (Table 3). Using genetic models consistent with the λs(locus) of 1.7 for type 2 diabetes and 2.8 for the metabolic syndrome in our linkage studies, we had more than 99% power to detect association in the case-control studies at r²≥0.5 between the marker and disease locus under our primary hypothesis that the disease locus accounts for the evidence for linkage in this region.

Table 3 Case-control association of USF1 SNPs with type 2 diabetes and metabolic syndrome

Discussion

We have previously demonstrated significant linkage of chromosome 1q21–25 to type 2 diabetes, the metabolic syndrome and waist circumference in Chinese [1, 2]. Here we show, using STEPC, covariate and subset linkage analyses, that variation at USF1 in this region was unable to explain much of the evidence for the linkage. However, USF1 is associated with type 2 diabetes, the metabolic syndrome and related traits (waist circumference and HOMA%β) in the family data using family-based and/or case-control association methods (Tables 1, 2, 3). In contrast to the associations observed in the family cases, we observed no association of USF1 with either type 2 diabetes or the metabolic syndrome in the population-based (hospital) type 2 diabetic cases.

Although we did not resequence USF1 in our population to search for Chinese-specific polymorphisms, private SNPs tend to have very low minor allele frequency and the strong signals observed in our linkage studies would require cumulative susceptibility allele frequencies in the intermediate range [31]. In addition, the LD patterns (D′ and r²) around USF1 are strong and similar between Han Chinese and CEPH populations in the HapMap data [30], suggesting that the SNPs we studied are representative of USF1. The strongest associations were observed for SNP rs3737787 in most analyses. This SNP is located in a region of strong LD (D′) among the genes ITLN2, F11R, USF1 and PVRL4, but is in strong LD only with SNPs in F11R and USF1. Collectively, the results suggest that variation at USF1, or an as yet unidentified variant in strong LD located at or near USF1, contributes to the risk of developing type 2 diabetes and/or metabolic syndrome in families with evidence for linkage to this region. However, the results provide little support for USF1 as a major contributor to risk of type 2 diabetes or the metabolic syndrome in the general Chinese population.

Genetic variation at USF1 has also been associated with FCHL. Pajukanta et al. [11] showed association and linkage of a USF1 haplotype (the two common alleles for rs3737787 and rs2073658) with FCHL in Finnish families. The strongest association was with high triglyceride levels in men. We observed a similar trend for association of the rs3737787 common C allele with higher triglyceride levels in men from the family study (p=0.051). We also selected a subset of 88 patients having combined hyperlipidaemia (>90th percentile of the age- and sex-specific population references for both total cholesterol and triglyceride) [32] from the population-based (hospital) cases. We found no association of this phenotype with variation at USF1 (data not shown).

The discrepancy in the results between family-based and population-based (hospital) studies described here was also observed in studies of USF1 and FCHL. The Finnish family study showed association and linkage of variation at USF1 with FCHL in families with evidence for linkage on chromosome 1q21–23 [11]. However, a case-control study in Europeans was not able to replicate this result using population-based cases with FCHL [33]. Why do we and others [11, 33] observe association of USF1 variation with metabolic diseases in families with linkage to the chromosome 1q region but not in population-based samples? We cannot exclude the possibility that this is due to differences in ascertainment criteria, clinical heterogeneity and the ethnicity of the study samples. On the other hand, variation at USF1 showed stronger association with the diseases in cases from families demonstrating linkage, but the segregation of that variation in families could not explain much of that evidence for linkage. Given our strong linkage signals, we would expect a large effect size and good power to detect association if variation at USF1 explained the evidence for linkage. Our results implicate additional susceptibility gene(s) in this region for at least the type 2 diabetes and the metabolic syndrome phenotypes that we studied. These findings have implications for the positional cloning of the gene(s) contributing to the development of metabolic diseases in the chromosome 1q22–23 region. It is plausible that one or more unidentified genes may play a major role in the development of type 2 diabetes and related diseases and that USF1 could act as a modifier that interacts with such a gene; this is consistent with the absence of associations in population-based studies. Thus, it will be interesting to test whether USF1 accounts for the evidence for linkage in the Finnish family study [11]. Clearly, however, further studies in other populations, including both population-based and family samples, and in particular family samples showing evidence for linkage on chromosome 1q, may provide a better understanding of the contribution of this gene to the regulation of glucose and lipid metabolism.