Background

Bone is continuously remodeled in vertebrates through coordinated phases of bone formation and resorption that maintain bone volume and calcium and phosphorus homeostasis [1]. Bone remodeling is also modulated by other cells in the bone microenvironment, such as monocytes/macrophages, lymphocytes, and endothelial cells, which act through direct contact with bone cells or through the release of soluble effectors [2]. In disease, the loss of bone homeostasis may be associated with changes in the many cell types that mediate interactions among bone tissue, the immune system, and the vascular compartment. The study of bone homeostasis can therefore provide a better understanding of bone diseases such as osteoporosis [2]. Bone mass is also strongly genetically determined: twin and family studies have shown that genetic factors account for 50 to 90% of the variance in bone mineral density (BMD) [3–8]. In addition, the calcium-sensing receptor (CASR) and interleukin 6 (IL-6) genes are important candidate genes for osteoporosis and for bone and mineral metabolism, and they may affect BMD variation in Chinese nuclear families [9]. Determining SNPs in bone remodeling-related genes is becoming an increasingly feasible and efficient tool for analyzing the processes associated with osteoporosis. However, investigating the distribution of SNPs within human populations is laborious and costly, mainly because large numbers of individuals and SNPs must be tested. Some SNPs in bone remodeling genes have already been reported; however, allele frequency distributions differ significantly among population groups, indicating that populations are genetically heterogeneous with respect to these SNPs.
Moreover, racial differences in the prevalence of certain alleles could account for a proportion of the variation in bone disease traits between ethnic groups [10]. The genetic variability of Asian and Caucasian populations was examined at polymorphic restriction sites in five important candidate genes for BMD: CASR-BsaHI, alpha 2HS-glycoprotein (AHSG)-SacI, estrogen receptor alpha (ESR1)-PvuII and XbaI, vitamin D receptor (VDR)-ApaI and parathyroid hormone (PTH)-BstBI. Statistical analysis revealed significant allelic and genotypic differentiation between the two populations in these osteoporosis-associated polymorphisms. This intra- and inter-population variability implies that the pattern of variation at some loci may have been shaped by various types of natural selection [11]. The association of osteoporosis with SNPs in osteoporosis-related genes is normally investigated with a case-control approach. Several of the candidate genes in our study (PLXNA2, CAT and SEMA7A) have also been examined in case-control association studies in a Korean population [12–14]. In those studies, the genes were screened for variants in 24 individuals, and the variants were then genotyped in 560 postmenopausal women to test for associations with bone phenotypes. Statistical analyses found that SNPs and haplotypes in these genes were associated with the risk of vertebral fracture or with BMD at the lumbar spine and femoral neck [12–14]. Thus, to facilitate further association studies using SNPs in genes involved in osteoporosis, we selected 81 candidate genes involved in bone formation and resorption and characterized their genetic variants, including gene-based haplotype diversity. These SNPs may be useful for genetic association studies that compare SNP and haplotype information across ethnic groups.

Methods

Subjects and candidate genes

The study population consisted of 24 unrelated Korean individuals (11 men and 13 women) recruited from the Ansan and Ansung areas. The men were aged 41 to 65 years (mean ± SD: 57.8 ± 8.5 years) and the women 41 to 62 years (mean ± SD: 52.6 ± 6.9 years). Samples from these individuals were used for SNP screening and for generating immortalized B lymphocyte cell lines (cell line IDs GRB2015717, GRB2014744, GRB2014719, GRB2014754, GRB2015301, GRB2014712, GRB2012585, GRB2012949, GRB2012816, GRB2013123, GRB2012811, GRB2012998, GRB2015263, GRB2014890, GRB2014112, GRB2014896, GRB2014197, GRB2010947, GRB2021291, GRB2021404, GRB2021105, GRB2022466, GRB2026940, GRB2021302). Informed consent was obtained from all subjects, and the study was approved by the Institutional Review Board of the Korea National Institute of Health. Candidate osteoporosis genes were selected based on their reported functions in bone/chondrocyte formation or bone resorption. We included genes that promote or inhibit bone/chondrocyte formation, genes that promote or inhibit bone resorption, and genes involved in adipocyte differentiation. Genes of interest that promote bone/chondrocyte formation are as follows: FGFs [15], SOX5,6,9 [15], BMPs [16], LGALS3 [17], LGALS1 [18], DLX5 [16, 19], MSX2 [19], SP7 [19], CBFB [20], TGFBI [21], MSX1 [22], BGLAP [23], SPP1 [24], IBSP [24], IL1RN [25], CTNNB1 [16, 26], WNTs [26], TCF4 [27], OMD [28], VEGFs [29], DMP1 [30], IL13 [31], AR [32], CYP17A1 [32, 33] and CYP19A1 [32, 33]. Genes of interest that inhibit bone/chondrocyte formation are as follows: PTHrP/PTHR1 [15, 19], NPY2R [19], PPARG [19], TWIST1 [16, 24], DKK1 [26], PTH [19, 34], AHSG [35], PPP3CA [36], WIF1 [37], MEPE [38] and IL10 [39, 40].
Genes of interest that promote bone resorption are as follows: PTHrP/PTHR1 [15, 19], PTH [19, 34], PTGS2 [26], IL4 [34], IL6ST [41], CTSK [42], H+-ATPase [42], ITGA1 [42, 43], NFKB [42, 44], CALCR [44], CLCN7 [44], FOS [44, 45], FOSB [42, 45], FOSL2 [46, 47], ITGAV [42, 44], CSK [42], TRAF6 [42, 44], MITF [44], CCR1 [48], NFATC1 [49], JDP2 [50], IL15 [51], PTK2B [52], CASR [53], SEMA7A [54], PTGER4 [55] and PLXNA2 [12]. Genes of interest that inhibit bone resorption are as follows: IL13 [31], AR [32], IL3 [56], ZNF675 [57], GPX1 [58] and CAT [59]. In addition, the decrease in bone volume that occurs with age and in osteoporosis is accompanied by an increase in bone marrow adipose tissue, suggesting that mesenchymal stem cell differentiation is dysregulated in favour of adipogenesis. Therefore, we also included the following adipocyte differentiation genes: PPARs [19], CEBPB [19, 47] and DBI [60].

Resequencing analysis

To identify SNPs in the 81 candidate osteoporosis genes (Table 1), we resequenced all exons, including the coding regions and the 5' and 3' UTRs up to the splice junctions, as well as approximately 0.5 kb of promoter sequence upstream of the transcription start site, in genomic DNA samples. Reference genomic sequences were obtained from GenBank, and polymerase chain reaction (PCR) primers were designed using the Primer3 program [61]. Genomic DNA was isolated from the 24 immortalized B lymphocyte cell lines of the selected subjects. PCR products were sequenced using the BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA) on an ABI 3730 automated sequencer (Applied Biosystems). SNPs were detected from multiple sequence alignments using the Phred/Phrap/Consed package [62, 63] and PolyPhred [64]. All SNP data from the Korean samples have been deposited in the KSNP database [65].

Table 1 Gene information for candidate osteoporosis genes

Statistical analysis

We used the HapMap database [66] to compare the Korean population with other populations. Genetic differentiation between populations was measured with Wright's F_ST (the classic measure of population divergence), calculated from the genotype data. Haplotypes were inferred using the Partition Ligation-Expectation Maximization (PL-EM) algorithm [67]. We used the KSNP database to analyze LD and haplotype blocks and to select tagging SNPs. LD blocks were defined by LD-based blocking with bootstrapping [68], and haplotype-tagging SNPs were selected using the Entropy method [69].
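Haplotype phase is ambiguous only for individuals who are heterozygous at more than one SNP. The sketch below illustrates the expectation-maximization idea behind haplotype inference in the simplest case of two biallelic SNPs. It is a plain two-locus EM, not the PL-EM algorithm [67] itself (which adds partition ligation to scale to many loci); the function name `em_two_locus` and the 0/1/2 genotype coding (number of copies of allele 1) are assumptions made for illustration.

```python
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def _alleles(g):
    """Alleles on the two chromosomes for a genotype coded 0/1/2."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]


def em_two_locus(genotypes, max_iter=1000, tol=1e-10):
    """Estimate two-SNP haplotype frequencies by EM (hypothetical helper).

    genotypes: list of (g1, g2), each the count (0, 1 or 2) of allele 1.
    Returns a dict mapping (allele at SNP1, allele at SNP2) to frequency.
    """
    fixed = Counter()   # haplotype counts from phase-known individuals
    n_double_het = 0    # individuals heterozygous at both SNPs (phase unknown)
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        a, b = _alleles(g1), _alleles(g2)
        # With at most one heterozygous SNP, this pairing is the only phase.
        fixed[(a[0], b[0])] += 1
        fixed[(a[1], b[1])] += 1

    n_chrom = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote carries
        # the 00/11 phase rather than the 01/10 phase.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = {h: float(fixed[h]) for h in HAPLOTYPES}
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        # M-step: re-estimate haplotype frequencies from expected counts.
        new = {h: counts[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs
```

For example, `em_two_locus([(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)])` assigns the double heterozygote almost entirely to the 00/11 phase, because the phase-known individuals carry only those two haplotypes.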

Results

Identification of SNPs in candidate osteoporosis genes in the Korean population

We directly sequenced 81 candidate osteoporosis genes, including all exons, their intron boundaries, and ~1.5 kb of the 5' flanking region. We identified 942 variants: 888 SNPs, 43 insertion/deletion polymorphisms, and 11 microsatellite markers (Table 2). Of the 888 SNPs, 118 were located in promoter regions, 21 in 5' untranslated regions (UTRs), 157 in coding regions, 435 in introns, 119 in 3' UTRs and 38 in intergenic regions (Table 2). Based on minor allele frequency (MAF), we classified the 888 SNPs into low (MAF < 0.05), intermediate (0.05–0.15), and high (> 0.15) frequency classes as described by Cargill et al. [70] (Fig. 1A). Of the 888 SNPs, 331 were unknown SNPs not reported in dbSNP (build 124), and the remaining 557 were known (Fig. 1B). Of the 401 SNPs in the high MAF class, 53 (13.2%) were unknown. In contrast, most SNPs in the low MAF class (70.3%) were unknown, indicating that a large portion of the newly identified SNPs are rare variants. Overall, about two-thirds of the SNPs identified in this study are common in the Korean population (MAF > 0.05). When classified by function, 76% of the nonsynonymous SNPs (cSNPs) belonged to the low MAF class, whereas only 52.2% of the SNPs in promoter regions belonged to this class (Fig. 1C). In addition, newly identified SNPs with MAF > 0.15 accounted for 16% (53 of 331) of all newly identified SNPs. Functionally, we found novel coding-region SNPs, not reported in dbSNP, in the genes encoding interleukin 6 signal transducer (IL6ST), the androgen receptor (AR), and the core-binding factor beta subunit (CBFB). However, no coding-region SNPs in NFKB2 were found in either our dataset or dbSNP, suggesting that this gene is functionally and evolutionarily highly conserved.

Table 2 Summary of polymorphisms discovered in candidate osteoporosis genes
Figure 1

Distribution of the SNPs identified in the 81 candidate osteoporosis genes. (A) Classification of the SNPs into minor allele frequency (MAF) classes. (B) Number of known and unknown SNPs. (C) Distribution of SNPs according to location or type. The percentages in (A), (B), and (C) refer to the percentage of SNPs within each MAF class in the given categories.
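The frequency classes used above can be made concrete with a short sketch. The hypothetical helpers below compute a MAF from genotypes coded as 0/1/2 copies of one allele and assign it to the low (< 0.05), intermediate (0.05 to 0.15) or high (> 0.15) class. With 24 individuals (48 chromosomes), the smallest non-zero MAF is 1/48 ≈ 0.021. The low class therefore consists of alleles seen on only one or two chromosomes, and three copies (3/48 ≈ 0.063) already fall into the intermediate class. The SNP names and genotypes are invented for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 copies of one allele; None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def maf_class(maf):
    """Classes as stated in the text: low < 0.05 <= intermediate <= 0.15 < high."""
    if maf < 0.05:
        return "low"
    if maf <= 0.15:
        return "intermediate"
    return "high"


# Hypothetical genotypes for 24 individuals (48 chromosomes) at three SNPs.
snps = {
    "snp_a": [0] * 23 + [1],                # one heterozygote: MAF = 1/48
    "snp_b": [0] * 21 + [1] * 3,            # MAF = 3/48
    "snp_c": [0] * 6 + [1] * 12 + [2] * 6,  # MAF = 24/48 = 0.5
}
classes = {name: maf_class(minor_allele_frequency(g)) for name, g in snps.items()}
print(classes)  # {'snp_a': 'low', 'snp_b': 'intermediate', 'snp_c': 'high'}
```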

The Japanese SNP database (JSNP) was also constructed by gene-based resequencing of 24 individuals [71]. We therefore compared the SNPs identified in the candidate osteoporosis genes in this study with those in the JSNP database. Of the 70 exonic SNPs (excluding UTRs) with MAF > 0.05 in our data, 28 were shared with the JSNP database. These 28 shared SNPs represent 28/70 (40%) of our SNPs and 28/43 (65%) of the JSNP SNPs for the selected genes.

Deviation in heterozygosity and genetic diversity

We used HapMap to compare the allele frequencies of diverse ethnic groups with those of the Korean population [72]. Of the 557 known SNPs detected in this study, 313 were found in HapMap. We therefore evaluated genetic differences between Koreans and the other populations by calculating Wright's F-statistics for these 313 shared SNPs under Hardy-Weinberg assumptions. F_IS is the average deviation in heterozygosity within subpopulations, F_ST is the deviation due to subdivision alone, and F_IT is the overall deviation in heterozygosity in the total population [73]. The multilocus mean values of F_IS, F_ST and F_IT for the five subpopulations (KR, CHB, JPT, CEU and YRI) were -0.0121, 0.3366 and 0.3287, respectively, indicating that the SNPs in osteoporosis-associated genes were significantly differentiated among the five subpopulations, whereas genotypes within each subpopulation were consistent with Hardy-Weinberg expectations. We also calculated pairwise F_ST values between KR and each of the other four subpopulations for each of the 313 SNPs; their distribution is plotted in Fig. 2. Two distribution patterns were observed, one shared by KR-CHB and KR-JPT and the other by KR-CEU and KR-YRI. The F_ST values for KR-CHB and KR-JPT declined steadily toward 0.05, whereas those for KR-CEU and KR-YRI extended to 0.2 or more, a range in which the major and minor alleles are reversed overall, suggesting a large genetic barrier between continental populations. Applying a significance threshold of F_ST ≥ 0.1 [74], 2, 2, 73 and 92 of the 313 SNPs differed significantly between KR and CHB, JPT, YRI and CEU, respectively.
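The relationship among the three F-statistics can be made concrete with Nei's heterozygosity-based formulation. Let H_I be the mean observed heterozygosity within subpopulations, H_S the mean expected heterozygosity within subpopulations, and H_T the expected heterozygosity computed from the mean allele frequency. Then F_IS = 1 - H_I/H_S, F_ST = 1 - H_S/H_T and F_IT = 1 - H_I/H_T, so (1 - F_IT) = (1 - F_IS)(1 - F_ST). The reported means agree with this identity: 1 - (1 + 0.0121)(1 - 0.3366) ≈ 0.3286, close to the reported F_IT of 0.3287.

The sketch below computes these quantities for a single biallelic SNP. The formulation, the equal weighting of subpopulations, and the names `GenotypeCounts`, `f_statistics` and `count_differentiated` are assumptions; the software used in the study may apply a different estimator, such as one corrected for sample size. The pairwise F_ST compared against the 0.1 threshold is the same calculation restricted to two populations.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts at one biallelic SNP in one subpopulation."""
    n_aa: int  # homozygous for allele A
    n_ab: int  # heterozygous
    n_bb: int  # homozygous for allele B

    @property
    def n(self):
        return self.n_aa + self.n_ab + self.n_bb

    @property
    def p(self):
        """Frequency of allele A."""
        return (2 * self.n_aa + self.n_ab) / (2 * self.n)


def f_statistics(subpops):
    """Return (F_IS, F_ST, F_IT) for one SNP, weighting subpopulations equally."""
    k = len(subpops)
    h_i = sum(s.n_ab / s.n for s in subpops) / k         # observed, within subpops
    h_s = sum(2 * s.p * (1 - s.p) for s in subpops) / k  # expected, within subpops
    p_bar = sum(s.p for s in subpops) / k
    h_t = 2 * p_bar * (1 - p_bar)                        # expected, total population
    if h_t == 0:
        return math.nan, math.nan, math.nan  # monomorphic in every subpopulation
    f_is = 1 - h_i / h_s if h_s > 0 else math.nan
    f_st = 1 - h_s / h_t
    f_it = 1 - h_i / h_t
    return f_is, f_st, f_it


def count_differentiated(snp_pairs, threshold=0.1):
    """Count SNPs whose pairwise F_ST between two populations is >= threshold."""
    return sum(1 for a, b in snp_pairs if f_statistics([a, b])[1] >= threshold)


kr = GenotypeCounts(36, 48, 16)    # hypothetical counts; allele A frequency 0.6
other = GenotypeCounts(4, 32, 64)  # hypothetical counts; allele A frequency 0.2
print(f_statistics([kr, other]))
print(count_differentiated([(kr, other), (kr, kr)]))
```

Both hypothetical populations are in Hardy-Weinberg proportions, so F_IS should come out close to 0, while F_ST and F_IT should both be close to 1/6.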
To investigate differences in linkage disequilibrium (LD) patterns among the subpopulations, we selected two genes with high numbers of SNPs per gene (PTK2B and IL1RN) and plotted their haplotype blocks, generated with Haploview [75, 76], for the five subpopulations (KR, CHB, JPT, CEU and YRI) (Fig. 3). Interestingly, the haplotype block structure of each gene differed among all five subpopulations. Overall, the largest blocks were found in the CEU population, whereas the KR and YRI populations had smaller blocks in both genes. This result suggests that genetic properties such as LD can reveal subtle differences in genetic diversity between subpopulations.

Figure 2

Distribution of F_ST among the sub-populations.

Figure 3

Comparison of LD patterns of PTK2B and IL1RN among the sub-populations.

To quantify the genetic diversity between subpopulations, we also calculated Nei's standard genetic distance and Latter's F_ST distance [77, 78] (Table 3). The two measures showed the same overall trend, although Nei's distances were generally lower than Latter's. The genetic distances between KR and CHB (0.012) and between KR and JPT (0.013) were very small and nearly identical. In contrast, for these SNPs in the selected genes, KR was closer to YRI (0.594) than to CEU (0.646). Thus, the genetic distances between KR and the other populations for the selected genes agreed with the F_ST analysis.

Table 3 Pairwise genetic distances among the five populations
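Both distances can be written in terms of allele-frequency identities averaged over loci: J_X = Σ x_i², J_Y = Σ y_i² and J_XY = Σ x_i y_i, where x_i and y_i are the frequencies of allele i in populations X and Y. The sketch below assumes the textbook forms of the two measures. Nei's standard distance is D = -ln(J_XY / sqrt(J_X J_Y)), and Latter's F_ST distance is taken as φ* = ((J_X + J_Y)/2 - J_XY) / (1 - J_XY). Neither form includes a sample-size bias correction. The function names are hypothetical, and the values in Table 3 may have been computed with a corrected estimator.

```python
import math


def _identities(pop_x, pop_y):
    """Average J_X, J_Y and J_XY over loci.

    pop_x, pop_y: lists (one entry per locus) of allele-frequency lists that
    sum to 1, with alleles listed in the same order in both populations.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share a non-empty set of loci")
    jx = jy = jxy = 0.0
    for x, y in zip(pop_x, pop_y):
        jx += sum(a * a for a in x)
        jy += sum(b * b for b in y)
        jxy += sum(a * b for a, b in zip(x, y))
    n_loci = len(pop_x)
    return jx / n_loci, jy / n_loci, jxy / n_loci


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y))."""
    jx, jy, jxy = _identities(pop_x, pop_y)
    return -math.log(jxy / math.sqrt(jx * jy))


def latter_fst_distance(pop_x, pop_y):
    """Latter's F_ST distance in the assumed form ((J_X + J_Y)/2 - J_XY) / (1 - J_XY)."""
    jx, jy, jxy = _identities(pop_x, pop_y)
    if jxy >= 1.0:
        return 0.0  # both populations fixed for the same allele at every locus
    return ((jx + jy) / 2 - jxy) / (1 - jxy)


kr = [[0.6, 0.4], [0.9, 0.1]]   # hypothetical allele frequencies at two SNPs
yri = [[0.2, 0.8], [0.5, 0.5]]
print(nei_standard_distance(kr, yri), latter_fst_distance(kr, yri))
```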

Discussion

In this study, 81 candidate osteoporosis genes were sequenced to identify common genetic polymorphisms that might alter bone remodeling. Analysis of allele frequency differences among ethnic groups using genetic distance measures showed that the Han Chinese and Japanese populations were close to the Korean population. This implies a close genetic affinity among the Han Chinese, Japanese and Korean populations, which may reflect either a recent common ancestry or high levels of mutual immigration among these groups [79].

The 888 SNPs identified in this study were obtained from 24 unrelated individuals. Of these, 331 (37.3%) were newly identified polymorphisms not present in dbSNP, whereas 557 (62.7%) were already present in the database. Of the 331 novel variants, 64.4% belonged to the low MAF class (MAF < 0.05) in Koreans, and the remaining 35.6% were common SNPs in the Korean population. These common SNPs could be useful for further case-control association studies of osteoporosis in Koreans. Many of the new SNPs had low allele frequencies. This may be because previous studies used mixed populations or relatively small samples, limiting their ability to discover low-frequency SNPs. Alternatively, because Koreans are an ethnically homogeneous population, the Korean samples may have allele frequencies that differ significantly from those of mixed samples. Of the 557 variants already present in dbSNP, only 16.2% had a MAF below 0.05, 21.4% had a MAF between 0.05 and 0.15, and 62.5% had a MAF above 0.15. Because about 83.8% of these known variants (roughly 467 SNPs) had a MAF of 0.05 or higher, our resequencing effort provided experimental validation for more than 460 polymorphisms that were already in the database.

We determined the LD block structure of the candidate genes in this limited sample, excluding genes with only one or two SNPs as well as uncommon SNPs (MAF < 0.05). We used normalized D' statistics for all pairs of SNP markers with MAF > 0.05 that did not deviate from Hardy-Weinberg equilibrium (p > 0.05). The LD and haplotype results are available in the KSNP database [65]. Comparing the haplotype blocks of two highly polymorphic genes (PTK2B and IL1RN) in the KR population with those in the four HapMap subpopulations revealed diverse block patterns (Fig. 3). This LD and haplotype information could therefore be a valuable resource for future studies of the osteoporosis-related genes, including ethnic comparisons, SNP tagging and the detection of recombination signals.
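The marker filters and the LD statistic described above can be sketched as follows. The Hardy-Weinberg check here is a one-degree-of-freedom chi-square goodness-of-fit test, whose p-value equals erfc(sqrt(χ²/2)). It is only a simple stand-in: with 24 individuals an exact test is generally preferable, and the test used in the study may differ. D' is Lewontin's normalized disequilibrium, D/D_max, computed from a two-SNP haplotype frequency and the two allele frequencies, which could for instance come from EM haplotype estimation. The function names and default thresholds are hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium for one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' in [-1, 1] from haplotype AB frequency and allele A, B freqs.

    Haploview-style plots usually show |D'|.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return math.nan  # one of the SNPs is monomorphic
    return d / d_max


def passes_filters(maf, hwe_p, maf_min=0.05, alpha=0.05):
    """Keep a SNP for LD analysis if MAF > maf_min and HWE is not rejected."""
    return maf > maf_min and hwe_p > alpha
```

In this sketch, a SNP with genotype counts (50, 0, 50) is rejected by the HWE filter because it has no heterozygotes. A haplotype frequency equal to the product of the two allele frequencies gives D' = 0.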

In this study, nonsynonymous cSNPs included a larger proportion of low-frequency alleles than synonymous cSNPs, noncoding SNPs, and promoter SNPs. This trend is consistent with selection pressure against SNPs that cause amino acid changes [80]. In contrast, the promoter regions, which showed a wide range of allele frequencies overall, had more high-frequency SNPs than the other regions. These results suggest that the promoter variants found in this study could serve as candidate genetic determinants in future studies [81]. The several million human SNPs reported by the International HapMap Project will likely prove useful for association studies; however, SNPs located close to functionally important genes are more valuable as markers than random genomic SNPs. Moreover, SNPs in coding or promoter regions have the added potential to be causal variants that directly contribute to disease. Therefore, additional resequencing efforts are still needed for comprehensive studies of osteoporosis candidate genes across ethnic groups, as such data should prove important for future association studies of osteoporosis.

Conclusion

We directly resequenced 81 candidate osteoporosis genes and identified 942 variants, including 888 SNPs, 43 insertion/deletion polymorphisms, and 11 microsatellite markers. Of the 888 SNPs, 331 had not been previously reported and 557 were already present in the dbSNP database, of which more than 460 were validated by our resequencing effort.

Comparison with the HapMap data showed that, relative to Koreans, 1% or less of the SNPs in the Japanese and Han Chinese populations were significantly differentiated (F_ST ≥ 0.1), compared with 20% or more of those in the Caucasian and African populations. In addition, genetic distance analysis between Koreans and the other four populations ranked the closest populations in the order Han Chinese, Japanese, African and Caucasian. Overall, we found no significant differences among the three Asian populations (KR, CHB and JPT), whereas these Asian populations differed significantly from CEU and YRI in both the F_ST and genetic distance analyses of the selected genes. Nevertheless, LD and haplotype patterns differed substantially among all five populations.

Overall, by resequencing 81 osteoporosis candidate genes, we discovered 118 previously unknown SNPs with MAF > 0.05 in a Korean population. Comparing our SNPs with HapMap data to assess genetic diversity and deviation in heterozygosity revealed a close genetic affinity among the Han Chinese, Japanese and Korean populations, which may reflect either a recent common ancestry or high levels of mutual immigration among these groups. However, genetic properties such as LD patterns can reveal subtle differences among the Korean, Chinese and Japanese populations. Our results could aid in the design of case-control and population stratification studies in the Korean population.