Variation analysis to construct Korean-specific exome variation database of pilot scale
Abstract
Large international projects, including the HapMap, 1000 Genomes (1KGP), ENCyclopedia Of DNA Elements (ENCODE), and International Human Epigenome Consortium (IHEC) projects, have driven progress in human genome research, and the data they generated can be used as reference information for human genome studies. However, more specific reference sets are needed at the level of individual populations. Although a few studies have produced Korean reference sets, consisting of a few reference genomes and chip-based Korean SNP and CNV databases, no Korean-specific variation information has yet been constructed at genome scale. Here, we used Korean exomes to construct Korean variation information. Using read data from 100 Korean exomes obtained from the Korea National Institute of Health (KNIH), we mapped the exome data of each individual to NCBI GRCh37, merged the mapped information, and extracted SNPs and indels. We identified an initial pool of 1,907,598 SNPs and 325,166 indels, masked the known variations against dbSNP and the 1KGP variation database, and constructed a database of Korean-specific variations. The database can serve as a pilot database of Korean exome variation and contribute to Korean variation studies that use exome chips or whole-genome data.
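The masking step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes variants are available as tab-separated VCF-style lines, and it treats a variant as "known" only when chromosome, position, reference allele, and alternate allele match exactly. All function names and the toy records are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF-style lines, splitting multi-allelic records."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            variants.add((chrom, int(pos), ref, alt))
    return variants


def is_snp(variant: Variant) -> bool:
    """A single-base substitution; anything else is treated here as an indel."""
    return len(variant[2]) == 1 and len(variant[3]) == 1


def mask_known(called: Set[Variant], *known_sets: Set[Variant]) -> Set[Variant]:
    """Return called variants absent from every known-variant set (exact match)."""
    known: Set[Variant] = set().union(*known_sets)
    return called - known


# Toy records (hypothetical, for illustration only).
called = parse_vcf_lines([
    "1\t100\t.\tA\tG",
    "1\t200\t.\tC\tT,G",
    "2\t300\t.\tAT\tA",
    "2\t400\t.\tG\tGA",
])
dbsnp = parse_vcf_lines(["1\t100\trs1\tA\tG"])
kgp = parse_vcf_lines(["1\t200\t.\tC\tT", "2\t400\t.\tG\tGA"])

novel = mask_known(called, dbsnp, kgp)
novel_snps = sorted(v for v in novel if is_snp(v))
novel_indels = sorted(v for v in novel if not is_snp(v))
```

Exact-key matching is the simplest possible choice. A production workflow would normally left-align and normalize indels before comparison, since the same indel can be written in more than one way.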
Keywords
Exome sequencing NGS Korean specific SNP Variants