BioChip Journal, Volume 10, Issue 2, pp 126–130

Variation analysis to construct Korean-specific exome variation database of pilot scale

Original Article

Abstract

Progress in human genome research has come from a number of large international projects, including the HapMap, 1000 Genomes (1KGP), ENCyclopedia of DNA Elements (ENCODE) and International Human Epigenome Consortium (IHEC) projects, and the data they generated can serve as reference information for human genome studies. However, more specific reference sets are needed for each population. Although a few studies have produced Korean reference sets, including a few reference genomes and chip-based Korean SNP and CNV databases, Korean-specific variation information has not yet been constructed at genome scale. Here, we used Korean exomes to construct Korean variation information. Using read data from 100 Korean exomes obtained from the Korea National Institute of Health (KNIH), we mapped each individual's exome data to NCBI GRCh37, merged the mapped information, and extracted SNPs and indels. We identified a pool of 1,907,598 SNPs and 325,166 indels as initial variations, masked variants already known in dbSNP and the 1KGP variation database, and constructed a database of Korean-specific variations. The database can serve as a pilot database of Korean exome variation and contribute to studies of Korean variation using exome chips or whole-genome data.
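The merge, classify and mask steps summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-individual variant calls are already available as VCF data lines against GRCh37, that the known variants (standing in for dbSNP and 1KGP) are supplied in the same form, and that all inputs were normalized the same way (left-aligned, consistent chromosome naming), because matching is by exact (chromosome, position, REF, ALT). All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name with any "chr" prefix removed
    pos: int    # 1-based position, as in VCF
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> list[Variant]:
    """Read CHROM, POS, REF, ALT from VCF data lines; split multi-allelic ALTs."""
    variants = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        chrom = chrom.removeprefix("chr")  # so "chr1" and "1" compare equal
        for allele in alt.split(","):
            variants.append(Variant(chrom, int(pos), ref, allele))
    return variants


def merge_samples(per_sample: Iterable[Iterable[Variant]]) -> set[Variant]:
    """Union of the variants called in each individual (the initial pool)."""
    pool: set[Variant] = set()
    for variants in per_sample:
        pool.update(variants)
    return pool


def mask_known(pool: Iterable[Variant], known: Iterable[Variant]) -> set[Variant]:
    """Drop every pooled variant that appears in the known set (exact allele match)."""
    return set(pool) - set(known)


def classify(v: Variant) -> str:
    """Label a single-allele variant by comparing REF and ALT lengths."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNP"
    if len(v.ref) != len(v.alt):
        return "indel"
    return "other"  # equal-length multi-base substitutions (MNPs)


def count_types(variants: Iterable[Variant]) -> Counter:
    """Tally variants by type."""
    return Counter(classify(v) for v in variants)


if __name__ == "__main__":
    sample_a = ["#CHROM\tPOS\tID\tREF\tALT",
                "chr1\t1000\t.\tA\tG",
                "chr1\t2000\t.\tAT\tA"]
    sample_b = ["1\t1000\t.\tA\tG",
                "2\t500\t.\tC\tT,CA"]
    known = ["1\t1000\trs1\tA\tG"]  # toy stand-in for dbSNP / 1KGP sites

    pool = merge_samples([parse_vcf_lines(sample_a), parse_vcf_lines(sample_b)])
    novel = mask_known(pool, parse_vcf_lines(known))
    print(len(pool), count_types(novel))
```

In this toy run the pool holds four distinct variants; masking removes the shared known SNP at 1:1000 and leaves one SNP and two indels. Using a set for the pool means a variant seen in several individuals is counted once, which matches the idea of a merged pool of initial variations; real pipelines would also need to reconcile genotypes and apply quality filters before masking.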

Keywords

Exome sequencing, NGS, Korean-specific SNP, Variants

Copyright information

© The Korean BioChip Society and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Biomedical Informatics, Hanyang University, Seoul, Korea
  2. Department of Biomedical Science, Hallym University, Gangwon, Korea
  3. Biomedical Research Institute, College of Medicine, Hanyang University, Seoul, Korea
  4. Department of Physiology, College of Medicine, Hanyang University, Seoul, Korea
