Journal of Biosciences, Volume 40, Issue 1, pp 113–124

Whole genome analysis of a Vietnamese trio

  • Dang Thanh Hai
  • Nguyen Dai Thanh
  • Pham Thi Minh Trang
  • Le Si Quang
  • Phan Thi Thu Hang
  • Dang Cao Cuong
  • Hoang Kim Phuc
  • Nguyen Huu Duc
  • Do Duc Dong
  • Bui Quang Minh
  • Pham Bao Son
  • Le Sy Vinh

ABSTRACT

Here we present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to an average coverage of 30-fold. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that were consistent with Mendelian inheritance. Of these, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants, of which 27,604 (91.5%) were large indels. There were 6,681 large indels of 0.1–100 kbp in the child genome that were also confirmed in the father's or mother's genome. Comparing these large indels against the Database of Genomic Variants (DGV) showed that 1,499 (22.44%) were KHV-specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥300 bp. Of these, 235 contigs came from the child genome, and 199 (84.7%) of them significantly matched at least one contig from the father's or mother's genome. BLAST searches of these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified in this study demonstrate the need for more genome-wide studies, not only of the Kinh but also of other ethnic groups in Vietnam.
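The abstract restricts SNPs and short indels to those consistent with Mendelian inheritance in the trio. The paper's exact filtering pipeline is not described here, so the following is only a minimal sketch of the underlying check, assuming unphased, biallelic, autosomal genotype calls encoded as allele pairs (0 = reference, 1 = alternate) and ignoring de novo mutations. The names `is_mendelian_consistent`, `mendelian_sites` and the toy `calls` data are hypothetical. A child's genotype passes if it can be formed from one allele of the father and one allele of the mother.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical encoding: an unphased biallelic genotype as two allele indices,
# where 0 = reference allele and 1 = alternate allele.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child's genotype can be formed from one paternal
    and one maternal allele (autosomal site, no de novo mutation assumed)."""
    target = sorted(child)
    # Try every pairing of one father allele with one mother allele.
    return any(sorted((p, m)) == target for p, m in product(father, mother))


def mendelian_sites(calls: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Keep only site IDs whose (child, father, mother) genotypes pass the check."""
    return [site for site, (c, f, m) in calls.items() if is_mendelian_consistent(c, f, m)]


# Toy trio calls (invented for illustration, not data from the study).
calls = {
    "site1": ((0, 1), (0, 0), (1, 1)),  # het child, one allele from each parent
    "site2": ((1, 1), (0, 0), (0, 1)),  # father carries no alt allele: violation
    "site3": ((0, 1), (0, 1), (0, 1)),  # all heterozygous: consistent
}
consistent = mendelian_sites(calls)
print(consistent)  # ['site1', 'site3']
```

A production filter would also need to handle multiallelic sites, missing calls, genotype quality thresholds and sex chromosomes, all of which this sketch deliberately ignores.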

Keywords

Genomic variant analysis; Vietnamese human genome; Whole genome sequencing data analysis

Supplementary material

ESM 1: 12038_2015_9501_MOESM1_ESM.pdf (PDF, 717 kb)

Copyright information

© Indian Academy of Sciences 2015

Authors and Affiliations

  • Dang Thanh Hai (1)
  • Nguyen Dai Thanh (1)
  • Pham Thi Minh Trang (1)
  • Le Si Quang (2)
  • Phan Thi Thu Hang (2)
  • Dang Cao Cuong (1)
  • Hoang Kim Phuc (1)
  • Nguyen Huu Duc (3)
  • Do Duc Dong (4)
  • Bui Quang Minh (5)
  • Pham Bao Son (1)
  • Le Sy Vinh (1, 4)

  1. University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam
  2. Wellcome Trust Center for Human Genetics, Oxford University, Oxford, UK
  3. High Performance Computing Center, Hanoi University of Science and Technology, Hanoi, Vietnam
  4. Information Technology Institute, Vietnam National University Hanoi, Hanoi, Vietnam
  5. Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
