ABSTRACT
We present the first whole-genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that were consistent with Mendelian inheritance. Of these, 109,914 SNPs (2.3%) and 59,119 short indels (7.1%) were novel. We also detected 30,171 structural variants, of which 27,604 (91.5%) were large indels. The child genome contained 6,681 large indels in the range 0.1–100 kbp that were also confirmed in the father or mother genome. Comparing these large indels against the DGV database, we found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥300 bp. Of the 235 contigs from the child genome, 199 (84.7%) significantly matched at least one contig from the father or mother genome. BLAST searches of these 199 contigs against other human genomes revealed 4 novel contigs. The novel variants identified in this study demonstrate the need for more genome-wide studies, not only of the Kinh but also of other ethnic groups in Vietnam.
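The Mendelian consistency criterion mentioned above can be illustrated with a minimal sketch. The abstract does not describe how the check was implemented, so the function name `is_mendelian_consistent`, the tuple representation of genotypes, and the restriction to diploid autosomal sites are all assumptions made for illustration only.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's unphased genotype can be formed by
    taking one allele from the father and one from the mother.

    Each genotype is a 2-tuple of alleles, e.g. ("A", "G").
    Hypothetical helper: assumes diploid autosomal sites only.
    """
    child_sorted = sorted(child)
    # Try every combination of one paternal and one maternal allele.
    return any(
        sorted((f, m)) == child_sorted
        for f, m in product(father, mother)
    )


# Child A/G from parents A/A and G/G is consistent.
print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
# Child G/G cannot arise when the father is A/A.
print(is_mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G")))
```

A check of this kind rejects any site where the child carries an allele absent from both parents, so genuine de novo mutations would be excluded along with genotyping errors.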
Acknowledgements
We would like to express our special thanks to Prof Nguyen Huu Duc from Vietnam National University, Hanoi, for his constant encouragement and support. We thank Prof Jean Daniel Zucker, Dr Zamin Iqbal and Prof Arndt von Haeseler for their useful input on our manuscript. This project was supported in part by the Science and Technology Foundation of Vietnam National University, Hanoi (grant no. QKHCN.13.01). We also thank the Center for Integrative Bioinformatics Vienna for providing computational resources. BQM acknowledges financial support from the Austrian Science Fund - FWF (grant no. I760-B17).
Additional information
Corresponding editor: PARTHA P MAJUMDER
[Hai DT, Thanh ND, Trang PTM, Quang LS, Hang PTT, Cuong DC, Phuc HK, Duc NH, Dong DD, Minh BQ, Son PB and Vinh LS 2015 Whole genome analysis of a Vietnamese trio. J. Biosci. 40 1–12] DOI 10.1007/s12038-015-9501-0
Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/mar2015/supp/Hai.pdf
Cite this article
Hai, D.T., Thanh, N.D., Trang, P.T.M. et al. Whole genome analysis of a Vietnamese trio. J Biosci 40, 113–124 (2015). https://doi.org/10.1007/s12038-015-9501-0
DOI: https://doi.org/10.1007/s12038-015-9501-0