Whole genome analysis of a Vietnamese trio

Journal of Biosciences

ABSTRACT

We present the first whole-genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that were consistent with Mendelian inheritance. Of these, 109,914 SNPs (2.3%) and 59,119 short indels (7.1%) were novel. We also detected 30,171 structural variants, of which 27,604 (91.5%) were large indels. In the child genome, 6,681 large indels in the range 0.1–100 kbp were also confirmed in the genome of the father or the mother. Comparing these large indels against the DGV database showed that 1,499 (22.44%) were KHV-specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥300 bp. Of the 235 contigs from the child genome, 199 (84.7%) significantly matched at least one contig from the father or mother genome. A BLAST search of these 199 contigs against alternative human genomes revealed 4 novel contigs. The novel variants identified in our study demonstrate the need for more genome-wide studies, not only of the Kinh but also of other ethnic groups in Vietnam.
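The Mendelian-consistency criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes autosomal, diploid, fully called genotypes represented as pairs of allele indices (0 for the reference allele), and the function names and the dictionary layout of a variant record are hypothetical, chosen only for illustration.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the father and one from the mother.

    Genotypes are pairs of allele indices, e.g. (0, 1) for a heterozygote.
    Assumes autosomal diploid sites with no missing calls.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f_allele, m_allele))) == child_sorted
        for f_allele, m_allele in product(father, mother)
    )


def filter_trio_variants(variants):
    """Keep only the variant records whose trio genotypes are consistent.

    Each record is a dict with hypothetical keys 'child', 'father', 'mother'.
    """
    return [
        v for v in variants
        if is_mendelian_consistent(v["child"], v["father"], v["mother"])
    ]


if __name__ == "__main__":
    calls = [
        {"id": "site_a", "child": (0, 1), "father": (0, 0), "mother": (1, 1)},
        {"id": "site_b", "child": (1, 1), "father": (0, 0), "mother": (0, 1)},
    ]
    kept = filter_trio_variants(calls)
    print([v["id"] for v in kept])
```

In this toy input, site_b is discarded because a homozygous-reference father cannot transmit the alternate allele. A real analysis would also have to handle sex chromosomes, missing genotypes, and genuine de novo mutations, which this sketch ignores.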

Acknowledgements

We would like to express our special thanks to Prof Nguyen Huu Duc from Vietnam National University, Hanoi, for his constant encouragement and support. We thank Prof Jean Daniel Zucker, Dr Zamin Iqbal and Prof Arndt von Haeseler for their useful input on our manuscript. This project was partly supported financially by the Science and Technology Foundation of Vietnam National University, Hanoi (grant no. QKHCN.13.01). We would also like to thank the Center for Integrative Bioinformatics Vienna for providing computational resources. BQM acknowledges financial support from the Austrian Science Fund - FWF (grant no. I760-B17).

Author information

Corresponding authors

Correspondence to Le Si Quang or Le Sy Vinh.

Additional information

Corresponding editor: PARTHA P MAJUMDER

[Hai DT, Thanh ND, Trang PTM, Quang LS, Hang PTT, Cuong DC, Phuc HK, Duc NH, Dong DD, Minh BQ, Son PB and Vinh LS 2015 Whole genome analysis of a Vietnamese trio. J. Biosci. 40 1–12] DOI 10.1007/s12038-015-9501-0

Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/mar2015/supp/Hai.pdf

About this article

Cite this article

Hai, D.T., Thanh, N.D., Trang, P.T.M. et al. Whole genome analysis of a Vietnamese trio. J Biosci 40, 113–124 (2015). https://doi.org/10.1007/s12038-015-9501-0

  • DOI: https://doi.org/10.1007/s12038-015-9501-0
