Whole genome analysis of a Vietnamese trio
Here we present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to an average coverage of 30-fold. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that were consistent with Mendelian inheritance. Of these, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants, of which 27,604 (91.5%) were large indels. In the child genome, 6,681 large indels in the range 0.1–100 kbp were also confirmed in either the father or the mother genome. A comparison of these large indels against the Database of Genomic Variants (DGV) showed that 1,499 (22.44%) were KHV-specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥300 bp. Of the 235 contigs from the child genome, 199 (84.7%) significantly matched at least one contig from the father or mother genome. BLAST searches of these 199 contigs against other alternative human genome assemblies revealed 4 novel contigs. The novel variants identified in this study demonstrate the need for more genome-wide studies, not only of the Kinh but also of other ethnic groups in Vietnam.
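The Mendelian consistency requirement applied to the trio's SNPs and short indels can be illustrated with a minimal sketch. It assumes autosomal, diploid sites with called genotypes encoded as pairs of allele indices (as in the VCF GT field, e.g. (0, 1) for a heterozygous site). The function names, the tuple layout and the site identifiers are hypothetical, and the pipeline actually used in the study may have applied additional quality and depth filters.

```python
from typing import Iterable, List, Tuple

# Hypothetical encoding: a diploid genotype as a pair of allele indices,
# 0 = reference allele, 1 = first alternate allele, and so on.
Genotype = Tuple[int, int]
# Hypothetical site record: (site_id, child, father, mother).
TrioSite = Tuple[str, Genotype, Genotype, Genotype]


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child's genotype can be formed by inheriting one
    allele from the father and the other from the mother.

    Assumes an autosomal diploid site; sex chromosomes and de novo
    mutations are not handled in this sketch.
    """
    a, b = child
    # Try both assignments of the child's two alleles to the two parents.
    return (a in father and b in mother) or (b in father and a in mother)


def filter_trio_sites(sites: Iterable[TrioSite]) -> List[TrioSite]:
    """Keep only sites whose trio genotypes satisfy Mendelian inheritance."""
    return [site for site in sites if is_mendelian_consistent(*site[1:])]


if __name__ == "__main__":
    sites = [
        # Heterozygous child of a hom-ref father and a hom-alt mother: consistent.
        ("1:1000", (0, 1), (0, 0), (1, 1)),
        # Hom-alt child, but the hom-ref father cannot supply an alt allele.
        ("1:2000", (1, 1), (0, 0), (0, 1)),
    ]
    kept = filter_trio_sites(sites)
    print([site[0] for site in kept])  # ['1:1000']
```

Checking both allele-to-parent assignments keeps the test independent of genotype phasing, which is why unphased calls such as 0/1 can be used directly.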
Keywords: Genomic variant analysis; Vietnamese human genome; Whole genome sequencing data analysis
We would like to express our special thanks to Prof Nguyen Huu Duc from Vietnam National University, Hanoi, for his constant encouragement and support. We thank Prof Jean Daniel Zucker, Dr Zamin Iqbal and Prof Arndt von Haeseler for providing useful input on our manuscript. This project was partly supported financially by the Science and Technology Foundation of Vietnam National University, Hanoi (grant no. QKHCN.13.01). We would also like to thank the Center for Integrative Bioinformatics Vienna for providing computational resources. BQM acknowledges financial support from the Austrian Science Fund (FWF) (grant no. I760-B17).