
SVEM: A Structural Variant Estimation Method Using Multi-mapped Reads on Breakpoints

  • Tomohiko Ohtsuki
  • Naoki Nariai
  • Kaname Kojima
  • Takahiro Mimori
  • Yukuto Sato
  • Yosuke Kawai
  • Yumi Yamaguchi-Kabata
  • Tetsuo Shibuya
  • Masao Nagasaki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8542)

Abstract

Recent developments in next-generation sequencing (NGS) technologies have led to the identification of structural variants (SVs) of genomic DNA in the human population. Several SV detection methods that use NGS data have been proposed. However, the analysis of NGS data remains difficult, particularly when handling reads from duplicated loci or low-complexity sequences of the human genome. In this paper, we propose SVEM, a novel statistical method that detects SVs at single-nucleotide resolution and can utilize multi-mapped reads on breakpoints. SVEM uses the expectation-maximization (EM) algorithm, treating the number of reads on breakpoints as parameters and the mapping states of reads as latent variables. This framework enables us to handle ambiguous read mappings without discarding information useful for SV detection. We applied SVEM to simulated and real data, and it outperformed existing methods in terms of precision and recall.
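To make the EM idea in the abstract concrete, the following is a minimal Python sketch of a generic EM that distributes multi-mapped reads among candidate loci (for example, a breakpoint junction sequence and a paralogous reference locus). It assumes each read's alignment likelihood at every candidate locus is already known; the function name `em_read_counts`, the locus labels, and the likelihood values are hypothetical, and SVEM's actual likelihood model and parameterization are not described here. The mixing proportions play the role of the parameters, and the per-read posterior over loci plays the role of the latent mapping state.

```python
from typing import Dict, List


def em_read_counts(
    read_likelihoods: List[Dict[str, float]],
    max_iter: int = 500,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Expected number of reads per candidate locus, estimated by EM.

    Hypothetical sketch: read_likelihoods[i] maps each locus that read i
    aligns to (a reference locus or a breakpoint junction) to
    P(read i | read originates at that locus).
    """
    loci = sorted({locus for read in read_likelihoods for locus in read})
    if not loci:
        return {}
    # Parameters: fraction of reads originating at each locus (uniform start).
    theta = {k: 1.0 / len(loci) for k in loci}
    counts = {k: 0.0 for k in loci}

    for _ in range(max_iter):
        # E-step: posterior probability of each mapping state (latent variable),
        # accumulated into expected read counts per locus.
        counts = {k: 0.0 for k in loci}
        for read in read_likelihoods:
            weights = {k: theta[k] * lik for k, lik in read.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read has zero likelihood everywhere; ignore it
            for k, w in weights.items():
                counts[k] += w / total
        # M-step: new proportions are the normalized expected read counts.
        assigned = sum(counts.values())
        if assigned == 0.0:
            break
        new_theta = {k: counts[k] / assigned for k in loci}
        change = max(abs(new_theta[k] - theta[k]) for k in loci)
        theta = new_theta
        if change < tol:
            break

    return counts


# Hypothetical toy data: two reads map uniquely to a junction, one uniquely to
# a paralog, and one maps equally well to both.
reads = [
    {"junction": 1.0},
    {"junction": 1.0},
    {"junction": 1.0, "paralog": 1.0},  # multi-mapped read
    {"paralog": 1.0},
]
counts = em_read_counts(reads)
# counts["junction"] is approximately 8/3, counts["paralog"] approximately 4/3
```

Discarding the multi-mapped read would leave 2 reads at the junction and 1 at the paralog; the EM instead assigns about two thirds of the ambiguous read to the junction, because the uniquely mapped reads raise that locus's estimated proportion. This illustrates, under the stated assumptions, how ambiguous mappings can contribute evidence rather than being thrown away.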

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tomohiko Ohtsuki (1)
  • Naoki Nariai (2)
  • Kaname Kojima (2)
  • Takahiro Mimori (2)
  • Yukuto Sato (2)
  • Yosuke Kawai (2)
  • Yumi Yamaguchi-Kabata (2)
  • Tetsuo Shibuya (1)
  • Masao Nagasaki (2)

  1. Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
  2. Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
