Massively Parallel Sequencing Approaches for Characterization of Structural Variation

  • Daniel C. Koboldt
  • David E. Larson
  • Ken Chen
  • Li Ding
  • Richard K. Wilson
Part of the Methods in Molecular Biology book series (MIMB, volume 838)


The emergence of next-generation sequencing (NGS) technologies offers an unprecedented opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and HiSeq 2000), and Applied Biosystems (SOLiD) can sequence individual genomes completely and to high levels of coverage. NGS data are particularly advantageous for the study of structural variation (SV) because they offer the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base-pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We also describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.

Key words

Next-generation sequencing; Paired-end sequencing; 454; Illumina; Solexa; ABI SOLiD; Insertions; Deletions; Duplications; Inversions; Translocations; Indels; Copy number variants



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Daniel C. Koboldt (1)
  • David E. Larson (1)
  • Ken Chen (1)
  • Li Ding (1)
  • Richard K. Wilson (1)

  1. The Genome Institute at Washington University School of Medicine, St. Louis, USA
