Abstract
The emergence of next-generation sequencing (NGS) technologies offers an unprecedented opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and HiSeq 2000), and Applied Biosystems (SOLiD) can sequence individual genomes completely and to high coverage. NGS data are particularly advantageous for the study of structural variation (SV) because they offer the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base-pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We also describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.
Acknowledgments
We thank John Wallis for insightful discussions on structural variant analysis. We are also grateful for the support of the medical genomics, analysis pipeline, and technology development groups of the Genome Institute at Washington University in St. Louis.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Koboldt, D.C., Larson, D.E., Chen, K., Ding, L., Wilson, R.K. (2012). Massively Parallel Sequencing Approaches for Characterization of Structural Variation. In: Feuk, L. (eds) Genomic Structural Variants. Methods in Molecular Biology, vol 838. Springer, New York, NY. https://doi.org/10.1007/978-1-61779-507-7_18
DOI: https://doi.org/10.1007/978-1-61779-507-7_18
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-61779-506-0
Online ISBN: 978-1-61779-507-7
eBook Packages: Springer Protocols