
Use of RAPTR-SV to Identify SVs from Read Pairing and Split Read Signatures

  • Derek M. Bickhart
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1833)

Abstract

High-throughput short-read sequencing remains the leading cost-effective means of assessing variation in individual samples. However, while these technologies readily detect single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels), detecting large copy number variants (CNVs) with them is prone to numerous false positives. CNV detection tools that integrate multiple variant signals and exclude genomic regions of systematic bias tend to reduce the probability of false positive calls, and therefore represent the best means of ascertaining true CNV regions. To this end, we provide instructions and details on the use of the RAPTR-SV CNV detection pipeline, a tool that combines read-pair and split-read signals to identify high-confidence CNV regions in a sequenced sample. Because it draws on two different structural variant (SV) signals during variant calling, RAPTR-SV makes it easy to filter artifactual CNV calls out of large datasets.
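To illustrate why requiring two independent SV signals helps filter artifacts, the following is a minimal sketch of combined read-pair and split-read deletion calling. It is not RAPTR-SV's implementation: the record types (`ReadPair`, `SplitRead`, `DeletionCall`), the `mean + k*sd` insert-size cutoff, the greedy clustering, and the `min_pairs`/`min_splits` thresholds are all hypothetical choices made for illustration, and alignments are represented as plain records rather than parsed from BAM files.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import List


@dataclass
class ReadPair:
    chrom: str
    left_end: int      # last aligned base of the left (forward) mate
    right_start: int   # first aligned base of the right (reverse) mate
    insert_size: int


@dataclass
class SplitRead:
    chrom: str
    break_left: int    # last aligned base before the breakpoint
    break_right: int   # first aligned base after the breakpoint


@dataclass
class DeletionCall:
    chrom: str
    start: int         # flanking base left of the deletion (from split reads)
    end: int           # flanking base right of the deletion (from split reads)
    pair_support: int
    split_support: int


def discordant(pairs: List[ReadPair], mean: float, sd: float, k: float = 3.0) -> List[ReadPair]:
    """Hypothetical rule: pairs with insert size > mean + k*sd carry a deletion signature."""
    cutoff = mean + k * sd
    return [p for p in pairs if p.insert_size > cutoff]


def cluster_pairs(pairs: List[ReadPair]) -> List[List[ReadPair]]:
    """Greedily group pairs whose between-mate intervals share a common overlap."""
    clusters: List[List[ReadPair]] = []
    for p in sorted(pairs, key=lambda x: (x.chrom, x.left_end)):
        last = clusters[-1] if clusters else None
        if last and last[0].chrom == p.chrom and p.left_end < min(q.right_start for q in last):
            last.append(p)
        else:
            clusters.append([p])
    return clusters


def call_deletions(pairs, splits, mean, sd, min_pairs=2, min_splits=1, k=3.0):
    """Report a deletion only when read-pair AND split-read evidence agree."""
    calls = []
    for cl in cluster_pairs(discordant(pairs, mean, sd, k)):
        chrom = cl[0].chrom
        lo = max(p.left_end for p in cl)      # window where the deletion must lie
        hi = min(p.right_start for p in cl)
        support = [s for s in splits
                   if s.chrom == chrom and lo <= s.break_left and s.break_right <= hi]
        if len(cl) >= min_pairs and len(support) >= min_splits:
            # Split reads refine the breakpoints to base-pair resolution.
            start = median_low([s.break_left for s in support])
            end = median_low([s.break_right for s in support])
            calls.append(DeletionCall(chrom, start, end, len(cl), len(support)))
    return calls


# Usage: the chr2 cluster has read-pair evidence only, so it is filtered out.
pairs = [
    ReadPair("chr1", 1000, 2600, 1900),
    ReadPair("chr1", 1050, 2650, 1900),
    ReadPair("chr1", 1100, 2700, 1900),
    ReadPair("chr1", 5000, 5300, 400),   # concordant insert size, ignored
    ReadPair("chr2", 8000, 9500, 1700),
    ReadPair("chr2", 8020, 9520, 1700),
]
splits = [SplitRead("chr1", 1200, 2200)]
calls = call_deletions(pairs, splits, mean=400, sd=50)
print(calls)
```

With these toy records, only the chr1 deletion is reported, because it is the only cluster supported by both signal types. A discordant-pair cluster alone (as on chr2) can arise from mapping artifacts in repetitive or otherwise biased regions, which is the kind of false positive that combined detection is meant to suppress.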

Key words

Read pair, Split-read, Combined detection, RAPTR-SV, Whole genome sequencing


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Research Microbiologist/Bioinformatician, USDA ARS DFRC, Madison, USA
