Validation of Genomic Structural Variants Through Long Sequencing Technologies

  • Xuefang Zhao
Part of the Methods in Molecular Biology book series (MIMB, volume 1833)


Although numerous algorithms have been developed to identify large chromosomal rearrangements (i.e., genomic structural variants, SVs), there remains a dearth of approaches to evaluate their results. This matters because accurate SV identification is still an open problem: no single algorithm has been shown to achieve high sensitivity and specificity across different classes of SVs. The method introduced in this chapter, VaPoR, is specifically designed to evaluate the accuracy of SV predictions using third-generation long-read sequences. It uses a recurrence approach and collects direct evidence from raw reads, thus avoiding computationally costly whole-genome assembly. This chapter describes in detail how to apply the tool to different data types.
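
The recurrence idea can be illustrated with a dot-plot style comparison of a read against two candidate sequences: the unaltered reference and the reference rearranged as the SV call predicts. The following is a minimal sketch under stated assumptions, not VaPoR's implementation. The function names, the k-mer size, the toy deletion, and the scoring by mean distance from the diagonal are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def recurrence_points(seq_a, seq_b, k=10):
    """Return (i, j) pairs where the k-mer starting at seq_a[i] equals the one at seq_b[j]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    points = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            points.append((i, j))
    return points


def mean_diagonal_offset(points):
    """Average |i - j| over matched k-mers; smaller values mean the two sequences are more collinear."""
    if not points:
        return float("inf")
    return sum(abs(i - j) for i, j in points) / len(points)


def apply_deletion(ref, start, end):
    """Build the altered reference predicted by a deletion call covering ref[start:end]."""
    return ref[:start] + ref[end:]


# Toy data (hypothetical): a 200 bp reference and a read that carries a 40 bp deletion.
rng = random.Random(0)
ref = "".join(rng.choice("ACGT") for _ in range(200))
alt = apply_deletion(ref, 80, 120)
read = alt  # an error-free read spanning the deleted allele

ref_score = mean_diagonal_offset(recurrence_points(read, ref))
alt_score = mean_diagonal_offset(recurrence_points(read, alt))

# A lower score against the altered reference is read-level evidence for the call.
print("offset vs reference:", ref_score)
print("offset vs altered reference:", alt_score)
```

In this toy setting the read aligns along the diagonal of the altered reference but is shifted off the diagonal of the original reference after the breakpoint, so comparing the two scores separates a supported call from an unsupported one. Real long reads contain sequencing errors, so a practical version would need shorter or error-tolerant k-mers and a score aggregated over many reads.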

Key words

VaPoR, Long-read sequencing, Structural variants



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Center for Genomic Medicine at Massachusetts General Hospital, Boston, USA
