
Structural variant identification and characterization

Abstract

Structural variant (SV) differences between human genomes can cause germline and mosaic disease and contribute to inter-individual variation. Deregulation of accurate DNA repair and genomic surveillance mechanisms results in large numbers of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Over the past decade, high-throughput genome-wide sequencing of human genomes has enabled large-scale SV analyses. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advances in sequencing and other mapping technologies, as well as in calling algorithms for detecting genomic rearrangements, have helped propel SV detection into population-scale studies and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences for human genomes.



Abbreviations

GRCh38: Genome Reference Consortium Human Build 38
HGR: human genome reference
HTS: high-throughput sequencing
WGS: whole-genome sequencing
SNV: single-nucleotide variant (1 bp)
InDel: insertion or deletion (1–49 bp)
SV: structural variant (≥ 50 bp)
DSB: double-strand break
CNV: copy number variant
bp: base pair
DEL: deletion
INS: insertion
MEI: mobile element insertion
DUP: duplication
TRP: triplication
INV: inversion
TRA: translocation
CGR: complex genomic rearrangement
LOH: loss of heterozygosity
SNP: single-nucleotide polymorphism
NHEJ: non-homologous end joining
NAHR: non-allelic homologous recombination
FoSTeS: fork stalling and template switching
MMBIR: microhomology-mediated break-induced replication
SSA: single-strand annealing
FISH: fluorescence in situ hybridization
aCGH: array comparative genomic hybridization
SCE: sister chromatid exchange
PacBio: Pacific Biosciences
CCS: circular consensus sequencing
CLR: continuous long read
ONT: Oxford Nanopore Technologies
SMRT: single-molecule real-time
BrdU: bromodeoxyuridine
GEM: gel-bead in emulsion
IGV: Integrative Genomics Viewer
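
The size thresholds given in the abbreviation list (SNV, 1 bp; InDel, 1–49 bp; SV, ≥ 50 bp) can be illustrated with a minimal Python sketch. The function name `classify_variant` and the choice to measure size as the length difference between reference and alternate alleles are assumptions made for illustration only; they are not taken from the article, and balanced events such as inversions or translocations would need a different size measure (for example, the length of the affected interval).

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by size class (hypothetical helper).

    Assumption: size is the absolute length difference between the
    reference and alternate alleles. This ignores balanced SVs
    (inversions, translocations), which change no net length.
    """
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Equal-length alleles: a 1 bp substitution is an SNV;
        # anything longer is outside the three size classes here.
        if len(ref) == 1 and ref != alt:
            return "SNV"
        return "other"
    if size <= 49:
        return "InDel"  # 1-49 bp insertion or deletion
    return "SV"         # >= 50 bp


if __name__ == "__main__":
    print(classify_variant("A", "G"))             # 1 bp substitution
    print(classify_variant("A", "ATG"))           # 2 bp insertion
    print(classify_variant("A", "A" + "T" * 50))  # 50 bp insertion
```

In practice the boundary between InDels and SVs is a convention rather than a biological distinction, so a real pipeline would likely expose the 50 bp cutoff as a parameter.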


    CAS  Article  PubMed  Google Scholar 

  134. Smith SD, Kawash JK, Grigoriev A (2017) Lightning-fast genome variant detection with GROM. Gigascience 6:1–7. https://doi.org/10.1093/gigascience/gix091

    Article  PubMed  PubMed Central  Google Scholar 

  135. Stankiewicz P, Lupski JR (2010) Structural variation in the human genome and its role in disease. Annu Rev Med 61:437–455. https://doi.org/10.1146/annurev-med-100708-204735

    CAS  Article  PubMed  Google Scholar 

  136. Stephens PJ et al (2011) Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144:27–40. https://doi.org/10.1016/j.cell.2010.11.055

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  137. Sudmant PH et al (2015) An integrated map of structural variation in 2,504 human genomes. Nature 526:75–81. https://doi.org/10.1038/nature15394

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  138. Talevich E, Shain AH (2018) CNVkit-RNA: copy number inference from RNA-sequencing data. bioRxiv:408534. https://doi.org/10.1101/408534

  139. Tattini L, D'Aurizio R, Magi A (2015) Detection of genomic structural variants from next-generation sequencing data front. Bioeng Biotechnol 3:92. https://doi.org/10.3389/fbioe.2015.00092

    Article  Google Scholar 

  140. Teague B et al (2010) High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci U S A 107:10848–10853. https://doi.org/10.1073/pnas.0914638107

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  141. Therman E, Susman B, Denniston C (1989) The nonrandom participation of human acrocentric chromosomes in Robertsonian translocations. Ann Hum Genet 53:49–65. https://doi.org/10.1111/j.1469-1809.1989.tb01121.x

    CAS  Article  PubMed  Google Scholar 

  142. Tian S, Yan H, Klee EW, Kalmbach M, Slager SL (2018) Comparative analysis of de novo assemblers for variation discovery in personal genomes. Brief Bioinform 19:893–904. https://doi.org/10.1093/bib/bbx037

    CAS  Article  PubMed  Google Scholar 

  143. Trask BJ (2002) Human cytogenetics: 46 chromosomes, 46 years and counting. Nat Rev Genet 3:769–778. https://doi.org/10.1038/nrg905

    Article  PubMed  Google Scholar 

  144. Tsai Y-C et al (2017) Amplification-free, CRISPR-Cas9 targeted enrichment and SMRT sequencing of repeat-expansion disease causative genomic regions. bioRxiv:203919. https://doi.org/10.1101/203919

  145. Uhrig S, Fröhlich M, Hutter B, Brors B (2018) PO-400 Arriba—fast and accurate gene fusion detection from RNA-seq data. ESMO Open 3:A179–A179. https://doi.org/10.1136/esmoopen-2018-EACR25.426

  146. Wala JA et al (2018) SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res 28:581–591. https://doi.org/10.1101/gr.221028.117

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  147. Wang K et al (2007) PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17:1665–1674. https://doi.org/10.1101/gr.6861907

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  148. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63. https://doi.org/10.1038/nrg2484

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  149. Wang J et al (2011) CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods 8:652–654. https://doi.org/10.1038/nmeth.1628

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  150. Wang M, Beck CR, English AC, Meng Q, Buhay C, Han Y, Doddapaneni HV, Yu F, Boerwinkle E, Lupski JR, Muzny DM, Gibbs RA (2015) PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations. BMC Genomics 16:214. https://doi.org/10.1186/s12864-015-1370-2

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  151. Weckselblatt B, Rudd MK (2015) Human structural variation: mechanisms of chromosome rearrangements. Trends Genet 31:587–599. https://doi.org/10.1016/j.tig.2015.05.010

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  152. Wenger AM et al (2019) Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37:1155–1162. https://doi.org/10.1038/s41587-019-0217-9

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  153. Willis NA, Chandramouly G, Huang B, Kwok A, Follonier C, Deng C, Scully R (2014) BRCA1 controls homologous recombination at Tus/Ter-stalled mammalian replication forks. Nature 510:556–559. https://doi.org/10.1038/nature13295

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  154. Wiszniewska J et al (2014) Combined array CGH plus SNP genome analyses in a single assay for optimized clinical testing. Eur J Hum Genet 22:79–87. https://doi.org/10.1038/ejhg.2013.77

    CAS  Article  PubMed  Google Scholar 

  155. Ye K, Guo L, Yang X, Lamijer EW, Raine K, Ning Z (2018) Split-read indel and structural variant calling using PINDEL methods. Mol Biol 1833:95–105. https://doi.org/10.1007/978-1-4939-8666-8_7

    CAS  Article  Google Scholar 

  156. Zarate S et al (2018) Parliament2: fast structural variant calling using optimized combinations of callers. bioRxiv:424267. https://doi.org/10.1101/424267

  157. Zhang F, Carvalho CM, Lupski JR (2009a) Complex human chromosomal and genomic rearrangements. Trends Genet 25:298–307. https://doi.org/10.1016/j.tig.2009.05.005

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  158. Zhang F, Khajavi M, Connolly AM, Towne CF, Batish SD, Lupski JR (2009b) The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nat Genet 41:849–853. https://doi.org/10.1038/ng.399

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  159. Zheng GX et al (2016) Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat Biotechnol 34:303–311. https://doi.org/10.1038/nbt.3432

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  160. Zimin AV, Marcais G, Puiu D, Roberts M, Salzberg SL, Yorke JA (2013) The MaSuRCA genome assembler. Bioinformatics 29:2669–2677. https://doi.org/10.1093/bioinformatics/btt476

    CAS  Article  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

We thank the members of the Beck lab for reading and editing the review, in particular Alex V. Nesta. This work was supported in part by the National Institute of General Medical Sciences grants R00GM120453 and R35GM133600 and startup funds from the University of Connecticut Health and the Jackson Laboratory to Christine R. Beck.

Author information

Corresponding author

Correspondence to Christine R. Beck.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible Editor: Beth Sullivan

About this article

Cite this article

Balachandran, P., Beck, C.R. Structural variant identification and characterization. Chromosome Res 28, 31–47 (2020). https://doi.org/10.1007/s10577-019-09623-z

Keywords

  • Structural variant
  • High-throughput sequencing
  • DNA repair
  • Transposon
  • Cancer
  • Bioinformatic approaches