Detecting Structural Variants and Associated Gene Presence–Absence Variation Phenomena in the Genomes of Marine Organisms

  • Protocol
Marine Genomics

Abstract

As complete genomes become easier to obtain, even from previously difficult-to-sequence species, and as genomic resequencing becomes more routine, it is becoming clear that genomic structural variation is more widespread than originally thought and plays an important role in maintaining genetic variation in populations. Structural variants (SVs) and associated gene presence–absence variation (PAV) can be important players in local adaptation and take part in other evolutionarily relevant phenomena. While recent studies have highlighted the importance of structural variation in Mollusca, the prevalence of this phenomenon in the broader context of marine organisms remains to be fully investigated.

Here, we describe a straightforward and broadly applicable method for the identification of SVs in fully assembled diploid genomes, leveraging the same reads used for assembly. We also explain a gene PAV analysis protocol, which could be broadly applied to any species with a fully sequenced reference genome available. Although the strength of these approaches has been tested and proven in marine invertebrates, which tend to have high levels of heterozygosity, possibly due to their lifestyle traits, they are also applicable to other species across the tree of life, providing a ready means to begin investigations into this potentially widespread phenomenon.
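
The abstract does not reproduce the protocol itself, so the following is only a minimal sketch of the general idea behind read-depth-based PAV calling, not the chapter's procedure. It assumes that a mean read depth per gene has already been computed from reads mapped back to the assembly, and that the genome-wide median approximates the depth expected for a gene present in two copies. The function name `classify_by_depth`, the two cut-offs, and the example values are hypothetical and chosen purely for illustration.

```python
from statistics import median

# Hypothetical cut-offs on depth normalised to the diploid expectation (1.0).
# They are illustrative assumptions, not values taken from the chapter.
ABSENT_MAX = 0.25       # ratio below this: gene treated as absent
HEMIZYGOUS_MAX = 0.75   # ratio below this (but >= ABSENT_MAX): one copy

def classify_by_depth(mean_depth, diploid_depth=None):
    """Label each gene 'absent', 'hemizygous' or 'present' from its read depth.

    mean_depth    -- dict mapping gene ID to mean read depth over that gene
    diploid_depth -- depth expected for a two-copy gene; defaults to the
                     median of the supplied depths (an assumption)
    """
    if diploid_depth is None:
        diploid_depth = median(mean_depth.values())
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")

    calls = {}
    for gene, depth in mean_depth.items():
        ratio = depth / diploid_depth
        if ratio < ABSENT_MAX:
            calls[gene] = "absent"
        elif ratio < HEMIZYGOUS_MAX:
            calls[gene] = "hemizygous"
        else:
            calls[gene] = "present"
    return calls

# Made-up depths for five genes in a single individual.
example = {"geneA": 60.0, "geneB": 31.0, "geneC": 2.0, "geneD": 58.0, "geneE": 62.0}
calls = classify_by_depth(example)
for gene, state in calls.items():
    print(gene, state)
```

In practice the cut-offs would need tuning to the sequencing depth and heterozygosity of the species, and repetitive or contaminated regions would need to be handled before any such call is trusted.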

Author information

Corresponding author

Correspondence to Marco Gerdol.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Sollitto, M., Kenny, N.J., Greco, S., Tucci, C.F., Calcino, A.D., Gerdol, M. (2022). Detecting Structural Variants and Associated Gene Presence–Absence Variation Phenomena in the Genomes of Marine Organisms. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_4

  • DOI: https://doi.org/10.1007/978-1-0716-2313-8_4

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2312-1

  • Online ISBN: 978-1-0716-2313-8

  • eBook Packages: Springer Protocols
