NGS-Indel Coder v2.0: A Streamlined Pipeline to Code Indel Characters in Phylogenomic Data

  • Protocol
Plant Comparative Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2512)

Abstract

Hypothesized evolutionary insertions and deletions (indels) in nucleic acid sequences contain significant phylogenetic information and can be integrated into phylogenomic analyses. However, assemblies of short reads obtained from next-generation sequencing (NGS) technologies can contain errors that result in falsely inferred indels, which must be detected and excluded from phylogenetic analysis. Here, we detail the commands that comprise a new version of the NGS-Indel Coder pipeline, which was developed to validate indels using assembly read depth.
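
To illustrate the general idea of validating an indel by read depth, the following is a minimal sketch and not the pipeline's actual commands, which are not shown in this preview. It assumes a per-position read depth list for one assembled sequence is already available, and it accepts an indel only when the bases flanking it are well covered. The function name `indel_is_supported`, the flank width, and the depth threshold are all hypothetical choices made for the example.

```python
from typing import Sequence


def indel_is_supported(
    depth: Sequence[int],
    left: int,
    right: int,
    flank: int = 5,       # hypothetical flank width, in bases
    min_depth: int = 10,  # hypothetical minimum read depth
) -> bool:
    """Return True if both flanks of a putative indel are well covered.

    depth -- read depth at each position of the assembled sequence
    left  -- index of the last base before the indel
    right -- index of the first base after the indel
    """
    # Bases immediately upstream of the indel, ending at `left`.
    left_window = depth[max(0, left - flank + 1): left + 1]
    # Bases immediately downstream of the indel, starting at `right`.
    right_window = depth[right: right + flank]

    # An indel at the very edge of the sequence has no flank to check.
    if not left_window or not right_window:
        return False

    # Require every flanking base to reach the minimum depth.
    return min(left_window) >= min_depth and min(right_window) >= min_depth


# Toy depth profile: good coverage with a short low-depth stretch.
toy_depth = [30] * 10 + [2] * 3 + [30] * 10
supported = indel_is_supported(toy_depth, left=9, right=13)
unsupported = indel_is_supported(toy_depth, left=10, right=12)
```

In this toy profile, an indel whose flanks lie entirely in the well-covered regions is accepted, while one whose flanks overlap the low-depth stretch is rejected. Requiring the minimum over each window, rather than the mean, is a deliberately conservative choice for the sketch.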

Acknowledgments

This work was supported by NSF DEB awards 1457510/1457473 to MF and SCKS.

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Boutte, J., Fishbein, M., Straub, S.C.K. (2022). NGS-Indel Coder v2.0: A Streamlined Pipeline to Code Indel Characters in Phylogenomic Data. In: Pereira-Santana, A., Gamboa-Tuz, S.D., Rodríguez-Zapata, L.C. (eds) Plant Comparative Genomics. Methods in Molecular Biology, vol 2512. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2429-6_4

  • DOI: https://doi.org/10.1007/978-1-0716-2429-6_4

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2428-9

  • Online ISBN: 978-1-0716-2429-6

  • eBook Packages: Springer Protocols
