Abstract
Hypothesized evolutionary insertions and deletions (indels) in nucleic acid sequences contain significant phylogenetic information and can be integrated into phylogenomic analyses. However, assemblies of short reads obtained from next-generation sequencing (NGS) technologies can contain errors that produce falsely inferred indels. These must be detected and excluded before phylogenetic analysis. Here, we detail the commands that make up a new version of the NGS-Indel Coder pipeline, which was developed to validate indels using the read depth of the assemblies.
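The pipeline's own commands are not reproduced in this excerpt. The following is only a minimal sketch of the general idea of read-depth validation. It assumes per-position coverage in the three-column text format written by `samtools depth`: reference name, 1-based position, and depth. The hypothetical helpers `parse_depth` and `indel_supported` treat an indel as supported only when every position it spans, plus a few flanking bases, reaches a minimum depth. The thresholds `min_depth` and `flank`, and the choice to check coordinates on the contig that carries the indel, are illustrative assumptions. They are not NGS-Indel Coder's actual criteria.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def parse_depth(lines: Iterable[str]) -> Dict[str, Dict[int, int]]:
    """Parse samtools-depth-style text: contig <TAB> 1-based position <TAB> depth."""
    depths: Dict[str, Dict[int, int]] = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        contig, pos, depth = row[0], int(row[1]), int(row[2])
        depths[contig][pos] = depth
    return depths


def indel_supported(
    depths: Dict[str, Dict[int, int]],
    contig: str,
    start: int,
    end: int,
    min_depth: int = 5,  # hypothetical threshold
    flank: int = 3,      # hypothetical number of flanking bases to check
) -> bool:
    """Return True if every position in [start - flank, end + flank] has depth >= min_depth.

    Positions missing from the table count as depth 0, because samtools depth
    omits zero-coverage positions unless it is run with -a.
    """
    cov = depths.get(contig, {})
    window = range(max(1, start - flank), end + flank + 1)
    return all(cov.get(p, 0) >= min_depth for p in window)
```

For example, if contig `c1` has depth 8 at positions 1 to 10 except for depth 2 at position 6, an indel at positions 3 to 4 passes with `flank=1`. An indel at position 5 fails because its window reaches the poorly covered position 6. Failing indels would be the candidates to omit from the indel matrix.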
Acknowledgments
This work was supported by NSF DEB awards 1457510/1457473 to MF and SCKS.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Boutte, J., Fishbein, M., Straub, S.C.K. (2022). NGS-Indel Coder v2.0: A Streamlined Pipeline to Code Indel Characters in Phylogenomic Data. In: Pereira-Santana, A., Gamboa-Tuz, S.D., Rodríguez-Zapata, L.C. (eds) Plant Comparative Genomics. Methods in Molecular Biology, vol 2512. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2429-6_4
DOI: https://doi.org/10.1007/978-1-0716-2429-6_4
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2428-9
Online ISBN: 978-1-0716-2429-6
eBook Packages: Springer Protocols