Abstract
Patterson’s D-statistic and its five-taxon derivatives are important phylogenetic methods for quantifying reticulated evolution, yet their application is limited by the lack of a single, comprehensive program that efficiently performs all necessary calculations from the file formats of common phylogenetic and population genetic programs. To increase accessibility for a broad range of researchers, we present a user-friendly program (COMP-D) that provides flexibility for incorporating heterozygous sites, implements multiple statistical methods, and aggregates results from multiple tests. Program augmentations also facilitate the detection of population-level introgression. COMP-D provides a threefold increase in speed relative to comparable software. It is implemented in C++ and released under the GNU General Public License v3.0. Source code is available for Linux/Mac OS X from: https://github.com/stevemussmann/Comp-D_MPI.
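For readers unfamiliar with the statistic, the four-taxon calculation can be sketched as follows. This is a minimal illustration in Python, not the COMP-D implementation (which is written in C++). It assumes one fixed allele per taxon at each site, the conventional definition D = (ABBA − BABA) / (ABBA + BABA), and a one-degree-of-freedom chi-square test on the two counts. The function names and the toy sequences are hypothetical.

```python
import math


def count_patterns(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites across four aligned sequences.

    Assumption: each taxon is represented by a single fixed allele per site.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # only biallelic sites are informative
        if a == o and b == c:
            abba += 1  # P2 and P3 share the non-outgroup allele
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the non-outgroup allele
    return abba, baba


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    return (abba - baba) / (abba + baba)


def chi_square_p(abba, baba):
    """P-value for the null ABBA == BABA (chi-square test, 1 d.f.)."""
    chi2 = (abba - baba) ** 2 / (abba + baba)
    return math.erfc(math.sqrt(chi2 / 2.0))


# Toy data (hypothetical): five aligned sites for P1, P2, P3 and the outgroup.
abba, baba = count_patterns("ACATG", "GTACA", "GCACA", "ATATG")
d_value = d_statistic(abba, baba)
p_value = chi_square_p(abba, baba)
```

A positive D indicates an excess of allele sharing between P2 and P3, a negative D an excess between P1 and P3. In practice, significance is often assessed with a Z-score from resampling across loci rather than from a single chi-square test.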
Acknowledgements
The Arkansas High Performance Computing Center (AHPCC) provided technical assistance and computational resources. Tyler K. Chafin and Bradley T. Martin promoted software development by testing an early version of the program. This research was conducted in partial fulfillment of the Ph.D. degree in Biological Sciences at the University of Arkansas (SMM). It was supported by generous University of Arkansas endowments: the Bruker Professorship in Life Sciences (MRD), the 21st Century Chair in Global Change Biology (MED), and a Doctoral Academy Fellowship (SMM). Three anonymous reviewers provided comments that greatly improved the manuscript.
Ethics declarations
Conflict of interest
The authors have nothing to disclose.
Electronic supplementary material
Below is the link to the electronic supplementary material.
12686_2019_1087_MOESM1_ESM.xlsx
Supplementary Table 1. Results of four-taxon D-statistic tests comparing methods in COMP-D for handling heterozygous loci versus those from pyRAD. Each column shows the number of statistically significant tests (α = 0.001) in each treatment. COMP-D offers two methods of assessing statistical significance (Z-scores and chi-square tests), whereas pyRAD offers only Z-scores. Two treatments (HetRand and HetFreq) considered all heterozygous loci in D-statistic calculations, but differed by either randomly picking an allele to represent an individual (HetRand) or using SNP frequency calculations (HetFreq). The HetIgnore method removed all heterozygous loci from calculations. All tests employed SNP data for catostomid fishes of western North America. Taxonomic abbreviations (P1, P2, P3, and O columns) are as follows: BBS = Bonneville Bluehead Sucker, BLS = Bridgelip Sucker, FMS = Flannelmouth Sucker, LNS = Longnose Sucker, MTS = Mountain Sucker, RBS = Razorback Sucker, SOS = Sonora Sucker, THS = Tahoe Sucker, WTS = White Sucker. Abbreviations in parentheses next to species abbreviations represent different populations: BB = Bonneville Basin, CB = Columbia River Basin, GC = Grand Canyon of the Colorado River, LB = Lahontan Basin, LC = Little Colorado River, UC = Upper Colorado River Basin, VR = Virgin River, wen = Wenima Wildlife Area of the Little Colorado River. (XLSX 11 KB)
12686_2019_1087_MOESM2_ESM.xlsx
Supplementary Table 2. The number of biallelic loci recovered using heterozygous loci (Het. Included) versus only fixed loci (Het. Excluded). Mean number of loci (Avg. Loci) and standard deviation (StDev) are presented for each. The % decrease is the percentage of loci lost when only fixed differences among taxa are considered. All tests employed data for catostomid fishes of western North America. Taxonomic abbreviations (P1, P2, P3, and O columns) are: BBS = Bonneville Bluehead Sucker, BLS = Bridgelip Sucker, FMS = Flannelmouth Sucker, LNS = Longnose Sucker, MTS = Mountain Sucker, RBS = Razorback Sucker, SOS = Sonora Sucker, THS = Tahoe Sucker, WTS = White Sucker. Abbreviations in parentheses next to species abbreviations represent different populations: BB = Bonneville Basin, CB = Columbia River Basin, GC = Grand Canyon of the Colorado River, LB = Lahontan Basin, LC = Little Colorado River, UC = Upper Colorado River Basin, VR = Virgin River, wen = Wenima Wildlife Area of the Little Colorado River. (XLSX 11 KB)
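The three heterozygote treatments named in Supplementary Table 1 (HetIgnore, HetRand, HetFreq) can be illustrated with a short Python sketch. It is not the COMP-D code. It assumes one diploid individual per taxon, genotypes coded as counts (0, 1, 2) of an arbitrarily labelled "B" allele, and the frequency-based ABBA/BABA terms of Durand et al. (2011). The function names, genotype coding, and toy loci are hypothetical.

```python
import random


def site_terms(f1, f2, f3, fo):
    """Expected ABBA and BABA contributions of one locus from allele frequencies."""
    abba = (1 - f1) * f2 * f3 * (1 - fo) + f1 * (1 - f2) * (1 - f3) * fo
    baba = f1 * (1 - f2) * f3 * (1 - fo) + (1 - f1) * f2 * (1 - f3) * fo
    return abba, baba


def d_from_genotypes(loci, mode="HetFreq", rng=None):
    """Compute D under one of three treatments of heterozygous genotypes.

    loci: iterable of (P1, P2, P3, O) genotypes, each 0, 1 or 2 copies of "B".
    mode: "HetIgnore" skips loci with any heterozygote, "HetRand" draws one
          allele at random for each heterozygote, "HetFreq" uses frequency 0.5.
    """
    rng = rng or random.Random(0)
    abba_sum = baba_sum = 0.0
    for genotypes in loci:
        if mode == "HetIgnore" and 1 in genotypes:
            continue
        if mode == "HetRand":
            freqs = [float(rng.choice((0, 1))) if g == 1 else g / 2
                     for g in genotypes]
        else:
            freqs = [g / 2 for g in genotypes]
        abba, baba = site_terms(*freqs)
        abba_sum += abba
        baba_sum += baba
    total = abba_sum + baba_sum
    return (abba_sum - baba_sum) / total if total else float("nan")


# Toy loci (hypothetical); the third locus has a heterozygous P2 individual.
toy_loci = [(0, 2, 2, 0), (2, 0, 2, 0), (0, 1, 2, 0), (0, 2, 2, 0)]
d_ignore = d_from_genotypes(toy_loci, "HetIgnore")
d_freq = d_from_genotypes(toy_loci, "HetFreq")
d_rand = d_from_genotypes(toy_loci, "HetRand", random.Random(1))
```

In this toy example, HetIgnore discards one of the four loci, which mirrors the loss of loci summarized in Supplementary Table 2, while HetRand yields a value that depends on the random draw.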
About this article
Cite this article
Mussmann, S.M., Douglas, M.R., Bangs, M.R. et al. Comp-D: a program for comprehensive computation of D-statistics and population summaries of reticulated evolution. Conservation Genet Resour 12, 263–267 (2020). https://doi.org/10.1007/s12686-019-01087-x
DOI: https://doi.org/10.1007/s12686-019-01087-x