Comp-D: a program for comprehensive computation of D-statistics and population summaries of reticulated evolution

  • Methods and Resources Article
Conservation Genetics Resources

Abstract

Patterson’s D-statistic and its five-taxon derivatives are important phylogenetic methods for quantifying reticulated evolution, yet their application is limited by the lack of a single, comprehensive program that efficiently performs all necessary calculations from the file formats of common phylogenetic and population genetic programs. To increase accessibility for a broad range of researchers, we present a user-friendly program (COMP-D) that provides flexibility for incorporating heterozygous sites, implements multiple statistical methods, and aggregates results from multiple tests. Program augmentations also facilitate the detection of population-level introgression. COMP-D provides a threefold increase in speed relative to comparable software. It is implemented in C++ and released under the GNU General Public License v3.0. Source code is available for Linux/Mac OS X from: https://github.com/stevemussmann/Comp-D_MPI.
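
As a minimal illustration of the four-taxon statistic named above, the following Python sketch counts ABBA and BABA site patterns and computes D = (ABBA - BABA) / (ABBA + BABA), the standard definition of Patterson's D. It is not taken from COMP-D, which is written in C++. The function names, the encoding of each site as a four-character string ordered (P1, P2, P3, O), and the choice to skip any site containing a character other than A, C, G, or T are assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Tuple

VALID_BASES = set("ACGT")


def count_abba_baba(sites: Iterable[Sequence[str]]) -> Tuple[int, int]:
    """Count ABBA and BABA patterns over sites ordered as (P1, P2, P3, O)."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        alleles = {p1, p2, p3, out}
        # Assumption: sites with missing or ambiguous calls are skipped.
        if not alleles <= VALID_BASES:
            continue
        # Only biallelic sites where P3 differs from the outgroup
        # (P3 carries the derived allele) are informative.
        if len(alleles) != 2 or p3 == out:
            continue
        if p1 == out and p2 == p3:
            abba += 1  # P2 shares the derived allele with P3
        elif p2 == out and p1 == p3:
            baba += 1  # P1 shares the derived allele with P3
    return abba, baba


def d_statistic(abba: float, baba: float) -> float:
    """Return D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


if __name__ == "__main__":
    # Hypothetical toy alignment columns, one string per site.
    toy_sites = ["ATTA", "ATTA", "ATTA", "TATA", "AATA", "ANTA"]
    n_abba, n_baba = count_abba_baba(toy_sites)
    print(n_abba, n_baba, d_statistic(n_abba, n_baba))
```

Under the null hypothesis of no introgression, ABBA and BABA patterns are expected in equal numbers, so D is expected to be near zero. An excess of ABBA sites suggests gene flow between P2 and P3, and an excess of BABA sites suggests gene flow between P1 and P3.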

References

  • Allendorf FW et al (2001) The problems with hybrids: setting conservation guidelines. Trends Ecol Evol 16(11):613–622

  • Árnason Ú (2018) Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow. Sci Adv 4:eaap9873

  • Bangs MR et al (2018) Unraveling historical introgression and resolving phylogenetic discord within Catostomus (Osteichthyes: Catostomidae). BMC Evol Biol 18:86

  • Blackmon H, Adams RA (2015) EvobiR: Tools for comparative analyses and teaching evolutionary biology. https://doi.org/10.5281/zenodo.30938

  • Bohling JH (2016) Strategies to address the conservation threats posed by hybridization and genetic introgression. Biol Conserv 203:321–327

  • DaCosta JM, Sorensen MD (2014) Amplification biases and consistent recovery of loci in a double-digest RAD-seq protocol. PLoS ONE 9(9):e106713

  • Durand EY et al (2011) Testing for ancient admixture between closely related populations. Mol Biol Evol 28:2239–2252

  • Eaton DA (2014) PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics 30:1844–1849

  • Eaton DA, Ree RH (2013) Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst Biol 62(5):689–706

  • Eaton DA et al (2015) Historical introgression among the American live oaks and the comparative nature of tests for introgression. Evolution 69:2587–2601

  • Efron B (1981) Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68(3):589–599

  • Gompert Z, Buerkle CA (2010) Introgress: a software package for mapping components of isolation in hybrids. Mol Ecol Res 10:378–384

  • Green RE et al (2010) A draft sequence of the Neanderthal genome. Science 328(5979):710–722

  • Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70

  • Hou Y et al (2015) Thousands of RAD-seq loci fully resolve the phylogeny of the highly disjunct arctic-alpine Diapensia (Diapensiaceae). PLoS ONE 10(10):e0140175

  • Korneliussen TS et al (2014) ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15:356

  • Malukiewicz J et al (2015) Natural and anthropogenic hybridization in two species of eastern Brazilian marmosets (Callithrix jacchus and C. penicillata). PLoS ONE 10(6):e0127268

  • Martin SH et al (2015) Evaluating the use of ABBA-BABA statistics to locate introgressed loci. Mol Biol Evol 32:244–257

  • Ottenburghs J et al (2017) A history of hybrids? Genomic patterns of introgression in the true geese. BMC Evol Biol 17:201

  • Patterson N et al (2012) Ancient admixture in human history. Genetics 192:1065–1093

  • Pease JB, Hahn MW (2015) Detection and polarization of introgression in a five-taxon phylogeny. Syst Biol 64:651–662

  • Perneger TV (1998) What’s wrong with Bonferroni adjustments. Brit Med J 316:1236–1238

  • Rice WR (1989) Analyzing tables of statistical tests. Evolution 43:223–225

  • Zhang W et al (2016) Genome-wide introgression among distantly related Heliconius butterfly species. Genome Biol 17:25

  • Zheng Y, Janke A (2018) Gene flow analysis method, the D-statistic, is robust in a wide parameter space. BMC Bioinform 19:10

Acknowledgements

The Arkansas High Performance Computing Center (AHPCC) provided technical assistance and computational resources. Tyler K. Chafin and Bradley T. Martin promoted software development by testing an early version of the program. This research was conducted in partial fulfillment of the Ph.D. degree in Biological Sciences at the University of Arkansas (SMM). It was supported by generous University of Arkansas endowments: The Bruker Professorship in Life Sciences (MRD), the twenty-first Century Chair in Global Change Biology (MED), and a Doctoral Academy Fellowship (SMM). Three anonymous reviewers provided comments that greatly improved the manuscript.

Author information

Corresponding author

Correspondence to Steven M. Mussmann.

Ethics declarations

Conflict of interest

The authors have nothing to disclose.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12686_2019_1087_MOESM1_ESM.xlsx

Supplementary Table 1. Results of four-taxon D-statistic tests comparing the methods in COMP-D for handling heterozygous loci with those from pyRAD. Each column shows the number of statistically significant tests (α=0.001) in each treatment. COMP-D offers two methods of assessing statistical significance (Z-scores and Chi-square tests) whereas pyRAD offers only Z-scores. Two treatments (HetRand and HetFreq) included all heterozygous loci in D-statistic calculations but handled them differently: HetRand randomly picked one allele to represent an individual, whereas HetFreq used SNP frequency calculations. The HetIgnore method removed all heterozygous loci from calculations. All tests employed SNP data for catostomid fishes of western North America. Taxonomic abbreviations (P1, P2, P3, and O columns) are as follows: BBS = Bonneville Bluehead Sucker, BLS = Bridgelip Sucker, FMS = Flannelmouth Sucker, LNS = Longnose Sucker, MTS = Mountain Sucker, RBS = Razorback Sucker, SOS = Sonora Sucker, THS = Tahoe Sucker, WTS = White Sucker. Abbreviations in parentheses next to species abbreviations represent different populations: BB = Bonneville Basin, CB = Columbia River Basin, GC = Grand Canyon of the Colorado River, LB = Lahontan Basin, LC = Little Colorado River, UC = Upper Colorado River Basin, VR = Virgin River, wen = Wenima Wildlife Area of the Little Colorado River. (XLSX 11 KB)
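
The three heterozygote treatments and the Chi-square test described in this caption can be sketched roughly as follows. This is an illustrative Python approximation under stated assumptions, not COMP-D's C++ implementation: the mode labels ("ignore", "random", "freq"), the function names, the use of IUPAC ambiguity codes for heterozygous genotypes, and the requirement of a homozygous outgroup that defines the ancestral allele are all hypothetical choices. The per-site weights follow the familiar frequency form ABBA = (1 - p1)·p2·p3 and BABA = p1·(1 - p2)·p3, where p is the derived-allele frequency, and the Chi-square test is a generic one-degree-of-freedom test of ABBA = BABA.

```python
import math
import random
from typing import Optional, Sequence, Tuple

BASES = ("A", "C", "G", "T")
# IUPAC ambiguity codes for heterozygous diploid genotypes.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def derived_freq(code: str, ancestral: str, mode: str,
                 rng: Optional[random.Random] = None) -> Optional[float]:
    """Derived-allele frequency of one genotype, or None if it is excluded."""
    if code in BASES:
        return 0.0 if code == ancestral else 1.0
    alleles = IUPAC_HET.get(code)
    if alleles is None or mode == "ignore":
        return None  # missing data, or heterozygote dropped (HetIgnore-like)
    if mode == "random":  # HetRand-like: one allele represents the individual
        pick = (rng or random).choice(alleles)
        return 0.0 if pick == ancestral else 1.0
    if mode == "freq":  # HetFreq-like: use the within-genotype frequency
        return sum(a != ancestral for a in alleles) / 2.0
    raise ValueError(f"unknown mode: {mode}")


def site_weights(site: Sequence[str], mode: str,
                 rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """ABBA and BABA weights for one site ordered (P1, P2, P3, O)."""
    p1, p2, p3, out = site
    if out not in BASES:
        return 0.0, 0.0  # assumption: outgroup must be homozygous and called
    observed = set("".join(g if g in BASES else IUPAC_HET.get(g, "") for g in site))
    if len(observed) != 2:
        return 0.0, 0.0  # keep biallelic sites only
    freqs = [derived_freq(g, out, mode, rng) for g in (p1, p2, p3)]
    if any(f is None for f in freqs):
        return 0.0, 0.0
    f1, f2, f3 = freqs
    return (1 - f1) * f2 * f3, f1 * (1 - f2) * f3


def chi_square_p(abba: float, baba: float) -> float:
    """P-value of a one-degree-of-freedom Chi-square test of ABBA == BABA."""
    total = abba + baba
    if total == 0:
        return 1.0
    chi2 = (abba - baba) ** 2 / total
    # Survival function of the Chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Summing the two weights over all sites gives the ABBA and BABA totals that enter D and the significance test. Under the "ignore" treatment, any site with a heterozygous genotype contributes nothing, which is why that treatment retains fewer loci.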

12686_2019_1087_MOESM2_ESM.xlsx

Supplementary Table 2. The number of biallelic loci recovered using heterozygous loci (Het. Included) versus only fixed loci (Het. Excluded). Mean number of loci (Avg. Loci) and standard deviation (StDev) are presented for each. The % decrease indicates the proportion of loci lost when only fixed differences among taxa are considered. All tests employed data for catostomid fishes of western North America. Taxonomic abbreviations (P1, P2, P3, and O columns) are: BBS = Bonneville Bluehead Sucker, BLS = Bridgelip Sucker, FMS = Flannelmouth Sucker, LNS = Longnose Sucker, MTS = Mountain Sucker, RBS = Razorback Sucker, SOS = Sonora Sucker, THS = Tahoe Sucker, WTS = White Sucker. Abbreviations in parentheses next to species abbreviations represent different populations: BB = Bonneville Basin, CB = Columbia River Basin, GC = Grand Canyon of the Colorado River, LB = Lahontan Basin, LC = Little Colorado River, UC = Upper Colorado River Basin, VR = Virgin River, wen = Wenima Wildlife Area of the Little Colorado River. (XLSX 11 KB)

About this article

Cite this article

Mussmann, S.M., Douglas, M.R., Bangs, M.R. et al. Comp-D: a program for comprehensive computation of D-statistics and population summaries of reticulated evolution. Conservation Genet Resour 12, 263–267 (2020). https://doi.org/10.1007/s12686-019-01087-x
