Abstract
Reduced-representation genomic methods are invaluable data acquisition tools for conservation geneticists, yet a priori estimates of locus recovery are difficult to obtain for non-model organisms. We present a simple in silico approach (FRAGMATIC) that uses genomic data from related organisms to predict locus recovery in ddRAD sequencing. We tested its applicability by quantifying how prediction accuracy varies with genetic distance, comparing five non-model organisms against reference genomes of related organisms at varying phylogenetic distances. We additionally examined the sensitivity of the method using one organism (Danio rerio) with an available genome. FRAGMATIC supports population genomic projects in non-model species by providing a priori estimates of targeted ddRAD loci, which in turn curb wasted sequencing effort and improve cost-efficiency. Validation shows that predictive error is minimized when the method is applied to a closely related reference genome, but in silico estimates may also be robust to deeper (e.g. within-family) relationships. The weak correlation between predictive error and genetic distance suggests that specific characteristics of genome architecture may be more predictive than genetic distance itself. This indicates that a more extensive exploration of genomes, including a broader taxonomic scope (e.g. beyond vertebrates), may be informative. All code is freely available at: https://github.com/tkchafin/fragmatic.
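The general idea of in silico ddRAD locus prediction can be illustrated with a minimal Python sketch. This is not the FRAGMATIC code, and it makes several simplifying assumptions: the function names (cut_positions, predict_ddrad_loci) are hypothetical, the enzyme pair (PstI and MspI) and the size window are chosen only for illustration, and the sequence is treated as a single, fully digested molecule with no allowance for adapters, methylation sensitivity, or incomplete digestion.

```python
import re

def cut_positions(seq, site, offset):
    """Return the cut coordinate for every occurrence of a recognition site.

    `offset` is the number of bases into the site at which the enzyme cuts.
    A lookahead pattern is used so that overlapping occurrences are found.
    """
    pattern = f"(?={re.escape(site)})"
    return [m.start() + offset for m in re.finditer(pattern, seq)]

def predict_ddrad_loci(seq, enzyme_a, enzyme_b, min_len, max_len):
    """Count fragments flanked by one cut from each enzyme within a size window.

    Each enzyme is a (recognition_site, cut_offset) tuple (an assumed format).
    """
    seq = seq.upper()
    cuts = [(pos, "A") for pos in cut_positions(seq, *enzyme_a)]
    cuts += [(pos, "B") for pos in cut_positions(seq, *enzyme_b)]
    cuts.sort()
    count = 0
    # After a complete double digest, fragments lie between adjacent cuts.
    for (left, left_enz), (right, right_enz) in zip(cuts, cuts[1:]):
        has_both_ends = left_enz != right_enz
        if has_both_ends and min_len <= right - left <= max_len:
            count += 1
    return count

# Illustrative enzymes: PstI cuts CTGCA^G, MspI cuts C^CGG.
PSTI = ("CTGCAG", 5)
MSPI = ("CCGG", 1)

# Toy sequence, not real genomic data.
toy = "AAAA" + "CTGCAG" + "T" * 10 + "CCGG" + "T" * 10 + "CCGG" + "A" * 5 + "CTGCAG" + "AAAA"
n_loci = predict_ddrad_loci(toy, PSTI, MSPI, min_len=10, max_len=20)
```

On the toy sequence, the digest yields three internal fragments, of which the two flanked by one PstI cut and one MspI cut fall inside the window, so n_loci is 2. In practice the same count would be taken over every scaffold of a related reference genome.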
Acknowledgements
This work was supported by the University of Arkansas Distinguished Doctoral Fellowship (TKC); University of Arkansas Endowments (Bruker Professorship in Life Sciences to MRD and Twenty-First Century Chair in Global Climate Change Biology to MED); the Arkansas High Performance Computing Center; the Arkansas Biosciences Institute; and the Arkansas Economic Development Commission. We also thank Max Bangs, Whitney Anthonysamy, and Brenna Levine for contributing data and/or lab work.
Ethics declarations
Disclosures
Authors have nothing to disclose.
Cite this article
Chafin, T.K., Martin, B.T., Mussmann, S.M. et al. FRAGMATIC: in silico locus prediction and its utility in optimizing ddRADseq projects. Conservation Genet Resour 10, 325–328 (2018). https://doi.org/10.1007/s12686-017-0814-1