Impact of reduced-representation sequencing protocols on detecting population structure in a threatened marsupial

  • B. R. Wright
  • C. E. Grueber
  • M. J. Lott
  • K. Belov
  • R. N. Johnson
  • C. J. Hogg (corresponding author)
Short Communication


Reduced-representation sequencing methods have wide utility in conservation genetics of non-model species. Several methods are now available that reduce genome complexity to examine a wide range of markers in a large number of individuals. We produced two datasets collected using different laboratory techniques, comprising a common set of samples from the greater bilby (Macrotis lagotis). We examined the impact of differing data filtering thresholds on downstream population inferences. We found that choice of restriction enzyme and data filtering thresholds, especially the rate of allowable missing data, impacted our ability to detect population structure. Estimates of FST were robust to alterations in laboratory and bioinformatic protocols while principal coordinates and STRUCTURE analyses showed variation according to the number of loci and percent missing data. We advise researchers using reduced-representation sequencing in conservation projects to examine a range of data thresholds, and follow these through to downstream population inferences. Multiple measures of population differentiation should be used in order to fully understand how data filtering thresholds influence the final dataset, paying particular attention to the impact of allowable missing data. Our results indicate that failure to follow these checks could impact conclusions drawn, and conservation management decisions made.
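The recommendation to examine a range of missing-data thresholds and follow each one through to a downstream statistic can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the study: the function names, the toy genotype matrix, and the simple uncorrected FST estimator (one minus the ratio of summed within-population to summed total expected heterozygosity) are all assumptions introduced here.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(genotypes: Sequence[Sequence[Genotype]], locus: int) -> float:
    """Fraction of individuals with no genotype call at one locus."""
    calls = [row[locus] for row in genotypes]
    return sum(g is None for g in calls) / len(calls)


def filter_loci(genotypes: Sequence[Sequence[Genotype]], max_missing: float) -> List[int]:
    """Indices of loci whose missing-data rate is at or below max_missing."""
    n_loci = len(genotypes[0])
    return [j for j in range(n_loci) if missing_rate(genotypes, j) <= max_missing]


def allele_freq(calls: Sequence[Genotype]) -> Optional[float]:
    """Alternate-allele frequency among called genotypes, or None if none were called."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return None
    return sum(observed) / (2 * len(observed))


def simple_fst(genotypes: Sequence[Sequence[Genotype]],
               pops: Sequence[str],
               loci: Sequence[int]) -> Optional[float]:
    """Uncorrected FST = 1 - sum(Hs) / sum(Ht) over the given loci.

    Assumption: populations are weighted equally and no sample-size
    correction is applied; loci with no calls in any population are skipped.
    """
    labels = sorted(set(pops))
    sum_hs = 0.0
    sum_ht = 0.0
    for j in loci:
        freqs = [
            allele_freq([row[j] for row, p in zip(genotypes, pops) if p == label])
            for label in labels
        ]
        if any(f is None for f in freqs):
            continue
        p_bar = sum(freqs) / len(freqs)
        sum_hs += sum(2 * f * (1 - f) for f in freqs) / len(freqs)
        sum_ht += 2 * p_bar * (1 - p_bar)
    return None if sum_ht == 0 else 1 - sum_hs / sum_ht


# Toy data (invented for illustration): 4 individuals x 3 loci, two populations.
TOY_GENOTYPES = [
    [0, 0, None],
    [0, 1, None],
    [2, 1, 2],
    [2, None, None],
]
TOY_POPS = ["A", "A", "B", "B"]

if __name__ == "__main__":
    # Sweep several allowable missing-data rates and report how the
    # retained locus set and the FST estimate change with each one.
    for threshold in (0.0, 0.25, 0.8):
        kept = filter_loci(TOY_GENOTYPES, threshold)
        print(threshold, len(kept), simple_fst(TOY_GENOTYPES, TOY_POPS, kept))
```

In practice the same sweep would be repeated for each downstream analysis of interest, so that the effect of the missing-data threshold on each inference can be compared side by side.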


Keywords: Reduced representation sequencing; Restriction enzymes; Data filtering; Threatened species management; Principal coordinates analyses; FST



Acknowledgements

We thank the Australian Wildlife Conservancy, Adelaide Zoo, Cleland Wildlife Park, Currumbin Wildlife Sanctuary, Kanyana Wildlife Rehabilitation Centre, Monarto Zoo, Taronga Zoo and Cathy Herbert for providing bilby samples. We also thank Claire Ford and Camille Goldstone-Henry for providing background on the establishment of the bilby captive population. This work was funded by the Australian Wildlife Conservancy, the Australian Museum Research Institute, the University of Sydney, and an Australian Research Council grant (DP170101253) to CEG.

Author contributions

This study was conceived by CH and RJ. Further study design was performed by CG and BW. ML performed laboratory work and contributed to the manuscript. BW and CG analysed the data. BW wrote the manuscript. All authors contributed to the final manuscript.

Supplementary material

Supplementary material 1: 11033_2019_4966_MOESM1_ESM.doc (DOC, 5495 kb)



Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Faculty of Science, School of Life and Environmental Sciences, University of Sydney, Sydney, Australia
  2. Division of Applied Animal Ecology, San Diego Zoo Institute for Conservation Research, San Diego, USA
  3. Australian Museum Research Institute, Australian Museum, Sydney, Australia
