Abstract
Bayesian clustering methods have been widely used to study species delimitation and genetic introgression. To test the effect of phylogenetic relationships and sampling scheme on the inferred clustering solution and on the performance of Bayesian clustering analysis, I simulated genotypes of the interfertile oak species Quercus robur, Quercus petraea, and Quercus pubescens and ran analyses using two popular software programs, STRUCTURE and BAPS. First, based on simulations of purebreds, I compared the clustering solutions resulting from different sample size configurations. While the clustering solution generally reflected the taxonomic relationships when equal samples of each species were included, STRUCTURE inferred a spurious partition when some species were represented by larger samples and others by smaller ones. In very unbalanced configurations, STRUCTURE failed to identify the three species, even when three subpopulations were assumed. By contrast, BAPS properly identified the three species under every sampling scheme. Second, based on simulations of purebreds and hybrids, I tested the performance of individual assignment with a variable number of loci. This analysis showed that STRUCTURE can detect introgressed individuals more efficiently than BAPS, whereas BAPS can assign purebreds more efficiently with fewer loci. Method performance also depended on phylogenetic relationships: for Q. petraea, Q. pubescens, and their hybrids, performance was lower owing to the phylogenetic affinity of these two species. Including three instead of two species in the analysis reduced performance and led to misclassification of hybrids, which often reflected the phylogenetic affinity between Q. petraea and Q. pubescens.
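To make the notion of assignment "efficiency" more concrete, the sketch below shows one common way to score individual assignments from admixture proportions (q-values) such as those reported by STRUCTURE or BAPS. It is a minimal illustration under stated assumptions, not the procedure used in this study: the threshold of 0.9, the labels `pure_<k>` and `hybrid`, and the functions `classify` and `efficiency` are all hypothetical, and cluster indices are assumed to have already been matched to species (i.e. label switching is resolved).

```python
from typing import Dict, Sequence


def classify(q: Sequence[float], threshold: float = 0.9) -> str:
    """Label one individual from its admixture proportions q (one value per cluster).

    Assumption: an individual is a purebred of cluster k if its largest q-value
    reaches the threshold; otherwise it is treated as a hybrid/introgressed individual.
    """
    k = max(range(len(q)), key=q.__getitem__)  # index of the cluster with the highest q
    return f"pure_{k}" if q[k] >= threshold else "hybrid"


def efficiency(true_labels: Sequence[str],
               q_values: Sequence[Sequence[float]],
               threshold: float = 0.9) -> Dict[str, float]:
    """Per-class efficiency: share of individuals of each true class assigned to that class."""
    counts: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for label, q in zip(true_labels, q_values):
        counts[label] = counts.get(label, 0) + 1
        if classify(q, threshold) == label:
            hits[label] = hits.get(label, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical q-values for two clusters; not data from this study.
    labels = ["pure_0", "pure_0", "pure_1", "hybrid"]
    qs = [(0.97, 0.03), (0.85, 0.15), (0.02, 0.98), (0.55, 0.45)]
    print(efficiency(labels, qs))  # {'pure_0': 0.5, 'pure_1': 1.0, 'hybrid': 1.0}
```

A higher threshold makes purebred calls stricter, which tends to raise hybrid detection at the cost of misclassifying some purebreds as hybrids; the same functions work unchanged for three clusters.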
Acknowledgments
This work was supported by the European Regional Development Fund (ERDF), the regional government authority of Baden-Württemberg in Freiburg (Regierungspräsidium Freiburg; RPF), the National Office of Forests (Office National des Forêts; ONF) in France, and the Regional Directorate of Food, Agriculture and Forestry of Alsace (Direction Régionale de l'Alimentation, de l'Agriculture et de la Forêt d'Alsace; DRAAF) within the framework of the Interreg-IV project “The regeneration of the oaks in the Upper Rhine lowlands”. I express my gratitude to all colleagues at the ONF, RPF, and FVA who carried out sample collection and laboratory analyses, to Jukka Corander for kindly answering several questions about the BAPS software, and to two anonymous reviewers for their valuable comments and suggestions.
Data archiving statement
Genotypic data for this study are available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.b64b4.
Additional information
Communicated by D. Neale
About this article
Cite this article
Neophytou, C. Bayesian clustering analyses for genetic assignment and study of hybridization in oaks: effects of asymmetric phylogenies and asymmetric sampling schemes. Tree Genetics & Genomes 10, 273–285 (2014). https://doi.org/10.1007/s11295-013-0680-2
DOI: https://doi.org/10.1007/s11295-013-0680-2