Background

The study of species boundaries provides invaluable information on the evolution of reproductive barriers and the impacts of gene flow on species divergence. This mechanistic approach is aimed at understanding how gene pools become subdivided and further gene flow is restricted or prevented. Gene exchange among taxa is likely to continue [1] and examples of divergence-with-gene-flow speciation [2] are now common in the literature. Speciation-with-gene-flow can range from divergence initiated in sympatry, to the evolution of additional isolating barriers after the establishment of secondary contact [3].

As taxa diverge, shared traits can reflect recent common ancestry and incomplete sorting of characters, reestablished gene exchange, or a combination of both. The two processes are difficult to separate, but the patterns they generate can potentially be discriminated using multiple independent markers and geographical information [4]. While traits sort and correlations among independent characters are generated as taxa diverge, gene flow breaks down those correlations. If introgressive hybridization is important, pre-existing correlations may be broken down by gene flow in sympatry but still persist in allopatry. Therefore, the signature of past isolation and divergence will be evident through the association among traits and geography. For instance, if species constitute monophyletic groups and only individuals in sympatry share alleles with other taxa, introgression rather than ancestral polymorphism is the likely cause of the pattern of allele sharing. Furthermore, allele sharing in areas of sympatry due to introgression is also expected to increase overall levels of molecular diversity, such as number of alleles and haplotype diversity, a pattern that is not likely to be generated by incomplete lineage sorting. Importantly, upon reestablishing some degree of gene flow the resulting patterns of variation will vary on a locus-by-locus basis, depending on impacts of selection and drift and on the architecture of the trait (e.g. [5–11]).

The relative impacts of homogenizing gene flow and disruptive selection on species boundaries (i.e. genomic divergence) are most easily studied in recently speciated groups that show some degree of geographic overlap, such as finches, butterflies and lizards (e.g. [12–17]). Among taxa that present these ideal characteristics are the speciose sister genera Barbus and Luciobarbus. This group of pseudotetraploid fishes started diversifying throughout fresh waters of the circum-Mediterranean region more than 20 ma (million years ago) [18–20]. In the Iberian Peninsula, Barbus is represented by two species, B. meridionalis and the endemic B. haasi, while Luciobarbus contains seven endemic species (L. bocagei, L. comizo, L. graellsii, L. guiraonis, L. microcephalus, L. sclateri, and L. steindachneri; [21–26]), which started diversifying around 8 ma [18–20]. Different species are usually confined to specific ichthyogeographic provinces, inhabiting from one up to several river basins, and their ranges overlap to different extents (Fig. 1). This mostly allopatric distribution pattern indicates that speciation in Iberian barbels follows the evolution of river basins [23, 24, 26–28]. Secondary contacts in riverine species can be facilitated by several geomorphological processes affecting drainage patterns, such as river capture, marine regression and divide overtopping [29]. Allopatric diversification and range expansion with secondary contacts have been explicitly demonstrated for one polytypic species [30].

Fig. 1

Map of the Iberian Peninsula showing major river drainages and distribution of barbel species. Color lines delimit the distribution ranges of each species. Collection sites: 1 – Tâmega; 2 – Zêzere; 3 – Ocreza; 4 – Erges; 5 – Xarrama; 6 – Caia; 7 – Guadiana; 8 – Ardila; 9 – Chança; 10 – Vascão; 11 – Foupana; 12 – Segura; 13 – Bullent; 14 – Cabriel; 15 – Mijares; 16 – Mesa; 17 – Alhama (see also Table 1)

In diverse groups with varying degrees of range overlap such as barbels, species identification can be difficult. For instance, the taxonomic status of L. steindachneri has been problematic due to its morphological variability and similarity to L. comizo, and the species has been considered a junior synonym of the latter by some authors (after [31]). Nevertheless, from an ecological standpoint it has been shown that L. steindachneri prefers habitats further upstream from those occupied by L. comizo and L. microcephalus, but lower than those inhabited by L. sclateri in the Guadiana River Basin [32–34]. Luciobarbus steindachneri also exhibits different trophic adaptations (e.g. intermediate mouth protrusion) and some degree of food partitioning relative to L. bocagei and L. comizo in the Tejo/Tajo River Basin [35–37] and L. microcephalus in the Guadiana River Basin [38]. Together, these data suggest that ecology plays a major role in species coexistence and have led some authors to instead regard L. steindachneri as an ecotype of L. comizo (e.g. [34]).

Despite coexistence, hybridization of Iberian barbels in areas of species overlap has been hypothesized based on specimens with intermediate phenotypes [21, 39–41] and mtDNA polyphyly [42], but inference of hybridization in these studies suffers from species identification problems [26] and circularity. In fact, hybridization in Iberian barbels has not been unequivocally demonstrated except in a few cases in which nuclear markers were examined [30, 43]. Unfortunately, most studies used a single type of character, precluding the examination of patterns of covariation among them. Furthermore, pseudotetraploidy of barbels limited these studies to the use of low-resolution nuclear markers, which in principle restricts their use to study more divergent species pairs. However, this limitation has been overcome with the development of paralog-specific primers for the amplification of nuclear loci in barbels [44].

Given that Iberian barbels have various degrees of divergence and geographic overlap, these species provide an excellent natural system to understand how reproductive barriers build up. We examine the morphological and genetic distinctiveness of all Iberian barbels, whether reproductive isolation is complete, and how patterns of gene flow vary among loci and species. To this end we use multiple classes of characters (external morphological traits, mitochondrial and nuclear DNA sequence data) and their patterns of covariation on the entire constellation of endemic Barbus (one species) and Luciobarbus (seven species), sampled from sympatric and allopatric populations (Table 1).

Table 1 Sample sizes of Barbus and Luciobarbus analyzed across different traits

Results

Multivariate analysis of morphological data

Multivariate analysis of meristic traits is effective in separating the Iberian endemic species of Barbus and Luciobarbus. The first two principal components explain 80.1 % of the observed variation, with all variables contributing to both axes, although traits that contribute the most to one axis contribute the least to the other (Table 2). Plotting the two components against each other provides clear visual separation of most of the recognized species (Fig. 2). Barbus haasi is the most morphologically distinct Iberian barbel, with all Luciobarbus analyzed being more similar to each other in meristic morphospace. The first component separates L. comizo, L. bocagei and L. sclateri from L. microcephalus, L. guiraonis and L. graellsii, and these groups from B. haasi, while the second component separates species within the first two groups of Luciobarbus. Therefore, species living in sympatry can be correctly discriminated using a few morphological traits, with the notable exception of L. steindachneri. Specimens of L. steindachneri show substantial variation in meristic traits, some individuals overlapping with either L. comizo or L. bocagei in the Tejo River and with L. comizo or L. sclateri in the Guadiana River. As a whole, L. steindachneri occupies an intermediate position in the morphospace relative to the other Iberian Luciobarbus.

Table 2 Principal components of meristic traits of all Iberian Barbus and Luciobarbus samples
Fig. 2

Scatterplot of PC1 and PC2 of meristic traits of all endemic Iberian Barbus and Luciobarbus species. Center of symbol represents morphological identification of specimens following Almaça [21], perimeter of each symbol represents mtDNA lineage defined in Fig. 3. Polygon delimits samples from each population/species

Intraspecific differences are also observable, including between populations of L. comizo from the Tejo and Guadiana rivers, between populations of L. guiraonis from the Bullent and Mijares rivers relative to that from the Júcar River, and between populations of L. sclateri from the Guadiana and Segura rivers (albeit to a lesser degree). Populations of L. guiraonis from the Bullent and Mijares rivers overlap with each other and with L. microcephalus for these specific meristic traits (numbers of scales and cephalic pores); however, these taxa are readily discriminated by other traits, such as position of the mouth, shape of the dorsal fin and robustness of the last simple dorsal ray (Fig. 2 and [22]).

Mitochondrial phylogeny

Analysis of 151 individuals representing the eight endemic barbel species identified 14 different cyt b haplotypes (Table 3), as found in previous studies [26, 45]. Two haplotypes are typical of L. comizo, two of L. bocagei, two of L. sclateri, two of L. guiraonis, four of L. microcephalus, one of L. graellsii and one of B. haasi. The Iberian Luciobarbus haplotypes form three main lineages, which are thought to reflect the true species tree. One lineage is composed of haplotypes found in the sister L. comizo and L. bocagei, which share a common ancestor with a second lineage, composed of the polytypic L. sclateri. The third lineage is composed of L. graellsii and the sister L. microcephalus and L. guiraonis (Fig. 3).

Table 3 Distribution of cyt b haplotypes across species and populations sampled
Fig. 3

Maximum likelihood phylogeny of mitochondrial cyt b haplotypes. Bootstrap and Approximate Likelihood Ratio Test support values (left/right, respectively) are given next to relevant nodes. Each rectangle represents one sequence. Colors represent morphological identification of specimens following Almaça [21]

Cytochrome b haplotypes are readily associated with a morphological species, again with the exception of individuals of L. steindachneri analyzed, which have mtDNA haplotypes typical of L. comizo or L. sclateri. In addition, a small number of specimens of several species possess mtDNA typical of other sympatric taxa, but never of allopatric taxa. This pattern is especially pronounced in L. guiraonis from the Júcar River where most specimens examined exhibit B. haasi mtDNA (Fig. 2).

Nuclear gene phylogenies

Sequence analysis of four nuclear loci yielded 3,792 aligned sites (S7-1, 813 bp; S7-2, 827 bp; Gh-1, 1018 bp; Gh-2, 1134 bp). The Iberian specimens analyzed exhibited 35, 54, 41 and 41 alleles for S7-1, S7-2, Gh-1 and Gh-2, respectively. In general, gene trees are similar to each other (Figs. 4, 5, 6, 7) and are in agreement with relationships previously suggested by mtDNA, allozymes and morphology (e.g. [22–27, 30, 46, 47]). In particular, two main monophyletic lineages corresponding to the two genera, Barbus and Luciobarbus, are recovered. The former lineage comprises the Iberian endemic B. haasi and central and eastern European taxa included for phylogenetic context (B. barbus, B. carpathicus and B. prespensis). For S7-1, S7-2 and Gh-2, the Iberian Luciobarbus lineage comprises two monophyletic groups, one composed of L. graellsii, L. guiraonis and L. microcephalus, and another composed of L. bocagei, L. comizo, L. sclateri and L. steindachneri. Gh-1 is the least resolved nuclear marker at the species level, revealing a monophyletic group composed of L. graellsii and L. guiraonis, while L. microcephalus is recovered in a weakly supported group together with other sympatric species. The Iberian Luciobarbus lineages are further diagnosed by particular insertion-deletion variants at S7-1, S7-2 and Gh-2.

Fig. 4

Maximum likelihood phylogeny of S7-1 alleles. Bootstrap and Approximate Likelihood Ratio Test support values (left/right, respectively) are given next to relevant nodes. Each rectangle represents one allele. Colors represent morphological identification of specimens following Almaça [21]

Fig. 5

Maximum likelihood phylogeny of S7-2 alleles. Bootstrap and Approximate Likelihood Ratio Test support values (left/right, respectively) are given next to relevant nodes. Each rectangle represents one allele. Colors represent morphological identification of specimens following Almaça [21]

Fig. 6

Maximum likelihood phylogeny of Gh-1 alleles. Bootstrap and Approximate Likelihood Ratio Test support values (left/right, respectively) are given next to relevant nodes. Each rectangle represents one allele. Colors represent morphological identification of specimens following Almaça [21]

Fig. 7

Maximum likelihood phylogeny of Gh-2 alleles. Bootstrap and Approximate Likelihood Ratio Test support values (left/right, respectively) are given next to relevant nodes. Each rectangle represents one allele. Colors represent morphological identification of specimens following Almaça [21]

The relationships within the two Iberian Luciobarbus lineages are less resolved, with most species para- or polyphyletic for some or all nuclear loci. As in the case of mtDNA, several specimens possess alleles typical of other sympatric species (Tejo and Guadiana rivers), or from allopatric taxa found in adjacent basins (Sado River). The most obvious case is represented by the two divergent lineages present in L. guiraonis from Júcar, one typical of B. haasi and the remaining also found in other populations of L. guiraonis. Notably, L. steindachneri is not distinct from the other sympatric species as most of its alleles are also found in L. comizo and L. bocagei from the Tejo River, and in L. comizo, L. microcephalus and L. sclateri from the Guadiana River.

Population-level nuclear variation and differentiation

Levels of nuclear polymorphism vary substantially among populations, from five segregating sites (S) observed in L. guiraonis from Mijares, to 91 segregating sites observed in L. guiraonis from the Júcar River (Table 4). Nucleotide diversity (π) also differs by an order of magnitude among these populations, but most species/populations show intermediate levels of polymorphism (Table 4). Interestingly, allopatric populations of Luciobarbus exhibit lower levels of nucleotide polymorphism than sympatric populations (Mann–Whitney U-tests: z = 2.2736–2.7608, False Discovery Rate corrected P = 0.0072–0.0116, for all measures of polymorphism, S, Hd, K, π and θ). In addition, both populations of L. steindachneri exhibit the highest numbers of segregating sites and levels of nucleotide diversity of sympatric species, in line with results from phylogenetic analyses. Additionally, discounting the effects of introgression, populations inhabiting smaller basins (e.g., Mijares, Sado and Segura) show lower levels of nucleotide polymorphism than those inhabiting larger river basins such as the Guadiana or Tejo, consistent with differences in effective population size.

Table 4 Nuclear sequence polymorphism in Iberian barbel populations
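Three of the polymorphism summaries reported in Table 4 (S, π and θ) can be illustrated with a minimal sketch. The toy alignment and the function names below are hypothetical and are not the software or data used in this study. The sketch assumes aligned sequences of equal length with no gaps or ambiguity codes, and it takes θ to be Watterson's estimator per site.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that show more than one nucleotide state (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs) / len(seqs[0])


def watterson_theta(seqs):
    """Watterson's theta per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n / len(seqs[0])


# Hypothetical toy alignment: four sequences, ten sites (not data from this study).
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]

print("S     =", segregating_sites(toy))
print("pi    =", round(nucleotide_diversity(toy), 4))
print("theta =", round(watterson_theta(toy), 4))
```

Real datasets would also need handling of indels, missing data and heterozygous sites, which this sketch leaves out.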

To test how polymorphisms across loci are structured among populations we generated estimates of Dxy, Da and FST (Additional file 1) and clustered them using the neighbor-joining method (Additional file 2). Population trees built using estimates derived from mtDNA or all nuclear loci combined are generally consistent with phylogenetic results presented above. They identify two main groups, the genera Barbus and Luciobarbus. Within the latter, one subgroup is composed of L. graellsii, L. guiraonis and L. microcephalus, while the other contains L. bocagei, L. comizo, L. sclateri and L. steindachneri. Different populations of the same species are typically more similar to each other than to other species, with the exception of L. steindachneri and L. guiraonis (which reflect patterns of allele sharing detailed above). When nuclear loci are analyzed individually to investigate how nucleotide variation along different regions of the genome is shared across species and populations, different scenarios become clear, depending on the locus: S7-1 and Gh-2 follow the expected species phylogeny, while S7-2 and Gh-1 reflect geographical affinities. Thus, the latter genomic regions reflect geographic proximity and interspecific gene flow, while the former reflect phylogenetic affinities.
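As a rough illustration of two of the divergence measures named above, the sketch below computes Dxy (the mean number of differences per site between sequences drawn from two populations) and Da (Dxy minus the mean within-population diversity). The sequences and function names are hypothetical, uncorrected p-distances are assumed, and neither FST nor the neighbor-joining clustering is shown.

```python
from itertools import combinations, product


def diffs(s1, s2):
    """Number of mismatched sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2))


def pi_within(seqs):
    """Mean pairwise differences per site within one population."""
    pairs = list(combinations(seqs, 2))
    return sum(diffs(a, b) for a, b in pairs) / len(pairs) / len(seqs[0])


def dxy(pop_x, pop_y):
    """Mean pairwise differences per site between two populations."""
    pairs = list(product(pop_x, pop_y))
    return sum(diffs(a, b) for a, b in pairs) / len(pairs) / len(pop_x[0])


def da(pop_x, pop_y):
    """Net divergence: Dxy minus the average within-population diversity."""
    return dxy(pop_x, pop_y) - (pi_within(pop_x) + pi_within(pop_y)) / 2


# Hypothetical populations with two sequences each (not data from this study).
pop_x = ["ACGTACGTAC", "ACGTACGTAT"]
pop_y = ["TCGTACGAAC", "TCGTACGAAT"]

print("Dxy =", round(dxy(pop_x, pop_y), 4))
print("Da  =", round(da(pop_x, pop_y), 4))
```

A matrix of such pairwise values across all populations is the kind of input that a neighbor-joining routine would cluster.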

Bayesian clustering of nuclear data

Due to the presence of a large number of clusters (K) and high levels of differentiation among many populations and species, Structure sometimes converged to different solutions in independent replicates of each K, making determination of the best K challenging. Therefore, we consecutively split the complete dataset into smaller datasets, as recommended by the authors of the program as a strategy to deal with dataset multimodality. To determine the number of clusters we followed changes in LnP(D) values of consecutive K (i.e. when values plateau) and the ΔK statistic of Evanno et al. [48]. We first split the dataset in two, based on geography and phylogenetic relationships. The first dataset is composed of all species from rivers draining to the Atlantic and allopatric populations of those species (in this case only L. sclateri from the Segura River). This analysis converged to K = 5 genetic clusters (Fig. 8, upper left), clearly separating populations inhabiting rivers draining to the western margin of the Iberian Peninsula (Douro, Tejo and Sado rivers) from rivers draining to the southern margin (Guadiana and Segura rivers). Further splitting the first dataset using this geographical discontinuity and patterns of gene exchange allows the identification of further intraspecific genetic structure (K = 3 and K = 4; Fig. 8, bottom). The second dataset is composed of species sympatric in rivers draining to the Mediterranean (Ebro, Júcar, Mijares and Bullent rivers) and closely related species (i.e., L. microcephalus from the Guadiana River). This analysis converged to K = 4 genetic clusters (Fig. 8, upper right), which is concordant with species identification based on morphology. The only discordant samples are L. guiraonis from the Júcar River, which are placed in the same group as B. haasi.
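The ΔK criterion mentioned above can be sketched as follows. This is a simplified variant that assumes ΔK(K) is the absolute second-order difference of the mean LnP(D) across replicate runs, divided by the standard deviation of LnP(D) at that K. The LnP(D) values and the function name are hypothetical, and the calculation is not necessarily identical to the one performed for this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: deltaK} for every K that has both neighbors K-1 and K+1.

    lnp_by_k maps each K to a list of LnP(D) values from replicate runs.
    Assumes at least two replicates per K and a non-zero standard deviation.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical LnP(D) values from three replicate runs per K.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4100.0, -4102.0, -4098.0],
    4: [-4090.0, -4095.0, -4085.0],
    5: [-4085.0, -4100.0, -4070.0],
}

scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

In this toy example LnP(D) rises steeply up to K = 3 and then plateaus, so ΔK peaks at K = 3. Note that ΔK is undefined for the smallest and largest K tested.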

Fig. 8

Bayesian analysis of population structure. Split-and-reanalyze strategy implemented in Structure. The entire dataset was initially split into Atlantic- and Mediterranean-draining basins (top left and top right, respectively). The first dataset also includes a population of L. sclateri from a Mediterranean-draining basin (Segura R.) as an allopatric counterpart to L. sclateri from Guadiana R. The Atlantic-draining dataset was subsequently split into west- and south-draining basins (bottom). Using a combination of changes in consecutive LnP(D) values and the ΔK of Evanno et al. [48], we determined the most likely number of clusters in each dataset. These analyses indicate that an overall K = 10 populations is the most biologically meaningful genetic structuring of the nuclear dataset. In Additional file 3 we present an alternative strategy to estimate K, using the run with highest likelihood value for each K. The results are very similar, with the detection of additional differentiation within L. bocagei (K = 11)

Overall we identified 10 distinct population clusters, which allowed the genetic discrimination of almost all species included in the study, as well as allopatric populations of L. comizo and L. sclateri, and identification of within-population differentiation in L. bocagei from Tejo. Conversely, and in spite of clear power to diagnose even distinct intraspecific clusters, L. steindachneri does not constitute a genetically distinct unit. Instead, nuclear variation in L. steindachneri co-varies with sympatric species, in particular with L. comizo and L. bocagei in the Tejo River, and with L. comizo and L. sclateri (and L. microcephalus to a lesser extent) in the Guadiana River. Another aspect noticeable from the Structure analyses is that sympatric populations share more alleles across genetic clusters than allopatric ones, consistent with the higher polymorphism levels in sympatry found above.

Levels of gene flow across species boundaries of different ages

We tested the hypothesis of gradual accumulation of genomic barriers to gene flow by examining the relation between the proportion of introgressed nuclear alleles (alleles shared only in areas of sympatry) between pairs of co-occurring species and their degree of divergence (see Methods section for details). We found that levels of gene flow across the species boundaries vary inversely with species divergence, i.e., introgression is lower between more divergent species (e.g. between L. microcephalus and L. sclateri) than between more closely related ones (e.g. between L. bocagei and L. comizo; r² = 0.773, Spearman’s Rank ρ = −1, P = 0.0416; Fig. 9).
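To make the rank-correlation logic concrete, the sketch below computes Spearman's ρ and an exact one-sided permutation probability for four species pairs, the number shown in Fig. 9. The divergence and introgression values are hypothetical placeholders, not the values plotted in Fig. 9, and the permutation test is only one of several ways such a P value might be obtained. With four pairs, a perfect rank reversal is one of 4! = 24 equally likely orderings, so its one-sided exact probability is 1/24 ≈ 0.0417.

```python
from itertools import permutations


def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def exact_one_sided_p(x, y):
    """Fraction of permutations of y giving rho at least as negative as observed."""
    observed = spearman_rho(x, y)
    perms = list(permutations(y))
    hits = sum(1 for p in perms if spearman_rho(x, p) <= observed + 1e-12)
    return hits / len(perms)


# Hypothetical values for four species pairs (not the data behind Fig. 9).
divergence = [0.02, 0.05, 0.08, 0.09]
introgressed = [0.40, 0.25, 0.10, 0.05]

rho = spearman_rho(divergence, introgressed)
p_value = exact_one_sided_p(divergence, introgressed)
print("rho =", rho, "one-sided exact P =", round(p_value, 4))
```

Because only four points are available, ρ = −1 is the most extreme outcome possible, and 1/24 is the smallest one-sided exact probability such a test can return.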

Fig. 9

Relationship between levels of introgression and genetic divergence between hybridizing species pairs. A negative correlation between proportion of introgressed alleles and genetic divergence between species is observed, indicating a decrease in gene flow with increasing genetic differentiation. Species pairs in increasing order of genetic divergence based on mitochondrial DNA (from [26]): L. bocagei–L. comizo, L. comizo–L. sclateri, L. microcephalus–L. sclateri, L. comizo–L. microcephalus

Discussion

It has been suggested that studying pairs of populations and species spanning the speciation continuum can contribute more comprehensively to our understanding of how selection affects genomic divergence [9]. Our study successfully combines the analyses of populations and species spanning a wide degree of divergence, geographic distribution and range overlap, with different molecular markers and morphological traits, which provides exceptional perspective on the nature of species boundaries. Such a holistic approach has proven very informative in other systems (e.g. [49–54]). We show that species boundaries in Iberian Barbus and Luciobarbus are semi-permeable, as different species (including non-sister taxa) are exchanging genes in areas of sympatry after an initial period of allopatric differentiation. Distribution of genetic variation across space, loci and species supports a speciation-with-gene-flow scenario with levels of interspecific gene flow inversely associated with divergence. We also provide evidence for a hybrid origin of a barbel ecotype, L. steindachneri, suggesting that ecology plays a key role in species coexistence and hybridization in Iberian barbels (Fig. 10).

Fig. 10

Semi-permeable species boundaries in Iberian barbels. Species boundaries in Iberian barbels are semi-permeable to gene flow in areas of sympatry after an initial period of allopatric differentiation. Genetic data support a speciation-with-gene-flow scenario with levels of interspecific gene flow inversely associated with divergence, as represented by solid or dashed arrows. Traits involved in ecological preference (e.g. habitat, diet) are likely contributing to reproductive isolation and species coexistence. When these break down, an ecotype of hybrid origin (L. steindachneri) is formed, which is frequent in areas with intermediate habitat characteristics

Speciation-with-gene-flow in Iberian barbels

Patterns of molecular variation suggest that genomic differentiation in barbels generally starts in allopatry. It has been suggested that Luciobarbus evolution follows river basin evolution as sister species usually inhabit one or a few adjacent river basins [23, 24, 26–28]. Gante et al. [30] showed that polytypic L. sclateri diversified in allopatry and expanded its distribution westwards in southern Iberia, leading to secondary contacts with L. comizo and L. microcephalus in the Lower Guadiana River less than 500 ka (thousand years ago). Geographical scenarios for the evolution of all other Iberian barbels are not equally detailed, but our data indicate that intraspecific differentiation in allopatry is the norm: in addition to discrimination at the species level, geographic (allopatric) differentiation is also found where samples from multiple basins are available (L. bocagei, L. comizo and L. sclateri). It is therefore reasonable to assume that along the speciation continuum the same mechanisms act during differentiation in allopatry and are involved in all levels of differentiation, intra- and interspecific. We also observed within-basin differentiation in L. bocagei, which could be the result of its preference for low-order rivers as observed in rheophilic Barbus [20].

Contrasting patterns of molecular variation between areas of allopatry and sympatry is pivotal in excluding ancestral polymorphism as an explanation for the observed allele sharing and gene paraphyly between co-occurring species of Iberian Barbus and Luciobarbus. Individuals are generally readily assignable to a monophyletic group representative of each species, with only individuals in sympatry sharing alleles with other taxa. This leads to higher levels of nucleotide polymorphism in sympatric populations of different species relative to their allopatric counterparts, consistent with a role for gene flow in generating these spatial patterns of molecular variation. Geographical information has also proved important in discriminating between these processes in other young systems, such as sunflowers, fruit flies, Heliconius butterflies and cichlid fishes (e.g. [4, 55–57]). It is clear that reproductive isolation is not yet complete in Iberian barbels, including between non-sister species (e.g. L. comizo and L. sclateri), indicating that genomes remain porous and allow for gene exchange, even between genera that diverged more than 20 ma [18–20]. Incomplete reproductive isolation seems to be the norm in other diverse systems in which species can be found in sympatry, including other barbels [20, 58–67]. Multiple characteristics make freshwater fish generally prone to hybridize [68]. In barbels, natural characteristics associated with external fertilization include spawning migrations and group spawning during which multiple males arrive at breeding grounds and await incoming females. In addition, the likely asymmetrical behavioral isolating mechanisms and unequal abundance of the co-occurring species increase the likelihood of interspecific gene flow. Acting alone or in combination, incorrect species recognition or sneaking behavior by small males might explain why mispairings often involve females of the largest species [20, 69, 70].
Pseudotetraploidy and long generation times (i.e. slow evolutionary rates) might also buffer against possible effects of intrinsic genomic incompatibilities and slow down the evolution of reinforcement mechanisms, which would allow introgression between genera separated by more than 20 ma of evolution [18–20]. Anthropogenic sources of decreased habitat complexity and construction of obstacles preventing spawning migrations to correct breeding sites (such as reservoirs and dams) might also increase the chances of hybridization. Although it is likely that anthropogenic activities have influenced the extent of hybridization and directionality of introgression we currently observe, it is probable that some interspecific gene flow occurred soon after species became sympatric due to changes in drainage geometry. Nevertheless, and in spite of ongoing gene flow in areas of sympatry, Iberian barbels can be distinguished using a combination of morphological and molecular data, which suggests that reproductive barriers, albeit variable in strength, do exist for at least some parts of the genome.

How then do reproductive barriers become established? Theory predicts that loci involved in adaptation to different niches and speciation should show increased differentiation while neutral or nearly neutral loci not tightly linked to them should move more freely between populations and species [6, 71]. Therefore, genealogies derived from neutral or nearly neutral loci will be more susceptible to the impacts of interspecific gene flow than gene trees based on loci under divergent selection, which are prevented from introgressing and should better reflect species histories predating introgression after secondary contact. Additionally, frequency of new mutations is more likely to increase if they arise in regions of low introgression, potentially leading to a feedback loop of increased genomic differentiation [1]. Thus boundaries between older species pairs, which have accumulated more genetic differences, will likely exhibit greater genetic incompatibility and be less permeable to gene flow. We observe both scenarios in Iberian barbels: introgression across species boundaries varies among loci and gene flow seems to decay with increased divergence between species pairs, such that boundaries become less diffuse and more easily identifiable in more divergent, older species pairs. Therefore, reproductive isolation is not a property of the entire genome, nor does it arise instantaneously (e.g. [6]). Instead, these patterns are consistent with genetic differentiation (i.e. barriers to gene flow) starting in more or less localized genomic regions and then accumulating along the genome as species diverge [6, 9, 11, 58, 71–81]. Such ‘islands’ or ‘continents’ of the genome under stronger disruptive selection would exhibit reduced levels of introgression among species, relative to nearly neutral regions, retaining phylogenetic signal without noise from gene exchange among lineages.
The relative impact of these two forces gradually changes with divergence until introgression eventually ceases in all genomic regions. The precise mechanisms by which genomes become less porous and the signatures they leave behind are a major focus of speciation research [11, 79, 82–84]. Thus, all lines of evidence suggest that population differentiation and speciation in Iberian barbels are initiated in allopatry and accumulate as populations diverge. However, reproductive barriers are not (yet) sufficient to fully prevent gene flow where species meet in regions of secondary contact, as seen in other species of barbels [20, 69, 85]. In addition to levels of introgression differing between species pairs, the environment is also expected to play a role in hybridization outcomes, in particular when ecology influences their degree of sympatry. Possible environmental effects on hybridization dynamics could be disentangled from genomic divergence effects by using replicated hybrid zones of the same species pairs.

Luciobarbus steindachneri is a hybrid ecotype

The above considerations about species delimitation and mode of evolution fit all except for one nominal species, L. steindachneri. Its taxonomy has proved challenging since its description, as morphological and mtDNA similarity with L. comizo has already been identified [26, 31]. Here we provide evidence that explains these difficulties. Nuclear sequence data examined indicate that the genome of L. steindachneri shares its alleles across genetic clusters, mostly between L. comizo and all other sympatric species, both in the Tejo and Guadiana river basins. In addition, it has the highest numbers of segregating sites and levels of nucleotide diversity of sympatric species, which altogether support a scenario of admixture. It has the highest morphological variance of all Iberian barbels and is intermediate between several species, not only in meristic traits as seen in the present study and [21] but also in trophically relevant ones, such as mouth shape and position [35–37]. The latter are most likely under selection and have important fitness consequences [86, 87]. Differences in feeding apparatus likely explain observed food partitioning relative to L. bocagei and L. comizo in the Tejo River [35–37] and L. microcephalus in the Guadiana River [38], where it occupies an intermediate trophic position. Although L. steindachneri is always found in sympatry with other species, it prefers intermediate habitats and medium order rivers: it more frequently inhabits river stretches further upstream from those preferred by L. comizo and L. microcephalus, but downstream from typical L. sclateri habitat in the Guadiana River [32–34] and downstream from L. bocagei in the Tejo River [Gante pers. obs.]. Observed habitat preferences likely reflect selection on trophic morphology and trophic niche. These lines of evidence suggest that ecology plays a major role in the coexistence and hybridization among sympatric barbel species.
Luciobarbus steindachneri is thus the local product of introgressive hybridization between L. comizo and L. bocagei in the Tejo, and between L. microcephalus and L. sclateri in the Guadiana, and we have no evidence that it is a stabilized hybrid species. Therefore, following Morán-López et al.’s [34] terminology, we show that L. steindachneri is an ecotype of hybrid origin that uses a niche intermediate to those of its parents [88]. In spite of this general intermediacy, whether its ecology varies between the Tejo and Guadiana rivers, and with the parental crosses involved, has not been assessed but is likely to be the case. Overall, these findings raise questions about how one of the most common barbels found in Iberian rivers should be dealt with from taxonomic and conservation perspectives, as it is of hybrid origin and not stabilized. At the same time, it becomes a great model in evolutionary ecology. Sampling of replicated hybrid zones would help elucidate how stable this hybrid is, as well as other aspects such as geographical differences in parental contribution and levels of hybridization.

Conclusions

A fundamental question in speciation research is the evolution of barriers to gene flow and the long-term persistence of taxa in sympatry. We provide evidence for semi-permeable species boundaries in Iberian Barbus and Luciobarbus (summarized in Fig. 10). Different species are exchanging genes in areas of sympatry after an initial period of allopatric differentiation. Genomic barriers to gene flow are heterogeneous in strength and accumulate with increased divergence. Particularly puzzling is the case of intergeneric hybrids in the Júcar River, where only a hybrid swarm was found. Additional sampling outside this area would be necessary to tease apart possible causes for the apparent collapse or absence of isolating mechanisms. We also show that L. steindachneri is an ecotype of hybrid origin, suggesting that ecology plays a key role in species coexistence and hybridization. Since Iberian barbels span a wide range of genetic divergence, diverse ecologies, and degrees of sympatry, they provide an ideal opportunity for studying the evolution of reproductive isolation mechanisms. Although the complexities of this system (i.e., ancient polyploidy) have traditionally made it less tractable, recent improvements in technology now make such systems amenable to study. In particular, newer deep sequencing technologies will allow further insight into the patterns of genomic divergence in the face of gene flow and the mechanisms shaping reproductive barriers in barbels by sampling many more loci than the ones surveyed here. For instance, they would make it possible to determine which genomic regions are responsible for reproductive isolation and when they arose relative to the establishment of secondary contacts.

Methods

Specimen collection and sample processing

Specimens of Iberian Barbus and Luciobarbus were collected by electrofishing throughout the species’ ranges (Fig. 1). All specimens were formalin fixed and ethanol preserved, and tissues ethanol preserved and deposited in the zoological collections ‘Museu Bocage’ (MB) of Museu Nacional de História Natural e da Ciência, Portugal, and in Museo Nacional de Ciencias Naturales (MNCN), Spain. Some museum samples were used only for morphological analyses, while others from central and eastern Europe were used only to provide context in phylogenetic analyses of molecular characters (Table 1).

We identified individuals to species based on Almaça’s [21] qualitative traits, such as dorsal fin shape, robustness of the last dorsal spine, head shape, eye and mouth position, barbel length and coloration of body and fins. Assignment of individuals to L. comizo and L. steindachneri can be difficult due to morphological similarity that has motivated their synonymy [31]; conservatively only specimens with clear phenotypes were assigned to L. comizo. Covariation of meristic morphological traits and molecular markers (see below) would reject the null hypothesis of ‘no species differentiation’ and confirm that qualitative traits are truly representative of species identity [89].

Choice, scoring and analysis of morphological traits

For testing species discrimination using morphological traits, we chose additional independent characters based on their potential information content, scoring reliability and reproducibility, their consistent preservation in different conditions and ease of use. For these reasons, external meristic traits were preferred over morphometric traits, as the latter are further complicated by allometry; using meristic traits also allowed us to include juvenile specimens in the study. Even though there is substantial overlap in scale counts across different Luciobarbus species (e.g. [21, 31]), they vary widely and also exhibit different modal values. Three different scale counts were taken: number of scales along the lateral line (LL); number of scales in transverse rows above the LL, counted anteriorly-posteriorly from the LL to the sagittal line in front of the first dorsal-fin ray, not including the LL (TRA); number of scales in transverse rows below the LL, counted anteriorly-posteriorly from the LL to the insertion of the first pelvic-fin ray, not including the LL (TRB). Allowance was made for abnormal scale development, such as duplicated or fused scales, evaluated by comparison with scales from rows above and below the scales in question. Numbers of cephalic canal pores, found to be useful in the diagnosis of Barbus species [90], were also examined, including those on the supraorbital canal (SOC), the infraorbital canal (IOC) and the preopercular-mandibular canal (POMC). Counts were taken preferentially on the left side under a dissecting scope using Cyanine Blue 5R temporary stain to enhance contrast of structures [91].

Covariation among morphological traits was assessed using principal components analysis (PCA), as implemented in PAST [92]. PCA explores co-linear variation of the original variables, reducing multidimensionality of the data into new orthogonal variables (principal components). The variance-covariance matrix of standardized meristic variables was used.
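As an illustration of the PCA step, the following minimal Python sketch standardizes a small table of counts, builds the variance-covariance matrix of the standardized variables, and extracts the first principal component. It is not the PAST implementation: power iteration is used here only to keep the example free of external libraries, it returns the first component alone, and the counts and function names are hypothetical.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(z):
    """Sample variance-covariance matrix of already centered columns."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, iterations=1000):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Hypothetical counts for six specimens (columns: LL, TRA, TRB).
counts = [[46, 8, 5], [47, 8, 5], [49, 9, 6],
          [50, 9, 6], [52, 10, 7], [45, 7, 5]]

z = standardize(counts)
cov = covariance(z)
eigenvalue, loadings = leading_component(cov)
# Share of total variance carried by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
# Scores of each specimen on the first component.
scores = [sum(row[j] * loadings[j] for j in range(len(loadings))) for row in z]
```

Because the variables are standardized before the covariance matrix is computed, each trait contributes equally to the total variance regardless of its original range.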

Scoring and sequencing of mitochondrial and nuclear loci

The mitochondrial gene cytochrome b (cyt b) has proved to be useful in phylogenetic and phylogeographic studies of Barbus and Luciobarbus (e.g. [24, 26, 47, 93]). We screened for single-stranded conformational polymorphisms (SSCPs) of a 275 bp fragment following methods in Gante et al. [45]. Sequences of multiple (>10 %) representative SSCP bands from each gel were confirmed by Sanger sequencing on an ABI 3730 DNA Analyzer.

Several PCR primers for nuclear loci have become available for cyprinid fishes. Due to pseudotetraploidy of barbels, paralog-specific primers were used following Gante et al. (2011). Briefly, we used a hybrid annealing strategy, combining both universal exon-primed intron-crossing (EPIC) and paralog-specific intron-primed exon-crossing (IPEC) primers for amplification and sequencing of four nuclear loci: S7-1, S7-2, Gh-1 and Gh-2. Sequences were obtained by Sanger sequencing on an ABI 3730 DNA Analyzer. Different methods were employed to resolve haplotypic phase of heterozygous individuals. Where individuals were heterozygous for insertions or deletions (indels), haplotypes were either manually phased using the method described by Flot et al. [94] or using the program Champuru [95]. Haplotypes with known phases were subsequently used to phase the single nucleotide polymorphism (SNP) heterozygotes with Phase [96]. Phase input files were generated using SeqPhase [97]. Consistency of the inferred haplotypes was assessed in five independent Phase runs, each of 100 iterations, burn-in of 100 and thinning interval of 1.

Phylogenetic analyses of nucleotide sequences

The models of nucleotide sequence evolution for the different datasets were identified with jModeltest 2 [98, 99] using the corrected Akaike and Bayesian Information Criteria. Maximum likelihood allele phylogenies were built in PhyML 3.0 [100] using an HKY85 model of nucleotide evolution with four substitution rate categories, estimated transition/transversion ratios, with a proportion of invariable sites and gamma distribution parameter, and using empirical nucleotide frequencies. Previously published complete cyt b sequences of all species [26] were included in the analysis to check relationships of haplotypes found in the present study. Topology searches used NNI (nearest-neighbor interchange) and SPR (subtree pruning and regrafting) on a BioNJ starting tree. Node support was estimated using 1000 bootstrap replicates and approximate Bayes likelihood ratio tests.
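The two information criteria used for model choice can be sketched as follows. The formulas are the standard definitions of AICc and BIC; the log-likelihoods, parameter counts and sample size in the example are hypothetical values chosen for illustration, not results from this study.

```python
import math

def aicc(lnl, k, n):
    """Corrected Akaike Information Criterion.
    lnl: maximized log-likelihood, k: free parameters, n: sample size."""
    return -2.0 * lnl + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(lnl, k, n):
    """Bayesian Information Criterion."""
    return -2.0 * lnl + k * math.log(n)

# Hypothetical candidate models: name -> (log-likelihood, free parameters).
models = {"JC": (-1250.0, 10), "HKY85": (-1200.0, 14), "GTR": (-1198.0, 18)}
n_sites = 275  # assumed sample size (alignment length) for the example

# The preferred model is the one with the lowest criterion value.
best_aicc = min(models, key=lambda m: aicc(*models[m], n_sites))
best_bic = min(models, key=lambda m: bic(*models[m], n_sites))
```

Both criteria penalize extra parameters, with BIC penalizing them more heavily as the sample size grows.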

Levels of nucleotide polymorphism and population differentiation

Number of segregating sites (S), haplotype diversity (Hd), average number of differences (K), nucleotide diversity (π) and Watterson estimator (θ), and pairwise population divergence measures (Da, Dxy and FST) were calculated using the program DnaSP v5.10.01 [101]. Neighbor-joining (NJ) networks of pairwise population differentiation statistics were constructed in MEGA 5.05 [102] for each locus separately and all nuclear loci combined. Node support for each network was assessed by bootstrapping nucleotide sequence alignments (100 times) in Seqboot from the Phylip package [103] and re-calculating estimates.
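For readers unfamiliar with these summary statistics, the within-population measures can be sketched in a few lines of Python. This is a simplified illustration using textbook definitions, not a reimplementation of DnaSP: it assumes an alignment without gaps or missing data, reports π and θ per site, and the toy sequences are hypothetical.

```python
from itertools import combinations

def diversity_stats(seqs):
    """Return S, K, pi (per site) and Watterson's theta (per site)."""
    n, length = len(seqs), len(seqs[0])
    # S: alignment columns showing more than one nucleotide state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # K: average number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = k / length
    # Watterson's estimator: S scaled by the harmonic number a1.
    a1 = sum(1.0 / i for i in range(1, n))
    theta = s / a1 / length
    return s, k, pi, theta

def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [seqs.count(h) / n for h in set(seqs)]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))

# Hypothetical toy alignment of four sequences, eight sites each.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
s, k, pi, theta = diversity_stats(toy)
hd = haplotype_diversity(toy)
```

In the toy alignment two sites are variable, so S = 2, and the six pairwise comparisons differ by one site on average, so K = 1 and π = 1/8.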

Bayesian clustering of nuclear DNA data

Covariation among nuclear loci was assessed using the Bayesian clustering program Structure v2.3.3 [104, 105]. Structure identifies clusters by assigning individuals to K populations in a way that minimizes deviations from Hardy–Weinberg equilibrium within clusters and maximizes linkage disequilibrium between them (i.e. species). Each unique allele was identified using the program Macclade v4.03 [106] and numerically coded (i.e. we coded each unique combination of SNPs at each locus and not each individual SNP). Due to the complexity (multimodality) of the complete nuclear dataset, Structure sometimes converged to different solutions in independent runs for the same K. Therefore, smaller, more tractable datasets consisting of combinations of sympatric and sister species and allopatric references were analyzed independently as recommended by Pritchard & Wen [107]. This approach allows for assessment of the importance of introgression and ancestral polymorphism on allele sharing. To assess reliability of solutions, 20 iterations were run for each K. Each iteration consisted of 500,000 MCMC (Markov Chain Monte Carlo) generations as burn-in, followed by 1,000,000 MCMC replicates to estimate the posterior sample distribution, using the admixture and correlated allele frequency models. LnP(D), the probability of the data given K, was tracked over the course of the burn-in and the run to ensure that these values had stabilized by the end of the burn-in period. Two different methods were used to determine the number of groups (K) identified by Structure runs of each dataset. The first method, suggested by Pritchard & Wen [107], identifies the most likely value of K by comparing changes in LnP(D) values of consecutive K (i.e. when values plateau). The second method, developed by Evanno et al. [48], uses an ad hoc quantity (ΔK) based on the second-order rate of change of the likelihood function with respect to K.
Plots of these two metrics were obtained using Structure Harvester [108]. Consensus clustering across iterations for each K was generated using the greedy algorithm in clumpp [109] and visualized using the program distruct [110]. In addition, we present and discuss an alternative strategy to determine the best K using the complete dataset in the supplementary information (Additional file 3).
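The ΔK statistic can be sketched as below. The sketch assumes the common convention of taking the absolute second difference of the mean LnP(D) across replicate runs and dividing it by the standard deviation of LnP(D) at that K. It is an illustration rather than the code of Structure Harvester, and the LnP(D) values are hypothetical.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp):
    """lnp maps each K to a list of LnP(D) values from replicate runs.
    Returns {K: delta K} for every K that has both K-1 and K+1 available."""
    means = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order rate of change of the mean likelihood.
            second = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            # Scaled by the spread among replicates at this K
            # (undefined if all replicates are identical).
            delta[k] = second / stdev(lnp[k])
    return delta

# Hypothetical LnP(D) values from three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -802.0, -798.0],
    3: [-790.0, -795.0, -785.0],
    4: [-788.0, -780.0, -796.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
```

Note that ΔK cannot be evaluated for the smallest and largest K tested, so it can never select K = 1; this is one reason to examine the LnP(D) plateau alongside it.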

Levels of introgression between pairs of hybridizing species

To test the hypothesis that reproductive barriers accumulate in the genome as species diverge, we calculated the proportion of introgressed alleles between pairs of hybridizing taxa. We restricted the analysis to pairs of species that, according to our data, co-occur and exchange genes in sympatry, to avoid obscuring any patterns with zero-inflation from non-hybridizing species pairs (e.g. because they are never in sympatry). We also excluded the intergeneric hybridizing pair B. haasi and L. guiraonis, as ‘pure’ B. haasi has not yet been collected in the Júcar River. It is not presently clear if it inhabits some area of the basin yet to be sampled or if it went locally extinct. Therefore we analyzed the hybridizing pairs L. bocagei–L. comizo, L. comizo–L. sclateri, L. microcephalus–L. sclateri and L. comizo–L. microcephalus. Diagnostic alleles were identified from nuclear allele phylogenies: alleles found in species A only in sympatric populations, absent in its allopatric populations, and common in sympatric or allopatric populations of species B were considered typical of the latter. L. microcephalus is fixed for Gh-1 and inhabits only the Guadiana River, so we have no allopatric samples to help us distinguish whether it shares Gh-1 alleles through introgression and complete replacement or through retained ancestral polymorphism. We therefore conservatively excluded this locus from comparisons involving this species. We used pairwise uncorrected mean divergence between taxa (p-uncorrected) derived from mtDNA as a surrogate for levels of species divergence (from [26]). Therefore, this estimate of divergence between taxa is independent from nuclear introgression levels determined in the present study and should better reflect species divergence in the absence of interspecific gene flow.
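The logic of this comparison can be illustrated with a short Python sketch. It makes simplifying assumptions: an allele counts as typical of species B if it is merely present in B (no frequency threshold is applied for "common"), the proportion is computed over allele copies sampled in sympatric A, and all allele labels and sequences are hypothetical.

```python
def introgressed_alleles(sympatric_a, allopatric_a, species_b):
    """Alleles seen in sympatric A, absent from allopatric A, present in B."""
    allo, donor = set(allopatric_a), set(species_b)
    return {a for a in set(sympatric_a) if a not in allo and a in donor}

def proportion_introgressed(sympatric_a, allopatric_a, species_b):
    """Fraction of allele copies in sympatric A that are typical of B."""
    foreign = introgressed_alleles(sympatric_a, allopatric_a, species_b)
    return sum(a in foreign for a in sympatric_a) / len(sympatric_a)

def p_distance(seq1, seq2):
    """Uncorrected p-distance: proportion of sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)

# Hypothetical allele copies sampled at one nuclear locus.
sympatric_a = ["a1", "a1", "a2", "b1", "b3", "a3"]
allopatric_a = ["a1", "a2", "a2", "a3"]
species_b = ["b1", "b1", "b2", "b3"]

foreign = introgressed_alleles(sympatric_a, allopatric_a, species_b)
proportion = proportion_introgressed(sympatric_a, allopatric_a, species_b)
divergence = p_distance("ACGTACGT", "ACGAACGA")
```

Plotting the proportion of introgressed alleles against divergence for each hybridizing pair then shows whether introgression declines as divergence increases.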

Ethics statement

Direcção Geral dos Recursos Florestais (DGRF) provided the necessary collection permits. Fish were euthanized with an overdose of MS-222 (3-aminobenzoic acid ethyl ester methanesulfonate) prior to handling.

Availability of supporting data

The datasets supporting the results of this article are deposited in the Dryad Digital Repository [http://dx.doi.org/10.5061/dryad.m1572] and include five sequence alignment files and one morphological (meristic) traits file [111].