Background

When two populations of the same species become spatially separated, they start to diverge through the combined processes of mutation, selection and drift, and eventually become separate species [1–3]. Such allopatric evolution is thought to be the main pathway to the formation of new species [2, 4–7]. However, since evolutionary divergence is a gradual process, it is operationally difficult to define the actual threshold at which two allopatric populations can be classified as separate species. Moreover, there is no consensus about the criteria for delimiting species or the definition of species [8–10]. The controversies over species concepts basically come down to the role of evolutionary history versus population processes, i.e., the criteria of monophyly (phylogenetic species concepts, e.g., [11]) and reproductive incompatibility (the biological species concept [12, 13]).

In avian taxonomy there is a general consensus that species must have diagnostic characteristics and an independent evolutionary history [11, 14]. Those two features are tightly linked, as diagnostic, heritable characters take time to evolve. However, there is disagreement over the emphasis placed on reproductive incompatibility. Whereas proponents of the phylogenetic species concept argue that it is irrelevant whether two independently evolving species will merge or remain distinct upon secondary contact [11, 15], advocates of the biological species concept argue that reproductive incompatibility is the key criterion for species delimitation [14]; in their view, allopatric or parapatric populations that show diagnostic differences but have not yet attained reproductive incompatibility are better classified as subspecies within a polytypic species. On the other hand, Gill [16] recently argued for a reversal of current taxonomic practice in which “splitting” rather than “lumping” taxa should be the null hypothesis, and the burden of proof placed on “lumping” rather than on “splitting” taxa at the species level.

A recent review on avian species-level taxonomy since 1950 has shown a steady increase in the number of bird species, due to a trend towards splitting polytypic species [17]. This trend is caused by an eclectic taxonomic practice, in which multiple criteria for species delimitation have been applied [17, 18]. The trend can also be seen as a necessary correction to a taxonomic bias during the first half of the 20th century, when the number of recognized species was reduced by more than 50 % due to “default lumping” of taxa into polytypic species without any critical assessment of their diagnosability, monophyly or reproductive incompatibility [17, 19]. Hence, it follows that many subspecies still retained in current classifications are ghosts of the past and should undergo taxonomic revision with a pluralistic set of species criteria [16, 18].

While diagnosability and monophyly are relatively easy to assess by phenotypic and genotypic traits, the assessment of reproductive incompatibility seems very challenging for allopatric taxa [15], and must therefore be inferred from divergences in characters that are supposed to be functionally important in reproduction. The criteria for deciding when two allopatric or parapatric taxa have attained enough reproductive incompatibility to justify species rank have been outlined by Helbig et al. [14]. In principle, reproductive incompatibility is seen as a hypothesis derived from comparative evidence, where the magnitude of multiple character divergences matches that of related species that coexist in sympatry, and therefore are undoubtedly reproductively incompatible. Tobias et al. [20] proposed a quantitative scoring system for species assignment based on a set of phenotypic and behavioural traits derived from a data set of well-recognised sympatric or parapatric species pairs. These traits likely play a role in habitat and niche adaptations, or in sexual attraction among mates, and can thus be regarded as premating isolation mechanisms.

Here we argue that reproductive incompatibility can also be inferred from traits that have a functional role after copulation, viz. sperm traits. Spermatozoa are extremely diverse in size and form across the animal kingdom [21] and are often divergent among closely related species [22]. Sperm morphology traits have therefore proved useful for taxonomy and phylogenetic inference in many animal groups, including birds [23]. While there are generally marked structural differences in spermatozoa among avian orders and families [24], recent comparative studies on passerine birds have revealed considerable variation in sperm length among related species [25, 26]. There is also evidence of sperm length variation among geographically structured populations or subspecies [27–30], which suggests that sperm length can be a useful taxonomic marker also for incipient species.

In this paper, we report a comparative study of multiple character divergences between the two island populations of the Blue Chaffinch Fringilla teydea, which are currently ranked as subspecies [31–33]. The two taxa, endemic to the central islands of Tenerife (ssp. teydea) and Gran Canaria (ssp. polatzeki) in the Canarian archipelago, are phenotypically distinct in male plumage [34], as well as in morphometrics and vocalizations [35, 36]. A recent study confirmed the statistical significance of these measures and suggested lifting the taxa to species rank [37]. There is also some evidence that the taxa are divergent in mtDNA [38, 39] and nuclear microsatellites [40]. A timely question is therefore whether the two taxa are fully diagnosable in multiple genotypic and phenotypic traits to an extent that would merit species rank. Here we present quantitative evidence in a broad array of traits, including mitogenomes, nuclear introns, biometrics, male plumage colour, song and sperm morphology. We assess the magnitude of these divergences in relation to divergences between undisputed sister species, following the current guidelines for species delimitation in avian taxonomy [14, 20].

Methods

Study species

The Blue Chaffinch Fringilla teydea is a medium-sized to large finch (~30 g), larger than the two other congeneric species, the Common Chaffinch F. coelebs and the Brambling F. montifringilla. The distribution of the Blue Chaffinch is restricted to the high-elevation (1200–2300 m) forests of Canary Pine Pinus canariensis on Tenerife and Gran Canaria [35]. Pine seeds constitute the staple food throughout the year [41, 42], but the diet also includes a significant proportion of arthropods [34, 35]. Like most finches, F. teydea is sexually dimorphic in size and plumage, with males being larger and more colourful (leaden-blue) than females (olive-brown). The two island populations are taxonomically recognized as subspecies and show morphological differences predominantly in adult males [34]: 1) polatzeki is slightly duller, more ashy-olive grey than teydea, 2) the black band on the lower forehead is considerably more pronounced in polatzeki, 3) the tips of median and greater coverts are light bluish-grey in teydea and distinctly broader and contrasting white in polatzeki, and 4) teydea is generally larger (bill, wing and body size) than polatzeki. Figure 1 depicts the adult male of the two taxa.

Fig. 1

Males of the two Blue Chaffinch taxa, teydea and polatzeki, with sonograms of their territorial song. Photo courtesy: Eduardo Garcia-del-Rey

The male territorial song is about a 2 s long strophe in both taxa, which consists of a first phrase of a falling series of soft, disyllabic notes in polatzeki and a more same-pitch series of harder, monosyllabic notes in teydea. The second phrase is a prolonged syllable in both taxa, but it is markedly softer or subdued in polatzeki, and more like a crescendo in teydea. These differences are notable in the sonograms in Fig. 1.

According to the IUCN Red List, the total population size of F. teydea is 1800–4500 individuals, with the majority on Tenerife (teydea) and fewer than 250 birds (polatzeki) left in the wild on Gran Canaria [43]. A more recent survey estimated a population size of about 16 000 individuals on Tenerife [44]. The polatzeki population has declined severely during the last century due to habitat loss from logging and fragmentation of the pine forest [34]. In 2007, a wildfire further destroyed much of the core habitat in the Inagua area, and the population size dropped to only 122 individuals the following year [45]. Although the species seems to cope well with wildfires of mild and moderate severity [46], access to high-quality pine forest habitat in combination with stochastic population fluctuations seems to be a critical factor to the survival of the small Gran Canaria population.

Data and sample collection

Data for analysis originate from museum specimens (plumage coloration, American Museum of Natural History, New York), from measurements and samples (sperm and blood) of birds caught in the wild (Tenerife) or in captivity (Gran Canaria; the wildlife recuperation center in Tafira), and from song recordings of wild birds on both islands. The Gran Canaria captive breeders were either wild-caught or the first generation of wild-caught birds.

One person (EGDR) measured wing length (straightened and flattened chord, nearest 1 mm), tail length (from base between the central pair of rectrices to the tip, nearest 1 mm), tarsus length (between extreme bending points, nearest 0.1 mm), bill length (from skull, nearest 0.1 mm), bill depth (at distal edge of nostrils, nearest 0.1 mm) and body mass (nearest 0.1 g) of all birds. Wing length was measured with a stopped ruler, tail length with an unstopped ruler, tarsus and bill with a digital calliper, and body mass with a Pesola 50 g balance. Since first-year birds have significantly shorter wings and tail than older birds [47], we excluded first-year birds from the analyses of these characters. About 10–30 μL blood was collected in a capillary tube after brachial venepuncture and stored in absolute ethanol for subsequent DNA analyses in the lab. Ejaculate samples (1–3 μL) were collected in a capillary tube after cloacal massage, diluted in 20–30 μL phosphate-buffered saline and fixed in 300 μL 5 % formaldehyde [48] for subsequent measurements of sperm morphometrics in the microscopy lab.

Songs were recorded from nine individual male teydea at three locations on Tenerife (Vilaflor, La Guancha and Las Lagunetas) and eight individual polatzeki on Gran Canaria (Llanos de la Pez) during May 2015. All individuals were in adult plumage. All recordings were made by EGDR using a Fostex recorder with a parabolic Telinga microphone.

Genetic analyses

Genomic DNA was extracted using a commercial spin column kit (E.Z.N.A. DNA Kit; Omega Bio-Tek) or a GeneMole® automated nucleic acid extraction instrument (Mole Genetics), following the manufacturers’ protocols.

Mitochondrial DNA was amplified from high molecular weight DNA extracts using two primer pairs: MtCorvus531F (GGATTAGATACCCCACTATGC) and mtCorvus9431R (GTCTACRAAGTGTCAGTATCA), and mtCorvus8031F (CCTGAWCCTGACCATGAACCTA) and mtCorvus926R (GAGGGTGACGGGCGGTATGTA), designed for a study of Ravens Corvus corax (JAA, AJ and AMK, unpublished data). These two primer pairs yielded amplicons of ~8900 bp (Amplicon 1) and ~9700 bp (Amplicon 2). Annealing sites and overlapping regions are illustrated in Fig. 2. The following PCR conditions were used for amplification: 1X reaction buffer, 200 μM of each dNTP, 0.5 μM of each primer, ~20 ng template DNA, 0.02 U/μl Q5 High-Fidelity DNA polymerase (New England Biolabs) and dH2O to a final volume of 25 μl. The following thermal profiles were employed. Amplicon 1: initial denaturation at 98 °C for 30 s, 35 cycles with denaturation at 98 °C for 10 s, annealing at 59 °C for 20 s and elongation at 72 °C for 7.5 min, and a final elongation step for 2 min. Amplicon 2: initial denaturation at 98 °C for 30 s, 5 cycles with denaturation at 98 °C for 10 s following a touch-down profile starting at 72 °C with 1 °C/cycle reduction, 30 cycles with denaturation at 98 °C for 10 s, annealing at 67 °C for 20 s and elongation at 72 °C for 7.5 min, and a final elongation step for 2 min.

Fig. 2

The two overlapping long-PCR amplicons and their primer positions in the avian mitogenome. The map was constructed in the software pDRAW32 v.1.1.129

The complete PCR reactions were loaded on a 0.8 % agarose gel and run at 90 V. When completely separated, the respective amplicons were cut from the gel and purified using the GeneJET Gel Extraction Kit (Thermo Fisher Scientific). Concentrations of the purified amplicons were measured on a Qubit instrument (Thermo Fisher Scientific) and equimolar amounts of each amplicon were pooled. Twenty ng of pooled amplicons were sheared using a Covaris M220 Focused-ultrasonicator (Covaris, Inc.), running the pre-programmed DNA shearing protocol for 800 bp twice. To generate barcoded libraries for sequencing, we employed the NEBNext library prep kit for Ion Torrent (New England Biolabs) on the sheared amplicons, using the Ion Xpress barcode adapter kit (Thermo Fisher Scientific). Barcoded libraries were pooled and size selected (440–540 bp) using a BluePippin instrument (Sage Science). Concentration of the final library was measured on a Fragment Analyzer (Advanced Analytical) using the DNF-474 High Sensitivity NGS Fragment Analysis kit. The sheared, size-selected and barcoded amplicons were sequenced on an Ion PGM instrument (Thermo Fisher Scientific). The samples were sequenced on two different 314 chips.

Trimming and removal of low-quality reads were performed in the Torrent Suite™ software (Thermo Fisher Scientific). A Common Chaffinch mitochondrial genome (GenBank acc NC025599) was used as a reference in the Torrent Suite™ software for coverage estimates, using the plugin coverageAnalysis (v4.4.2.2). Mitochondrial genomes were reconstructed using MITObim v1.8 [49]. The complete mitochondrial genome of Common Chaffinch was used as reference in the initial mapping. Mitochondrial genes were first automatically annotated using MITOS [50], and thereafter manually inspected. Distance estimates were calculated in MEGA6 [51] using the Maximum Composite Likelihood model for nucleotides [52] and the Poisson Correction model for amino acids [53].

For a more quantitative test of possible admixture of mitochondrial haplotypes between taxa, we sequenced the first part of the cytochrome oxidase subunit 1 (COI) gene (655 bp) for 175 individuals; 14 polatzeki and 161 teydea (see Additional file 1). We used the primer pair PasserF1 and PasserR1 [54].

Eight Z-chromosome introns (ALDOB-7, BRM-15, CHDZ-15, CHDZ-18, PTCH-6, VLDLR-7, VLDLR-8, VLDLR-12; [54]) were sequenced and screened for variation between the two taxa. Two loci (PTCH-6 and VLDLR-7) were found to have polymorphic sites and were sequenced for respectively 44 and 48 individuals [see Additional file 1]. The primer sequences for all introns are given by Borge et al. [55].

For the COI marker and the eight Z-chromosome introns, the PCR reaction volumes were 12.5 μL, containing 0.5 mM dNTPs, 0.3 U Platinum Taq DNA Polymerase (Invitrogen), 1x buffer solution (20 mM Tris-HCl, 50 mM KCl; Invitrogen), 2.5 mM MgCl2, 0.1 μM forward and reverse primer and 2 μL DNA extract. The reactions were carried out under the following conditions: 2 min at 94 °C, 35 cycles of [30 s at 94 °C, 30 s at 50 °C (PTCH-6), 55 °C (COI) or 57 °C (VLDLR-7), and 45 s at 72 °C], and a final extension period of 10 min at 72 °C. Cycle-sequencing reactions were carried out using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), and the sequencing products were run on an ABI Prism 3130xl Genetic Analyzer (Applied Biosystems). The COI fragment was sequenced in both directions, whereas Z-introns were only sequenced with forward primers. Sequences were proofread in CodonCode Aligner v3.7.1 (CodonCode Corporation) and aligned in MEGA5.1 [56].

Divergences in mtDNA between the taxa were visualized in haplotype networks using the PopArt software [57]. All sequence data with their voucher information and Genbank accession numbers are given in Additional file 1.

Plumage colour measurements

We measured spectral reflectance of adult male plumage from five patches, i.e., back, crown, upper breast, rump and wing bar (median covert) on 15 specimens (10 teydea, 5 polatzeki) at the American Museum of Natural History (for voucher information see Additional file 2). Unfortunately, we were not able to measure the black band on the lower forehead because of poor feather structure in the study skins. Spectral reflectance was measured with an Ocean Optics USB2000 reflectance spectrophotometer, a PX-2 pulsed xenon light source (Ocean Optics, Dunedin, Florida) and a fiber-optic probe equipped with a ‘probe pointer’ to ensure measurements were taken at a constant distance and angle from the specimen. The following settings were used: integration time = 20 ms, spectra average = 40, with a multiple strobe setting. All measurements were calibrated against a Spectralon white standard (Labsphere, North Sutton, New Hampshire) and a dark standard (no light). Five measurements were taken from the same spot on each plumage patch. The spectrometer was re-calibrated using the dark and white standards after every second plumage patch was measured (after ten measurements).

Raw reflectance spectra were imported in five-nanometer (nm) bins between 300 and 700 nm using CLR: Colour Analysis Programs v1.05 [58]. Brightness, chroma and hue colour variables [59] were calculated in CLR v1.05 using the formulae most appropriate for the slaty greyish-blue plumage of Blue Chaffinches [58]. These were B1 for brightness = \( {\sum}_{\lambda =320}^{700}{R}_{\lambda } \) (i.e., reflectance summed over 320–700 nm), S5a for chroma = \( {S}_5=\sqrt{{\left({B}_r-{B}_g\right)}^2+{\left({B}_y-{B}_b\right)}^2} \) and H4a for hue = arctan ([(By − Bb)/B1] / [(Br − Bg)/B1]). In both H4a and S5a, b (blue) = 400–475 nm, g (green) = 475–550 nm, y (yellow) = 550–625 nm, r (red) = 625–700 nm. We visually screened for outliers and mis-measurements by comparing the five reflectance curves taken for each individual at each plumage patch using Excel (2010 Microsoft Corporation). One mis-measurement was removed from the dataset (AMNH-788194, F. t. teydea, back measure #2). Mean spectral reflectance curves and colour variables were then calculated from the independent measures taken for each individual using JMP 11 (SAS Inc., Cary, NC).
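As an illustration of how the three colour variables relate to a binned spectrum, the following minimal Python sketch computes brightness, chroma and hue from the formulae stated above. It is not the CLR program itself: the function names (`band_sum`, `colour_variables`), the dictionary input format and the treatment of band boundaries (each band includes its lower limit and excludes its upper limit, except at 700 nm) are assumptions made for the example, and the flat test spectrum is invented.

```python
import math


def band_sum(spectrum, lo, hi, include_hi=False):
    """Sum reflectance over wavelengths in [lo, hi), or [lo, hi] if include_hi.

    The handling of band boundaries is an assumption of this sketch.
    """
    return sum(r for wl, r in spectrum.items()
               if lo <= wl < hi or (include_hi and wl == hi))


def colour_variables(spectrum):
    """Return brightness (B1), chroma (S5) and hue (H4) for one spectrum.

    spectrum: dict mapping wavelength in nm to reflectance.
    """
    b1 = band_sum(spectrum, 320, 700, include_hi=True)  # total brightness
    bb = band_sum(spectrum, 400, 475)                   # blue band
    bg = band_sum(spectrum, 475, 550)                   # green band
    by = band_sum(spectrum, 550, 625)                   # yellow band
    br = band_sum(spectrum, 625, 700, include_hi=True)  # red band
    chroma = math.sqrt((br - bg) ** 2 + (by - bb) ** 2)
    denom = (br - bg) / b1
    # Hue is undefined when the red and green bands are equal.
    hue = math.atan(((by - bb) / b1) / denom) if denom != 0 else float("nan")
    return {"B1": b1, "S5": chroma, "H4": hue}


# Invented example: a flat spectrum in 5 nm bins from 300 to 700 nm.
flat = {wl: 10.0 for wl in range(300, 701, 5)}
result = colour_variables(flat)
```

Because B1 appears in both the numerator and the denominator of the hue formula, it cancels; it is kept here only to mirror the expression as written.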

Principal component analysis (PCA) was used to condense the reflectance spectra (81 measures between 300 and 700 nm) into two principal components (PCs). PC1 explained the majority of the variation in all five plumage patches (percent variation: 86–92 %; eigenvalue: 70–75). PC2 explained only a small fraction of the variation at the five plumage patches (percent variation: 6–8.5 %; eigenvalue: 4.8–6.8). We used two-tailed t-tests assuming equal variances to test for differences in brightness, chroma, hue and reflectance PCs of male plumage between teydea and polatzeki. All statistics were performed in JMP 11 (SAS Inc., Cary, NC).

Song analyses

Songs were analyzed using Luscinia (https://github.com/rflachlan/Luscinia/). First, the repertoire size of individuals (within the sample recorded) was determined by visual inspection of spectrograms. Song types were highly stereotypic within an individual’s repertoire, leading us to have confidence in this method (note: the sample size of songs recorded per male was not sufficient to have confidence that all song types in each male’s repertoire had been recorded). An exemplar was chosen for each song-type in each male’s repertoire, and was measured using Luscinia. Spectrogram settings were as follows: frame length – 5 ms; time step 1 ms; maximum frequency – 10 kHz; dynamic range – variable between 40–50 dB depending on recording quality; dereverberation parameter – 100 %. A high-pass filter was applied before spectrograms were created with a threshold of 1.0 kHz. Measurement involved identifying trajectories of acoustic parameters for each element within the song, and classification of elements into repeated syllables and phrases. These methods have been described in previous studies, e.g., [60].

Next, songs were compared using Luscinia’s implementation of the dynamic time warping algorithm (DTW). This algorithm searches for an optimal alignment of acoustic features between syllables, and then allows pairs of syllables to be compared by measuring Euclidean distances along this alignment. The DTW method takes into account all of the considerable and variable frequency modulation within syllables and in so doing allows a more holistic comparison of syllable structure than methods based on a small number of structural parameters (maximum frequency, syllable length, etc.). Of particular relevance for this study, Luscinia’s DTW implementation has been successfully applied to a large dataset of songs recorded from the congeneric Common Chaffinch in which detailed descriptions of population divergence were found [60].
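The general idea of dynamic time warping can be sketched in a few lines of Python. This is a generic textbook version, not Luscinia's implementation: it omits the parameter weightings, compression, warp limits and syllable stitching used in the actual analysis, and the function names and toy input are hypothetical. Each syllable is assumed to be a sequence of feature vectors (one per time step).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(a, b):
    """Minimal cumulative Euclidean cost of aligning sequences a and b.

    Classic dynamic-programming DTW: every frame of a is matched to at
    least one frame of b (and vice versa), and the alignment is monotonic
    in time.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = euclidean(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Toy example: the same frequency contour, sung at two different tempos.
slow = [(0.0,), (1.0,), (1.0,), (2.0,)]
fast = [(0.0,), (1.0,), (2.0,)]
same_shape = dtw_distance(slow, fast)
```

In the toy example the two contours differ only in timing, so the warped alignment absorbs the difference; a frame-by-frame comparison without warping would not.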

A key step in carrying out the DTW comparison is to normalize the various acoustic features relative to each other. In this analysis, the parameter weightings were as follows: Time: 10; Fundamental Frequency: 2.755; Fundamental Frequency Change: 8.850; Vibrato Amplitude: 0.338; all others: 0. These values are the inverse of standard deviations across the Common Chaffinch database for the chosen parameters except for time, which is normalized in a different way (see [60] for further explanation). Compression factor was set to 0.2 (with a minimum element length of 10), Time SD was set to 1, a syllable repetition weighting of 0.2 was applied, and the maximum warp was set to 100 %. The weight by relative amplitude, log transform frequencies, interpolate in time warping, and dynamic warping options were selected. The comparison used the Stitch syllables method with 5 alignment points.

The DTW algorithm generated a dissimilarity matrix between syllables that was converted to dissimilarity matrices between songs (using the DTW method of integrating syllable dissimilarities) and between individuals (using the option of finding the best match between songs within individual repertoires). These dissimilarities formed the basis of further analysis.

First, we carried out a UPGMA clustering analysis of songs and individuals from the two Blue Chaffinch populations. We also clustered songs using the PAM k-medoid clustering algorithm and calculated the Global Silhouette Index for each k value as a way of searching for natural clusters in the data.

Next, we quantified divergence between populations. To do this we first calculated the spatial median of each population (based on an NMDS ordination of the data, using 20 dimensions). We then measured the acoustic distance between each song or individual data point and its own population’s spatial median, as well as that of the other population. To quantify a measure of divergence, \( {d}_{AB} \), between the two populations, A and B, we then calculated:

\( {d}_{AB}=\frac{\sum_{i=1}^{n_A}\left({d}_{i,{S}_B}-{d}_{i,{S}_A}\right)}{n_A}+\frac{\sum_{j=1}^{n_B}\left({d}_{j,{S}_A}-{d}_{j,{S}_B}\right)}{n_B} \)

\( {S}_A \) and \( {S}_B \) are the spatial medians of individuals or songs for populations A and B (with sample sizes \( {n}_A \) and \( {n}_B \), respectively), and \( {d}_{i,{S}_A} \) and \( {d}_{i,{S}_B} \) represent the acoustic dissimilarity between a data point (individual or song) i and the spatial median for population A and B, respectively. The metric therefore quantifies the degree to which songs were closer to the spatial median of their own population rather than to the spatial median of the other population.
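The divergence measure can be sketched in Python as follows, reading it as the mean excess distance of each data point to the other population's spatial median over the distance to its own, summed across the two populations. The function names and the numerical example are hypothetical, and the distances to the spatial medians are assumed to have been computed beforehand (the NMDS ordination and spatial-median steps are not shown).

```python
def mean_excess(own, other):
    """Mean of (distance to other median - distance to own median)."""
    if not own or len(own) != len(other):
        raise ValueError("need two non-empty lists of equal length")
    return sum(o - w for w, o in zip(own, other)) / len(own)


def divergence(a_to_sa, a_to_sb, b_to_sa, b_to_sb):
    """Divergence between populations A and B.

    a_to_sa, a_to_sb: distances of each A data point to the spatial
                      medians of A and B, in the same order.
    b_to_sa, b_to_sb: the same for each B data point.
    """
    return mean_excess(a_to_sa, a_to_sb) + mean_excess(b_to_sb, b_to_sa)


# Invented distances: two data points in A, one in B.
d_ab = divergence(a_to_sa=[1.0, 1.0], a_to_sb=[3.0, 5.0],
                  b_to_sa=[2.5], b_to_sb=[2.0])
```

Positive values indicate that data points lie closer, on average, to their own population's spatial median; values near zero indicate no separation.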

Local populations with no genetic differentiation may nevertheless be differentiated in song structure as a consequence of cultural divergence. Because cultural evolution is believed to occur at a much faster rate than genetic evolution, populations may become culturally differentiated in the face of high levels of gene flow between populations. It is important therefore to interpret levels of divergence in the context of how other genetically differentiated and undifferentiated populations have diverged. In this case, we used a large-scale analysis of Common Chaffinch, previously compared using the same DTW methods [60] and analyzed whether the differences seen between the two Blue Chaffinch populations matched those found between other Common Chaffinch populations.

Sperm morphometrics

A small aliquot of approximately 15 μl of the formaldehyde/sperm solution was applied onto a microscope slide and allowed to air-dry before inspection, digital imaging and measurement under light microscopy. We took digital images of spermatozoa at magnifications of 200× or 400×, using a Leica DFC420 camera mounted on a Leica DM6000 B digital light microscope. The morphometric measurements were conducted using Leica Application Suite (version 2.6.0 R1). We measured the length of head (i.e., acrosome and nucleus), the midpiece and the tail (i.e., the midpiece-free end of the flagellum) of ten intact spermatozoa per male, except for three males with very few measurable sperm (one teydea with 4 sperm, and two polatzeki with 2 and 1 sperm, respectively). Total sperm length was calculated as the sum of head, midpiece and tail length. We have previously shown that sperm length measurements have very low measurement error and high repeatability [61, 62]. In this study, sperm were measured by two different persons (TL and ES) who differed consistently in the way they measured sperm components, but were in close agreement over the measure of total sperm length (i.e., the sum of components). We therefore used their combined measurements for sperm total length, but only the measurements from one of them (TL) for the analyses of component lengths. Standardized values of intra- and intermale variation in sperm total length were expressed as the coefficient of variation (CV = SD/mean × 100). The intermale variation in mean sperm length (CVbm) is negatively correlated with the frequency of extrapair paternity across passerine birds [26] and is thus an indicator of the level of sperm competition. For this measure we applied a correction factor for variation in sample size (N), viz. CV = SD/mean × 100 × (1 + 1/(4N)), as recommended by Sokal and Rohlf [63].
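The coefficient of variation and its sample-size correction can be written out as a short Python sketch. The function name and the example values are hypothetical, and the use of the sample standard deviation (N − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def cv_percent(values, small_sample_correction=False):
    """Coefficient of variation in percent: SD / mean * 100.

    With small_sample_correction=True the result is multiplied by
    (1 + 1/(4N)), where N is the number of values.
    Assumes the sample standard deviation (N - 1 denominator).
    """
    n = len(values)
    cv = statistics.stdev(values) / statistics.mean(values) * 100
    if small_sample_correction:
        cv *= 1 + 1 / (4 * n)
    return cv


# Invented mean sperm lengths (in μm) for three males.
lengths = [90.0, 100.0, 110.0]
raw_cv = cv_percent(lengths)
corrected_cv = cv_percent(lengths, small_sample_correction=True)
```

The correction matters most for small samples: with N = 3 it inflates the estimate by about 8 %, whereas with N = 25 the factor is only 1.01.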

Estimation of phenotypic divergence

The quantitative estimation of phenotypic divergence was expressed by Cohen’s d [64] as recommended by Tobias et al. [20]. We also adapted the taxonomic scoring system proposed by Tobias et al. [20] for the relevant subset of recommended characters. It includes two biometric variables (the largest positive and the largest negative divergence), two acoustic variables (strongest temporal and spectral character divergences), and the three strongest plumage characters. Effect sizes were converted to a taxonomic score of 0–4; e.g., Cohen’s d in the range of 0.2–2 gives a score of 1, d in the range of 2–5 gives a score of 2, d in the range of 5–10 gives a score of 3, and d > 10 gives a score of 4. Scores are then summed for all variables, and a total score of 7 or above qualifies for species status.
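The conversion from effect sizes to a taxonomic score can be sketched in Python as below. This is an illustration of the thresholds quoted above, not the authors' or Tobias et al.'s code: the function names are hypothetical, Cohen's d is computed with a pooled standard deviation (an assumption about the estimator), and the assignment of values falling exactly on a boundary (2, 5 or 10) is an arbitrary choice of this sketch.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d for two samples, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


def taxonomic_score(d):
    """Map an effect size to a score of 0-4 (sign of d is ignored)."""
    size = abs(d)
    if size < 0.2:
        return 0
    if size < 2:
        return 1
    if size < 5:
        return 2
    if size <= 10:
        return 3
    return 4


def qualifies_for_species_rank(effect_sizes, threshold=7):
    """True if the summed scores reach the threshold (7 in the text)."""
    return sum(taxonomic_score(d) for d in effect_sizes) >= threshold


# Invented measurements for one trait in two taxa.
d_example = cohens_d([10.0, 12.0, 14.0], [4.0, 6.0, 8.0])
```

For example, three characters with effect sizes of 3.0, 6.0 and 2.5 would score 2 + 3 + 2 = 7 and thus reach the threshold.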

Results

Genetic divergences

The mitogenome sequencing in the two Ion PGM runs yielded 706,008 and 743,375 reads, respectively. In total, 37.3 % of reads were filtered out as polyclonal and 15.9 % as low-quality reads. Sample-specific information regarding total number of reads, mapped reads, coverage and uniformity is provided in Table 1.

Table 1 Sequence read quality of the assembled mitogenomes of the teydea (N = 21) and polatzeki (N = 4) Blue Chaffinches

The mitogenomes contained 16,784–16,786 nucleotides. Gene annotation analyses revealed 13 protein-coding genes, 2 rRNAs and 22 tRNAs. The gene arrangement followed the standard avian mitochondrial genome [65].

We found altogether 12 haplotypes among the 25 sequenced birds; nine among the 21 teydea and three among the 4 polatzeki. The haplotypes clustered in two distinct groups corresponding to the two taxa, as shown in a median-joining haplotype network (Fig. 3) with the mitogenome of a Common Chaffinch included as an outgroup. Sequence divergence between the two taxa averaged 2.3 % across the entire mitogenome (Table 2). The table also shows the divergence for each gene region for nucleotides and amino acids, respectively.

Fig. 3

A median-joining haplotype network of the mitogenomes of ssp. teydea (blue, N = 21) and ssp. polatzeki (red, N = 4) of the Blue Chaffinch Fringilla teydea, and the sister species Common Chaffinch Fringilla coelebs (yellow, N = 1). Numbers indicate mutation steps between haplotypes (single mutation steps not indicated). Specimen information and their respective GenBank accession numbers are available in Additional file 1

Table 2 Estimates of sequence divergence across the mitogenomes between teydea (N = 21) and polatzeki (N = 4) Blue Chaffinches

To test for the possible introgression of mitochondrial haplotypes, we sequenced a larger number of presumably unrelated individuals of the two taxa for the standard DNA barcode region, i.e., the first part of the COI gene [66]. We found four haplotypes among 161 teydea and two haplotypes among 14 polatzeki. As with the full mitogenomes, the COI sequences clustered perfectly into two groups concordant with the two taxa (Fig. 4). Hence, there was no evidence of mitochondrial mismatch in this large sample. We are therefore confident that the two taxa are reciprocally monophyletic in their mtDNA.

Fig. 4

A median-joining haplotype network of the 655 bp COI sequences (i.e., the barcode marker) of ssp. teydea (blue, N = 161) and ssp. polatzeki (red, N = 14) of the Blue Chaffinch Fringilla teydea, and the sister species Common Chaffinch Fringilla coelebs (yellow, N = 1). Numbers indicate mutation steps between haplotypes (single mutation steps not indicated). Specimen information and their respective GenBank accession numbers are available in Additional file 1

We also found evidence of reciprocal monophyly in two Z-chromosome introns, PTCH-6 (559 bp) and VLDLR-7 (586 bp). There was one fixed point mutation in PTCH-6 between 24 polatzeki and 20 teydea sequences, and two fixed point mutations in VLDLR-7 between 26 polatzeki and 22 teydea sequences (Table 3). All three were G – A transitions. There were no other polymorphic sites in these two introns, or in any of the six other Z-linked introns sequenced from the two subspecies (see Additional file 1).

Table 3 Fixed nucleotide substitutions between teydea and polatzeki Blue Chaffinches in two Z-chromosome introns

Biometrics

Since both subspecies are sexually size dimorphic in adults, we analysed the biometric measurements separately for each sex (Table 4). The nominate subspecies was significantly larger than polatzeki for all traits in both sexes, except for female tarsus length (Table 4). The difference was especially large for wing length and bill length, for which Cohen’s d > 2 in both sexes (Table 4). A principal component analysis based on wing length, tail length and bill length clustered adult males in two distinct groups corresponding to the two subspecies (Fig. 5a, Table 5). For females, a similar tendency was found, though with some overlap (Fig. 5b, Table 5). We can therefore conclude that among adult males, the two subspecies have diagnostic, non-overlapping body-size distributions.

Table 4 Sex-specific morphological divergences between the teydea and polatzeki Blue Chaffinches
Fig. 5

Principal component analyses of the body size divergence between ssp. teydea and ssp. polatzeki of the Blue Chaffinch Fringilla teydea. The analyses included wing length, tail length and bill length in (a) adult males (N = 73) and (b) adult females (N = 56). For statistics, see Table 5.

Table 5 Factor loadings of three body size variables on the first two principal components in Principal Component Analyses of a) male and b) female teydea and polatzeki Blue Chaffinches, and their eigenvalues, percentage of variance explained, and F-statistics

Plumage

Pictures of study skins from AMNH are shown in Fig. 6, and average reflectance spectra for the five body parts are shown in Fig. 7. For three of the body parts (crown, back and rump), teydea males had significantly higher brightness and PC1 scores (which to a large extent reflects brightness) than polatzeki males (Table 6). In contrast, the wing bar of the median coverts was significantly brighter and showed a higher chroma in polatzeki males than in teydea males. The effect on our estimate of chroma was due to a steeper slope in the UV/blue parts of the spectrum for polatzeki males (Fig. 7). There was no difference in the reflectance of the upper breast. In summary, teydea males display a brighter blue plumage in most body parts, whereas polatzeki display lighter wing bars.

Fig. 6

Plumage differences between the two taxa of Blue Chaffinches; (a) lateral view of five polatzeki (left) and eight teydea (right), (b) dorsal view of the same birds, (c) abdominal view of two polatzeki (left) and two teydea (right) specimens. Note the brighter wing bars, smaller body size and more whitish belly in polatzeki. The specimens were photographed in the collection of the American Museum of Natural History, New York [see Additional file 2]

Fig. 7

Average reflectance (+ SE) from five plumage regions (a: crown, b: back, c: rump, d: upper breast, e: wing bar) of male teydea (blue lines, N = 10) and male polatzeki (red lines, N = 5) skins in the AMNH collection [Additional file 2]. Spectra were averaged for five scans of each plumage region

Table 6 Plumage colour differences between polatzeki (N = 5) and teydea (N = 10) Blue Chaffinches

Song

The male territorial song is a strophe about 2 s long in both taxa. Its first section consists of one or two descending phrases of soft, disyllabic syllables in polatzeki, and of a more constant-pitch series of harder, monosyllabic syllables in teydea. The second section of the song consists of unrepeated syllables (or syllables only repeated once) with a buzzy, ‘vibrato’ characteristic. They are markedly softer or subdued in polatzeki, and more like a crescendo in teydea. These differences are notable in the sonograms in Fig. 1.

When we clustered songs (Fig. 8a) and individual repertoires (Fig. 8b) using the UPGMA algorithm, the two populations were clearly separated with no exceptions in either case. Similarly, k-medoid clusterings with songs and individuals classified both according to their population with 100 % accuracy with k = 2. Notably, both types of analysis represented unsupervised clustering analyses of the data (unlike, e.g., DFA), suggesting a clear divergence between the two populations. Furthermore, the Global Silhouette Index for songs (the only data set with sufficient sample size) showed a clear peak with k = 2 (Fig. 9), suggesting that a natural partition of the data set was into two clusters.

Fig. 8

Dendrograms of (a) songs and (b) individual repertoires from Blue Chaffinch populations in Gran Canaria (polatzeki) and Tenerife (teydea). The dendrograms were calculated using the UPGMA clustering algorithm from a dissimilarity matrix generated by a DTW analysis

Fig. 9

Global Silhouette Index values for different clustering solutions produced by the k-medoids clustering algorithm applied to songs of polatzeki and teydea. The GSI has a value greater than 0 when data are clustered more than expected by chance. Higher values represent a greater clustering tendency. The GSI tends to produce higher values with smaller values of k, so we corrected the GSI by comparing its output with simulated data-sets. The peak with k = 2 corresponds to the division of songs between polatzeki and teydea
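As a rough illustration of how a silhouette-type index rewards a natural partition, the following minimal sketch computes the plain mean silhouette width from a dissimilarity matrix and a set of cluster labels. It is not the corrected GSI used here: the correction against simulated data sets is omitted, the function name is hypothetical, and the toy matrix stands in for the DTW dissimilarities.

```python
from statistics import mean


def mean_silhouette(dist, labels):
    """Mean silhouette width for a symmetric dissimilarity matrix and cluster labels."""
    n = len(labels)
    widths = []
    for i in range(n):
        # Group the dissimilarities from item i to every other item by cluster.
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: width is 0 by convention.
            widths.append(0.0)
            continue
        a = mean(own)                                    # mean distance within own cluster
        b = min(mean(v) for v in by_cluster.values())    # mean distance to nearest other cluster
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(widths)


# Toy dissimilarities: items 0-1 are close, items 2-3 are close, the pairs are far apart.
toy = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
]
good = mean_silhouette(toy, [0, 0, 1, 1])   # partition matching the structure
bad = mean_silhouette(toy, [0, 1, 0, 1])    # partition cutting across it
print(good, bad)
```

In this toy case the partition that follows the structure gives a positive value and the one that cuts across it gives a negative value, mirroring the interpretation of values above 0 given in the caption.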

The divergence score between polatzeki and teydea was larger than that found between any pair of Common Chaffinch populations (Table 7). The Blue Chaffinch populations both had lower divergence scores with at least one Common Chaffinch population than they did with each other (Table 7).

Table 7 Estimates of pairwise population divergence in song structure between populations of Common Chaffinches and Blue Chaffinches

These results demonstrate that Blue Chaffinch songs have diverged considerably between polatzeki and teydea. This can be illustrated in a Neighbor-Joining phylogram of all the populations included in the above analyses, based on inter-population differences in song structure (Fig. 10). This analysis successfully reconstructed much of the known topology of the relationships between these populations [60], suggesting a high phylogenetic signal for song structure. The only unusually placed population is Madeira (ssp. maderensis), which might have been predicted to be found in the Azorean – Canarian clade. In this phylogeny, the Blue Chaffinches connect near the root of the Atlantic Island and European Common Chaffinch populations. Moreover, and strikingly, the two Blue Chaffinch populations show substantial differentiation from one another.

Fig. 10

Neighbor-Joining Phylogram showing evolutionary relationships between Common Chaffinch and Blue Chaffinch populations on the basis of song structure. For Common Chaffinch populations, only island/region is indicated

Sperm

Spermatozoa were significantly longer in polatzeki than teydea males (Fig. 11, Table 8), i.e., opposite to the contrast seen in body size dimensions. The difference was mostly explained by the length of the midpiece, which made up about 88 % of the sperm total length in both taxa (Table 8). Sperm heads were also significantly longer in polatzeki than in teydea. The variation in sperm total length within males, as expressed by the CVwm index, was exceptionally low in both taxa; i.e., close to 1 % (Table 8). Likewise, the variation among males in mean sperm total length (CVbm) was low in both taxa, and variances did not differ significantly between them (Table 8). The low CVbm values indicate a relatively high risk of sperm competition, and yield estimates of 39 % and 44 % extrapair young in broods of teydea and polatzeki, respectively, using the linear regression equation given in Lifjeld et al. [26] for passerine species.
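The two variation indices can be sketched as follows, assuming that CVwm is the mean of the per-male coefficients of variation and that CVbm is the coefficient of variation of the per-male mean lengths. Any small-sample correction and the regression equation from [26] are not reproduced here, and the sperm lengths are made up for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Made-up total sperm lengths (micrometres) for three hypothetical males;
# not data from this study.
sperm_by_male = {
    "male_1": [250.0, 252.0, 254.0],
    "male_2": [255.0, 257.0, 259.0],
    "male_3": [260.0, 262.0, 264.0],
}

# Within-male variation (assumed definition): CV per male, averaged over males.
cv_wm = mean(coefficient_of_variation(v) for v in sperm_by_male.values())

# Between-male variation (assumed definition): CV of the per-male mean lengths.
cv_bm = coefficient_of_variation([mean(v) for v in sperm_by_male.values()])

print(f"CVwm = {cv_wm:.2f} %, CVbm = {cv_bm:.2f} %")
```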

Fig. 11

Frequency distributions of total sperm length in the two Blue Chaffinch taxa teydea (blue) and polatzeki (red). Fitted normal curves are indicated

Table 8 Sperm morphometrics of ssp. teydea and ssp. polatzeki of the Blue Chaffinch Fringilla teydea

Taxonomic scoring

We adopted the standardized taxonomic scoring system proposed by Tobias et al. [20] for species delimitation, and the scores are given in Table 9. For biometrics, the scoring system only allows for the largest increase and the strongest decrease among multiple variables, but since teydea is the larger in all characters, we could only select one. The largest effect size was estimated for bill length of males (Table 4), which scores as a medium divergence (score = 2). For scoring the acoustics, we used the quantitative measurements of temporal and spectral variables presented in Sangster et al. [37]; Table 4. The stable pitch in the first phrase of the teydea song versus the gradually decreasing pitch in polatzeki qualifies for a score of 3, whereas the increased amplitude of phrase 2 (crescendo) in teydea and reduced amplitude in polatzeki, gives a score of 2 (Table 9). We scored two diagnostic plumage traits, i.e., male body plumage colouration and male wing bars, as medium divergence. The male teydea has a brighter plumage, which gives a more bluish impression than the more greyish male polatzeki. The two wing bars formed by the tips of the median and greater coverts are white and sharply edged in the male polatzeki, resembling the conspicuous wing bars in the Common Chaffinch, whereas in the male teydea the wing bars are light bluish-grey with much less contrast. The black band on the lower forehead is thicker and more distinct in male polatzeki than in male teydea, and was scored as minor divergence (score 1; Table 9). The total score obtained was 11, which is above the threshold of 7 for species assignment recommended by Tobias et al. [20].
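The mapping from effect size to score can be sketched as below. The bands (0.2 to 2 minor, 2 to 5 medium, 5 to 10 major, above 10 exceptional) are an assumption about the system in [20] and should be checked against that source. The function names are hypothetical, and treating a total equal to 7 as sufficient is likewise an assumption.

```python
def tobias_score(effect_size):
    """Map an absolute Cohen's d onto a 0-4 divergence score (bands are assumed)."""
    d = abs(effect_size)
    if d > 10:
        return 4  # exceptional
    if d >= 5:
        return 3  # major
    if d >= 2:
        return 2  # medium
    if d >= 0.2:
        return 1  # minor
    return 0


SPECIES_THRESHOLD = 7  # threshold for species assignment quoted in the text


def reaches_species_level(total_score):
    """True when a summed divergence score meets the assumed species threshold."""
    return total_score >= SPECIES_THRESHOLD


print(tobias_score(2.5))          # an effect size just above 2 falls in the medium band
print(reaches_species_level(11))  # the total score reported in the text
```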

Table 9 Quantitative scores for species delimitation of the two Blue Chaffinch taxa teydea (Tenerife) and polatzeki (Gran Canaria), following the scoring system of Tobias et al. [20]

Discussion

Modern taxonomy emphasizes an integrative approach in species delimitation, which means that multiple trait divergences should be assessed in a comparative framework [67–69]. This is the approach we have followed here. The Blue Chaffinches on Tenerife (teydea) and Gran Canaria (polatzeki) are clearly divergent in multiple traits. The distinct differences in male plumage and size formed the basis for their original description as two separate taxa more than a hundred years ago [70]; see also [34]. Here we have analysed these characters more quantitatively and show that males are diagnostically different in colour reflectance curves in multiple plumage characters and in body size dimensions. We have also shown that the two Blue Chaffinch taxa are reciprocally monophyletic in mitochondrial and nuclear DNA, and are divergent in sexual traits, like song and sperm, that are presumably important traits in mate choice and fertilization success.

In a recent paper, Sangster et al. [37] indicated a qualitatively similar divergence in male plumage characters and documented non-overlapping body size distributions in adult males, using a PCA approach similar to the one we have presented here. They also analysed song and calls, and found strong divergences between the two taxa. Moreover, they showed in a playback experiment that polatzeki males responded more aggressively to polatzeki songs than to teydea songs. Our two studies are therefore congruent in demonstrating multiple character divergences in the two taxa. When we employed the quantitative scoring system for multiple phenotypic characters proposed by Tobias et al. [20], which specifically considers effect sizes in biometrics, plumage and vocalizations, we found that the divergence score (= 11) for teydea and polatzeki exceeded the threshold (= 7) set for the species level.

The quantitative scoring system proposed by Tobias et al. [20] is designed to infer a meaningful assessment of reproductive isolation for allopatric taxa, which cannot be tested directly. This is a key criterion under the biological species concept. Species are seen as hypotheses in which divergences are tested against a reference material drawn from a number of sympatric or indisputably good sister species. The system is a helpful guide, but it also has some shortcomings. First of all, it does not include genetic divergences, which are essential for unravelling a species’ evolutionary history and uniqueness. Second, the emphasis on biometry, plumage and vocalization may not always be sufficient or relevant to predict reproductive isolation, and their relative weights in the scoring system may seem somewhat subjective. In the following we therefore discuss these aspects in more detail for the case of the Blue Chaffinch.

The multiple trait divergences observed between the two island populations of the Blue Chaffinch are the results of a long history of allopatric evolution. Their mitogenomes show an overall genetic distance of 2.3 % with some variation among the various genes (Table 2). Previously, Suarez et al. [39] reported a genetic distance of 2.3 % for the cytochrome b gene between teydea and polatzeki, which is similar to our value for the same gene (Table 2). In birds, mitochondrial genes seem to evolve in a clock-like manner, with around a 2 % sequence divergence per million years as estimated for the cyt b gene across multiple avian orders and with multiple calibration points over the past 12 million years [71]. Accordingly, the two populations may have evolved independently for as long as one million years. Our COI sequencing of a fairly high number of individuals revealed no cases of introgression. Hence, their mitogenomes seem to be completely sorted and there is no evidence of recent or past gene flow between the two populations. It must be noted that mitochondrial DNA is inherited only through the maternal line, so the lack of mitochondrial gene flow suggests no female dispersal. However, we also found fixed mutations in nuclear DNA, i.e., in two Z-chromosome introns (Table 3), which suggests that there are two distinct nuclear gene pools with no evidence of gene flow in either sex. Nuclear genes generally sort more slowly and have longer coalescence times [72]. Six other Z-chromosome introns showed no sequence divergence, which should be attributed to incomplete lineage sorting. We have also shown in a previous study [40] that there are significant differences in allele size ranges for several microsatellite DNA markers.
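The clock reasoning above can be written as a rough worked estimate. This assumes that the rate of about 2 % per million years refers to pairwise sequence divergence, so that the time since separation is simply the observed distance divided by the rate. The symbols t, d and r are introduced here only for illustration.

```latex
t \approx \frac{d}{r} = \frac{2.3\,\%}{2\,\%\ \text{per Myr}} \approx 1.15\ \text{Myr}
```

This is of the same order as the one million years of independent evolution suggested above, bearing in mind the uncertainty in the rate.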

Even though species limits cannot be inferred directly from sequence divergence in mtDNA, it is interesting to note that there are many avian sister species with shorter genetic distances than what we see between the two subspecies of the Blue Chaffinch. A survey of the DNA barcode database BOLD [73] reveals that even within the same family (Fringillidae), there are sister species with a divergence less than 2 % in the genera Carduelis, Spinus, Acanthis, Pyrrhula and Loxia [74, 75]. Some of them, e.g., Acanthis and Loxia, are not even monophyletic, and their species taxonomy may be doubtful [76], but others are undisputedly good species.

Many subspecies on oceanic islands are distinct evolutionary units [77] and should undergo taxonomic revision. In the Canary Islands, the Blue Chaffinch is not unique in showing strong evolutionary divergence among islands. There is a lot of endemism in the fauna and flora of the Canary Islands [78], and among birds there are several distinct island-specific taxa with subspecies status. In particular, several forest-dwelling passerines show distinct differentiation between the islands of Tenerife and Gran Canaria, for example the Afrocanarian Blue Tit Cyanistes teneriffae [79, 80], the Common Chaffinch [39] and the European Robin Erithacus rubecula [81]. Their genetic distances (cytochrome b) between Tenerife and Gran Canaria populations range from 1.0 % in the Common Chaffinch [39] to 3.7 % in the European Robin [81]. These taxa should undergo further taxonomic assessment with respect to their species limits.

The habitat of the Blue Chaffinch is the forest of the Canary Pine, and pine seeds constitute a major food resource [41, 42]. The pine forest on Tenerife has not been as extensively logged as the one on Gran Canaria, and generally contains older and larger trees, with larger cones and seeds, than the reduced and mostly replanted forest on Gran Canaria [34]. The larger bill and body size of teydea on Tenerife compared with polatzeki on Gran Canaria therefore possibly reflect adaptations to different environments. It is an interesting parallel to the differentiation in bill and body size seen in Darwin’s finches on the Galápagos, which is interpreted as adaptation to different seed sizes [82]. However, species limits in Darwin’s finches is a highly contentious issue because bill and body size seem to evolve fast in these birds without any strong evidence of reproductive isolation [83]. It has therefore been argued that differentiation in these phenotypic characters reflects local adaptations within the same species rather than reproductive isolation between species. The same argument may apply to the Blue Chaffinch case. The differentiation in biometrics observed between teydea and polatzeki may be of rather recent origin due to human-induced habitat differences, and may not be a good indicator of reproductive isolation.

Traits that are more directly involved in reproduction, like sexual characters used in mate choice and mating competition, and gametic traits with a function in fertilization, may be more suitable as indicators of reproductive isolation. Elaborate plumage colour and feather ornaments, and display behaviours like song in passerine birds, are generally believed to evolve by sexual selection [84] and thereby function as premating isolation mechanisms [7].

Sperm traits, on the other hand, have a reproductive function after copulation and may represent a postmating or postcopulatory isolation mechanism. Our reasoning is that divergent sperm traits may be particularly important in passerine birds with sperm competition, because postcopulatory processes may be relatively more important for individual fitness in these systems. Across bird species, sperm length is positively correlated with the length of female sperm storage tubules [85, 86]. It has been demonstrated in insects that the form of the female reproductive tract drives the evolution of sperm morphology [87]. This is consistent with the idea that sperm length in a given species is adapted to its specific female environment. Nevertheless, there is also considerable variation in sperm length among males within the same population [60, 88], and this variation has a strong genetic basis [89]. Typically, species with large intraspecific variation in sperm length are characterized by little or no sperm competition, whereas species with a narrow sperm length variation have high levels of sperm competition [26, 28, 48, 90]. All of this suggests that sperm competition acts as an evolutionary force of stabilizing or purifying selection that reduces variation in sperm length among males in the population. That is, sperm competition should favour sperm lengths close to the population mean, and disfavour those that are particularly long or short. It further implies that two populations with divergent sperm length distributions may be reproductively incompatible under sperm competition. Hence, sperm traits might predict the presence of a postcopulatory prezygotic barrier [91, 92], or “competitive gametic isolation” (sensu Coyne & Orr [4]), and thus be taxonomically informative in delimitation of allopatric species that exhibit a mating system with sperm competition. 
It must be emphasized that the role of sperm competition as a selection force on sperm size variation in passerine birds is at present only a hypothesis, which needs empirical testing.

Conclusions

We conclude that the two Blue Chaffinch subspecies fulfil all the major criteria for species delimitation, and should therefore be assigned species status: the Tenerife Blue Chaffinch Fringilla teydea and the Gran Canaria Blue Chaffinch Fringilla polatzeki. This is in agreement with the recommendation by Sangster et al. [37]. The population on Gran Canaria is critically endangered with just a few hundred birds left in the wild [45]. Recognition of the population as a full species will presumably increase attention to its conservation as one of the most critically endangered birds in Europe [37]. However, we emphasize that conservation arguments should have no weight in the taxonomic assessments of species limits, and they have not affected our analyses and recommendation here either.