Disentangling the taxonomic status and phylogeographic structure of Marmora’s (Curruca sarda) and Balearic Warbler (Curruca balearica): a genetic multi-marker approach

Marmora’s Warbler (Curruca sarda) and Balearic Warbler (C. balearica) are allopatric sibling species and were recently split mostly based on morphological and ethological characteristics. Here we provide the first phylogenetic and phylogeographic analyses of this species complex to support the taxonomic status of C. sarda and C. balearica in light of integrative taxonomy. We sampled the two taxa in most of their breeding ranges and we sequenced three mitochondrial and one nuclear gene region. All C. balearica individuals had private haplotypes for the four markers and formed monophyletic clades. Genetic distances between the two taxa were comparable with those found between other species belonging to the Curruca genus. Furthermore, most of the genetic variance was expressed at the interspecific level, rather than between different populations within taxa or between individuals within populations. Our results strongly support the current taxonomic status of these two warblers as distinct species.


Introduction
The Mediterranean basin is one of the world's best explored areas in terms of ornithological research and it is considered a main biodiversity hotspot for the Western Palearctic (Myers et al. 2000). In fact, it acted as a center of origin and diversification for many taxa (e. g. Perktas et al. 2011;Pellegrino et al. 2015;Kakhki et al. 2018) due to glaciations and salinity crisis events (Taberlet et al. 1998). Since the advent of the Linnean classification, taxonomic research on Mediterranean bird species has usually been based on the analysis of morphological differences among populations. Such traditional approach might, however, have led to underestimating the actual degree of diversification occurring between isolated populations across the Mediterranean region. This divergence can be better evaluated in terms of genetic divergence and the development of reproductive isolation barriers, which in oscine birds are usually mediated by vocalizations and niche differentiation (Mason et al. 2017).
Recent studies on several bird taxa traditionally regarded as single 'biological' species have revealed the existence of morphologically similar (or nearly so) populations which are so highly diverging in terms of genetics and vocalizations that definitely belong to separate taxa (e.g. Curruca subalpina, cantillans and iberiae, Brambilla et al. 2008;Zuccon et al. 2020; Muscicapa tyrrhenica, Pons et al. 2016). This kind of research has recently led to similar results not only in birds but also in many other groups of vertebrates, including amphibians (e.g. North Italian tree frog Hyla sp., Dufresnes et al. 2018;Pelodytes caucasicus-ibericus, Díaz-Rodriguez et al. 2017), reptiles (e.g. Tarentola mauritanica complex, Rato et al. 2016) and mammals (e.g. Myotis nattereri complex, Juste et al. 2018; Sciurus meridionalis, Wauters et al. 2017), and to the identification of species and subspecies in invertebrate taxa, such as endangered crustaceans (e.g. Bernini et al. 2015) or insects (e.g. Galimberti et al. 2021). In this context, the application of advanced and integrative analytical methods is needed to shed light on the species complexes, but it is also useful in cases of apparently known and widespread taxa (Ancillotto et al. 2020;Zuccon et al. 2020;Pellegrino et al., 2017). The Mediterranean area is a biodiversity hotspot, but it is under threat from a high population density and the consequent human pressure on ecosystems, leading to biodiversity loss (Cutteload et al. 2008). On the islands, the Last Glacial Maximum, the distance from the continent and from other similar habitats, size and elevation (Steinbauer et al. 2013) modelled biodiversity and promoted phenotypic and genotypic differentiation, especially in animals with low dispersal abilities (such as reptilia, Escoriza 2020) but also in birds (Covas and Blondel, 1998). The Marmora's Warbler (Curruca sarda, Aves: Passeriformes: Sylviidae, formerly Sylvia sarda) is a Mediterranean endemic species with an insular distribution: it inhabits Sardinia, Corsica, Balearic Islands excluding Menorca and some smaller islands alongside Western Italy in the Tyrrhenian Sea. According to the 'traditional' taxonomy, it has been treated as a polytypic species with two allopatric subspecies, slightly differing in terms of plumage and bareparts colouration: Sylvia sarda balearica (von Jordans, 1914) endemic to Balearic Islands, and the nominate S. s. sarda (Temminck, 1820), inhabiting the rest of the breeding range. Whereas a difference in vocalizations between the two subspecies was noted by von Jordans (1914), their taxonomic status had not been questioned until the early 2000s. Shirihai et al. (2001) proposed to elevate S. s. sarda and S. s. balearica to the rank of allospecies within the superspecies S. [sarda]. They based their taxonomic hypothesis on genetics, analyses of song, calls and morphometric divergence patterns, and secondarily, on plumage characters (less marked in juveniles and adult females) and bare-parts colouration. However, to date, no published genetic evidence about this taxonomic splitting and about the genetic structure of this species complex across the Mediterranean region has been provided. Voelker and Light (2011), in a study about the influence of Palaeoclimatic events on the biogeographic distribution of Sylvia warblers, showed a phylogeny of the Sylvia genus based on ND2 (NADH dehydrogenase 2) and cytb (cytochrome b) markers in which they a priori treated S. sarda and S. balearica as full species. On the basis of these results, Dickinson and Christidis (2014), Sangster et al. (2016), del Hoyo et al. (2016), and Shirihai and Svensson (2018) accepted them as two distinct species. Moreover, Dickinson and Christidis (2014) and more recently, Gill et al. (2020) recommended treating Marmora's and Balearic warblers as belonging to the Curruca rather than the Sylvia genus.
It seems, therefore, clear that a complete phylogeographic analysis is needed to definitely support the splitting of the two taxa and consequently to design appropriate management and conservation plans. Here, we aim to fill this knowledge gap in order to clarify the taxonomic status of the Curruca [sarda] species complex, integrating the already available information about morphology and vocalization with multi-marker genetic data and their geographical structuring.

Field methods
Sample collection was carried out during 2013-2015 between May and June (which correspond to the peak of the breeding season; Brichetti and Fracasso 2010). Sampling sites covered 26 locations scattered within the breeding range of C. sarda and C. balearica, on the islands of Mallorca (Spain), Corsica (France), Sardinia and Elba (Italy) (Fig. 1). In order to obtain a sampling scheme balanced both over the whole breeding range of the species and within each island, we collected samples from localities as far apart as possible. Birds were trapped by mist-nets and biological samples were obtained by picking one or more feathers which were stored in 96% ethanol at − 20 °C until laboratory analysis. The birds were released immediately after sampling.

DNA extraction, amplification and sequencing
Total DNA was extracted from feathers using the DNeasy Blood and Tissue Kit (Qiagen), following the manufacturer's instructions, with the addition of 20 μl DTT (dithiothreitol) to the initial incubation step at 56 °C to accelerate the enzymatic digestion of membranes and protein structures. To examine the genetic differentiation and relationships within the species complex, we amplified and sequenced 2026 base pairs (bp) from three gene regions of the mitochondrial (mt) DNA and a nuclear (n) gene region, in order to verify potential mito-nuclear discordance phenomena, which may reflect incomplete lineage sorting or other kinds of processes which may not be highlighted by mtDNA alone (Ghorbani et al.  Ottenburghs et al. 2020). Specifically, we sequenced the second half of the mitochondrial cytochrome b (cytb) (537 bp, with primers 648L and 1137H, Brambilla et al. 2008) and the nuclear Glyceraldehyde-3-phosphate dehydrogenase (G3PDH) intron 11, spanning 358 bp, 274 bp of intron 11 and 84 bp of flanking exons, (primers G3P13b and G3P14b, sequenced with primers G3PIntl1 and G3P14b, Irestedt et al. 2002) for all the sampled individuals. Furthermore, in order to improve phylogenetic resolution between taxa, we also sequenced a fragment of 828 bp of the mitochondrial ND2 (with primers L5215, Hackett 1996and H1064, Drovetski 2004) and a fragment of 661 bp of cytochrome c oxidase (COI; primers BirdF and BirdR1, Hebert et al. 2004) for a subsample of individuals representing the investigated populations (Supporting Information Appendix 1). Amplifications were carried out in 10-μl reactions using 2 μl of template DNA solution, 0.80 μl 10 × Taq buffer with 15 mM MgCl 2 (5Prime, Gaithersburg, MD, USA), 0.80 μl of 0.2% BSA, 0.15 μl of each 0.2 mM primer, 0.40 μl of 2.5 mM dNTP, 0.2 U of Taq polymerase (5Prime, Gaithersburg, MD, USA), with the following thermal profile: 94 °C for 2 min; 45 cycles at 94 °C for 30 s, Ta°C for 30 s, 72 °C for 30 s; 72 °C for 10 min; Ta was 55 °C for cytb and ND2 and 48 °C for COI. PCR temperature profile for the amplification of G3PDH consisted of the following steps: 94 °C for 2 min; 35 cycles at 94 °C for 35 s, 57 °C for 35 s, 72 °C for 50 s; 72 °C for 10 min. Amplicons were purified by enzymatic digestion of excess of primers with ExoSap-IT (Affymetrix, Santa Clara, CA) and the samples were bi-directionally sequenced using Big-Dye terminators (Thermo Fisher Scientific, Waltham, MA, USA) in an ABI Prism 3100 XL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

Phylogenetic analyses
Sequences were manually proofread and corrected in Seqscape v2.5 (Applied Biosystems, Foster City, CA, USA) and aligned with ClustalW in MEGA7 (Kumar et al. 2016) with default parameters. In order to exclude the amplification of nuclear pseudogenes of mitochondrial origin (NUMTs) we verified the absence of stop codons by translation of all sequences with the EMBL-EBI Emboss Transeq tool (https:// www. ebi. ac. uk/ Tools/ st/ emboss_ trans eq/) with the subsequent options: frame = F (Forward three frames), codon table = "Vertebrate Mitochondrial". Since no G3PDH sequences contained insertions/deletions or double heterozygotes, genotype phasing was straightforward and carried out using Phase 2.1 implemented in DnaSP v5 (Librado & Rozas 2009) without ambiguity. All sequences were trimmed in Bioedit (Hall 1999) and collapsed to unique haplotypes using DnaSP v5 (Librado and Rozas 2009). The unrooted minimum spanning networks were obtained using the median-joining algorithm (Bandelt et al. 1999) implemented in PopART (http:// popart. otago. ac. nz). All pairwise distances, within and among groups, were calculated using the p-distance method in MEGA 7; standard error estimates were obtained by a bootstrap procedure (1000 replicates). Estimators of molecular diversity, i. e. haplotype number, haplotype diversity (Hd), average number of nucleotide differences (k) and nucleotide diversity (π), Tajima's D (Tajima 1989) and Fu and Li's D* and F* (Fu and Li 1993) tests for neutrality were computed for the analysed genetic markers as implemented in the software DnaSP v5 for the entire dataset and for the single islands.
The identification of the best nucleotide substitution model for each dataset was carried out in jModelTest 2 (Darriba et al. 2012), among all the 203 possible models available (integrating the estimate of nucleotide frequencies, the proportion of invariable sites and a gamma model with 4 categories), using all the possible evaluation criteria (AICc, BIC, DT). The base tree was ML optimized and the base tree search parameter was set on "best". jModelTest identified as the best fit models for cytb, ND2, COI mitochondrial haplotypes, respectively, HKY + I, TPM1uf + I, GTR + I.
The Bayesian Inference trees were generated using MrBayes 3.2.6 (Ronquist et al. 2012) with the selected models for each gene and the concatenated dataset; 3 cold and 1 hot Metropolis-coupled Markov chain Monte Carlo chains were run for 2*10 7 generations and sampled every 1000 generations, with default parameters. Two runs with different starting points were carried out. The first 10% of generations were discarded as burn-in, and the posterior probability (PP) values were calculated for the remaining 1.5*10 7 generations or more until the average standard deviation of split frequency was below 0.01. Calculated trees were drawn using FigTree v1.4.4 (http:// tree. bio. ed. ac. uk/ softw are/ figtr ee). We added, to the mtDNA datasets, one or more sequences to be used as an outgroup (Curruca undata sampled on Mallorca during the ringing campaign of this study and COI GenBank ID KP780415; Sylvia atricapilla GenBank ID AM889140, Curruca communis GenBank ID GU571643; Curruca nisoria GenBank ID GU571647). The concatenated dataset was obtained with only mitochondrial haplotypes due to the very different evolutionary history and relatively rapid mutation/ substitution rates that the nuclear and mitochondrial genes have in animal taxa (Brown et al. 1979).
In the main text, we only show the tree originated from the concatenated mtDNA sequences (Fig. 3); individual gene trees are shown in the online Supporting Information Appendix 3.
Analysis of Molecular Variance (AMOVA) was conducted using Arlequin v3.5 (Excoffier and Lischer 2010) to examine the structure of molecular variation. AMOVA calculates Φ statistics, which are analogous to F-statistics, in order to assess how much of the molecular variation is explained at different levels of structuring. Φ CT values indicate the amount of variation explained among groups, Φ SC values indicate the amount of variation explained among populations within groups and Φ ST values indicate the amount of variation explained within populations. In the analyses we defined the two taxa as "groups" and the four islands as "populations" (Mallorca for C. balearica; Sardinia, Corsica and Elba for C. sarda).

Results
We sampled 62 individuals from 27 sites distributed in four different regions (13 individuals from Mallorca, 4 from Elba island, 31 from Sardinia and 14 from Corsica), from which we obtained a total of 191 sequences (Supporting Information Appendix 1). Genetic diversity indices calculated for the four markers are reported in Table 1. All mtDNA sequences were functionally translated without premature terminations, with no evidence of NUMTs.
Median Joining haplotype networks (Fig. 2) identified five unique haplotypes from cytb sequences, 12 from ND2 and seven from G3PDH, while only two haplotypes were found for COI (Table 1, Supporting Information Appendix 1). Haplotype sequences were deposited in GenBank (refer to Supporting Information Appendix 2 for accession numbers). No shared haplotypes were observed between the two species for both mitochondrial and nuclear markers. Haplotype cytb-3 (cytb marker) and nd2-4 (ND2 marker) were the most frequent, as they were found in 29 and 31 individuals, respectively (Fig. 2). Haplotype g3pdh-2 was the most frequent for G3PDH marker with 42 occurrences in C. sarda. All these haplotypes were shared by the three C. sarda populations. All C. balearica samples were homozygotes for a private allele, g3pdh-1, while the remaining haplotypes were shared by individuals sampled in at least two of the other islands (Sardinia, Corsica and Elba), except for g3pdh-4, which occurred only in Sardinian birds (Fig. 2). We identified eight haplotypes for the concatenated alignment, three found in Mallorca individuals and five in Elba, Corsica, and Sardinia individuals (Fig. 2). Concatenated mtDNA haplotypes showed 98 parsimony informative sites.
We estimated a haplotype diversity ranging between 0.154 and 0.758; the lowest value was detected in C. balearica Table 1 Genetic diversity indices for mitochondrial and nuclear markers for Curruca balearica from Mallorca and three populations of C. sarda: number of individual (haploid) sequences, number of haplotypes, nucleotide diversity (π) ± standard deviation, haplotype diversity (Hd) ± standard deviation, and average number of nucleotide differences (k), Tajima  from Mallorca (cytb marker), while the largest value was detected in C. sarda from Corsica (nuclear marker). Except for these two extreme values, the other haplotypes' diversities ranged between 0.378 and 0.603. Nucleotide diversity ranged from 0.00029 (SD = 0.00023) in Mallorca (cytb marker) to 0.00306 (SD = 0.00038) in Corsica (G3PDH marker) ( Table 1). The analysis of COI sequences showed a marked difference among the two taxa and lack of genetic diversity within populations. We found only two haplotypes: one for C. balearica (Mallorca population haplotype coi-1) and one for C. sarda (Elba, Sardinia and Corsica, haplotype coi-2) differing for 36 mutations. Tajima's D and Fu and Li's D* and F* tests showed positive but not significant values, except for the ND2 sequences of Sardinian populations, which showed negative and significant values; genetic diversity tests did not detect any bottleneck or directional selection events (Table 1). A Bayesian tree, computed with the concatenated mitochondrial sequence alignment (cytb, COI and ND2, 2026 bp long; 8 haplotypes; 23 individuals) identified two clusters corresponding to the two species (Fig. 3): one clade included all the individuals of C. balearica sampled in Mallorca, and the other one included all the individuals from Sardinia, Corsica and Elba. These structure and clade separations were also confirmed by the Bayesian trees obtained using each dataset independently (cytb, ND2 and COI; Supporting Information Appendix 3).

Fig. 2
Median-joining networks of ND2, G3PDH, cytb and COI haplotypes of Curruca balearica and C. sarda (see Supporting Information Appendix 1 and 2). Each circle represents a haplotype and circle size is proportional to haplotype frequency. Colours indicate different sampling islands. Dashes represent substitutions and when > 10, the number of substitutions is also reported within brackets  Table 2. Distances within populations were mainly low, ranging from 0 (no divergence in COI within any population) to 0.004 (Elba population for G3PHD), whereas pairwise comparisons between the two species showed higher values, ranging from 0.001 for cytb to 0.069 for ND2. The degrees of divergence between C. sarda populations (Sardinia, Corsica and Elba) were comparable with their respective within-population variation levels.
AMOVA results for each mtDNA marker and the concatenated dataset are reported in Supplementary Information Appendix 4. Analyses at both loci revealed that most part of the molecular variation (ranging from 90.48% to 100% for mtDNA markers and about 70% for the nuclear G3PDH) was attributable to differences between the two species. The lowest variation occurred among different populations within each taxon.
Hierarchical structure levels were significant at α = 0.05. In particular, group-level structuration always obtained a p-value close to the theoretical minimum p-value.

Discussion
Our results indicate a clearly defined genetic differentiation between the two taxa previously suggested to be distinct species, C. balearica and C. sarda. All C. balearica individuals possessed private haplotypes for each analysed marker, whereas C. sarda individuals from distinct populations largely shared the remaining haplotypes. The existence of private alleles was indicative of a prolonged isolation and a degree of differentiation which is coherent with the recently proposed two-species taxonomic suggestion. AMOVA again highlighted that most of the molecular divergence between individuals of the C.
[sarda] complex is mainly due to the substructure in two different blocks, coinciding with the two currently recognized species. The divergence between individuals inhabiting different islands of the Mediterranean (Sardinia, Corsica, and Elba) was very low and far lower than the average differentiation level between individuals inhabiting any single island. The amount of divergence between C. sarda and C. balearica is quite high for populations separated by only a few hundreds of kilometres and greater than the divergence between other sister species within the Curruca genus (e.g. C. crassirostris and C. hortensis; Voelker and Light 2011). The genetic distance calculated between C. sarda and C. undata on COI marker (4.75%) was similar to that detected between C. sarda and C. balearica (5.8%), while the genetic distance between Dartford Warbler C. undata and C. balearica (3.51%) showed smaller values; this evidence supports previous conclusions that C. undata and C. balearica would be regarded as sister species (Voelker and Light 2011). Similar results were found by Cai et al. (2019) in a multi-locus phylogeny of babblers, confirming the paraphyly of C. sarda as a traditionally defined polytypic species including C. balearica as a subspecies and reinforcing the need for treating C. sarda and C. balearica as distinct species. Overall, the huge genetic differences found between C. sarda and C. balearica suggest a long-term interruption of gene flow between their populations; a similar pattern was found by Pons et al. (2016) between continental and insular populations of Muscicapa (M. striata vs M. tyhrrenica) and, in a striking parallel with our case study, between Muscicapa tyrrhenica thyrrenica (inhabiting Corsica and Sardinia) and M. t. balearica (Balearic Islands); in both these cases the Mediterranean islands seem to have played an important role in taxa isolation and differentiation.
Our results clearly indicate the existence of two different and independent genetic characters (nuclear and mitochondrial markers) which allow to distinguish between C. sarda and C. balearica individuals; the two taxa may, therefore, be considered "full" species when strictly applying the original Phylogenetic Species Concept (Cracraft 1983). Moreover, these genetic differences are supported by divergences in vocalizations, biometry, plumage and bare-parts colouration (detailed in Shirihai et al. 2001). The existence of multiple independent traits allowing the recognition of two or more taxa (or Operational Taxonomic Units) is considered a solid criterion of species delimitation according to the paradigm of "integrative taxonomy" (Padial et al. 2010;Schlick-Steiner et al. 2010;Galimberti et al. 2012). On the whole, the present study improves our knowledge on two Mediterranean birds, allowing to recognize two taxa as two distinct species. It would be worth exploring, in the near future, other interesting biological attributes of the two species, in particular ecological needs and possible competition with other warbler species within the Mediterranean region. Moreover, detailed monitoring and estimates of population size of these species endemic to Mediterranean islands are needed to assess their conservation status and evaluate potential threats for their long-term survival. Indeed, small and isolated island populations may be at immediate risk of local extinction arising from stochastic environmental events and the adverse genetic effects deriving from inbreeding (as observed in different avian species, e.g. Pertoldi et al. 2012;Andersen et al. 2017;Kangas et al. 2018).
Since several Mediterranean ecosystems are at risk because of land-use change, urbanization, fragmentation and recurrent fires, it is essential to consider shared and island-specific threats to these endemic species characterized by such a very restricted distribution. In particular, C. balearica is sedentary and endemic to the Balearic Islands, excluding Menorca; recently, the impinging touristic development has been identified as a cause of disturbance and decline for birds and other vertebrate populations on these islands (Krawczyk et al. 2019). Recognizing C. balearica as a distinct species should call for special protection measures for this separate conservation unit. C. sarda has also a rather restricted range limited to Sardinia, Corsica and some little islands off Western Italy (plus small wintering areas in north Africa and Sicily; Shirihai et al. 2001) and is potentially vulnerable to the same threats affecting C. balearica. Moreover, both species could be subject to the negative effects of climate changes because of their restricted range and mostly insular distribution (Doswald et al. 2009).
Romolo Caniglia, Andrea Galimberti and Chiara Mengoni performed genetic and data analyses; Ettore Randi, Diego Rubolini, Fernando Spina, Gabriel Gargallo and Mattia Brambilla were involved in planning, organization, logistics and theoretical validation of the study. All authors collaborated to the manuscript drafting.
Funding Open access funding provided by Università degli Studi del Piemonte Orientale Amedeo Avogrado within the CRUI-CARE Agreement. This work was part of the Ph. D. of Davide Nespoli, funded by a bursary of University of Milan. Genetic analyses were funded by BIO-CGE ISPRA and "Ricerca locale" to Irene Pellegrino. Availability of data and material All genetic sequences are available on Genbank NCBI. Supplementary data are available through the publisher's website.

Conflicts of interest
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.