Introduction

Speciation is a fundamental biological process that involves the evolution of reproductive isolation between diverging populations. Whilst the general processes of geographical isolation, genetic divergence and evolution of post-mating reproductive isolation characterising allopatric speciation are better known1, those involved in sympatric speciation, the emergence and divergence of two gene pools within a panmictic population, are still poorly understood2,3. Theoretical models have shown that this requires genes of local adaptations to become associated with those of mate choice4,5,6,7. In the face of ongoing gene flow, this association is thought to occur only under a restricted set of genetic and environmental conditions. Under some circumstances, strong divergent natural selection might negate the effect of gene flow at very small spatial scales and could be the primary driver of speciation1,8. Features of genomes such as chromosomal inversions, peri-centromeric regions and other regions characterized by reduced recombination can also facilitate the association of local adaptation and assortative mating loci2,5,6,7. In addition, sex chromosomes are thought to promote the rapid accumulation of pre- and post-mating isolation genes due to their hemizygosity and lower recombination rate9. Currently, teasing out the genomic signature of the onset of speciation from the genomic processes that follow the establishment of intrinsic post-mating reproductive isolation constitutes a major challenge and there are few natural model systems that allow it2,10,11.

The Anopheles gambiae complex comprises a number of important vectors of human malaria in Africa that are separated by various degrees of reproductive isolation12,13. Amongst its different cryptic taxa, two sibling species, An. coluzzii and An. gambiae s.s., formerly known as M and S molecular forms14 are undergoing speciation with gene flow and may provide the ideal model system for studying the genomic signature of pre-mating isolation independent of intrinsic post-mating isolation speciation processes15. The two sibling species interbreed freely in the laboratory and do not exhibit intrinsic post-mating barriers to reproduction in the form of hybrid inviability or sterility16,17. In studies based on natural sympatric populations from Central and Eastern West Africa, hybrids were typically found to be uncommon or rare18,19,20. A longitudinal study conducted over a period of two decades in Mali shows that the reproductive isolation between M and S is unstable. A strong assortative mating that sustained the maintenance of the two populations, is periodically disrupted by episodes of hybridization, indicating temporal variation in hybridization rates21 due to the decreases of reproductive isolation followed by selections against hybrids. However, surveys undertaken in coastal areas West Africa in Senegal and The Gambiae have uncovered hybridization zones where hybrids frequencies between An. coluzzii and An. gambiae s.s. as high as 24% can be observed22,23,24.

In the absence of intrinsic post-mating reproductive barriers between the sibling species, their genetic identity is thought to be maintained through the combined effects of strong assortative mating, as evidenced by studies conducted in Mali and Burkina Faso18,25, and through extrinsic post-mating barriers to reproduction in the form of decreased hybrid fitness26. Whilst the later has yet to be demonstrated experimentally, evidence of ecological divergence between the two sibling species has accumulated, suggesting that hybrids could incur fitness costs in a variety of ways. The two sub-taxa have been shown to differ in preferred larval breeding sites27,28 and ability to detect and avoid aquatic predators. Entomological surveys and laboratory experiments have also shown that Sub-Saharan29,30,31 populations of An. coluzzii exhibit higher tolerance to desiccation stress and dominate vector populations in drier seasons and habitats32,33,34. In the same regions, the sibling species also differ in their aestivation strategies35,36 affecting their survival and/or migration strategy during the dry season. However, despite these differences, the sibling species co-exist in vast areas of West Africa and are characterized by similar levels of anthropophily and endophily.

Both sibling species mate in mating aggregations called swarms that are initiated at dusk usually within villages37. However, they tend to differ in the height at which they form swarms and the type of ground markers they use37. Such swarm spatial segregation has been observed in Burkina Faso and Mali and may play a major role in assortative mating between the sibling species38,39,40. However, mixed swarms are sometimes found at low frequencies41,42,43,44 suggesting that short-range mate recognition mechanisms must also be involved in assortative mating40,45. One such mechanism could be flight-tones harmonic convergence46,47 whilst the possible role of contact pheromones remains to be adequately explored40,48.

That hybridization results in gene flow between the two sibling species in spite of strong assortative mating has been supported by genome-wide population genetic studies first based on Short Tandem Repeat (STRs) markers49,50 and next on Single Nucleotide Polymorphisms (SNPs). The former revealed a mosaic pattern of genetic differentiation with most of the genome lacking differentiation because of ongoing gene flow and limited areas of the genome seemingly protected from recombination49,50; the later identified those regions more precisely through their higher marker density51,52. These revealed three highly genetically-differentiated regions located near the centromeres of chromosomes X, 2L and 3L suggesting that sympatric speciation in these incipient species probably involved the divergence of such ‘islands of speciation' possibly containing clusters of speciation genes and protected from gene flow through recombination suppression51,52. As of today, these three regions are the only regions that have been detected by high-density genome scans across the whole sympatric range of the two sibling species21,24,53,54. That current hybridization plays a role in the speciation process is further supported by recent studies that used Divergent Island SNPs (DIS) genotyping to distinguish heterozygous F1 hybrids from the recombinant genotypes of F1+n backcrosses in sympatric populations55. Studies based on DIS confirmed the occurrence of varying levels of hybridization across sympatric populations of the sibling species translating into varying degrees of introgression56. As an example, a recent introgression event occurred in Burkina Faso and Mali in which the entire 2L speciation island passed from An. gambiae s.s. into An. coluzzii through hybridization and selection resulting in the transfer of important pesticide resistance loci between the two species53,55,57. Interestingly, the DIS data also suggests that these populations are now regaining their specific X, 2L and 3L pericentric regions, highlighting the fact that selection does act against recombinant genotypes. Selection against F1+n backcrosses would also explain how pericentromeric islands of speciation remain significantly associated in the hybrid zones of Western coastal West Africa despite high levels of gene flow22,23,24.

Whist the extent of genetic differentiation in other areas of the genome and how much of it is due to selection for local adaptation is currently debated, the X, 2L and 3L speciation islands seem to have played and still appear to be playing a crucial role in the genomic structure of speciation in these species. The close association between the X speciation island and pre-mating isolation genes was recently experimentally demonstrated by swapping the X-island of An. coluzzii with that of An. gambiae s.s. through multigenerational selective introgression15. Females from the resulting recombinant strains differing only at their X-chromosome island strictly mated with males which had matching island type15. Because assortative mating was female driven and occurred in small cages, it highlighted a short-range mechanism, possibly involving male-expressed specific cues and female choice, and independent of the process of swarm spatial segregation.

Given the importance of spatial swarm segregation for assortative mating in natural sympatric populations of the sibling species, one would expect genes for swarming site preference to be also associated with one of the pericentromeric islands of divergence. In this study, for the first time, this association was formally tested using thousands of males and females from An. coluzzii and An. gambiae s.s. directly collected from natural swarms in sympatric populations from Burkina Faso. DIS56 was then used to genotype all individuals. Thus, in contrast to what is commonly done, and to avoid potential bias due to circularity, An. coluzzii and An. gambiae swarms were described using individual DIS genotypes rather than rely solely on one species diagnostic based on a X-linked locus44. Thereafter, the association between the genotypes of non-hybrid individuals and 1st generation (F1) and 1 + nth generation hybrid males and females (F1+n) at the X, 2L and 3L islands were associated with swarm spatial segregation. The results demonstrate the close association between the X-island and spatial swarm segregation initiated by males. In addition, there was a strong but significantly weaker association with the 3L island. The association between swarm type and the 2L island was broken down by recent adaptive introgression from An. gambiae s.s. to An. coluzzii. These results lend further support to an island-of-speciation mode of sympatric speciation in which core assortative mating and divergent ecological adaptation genes are genetically linked and protected from gene flow through pericentric recombination suppression.

Results

DIS genotyping and genotypic composition of swarms

A total of 2063 males from a total of 106 swarms was collected in Bana, Soumousso and VK7. They were genotyped for DIS and characterized as ‘pure’ An. coluzzii and An. gambiae, and F1 and F1+n hybrids (Table 1). F1 hybrid males were very rare, with only 2 found over the course of the study (0.1%), whilst F1+n hybrids were extremely abundant and made for 62.3% (N = 1285) of individuals. Hybrids and backcrosses were assigned to An. coluzzii and An. gambiae based on the majority of their DIS alleles—i.e. An. coluzzii or An. gambiae dominated the genetic background. Next, the swarms were described as monospecific when composed entirely of An. coluzzii and An. gambiae males, or mixed if composed of males assigned as both An. coluzzii and An. gambiae. The resulting swarm genetic structure indicated that, despite often being composed of many recombinant males, 103 (97.2%) swarms were monospecific and only 3 (2.8%) swarms mixed (Table 1).

Table 1 Number and percentage of An. coluzzii and An. gambiae, and F1 and F1+n hybrid males from monospecific and mixed swarms.

Hybrid frequencies in the sibling species

Male F1+n hybrids were much more abundant in An. coluzzii swarms, where they made up 94.8% of individuals. An. gambiae s.s., made up only 4.6% and were characterized by recombinant DIS genotypes (Likelihood-ratio Chi-square: χ2 = 1878.3, df = 1, n = 2021, P < 0.001). The frequency of An. coluzzii recombinant genotypes significantly increased over the course of the study in VK7 and Soumousso but this increase was significantly sharper in Soumousso where the starting frequency of recombinant was only 16.7% in 2007 but reached 93.3% in 2011 (Logistic regression: n = 1228, Location: χ2 = 0.001, df = 1, P = ns; Year: χ2 = 57.4, df = 1, P < 0.001; interaction: χ2 = 4.9, df = 1, P = 0.027) (Table 1). The two F1 hybrid males differed in the ancestry of their maternal X chromosome alleles. One was captured in 2008 in Soumousso and resulted from an An. gambiae female cross-mating with an An. coluzzi male (Fig. 1). The other hybrid male was caught in VK7 in 2011 and resulted from an An. coluzzii female mating with an An. gambiae male (Fig. 2).

Figure 1
figure 1

DIS map of males and females Anopheles coluzzii and An. gambiae collected from swarms in the village of Somousso in Burkina Faso—DIS genotypes, specific to An. coluzzii (light blue) and An. gambiae (dark blue) are shown; herozygous DIS genotypes are in yellow—SNPs nearest to centromeres are in red lettering. One F1 hybrid male with An. gambiae maternal X-island was identified in 2008 (red star) and 2 males captured in 2007 were characterized by an An. coluzzii X-islands but An. gambiae 2L and 3L islands (yellow stars).

Figure 2
figure 2

DIS map of males and females Anopheles coluzzii and An. gambiae collected from swarms in the village of VK7 in Burkina Faso—DIS genotypes specific to An. coluzzii (light blue) and An. gambiae (dark blue) are shown; heterozygous DIS genotypes are in yellow—SNPs nearest to centromeres are in red lettering. One F1 hybrid male with An. coluzzii maternal X-island was identified in 2011 (red star).

Amongst the 266 females collected from mating pairs, similar patterns of hybridizations were observed than in males, albeit no F1 hybrid were captured (Table 2). F1+n backcrossed genotypes were very common (72.9%). They made up 99% of all females An. coluzzii but only 4.4% of An. gambiae s.s. females (Likelihood-ratio Chi-square: χ2 = 257.5, df = 1, n = 260, P < 0.001). Unfortunately, no females were captured from the comparatively rare An. coluzzii swarms from Soumousso.

Table 2 Number and percentage of An. coluzzii and An. gambiae, and F1 and F1+n hybrid females from monospecific and mixed swarms.

Recombinant genotypes and islands of divergence

The majority of An. coluzzii and An. gambiae male recombinants (83.7 and 93.9% respectively) and female recombinants (72.8 and 100%) captured in swarms had DIS genotypes compatible with the introgression of one or several entire islands (heterozygous at all SNPs within a chromosome). The remainder of individuals had more complex genotypes with evidence of recombination within heterospecific introgressed islands (mixture of homozygous and heterozygous loci within 1 or several islands) (Figs. 1, 2, Supplementary Tables S1S2). In An. coluzzii males and females, backcrossed hybrid genotypes at islands of divergence mostly occurred through introgression of the 2L island, with only a single male recombinant for the 3L pericentric region. In An. gambiae, only 4 males and no females were recombinant on 2L, but 29 males and 3 females were recombinant for 3L (Figs. 1, 2, Supplementary Tables S1S2). Importantly, recombination amongst loci within the X chromosome pericentric island was never observed. Interestingly, 2 (0.1%) males captured in Soumousso in 2007 were characterized by an An. coluzzii X-islands but not An. gambiae 2L and 3L islands. According to our allelic majority rule, these males were therefore considered as An. gambiae backcrosses (Supplementary Tables S1S2). No females heterozygous for the X-island were found.

Association between swarm type and divergence islands

An examination was made of the association between swarm spatial segregation and the X, 2L and 3L speciation islands, first in males who initiate swarms, and then in females (Fig. 3). The relative proportions of pericentric islands genotypes matching that of the swarm they were captured from were compared among chromosomes by Logistic Regressions. The analyses were conducted for each sex, first on homozygous non-recombinants individuals and then recombinant individuals characterized by introgression of one or several entire heterospecific islands (Fig. 3, Table 3). Next, examination included individuals with evidence of recombination within heterospecific islands which had no bearing on the results giving the low numbers of such genotypes.

Figure 3
figure 3

Association between swarm type and divergence islands—Relative proportions of pericentric islands genotypes matching that of the swarm they were captured from for each chromosome in males (A) and females (B). Sample sizes are indicated and vertical whiskers show 95% confidence intervals. P-values are: P < 0.001***, P < 0.01**, P < 0.05*.

Table 3 Logistic regressions testing the effect of chromosome on the strength of association between pericentromeric island genotypes and swarm type.

The analyses revealed that An. coluzzii and An. gambiae male and females strictly associated with swarms that matched their X island genotype. This simple rule was the simplest and most parsimonious explanation for the distribution of island genotypes among monospecific swarms. It accounted for the spatial swarm distribution of 100% of An. coluzzii males and females, as well as 100% of An. gambiae females and 99.7% of An. gambiae males (Fig. 3). Further analyses revealed that one of the two males characterized by An. gambiae 2L and 3L genotypes had an An. coluzzii-like X chromosome (Supplementary Table S1) and, had the rDNA ITS marker locus typical of An. gambiae. Because this marker is located closer to the X centromere than the DIS markers, this indicated a rare recombination event within the X-chromosome island. The other male was An. coluzzii-like at the rDNA marker and was therefore the only exception to the rule.

Despite the high level of introgression between the sibling species, a near perfect association between the 3L and swarm specificity was found in An. coluzzii. The strength of association between genotype and swarm type did not significantly differ between the X and 3L islands in this species (Table 3), suggesting that the X and 3L islands are potentially strongly genetically associated. This contrasted with An. gambiae, where introgressed An. coluzzii islands characterized 2.1% (31/1446) and 3.0% (4/136) of male and female 3L chromosomes resulting in a significantly weaker association between swarm type and the 3L island (Likelihood Ratio test on odds ratio: P = 0.012) than the association with the X island (Fig. 3, Table 3). Finally, high levels of introgression of the 2L divergence island from An. gambiae to An. coluzzii obliterated any association between 2L and swarm type in the later species. However, in An. gambiae, the 2L was as strongly associated with swarm type as the X-island, with only a single male detected heterozygous for the 2L pericentromeric genomic area (Fig. 3, Table 3, Supplementary Table S1).

The DIS genotypic composition of the 3 mixed swarms was also examined to verify that swarm segregation was again associated with the X-chromosome. However, the data from mixed swarms was largely uninformative because all 3 mixed swarms were dominated by An. coluzzii males homozygous across all 3 islands, with very few recombinants (Supplementary Table S3). Rare An. gambiae males and females captured in these An. coluzzii swarms were all homozygous at all DIS loci (Supplementary Table S3).

Recombinant frequencies at the larval, indoor resting and swarming stages

The hypothesis that male and female recombinants at the divergent island loci might be present at the larval and/or indoor resting adult stages but would not contribute to swarms was tested by comparing their frequency for all three types of collections made in the villages VK7 and Soumousso (Supplementary Tables S4S7). The overall frequencies of introgressed X, 2L and 3L islands for each chromosome at each life stages, were compared for each species and sex using Logistic Regression (Table 4). In all 4 models, there was a large significant effect on the frequency of individuals with introgressed pericentric islands as described above. However, no significant differences between the larval, swarming adult and resting indoor stages were found suggesting that recombinants at divergence loci were represented equally and that none of the sampled life stages were selected against (Table 4).

Table 4 Logistic regressions testing the effect of chromosome and life stage on frequencies of recombinant genotypes.

Kdr introgression and species-specific 2L island recovery

The presence of the knockdown resistance (kdr) allele against pyrethroid within the 2L island from An. gambiae is thought to have driven its recent adaptive introgression into An. coluzzii in areas under extensive bednet coverage54. Compatible with this hypothesis, the frequency of homozygous (RR) and heterozygous (RS) kdr genotypes in An. coluzzii increased sharply from 2006 to 2012 in VK7 and Soumousso (Fig. 4). In contrast, it was found at high frequencies in An. gambiae throughout the study (Fig. 4).

Figure 4
figure 4

Changes in frequencies of introgressed kdr locus—The proportion of An. coluzzii individuals carrying the resistant allele (RR: homozygous kdr genotype or RS: heterozygous kdr genotypes) increased drastically from 2006 to near fixation in 2012 due to selective introgression of the 2L island from An. gambiae. The proportion of An. coluzzii individuals carrying the susceptible allele (SS: homozygous susceptible genotype) decreased over the same period.

Whilst recombination amongst DIS loci within the X island was never observed and was very rare in the 3L island (2 in 6856 chromosomes), the 2L pericentric region recently introgressed into An. coluzzii and was broken down by recombination much more commonly (288 in 6856 chromosomes) (Supplementary Table S8). This was also the case in the subset of individuals genotyped at the kdr locus (Supplementary Table S9). If An. coluzzii recombinants individuals with mixed island genotypes would be selected against, recombinant 2L island featuring an An. coluzzii-like loci with the resistant kdr allele should have a strong fitness advantage. This hypothesis was tested by comparing the frequency of An. coluzzii 2L islands bearing such characteristics in VK7, where samples sizes allowed such comparisons. This analysis confirmed that the An. coluzzii population is progressively regaining its specific 2L island after having integrated the resistant kdr allele (Likelihood-ratio Chi-square: n = 2082, df = 1, P = 0.013) (Fig. 5).

Figure 5
figure 5

Kdr introgression and species-specific 2L island recovery in An. coluzzii—Following the initial selective introgression of the 2L An. gambiae island into An. coluzzii new recombinant 2L-islands haplotypes featuring an An. coluzzii-like pericentromeric region and the introgressed kdr allele appeared. A new selective sweep associated with these haplotypes quickly led to a decrease in the frequency of An. gambiae pericentromeric 2L-island loci in An. coluzzii.

Discussion

Over the last decade, much progress has been made in understanding the behavioural mechanisms responsible for assortative mating between the sibling species of the An. gambiae complex18,40,58. In particular, an extensive body of work has demonstrated the role of swarm spatial segregation in premating reproductive isolation between An. coluzzii and An. gambiae26,44,59. This study has highlighted for the first time the genetic architecture underlying swarm spatial segregation by using extensive swarm sampling combined with DIS genotyping in Burkina Faso, a region of West Africa with recent hybridization and adaptive introgression between An. coluzzii and gambiae21,55. Compatible with these events, the frequency of F1+n backcrosses genotypes at three known speciation islands49,52, increased overtime and were more frequent in An. coluzzii indicating imbalanced bidirectional introgression between the two sibling species. Using a simple majority rule, all backcrosses could be assigned to An. coluzzii or An. gambiae genetic backgrounds allowing for the characterization of An. coluzzi and An. gambiae-like swarms. Finally, the association between male and female genotypes at the X, 2L and 3L speciation islands and each swarm type was tested, revealing a perfect association between the X-island genotypes and swarm-type and lending further support to an island-of-speciation model of sympatric speciation facilitated by pericentric recombination suppression15. The weakened association with the 2L and 3L islands was due to strong introgression of the 2L island of An. gambiae into An. coluzzii and weak introgression of 3L from An. coluzzii to An. gambiae.

Overall, the frequencies of recombinant DIS genotypes observed in An. coluzzii at VK7 and Soumousso were high and increased from 2006 to 2012 but were comparatively rare in An. gambiae. Compared to F1+n recombinants, F1 hybrid genotypes were extremely rare, with only 2 F1 male (one with a An. coluzzii-like maternal X and one with an An. gambiae one) no F1 female hybrids were found over the course of the study. These results suggest that hybridization is relatively rare whilst introgression is sustained through strong directional selection of the 2L island of speciation once passed on to An. coluzzii21,53,55,56. These patterns mirror those inferred from a DIS genotyping study conducted on samples from nearby Mali spanning 1991 to 2012, which showed low frequencies of hybridization between the sibling species except for rare episodic pulses21. Currently, the ecological and genetic factors that may determine the occurrence of such events are poorly known. Recently, Niang et al.26 suggested that the extreme asymmetric dominance by An. coluzzii over An. gambiae observed at certain times of the year in rice-growing areas of Burkina Faso could lead to temporary breakdowns in assortative mating. Thus, hybridization may be occurring regularly but in very narrow set of ecological settings that cannot usually be picked-up by usual mosquito sampling schemes. For example, a study conducted in Mali in 2006 demonstrated that the massive distribution of ITNs during the 2000s leading to a drastic reduction in populations was followed by hybridization episodes characterised by the occurrence of F1 hybrids. As a result, An. coluzzii acquired the pyrethroid resistance previously known only in An. gambiae through adaptive introgression indicated by increased rates of F1+n within the sympatric populations21. A hybridization zone was also reported in the extreme west Africa region22,23,60.

In this study, even though a large majority of recombinant genotypes identified involved introgression of an entire island, individuals with recombination within islands were also found. In An. coluzzii, most recombinant DIS genotypes involved introgression of the entire An. gambiae 2L island bearing the kdr resistance allele, albeit there were 4.2% of genotypes showing recombination within the 2L islands. In contrast, this was never observed within the X island and was infrequent within the 3L island in recombinants of either species. In An. coluzzii, recombination within the 2L island led to the emergence of haplotypes that combine the kdr resistant allele with An. coluzzii alleles at SNP loci nearest to the centromere55. Concurrent with increased insecticide-treated bednets distribution61, these haplotypes significantly increased in frequency within the time frame of this study suggesting a strong fitness advantage over other An. coluzzii 2L DIS recombinants individuals55.

Overall, the distribution and frequencies of DIS recombinant genotypes found in this study are compatible with comparative genomic studies conducted in sympatric populations across West Africa. These have shown no evidence of recombination in the X-island while adaptive introgression was repeatedly observed in autosomal centromeric areas54,57. The exception to this rule being the hybrid zone identified in the Western-most part of West Africa and where recombination within the X-island has been documented62,63.

Whilst our results show a large significant effect of chromosome on the frequency of individuals with introgressed pericentric islands, no significant difference between the larval, swarming adult and resting indoor stages were found, suggesting that recombinants at divergence loci were represented equally at each of the sampled life stages and therefore that there is no stage-specific selection against recombinant genotypes. Because positively-selected recombinants involving the 2L island dominate our samples, we cannot rule out the possibility that X and 3L recombinants are selected against.

The most important finding of this study is that the X chromosome island genotype was found to perfectly associated to the spatial segregation of swarms of An. coluzzii and An. gambiae. This was true both for male and female swarming individuals, with the exception of a single male (1 out of 2329) carrying an An. coluzzii-like X chromosome in an An. gambiae typical swarm. The association between swarm type and the 3L island was significantly weaker than the association with the X-island and completely broken for the 2L island. In An. coluzzii the 3L island was strongly associated with swarm type as was the X island, while in An. gambiae the strength of association between genotype and swarm type was not significantly different between the 2L and the X islands. These results suggested that the high level of introgression of the 2L divergent island from An. gambiae to An. coluzzii would prevent any association between the 2L island and the swarm type. Moreover, the X and 3L islands would be strongly genetically associated in An. coluzzii. Finally, the weakened association with the 2L and 3L islands confirms directional adaptive introgression of the 2L island of An. gambiae into An. coluzzii as well as some level of introgression of 3L from An. coluzzii to An. gambiae. Using a laboratory cross/backcross protocol Aboagye-antwi et al.15. demonstrated evidence of involvement of the X island in reproductive isolation in the form of short-range assortive mating in laboratory-induced mixed-recombinant swarms15. Findings from these studies of laboratory-produced recombinants, naturally occurring recombinants and spatial swarm segregation, show a strong association between assortative mating mechanisms and the X island of speciation. They also lend unprecedented support to island-of-speciation models of sympatric speciation facilitated by pericentric recombination suppression.

Methods

Study sites

Vallée du Kou is a rice-growing area developed in the early 1970s in Western Burkina Faso about 30 km North of Bobo-Dioulasso (11° 22′ 26″ N, 4° 24′ 42″ W). It contains seven villages totalling 4470 habitants. The survey was conducted in VK7 (11° 24′ 17″ N, 4° 24′ 54″ W), located on the boundary of the rice field and cotton cultivation area characterized by wooded savannah. The mean annual rainfall is about 1200 mm and the area is characterized by a rainy season from May to October and a dry season from November to April. The river Kou is a permanent source of irrigation and there are 2 rice crops per year (January–May and July–November). Because of the irrigation system, the rice fields form permanent mosquito breeding sites that are preferentially colonized by An. coluzzii26,28,41. During the rainy season, additional rainy-dependent breeding sites are very often found in depressions and ponds, allowing the development of An. gambiae.

Soumousso (11° 00′ 46″ N, 4° 02′ 45″ W) and Bana (12° 36′ 00″ N, 3° 28′ 59″ W) are villages located in the humid savannah of Western Burkina Faso. As in VK7 the rainy season occurs from May to October followed by a long dry season from November to April. The average annual rainfall ranges from 1000 to 1200 mm. In Soumousso both incipient species coexist in sympatry and their highest density occurs in September. However, the relative frequencies of the two species change over time with An. gambiae being predominant from July to November and An. coluzzii from December to June. An. funestus, An. arabiensis and An. nili are also found at lower densities41. The village of Bana is characterized by comparable seasonality but more balanced densities of An. coluzzii and An. gambiae swarming males26 even though An. coluzzii dominated largely the indoor resting fauna of the complex An. gambiae over a two-year period of collection in this village64.

Swarming mosquito collections

A survey of swarms was undertaken by trained observers in the three study sites: VK7, Soumousso and Bana. Observations were made to identify swarm sites scattered throughout the entire villages that were used every evening. Once located, these swarms were sampled every month from July to November in 2006, 2007, 2008 and 2011 in Soumousso, 2006, 2008 and 2011 in VK7 and 2012 in Bana. In order to catch a substantial number of females from swarms largely dominated by males, two captors spent the first 15 min after swarm initiation (first male arriving) capturing mating pairs as they formed, fell or flew out of the swarm. The whole swarm was then collected. Mating pairs and swarms were sampled using an insect net then aspirated into individual swarm cups as described previously25,41,65. Collected mosquitoes, including swarming males, females and mating pairs were respectively aspirated into separate cups and tubes, then killed with ethylic ether, identified morphologically as An. gambiae s.l.66 and kept in 70% ethanol in 1.5 ml tubes before being genotyped for subsequent DNA analysis.

Assignation of swarming mosquitoes to the three types of swarms

Mosquito swarms were characterized based on the PCR results which includes only one SNP in the X-chromosome as described previously26,67,68. Swarms are considered as pure types for one species if they are exclusively comprised of individuals of that species. Otherwise, swarms are considered as mixed if they are composed of individuals of both species regardless of the constituent proportions. Then mosquitoes identified at molecular level were assigned to the three types of swarms including An. gambiae pure type, An. coluzzii pure type or both species mixed type (Table 1).

Resting adult collections and larvae sampling

Except for the village of Bana and VK7 in 2007 females, indoor resting males, and females of An. gambiae s.l. were collected in each of villages during the same time periods of the swarm monitoring, using insecticide indoor spray catch. Resting females and males from different houses were aspirated into separate cups, killed with ethylic ether, identified morphologically as An. gambiae s.l.66 and kept in 70% ethanol in 1.5 ml tubes for subsequent DNA analysis.

In some years and for some locations, Anopheles gambiae s.l. larvae were also collected from larval habitats including footprint, puddle, pond, and rice field available during the same time periods as the swarm monitoring and indoor spray catch. All collected mosquito larvae were examined carefully, anopheline were separated from culicine larvae for classification into 1st to 4th instars, then immediate preservation was carried out in 70% ethanol for subsequent DNA analysis.

Mosquito identification by polymerase chain reaction (PCR)

DNA was extracted from whole bodies of adult mosquitoes identified as An. gambiae s.l., except for females whose abdomens were excluded to avoid the risk of contamination with DNA from sperm in the spermathecae. Individual DNA extractions were made using a standard protocol25 and were diagnosed to species level using the PCR developed by Santolamazza et al.68 and confirmed by PCR–RFLP (Restriction Fragment Length Polymorphism) developed by Fanello et al.67.

DNA was extracted from individual larvae using a standard protocol25. As hybrid males have a single copy of the hemizygous X chromosome, they cannot be distinguished from females using classic molecular diagnostics based on rDNA polymorphisms on that chromosome. Consequently, all larvae were sexed and genotyped to sibling species level using a single multiplex PCR as described previously that combined the Y-chromosome specific primers used for sperm detection elsewhere69 with primers targeting the insertion of a SINE (Short Interspersed Element) located in division 6 of the X-chromosome used in the species identification PCR developed by Santolamazza et al.68. The primers (S200X6-1F 5′-TCGCCTTAGACCTTGCGTTA-3′; S200X6-1R 5′-CGCTTCAAGAATTCGAGATAC-3′; S23-F 5′-CAAAACGACAGCAGTTCC-3′; S23-R 5′-TAAACCAAGTCCGTCGCT-3′) were combined into a single reaction after checking for possible dimmer problems using the tool available at http://www.thermoscientificbio.com/webtools/multipleprimer/. A PCR mixture contained 2 µl template DNA, 1.5 mM MgCl2, 0.5 µl of 5× buffer, 0.2 mM of each dNTPs, 1.5 pmol of each S200X6.1 primers, 2 pmol of each S23 primers and 0.05 units of Taq polymerase for a total volume of 25 µl was used. PCR reactions were performed on a Bio-Rad S1000™ Thermocycler with an initial denaturation at 94 °C for 10 min, followed by 35 cycles of 94 °C for 30 s, 54 °C for 30 s, and 72 °C for 1 min, followed by a final elongation step at 72 °C for 10 min. The PCR products were separated by gel electrophoresis to generate diagnostic sex and species-specific banding phenotypes26.

Mosquito divergent island SNP (DIS) genotyping

Mosquitoes identified as An. coluzzii and An. gambiae were sent to LGC Ltd for Competitive Allele Specific PCR (KASP) genotyping using the same DIS loci used in previous surveys of hybridization and introgression between the sibling species55,56. The panel features 7 SNPs located in the X-chromosome island, 5 in the 2L and 3 in the 3L that are fixed differences between An. coluzzii or An. gambiae in absence of introgression56. DIS genotypes were used to distinguish non-hybrid individuals homozygous at all alleles specific to either An. coluzzii or An. gambiae, from F1 hybrids (heterozygous at all X, 2L and 3L loci in females, and 2L and 3L loci in hemizygous males), and from F1+n backcrosses (characterized by recombinant genotypes). Backcrosses were assigned to An. coluzzii or An. gambiae based on most of the alleles specific to each sibling species present in their genotypes using a simple majority rule21,55,56. Additionally, a subset of samples was also genotyped at the L1014F West African and L1014S East African kdr mutations both located within the 2L island70,71. Except for analysed focusing on the kdr genetic sweep, individuals whose DIS genotypes were incomplete (missing SNP data) were not used in further analyses.

Statistical analyses

All statistical analyses were performed using the software JMP 10 (SAS Institute, Inc). Frequencies of recombinant genotypes were analysed through multivariate logistic regression (Likelihood ratios) and the statistical significance of pairwise comparisons established by Likelihood Ratio test on odds ratios.