Genome-wide divergence among invasive populations of Aedes aegypti in California
In the summer of 2013, Aedes aegypti Linnaeus was first detected in three cities in central California (Clovis, Madera and Menlo Park). It has now been detected in multiple locations in central and southern CA as far south as San Diego and Imperial Counties. A number of published reports suggest that CA populations have been established from multiple independent introductions.
Here we report the first population genomics analyses of Ae. aegypti based on individual, field-collected whole-genome sequences. We analyzed 46 Ae. aegypti genomes to establish genetic relationships among populations from sites in California, Florida and South Africa. Based on 4.65 million high-quality biallelic SNPs, we identified three major genetic clusters within California: one that includes all sample sites in the southern part of the state (south of the Tehachapi Mountains) plus the town of Exeter in central California, and two additional clusters in central California.
A lack of concordance between mitochondrial and nuclear genealogies suggests that the three founding populations were polymorphic for two main mitochondrial haplotypes prior to being introduced to California. One of these has been lost in the Clovis populations, possibly by a founder effect. Genome-wide comparisons indicate extensive differentiation between genetic clusters. Our observations support recent introductions of Ae. aegypti into California from multiple, genetically diverged source populations. Our data reveal signs of hybridization among diverged populations within CA. Genetic markers identified in this study will be of great value in pursuing classical population genetic studies which require larger sample sizes.
Keywords: Aedes aegypti; Invasive species; Population genomics; California
Abbreviations
PCA: Principal Component Analysis
SNP: single nucleotide polymorphism
Aedes aegypti has a short flight range, usually not moving actively more than 200 m from its breeding source, but it is exquisitely adapted to hitchhiking in transport vehicles. One central question concerning the population dynamics of Ae. aegypti in California (CA) is therefore whether the established populations at different locations were founded from a single source population that spread across the state or whether they arose through other founding events. A recent study revealed several genetically distinct Ae. aegypti populations in CA, presumably originating from multiple introductions from other sites in the U.S. and/or northern Mexico. Further insight into the population structure of CA Ae. aegypti will be needed to fully understand the dynamics that shape the current distribution and continuing spread of this invasive vector species.
Successful vector control can benefit from population genetics and genomics analyses, which can provide estimates of gene flow and identify the genetic basis of phenotypes such as insecticide resistance and host preference. Population genomics studies are especially critical to the development of control strategies based on genetic manipulation of vectors, which is a matter of growing interest. Modelling, planning and monitoring activities associated with control programs require affordable and rapid assays to distinguish vector sub-populations within a species, as well as a deep understanding of the processes that shape their genetic structure. It is becoming increasingly apparent that hybridization between diverged vector populations may be an important source of new genetic material, including alleles that mediate adaptations facilitating range expansion or that promote the evolution of insecticide resistance. Analysis of whole-genome sequencing data is the most powerful method to detect even minor admixture. We have therefore applied a population genomics approach to study invasive Ae. aegypti populations. Here we report a preliminary analysis based on genome sequences of 39 individual Ae. aegypti collected from twelve locations throughout CA and, for comparison, four specimens from Florida and three from South Africa. The analyses presented here should serve as a step toward expanded population genomics studies aimed at understanding how invasive mosquito species become established in new locations and how distinct populations interact at the genetic level.
For our only temporal comparison, we compared the genomes of samples collected in Clovis in 2013 with those of samples collected in 2016. Overall genome divergence is negligible (FST = −0.025 ± 0.002). However, a genome-wide scan of FST in 1 Mbp windows reveals a number of genomic regions with markedly elevated FST values (> 0.1) (Fig. 4d). The difference in nucleotide diversity between the 2013 and 2016 samples indicates an increase over time on chromosomes 1 and 2 (mean π2016 − π2013 of 7.22 × 10⁻⁵ and 2.32 × 10⁻⁵, respectively) but a decrease on chromosome 3 (π2016 − π2013 of −1.70 × 10⁻⁵). Beyond these chromosome-wide trends, regions with a relatively large change in nucleotide diversity between 2013 and 2016 are visible on all three chromosomes, and some of them coincide with highly differentiated (FST > 0.1) regions. Such highly differentiated regions with a relatively large change in nucleotide diversity may indicate genomic regions under selection, presumably as the founding population adapts to local environmental conditions.
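A windowed comparison like the one above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the authors' scikit-allel workflow: allele counts per SNP and time point are assumed to be tabulated already, FST in each window is Hudson's estimator combined as a ratio of sums across SNPs, and, as a simplification, summed per-site diversity is divided by the full window length rather than by the number of accessible (unmasked, adequately covered) bases. The function names and the toy counts are hypothetical. Note that the sample-size correction in the numerator can push estimates below zero when two samples are nearly identical, which is one way a slightly negative genome-wide value can arise.

```python
import math


def hudson_terms(n1, a1, n2, a2):
    """Per-SNP numerator and denominator of Hudson's FST.

    n1, n2: numbers of sampled chromosomes; a1, a2: alternate-allele counts.
    """
    p1, p2 = a1 / n1, a2 / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def site_pi(n, a):
    """Probability that two chromosomes drawn without replacement differ."""
    return 2 * a * (n - a) / (n * (n - 1))


def window_scan(positions, calls_old, calls_new, size=1_000_000):
    """Summarize SNPs in non-overlapping windows [k*size, (k+1)*size).

    positions: SNP coordinates on one chromosome.
    calls_old, calls_new: (n_chromosomes, alt_count) per SNP for each time point.
    """
    n_windows = max(positions) // size + 1
    # per window: FST numerator, FST denominator, summed pi (old), summed pi (new)
    acc = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_windows)]
    for pos, (n1, a1), (n2, a2) in zip(positions, calls_old, calls_new):
        w = acc[pos // size]
        num, den = hudson_terms(n1, a1, n2, a2)
        w[0] += num
        w[1] += den
        w[2] += site_pi(n1, a1)
        w[3] += site_pi(n2, a2)
    return [
        {
            "start": k * size,
            # ratio of sums across SNPs, not the mean of per-SNP ratios
            "fst": num / den if den > 0 else math.nan,
            # simplification: divide by the full window length
            "delta_pi": (pi_new - pi_old) / size,
        }
        for k, (num, den, pi_old, pi_new) in enumerate(acc)
    ]


# Made-up counts: two SNPs fixed for different alleles in window 0,
# one SNP at modestly different frequencies in window 1.
positions = [100, 200, 1_500_000]
calls_2013 = [(4, 0), (4, 0), (4, 1)]
calls_2016 = [(4, 4), (4, 4), (4, 2)]
scan = window_scan(positions, calls_2013, calls_2016)
flagged = [w["start"] for w in scan if w["fst"] > 0.1]
print(flagged)  # prints [0]
```

Windows with no SNPs get an FST of NaN, which the `> 0.1` comparison silently skips.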
Estimates of FST derived from whole-genome sequence data have been shown to be accurate even with very small sample sizes (i.e., N = 2 per population). This is due to the very large number of SNPs (n ≫ 1000 loci) used in such analyses. The 4.65 million loci used in our analysis far exceed the number required for an accurate assessment of FST. In addition, we evaluated various minimum read depths and missing-data ratios and obtained results consistent with those reported here (data not shown).
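The small-sample argument can be checked with a quick simulation. The sketch below is an illustration, not an analysis from this study: it assumes independent SNPs and draws population allele frequencies under the Balding-Nichols model, for which the expected Hudson FST equals the model parameter F (set to 0.1 here). Each population is represented by four sampled chromosomes (two diploid mosquitoes), and per-SNP numerators and denominators are summed before dividing. The function names and parameter values are hypothetical choices for the illustration.

```python
import random


def hudson_fst(samples1, samples2):
    """Ratio-of-sums Hudson FST over many SNPs.

    samples1, samples2: lists of (n_chromosomes, alt_count), one entry per SNP.
    """
    num_sum = den_sum = 0.0
    for (n1, a1), (n2, a2) in zip(samples1, samples2):
        p1, p2 = a1 / n1, a2 / n2
        num_sum += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den_sum += p1 * (1 - p2) + p2 * (1 - p1)
    return num_sum / den_sum


def simulate(n_loci=20_000, true_f=0.1, n_chrom=4, seed=1):
    """Draw population frequencies around an ancestral frequency under the
    Balding-Nichols model (mean p, variance p(1-p)F), then sample n_chrom
    chromosomes per population at each independent SNP."""
    rng = random.Random(seed)
    shape = (1 - true_f) / true_f
    pop1, pop2 = [], []
    for _ in range(n_loci):
        p = rng.uniform(0.05, 0.95)  # ancestral allele frequency
        for pop in (pop1, pop2):
            q = rng.betavariate(p * shape, (1 - p) * shape)
            alt = sum(rng.random() < q for _ in range(n_chrom))
            pop.append((n_chrom, alt))
    return pop1, pop2


pop1, pop2 = simulate()
estimate = hudson_fst(pop1, pop2)
print(f"true F = 0.1, estimate from 2 diploids per population = {estimate:.3f}")
```

With 20,000 loci the estimate should land close to 0.1 despite only two individuals per population. Real SNPs are physically linked, so the effective number of independent loci is smaller than the raw SNP count.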
Using nuclear genome sequence data, we identified three major genetic clusters among CA Ae. aegypti. These correspond roughly to geographic regions in the state (Figs. 2 and 3). Our data support the hypothesis that Ae. aegypti in CA currently exists as multiple, mostly isolated populations. High genetic distance (FST > 0.1) as well as genome-wide differentiation (Fig. 4) support multiple introductions into CA from genetically distinct source populations as the most plausible history of this invasion.
Two major mitochondrial lineages are present within California populations, probably corresponding to previously described global clades [13, 14]. However, their genealogy differs from the nuclear genome genealogy (Figs. 5 and 6). This is comparable to the pattern found in a previous study using ND4 sequence analysis of Ae. aegypti populations introduced to Florida. The lack of geographic clustering of mitochondrial lineages therefore appears to be common in invasive Ae. aegypti populations and is likely due to the saltatory nature of dispersal in this species. The incongruence between nuclear and mitochondrial gene genealogies could be due to different evolutionary rates at different loci producing differing topologies [16, 17, 18]. It is possible that mitochondrial lineages capture historical divergence events, while nuclear genome divergence reflects relatively recent divergence. Linkage disequilibrium decays rapidly in mosquito genomes, as seen in Anopheles arabiensis. Thus, any contact between two distinct Ae. aegypti populations may have resulted in relatively recent gene flow homogenizing populations within a locality. If so, mitochondrial markers appear to be less useful for determining relatively recent population divergence events in Ae. aegypti.
We investigated the possibility of generating SNP genotypes compatible with the existing Ae. aegypti SNP chip dataset to allow a direct comparison of our results with those previously published. The SNP positions provided in Evans et al. were based on the initial genome assembly, AaegL1. Our BLAST comparisons of AaegL1-based SNP sequences against the AaegL5 assembly revealed numerous and significant differences, including multiple matches with high (> 98%) similarity, sequence differences arising from indel mutations, non-biallelic SNPs, and polymorphisms surrounding the target SNPs (Additional file 4: Appendix S1). These differences often resulted in mismatched genotype calls between the two platforms (see the genotype discrepancy examples in Additional file 4: Appendix S1). Because of these problems, we consider a direct comparison of genotype calls from the published SNP chip data with those generated from genome sequence data inappropriate, and we strongly recommend taking this into account when applying SNP chip analyses in the future.
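Some of the checks described above can be sketched as a per-SNP screen run before any cross-platform genotype comparison. This is an illustration, not the procedure behind Additional file 4: the data structures and function names are hypothetical, the 98% identity value is borrowed from the text as a screening cut-off, and indels or polymorphisms flanking the probe are not modelled. The sketch rejects probes with more than one high-similarity hit, reconciles alleles allowing for a strand flip (A/T and C/G SNPs are rejected because strand cannot be inferred from the alleles alone), and then reports the fraction of discordant genotypes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
AMBIGUOUS_PAIRS = ({"A", "T"}, {"C", "G"})  # strand cannot be resolved from alleles


def harmonize(chip_alleles, seq_alleles):
    """Map chip alleles onto sequencing alleles, allowing a strand flip.

    Returns {chip_allele: seq_allele}, or None if the SNP cannot be matched
    unambiguously (non-biallelic, A/T or C/G, or different alleles).
    """
    chip, seq = set(chip_alleles), set(seq_alleles)
    if len(chip) != 2 or len(seq) != 2 or chip in AMBIGUOUS_PAIRS:
        return None
    if chip == seq:
        return {a: a for a in chip}
    flipped = {a: a.translate(COMPLEMENT) for a in chip}
    if set(flipped.values()) == seq:
        return flipped
    return None


def discordance(chip_calls, seq_calls, mapping):
    """Fraction of samples called on both platforms whose genotypes disagree.

    Genotypes are allele pairs such as ("A", "G"); None marks a missing call.
    """
    compared = differ = 0
    for sample, chip_gt in chip_calls.items():
        seq_gt = seq_calls.get(sample)
        if chip_gt is None or seq_gt is None:
            continue
        compared += 1
        differ += sorted(mapping[a] for a in chip_gt) != sorted(seq_gt)
    return differ / compared if compared else float("nan")


def compare_snp(hit_identities, chip_alleles, seq_alleles, chip_calls, seq_calls,
                min_identity=98.0):
    """Screen one chip SNP, then compare genotypes if it passes."""
    strong_hits = [h for h in hit_identities if h >= min_identity]
    if len(strong_hits) != 1:
        return "ambiguous-location", None
    mapping = harmonize(chip_alleles, seq_alleles)
    if mapping is None:
        return "allele-mismatch", None
    return "ok", discordance(chip_calls, seq_calls, mapping)


status, rate = compare_snp(
    hit_identities=[99.2, 71.0],
    chip_alleles=("A", "G"), seq_alleles=("T", "C"),  # opposite strands
    chip_calls={"s1": ("A", "A"), "s2": ("A", "G"), "s3": ("G", "G")},
    seq_calls={"s1": ("T", "T"), "s2": ("C", "T"), "s3": ("C", "T")},
)
print(status, round(rate, 3))  # prints ok 0.333
```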
Microsatellite data from [3, 4] indicated that San Mateo (= Menlo Park), Madera and Fresno samples were genetically similar to samples from the southeastern USA, including samples from Louisiana, Georgia and Florida. Pless et al. also included a population from Exeter, CA, which was likewise classified together with other central CA samples and with south-central and southeastern USA populations based on microsatellite profiles. This appears to be inconsistent with our results. Our analysis placed the three central CA populations (Menlo Park, Madera and Fresno) in a group (GC2) distinct from the group containing the southeastern USA populations (Vero Beach and Key West, Florida). In our analysis, the samples from Florida clustered with populations from Exeter, CA, and southern CA (GC1, Fig. 3).
In contrast to the microsatellite data, the SNP chip data from the same study group the Exeter population apart from all other CA populations, including those in central CA, consistent with our genome-wide SNP data. Unfortunately, their SNP chip clustering did not include samples from the southeastern USA, preventing a direct comparison of our Florida–Exeter grouping with their SNP chip results. Nevertheless, this could support the view that the Exeter population, introduced in 2014, is distinct from all other CA populations and was introduced independently rather than resulting from local spread of Ae. aegypti within CA.
PCA of the SNP chip data separated Clovis (GC3) from the GC2 cluster, with some overlap. The larger number of SNPs used in our analysis (> 2.9 million biallelic SNPs compared to 15,698 SNPs) may have increased resolution, allowing us to separate the two confidently. Our data, together with previous reports, strongly support multiple introductions of Ae. aegypti into California. The most likely scenario includes four independent introductions: (i) the Clovis area, probably in 2013; (ii) the Madera area, probably in 2013; (iii) southern CA, probably in 2014; and (iv) Exeter, probably in 2014, introduced from somewhere in the southeastern USA such as Florida. The years are based on reports from California vector control districts. This scenario is also in line with most of the published results based on microsatellite and SNP chip data. From our data, the exact origins of the introductions remain uncertain, with only the Exeter population showing signs of possible derivation from the southeastern USA.
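The resolution argument can be illustrated with a bare-bones PCA of genotype dosages. This sketch is not the authors' scikit-allel analysis: it applies a Patterson-style per-SNP scaling (center each SNP on its mean dosage and divide by sqrt(p(1 − p)), similar in spirit to the "patterson" scaler offered by scikit-allel's `pca`), computes only the first axis by power iteration, and runs on simulated toy data rather than the study's SNPs. The function names and simulation settings are hypothetical.

```python
import math
import random


def patterson_scale(genotypes):
    """Center and scale an SNP-by-sample matrix of alt-allele dosages (0, 1, 2).

    Each SNP is centered on its mean dosage and divided by sqrt(p * (1 - p)),
    where p is the alt-allele frequency; monomorphic SNPs are dropped.
    """
    scaled = []
    for row in genotypes:
        mean = sum(row) / len(row)
        p = mean / 2
        if p <= 0.0 or p >= 1.0:
            continue
        sd = math.sqrt(p * (1 - p))
        scaled.append([(g - mean) / sd for g in row])
    return scaled


def leading_axis(scaled, iterations=100, seed=0):
    """Leading eigenvector of the sample-by-sample cross-product matrix, by
    power iteration; entries give each sample's position on PC1 up to a
    scale factor and an arbitrary sign."""
    n = len(scaled[0])
    cross = [[sum(row[i] * row[j] for row in scaled) for j in range(n)]
             for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(cross[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: 5 + 5 diploid samples from two strongly differentiated groups.
rng = random.Random(42)
genotypes = []
for _ in range(500):
    p_a = rng.uniform(0.05, 0.95)
    p_b = 1 - p_a
    genotypes.append([sum(rng.random() < p for _ in range(2))
                      for p in [p_a] * 5 + [p_b] * 5])

pc1 = leading_axis(patterson_scale(genotypes))
print([round(x, 2) for x in pc1])
```

In this toy setting the between-group signal in a sample's projection grows roughly in proportion to the number of informative SNPs, while sampling noise grows only with its square root, which is one way to see why millions of SNPs can separate clusters that overlap when thousands are used.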
The degree of genetic differentiation found in the Clovis population between 2013 and 2016 (Fig. 4d) indicates that the population is undergoing rapid genomic change, potentially reflecting local adaptation or, less likely, drift. The only other longitudinal investigation of a CA population of Ae. aegypti that we are aware of compared genotypes of samples collected in Madera in 2013 and 2015 and detected almost no change over those two years. Further investigation of genic features showing significant differences between the two time points may shed light on the genes involved in local adaptation at Clovis and the particular circumstances that drove it. Because Ae. aegypti, unlike, e.g., Anopheles gambiae, does not produce clearly readable polytene chromosomes, detecting chromosomal inversions has been challenging, and the identification of their precise locations is still at an early stage. The approximate locations of diverged regions and potential chromosomal inversions noted by Bernhardt et al. did not clearly indicate that the diverged regions we observed are due to chromosomal inversions. Future studies of linkage disequilibrium could illuminate the potential role of chromosome structure in adaptation, as has been demonstrated in Anopheles mosquitoes.
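LD analyses of the kind proposed above often start from a decay curve: average r² between SNP pairs as a function of physical distance. The sketch below assumes unphased diploid genotypes coded as alternate-allele dosages and uses the squared correlation of dosages as a stand-in for haplotype r², a common approximation when phase is unknown. The bin size, distance cut-off, function names and toy genotypes are arbitrary choices for illustration.

```python
import math
from collections import defaultdict


def dosage_r2(g1, g2):
    """Squared Pearson correlation between alt-allele dosages (0, 1, 2)
    at two SNPs, skipping samples missing (None) at either site."""
    pairs = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # monomorphic in the usable samples
    return sxy * sxy / (sxx * syy)


def ld_decay(positions, genotypes, bin_size=1_000, max_dist=10_000):
    """Mean r^2 per distance bin (keyed by the bin's lower bound) for all SNP
    pairs closer than max_dist. positions must be sorted ascending."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist >= max_dist:
                break  # positions are sorted, so later SNPs are farther away
            r2 = dosage_r2(genotypes[i], genotypes[j])
            if not math.isnan(r2):
                b = dist // bin_size
                sums[b] += r2
                counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(counts)}


positions = [100, 600, 5_100]
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to the first SNP: r^2 = 1
    [2, 0, 1, 1, 0, 2],  # uncorrelated with the first two
]
decay = ld_decay(positions, genotypes)
print(decay)
```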
The geographic origin of CA Ae. aegypti populations and the means by which they were introduced remain unclear. Perhaps the most interesting open question is what conditions facilitated multiple introductions. Answering these questions is beyond the scope of this study and requires additional data. Analyzing samples from different origins on the same NGS platform may provide a clearer picture of the Ae. aegypti invasion history in CA. In addition, studies describing genomic changes over time may provide information on local adaptation and could prove useful for controlling the species in California.
The mosquito species Aedes aegypti, first detected in California in 2013, has now been found in multiple locations throughout the state. Our genome analyses identified three distinct population groups loosely corresponding to different regions within California. Genome-wide comparisons indicate extensive differentiation between genetic clusters. Samples collected from Clovis in two different years (2013 and 2016) reveal genomic signatures of potential selection. Our mitogenome analysis suggests that the founding populations were polymorphic for two mitochondrial lineages, with one or the other lost in the various extant populations. These observations support recent multiple introductions of Ae. aegypti into California. This is the first study to use whole-genome sequences of Aedes aegypti field isolates. Our dataset serves as an important step toward future studies aimed at understanding population divergence, gene-environment interactions, and dispersal of this invasive species.
Adult female Ae. aegypti were collected from 13 cities by personnel from Mosquito Abatement Districts in Fresno, San Diego, and Orange Counties (Fig. 2 and Additional file 1: Table S1). Mosquitoes were collected using BG Sentinel traps baited with CO2. All collections on private property were conducted with permission from residents and/or owners. Mosquito samples were individually stored in 80% ethanol prior to DNA extraction.
Whole genome sequencing
Genomic DNA was extracted using established protocols [24, 25]. DNA concentrations for each sample were measured using the Qubit dsDNA HS Assay Kit (Life Technologies) on a Qubit instrument (Life Technologies). A genomic DNA library was constructed for each individual mosquito using 20 ng DNA, Qiaseq FX 96 (Qiagen, Valencia, CA), and Ampure SPRI beads (Beckman) following an established protocol . Library concentrations were measured using Qubit (Life Technologies) as described above. Libraries were sequenced as 150 bp paired-end reads using a HiSeq 4000 instrument (Illumina) at the UC Davis DNA Technologies Core.
Raw reads were trimmed using Trimmomatic version 0.36 and mapped to the AaegL5 reference genome using BWA-MEM version 0.7.15. Mapping statistics were calculated using Qualimap version 2.2 (Additional file 1: Table S1). Joint variant calling on all samples was performed with Freebayes version 1.0.1 with standard filters and population priors disabled. We required a minimum depth of 8 to call variants for each individual, following the recommendation of Crawford and Lazzaro to minimize bias in population inference. To improve the reliability of calls, we required variants to be supported by both forward and reverse reads overlapping the loci (Erik Garrison, Wellcome Trust Sanger Institute and Cambridge University, personal communication, Dec. 2014). Repeat regions are “soft-masked” in the AaegL5 reference genome, and SNPs in these regions were excluded from analysis. Only biallelic SNPs were used for further analysis, and a missing-data threshold of 20% was used to filter SNPs. A phylogenetic tree based on the polymorphism data was constructed using the neighbor-joining algorithm as implemented in PHYLIP version 3.696. Hudson FST, nucleotide diversity (π) and principal component analyses (PCA) were performed in Python version 3.6.6 using the scikit-allel module version 1.2.0.
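The site-level filters listed above could be expressed roughly as follows. This is a sketch, not the authors' code: the record layout and function name are hypothetical, and reading "supported by both forward and reverse reads" as at least one alternate-allele observation on each strand (Freebayes reports these counts in the SAF and SAR INFO fields) is an assumption. Treating a below-depth call as missing before applying the 20% threshold is likewise our reading of how the two filters combine.

```python
MIN_DEPTH = 8        # minimum read depth per individual (from the text)
MAX_MISSING = 0.20   # maximum fraction of individuals without a usable call


def keep_site(site, min_depth=MIN_DEPTH, max_missing=MAX_MISSING):
    """Return per-sample genotypes (None = missing) if the site passes all
    filters, otherwise None.

    `site` is a hypothetical dict:
      ref_base   reference base as written in the soft-masked FASTA
      ref, alts  REF allele and list of ALT alleles from the VCF record
      saf, sar   ALT observations on the forward / reverse strand
      samples    list of (genotype string, read depth) per individual
    """
    if site["ref_base"].islower():                 # soft-masked repeat region
        return None
    if len(site["alts"]) != 1:                     # biallelic sites only
        return None
    if len(site["ref"]) != 1 or len(site["alts"][0]) != 1:  # SNPs, not indels
        return None
    if site["saf"] == 0 or site["sar"] == 0:       # assumed strand-support rule
        return None
    calls = [gt if depth >= min_depth and gt != "./." else None
             for gt, depth in site["samples"]]
    if sum(c is None for c in calls) / len(calls) > max_missing:
        return None
    return calls


site = {"ref_base": "A", "ref": "A", "alts": ["G"], "saf": 3, "sar": 2,
        "samples": [("0/1", 10), ("1/1", 12), ("0/0", 9), ("0/1", 8), ("0/0", 5)]}
print(keep_site(site))  # one of five calls is below depth 8: 20% missing, kept
```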
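For readers unfamiliar with neighbor-joining, the core agglomeration loop is short. The sketch below is a plain-Python illustration of the Saitou-Nei algorithm, not PHYLIP's implementation, which differs in details such as input handling and tie-breaking. Which pairwise distance between individuals was computed from the SNP data is not stated here, so the input matrix is simply assumed to be given; the function name and the small additive example matrix are hypothetical. At least three taxa are required.

```python
def neighbor_joining(names, matrix):
    """Saitou-Nei neighbor-joining on a symmetric distance matrix.

    names: taxon labels; matrix: list of rows with pairwise distances.
    Returns an unrooted tree as a Newick string.
    """
    # d[x][y] is the current distance between (possibly composite) nodes x and y
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from a and b to their new parent node
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        parent = f"({a}:{la:g},{b}:{lb:g})"
        d[parent] = {}
        for c in nodes:
            if c not in (a, b):
                d[parent][c] = d[c][parent] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
    a, b, c = nodes  # join the last three nodes at one internal node
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(names, matrix)
print(tree)
```

Because the example matrix is additive, NJ recovers the tree that generated it; with real, non-additive distances some branch lengths can come out negative, and the `:g` formatting shown here rounds lengths to six significant digits.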
The presence of mitochondrial pseudogenes in the nuclear genome of Ae. aegypti could confound SNP calling. We therefore followed the mapping recommendations of Schmidt et al. and mapped raw reads to the mitochondrial reference genome before mapping the remaining unmapped reads to the nuclear genome.
We used Ae13CLOV028MT (GenBank ID: MH348176) as the reference for mapping the mitochondrial genome because all our specimens contained a deletion between positions 14,522 and 14,659 relative to the AaegL5 reference genome. Variants in the mitochondrial genome were called with Freebayes as described for the nuclear genome, but with ploidy set to one. Mitochondrial coverage was on average 160 times greater than nuclear genome coverage, with a minimum 25-fold difference (Additional file 1: Table S1). Using only properly paired reads for variant calling reduced errors caused by failing to recognize mitochondrial pseudogenes present in the nuclear genome. The Vcf2fasta program was used to extract mitogenome sequences from the VCF file in FASTA format. MEGA version 7.0.26 was used for mitogenome alignment. Mitogenome reference sequences of Culex quinquefasciatus (GenBank accession number HQ724617), Aedes notoscriptus (KM676219), and Aedes albopictus (NC_006817) were obtained from GenBank and added to the alignment. Sequences of the thirteen mitochondrial protein-coding genes of Ae. aegypti were obtained from GenBank, extracted from our dataset, and concatenated for tree construction with the maximum likelihood algorithm implemented in MEGA.
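Extracting a per-sample mitogenome from haploid calls and concatenating its protein-coding genes amounts to substituting each called SNV into the reference and slicing out gene coordinates. The following is a simplified stand-in for the roles Vcf2fasta and MEGA played here, not a description of those tools: it ignores indels and missing calls, the function names are hypothetical, and the gene coordinates in the example are made up (real coordinates would come from the GenBank annotation). Reverse-complementing genes on the minus strand before concatenation is a choice the text does not specify; the sketch does it.

```python
REVCOMP = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(REVCOMP)[::-1]


def apply_haploid_snvs(reference, variants):
    """Substitute haploid SNV calls into a reference sequence.

    variants: (pos, ref, alt) tuples with 1-based positions and single-base
    alleles, e.g. taken from a haploid VCF.
    """
    seq = list(reference)
    for pos, ref, alt in variants:
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {pos}: {seq[pos - 1]} != {ref}")
        seq[pos - 1] = alt
    return "".join(seq)


def concatenate_genes(sequence, genes):
    """Join gene segments given as 1-based inclusive (start, end, strand)."""
    parts = []
    for start, end, strand in genes:
        segment = sequence[start - 1:end]
        parts.append(reverse_complement(segment) if strand == "-" else segment)
    return "".join(parts)


# Made-up 10 bp "mitogenome" and two made-up gene coordinates.
reference = "ACGTACGTAC"
consensus = apply_haploid_snvs(reference, [(3, "G", "T"), (8, "T", "A")])
cds = concatenate_genes(consensus, [(1, 4, "+"), (7, 10, "-")])
print(consensus, cds)  # prints ACTTACGAAC ACTTGTTC
```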
QGIS version 2.18 was used to create maps. Python matplotlib version 3.0.2 (https://matplotlib.org/) was used for generating plots. Inkscape (https://inkscape.org/) version 0.92 was used to edit images.
We thank Youki Yamasaki, Allison Chang, Parker Houston, Allison Weakley, Kendra Person and Hans Gripkey for assisting with DNA extraction and library preparation for this study. Special thanks to Dr. Bradley Main for his comments on the manuscript. We thank personnel from Consolidated Mosquito Abatement District (Ms. Jodi Holeman and Ms. Katherine Ramirez), Delta, Greater LA County (Dr. Susan Kluh) and San Mateo County Vector Control Districts and Fresno, Madera County and Orange County Mosquito and Vector Control Districts (Mr. Michael Hearst, District Manager), Community Health Division of the Department of Environmental Health (Ms. Rebecca Lafreniere, Chief and Ms. Elizabeth Pozzebon, Director), San Diego County Dept. of Environmental Health, Vector Control, and Dr. Christopher Barker (UC Davis) for providing specimens used in this study. We also thank Dr. Leo Braack (University of Pretoria, South Africa) for assisting AJC in collecting South African samples, and Dr. Danny Governer and SANParks for permitting collection at Shingwedzi in the Kruger National Park. We also thank Dr. Lutz Froenicke and his team at the UC Davis DNA Technologies Core for genome sequencing.
We acknowledge funding support from the UC Davis Bridge Funding Program, UC Davis School of Veterinary Medicine Vector-borne Disease Pilot Grant Program, DARPA Safe Gene Program (HR0011-17-2-0047), and the Pacific Southwest Regional Center of Excellence for Vector-Borne Diseases funded by the U.S. Centers for Disease Control and Prevention (Cooperative Agreement 1U01CK000516). YL and MJH received salary support from the UC Davis Bridge Funding Program. YL received research supply support from the UC Davis School of Veterinary Medicine Vector-Borne Disease Pilot Program. YL, MS, JMM, GCL, and AJC received salary and material support from DARPA Safe Gene Program. HS received salary support from the Pacific Southwest Regional Center of Excellence for Vector-Borne Diseases. These funding bodies had no role in sample collection, data analysis, or writing the manuscript.
Availability of data and materials
Sequence data is available through the NCBI Sequence Read Archive (Study accession number: SRP106694). Data is also available through the UC Davis PopI OpenProjects ‘AedesGenomes’ page .
YL, GCL, FSM and AJC conceptualized the experimental design. CTS, FSM and AJC collected specimens from Clovis and also obtained samples from other locations. YL and GCL assisted in obtaining specimens. YL and MJH carried out DNA extraction, library preparation and submission for genome sequencing. YL, HS, TCC and WRC conducted data analysis. MS, JMM, JCC assisted in data interpretation. All authors contributed to the writing and editing of this manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 6. Jewell D, Grodhaus G. In: Laird M, editor. Commerce and the Spread of Pests and Disease Vectors. New York: Praeger Publishers; 1984. p. 103–7.
- 11. Hanemaaijer MJ, Collier TC, Chang A, Shott CC, Houston PD, Schmidt H, Main BJ, Cornel AJ, Lee Y, Lanzaro GC. The fate of genes that cross species boundaries after a major hybridization event in a natural mosquito population. Mol Ecol. 2018; in press.
- 14. Schmidt H, Hanemaaijer MJ, Cornel AJ, Lanzaro GC, Braack L, Lee Y. Complete mitogenome sequence of Aedes (Stegomyia) aegypti derived from field isolates from California and South Africa. Mitochondrial DNA Part B. 2018. https://doi.org/10.1080/23802359.2018.1495117.
- 16. Lynch M. The origins of genome architecture. Sunderland, MA: Sinauer Associates; 2007.
- 20. Evans BR, Gloria-Soria A, Hou L, McBride C, Bonizzoni M, Zhao H, Powell JR. A multipurpose high throughput SNP chip for the dengue and yellow fever mosquito, Aedes aegypti. G3 (Bethesda). 2015.
- 25. Yamasaki YK, Nieman CC, Chang AN, Collier TC, Main BJ, Lee Y. Improved tools for genomic DNA library construction of small insects. F1000Res. 2016;5:211.
- 26. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014.
- 28. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Cornell University Library; 2013. arXiv:1303.3997.
- 30. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint; 2012.
- 32. Felsenstein J. PHYLIP - phylogeny inference package (version 3.2). Cladistics. 1989;5:164–6.
- 34. Miles A, Harding N. scikit-allel: explore and analyse genetic variation. Version 1.2.0. GitHub; 2018. https://github.com/cggh/scikit-allel.
- 38. Behura SK, Lobo NF, Haas B, deBruyn B, Lovin DD, Shumway MF, Puiu D, Romero-Severson J, Nene V, Severson DW. Complete sequences of mitochondria genomes of Aedes aegypti and Culex quinquefasciatus and comparative analysis of mitochondrial DNA fragments inserted in the nuclear genomes. Insect Biochem Mol Biol. 2011;41(10):770–7.
- 39. UC Davis PopI OpenProjects: AedesGenomes. https://popi.ucdavis.edu/PopulationData/OpenProjects/AedesGenomes/.
- 40. CalSurv: California Surveillance Gateway Maps. California Vectorborne Disease Surveillance System; 2007.
- 41. Patterson T. CleanTOPO2: Edited SRTM30 Plus World Elevation Data. Online; 2008 (author communicated that the data are in the public domain and free to use).
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.