Introduction

Invasions of exotic pests impose a global economic burden of over $200B annually (Diagne et al. 2021). These costs have risen in recent decades and are expected to rise further as invasions continue to increase in frequency worldwide (Sardain et al. 2019). Invasive insect pests mostly establish via passive movement on ships, planes, or land vehicles, and are typically difficult to eradicate with insecticides due to resistance that is either present on arrival or evolves shortly afterwards (Gao et al. 2021). Border biosecurity operations have demonstrated success at restricting the spread of pests (Caley et al. 2015; Muzari et al. 2017); however, resource limitations constrain the number of inspections on arriving vessels and the number of border sections that can be effectively monitored.

Biosecurity operations can be greatly assisted by knowledge of pest incursion pathways, as this allows for more frequent inspection of goods and conveyances from likely source locations (Robinson et al. 2011) and regular inspection of known entry points (Myers et al. 2000; Mehta et al. 2007). While pathways can be easily inferred if incursions are detected on their vessel of entry, incursions that are detected away from vessels pose a greater challenge. Moreover, it is not always clear if these individuals are from the original incursions or from cryptically established invasive populations, the latter of which presents a new problem requiring a shift in management strategy. These biosecurity issues link to current areas of research in pest ecology, such as inferring the direction of passive movement (Hulme et al. 2008), changes in movement vectors over time (Zeigler and Fagan 2014), establishment likelihood as a function of the size and frequency of introductions (i.e. propagule pressure) (Stringham and Lockwood 2021), and population growth rates following establishment (Tobin et al. 2011). Knowledge of passive movement pathways, propagule pressure, and cryptic establishment can help inform border biosecurity programs such as the program that has stopped the Asian tiger mosquito Aedes albopictus from invading the Australian mainland for 18 years so far (Muzari et al. 2017).

Genomic databanks are an emerging technology that shows promise for investigating these processes. When databanks contain samples covering a sufficient geographical range, they can indicate specific movement of incursive pests across international borders (Schmidt et al. 2019, 2020b; Popa-Báez et al. 2021; Parvizi et al. 2023) and among geographical regions (Chen et al. 2021; Schmidt et al. 2021c). Rather than inferring the origin of established invasive populations, which can certainly be achieved with such databanks (Gloria-Soria et al. 2018; Sherpa et al. 2019; Schmidt et al. 2020a; Kelly et al. 2021; Yan et al. 2021), these emerging methods apply pairwise kinship tests or assignment tests to incursive individuals directly. Genomic signals of inbreeding and kinship patterns within and among the incursive individuals may also provide further insight into incursion size and cryptic establishment.

While pest management is beginning to employ genomic databanks, studies have yet to evaluate their power to infer specific movement when reference samples are genetically similar across space, as is common in pest populations (Fitzpatrick et al. 2012), or when tens of generations of drift and migration have occurred since sampling. These questions are vital for the wider uptake of these methods as they determine the size of the initial sequencing investment and the length of time a databank will remain representative of extant populations. This study considers these questions directly in an analysis of two globally invasive mosquito species, the dengue mosquito, Aedes aegypti and the tiger mosquito, Ae. albopictus. Both species are readily transported by human conveyances including boats and planes, and genomic studies of recent invasions have found these are often linked to multiple geographically distinct source locations (Schmidt et al. 2021a). Here we focus on recent detections of these species within Australia, away from entry ports and of unknown provenance.

For Ae. aegypti, we trace an invasion of Tennant Creek, Northern Territory, that was detected in early 2021 and established rapidly across the town. This species had previously been eliminated from Tennant Creek in 2006, following its establishment in 2004, and again in 2014 following its establishment in 2011 (Pettit and Kurucz 2014; Whelan et al. 2020). An analysis of variation in the mtDNA COI gene concluded that the 2004 incursion had most likely originated from Cairns or somewhere nearby, and had probably been transported into Tennant Creek as eggs in dry receptacles (Beebe et al. 2005; Whelan et al. 2012). Both elimination programs were funded in collaboration between the Northern Territory Government and the Commonwealth Department of Health to prevent the establishment of Ae. aegypti in the Northern Territory and subsequently in Western Australia.

For Ae. albopictus, we trace a set of recent incursions detected in the Torres Strait, Queensland, on islands where this species had been successfully eliminated (Muzari et al. 2017). The Torres Strait invasion of this species began in 2004 (Ritchie et al. 2006) and has since been shown to have a predominantly Indonesian genetic background (Beebe et al. 2013; Schmidt et al. 2021c). This background is distinct from other Ae. albopictus including those from nearby Papua New Guinea, except for the South Fly District, which may have been invaded from the same source as the Torres Strait (Maynard et al. 2017). Since 2008, a cordon sanitaire biosecurity strategy has been in place to regulate the movement of people and goods between the invaded islands of the Torres Strait and the uninvaded southern islands, which has so far successfully stopped the spread of Ae. albopictus to the Australian mainland (Muzari et al. 2017). Islands south of the cordon sanitaire undergo periodic surveillance for Ae. albopictus incursions, and a previous study of this region linked three 2019 incursions to two islands, Keriri and Moa (Schmidt et al. 2021c).

In testing these ideas, we show that the deep learning method Locator (Battey et al. 2020) provides remarkably high levels of unambiguous and unbiased assignment during cross-validation of reference samples. This method also provides inferences of passive movement through precise tracing of incursion samples to their geographical origins, despite high genetic similarity, strong effects of drift, and tens of generations of evolution separating incursions and reference samples. In the case of Ae. aegypti, precision in tracing was further improved by targeted assays for insecticide resistance and a Wolbachia bacterial infection, which follow releases to establish Wolbachia in north-eastern Australia over the last decade (Schmidt et al. 2017; O’Neill et al. 2019; Ryan et al. 2020). Finally, by investigating genomic patterns within and between incursive individuals, we show that the Ae. aegypti invasion was likely sourced from a single geographical location and potentially from a single family, while the Ae. albopictus incursions were independent and showed no signs of cryptic establishment.

Materials and methods

Mosquito sample collection

Aedes aegypti invasion of Tennant Creek, NT, Australia

Aedes aegypti was detected in Tennant Creek during routine surveillance between 22nd and 26th February 2021. Four adult mosquitoes from this invasion (three females, one male) were collected from a single property on 26th February. Follow-up sampling between 9th and 11th March located an additional five adults (three females, two males) and nine immatures at four properties within ~ 80 m of the original detection. After sampling, the invasion was found to have established across Tennant Creek.

To trace the source of this invasion, we built a genomic databank from 80 Ae. aegypti sampled from 11 locations across its Australian distribution in Queensland, from the northern islands of the Torres Strait (9.3° S) to Goomeri in southeast Queensland (26.2° S) (Fig. 1a, b). These were sampled between 2009 and 2017. We also included previously-generated sequence data from Cairns and Townsville. Both of these cities have been subject to Wolbachia mosquito releases for dengue transmission control, with Cairns releases beginning in 2013 (Schmidt et al. 2017) and Townsville releases in 2014 (O’Neill et al. 2019). The Cairns sample was taken in 2015 after releases (Schmidt et al. 2018), and included some individuals infected with Wolbachia, whereas the Townsville sample was taken in 2013 before releases (Rašić et al. 2016). The Townsville sample thus represents a pre-Wolbachia snapshot of this population.

Fig. 1

Distributions (a, c) and sample locations (b, d) of Ae. aegypti (a, b) and Ae. albopictus (c, d). Black squares indicate current Australian distributions based on Atlas of Living Australia records. Blue circles show locations of reference samples. In (b), the orange square indicates Tennant Creek. In (d), the orange square indicates Thursday Island and the red square indicates Horn Island. Maps use shapefiles made available by www.naturalearthdata.com (a–c) and the Torres Strait Clear sky Landsat (d; https://eatlas.org.au/data/uuid/71c8380e-4cdc-4544-98b6-8a5c328930ad)

Aedes albopictus incursions in the Torres Strait Islands, QLD, Australia

Aedes albopictus incursions were detected during routine surveillance on two islands in the Torres Strait where the species had been successfully eliminated (Muzari et al. 2017). Surveillance conducted in January and February 2021 detected four adult individuals on Thursday Island and three on Horn Island (Fig. 1c, d). Surveillance in January 2022 detected a further four adults on Thursday Island and five on Horn Island. In both years and on both islands, incursions were detected across a wide geographical range (Figures S1–4). There were no further detections of Ae. albopictus on either island over the periods March–December 2021 and February–June 2022. This may reflect a failure of Ae. albopictus to establish on these islands, but these observations also fit with cryptic establishment at low frequencies.

To trace the source of these incursions, we used a genomic databank containing previously generated sequence data of 240 Ae. albopictus from 12 Torres Strait Island communities across 11 islands, collected in 2018 (Schmidt et al. 2021c) (Fig. 1c, d). These samples were individually georeferenced and were collected from across each island.

DNA sequencing and bioinformatic processing

Genomic DNA was extracted from the 98 Ae. aegypti and 16 Ae. albopictus using Qiagen DNeasy® Blood & Tissue Kits (Qiagen, Hilden, Germany). Extracted DNA was used to build double digest restriction-site associated DNA (ddRAD; Peterson et al. (2012)) sequencing libraries following the protocol of Rašić et al. (2014). A technical replicate of one of the Torres Strait Island incursion samples from Thursday Island, TI21-1, was taken through the entire protocol after DNA extraction. Libraries were individually barcoded, then sequenced on a HiSeq 4000 using 150 bp chemistry.

New and old sequence data were combined for each species and run through the same bioinformatic pipeline. We used the Stacks v2.54 (Rochette et al. 2019) program process_radtags to demultiplex sequences, trim them to 140 bp and remove sequences with Phred scores below 20. As the Cairns and Townsville libraries were produced using 100 bp chemistry, we trimmed these to 80 bp. Sequences were aligned to the nuclear genome assembly for Ae. aegypti, AaegL5 (Matthews et al. 2018), and the linkage-based assembly for Ae. albopictus, AalbF3 (Boyle et al. 2021). Alignment was performed in Bowtie2 v2.3.4.3 (Langmead and Salzberg 2012) using “--very-sensitive” settings. Sequences were built into catalogs for each species using the Stacks program ref_map. The Stacks program “populations” was used to export VCF files containing SNP genotypes for all individuals in each catalog, filtering to retain SNPs called in at least 50% of individuals from each population and 90% of individuals total, and with a minor allele count ≥ 3 (Linck and Battey 2019). Two of the 18 Tennant Creek Ae. aegypti and one incursive Ae. albopictus from Thursday Island in 2022 had too much missing data for tracing (> 60%) and were omitted. We also omitted ten Ae. aegypti and three Ae. albopictus reference samples with > 30% missing data. All other samples had < 30% missing data.
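The SNP filters described above (call rate within each population, call rate overall, and minor allele count) can be illustrated with a minimal sketch. This is not the Stacks implementation: it assumes genotypes are held as plain Python lists of alternate-allele counts, and the function name `keep_snp` and its parameter names are hypothetical.

```python
from collections import defaultdict

def keep_snp(genotypes, pops, min_pop_rate=0.5, min_total_rate=0.9, min_mac=3):
    """Return True if one SNP passes the call-rate and minor allele count filters.

    genotypes: alt-allele count per individual (0, 1, 2), or None if missing.
    pops: population label per individual, in the same order as genotypes.
    """
    called_by_pop = defaultdict(int)
    size_by_pop = defaultdict(int)
    for g, pop in zip(genotypes, pops):
        size_by_pop[pop] += 1
        if g is not None:
            called_by_pop[pop] += 1

    # The SNP must be called in enough individuals within every population
    for pop, n in size_by_pop.items():
        if called_by_pop[pop] / n < min_pop_rate:
            return False

    # ... and in enough individuals overall
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < min_total_rate:
        return False

    # Minor allele count: copies of the rarer allele among called genotypes
    alt = sum(called)
    ref = 2 * len(called) - alt
    return min(alt, ref) >= min_mac
```

The defaults mirror the thresholds stated in the text (50% per population, 90% overall, minor allele count of at least 3); the sample-level missing-data cut-offs would be applied separately.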

Cross-validation and filtering of reference datasets

To assess the robustness of the reference datasets we performed cross-validation. Each individual in turn was omitted from the training set and treated as an individual of unknown location, and then Locator (Battey et al. 2020; see below) was used to infer the geographical origin of the individual. Assignments were assessed qualitatively for accuracy (i.e. is its origin unambiguous?) as well as quantitatively for precision (i.e. how far from its true origin?). For the Ae. aegypti dataset, we treated nearby samples from Bluff, Capella, and Emerald (QLD, max separation 120 km) as one location, as well as combining those from Duaringa, Mt Morgan and Rockhampton (QLD, max separation 95 km) and those from Gin Gin and Goomeri (QLD, separation 133 km). This was because sample sizes from each specific location were small (n = 5). Cross-validation was also used to test two site filtering protocols: one where SNPs were not filtered for linkage, and one where SNPs within 25 kbp of other SNPs were filtered.
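The leave-one-out procedure and the distance-based measure of precision can be sketched as follows. This is an illustration only and does not call Locator: the `predict` argument stands in for any method that maps a training set and a genotype to coordinates, and `nearest_genotype` is a hypothetical placeholder predictor included so the example runs on its own.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def leave_one_out_errors(samples, predict):
    """samples: list of (genotype_vector, lat, lon) tuples.
    predict: callable(train_samples, genotype_vector) -> (lat, lon).
    Returns the distance in km between true and predicted origin per sample."""
    errors = []
    for i, (geno, lat, lon) in enumerate(samples):
        # Hold out sample i and train on everything else
        train = samples[:i] + samples[i + 1:]
        pred_lat, pred_lon = predict(train, geno)
        errors.append(haversine_km(lat, lon, pred_lat, pred_lon))
    return errors

def nearest_genotype(train, geno):
    """Placeholder predictor: location of the most similar training genotype."""
    def mismatches(other):
        return sum(a != b for a, b in zip(geno, other))
    best = min(train, key=lambda s: mismatches(s[0]))
    return best[1], best[2]
```

Averaging the returned distances gives a mean error comparable in kind to the values reported in the Results, while the qualitative check of unambiguous assignment would still require inspecting the predictions for each individual.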

Incursion tracing

Genome-wide genetic variation

We used Locator (Battey et al. 2020) to infer the origin of the 16 invasive Ae. aegypti and the 16 incursive Ae. albopictus (including the technical replicate). Locator is a deep learning method that uses individual genotypes of known geographical location (i.e. the reference individuals) to predict the locations of a set of individuals of unknown location (i.e. the incursive mosquitoes). For each incursive mosquito, we set Locator to provide a point estimate of its origin and 1,000 bootstrap subsamples to indicate confidence around these estimates.
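One way to summarise the spread of bootstrap predictions for a single individual is sketched below. This is not part of Locator; it assumes the bootstrap predictions have already been read into a list of (latitude, longitude) pairs, and the function name `summarise_bootstraps` is hypothetical. The centroid is a simple mean of coordinates, which is only reasonable for point clouds that are small relative to the globe and do not span the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0

def summarise_bootstraps(points):
    """points: list of (lat, lon) predictions in decimal degrees for one individual.

    Returns (centroid_lat, centroid_lon, median_km, p95_km), where the
    distances are measured from each prediction to the centroid.
    """
    n = len(points)
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n

    def dist_km(lat, lon):
        # Equirectangular approximation, adequate over short distances
        dy = math.radians(lat - c_lat)
        dx = math.radians(lon - c_lon) * math.cos(math.radians(c_lat))
        return EARTH_RADIUS_KM * math.hypot(dx, dy)

    dists = sorted(dist_km(lat, lon) for lat, lon in points)
    median = dists[n // 2] if n % 2 else (dists[n // 2 - 1] + dists[n // 2]) / 2
    p95 = dists[math.ceil(0.95 * n) - 1]  # nearest-rank percentile
    return c_lat, c_lon, median, p95
```

A tight cloud (small median and 95th percentile distances) would correspond to the low variation among bootstrap replicates described for confidently assigned individuals.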

We benchmarked Locator’s performance against inferences derived from principal components analysis (PCA), a common tool for investigating genomic similarity among populations. PCAs were run in the R package LEA (Frichot and François 2015), using samples in this study and others from the Indo-Pacific region previously published in Schmidt et al. (2020a, 2021c).

Wolbachia infection

The Tennant Creek Ae. aegypti were assessed for infection of wMel Wolbachia, which are present at high frequency in Ae. aegypti populations in Cairns, Townsville, and several other locations in northern Queensland, but not in central Queensland populations. Wolbachia infection was assessed using a high-throughput PCR assay (Lee et al. 2012) which was run twice on all 18 samples.

In the case of uninfected samples, we also considered the possibility that the wMel infection was initially present but then lost from the invasive population due to the vulnerability of this Wolbachia strain to high temperatures (Ross et al. 2017) that are common in Tennant Creek. As Wolbachia is coinherited with mtDNA through the maternal line (Hoffmann and Turelli 1997), wMel-infected individuals will all have inherited the same mtDNA haplotype from the same common ancestor within the past ~ 10 years. This mtDNA haplotype will be retained even if Wolbachia has been lost (Yeap et al. 2016), and its presence would be revealed by high genetic similarity in mtDNA between the Tennant Creek Ae. aegypti and samples infected with wMel.

To test this hypothesis, we looked at mtDNA variation among our Ae. aegypti samples alongside variation from 38 wMel-infected (Wol+) and 87 uninfected wildtype (Wol−) individuals from Cairns, sampled in 2015 (Schmidt et al. 2018). We aligned demultiplexed sequences to the Ae. aegypti mtDNA genome assembly (Behura et al. 2011) and called mtDNA SNPs using Stacks. We used VCFtools to output mtDNA genotypes for all individuals with minimum depth of 5, omitting SNPs with any heterozygous genotypes which could represent NUMTs (Hlaing et al. 2009).
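The mtDNA genotype filter described above can be illustrated with a short sketch. It is not the VCFtools or Stacks code: it assumes genotypes have been parsed into (allele, allele, depth) tuples, the function name `filter_mtdna_sites` is hypothetical, and it makes the assumption that low-depth genotypes are masked as missing before heterozygosity is checked.

```python
def filter_mtdna_sites(sites, min_depth=5):
    """sites: dict mapping SNP position -> list of (allele_a, allele_b, depth),
    one tuple per individual.

    Returns a dict of position -> list of haploid calls (allele, or None where
    depth was too low), keeping only sites with no heterozygous genotypes.
    """
    kept = {}
    for pos, calls in sites.items():
        haploid = []
        heterozygous = False
        for a, b, depth in calls:
            if depth < min_depth:
                haploid.append(None)  # treat low-depth genotypes as missing
            elif a != b:
                # Heterozygous calls in mtDNA may reflect NUMTs; drop the site
                heterozygous = True
                break
            else:
                haploid.append(a)
        if not heterozygous:
            kept[pos] = haploid
    return kept
```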

Pyrethroid resistance mutations

The Tennant Creek Ae. aegypti were assessed for three pyrethroid resistance mutations, V1016G, F1534C, and S989P, which are common to Ae. aegypti across the Indo-Pacific (Endersby-Harshman et al. 2020), but not within Australia (Endersby-Harshman et al. 2017). Monitoring mosquito incursions for these mutations is standard practice to confirm susceptibility to insecticide-based control and to ensure these mutations are not introgressing through Australian Ae. aegypti as they have through other Indo-Pacific populations (Endersby-Harshman et al. 2020). For each individual, we ran three replicates of the Custom TaqMan® SNP Genotyping Assays (Life Technologies Corporation, Carlsbad, CA, USA) described in Endersby-Harshman et al. (2020), using a LightCycler II 480 (Roche, Basel, Switzerland) real time PCR machine in a 384-well format.

Genetic relatedness and inbreeding in the incursions

We investigated patterns of genetic relatedness (k) among pairs of individuals (dyads) in the Tennant Creek and Torres Strait incursions, and calculated Wright’s inbreeding coefficient (F) for every individual. The patterns can help indicate whether the samples at either location had arrived as part of a single incursion (i.e. on a single vessel or set of vessels at a single time point) or as several distinct incursions, and whether cryptic establishment has taken place.

For instance, if specific incursive individuals have higher k scores than other dyads in the sample, these are likely to be close kin, and as mosquitoes have very large population sizes it is most likely that they are related offspring of a single female that has laid eggs after the incursion or on a vessel that transports the incursion. If incursive individuals have higher F scores than other individuals, they are likely to be the product of inbreeding following a single incursion of one or a few individuals. High F scores can thus also indicate cryptic establishment if incursion samples are inbred but no established invasion is apparent.

We used Loiselle’s k (Loiselle et al. 1995) to estimate genetic relatedness, calculated in SPAGeDi (Hardy and Vekemans 2002) using the linkage-thinned datasets for each species. Loiselle’s k describes correlations in the frequencies of homologous alleles, summarised across the genome (Loiselle et al. 1995). We used this estimator as it makes no assumptions regarding F and is suitable for markers with low polymorphism (Vekemans and Hardy 2004). We calculated F in VCFtools (Danecek et al. 2011). As estimators of individual-level variation can be biased by uneven sample size and missing data (Schmidt et al. 2021b), we subsampled and refiltered each databank before calculating F. For Ae. aegypti, we used the Tennant Creek (n = 16), Townsville (n = 16) and Cairns (n = 15) samples. For Ae. albopictus, we used n = 15 samples from each reference population and the n = 15 incursion samples. We refiltered to omit all sites with any missing genotypes (--max-missing 1) and to keep only sites with mean depth ≥ 20 (--min-meanDP 20), and did not filter singletons and doubletons.
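As a rough illustration of an individual inbreeding coefficient, the sketch below uses the method-of-moments form F = (O − E) / (N − E), where O is the observed number of homozygous sites, E the number expected under Hardy-Weinberg proportions, and N the number of sites. This is an assumption about the general form of the statistic and does not reproduce any small-sample corrections that VCFtools may apply; the function name `inbreeding_f` is hypothetical.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Method-of-moments inbreeding coefficient for one individual.

    genotypes: alt-allele counts (0, 1, 2) at each site, with no missing data.
    alt_freqs: alt-allele frequency at each site in the sample being compared.
    """
    n_sites = len(genotypes)
    # Observed homozygous sites: any genotype that is not heterozygous
    observed_hom = sum(1 for g in genotypes if g != 1)
    # Expected homozygosity per site under Hardy-Weinberg: 1 - 2pq
    expected_hom = sum(1 - 2 * p * (1 - p) for p in alt_freqs)
    return (observed_hom - expected_hom) / (n_sites - expected_hom)
```

Under this form, an individual with more homozygous sites than expected has F above zero, which is the pattern the text describes for inbred incursion samples. Requiring complete genotypes matches the refiltering step that omits sites with any missing data.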

Results

Cross-validation and filtering of reference datasets

Cross-validation indicated that different filtering settings were optimal for each species (Fig. 2). In Ae. aegypti, 4.9% (5 out of 102) of reference samples were assigned to the incorrect origin when SNPs were thinned for linkage, and 6.9% (7 out of 102) without thinning. Unambiguous assignment of a sample to its true origin occurred in 88% of samples (thinned) or 76% of samples (no thinning). The average error was 47.1 km (thinned) or 60.7 km (no thinning), out of a maximum distance between samples of 1881.9 km.

Fig. 2

Cross-validation of genomic reference data. SNPs were either not thinned by linkage (dark symbols) or thinned to 25 kbp (light symbols). Points represent the proportion of individuals unambiguously assigned to each reference location

In Ae. albopictus, 10.5% (25 out of 237) of samples were assigned to the incorrect origin for both thinning treatments. Unambiguous assignment occurred in 84% of samples (thinned) or 86% of samples (no thinning). One location, Badu Island, performed particularly poorly in the thinning treatment, with only 30% of individuals unambiguously assigned compared to 60% without thinning. Average error was 11.6 km (thinned) or 11.2 km (no thinning), out of a maximum distance of 213.4 km. There was no evidence in either species that incorrect assignments were biased to specific locations. We used the thinned dataset for Ae. aegypti tracing and the unthinned dataset for Ae. albopictus tracing, the latter of which was chosen due to the poor assignment to Badu Island with the thinned dataset.

Incursion tracing

Aedes aegypti invasion of Tennant Creek, NT, Australia

Genomic tracing with Locator was run sequentially on different subsets of samples. An initial run with all 102 reference samples and 11,866 thinned SNPs indicated all 16 samples had a likely origin between Cairns (16.9° S) and Rockhampton (23.4° S) (Figure S5). Rerunning Locator with 11,495 SNPs and without the reference samples outside this range indicated a likely origin of Townsville (Fig. 3). We checked whether this assignment to origin was biased by Townsville’s geographically central location by rerunning Locator using either Townsville and Cairns only (Figure S6) or Townsville and the southern samples only (Figure S7). In each case, incursives were still placed around Townsville, but Townsville and Cairns were slightly harder to differentiate.

Fig. 3

Origins of the 16 Tennant Creek individuals inferred from genome-wide genetic variation. Blue circles indicate reference sample locations, red circles show inferred origins from 1000 bootstrap replicate runs of Locator. Figures S5–7 show results of similar analyses using the whole dataset (S5) or different subsets of the dataset (S6–7)

Assays for wMel Wolbachia indicated none of the Tennant Creek samples were infected with wMel, in contrast to positive controls from an infected line. Analysis of 116 mitochondrial SNPs showed that pairwise FST between Tennant Creek and Townsville was lower (FST = 0.24) than between Tennant Creek and either the wMel infected or uninfected Cairns samples (both FST = 0.85) (Table 1). This indicates that no wMel infection was ever present in the Tennant Creek samples or their maternal ancestors, otherwise the Tennant Creek sample should be more similar to the Cairns samples given that Cairns provided the genetic background for the original wMel infected strain (Walker et al. 2011). In the pyrethroid resistance assays, none of the three resistance alleles was present in the Tennant Creek samples, consistent with the absence of resistance alleles in North Queensland populations (Endersby-Harshman et al. 2017).

Table 1 Pairwise FST of mtDNA variation among Tennant Creek, Cairns, and Townsville populations

Aedes albopictus incursions in the Torres Strait Islands, QLD, Australia

Locator was run on Ae. albopictus using 73,685 SNPs. Differences in tracing outcomes were evident between the 2021 and 2022 incursions (Fig. 4), and these were not due to missing data which was similarly low in both years. For the 2021 samples, the only incursive individual that could be assigned to a specific location at high confidence was the Thursday Island sample TI21-1 and its technical replicate, assigned to the St Pauls community on Moa Island. Several of the other 2021 incursives had a likely origin in the Maluilgal (Western) island group, though there was considerable variation among bootstrap replicates. The 2022 samples showed much better assignment, with seven of the nine clearly linked to Keriri Island and six of these showing very low variation among bootstrap replicates. This included one incursive collected next to the airport (HI22-4). Keriri and Moa Island are the two islands to which incursions have previously been linked in this system (Schmidt et al. 2021c).

Fig. 4

Origins of the 15 Torres Strait Islands individuals inferred from genome-wide genetic variation. Samples were detected on Horn Island (HI) or Thursday Island (TI) in either 2021 or 2022. A technical replicate (R) was made of the TI21-1 sample. Blue circles indicate reference sample locations, red circles show inferred origins from 1000 bootstrap replicate runs of Locator

Comparisons with PCA

PCA-based assessment of genetic clustering provided limited insight into the origin of the Ae. aegypti invasion (Fig. 5a–c). In each PCA, Tennant Creek Ae. aegypti were strongly differentiated from all other samples by variation on the PC1 axis. The Tennant Creek samples did not cluster with any other population even at very broad scales, though they were more closely linked with Australian samples than with others from the Indo-Pacific (Fig. 5a).

Fig. 5

PCAs of Ae. aegypti (a–c) and Ae. albopictus (d–f). PCAs show genetic relationships between incursions and global populations (a, d), Australian populations (b, c), Torres Strait Island clusters (e), and Torres Strait Islands (f). Islands that clustered apart from incursion samples in e were omitted for f. Island clusters in f are indicated in parentheses

Within Australia, Tennant Creek mosquitoes were closest to Cairns and Townsville (Fig. 5b), though when analysed against only these two populations they could not be linked to either (Fig. 5c). The possibility of the Tennant Creek invasion coming from an unsampled, highly differentiated population was very low, however; when analysed alongside Cairns and Townsville with the program Populations, Tennant Creek had far fewer private alleles than either Cairns or Townsville (Table 2). The proportion of polymorphic loci was likewise much lower in Tennant Creek. These findings strongly suggest that the differentiation of Tennant Creek on PC1 was not due to its genetic uniqueness but rather due to drift.
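The private-allele comparison can be illustrated with a minimal sketch. It is not the Stacks populations implementation: it assumes the alleles observed in each population have already been collected into sets of (locus, allele) pairs, and the function name `private_allele_counts` is hypothetical.

```python
def private_allele_counts(alleles_by_pop):
    """alleles_by_pop: dict mapping population name -> set of (locus, allele)
    pairs observed in that population.

    Returns a dict of population name -> number of alleles found in that
    population and in no other.
    """
    counts = {}
    for pop, alleles in alleles_by_pop.items():
        # Pool every allele seen in the other populations
        others = set().union(*(a for p, a in alleles_by_pop.items() if p != pop))
        counts[pop] = len(alleles - others)
    return counts
```

A population founded recently from a sampled source would be expected to carry few alleles absent from that source, which is the reasoning applied to Tennant Creek in the text.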

Table 2 Populations summary statistics for Cairns, Townsville and Tennant Creek

PCAs were able to link the Torres Strait incursions to the Torres Strait reference samples (Fig. 5d), though not at the specificity seen in the Locator analysis. PCA of the Torres Strait locations (Fig. 5e) and a subset of Torres Strait Islands that genetically overlapped the incursions in PC space (Fig. 5f) could not indicate specific islands of origin. Most of the incursion samples were placed near the origin, next to Keriri of the Inner Island cluster, but also overlapped with samples from other island clusters (Fig. 5e). The other two incursion samples (TI21-1 and its replicate, TI21-1r) were placed with the Moa Island populations of St Pauls and Kubin, which accords with Locator placing these samples in St Pauls (Fig. 5f, cf. Fig. 4).

Genetic relatedness and inbreeding in the incursions

Figure 6 displays pairwise Loiselle’s k scores for the Tennant Creek (6a) and Torres Strait Island (6b) dyads in rank order, and Wright’s inbreeding coefficient (F) scores in the subsampled and refiltered datasets (5073 SNPs for Ae. aegypti (6c); 3223 SNPs for Ae. albopictus (6d)).

Fig. 6

Genetic relatedness (k) and Wright’s inbreeding coefficients (F) for Ae. aegypti (a, c) and Ae. albopictus (b, d). Loiselle’s k scores are of dyads in Tennant Creek (a) and the Torres Strait Islands (b), where dark squares show correlations between homologous alleles in two different individuals and light squares show correlations within the same individual. F scores are of Ae. aegypti (c) and Ae. albopictus (d), where red squares indicate invasive individuals and grey circles indicate other individuals

Loiselle’s k scores represent either correlations between homologous alleles in two different individuals (dark squares) or the same individual (light squares). The relative magnitudes of k scores within and between individuals were different in the two datasets. In the Torres Strait Islands, k scores within individuals were all higher than the k scores between individuals (Fig. 6b). This pattern means that the two chromosomes within each individual are more genetically similar to each other than to chromosomes found in other individuals. The higher k within individuals suggests that none of the Torres Strait Island incursives are likely to be from the same family, as dyads from the same family would have chromosomes with high genetic similarity via identity by descent. Instead, these results are evidence that the 15 Torres Strait Island incursives each represent distinct incursions. This pattern was not observed at Tennant Creek, where k scores within individuals were mostly undifferentiated from and in many cases lower than k scores between individuals (Fig. 6a). Chromosomes within individuals are thus no more genetically similar than they are to chromosomes in other individuals. This suggests that the 16 Tennant Creek individuals are all likely to be from the same exact geographical location and are potentially all from the same family.

The above inferences were further supported by variation in F scores. In Tennant Creek, F scores indicated much higher inbreeding relative to the Cairns and Townsville reference individuals, suggesting that the Tennant Creek invasion was sourced from a single family that had been inbreeding following establishment (Fig. 6c, red squares). In the Torres Strait Islands, F scores in incursive individuals (Fig. 6d) were of similar magnitude to reference individuals, suggesting that the incursive mosquitoes were not from lineages that had been recently inbreeding.

Discussion

Genomic data allow for powerful analyses of pests, including detecting specific movement of individuals between invaded regions and into new regions. These movement inferences relate directly to organismal life histories and thus provide a means of using genetic data in ecological contexts, such as understanding passive movement (Hulme et al. 2008), how movement vectors change over time (Zeigler and Fagan 2014), and the extent of passive movement required for establishment (Stringham and Lockwood 2021). This study has demonstrated the use of a deep learning program, Locator (Battey et al. 2020), to infer geographical origins of invasive Aedes mosquitoes and, in doing so, infer the recent movement of specific individuals. Also, by comparing genetic structure within individuals, we found high levels of inbreeding in a recently-established Ae. aegypti invasion, which suggests that the Ae. aegypti invasion was sourced from one location and from a small genetic pool (e.g. a few individuals or a larger group of related individuals). There were no elevated levels of inbreeding in the Ae. albopictus incursions, suggesting that no cryptic establishment of Ae. albopictus has taken place in the Torres Strait.

As well as answering these specific biosecurity questions, this study helps clarify some broader ideas around the use of genomic databanks in pest management. Most importantly, we have shown via cross-validation that the databanks used in this study are sufficiently powered for precise and unbiased tracing, and our incursion tracing results confirm that this power remains high despite tens of generations of evolution and low geographical genetic structure. If sequences that adequately represent an individual’s true population are not included in the reference dataset, Locator tends to place the individual around the centre of the sampling range (Battey et al. 2020). This was not observed here, as the Tennant Creek invasion was traced to Townsville irrespective of its relative position in the sampling range (Figures S6–7) and the Torres Strait incursives were traced to specific islands at the western edge of the range (Fig. 4). As several Torres Strait incursives were not traceable to any specific island, it is possible that these are from an unsampled location with a similar genetic background such as South Fly District in Papua New Guinea or an unsampled location in the Torres Strait.

Cross-validation also indicated different optimal filtering settings for the two databanks. While Locator’s developers have suggested subsampling bootstrap replicates from a set of unlinked SNPs (Battey et al. 2020), we found that thinning led to slightly worse cross-validation in Ae. albopictus, whereas in Ae. aegypti the unthinned dataset performed much worse (Fig. 2). This result may be due to the smaller amount of genetic variation in the Ae. albopictus databank, which when thinned was reduced to only 4,147 SNPs, as opposed to 14,648 for Ae. aegypti. Whole genome databanks would presumably provide much more genetic variation for tracing, but for species with large genomes the larger sequencing effort required for these may have to be offset by smaller sample sizes. Fortunately, whole genome data can be downsampled for compatibility with reduced-representation databanks. The same is not true of reduced-representation data produced with different restriction enzymes, which currently prevents different global datasets from being analysed together (Schmidt et al. 2021a).
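To illustrate what thinning does to SNP counts, the following is a minimal sketch of greedy distance-based thinning. It is a generic scheme rather than the filtering pipeline used in this study; the function name, the 500 bp gap, and the toy positions are all assumptions made for illustration.

```python
from collections import defaultdict

def thin_snps(snps, min_gap_bp):
    """Greedy distance-based thinning of (chromosome, position) tuples.

    On each chromosome the first SNP is kept, and each later SNP is kept only
    if it lies at least min_gap_bp beyond the previously kept SNP.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    kept = []
    for chrom in sorted(by_chrom):
        last = None
        for pos in sorted(by_chrom[chrom]):
            if last is None or pos - last >= min_gap_bp:
                kept.append((chrom, pos))
                last = pos
    return kept

# Toy positions, invented for illustration only.
toy_snps = [("chr1", 100), ("chr1", 150), ("chr1", 900), ("chr2", 50), ("chr2", 60)]
thinned = thin_snps(toy_snps, min_gap_bp=500)
print(len(toy_snps), "SNPs before thinning,", len(thinned), "after")
```

Because thinning discards tightly clustered markers, a databank that starts with less variation can lose a large share of its informative sites, which is consistent with the poorer cross-validation seen for the thinned Ae. albopictus data.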

For databanks to provide value for money, they should be reusable for tracing over years or decades. However, genetic drift and dispersal will ensure that reference samples in the databank become progressively less representative of extant populations with each passing generation. Our analysis of Ae. aegypti showed that the 2021 Tennant Creek invasion samples could be traced to the Townsville genetic background represented by samples collected in January 2014. Assuming 10 generations/year for Aedes, the Tennant Creek and Townsville samples should be separated by ~ 70 generations of evolution, plus several generations of strong genetic drift between invasion and detection. The databank of Torres Strait Ae. albopictus showed similar robustness in tracing, in that a majority of the incursives detected 3–4 years (~ 30–40 generations) after the reference samples could be traced to two specific islands with very high confidence (Fig. 4). This is particularly noteworthy given the much finer spatial scale of the Torres Strait invasion and the low genetic differentiation among islands (FST ≈ 0.03; Schmidt et al. 2021c). Differentiation between Cairns and Townsville Ae. aegypti is similarly very low (FST′ = 0.018; Schmidt et al. 2020a), but assignments were unambiguous (Figs. 3, S6).
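The generation estimates above follow from simple arithmetic, sketched here under the same assumption of 10 generations per year. The helper name is hypothetical, and the elapsed times are the whole-year differences stated in the text.

```python
def generations_elapsed(years, generations_per_year=10):
    """Approximate generations separating reference and incursion samples."""
    return years * generations_per_year

# Ae. aegypti: Townsville references from January 2014, Tennant Creek samples from 2021.
aegypti_gap = generations_elapsed(2021 - 2014)

# Ae. albopictus: incursives detected 3-4 years after the reference samples.
albopictus_gap = (generations_elapsed(3), generations_elapsed(4))

print(aegypti_gap, albopictus_gap)
```

These figures exclude the additional generations of strong drift between invasion and detection, so the realised divergence at Tennant Creek is likely somewhat greater than the calendar-based estimate.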

Using targeted assays for Wolbachia infection and pyrethroid resistance, combined with mtDNA data from the databank, we were able to increase confidence in the inferred origin of Tennant Creek Ae. aegypti beyond that obtained using nuclear genomic data alone. These findings indicate that the invasion may have been sourced from an unsampled location near Townsville or from a region of outer Townsville where the wMel infection is uncommon, as wMel has remained close to fixation in central Townsville since 2018 (O’Neill et al. 2019) and is now present in several nearby towns (see https://www.worldmosquitoprogram.org/en/global-progress/australia/townsville). Importantly, the invasion may pose a greater biosecurity threat than one from central Townsville, as the Tennant Creek Ae. aegypti lack the transmission-blocking effects of wMel (Walker et al. 2011) and should therefore be able to transmit dengue. However, our resistance assays show that the invasion remains susceptible to pyrethroid-based control, like other populations in Australia (Endersby-Harshman et al. 2017).

The different patterns of genetic relatedness observed at Tennant Creek and the Torres Strait provide a useful indication of the genetic characteristics of each invasive system: we can infer that the incursions into the southern Torres Strait Islands are independent of each other, whereas the invasion of Tennant Creek is from a single source. These genetic inferences align with detection locations, as the Tennant Creek samples were collected within ~ 80 m of each other while the Horn and Thursday Island incursions were spread across each island’s range. Understanding the independence of incursions can be useful for building predictive models of pest incursion risk as it can indicate propagule pressure (Camac et al. 2021), though these estimates will reflect effective rather than census population sizes of incursions. The findings at Tennant Creek indicate that greater caution will be required for studies seeking to investigate new invasions using close kin dyad methodologies (Jasper et al. 2022; Waples and Feutry 2022). While close kin dyad methods have shown promise for investigating invasions that are ~ 100 generations old (Schmidt et al. 2021c), they should be applied cautiously to invasions that are very new or that are sourced from a small genetic pool.

Conclusions

The recent and ongoing reduction in sequencing costs has made genomic studies of pests increasingly affordable. A pest genomic databank contains a wealth of information relevant to control, some of which is immediately available (e.g. population structure) and some of which can be revealed over time, such as through tracking the evolution of insecticide resistance or tracing ongoing invasions, incursions, and migration between invaded regions. A concern with pest genomic tracing is that drift and migration will make databank samples obsolete, requiring frequent resampling and sequencing. This study has investigated this concern and found that genomic databanks can remain highly powered for precise tracing for at least 70 generations of evolution. Additionally, cross-validation showed that ~ 87% of reference samples could be assigned unambiguously, even though both databanks had low genetic differentiation due to proximity or recent invasion histories. These results suggest that pest genomic databanks can provide valuable tracing insights for many years, even at finer geographical scales. Considering our results specifically, in the Torres Strait, the lack of inbreeding among incursives and the predictability of their origins suggest that the current cordon-sanitaire biosecurity approach (Muzari et al. 2017) is working effectively, as there is no evidence of cryptic establishment on either island. In Tennant Creek, our results highlight the risks of ongoing reestablishment of Ae. aegypti, given that this invasion was sourced from only a very small number of mosquitoes.