Introduction

The invasion of North American forests by European earthworms is indicated by the short- and long-term environmental changes they cause. These lumbricid earthworms actively reduce the litter layer on the soil surface and mix the litter and humus layers, which indirectly reduces soil carbon and nitrogen over the long-term and depletes other soil minerals (Crumsey et al. 2015; Resner et al. 2015; Ferlian et al. 2020). Through their burrowing activity, they also influence the availability of nutrients to plants by disrupting mycorrhizal networks (Paudel et al. 2016; Dobson et al. 2017). Additionally, their presence in North American forests has facilitated the invasion of non-native plant species by selective seed predation and changes in nutrient availability (Nuzzo et al. 2009, 2015; Drouin et al. 2014; Craven et al. 2017). In consequence, non-native earthworms indirectly and directly affect nutrient availability to soil and aboveground communities and have irreversible consequences for forest ecosystems (Eisenhauer et al. 2007; Fisichelli et al. 2013; Fisichelli and Miller 2018; Ferlian et al. 2018; Frelich et al. 2019). Thus, it is important to identify potential introduction pathways in order to protect undisturbed native communities and to prevent future invasions.

The role of human transport for earthworm introductions and dispersal into the wild is well established. Disposal of earthworms used as fishing bait and passive transport by vehicles along roads have been identified as central drivers (Holdsworth et al. 2007; Keller et al. 2007; Cameron et al. 2007, 2008; Klein et al. 2017, 2020). Further, human disturbances seem to provide suitable conditions for non-native earthworm invasions, such that active or historic human settlements, construction sites, agricultural use, and recreational areas and fishing sites contain more non-native earthworms than wilderness and undisturbed sites (Hendrix et al. 2006, 2008; Callaham et al. 2016; Rogers and Collins 2017; Arcese and Rodewald 2019). The identification of vulnerable sites and vectors for earthworm invasions is necessary to formulate containment guidelines, and knowledge of the local earthworm population structure is required to understand how invasive earthworms establish viable populations.

Introduction from distant or locally adapted populations can increase genetic diversity, allowing populations to mitigate founder-effects and increase the potential of adaptation to local environmental conditions (Lawson Handley et al. 2011). Genetic studies of non-native earthworms in northern North America clearly link them to their European or Asian sources (Gailing et al. 2012; Fernandez et al. 2016; Schult et al. 2016) and demonstrate that genetic diversity of populations is generally high (Gailing et al. 2012; Porco et al. 2013; Schult et al. 2016; Keller et al. 2020), particularly in areas used by humans (Cameron et al. 2008; Klein et al. 2017). Notably, North America is experiencing a second earthworm invasion by pheretimoid earthworms of Asian origin. Arrival of Asian species in areas already invaded by European earthworms is not unusual (Greiner et al. 2012; Szlavecz et al. 2018) and may result in changes in community dynamics, ecological processes, and ecosystem functions with unforeseen consequences, depending on the combination of species that interact. Three species, Amynthas agrestis (Goto and Hatai, 1899), Amynthas tokioensis (Beddard, 1892), and Metaphire hilgendorfi (Michaelsen, 1892), have expanded their range north- and westwards into states with cold winters for the last ten to twenty years (Chang et al. 2018; Moore et al. 2018; McAlpine et al. 2022). These Asian species are more flexible in their diet and similar to European species, they also alter soil structure, carbon dynamics, and nutrient concentrations in North American forests (Zhang et al. 2010; Snyder et al. 2011, 2013; Greiner et al. 2012; Qiu and Turner 2017; Chang et al. 2018; McCay et al. 2020). Studies that investigated interspecific effects in the field and in mesocosm-experiments of Asian and European earthworms indicate that, depending on which species interact, Asian earthworms can replace the epigeic species L. rubellus (Zhang et al. 2010; Chang et al. 2016; but see also Greiner et al. 2012), induce different changes in microbial communities (Chang et al. 2017), and show higher leaf litter consumption than each species on its own (Crone et al. 2022). To control further spread, anticipate consequences of the two earthworm invasions, and develop management plans, more field studies on the ecological consequences are needed. The state of establishment and genetic diversity of non-native earthworm populations is also relevant in this context. Permanent populations that have been established for a longer time are genetically more diverse and therefore may be more resilient against additional invasions compared to transient or instable populations, that experience genetic bottlenecks and founder events more strongly. The population structure of free-living earthworms can be investigated with molecular markers that are able to resolve the genetic variability populations and relatedness among individuals, which helps to identify hotspots of multiple introductions, and to infer patterns of dispersal or the direction of transport.

In this study, we investigated the population structure of Lumbricus terrestris Linnaeus, 1758 within the metropolitan area of Minneapolis-St. Paul, Minnesota, USA. This species has been reported in Minnesota since the second half of the twentieth century (Gates 1978; Reynolds et al. 2002), and thus is ideal for studying genetic structure within and among populations. Our sampling sites included recreational and agricultural sites, and were both distant and close to road networks. We sampled earthworms at three locations within a radius of about 50 km, and three subpopulations at each location. Individual earthworms were genotyped with eight microsatellite markers to investigate the incidences and the range of dispersal, as well as genetic diversity within populations. We also included earthworms from five bait shops within the sampling area in order to assess the role of bait dumping as a potential source of free-living earthworm populations. To address multiple introduction events, we sequenced the mitochondrial 16S rDNA gene of all genotyped individuals. First, we analysed population differentiation of L. terrestris, which is a poor disperser that concentrates its activity within a small radius around its burrows (Sims and Gerard 1999; Tiunov et al. 2006; Addison 2009; Potvin and Lilleskov 2017), and therefore relies on passive dispersal to cover larger distances. Second, we analysed population structure to check if alleles are shared among locations, which indicates admixture. Third, we analysed population structure based on 16S rDNA to infer the number of genetic lineages at sampling sites and to identify potential hotspots of multiple introductions.

Methods

Collection sites, collecting methods and DNA isolation

Three populations of L. terrestris were collected in forest habitats between May and July 2014 at a distance of 25–75 km from the urban area of Minneapolis-St.-Paul, MN, USA (Table 1, Fig. 1). At each sampling site (labelled as subpopulation _1), two additional replicates were taken at 5 m (labelled as subpopulation _2) and 5,000 m (labelled as subpopulation _3) distance, dividing each population into three subpopulations. Adult and juvenile earthworms were collected from plots of 0.5 × 0.5 m using mustard extraction (Gunn 1992) and hand-sorting. Sampling site MN_0 was about 70 km south of Minneapolis and located in a woodland, intersected by a single gravel road that was surrounded by farmland, near Interstate 35 (Fig. 1). Sampling site MN_50 was located in a suburban area of Minneapolis approximately 100 km north-west from MN_0. Several lakes and parks for recreational activities are located in this area. The third sampling location MN_100 was about 100 km north-east from MN_0 and located in an area characterised by agricultural fields and various lakes that are frequently visited by the local population for fishing and camping, and only accessible by small roads. The sampling site of MN_100_3 was located on a ridge that slopes down to the St. Croix River.

Table 1 Description of sampling sites in the metropolitan area of Minneapolis-St. Paul (Minnesota, USA). Number of individuals of Lumbricus terrestris collected and analysed in this study with nuclear microsatellite (msat) and one mitochondrial (16S rDNA) marker, and the number of 16S rDNA haplotypes per subpopulation are provided
Fig. 1
figure 1

Sampling locations of Lumbricus terrestris in Minnesota, USA. Individuals were sampled around Minneapolis-St. Paul (large map) in Minnesota (grey map) in 25 and 75 km distance. Replicates were taken at 5 m and 5 km distance at each sampling site. Triangles indicate locations of bait shops

To investigate the importance of bait dumping as potential source for introductions, L. terrestris fishing baits were included from five different bait shops (five individuals per bait shop) within the sampling region. One centimetre of tail tissue of each sample was shipped from the United States to Germany for barcoding and further molecular analyses. Remaining body parts were stored as vouchers at the University of Minnesota (Department of Forest Resources).

16S rDNA sequencing and genotyping

Total genomic DNA was extracted using the Qiagen Blood and Tissue kit (Qiagen; Hilden, Germany) following the manufacturer’s protocol. To verify species identity of juvenile individuals, an 800 bp fragment of 16S rDNA was amplified using the primers 16S-LumbF and Ho_16Sra (Pèrez-Losada et al. 2009), which contain a barcoding region that reliably distinguishes lumbricid species (Bienert et al. 2012; Klarica et al. 2012). All PCR reactions for sequencing were performed in 25 μl volumes containing 11.75 μl ultrapure H2O, 1.25 μl BSA (∼4%), 2.5 μl Buffer with KCl, 1 μl dNTPs (10 mM), 3.5 μl MgCl2 (25 mM), 0.5 μl Taq polymerase (5 U/μl; Thermo Scientific; Dreieich, Germany), 1 μl of each primer (10 mM), and 2.5 μl template DNA. The PCR protocol consisted of an initial activation step at 95 °C for 3 min, 40 amplification cycles (denaturation at 95 °C for 30 s, annealing at 53 °C for 60 s, and elongation at 72 °C for 60 s), and a final elongation step at 72 °C for 10 min. PCR products were analysed with multi-capillary electrophoresis using QIAxcel (Qiagen; Hilden, Germany). Some of the PCR products performed inadequately in terms of amplification success and PCRs were repeated in 25 μl volumes containing 2 μl ultrapure H2O, 1.5 μl MgCl2 (25 mM), 12.5 μl HotStarTaq Mastermix (Genaxxon; Ulm, Germany), 2 μl of each primer (10 mM), and 5 μl template DNA. This PCR protocol consisted of an initial activation step at 92 °C for 15 min, 40 amplification cycles as described above but with an annealing temperature of 56 °C. PCR products were checked using QIAxcel and positive products were purified using the QIAquick PCR Purification kit (Qiagen; Hilden, Germany) and were either sequenced at the Göttingen Genome Sequencing Laboratory (University of Göttingen) or by Macrogen Europe Inc. (Amsterdam, The Netherlands). All 16S rDNA sequences were checked and ambiguous positions were corrected using electropherograms in Geneious 8.0.5 (https://www.geneious.com), and are available with the accession numbers ON506389 to ON506413 (bait shop individuals) and ON506269 to ON506388 (field collected animals) at NCBI GenBank (www.ncbi.nlm.nih.gov).

We amplified eight highly polymorphic microsatellite loci from L. terrestris (LTM 026, LTM 128, LTM 163, LTM 165, LTM 187, LTM 193, LTM 252, and LTM 278; Velavan et al. 2007), using Taq polymerase (Thermo Scientific; Waltham, MA, USA) or the HotStarTaq Mastermix (Genaxxon; Ulm, Germany) following the protocol of Velavan et al. (2007) and Klein et al. (2017). Genotyping of multi locus genotypes (MLGs) was performed at the Department of Animal Sciences, Veterinary Medicine, University of Göttingen, using FAM-labelled forward primers (Sigma Aldrich; St. Louis, MO, USA). Peak calling and allele binning was performed in Geneious, potential genotyping errors (scoring errors, stutter, and presence of null alleles) were checked with MICRO-CHECKER v2.2.3 (http://www.microchecker.hull.ac.uk/; van Oosterhaut et al. 2004). Three markers showed stuttering, and were subsequently checked and corrected manually in Geneious.

Population structure analyses

All 16S rDNA sequences were blasted using the nucleotide BLAST algorithm provided by NCBI GenBank to verify species identity of L. terrestris, which was particularly relevant for juvenile individuals included in this dataset. All sequences were corrected by eye and aligned in AliView v1.25 (Larsson 2014) using default settings and remaining wobble positions were conservatively corrected by eye i.e., we manually assigned the nucleotide that was present in all other sequences at this position in the alignment. The alignment was trimmed to the shortest sequence (516 bp) and collapsed to haplotypes with FaBox (Villesen 2007). Population structure was assessed with a haplotype network based on observed p-distances in R (R Core Team, 2020) using the ape and pegas packages (Paradis 2010; Paradis and Schliep 2019). Additionally, we calculated a Maximum Likelihood tree with 1,000 bootstrap replicates to check phylogenetic relatedness of 16S haplotypes using the ape and phangorn packages (Schliep 2011; Paradis and Schliep 2019). Additionally, we checked the identity of the sampled earthworm haplotypes with a phylogenetic analysis that included 628 16S sequences from NCBI database (L. castaneus = 10 sequences, L. centralis = 8 sequences, L. friendi = 10 sequences, L. herculeus = 6 sequences, L. polyphemus = 1 sequence, L. rubellus = 148 sequences, L. terrestris = 445 sequences). All NCBI sequences were aligned together with the 36 haplotypes of this study, trimmed to the shortest sequences, and positions that contained only Ns were excluded, resulting in a total length of 333 positions. The alignment was subsequently collapsed to haplotypes with FaBox resulting in 145 haplotypes (alignment files with accession numbers and haplotype datasets are available together with the R code in Dryad, https://doi.org/10.5061/dryad.w0vt4b8v2). Alignment and Maximum Likelihood analyses were performed as described above. Completeness of sampled haplotypes was analysed with a rarefaction analysis based on abundance data using the iNEXT package (Hsieh et al. 2016).

Population genetic analyses of microsatellites were performed using the adegenet package (Jombart 2008, Jombart and Ahmed 2001) in R. All earthworms from bait shops were assigned to a single population (Bait_Shop, 25 individuals) due to their high genetic similarity. For each microsatellite marker, we calculated the mean number of alleles (Na), and observed (Ho) and expected (He) heterozygosity. We tested for significant deviations from expected heterozygosity using the Bartlett test of homogeneity and for significant deviations from Hardy–Weinberg equilibrium (HWE) using a chi-square test. To estimate population differentiation and inbreeding, we calculated the standard diversity indices FST and FIS (fixation and inbreeding index, Weir and Cockerham F statistics), and pairwise FST values for four populations (MN_0, MN_50, MN_100 and Bait_Shop) and ten subpopulations (three subpopulations per population and Bait_Shop) using the hierfstat package (Goudet and Jombart 2020). To identify at which hierarchical level the most genetic variation existed, we performed an Analysis of Molecular Variance (AMOVA) using the poppr package v2.9.3 (Kamvar et al. 2014, 2015), and a Monte-Carlo test with 9,999 replications to test for significance. To identify genetic clusters, we did a Discriminant Analysis of Principal Components (DAPC, Jombart et al. 2010), using the xvalDAPC function for cross validation to decide how many axes to retain. This approach partitions genetic variation into a between- and a within-group component, and maximises the first component, thereby reducing the within-group variability, without relying on population genetic models.

We investigated population genetic structure using STRUCTURE v2.3.4 (Pritchard et al. 2000) which, in contrast to DAPC, assigns population to groups that best represent Hardy–Weinberg equilibrium, using a Bayesian sampling approach. Several preliminary runs were performed using various model configurations for optimal prior settings using the LOCPRIOR option. All other settings remained in default mode. Each run comprised ten iterations of the value of K that ranged between 1 and 10 with 105 MCMC burn-in steps and 106 replicates. The most likely number of genetic clusters (ΔK) for all runs of K was calculated in STRUCTURE HARVESTER (Evanno et al. 2005). The greedy algorithm in CLUMPP v1.12 (Jakobsson and Rosenberg 2007) was used to obtain a single mode of K corresponding to the maximum ΔK. Barplots were generated with DISTRUCT v1.1 (Rosenberg 2004).

Results

Mitochondrial 16S rDNA

Clear 16S rDNA sequences were obtained for 145 earthworms, 120 from the three sites and 25 from bait shops (Table 1). All haplotypes of this study clustered with L. terrestris’ sequences from NCBI but never with other congeners (Supplementary Information, Fig. S1), supporting correct identification of all sampled individuals. However, it is still possible that L. terrestris is a species complex (Martinsson and Erséus 2017), which cannot be discerned ultimately with this nucleotide alignment because the aligned and trimmed 16S fragment is relatively short. Notably, many haplotypes of this study were identical to earthworms sampled from Alberta (Klein et al. 2017, 2020) and some with earthworms from Klarica et al. (2012). Sanger sequences of 15 individuals were of bad quality and were not included in the 16S rDNA dataset, but microsatellites amplified well for all individuals. These failed 16S PCRs corresponded to few individuals across all sampling sites, except MN_0_1 and individuals collected from bait shops. Sequencing failures can have many reasons, ranging from buffer or ethanol remains to mutations at the primer binding sites or in the fragment causing poor primer annealing or secondary structures that hamper the sequencing reactions, respectively. We cannot exclude that these missing individuals were genetically distant or even cryptic species of L. terrestris, but we consider this as unlikely, as microsatellite markers worked well for these individuals and did not result in unusual genotypes. Overall, the 16S rDNA sequences showed a maximum p-distance of 3.9%; and the 145 individual sequences were collapsed into 36 haplotypes (Table 1). Rarefaction analysis showed that about 50% of potential haplotypes were sampled and the extrapolation curve started to reach saturation at expected 60 haplotypes if 500 individuals were sampled (Supplementary Information, Fig. S2).

Three haplotypes were very common. The most common (18 individuals) was present in two bait shop individuals and in all subpopulations except MN_100_3. Two other haplotypes were also very common (both with 17 individuals), but were less widespread. One occurred in four subpopulations (MN_0_3, MN_50_1, MN_50_2, MN_100_1) and in one bait shop individual. The other was present in six subpopulations (MN_0_3, MN_50_1, MN_50_2, MN_50_3, MN_100_1, MN_100_3) and in one bait shop individual. Subpopulation MN_50_3 had the highest number of different haplotypes relative to the number of individuals sampled (14 individuals with nine haplotypes), followed by subpopulations MN_100_1 and MN_100_2 (seven haplotypes in twelve and 13 individuals, respectively). Subpopulation MN_100_3 had the lowest number of haplotypes, all but one individual had identical 16S rDNA sequences, the other haplotype in this subpopulation was genetically distinct (p-distance = 1.9%) and co-occurred only in a single bait shop individual.

The haplotype network showed no clear structure among populations or bait-shop and field-collected individuals (Fig. 2). Subpopulations included both unique and frequent haplotypes that co-occurred in other subpopulations. Earthworms obtained from bait shops carried six haplotypes that were not identical with field collected haplotypes, but also seven haplotypes that were identical with individuals from the field. The maximum likelihood analysis showed seven well-supported clades (bootstrap support > 90%, Supplementary Information, Fig. S3), five of these clades included bait-shop and field-collected individuals and two clades comprised five individuals that were exclusively sampled from bait shops but which were distinct from field-collected individuals.

Fig. 2
figure 2

Haplotype network of 16S rDNA sequences collected from the field and bait shops in the St. Paul-Minneapolis area, MN (USA). The network shows that seven distinct lineages are present in the field, some are rare (small circles representing one individual), others are widely distributed (large circles representing several individuals from different subpopulations). The genetic distinctness among haplotypes is reflected by the length of the lines connecting circles, and dots on lines represent hypothetical mutation steps. The size of circles is relative to the number of individuals represented by this haplotype. The dashed boxes outline well-supported clades from the Maximum Likelihood tree, black numbers in boxes are bootstrap support values

Nuclear microsatellites

The microsatellite dataset contained 160 individuals and eight loci. The number of alleles per locus across all populations was similar and ranged from 17 (LTM128) to 22 (LTM278), except for locus LTM252 which had 34 alleles (Supplementary Information, Fig. S4a). In all but two loci (LTM026 and LTM187), the observed heterozygosity was lower than the expected heterozygosity (Supplementary Information, Fig. S4b). Further, all loci deviated significantly from HWE and the mean observed heterozygosity varied significantly among loci (Bartlett’s test: p =  < 0.05). The PCR reactions for microsatellite markers performed well on the collected specimens for eight out of the ten published microsatellite primers developed by Velavan et al. (2007). This contrasts with other studies that used these markers in Lumbricus terrestris individuals across a larger geographic range i.e., from Michigan, Maryland and Maine in the United States and Europe (Gailing et al. 2012), France and Poland (Souleman et al. 2016) and Vermont, USA (Keller et al. 2020). However, MICRO-CHECKER detected potential null alleles for five markers (LTM 128, LTM 163, LTM 165, LTM 252 and LTM 278), and potential genotyping errors (stuttering) for the three markers LTM 128, LTM163 and LTM165 (see Supplementary Information Tables S2, S3 for number of alleles per locus and populations and comparison with Velavan et al. 2007), which were checked and corrected manually in Geneious. We continued the analyses with all loci because deviations from Hardy–Weinberg equilibrium can have other causes than null alleles, including inbreeding or Wahlund effect, in particular for small populations (Dabrowski et al. 2013; De Meeûs 2017).

Each field collected population consisted of 45 individuals (15 individuals per subpopulation, three subpopulations per site), and the pooled Bait_Shop population consisted of 25 individuals. Population MN_100 had the highest numbers of alleles (Na = 117; Supplementary Information, Fig. S4c), closely followed by populations MN_0 (Na = 112) and MN_50 (Na = 111). Bait shop individuals had considerably fewer numbers of alleles (Na = 90), but sample size was also lower. When comparing the ten subpopulations (Supplementary Information, Fig. S4d), MN_100_1 had the highest number of alleles (Na = 87) and MN_100_3 had the lowest number of alleles (Na = 62). The number of alleles of the three subpopulations at MN_0 and subpopulation MN_50_3 ranged between 72 and 75. The number of alleles of the remaining subpopulations were lower with Na = 66 (MN_50_1) and Na = 65 (MN_50_2 and MN_100_2). The global FST and FIS values, that take all genotypes and all loci into account, were smaller for the four populations (Supplementary Information, Fig. S4e), with FST = 0.045 and FIS = 0.089 compared to the ten subpopulations dataset (Supplementary Information, Fig. S4f) with values of FST = 0.068 and slightly lower FIS = 0. 0.062.

The mean pairwise FST values between populations were also very close to zero, ranging between 0.033 and 0.058 (Supplementary Information, Table S1). However, the range of pairwise FST values was greater at subpopulation level, ranging between 0.013 and 0.139 (Table 2).

Table 2 Mean pairwise FST of Lumbricus terrestris between field collected subpopulations and bait shop individuals. Low FST values ( <0.05) are indicated in bold and always occurred between subpopulations at 5 m distance (subpopulations _1 and_2). Subpopulations of MN_0 had overall low FST values among each other and to subpopulation MN_50_3.  High FST (>0.1) are indicated in italic and occurred between subpopulations of MN_100 and MN_0_3 and MN_50_2. Intermediate FST values (0.05–0.1) are in bold italic

The AMOVA was significant at all levels and showed that genetic variance was quite low between populations and subpopulations. The majority of variance (89%) was found within samples (individuals). Variance between subpopulations within populations was low (4.9%), and variance between individuals within sites was even lower (3.7%). The variance was lowest between populations (2.0%).

The DAPC cross-validation peaked around 40 PC axes. Closer examination showed that the clustering patterns were consistent after reducing the number of PC axis around that value for both, four populations (33 PC axes retained, eigenvalues of LD1 = 181.3 and LD2 = 145.0) and for ten subpopulations (33 PC axes retained, eigenvalues of LD1 = 80.2 and LD2 = 74.3). Clustering patterns showed that most subpopulations were very similar and overlapped in one cluster (Fig. 3), except for subpopulation MN_100_3, which separated from all other subpopulations at the first axis, and bait shop individuals, which separated from all other subpopulations at the second axis. The analysis with four populations (Supplementary Information, Fig. S5) showed similar clustering patterns in which population MN_0 and MN_50 overlapped in one cluster, and MN_100 and bait shop individuals represented the most distinct genetic clusters on the first and second axes.

Fig. 3
figure 3

Discriminant analysis of principal components (DAPC) for 160 individuals of Lumbricus terrestris from ten subpopulations in the urban area of Minneapolis-St. Paul, MN (USA). The axes represent the first two Linear Discriminants (LD). Each circle represents a cluster and each dot represents an individual. The subpopulations weakly differentiated along the first two axes, but rather overlap with each other, except for subpopulation MN_100_3 and individuals collected from bait shops (MN_Baits)

Population assignment in STRUCTURE and STRUCTURE HARVESTER indicated that the sampling area most likely comprised five genetically differentiated populations (K = 5, Fig. 4). Four clear patterns occurred. First, subpopulations 5 m apart (subpopulations _1 and _2 per sampling site) were most similar. Second, at sampling sites MN_50 and MN_100, subpopulations at 5,000 m distance were considerably differentiated from the other two subpopulations at the same site. Third, sampling sites MN_0_1-3 and subpopulation MN_50_3 represented one genetic cluster, which was consistent across analyses with other K-values. Finally, subpopulation MN_100_3 and bait shop individuals (Bait_Shop) also represented discrete clusters.

Fig. 4
figure 4

Bar plots depicting the assignment of individual genotypes of L. terrestris from ten subpopulations into five genetic clusters (K = 5)

Discussion

Using nuclear and mitochondrial markers, this study investigated the genetic structure of the invasive earthworm species L. terrestris in the metropolitan area of Minneapolis-St.Paul. Our results show that allelic differentiation and genetic variance of the nuclear markers was rather low between populations of L. terrestris, indicating that these populations are a homogeneous group of genetically similar individuals. This suggests that these populations are established and likely interbreed. However, across all populations, the numbers of alleles and mitochondrial lineages was high, demonstrating that overall genetic diversity is substantial in L. terrestris, which is consistent with findings of earlier genetic studies in this species (Gailing et al. 2012; Klein et al. 2017; Keller et al. 2020). Populations of invasive species may initially exhibit low genetic diversity due to small propagule size. In cases of multiple introductions from different origins, populations can quickly overcome founder effects and can demonstrate even higher genetic diversity than in their home ranges, which in turn increases their adaptative potential to new environments (Sakai et al. 2001; Lawson Handley et al. 2011; Keller et al. 2020). The combination of low genetic structure but high genetic diversity and presence of characteristic genotypes in populations is interesting. First, earthworms represented one, well admixed population at five metre distance, showing that earthworms migrate and interbreed at short distances on their own. At five kilometres distance, genotypes changed except for one population, indicating that new populations start from few individuals after being transported by chance. Second, the high genetic diversity and low genetic structure among populations at larger distances suggests that transport (passive or active) occurs in all directions, eventually mixing populations genetically when newly arriving and established individuals interbreed. Third, the low genetic structure among populations indicates that transport of individuals occurred frequently and that populations with well mixed genotypes have been established for a long time.

We could not identify hotspots of introductions, because several mitochondrial lineages were common and widespread among populations. This suggests that dispersal or short-distance transport of earthworms may occur at random in many directions, and that individuals may interbreed after transport with already present earthworms. Apart from human mediated transport, earthworm predators such as birds, shrews or snakes also likely function as vectors for random transport over short distances. Further, isolated mitochondrial haplotypes that were only present in single individuals were also common, suggesting recurrent transport or introductions of earthworms in the field which was not followed by successful breeding.

Alternative to multiple introductions with admixture, the high number of mitochondrial haplotypes accompanied by low genetic variance at nuclear markers within populations could be a result of the special reproductive mode of earthworms. Lumbricus terrestris is a hermaphrodite with reciprocal insemination, resulting in fertilisation of eggs and production of cocoons in each individual after mating (Edwards and Bohlen 1996). Consequently, it may be that a pair of earthworms may produce offspring that carries similar, admixed nuclear genomes but different mitochondrial lineages, due to the maternal transmission of mitochondria as opposed to the biparental inheritance of nuclear genes. It is also possible that some earthworms invest more in sperm and less in eggs, or mate unsuccessfully, which may result in the presence of isolated mitochondrial lineages.

The population structure found for nuclear markers indicates that different dispersal processes acted on these earthworm populations. First, one subpopulation (MN_50_3) belonged genetically to a population 100 km south-east of its sampling site (MN_0), and not to the nearby subpopulations 5 km away, indicating effective jump dispersal, which is common in earthworms (Cameron et al. 2008; Klein et al. 2020). Second, one subpopulation in the north-west of the sampling area (MN_100_3) was also genetically differentiated from the nearby subpopulations 5 km away, but remained isolated in the population structure analyses and was dominated by one mitochondrial lineage, suggesting a recent founder event by passive dispersal of one or few individuals or a genetic bottleneck as this site is prone to recurrent flooding by the nearby St. Croix River. Third, individuals collected from bait shops separated genetically from all other subpopulations indicating that the free-living populations did not establish from recent or current cases of bait-dumping. Baits sold from shops or from private vermicultures that were not included in our sampling could also be a possible source for free-living earthworms. However, the high genetic similarity of earthworms sold in the five local bait shops suggests that local retailers purchased their products from a common source, such as a central distributor or earthworm farm. The genetic similarity of all bait shop individuals and their clear differentiation from free-living earthworms instead suggests that free-living earthworm populations derived from other sources or from releases of baits that were sold in the past. These patterns strongly contrast with findings from a study in Calgary, Alberta, Canada (Klein et al. 2017), which analysed L. terrestris populations from the field and bait shops using a similar study design and markers. They found that bait shop genotypes formed a genetic cluster with the urban population, indicating that bait disposal was an important source for free-living earthworm populations. Further, their populations were considerably more differentiated from each other than populations in this study, suggesting that free-living populations in Canada are rather recent introductions and with lower incidences of effective dispersal (dispersal with breeding), which is consistent with the more recent history of earthworm invasions in Alberta compared to Minnesota (Hale 2008; Klein et al. 2017).

The relevance of human activities for dispersal and establishment of earthworm populations is well known (Holdsworth et al. 2007; Keller et al. 2007; Cameron et al. 2007, 2008; Klein et al. 2017, 2020). The next step for investigating earthworm invasions is to disentangle the different anthropogenic vectors acting on local populations in order to manage these invasions. Once established, it is impossible to remove earthworm populations, and the development of management strategies to confine the invasion therefore remains the primary option when faced with established populations (Keller et al. 2007; Lawson Handley et al. 2011). The well-known link between baits and free-living populations (Holdsworth et al. 2007; Keller et al. 2007; Cameron et al. 2008; Klein et al. 2017) could not be identified in this study. Possibly, public awareness and educational campaigns reduced bait disposal in the past decade (Callaham et al. 2006; Keller et al. 2007). These campaigns are dedicated to education around earthworms and their effect and exist in Minnesota and the upper Midwest (e.g., https://wormwatch.d.umn.edu). They also distribute information to anglers about the importance of proper bait disposal. Other human-related factors that correlate with earthworm invasions are farming (Keller et al. 2007; Klein et al. 2020), logging and proximity to roads (Dymond et al 1997; Gundale et al. 2005; Suarez et al. 2006; Holdsworth et al. 2007; Hendrix et al. 2008; Cameron et al. 2008; Callaham et al. 2016; Roger and Collins 2017). Further, genetic differentiation between populations at small scale (0.6–13 km) is common in L. terrestris (Klein et al. 2017; Keller et al. 2020), but was not present in the population we sampled in the south from an area of intense agriculture (MN_0). Non-directional transport of earthworms with farm equipment and soil-movement by tillage practices could explain the genetic homogeneity and connectivity of this population, suggesting that farming facilitates the dispersal of L. terrestris and likely that of other invasive earthworm species such as Aporrectodea caliginosa, that are less sensitive to disturbances (Ivask et al. 2007; Smith et al. 2008).

Apart from identifying source populations and disentangling invasion routes, it is also important to understand what determines the success of invasive non-native species in order to implement successful management strategies (Lawson Handley et al. 2011). Here, earthworm populations consisted of both common and widespread, but also rare and isolated mitochondrial lineages, which is consistent with other molecular studies on invasive earthworm species in North America (Cameron et al. 2008; Gailing et al. 2012; Porco et al. 2013; Schult et al. 2016; Klein et al. 2017, 2020; Keller et al. 2017, 2020). Why some mitochondrial lineages thrive and reproduce and others do not represents one of the most fundamental questions of invasion biology (Lawson Handley et al. 2011). Repeated sampling of populations that contained the most widespread haplotypes, but also of populations that contained many isolated haplotypes would allow to observe changes in the frequency of genetic lineages over time and thereby monitor the viability of earthworm lineages.

This study revealed that earthworm populations in the metropolitan region of Minneapolis-St. Paul, Minnesota, are established and that different patterns of population structure, introductions and dispersal co-occurred, demonstrating that molecular markers are a valuable tool to study invasion patterns of earthworms. However, this study also showed that additional sampling is required to disentangle different human-related mechanisms of earthworm dispersal. Transcriptome screening for selective sweeps in different genetic lineages could eventually help to identify whether different lineages acquired adaptations that correlate with local invasions and how populations cope with founder effects, bottlenecks and local inbreeding (Gailing et al. 2012; Klein et al. 2017, 2020; Keller et al. 2020). The advances in sequencing technologies provide a valuable toolkit to launch new studies on invasive earthworm populations in order to infer the invasion success of some lineages as opposed to others, and will ultimately aid in developing management strategies to limit earthworm invasions.