Introduction

Inland fisheries produce over 10 million tonnes of biomass each year that support livelihoods, the economy and recreation, but are dependent on the maintenance of healthy ecosystems and biodiversity (Welcomme et al. 2010). One of the many factors that can affect this balance is stocking introductions: the release of fish species or populations into areas outside of their native range. Non-native species can be detected based on genetic and morphological differences compared with the native species (Trebitz et al. 2017), and is often a conservation concern (Gozlan et al. 2010). Stocking introductions can result in hybridization, outbreeding depression, genetic swamping of the native population, and greater inbreeding and lower effective population size in the wild from few captive parents contributing to large numbers of released fish (Ward 2006; Todesco et al. 2016; Waples et al. 2016). This can affect evolutionary and ecological processes that are indispensable to maintain biodiversity, functional ecosystems, and associated benefits for humanity (Moritz 2002; Le Cam et al. 2015; Savary et al. 2017). This makes stocking introductions a complex policy problem that should be managed on a case-by-case basis (Allendorf et al. 2001). However, the occurence and impacts of stocking of non-native lineages within a species, or within a complex of closely related species, is much harder to detect than between species and can often receive less attention (Laikre et al. 2010). Detecting non-native, closely-related lineages has become more tractable with advances in DNA sequencing technology, which produce high-resolution data at relatively low cost and without previous genomic resources (Funk et al. 2012).

Genomic technologies are starting to be used in Australia to inform government-managed inland recreational fisheries (e.g. Harrisson et al. 2016; Beheregaray et al. 2017; Attard et al. 2018). Arguably the most economically important inland river system in Australia is the Murray-Darling Basin. It spans more than one million km2 and supplies agricultural and recreational needs, with human impacts from regulation of river flow, water abstraction, habitat degradation, and introduction of non-native taxa (Kingsford et al. 2011). To maintain and enhance recreational fishing within the Murray-Darling Basin, government-managed stocking programs exist for five native fish species across five state and territory jurisdictions (Rowland and Tully 2004; Gilligan et al. 2009; NSW Department of Primary Industries 2010; ACT Environment and Planning Directorate 2015). These ‘harvest stocking’ programs release fingerlings produced from government or private hatcheries to bolster or replace natural recruitment. Stocking is conducted both within connected riverine networks and within hydrologically-isolated impoundments. Stocking approvals consider the endemicity of the lineage of broodfish to the proposed release site, with the level of restrictions dependent on the state or territory government. Outside of stocking programs authorised by the government, private hatcheries market their fingerlings to the public for release into private farm dams.

The largest stocking program in terms of the number of fingerlings, the length of operation, and spatial scale, is the harvest stocking program for the golden perch (Macquaria ambigua) species complex. Golden perch is a medium-large sized, long-lived freshwater fish (length = 35–50 cm; sexual maturity = 2 years males, 4 years females; longevity = 26 years) (Mallen-Cooper and Stuart 2003). This stocking program also represents the greatest risk to the natural genetic variation of the Murray-Darling lineage: the golden perch complex is found in three major hydrologically disconnected drainage basins, each containing a distinct lineage (or cryptic species) with hatchery production occuring for each lineage (Faulks et al. 2010; Beheregaray et al. 2017; Attard et al. 2018). The golden perch complex has hierarchical population structure: there is no more than 0.1 genetic differentiation (FST) among localities within basins (Faulks et al. 2010; Attard et al. 2018) but 0.5 to 0.6 FST among the three basins (Beheregaray et al. 2017). Within the Murray-Darling Basin, fingerlings are produced by numerous government and private hatcheries, and approved releases of fingerlings can be undertaken by government or community groups. Hatchery production programs for golden perch commenced in 1960 (NSW Fisheries 2003), which could have compromised natural genetic variation until 2005 when the first regulation was implemented to regionalize stocking (NSW Department of Primary Industries 2005). The flexibility of ongoing permitted stocking activities, the availability of fingerlings for direct sale from private hatcheries to the public, and the possibility of non-endemic lineages to escape from private dams during floods, also create ample opportunities for stocking introductions of fingerlings across basins. This together with other human activities in the Murray-Darling Basin threaten this native species.

Here we assessed whether there are non-endemic lineages of golden perch from the Lake Eyre or Fitzroy basins in the Murray-Darling Basin. We did so using a genome-wide SNP dataset developed without any previous genomic resources. We also compared the SNP data with microsatellite data to determine whether a genome-wide dataset was needed to conduct an accurate and precise admixture assessment among golden perch in the Murray-Darling Basin. This shows the importance of genomics to detect and mitigate stocking introductions of closely related lineages.

Methods

Genomic data from a previous intra-basin study of golden perch in the Murray-Darling Basin (Attard et al. 2018), and a golden perch phylogenomic study that included the Lake Eyre and Fitzroy basins (Beheregaray et al. 2017), were re-purposed to produce a SNP dataset for the current study. In addition, five new samples were added to increase to 15 the sample size at SNPs for the Fitzroy Basin. This resulted in a genomic dataset of 174 fish from 13 localities (including the open river system and impoundments) across the Murray-Darling Basin (Fig. 1) and 15 representative fish from each of the Lake Eyre and Fitzroy basins (Table 1). The Fitzroy Basin samples were all from Nogoa River (to avoid admixed samples from Dawson River (Beheregaray et al. 2017)), and the Lake Eyre Basin samples were six from Diamantina River and three from each of Georgia River, Neales River and Warburton River (to avoid admixed samples from Bulloo Basin (Beheregaray et al. 2017)). The dataset additionally included 15 replicates from across the basins to estimate genotyping error, which was calculated as the average percentage of alleles that differed between replicates.

Fig. 1
figure 1

Map of the Murray-Darling Basin in Australia showing the sampling sites for golden perch. The site code and number of samples are in brackets, and the location of the neighbouring Lake Eyre and Fitzroy basins are shown

Table 1 Sampling data and percent of missing data for golden perch based on the genome-wide SNPs

Genomic libraries were prepared following the ddRAD protocol of Peterson et al. (2012), with details in Attard et al. (2018) and Beheregaray et al. (2017). Resulting reads were processed using the de novo pipeline of STACKS 1.29 (Catchen et al. 2011; Catchen et al. 2013) to produce the final SNP dataset. This was performed following Attard et al. (2018), except increasing to 12 the number of sampling sites in which a locus needed to be detected in order to keep that locus. This was because of the addition of Lake Eyre and Fitzroy samples.

The genomic dataset was compared to a dataset of eight microsatellites from a landscape genetic study of golden perch (Faulks et al. 2010). Samples with more than two missing microsatellite genotypes were removed from the microsatellite dataset before use in the current study. The microsatellite data was subsampled to 140 fish that were also in the genomic data from the Murray-Darling to allow a direct comparison of individuals, and 15 representative fish from each of the Lake Eyre and Fitzroy basins. The samples from the Lake Eyre and Fitzroy basins were the same between the SNP and microsatellite datasets, except for six samples from the Lake Eyre Basin due to sample availability.

Admixed individuals were detected using multiple methods to ensure robust interpretation of estimates of ancestry. Bayesian clustering methods were implemented for SNPs in FastSTRUCTURE 1.0 (Raj et al. 2014) and ADMIXTURE 1.3.0 (Alexander et al. 2009) using the default parameters. Of these two programs, only FastSTRUCTURE ancestry is presented here as both programs showed almost indistinguishable ancestry estimates. For microsatellites and, for comparison, SNPs, STRUCTURE 2.3.4 (Pritchard et al. 2000) was run using a K of three with the admixture model of ancestry, the independent allele frequency model, not using sampling locations as priors (Hubisz et al. 2009), and with ten independent runs that had reached convergence. A burn-in of 100 000 iterations then 106 iterations were used for the microsatellites and, due to computational time, a burn-in of 10 000 iterations then 105 iterations for the SNPs. The STRUCTURE runs were analysed using CLUMPAK (Kopelman et al. 2015) (Main Pipeline, default parameters) to obtain an estimated membership of each individual to each basin.

A principal components analysis (PCA), which has no population model assumptions, was conducted for each dataset with ADEGENET 2.0.0 (Jombart 2008) in R. As PCAs require no missing data, SNPs which had complete missing data for at least one basin were removed. The remaining missing data was imputed separately for each basin using GENODIVE 3.0 (Meirmans and van Tienderen 2004) before running the PCA. This would result in admixed individuals in the Murray-Darling Basin having the appearance of greater ancestry to the Murray-Darling than the reality (i.e. it would underestimate admixture), especially in those with more missing data. So the PCAs were used only to explore the power of the SNPs compared with the microsatellites, rather than to accurately estimate individual ancestry.

To help distinguish ‘pure’ and admixed individuals, and to assess the power of the SNPs relative to the microsatellites, we simulated 30 offspring from each of the basins and four admixture classes. Simulations were based on the allele frequencies of individuals with at least 0.99 estimated ancestry to the basin in which they were sampled as estimated by FastSTRUCTURE for SNPs, or at least 0.97 ancestry as estimated by STRUCTURE for microsatellites. The difference in cut-off between datasets is due to the differing power of the datasets (see Results). The simulated admixture classes were F1 Murray-Darling—Lake Eyre hybrids and their backcross to Murray-Darling, and F1 Murray-Darling—Fitzroy hybrids and their backcross to Murray-Darling. The simulations were conducted using the custom Python script of Elliott and Russello (2018), which is based on HYBRIDLAB (Nielsen et al. 2006) but is designed for large SNP datasets, or using HYBRIDLAB 1.0 for the microsatellite data. To examine the impact of missing data on the accuracy of ancestry estimates, we created additional datasets by removing 30% and 80% of the genotypes randomly from the simulated SNP dataset and 30% from the simulated microsatellite dataset using a custom Perl script. We did not simulate 80% missing data for the microsatellites, as this would result in only 1.6 genotyped microsatellites on average per individual. The simulated datasets were then run in FastSTRUCTURE for the SNPs and STRUCTURE for the microsatellites using the same parameters as for the empirical data. As the PCAs were only used to explore the power of the SNPs compared to microsatellites, rather than estimating ancestry and associated impacts of missing data, we only included the simulated datasets with no missing data in the PCAs.

Results

The final genomic dataset consisted of 6,862 SNPs and the genetic dataset consisted of eight microsatellites. The genomic dataset had an estimated genotyping error of 1.6% and had 9% missing data (Table 1). Only 11 samples had more than 30% missing data, with all of these samples from the Murray-Darling Basin. Bayesian analyses produced the expected three genetic clusters that represent the Murray-Darling, Lake Eyre and Fitzroy basins (Fig. 2). All 30 individuals sampled from the Lake Eyre and Fitzroy basins had at least 0.99 ancestry to their basin using SNPs, or at least 0.98 using microsatellites. One hundred and fifty-two of the 174 individuals sampled from the Murray-Darling Basin and analysed using SNPs had at least 0.99 ancestry to that lineage. The remaining 22 consisted of one likely F1 hybrid between the Murray-Darling and Fitzroy basins that was collected in 2005, and individuals with various levels of backcrossed ancestry between the Lake Eyre and Fitzroy basins to the Murray-Darling Basin and that were collected from 2003 to 2015 (Table 2; Fig. 2). The PCA and simulations used 6,023 SNPs when those with missing data in an entire basin were removed, and the PCA produced results that were concordant with the Bayesian clustering analyses (Fig. 3). The microsatellites only detected three of the 15 admixed individuals detected by the SNPs and genotyped in both the microsatellites and the SNPs (Table 2; Fig. 2). The difference in ancestry estimates between the SNPs and microsatellites is unlikely due to the Bayesian analysis used, as the ancestry estimates for the SNPs between FastSTRUCTURE and STRUCTURE were almost indistinguishable (Fig. 2). The simulations revealed the likely explanation that the microsatellites are less powerful than the SNPs, even when the SNPs have 80% missing data (Figs. 3 and 4). Pure individuals simulated using the SNP dataset had at least 99% ancestry to their basin (Fig. 4), even with 80% missing data, which indicates that empirical individuals with less than 99% ancestry based on SNPs are admixed individuals.

Fig. 2
figure 2

Bayesian clustering results for golden perch when K is set to three and using (a) genome-wide SNPs in (i) FastSTRUCTURE and (ii) STRUCTURE, or (b) microsatellites in STRUCTURE. Each individual is represented by a column, with the colouring in the column representing the proportion of estimated membership of the individual to each genetic cluster (blue or gray in grayscale, Murray-Darling Basin; yellow or light gray in grayscale, Lake Eyre Basin; red or dark gray in grayscale, Fitzroy Basin). Uncoloured (i.e. white) columns in (b) indicate individuals for which microsatellite data is unavailable. Sampling sites of individuals within the Murray-Darling Basin are indicated by codes defined in Fig. 1; Table 1

Fig. 3
figure 3

PCA of empirical and simulated golden perch individuals using (a) genome-wide SNPs or (b) microsatellites. The percent of variance explained by the first two PCs and, in the inserts, the eigenvalues of the first 10 PCs are shown. Empirical individuals are represented as circles colour-coded according to their sampling basin (blue, Murray-Darling Basin; yellow, Lake Eyre Basin; red, Fitzroy Basin). Simulated individuals overlay the empirical individuals, and are represented as shape outlines according to their simulation category (circle, pure individuals; triangle, F1 hybrids; diamonds, backcrosses)

Fig. 4
figure 4

Bayesian clustering results for simulated golden perch using (a) genome-wide SNPs with (i) no missing data, (ii) 30% missing data and (iii) 80% missing data, and using (b) microsatellites with (i) no missing data and (ii) 30% missing data. Each individual is represented by a column, with the colouring in the column representing the proportion of estimated membership of the individual to each genetic cluster (blue or gray in grayscale, Murray-Darling Basin; yellow or light gray in grayscale, Lake Eyre Basin; red or dark gray in grayscale, Fitzroy Basin)

Table 2 Sampling date, estimated ancestry to the Murray-Darling Basin and missing data for the golden perch with less than 0.99 ancestry according to the genome-wide SNPs. Sampling sites of individuals within the Murray-Darling Basin are indicated by codes defined in Table 1; Fig. 1

Discussion

We found using genomics that a proportion of the sampled golden perch in the Murray-Darling Basin have admixed ancestry to the neighbouring Lake Eyre and Fitzroy basins (22/174 samples = ~ 13%). Admixed individuals were distributed widely across the Murray-Darling Basin, and in both the connected river system and impoundments. Fortunately, there was only one likely F1 hybrid, with the remaining admixed individuals representing various levels of backcrosses to the Murray-Darling Basin. This suggests as yet no widespread introgression or complete admixture (as defined by Allendorf et al. (2001)) into the natural Murray-Darling lineage. Non-endemic genetic material in the Murray-Darling Basin may have occurred through one or more human-mediated mechanisms: (i) historical stocking activities that occurred prior to existing fisheries management arrangements, (ii) escape of individuals from private farm dams that were stocked from commercial hatcheries in a different region, (iii) deliberate or careless release of non-endemic lineages, (iv) deliberate, unintentional or careless mixing in hatcheries of broodfish from multiple lineages, (v) or unintended collection and incorporation of admixed broodfish within hatcheries. Unnatural hybridization within the Murray-Darling Basin may lead to ecological and evolutionary consequences (Ward 2006), especially because the golden perch in different basins have diverged to the point that they may be different species (Beheregaray et al. 2017). Our findings serve as a warning that one or more of the human-mediated mechanisms outlined above may need to be managed more effectively than has occurred to date.

The legacy of unregulated stocking programs before 2005 likely includes unnatural admixture in the Murray-Darling Basin, given that the detected introgressed individuals include those sampled before or around 2005 and distant backcrosses. A flow-on effect of the unregulated stocking is that hatcheries may be unintentionally using admixed wild-caught broodstock. Indeed, the impoundment populations sampled here are all stocked from one hatchery run by the state government of New South Wales, and these impoundments contained admixed individuals. Impoundment populations of golden perch have low or no natural local recruitment, and so are reliant on ongoing stocking programs to maintain the recreational fishery (Forbes et al. 2016). Given that government-managed stocking programs collectively produce or facilitate the release of over one million hatchery-bred golden perch fingerlings per year, with the potential to demographically and genetically overwhelm wild-born conspecifics (Crook et al. 2016; Forbes et al. 2016), it is incumbent on those programs to minimise ongoing risks from the admixture already present within the Murray-Darling Basin. We recommend that broodstock at hatcheries undergo genomic interrogation prior to use to ensure they have pure ancestry to the Murray-Darling lineage. A golden perch broodfish can produce tens to hundreds of offspring per year, and so even a small amount of non-native ancestry in one broodfish can spread and introgress into the wild population.

We have more broadly shown the use of genomics to help manage stocking activities compared to microsatellites. A traditional microsatellite dataset can detect admixture in populations, but cannot differentiate with certainty the ‘pure’ and admixed individuals (Sanz et al. 2009; van Wyk et al. 2017). Instead, SNPs provide two intrinsic advantages for ancestral inferences: first, they have a better genomic resolution with higher density and a more uniform distribution across the genome; second, SNPs have a lower mutation rate than microsatellites, reducing the potential for homoplasy and associated analytical issues. These differences improve the resolution of ancestral inferences, and the biological meaningfulness and usefulness of the results (Morin et al. 2004; Coates et al. 2009; Oliveira et al. 2015). In addition, current SNP protocols do not need standardization across laboratories or detection platforms, as is typically needed for microsatellites (Morin et al. 2004; Coates et al. 2009). The relatively high SNP genotyping error possible from genotyping-by-sequencing can be mitigated by balancing the number of loci and individuals with quality and coverage of sequences, from library preparation to bioinformatic filtering (Fountain et al. 2016). Here, this resulted in a genotyping error rate of only 1.6%. In addition, some data analysis methods—but not those used here—can take into account genotyping error (e.g. Attard et al. 2018). In line with the advantages of SNPs, we showed here that SNPs, but not microsatellites, had the accuracy and precision to reliably detect admixed individuals even with missing data (Figs. 2, 3 and 4). So, SNPs should be used instead of microsatellites to screen individuals when selecting broodstock.

Screening broodstock through genomics expands on its use to inform fisheries management, by identifying stocks, connectivity, functional variation and adaptive capacity under environmental change, structural genomic variants relevant for ecological processes, infectious diseases, and other research questions (e.g. Attard et al. 2018; Jeffery et al. 2018; Munang’andu et al. 2018; Sandoval-Castillo et al. 2018; Grummer et al. 2019; Wellenreuther et al. 2019). Even with this plethora of genomic information, there is often little integration of genetics and genomics into fisheries management (Bernatchez et al. 2017), or to conservation in general (Garner et al. 2016). One way this can be overcome is by simplifying the communication and increasing the collaboration between academics and management agencies (Garner et al. 2016; Bernatchez et al. 2017), as we have done here (see author affiliations). This allows genomic diversity, which is a critical component of biodiversity, to be incorporated into biodiversity management and conservation.