Development of a ddPCR assay for the detection of the Smoky Madtom (Noturus baileyi) from eDNA in stream water samples

The Smoky Madtom Noturus baileyi is a federally endangered species, whose native distribution includes lower Abrams Creek in Great Smoky Mountains National Park (GRSM) and Citico Creek in nearby Cherokee National Forest. Due to challenges for bio-monitoring posed by its nocturnality and cryptic life history, an environmental DNA (eDNA)-based approach for detection would be useful to complement existing electrofishing and seining efforts to better understand the distribution of this species. We developed a probe-based droplet digital PCR (ddPCR) assay to detect Smoky Madtoms from non-invasively collected water samples. The assay was specific to N. baileyi and did not amplify concentrated genomic DNA of 16 co-occurring or regional fish species, including the yellowfin madtom N. flavipinnis and stonecat N. flavus. The assay limit of detection (LOD) was determined to be 4.18 copies (95% CI: 3.95, 4.41). Several 2 L water samples collected from throughout various streams in GRSM in 2016 and 2017 were tested for the presence of N. baileyi using the ddPCR assay. N. baileyi was detected at two different sites in 2016 and 2017 within Abrams Creek previously known to contain N. baileyi, but no novel detections in other sampled streams were observed. This assay should prove useful for continued surveys of N. baileyi in GRSM.


Introduction
The Smoky Madtom Noturus baileyi was first described by Taylor (1969) and is only known to inhabit lower Abrams Creek in Great Smoky Mountains National Park (GRSM) and Citico Creek in nearby Cherokee National Forest (CNF; Gibbs et al. 2014). The Smoky Madtom is small, reaching a maximum total length of 73 mm and is largely nocturnal (Gibbs et al. 2014;Shute et al. 2005). Smoky Madtoms reside under rocks or other cover in streams during the day hindering their observation or capture in fisheries surveys leading to potential underestimates of abundance.
Aaron W. Aunins aaunins@usgs.gov 1 outside of Citico Creek within CNF, although it has recently been introduced into the Tellico River within CNF. However, given its cryptic nature and the occurrence of other suitable habitat in the region, it is possible other unidentified populations of the Smoky Madtom may persist in streams besides Citico and Abrams Creek (Shute et al. 2005).
Environmental DNA (eDNA) analyses have emerged as a valuable tool for sensitive presence/absence detection of cryptic species (Wilcox et al. 2013;Hernandez et al. 2020). Compared to electrofishing, which can involve significant disturbance to a stream reach, eDNA analyses only require relatively non-invasive collection of water or sediment samples (McColl-Gaudsen et al. 2020). In addition, recent studies suggest that eDNA is often more sensitive than electrofishing for target species detection (McColl-Gausden et al. 2020;Penaluna et al. 2021).
In this study we developed a droplet digital PCR (ddPCR) assay to detect the Smoky Madtom in eDNA extracted from stream water samples. In the laboratory, the assay was evaluated for sensitivity (limit of detection (LOD)), and specificity by testing it against genomic DNA extracts of multiple co-occuring and geographically close species, including other madtoms. The assay was then applied to water samples collected across 2016 and 2017 from various streams throughout GRSM to compare against historical bio-monitoring data collected by GRSM scientists.

Methods
Assay design -To design a species-specific primer pair for N. baileyi, we chose to target the mitochondrial cytochrome oxidase 1 gene (cox1) due to its high level of interspecific variability (Deagle et al. 2014). An additional consideration for the choice of this gene was the presence of publicly available cox1 reference sequences for other species expected to co-occur and/or be geographically close to N. baileyi, reducing the need to sequence non-target species and facilitating the design of species-specific primers. Notably absent from BOLD or Genbank was a cox1 sequence(s) for N. baileyi. To obtain this sequence, genomic DNA of an N. baileyi specimen collected from Citico Creek was extracted and prepared for shotgun genomic sequencing on an Illumina MiSeq using the Nextera XT DNA Library Prep Kit following the manufacturer's instructions. The sample was sequenced at the U.S. Geological Survey (USGS) Eastern Ecological Science Center at the Leetown Research Laboratory, Kearneysville, WV. The complete mitogenome was recovered and assembled (GenBank Accession MW057778; NCBI BioProject PRJNA787289) following standard bioinformatic processing described elsewhere (Aunins et al. 2018).
There are approximately 70 species of fish native to GRSM and 46 of these species are historically known to occur within Abrams Creek (Kulp et al. 2015). Cox1 sequences of 44 fish species including some species known to occur with N. baileyi, species from nearby watersheds, as well as some other Noturus sp. from elsewhere in Tennessee were downloaded from the National Center for Biotechnology Information (NCBI) nt database (Online Resource 1). The number of sequences we downloaded per species was variable ranging from n = 1 for N. stanauli to n = 24 for Cottus carolinae and Rhinichthys atratulus (avg = 7.9, SD = 4.5), and we assumed the taxonomic assignment by the original authors was correct. For each species with more than one sequence, a consensus was first created by importing them into MEGA6 (Tamura et al. 2013) and aligning them using MUSCLE (Edgar 2004). Any regions with low coverage regions or multiple ambiguous nucleotides in the alignment were trimmed. Each alignment was then processed though the HIV Sequence Database Ambiguity Consensus Maker Tool using the default settings (available at: https://www.hiv.lanl.gov/content/sequence/CONSENSUS/ AmbigCon.html). These consensus sequences were then aligned in MEGA6, and low coverage regions on the ends of the alignment were further trimmed. This final alignment file (Online Resource 2) was used for input to the web version of DECIPHER using the default settings (Wright et al. 2014; http://www2.decipher.codes/DesignPrimers.html) to identify candidate primers that would likely amplify only N. baileyi. To make the assay more specific to N. baileyi, a double-quenched black hole FAM labeled probe was designed using IDT PrimerQuest software (https://www. idtdna.com/pages/tools/primerquest), which shared two mismatches with the N. flavipinnis consensus sequence. The selected primers and probe ( Table 1) were blasted against the NCBI nt database and did not match any other fish species in GRSM.
We optimized the probe and primer set on a Bio-Rad QX100 ddPCR system. To obtain a standard for assessing Name Nucleotide sequence (5`-3`) T m (°C) the LOD of the assay, we purified PCR product generated from amplification of a single N. baileyi individual from Citico Creek using the newly designed primer pair. Reaction conditions for the PCR using a Bio-Rad T100 thermal cycler were 20 µl of Promega PCR Buffer, 8 µl Promega MgCl, 10 µl of F and R primer at 5 µM stock concentration each, 2 µl of template, 2 µl of dNTPs at 25 µM each stock concentration, and 57.5 µl of PCR grade water. The resultant PCR product was purified with a Qiagen QIAquick column and quantified three times using a ThermoFisher Qubit HS dsDNA fluorometric assay employing 2 µl of template for each replicate. The average of the three replicates was taken, and the number of copies was estimated using the equation available at http://www.scienceprimer.com/copy-numbercalculator-for-realtime-pcr. We chose to use purified PCR product as the standard after repeated attempts to amplify a synthetic gBlock fragment (IDT, USA) were not successful (data not shown). Eight serial 1:5 dilutions of the standard were made from a starting concentration of 52,454 copies/ul to 0.67 copies/ul to measure the LOD following the protocol of Hunter et al. (2017), which is conservative and accounts for the non-linear response of the ddPCR instrumentation towards the limit of detection. Ten technical replicates were run for each dilution point. ddPCR reaction conditions -Each ddPCR reaction included the following components: 2 µL template DNA, 11µL Bio-Rad ddPCR Supermix for probes (no dUTP), 4 µl F and R primers at 5 µM each stock concentration, 1.1 µl N. baileyi cox1 probe (FAM) at 5 µM stock concentration, 0.3125 µl Rat Preamp mix (HEX fluorophore; Assay C5ar1 Bio-Rad Catalog# 10,031,228), 0.2 µl Rat DNA diluted to 2,857,142.86 copies/µl (BioRad Catalog# 10,044,156), and PCR-grade water to a final volume of 22 µL. The inclusion of the rat assay was to evaluate the presence of inhibition following Hunter et al. (2019). Each completed reaction mix was loaded into a Bio-Rad DG8 cartridge along with 70 µl Droplet Generation Oil for Probes and processed through a Bio-Rad QX200 Droplet Generator. Forty µl of the droplets from each reaction were loaded into a 96 well Bio-Rad ddPCR plate and sealed with a pierceable foil lid using a Bio-Rad PX1 plate sealer. The reactions were thermal cycled with the following parameters on a Bio-Rad C1000 Thermal Cycler: 95 °C for 10 min; 40 cycles of 94 °C for 30 s denaturation followed by 60 °C annealing for 1 min; 98 °C final denaturation for 10 min; hold at 10 °C. A minimum of three negative controls (2 µl PCR-grade water in place of template DNA) were run with each set of samples to monitor for contamination. Droplets were processed on a Bio-Rad QX200 Droplet Reader, and the number of positive and negative droplets per sample was determined with the Bio-Rad Quantasoft (version 1.7.4) software using automatic thresholding. The software provided the copies/µl for each sample. To calculate the copies/L in the original 2-liter water sample, we first determined the concentration (copies/ µl) in the 50 µl DNA extraction eluate by multiplying the copies/µl reported by the software times the 22 µl reaction volume, divided by 2 µl of DNA input. This concentration was then multiplied by 50 µl to give the number of copies in the original 2-liter sample, and converted to copies/ µl .
Each Sterivex sample (Millipore Sigma, USA) was analyzed in triplicate through the ddPCR assay (Table 2). Therefore, each site was analyzed through 12 individual ddPCR reactions (three ddPCR reactions per each of three environmental water samples, and three ddPCR reactions per negative control). One site (2017 AQABC1) only had two Sterivex samples available for analysis, as the DNA of the third was depleted in an unrelated assay. Each sediment sample was analyzed in triplicate for a total of nine ddPCR reactions per site where sediment was analyzed.
Specificity testing -In addition to in silico testing of the PCR primers and probe, tissue was obtained for 15 species distributed among 29 samples for testing of non-specific amplification of the N. baileyi ddPCR assay (Online Resource 3). Fin clips were collected from each species from streams within GRSM, with the exception of N. flavus from the Little Tennessee River and N. flavipinnis from Citico Creek. DNA was extracted from each sample using the Qiagen DNeasy Blood and Tissue kit and quantified using a Qubit HS dsDNA kit. Two µl of template at 4 ng/ ul was used from each specimen for amplification testing in ddPCR (see reaction conditions). Each sample was amplified in duplicate.
Study location, eDNA sample collection, and DNA extraction -Eight sites distributed among five watersheds within GRSM were targeted for eDNA sampling ( Fig. 1; Table 2). At each site, three replicate water samples were collected by filtering 2 L of water through a 0.22 μm pore size polyethersulfone (PES) membrane Sterivex filter capsule using a Geotech (Geotech Environmental, Denver, CO) peristaltic pump (a total of six liters were filtered per site, among three 2-liter replicate samples). One end of the tubing was attached with a zip-tie to a long branch obtained at each site and held out into a flowing portion of the stream while taking care to avoid touching the end of the tubing. The Sterivex filter was attached to the other end of the tubing through the use of a barbed male luer lock fitting. In some cases, two filters were required to reach the target volume of 2 L per sample. New tubing (MasterFlex Tygon E-food tubing PN# B-44-4X, Masterflex, USA), zip-ties, and barbed luer lock fittings were used at each site to avoid cross contamination among sites. The tubing end that collected the sample water was always facing upstream of the stick and zip tie to avoid sampling any water that passed over them. Once filtering was complete, filter capsules within eight hours. No preservative was used for the sediment samples. Sediment samples were eventually shipped frozen to the USGS Eastern Ecological Science Center at the Leetown Research Laboratory, where they were stored at -80 °C until DNA extraction. DNA was extracted from sediment samples using the Qiagen DNeasy PowerSoil Pro kit following the manufacturer's instructions.

Results
A ddPCR primer + probe assay targeting a 125 bp section of the cox1 gene of N. baileyi was designed ( Table 1). The LOD of the assay based on an analysis of eight serial dilutions of the standard was determined to be 4.18 copies/µl (3.95, 4.42 95% CI; Online Resource 4).
The assay was tested in vivo against extracted DNA of 15 species of fishes, and in silico (NCBI primerblast and alignments) against 44 species of fishes. The number of droplets among all reactions analyzed of the 15 fish species ranged from 13,177 to 17,607 (avg = 15,649, SD = 1240). In vivo testing resulted in no amplification of any non-target species, and in silico analyses indicated the primer and probe had multiple mismatches to other regional Noturus spp, and other fish species. Therefore, the assay appears to be specific for detection of N. baileyi. ddPCR reaction negative were placed into individual labeled Whirl-Pak bags (Nasco, USA), and kept cool on ice packs in a small cooler until they could be placed into a -20 °C freezer within eight hours of collection. A negative control water sample brought along to each site, consisting of 1 L of laboratory milli-Q water in a 1 L Nalgene bottle, was also filtered first at each site for subsequent laboratory processing to monitor for sample contamination. The Sterivex filters were eventually shipped frozen back to the Leetown Research Laboratory and stored at -80 °C until they could be extracted.
For DNA extraction, Sterivex filter capsules were removed from the − 80 ºC freezer and allowed to thaw for 15 min at room temperature. Remaining water within the capsules was removed with air pressure from a sterile 5 mL syringe. Then, the Qiagen DNeasy PowerWater Sterivex kit was used to extract the DNA from the Sterivex filter following the manufacturer's protocol. The samples were eluted into a volume of 50 µl with elution buffer. For samples collected across two filters, eluted DNA was combined.
A subset of sediment samples was collected after the water samples from site F0171 within Abrams Creek ( Table 2) for comparison. Sediment samples (approximately 50 g) were collected with a sterile scoop at three random points within a 1-meter area around the targeted water sample collection point, and consisted of fine grain sand-like material. Each of the three sediment samples was poured into a Whirl-Pak bag, and placed on ice until it could be frozen at -20 °C  -|-|--|-|--|-|a "-" represents no positive detection in a ddPCR replicate, whereas a number represents DNA copies/µl in a ddPCR replicate output by Bio-Rad Quantasoft software with no adjustment for reaction or template volume. There were three replicates run per sample separated by a "|"

Discussion
We developed a species-specific probe-based ddPCR assay for the Smoky Madtom N. baileyi, which confirmed the presence of N. baileyi within a reach of the only stream in GRSM known to contain this species -lower Abrams Creek (Fig. 1). This assay should be useful for confirming presence of N. baileyi after supplementation, and perhaps expansion into new habitats through future monitoring in the park. GRSM contains over 4,640 km of streams, of which approximately 51 km of a habitat type similar to lower Abrams occur in the Little Tennessee River watershed that could conceivably support N. baileyi.
While we were able to determine sensitivity of the assay in a controlled laboratory setting with a known set of DNA concentrations, many unknowns remain regarding the sensitivity of our assay for field detections of N. baileyi. Notably, all of the positive detections in field samples were below the LOD of the assay, suggesting eDNA concentrations of N.
controls included with the in vivo testing showed no evidence of contamination.
The number of droplets among all water samples analyzed ranged from 10,020 to 17,878 (avg = 14,028, SD = 1684), and among sediment samples ranged from 10,025 to 17,469 (avg = 13,252, SD = 1877). There was no evidence of inhibition in any samples. Two sites within Abrams Creek (2016 AQABC1 and 2017 F0171) were positive in ddPCR for at least one 2-liter replicate for the detection of N. baileyi, consistent with historical data on the distribution of N. baileyi in GRSM ( Table 2). The amount of eDNA converted to copies/L from these samples measured between 77 and 852.5 copies/L. However, detections were not consistent at these sites across years or sample replicates. N. baileyi was not detected at any of the locations sampled outside of lower Abrams Creek within GRSM. Sediment samples tested at F0171 and AQABC1 within Abrams Creek did not result in positive detections for the presence of N. baileyi DNA.  Table 2 of the manuscript was written by Aaron Aunins, and all authors commented on previous versions of the manuscript and approved the final version with the exception of Tim King (deceased).
Funding This work was supported by a Natural Resources Preservation Program (NRPP) grant from the National Park Service.
Data Availability Raw sequence reads for the Smoky Madtom, and the assembled mitogenome are available at NCBI (PRJNA787289; MW057778). ddPCR data, nucleotide alignments and metadata for fish included for specificity testing have been deposited in USGS Sciencebase https://doi.org/10.5066/P92WKR6U.

Competing Interests
The authors have no relevant competing interests to disclose.

Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/. baileyi are very low. The density of N. baileyi at the sites of detection in Abrams Creek, as well as the distance from the source of N. baileyi eDNA are unknown. Thus, we have no knowledge of the relative abundance of N. baileyi needed to result in a positive detection. In situ caging experiments (sensu Nevers et al. 2020) would be informative to assess sensitivity of the assay in the field.
Our detections of N. baileyi were not consistent across years. Similarly, all replicates within a single 2-liter water sample were not always positive. This finding likely reflects the patchy distribution of eDNA in lotic systems, and stochasticity of detection in assays such as ddPCR when the concentrations are low (Hunter et al. 2017). Future experiments should investigate the impact of different filtered volumes of water on detection of N. baileyi in a stream reach, preferably in conjunction with a caging experiment with known distances of sampling from the source. The pore size of 0.2 μm employed in this study was determined to be the most efficient in capturing the largest distribution of sizes of Common Carp Cyprinus carpio eDNA in water samples from a pond (Turner et al. 2014) at the expense of filtering the smallest amount of water due to clogging. In contrast, Thomas et al. (2018) found that larger volumes filtered through a 5 μm filter captured more target eDNA than smaller pore sizes. Thus, more field experimentation is warranted to evaluate what pore size filters and volumes are the most effective for capturing N. baileyi eDNA. We did not detect any N. baileyi eDNA in the sediment samples examined, even when the corresponding water samples were positive for N. baileyi. The ddPCR results did not suggest any inhibition as the cause for lack of detection in the sediments. This result could suggest that the N. baileyi eDNA originated from upstream, that shedding rates of eDNA into sediments is very patchy in distribution, or any N. baileyi eDNA in our sediment samples were below detection limits. Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.