Introduction

Marine zooplankton play a crucial role in ocean ecosystems, linking primary producers to higher trophic levels and contributing to the biogeochemical cycling of nutrients. They are also indicators of the health of marine environments and can provide valuable insights into the impacts of climate change, pollution, and other anthropogenic activities on the ocean (Ferdous and Muktadir 2009; Ndah et al. 2022; Yang and Zhang 2020). However, many of the ecologically significant indicators, such as the presence or relative abundance of sensitive or invasive species, rely on accurate species-level data, which remains a bottleneck in many zooplankton studies, given the time and taxonomic expertise required to manually sort and identify zooplankton samples.

Recent advances in molecular techniques have allowed to study marine biological communities at an unprecedented volume and taxonomic resolution and are rapidly becoming more and more affordable. More specifically, metabarcoding, or the simultaneous sequencing of a DNA barcode region of interest in all organisms in a community sample, permits the analysis of 100 s to 1000 s of samples simultaneously (Gaither et al. 2022). Metabarcoding avoids a lot of the limitations of microscopic sorting and more modern visual-based identification methods, since it can distinguish organisms of any stage of development, cryptic species, or those species that lose their distinguishing features in preservatives. Metabarcoding can also be more cost-effective than traditional methods, particularly for large-scale monitoring studies, as it requires fewer trained personnel and can be to a large extent automated. However, there are still methodological challenges and limitations that must be properly addressed, including the choice of barcode, PCR biases, and incomplete reference libraries (Bucklin et al. 2016; Santoferrara 2019). Most importantly, despite a growing body of evidence that sequence numbers produced via metabarcoding are at least somewhat representative of the relative composition of the organisms in the environment, it remains a challenge to apply metabarcoding data for community analysis in a quantitative way (Bucklin et al. 2016). The most common quantitative parameter in plankton ecology, abundance of organisms, is expectedly often poorly correlated to sequence numbers due to the enormous range in organism size and, thus, their DNA content. This can even include organisms belonging to the same species—for example, a Calanus spp. individual will vary between 0.2 and 140 µg C, or 4 orders of magnitude, between an egg and its adult stage (i.e., Møller et al. 2016). Far better correlations have been observed for organism biomass, especially in certain groups such as copepods (Ershova et al. 2021; Hirai et al. 2017; Lamb et al. 2019; Matthews et al. 2021), although typically these relationships remain far from perfect for most taxa, presumably due to PCR bias as well as different DNA density between taxonomic groups of similar weight.

In this study, we apply a mock sample approach to quantify and address the biases of metabarcoding both for recovering biodiversity and estimating relative contribution of species. We capitalize on the well-documented relationship between organism biomass and sequence numbers (Ershova et al. 2021; Hirai et al. 2017; Krehenwinkel et al. 2017; Schenk et al. 2019) to introduce a direct framework to apply metabarcoding as a quantitative method in zooplankton monitoring studies. We sequenced a 313-base pair fragment (“Leray” fragment) of the COI barcoding gene, which has been shown to be successful in recovering biodiversity at the species level in a wide range of marine invertebrates, including zooplankton (Antich et al. 2019; Ershova et al. 2021; Wangensteen et al. 2018). We attempt to correct for some of the PCR bias and other factors that result in uneven sequencing of various organisms, with the hypothesis that under- or overrepresentation of specific taxa in sequencing data is consistent and predictable. We do this by developing species- or taxa- specific conversion factors that are applied across the entire dataset and bring the relationship between organism biomass and sequence reads closer to unity. We then combine this data with a single bulk biomass measurement for the entire sample to produce a quantitative molecular variable akin to species biomass, which can then be used to map species distributions and densities, analyze community structure, and estimate a variety of ecosystem indicators. Additionally, we tested two different methods of preservation of zooplankton DNA to evaluate the impact of preservation method on the obtained results. We propose that the protocol described in this work can be a valuable tool for future studies on marine zooplankton communities and to help better understand the functioning of marine ecosystems.

Materials and methods

Collection and sample preparation

Sampling and mock sample preparation were done during the IMR North Sea Ecosystem Cruise onboard the RV Johan Hjort during April 2022 (Fig. 1, Supplementary Material 1). Zooplankton was collected using a WP-II net (0.25-m2 mouth opening, 180-µm mesh size), which was hauled vertically from 5 m off the sea floor. At two of the stations (stations 418 and 423), samples were obtained using a Multinet MAMMOTH (Hydrobios) with a 180-µm mesh containing 9 nets, hauled obliquely from 5 m above the sea floor while the boat was moving at a speed of 1 km. Additionally, at almost every station, supplementary samples were obtained using the GULF VII net with a mesh size of 280 µm which was also hauled obliquely from 100-m depth.

Fig. 1
figure 1

Sampling locations. Black dots indicate stations sampled with a vertically towed WPII net, red dots indicate where obliquely-towed Multinet MAMMOTH nets were used to create the mock samples

A small subsample (typically, 1/8 of the total sample) was obtained from the WP-II nets (16 stations) or the deepest net of the MultiNet MAMMOTH (2 stations). The animals were immobilized using a few drops of carbonated sea water then sorted under a stereomicroscope, and 200–500 animals were selected to create a “mock” sample. The sample was also supplemented with larger and/or rarer animals from the GULF VII net from the corresponding station. Per this method, the diversity within the samples reflected the real environmental diversity at that location (i.e., only animals from a given station were used to create the sample), but their relative abundances were to a large degree artificial. The samples were created to maximize diversity and varying contribution of various taxonomic groups in different samples. Organisms were identified to species level when possible, although for many organisms identification was done at the genus level or higher, especially at larval/juvenile stages. For example, copepods belonging to the genera Pseudocalanus and Paracalanus were identified as Pseudo/Paracalanus spp., and Calanus finmarchicus and C. helgolandicus were grouped as Calanus spp. All calanoid copepod nauplii were identified as Calanoida. Meroplankton was typically grouped at the phylum or class level. The body length of each animal was measured using the ZoopBiom digitizing system (Roff and Hopcroft 1986), and its biomass was estimated using a length–weight regression relationship published for the species or a similar species (see Ershova et al. 2015 for details). While this indirect method of biomass estimation has an associated bias, especially for groups for which there is limited data on length–weight relationships, it allows to estimate species weight without the labor-intensive practice of weighing every group of organisms. Total sample biomass was estimated by summing all the individual dry weight values. Nitrile gloves were worn during sample preparation, and all dishes and sorting tools were rinsed with 10% bleach between samples to avoid contamination. At most locations, two samples were created for each station: one was preserved in 100% ethanol, and the other was rinsed with freshwater and placed on a pre-weighed aluminum tray, then dried at 65 °C for 24 h. These two samples per station were designed to be very similar in quantity and composition, but were not identical. In total, 33 samples were prepared from 18 stations (17 preserved via drying, and 16 in ethanol).

DNA analysis

In the lab, the dry samples were re-constituted with a few drops of MilliQ water and ethanol was drained and replaced with MilliQ water. The samples were homogenized in 2-ml tubes containing ceramic beads using a 2 × 150 Precellys machine. Three 100-µl replicates of the resulting homogenate were taken, and DNA was extracted from each replicate using the Qiagen Blood and Tissue Kit according the manufacturer’s protocol. PCR amplification was carried out using individually tagged Leray-XT primers for COI: forward primer mlCOIintF-XT 5'-GGWACWRGWTGRACWITITAYCCYCC-3' (Wangensteen et al. 2018) and reverse primer jgHCO2198 5'-TAIACYTCIGGRTGICCRAARAAYCA-3' (Geller et al. 2013). The reagents and PCR protocol are described in detail in Ershova et al. 2021. Two extraction blanks and two PCR blanks were sequenced with the samples as negative controls. The PCR products were visualized on a gel to ensure the absence of contamination, then purified using Minelute PCR purification columns (www.qiagen.com) and pooled into a single library. The NextFlex PCR-free library preparation kit (Perkin-Elmer) was used to prepare the Illumina library, which was then sequenced on an Illumina MiSeq using ½ of a V3 2 × 250-bp kit (Illumina).

Bioinformatics

The bioinformatics pipeline closely followed Ershova et al. 2021. Paired-end reads were aligned with illuminapairedend from OBITools v1.01.22 (Boyer et al. 2016), and reads with an alignment score < 40 were discarded. Primer sequences were removed, and reads were demultiplexed and assigned to individual samples using ngsfilter. Reads were selected for length (299–320 b.p.) using obigrep and dereplicated using obiuniq. The uchime_denovo algorithm (Edgar et al. 2011) implemented in vsearch v1.10.1 (Rognes et al. 2016) was then used to remove chimeric sequences. Singletons (sequences with abundance of 1 read) were removed, and step-by-step clustering was performed in SWARM 3.0.0 (Mahé et al. 2021) using a distance value of d = 13, which is the optimal value for this marker (Antich et al. 2021), to cluster individual sequences into molecular operational taxonomic units (MOTUs). Taxonomic assignment of the representative sequences was then performed using ecotag (Boyer et al. 2016) against DUFA-Leray v.2020–06-10, a custom reference database (publicly available from github.com/uit-metabarcoding/DUFA), which included sequences of the Leray fragment extracted from BOLD and Genbank, complemented with in-house generated sequences. This method uses a phylogenetic tree-based approach, and when a sequence fails to produce a perfect match, it identifies taxa which are at least as similar to the nearest hit as the sequence in question, and assigns it to its most recent common ancestor, usually resulting in a higher rank, such as genus, family, or order. In the cases when not enough related sequences were present in the database to perform this analysis, an artificial threshold was applied (95% similarity for genus, 90% for family, 85% for order, and 75% for class). The resulting dataset was curated for putative pseudogene sequences using LULU (Frøslev et al. 2017). The next refining steps consisted of removing MOTUs assigned to prokaryotes and non-planktonic organisms (e.g., mammals) and a second manual taxonomy check of all MOTUs using BOLD (Barcode of Life Database, www.boldsystems.org) using species-level barcode records. A species-level identification was assigned with a minimum of 97% similarity. Several MOTU’s produced more than one species-level match and were dealt with on a case-to-case basis, with credibility of reference sources and known species distributions playing the main role in the final assignment. After all the bioinformatics steps, the three extraction replicates were pooled for each sample.

Data analysis

All analyses were performed in R (R Development Core Team 2011). Pearson correlation coefficients (r) were calculated between relative biomass, abundance, and sequence reads of all taxa at the lowest common taxonomic resolution both for the entire dataset and for each sample individually. Regression relationships between the proportion of abundance/biomass and the proportion of sequence reads were established for each species/taxa using simple linear regression. All variables were square-root transformed to meet the assumption of homoskedasticity. In order to maximize the comparability between biomass and sequence reads, for statistically significant (p < 0.05) and relatively strong (R2 > 0.3) relationships, where the slope was less than 0.6 or greater than 1.4, the slope of the line was used to introduce linear adjustment factors for each species/taxa across the entire dataset using the equation \({n}_{\mathrm{adj}}=\frac{{s}^{2}(Tn-{n}^{2})}{T-{s}^{2}n}\), where n is the number of sequence reads of the given species/taxa, s is the slope of the regression equation, and T is the total number of sequence reads within that sample. This formula adjusts the number of sequence reads of a given taxa to change its proportion in a sample of T reads by a linear factor of s. Because changing the number of reads of one species will inevitably change the relative contribution of all other species in the sample, the adjustment factors were applied to each species or taxa in question one by one in descending order of maximum relative sequence read abundance (i.e., the taxa/species that had the greatest impact on the relative number of reads were treated first), with T re-calculated again after each adjustment. Only species/taxa that were observed in a minimum of 5 samples both via metabarcoding and microscopy were included in the analysis. The R code used to calculate and apply these conversion factors is available in Supplementary Material 2.

The proportions of sequence reads belonging to each species/taxa were then multiplied by total sample biomass (mg DW) to calculate biomass-weighed sequence reads (BWSR) for each species. Mock community analysis using abundance, biomass, and BWSR data was carried out using the package “vegan” (Oksanen et al. 2016). Additionally, in order to compare BWSR and biomass estimates directly, a “pooled” dataset was created, where each sample was represented twice, once by BWSR and once by microscopically estimated biomass data. For each dataset examined, non-metric multidimensional scaling (nMDS) was carried out on Bray–Curtis dissimilarities of fourth-root transformed data. The species significantly driving the ordination were identified using the function envfit at a significance level of p = 0.05 and visualized as biplots. Additionally, we performed hierarchical cluster analysis (average linkage method) of the resulting dissimilarity matrices, and clusters (“assemblages”) were identified via the simprof tool (Clarke et al. 2008) with an alpha level of 0.05. Finally, the correlations between Bray–Curtis dissimilarity matrices of abundance, biomass, and BWSR were examined using the Mantel permutation test (Mantel and Valand 1970), which calculates the Pearson correlation coefficient between all entries in the two matrices, while permuting the rows and columns of the matrix 9999 times to determine statistical significance. Only taxa/species that contributed a minimum of 3% of the transformed biomass, abundance, or BWSR values in at least one sample were used in the analyses. Because the relative composition of species within the samples was artificially modified, these analyses were done more as proof-of-concept for the method, rather to accurately describe the communities in this region.

Results

Sequencing summary

The sequencing run produced a total of 8,165,891 sequences, with sequences per replicate ranging between 15,000 and 182,000 reads (mean = 82,500) and per sample (across 3 replicates) between 70,700 and 457,101 (mean 250,000). The extraction blanks produced 200–400 reads and the PCR blanks 100–150 reads. There were no differences in recovered DNA concentration or sequencing depth between ethanol and dry samples, so hereafter both types of samples were used for all analyses (Fig. 2a and b). Rarefaction curves reached an asymptote in the vast majority of samples, suggesting that this sequencing depth was sufficient to recover all or most of the diversity (Supplementary Material 3).

Fig. 2
figure 2

Ranges in a DNA concentration, b sequencing depth, and c recovered diversity in ethanol preserved and dehydrated samples

Diversity

The visual identification yielded a total of 59 unique holoplanktonic and 16 meroplanktonic species/taxa across all samples, with the typical number of taxa per sample ranging from 15 to 40. Metabarcoding identified 357 MOTU’s across the whole dataset, of which 228 were identified to species level, 19 to genus or family level, 43 to order or class, 10 to higher taxonomic ranks, and 57 remained unassigned. Of these, 282 MOTU’s belonged to either holo- or meroplanktonic (fish or benthic) organisms and corresponded to 86 and 124 unique species/taxa of holo- and meroplankton, respectively (Supplementary Material 4). This list included all but 4 of the species identified visually (Microsetella norvegica, two appendicularian species and one monstrilloid copepod). There were no differences in species richness between ethanol and dry samples (Fig. 2c). The additional species identified via sequencing included several hard-to-identify crustaceans, hydrozoan jellyfish, and larvae of benthic animals. For example, the visually identified group Pseudo/Paracalanus spp. consisted of 5 different species found in varying proportions at different stations, of which Paracalanus parvus was the numerically dominant one.

On a sample-to-sample basis, typically 90–100% of registered species/taxa were recovered via metabarcoding (average 93%), with the number of “false negatives” (species not detected via metabarcoding) never exceeding 1–4 species. However, the percentage of “false positives” (species that were identified via metabarcoding but were not detected visually, at the lowest common taxonomic resolution) was high, in many samples doubling or tripling the total diversity (Fig. 3a). When only species that contributed at least 0.01% of total reads in a sample were included, the percentage of false positives was reduced to less than 10% across all stations, but the percentage of false negatives rose, as some of the “real” species began to fall under that threshold (85% species recovered, on average) (Fig. 3c). Dropping the detection threshold further to 0.1% of total reads, false positives were reduced to no more than 1–5 species per sample, but only 60% (on average) of the “true” species on average were recovered (Fig. 3d). Regardless of the threshold limit applied, false positives typically made up a very small proportion of total reads. However, at 6 stations, they made up over 10% of total sequence reads, and at 3 stations over 25% (Fig. 4). These high contributions were typically caused by 1–2 species.

Fig. 3
figure 3

Number of recovered species via metabarcoding only (green), microscopy only (red) and both methods (blue) in the; a entire dataset and at; b 0.001%, c 0.01%, and; d 0.1% contribution to a sample. Species were counted at the lowest common taxonomic resolution

Fig. 4
figure 4

Proportion of reads belonging to false positives vs. species with confirmed presence

The visually observed species that were not detected via metabarcoding in all samples and/or frequently fell below the 0.01% detection limit included very small copepods such as Acartia spp., Detrichocoryceaus sp., and Oncaeidae, as well some meroplankton groups such as bivalve and gastropod larvae. Organisms frequently detected as “false positives,” on the other hand, include the large copepod Calanus hyperboreus, several species of jellyfish, fish, cirripeds, gastropods, and the chaetognath Parasagitta elegans.

Quantitative correlations

At the lowest common taxonomic level, both relative abundance and relative biomass of each species/taxa were correlated with relative sequence reads across the entire dataset, with the correlation stronger for biomass (r = 0.57) than abundance (r = 0.41). Within each individual samples, the correlations of relative biomass to sequence counts varied from 0.15 to 0.95 (Supplementary Material 5) and relative abundance to sequence counts from 0 to 0.92. When examining relative contributions of individual taxa across all samples using simple linear regression, statistically significant (p < 0.01) correlations were observed for almost all commonly encountered taxa, both between biomass and sequence counts (Fig. 5a and Table 1) and abundance and sequence counts (Table 1). The relationships to biomass were typically stronger, but for several species of copepods and pteropods, abundance was the better correlated value to relative sequence reads (Table 1). The strongest relationships to biomass were observed for several non-copepod crustacean taxa: euphausiids (R2 = 0.83), cirripeds (R2 = 0.88), ostracods (R 2 = 0.89), and decapods (R2 = 0.64), as well as echinoderm larvae (R2 = 0.74). Moderate relationships (R2 between 0.4 and 0.6) to biomass were observed for most copepod species. The weakest correlations were observed for bivalve larvae and cnidarians (R2 = 0.31). The intercept was significantly different from 0 only for Calanus helgolandicus/finmarchicus, but the slopes strongly deviated from 1 (less than 0.6 or greater than 1.4) in 22 of 40 species/taxa, indicating that they were consistently either over- or under-represented in the sequencing data relative to their estimated biomass contribution to the sample. This deviation was even stronger in the abundance data, with 29 of 40 species/taxa deviating strongly from 1. After the introduction of species- or taxa-specific linear conversion factors, all slopes fell into the range of 0.6–1.2 for biomass (Fig. 5b and Table 1). Because changing the number of reads of one species also affected the relative abundance of other species, this changed the strength of the relationship for several taxa (significance remained unchanged). For 12 species/taxa, the R2 increased by 0.05 or more, and for 5 species/taxa, it was reduced by 0.05 or more. The remainder of the relationships remained unchanged (Table 1). The overall Pearson correlation between relative biomass and sequence counts across the whole dataset improved from 0.57 to 0.77. Consistent with this high correlation, the spatial distribution of adjusted-BWSR (proportion of sequence reads multiplied by total sample biomass) showed very similar, although not identical, trends to biomass for all examined taxa (Fig. 6).

Fig. 5
figure 5

Regressions between square-root transformed proportion of biomass (estimated from microscopy) and square-root transformed proportion sequence counts for selected taxa with; a original data, and; b data adjusted with species/taxa-specific linear conversion factors. For statistical summary and all species/taxa, see Table 1

Table 1 Summary of regression relationships between square-root transformed proportion dry weight (DW)/abundance and square-root transformed proportion sequence reads in original and adjusted (adj) data
Fig. 6
figure 6

Spatial distribution of biomass, as estimated from microscopy, and BWSR, as estimated from metabarcoding, for select taxa

Mock community analysis

The Mantel test of associations between Bray–Curtis dissimilarity matrices of abundance and biomass, abundance and BWSR, and biomass and BWSR produced r values of 0.52, 0.54, and 0.73, respectively (significance p < 0.001), indicating moderate to high linear correlations between them. Accordingly, multivariate analysis of the three parameters (abundance, biomass, and adjusted BWSR) revealed slightly different, but highly complementary patterns for each of the datasets (Fig. 7). The separation of samples on all three nMDS ordinations were driven by 3 groups of species, with some, although not all, species shared between the data types. Overall, the metabarcoding dataset contained a much longer list of species within each of the groups due to the higher taxonomic resolution. All three ordinations contained a very distinct assemblage containing the two MultiNet stations (Group 1, light blue on Fig. 7), characterized by a variety of deep- and cold-water species that were absent in other samples. Similarly, all three datasets contained an assemblage associated with the southeast region (Group 2, pink and dark blue on Fig. 7), distinguished by Sagitta/Parasagitta spp. (all three datasets), Calanus finmarchicus/helgolandicus (biomass and BWSR), Aglantha digitale (abundance and biomass), and Candacia armata (abundance and BWSR). A northern/western group (Group 3, light and dark purple and dark green on Fig. 7) was driven by cirripeds (all three datasets, represented by 4 different species within the BWSR data), Acartia spp., (all three datasets, Acartia tonsa in the BWSR dataset), Pseudo/Paracalanus spp. (all three datasets, Pseudocalanus elongatus in the BWSR dataset), decapod larvae (abundance and BSWR, represented by 5 species in the BWSR dataset), and polychaete larvae (abundance and BSWR, represented by 4 species in the BWSR dataset). Cluster analysis of the pooled adjusted-BWSR/biomass dataset at the lowest common taxonomic resolution identified 9 clusters (“assemblages”) via the simprof routine. In 30 out of 33 samples, the microscopically derived data points were placed within the same assemblages as their BWSR counterparts. Similarly, the ethanol/dry samples collected at the same station were placed within the same assemblages in all but two instances, as was expected given the similarity of the sub-samples. The nMDS ordination partially resolved the complexity (2D stress = 0.21), and somewhat supported the separation of the clusters, although some groups were more distinct than others. Similar to the previous analysis, the most distinct assemblage included the two stations collected with the deep MultiNet (gray on Fig. 8). Despite slight variation in placement, the BWSR/biomass data points followed very similar patterns on the nMDS ordination.

Fig. 7
figure 7

Results of nMDS ordination and cluster analysis for fourth-root transformed; a abundance (estimated from microscopy); b biomass (estimated from microscopy), and c BWSR (estimated from metabarcoding) data, and the spatial distribution of clusters in the North Sea. Colors represent clusters at ~ 30–40% dissimilarity, as identified by simprof. Blue dashed lines connect samples collected at the same station. Arrows indicate biplots of species significantly (p < 0.05) correlated with the ordination, with biplots close together in space grouped as groups 1–3. For better readability, the species representing the biplots within each group are listed below the plots

Fig. 8
figure 8

Results of nMDS ordination and cluster analysis for fourth-root transformed biomass (estimated from microscopy) and BWSR data pooled at the lowest common taxonomic resolution, and the spatial distribution of clusters in the North Sea. Colors represent clusters at ~ 30–40% dissimilarity, as identified by simprof. Blue dashed lines connect samples collected at the same station, and black solid lines connect BWSR/biomass data from the same sample. Arrows indicate biplots of species significantly (p < 0.05) correlated with the ordination

Discussion

Metagenomic methods are increasingly being applied in marine ecosystem monitoring to provide deeper insights into communities of microbial organisms (Pawlowski et al. 2016; Santoferrara et al. 2020), phyto- and zooplankton (Bucklin et al. 2019; Coguiec et al. 2021; Ershova et al. 2019; Yoon et al. 2016), fish eggs and larvae (Lira et al. 2023), and zoobenthos (Klunder et al. 2022). These methods are efficient and cost-effective and offer a much higher level of taxonomic resolution compared to other approaches. However, despite increasingly coming of age, concerns remain about potential biases and errors associated with PCR-based metagenomic techniques which hinder the interpretation of the data, especially in a quantitative way. Santoferrara (2019) identified five of these potential sources of error in metabarcoding as false negatives, false positives, erroneous identifications, skewed relative abundances of different taxa, and artifactual sequences. The following sections address these potential issues in the context of our protocol and provide recommendations to address them.

Species richness detection and false positives/negatives

Metabarcoding is an extremely sensitive tool and given enough sequencing depth can detect trace amounts of DNA, including extracellular DNA, residual fragments of organisms left in the sample, and gut contents. This is an undeniable strength of the method, but this sensitivity can also vastly inflate diversity estimates, as seen in our results, where the number of “false positives” detected within a sample, at the lowest common taxonomic resolution, often was equal to or exceeded the number of species observed visually. It is important to note, however, that we cannot confirm that every “false positive” was truly absent from the sample. Although the samples in our study were to a large part artificially engineered, they still contained some individuals that we could not identify properly, such as early copepodites, nauplii, trochophore larvae, eggs, and small fragments of biological debris. For example, the presence of cnidarian or chaetognath DNA in a number of samples could have been due to these fragile gelatinous organisms breaking apart in the net haul, leaving behind fragments of their bodies. Chaetognaths are also known to regurgitate their gut contents when hauled by nets (Baier and Purcell 1997), which could result the presence of additional half-digested DNA, both their own and that of their prey. Nonetheless, regardless of whether they were truly absent or present in trace amounts, the total contribution of “false positives” was typically very low, 0.1% of total reads or lower. There were several samples, however, that fell outside of this range, and where reads belonging to false positives exceeded 5–10%, or in 3 samples even 25%, of total reads. Given the very small starting volume of the samples (300–500 individuals), it seems plausible that cross-sample contamination was a likely culprit in these cases. Alternatively, the presence of “false positive” species as early life stages, unidentifiable debris or inside the guts of other species in the sample, as described above, could potentially lead to similar results. Similar to the majority of other studies that used a mock sample approach (Santoferrara 2019), we found that metabarcoding recovered almost all of the observed diversity at the given sequencing depth, confirming that our choice of barcode and sequencing depth are appropriate for this region of the world ocean. Despite this, some of the smaller and rarer species within our data, such as the very small copepods or bivalve larvae, were not recovered in all samples, or were recovered only with a very low detection threshold, placing them within the detection range of the aforementioned “false positives.” Additionally, notably missing among the species list were appendicularians, for which COI may be a poorly suited barcode region (Bucklin et al. 2021), but which can nevertheless be significant contributors to plankton communities. Including an additional barcode, such as V4 or V9 regions of the 18S rRNA gene could have alleviated this. Other studies that implemented multiple barcode regions found that this approach increased the recovered diversity and minimized false negatives (Zhang et al. 2018). However, the numerical treatment of multi-locus metabarcoding data to estimate species relative read abundances remains a challenge.

Different approaches can be taken to minimize the effects of false positives and negatives when estimating biodiversity, depending on the goals of the study. If a broad-scale community analysis is the aim, particularly, as is often the case, in the context of earlier efforts that may have used morphological methods of identification, applying a conservative approach and setting an arbitrarily high read abundance threshold may be appropriate. Although this will exclude the rarer species, they are unlikely to be highly ecologically significant. If concurrent microscopic analysis reveals that some abundant/ecologically important species are being missed with the given metabarcoding protocol, additional genetic markers can be included, as was discussed in the section above; alternatively, these species can be screened for visually. On the other hand, if the goal is early detection of invasive species, then a very high sensitivity is necessary that will allow to detect even a single larva in a dense tissue sample. One approach to ensure proper sensitivity in this case could be “spiking” the sample with a single individual of a taxonomically similar species that does not naturally occur in the studied environment (for example, in the case of marine crustaceans, a small freshwater crustacean). This approach is commonly used in insect monitoring studies (i.e., Batovska et al. 2021). For simply recovering species richness, any number of approaches can be used. Our method of estimating diversity errs on the side of being overly conservative, counting only discreet identified species or taxonomic groups, and not MOTU’s. In most situations, not all MOTU’s can be identified to species level, but can often be identified to genus, family, or even higher ranks, and we collapsed all commonly identified MOTU’s into single categories. This removes a large part of the bias associated with clustering algorithms (i.e., van der Loos and Nijland 2021), but it is likely to underestimate true diversity, as these pooled MOTU’s can in fact represent many discreet species that are just absent from reference databases. Regardless, any kind of diversity metrics will only be relevant within the context of a single protocol and are not comparable to other methods or even other sequencing runs using similar methods, unless properly rarified. They can, however, be very valuable for within-study comparisons, for example, to map spatial, temporal, or seasonal patterns in species diversity.

Quantification and community analysis

Quantitative interpretation of metabarcoding data remains a subject of ongoing debate, with the most common concerns being difficulty of developing truly universal primers, differences in gene copy numbers between species, and the exponential nature of PCR (Bucklin et al. 2016; Santoferrara 2019; van der Loos and Nijland 2021). New, non-PCR-based molecular methods of community analysis, such as detecting specific marker genes using shot-gun sequencing, may offer a promising alternative to metabarcoding as these technologies continue to develop (Pierella Karlusich et al. 2023). Yet, despite the known issues of PCR bias, in recent years, there has been increasing evidence of quantitative or semi-quantitative potential of metabarcoding in a wide range of metazoan communities (Ershova et al. 2021; Krehenwinkel et al. 2017; Lamb et al. 2019; McLaren et al. 2019; Schenk et al. 2019; Thomas et al. 2016). One of the challenges of quantification is identifying an appropriate metric for comparing to sequence counts. With single cellular organisms, cell counts are typically used, but even then, read copy numbers can vary vastly between species (Wang et al. 2017). In metazoan communities, counts or abundance estimates are additionally confounded by the fact that different species are vastly different in size. Mesozooplankton, for example, even when captured by the same sampling gear, will vary from ca. 200 µm to several centimeters in length, which results in several orders of magnitude difference in weight. Nonetheless, we found a surprisingly high correlation of relative abundances to relative sequence reads within our dataset, particularly for some taxa. However, biomass or carbon weight was found to be the better quantitative proxy. This is not surprising, as a large organism will expectedly contain more DNA than a small one. Although species abundance is a much commonly used metric than species biomass when describing zooplankton communities, this is more a methodological artifact due to the fact that species counts are traditionally much easier to obtain than species weights and does not necessarily reflect the fact that abundance is the more ecologically relevant measure. Indeed, in terms of grazing pressure, carbon and nutrient cycling, and availability to higher trophic levels, species-specific biomass may be a much more important parameter to quantify.

Several previous studies have reported moderate to good correlations of relative read numbers to relative biomass in zooplankton for several genetic markers, including COI (Elbrecht and Leese 2015; Ershova et al. 2021; Matthews et al. 2021; Yang et al. 2017). These correlations were far better for some taxa than for others, and even the strong relationships were frequently very different from a 1: 1 ratio, which makes it challenging to infer one value from the other. We similarly found these correlations to be significant for almost all taxa for which sufficient observations were present, and likewise, observed stronger correlations for some groups than for others. We go a step further, however, and introduce a new quantitative metric called BWSR (biomass-weighed sequence reads). Unlike relative abundance, relative biomass can be much more easily transformed into a quantitative parameter by taking a single bulk biomass measurement, something that is already usually done as part of routine zooplankton analyses. This number is then multiplied by the proportion of reads belonging to a species or taxa. Furthermore, by applying species- or taxa-specific conversion factors that bring the relationships between read counts and biomass closer to unity, we can additionally reduce some of the PCR-based bias and increase comparability between these two measures. Conversion factors have been used in previous studies to improve quantification of microbial communities (McLaren et al. 2019), diatoms (Vasselon et al. 2018), fish (Thomas et al. 2016), and arthropods (Krehenwinkel et al. 2017), but to our knowledge, this is the first time this approach has been applied to marine zooplankton communities. While BWSR is closely related to biomass and shares units with it, it should nonetheless not be reported as biomass or be used to compare to other biomass-based studies. Instead, it provides a framework for quantitatively monitoring relative species composition over time and space within a single study (as for example, in Coguiec et al. 2021) and allows for the use of standard community analysis tools such as multivariate statistics. It can further be used to calculate a number of additional indicators, for example, the ratio of mero- to holoplankton, or copepod- to non-copepod taxa, or to estimate various indices of biogeographic affinity. For some taxa with a relatively small range in body weight (for example, small copepods such as Oithona spp.), BWSR values can be also used as a proxy for biomass to estimate species abundance, as is frequently done vice-versa in other studies. All three quantitative measures—abundance, biomass, and BWSR revealed very similar, and when collapsed to a common taxonomic resolution, nearly identical community patterns. The BWSR data, however, was able to provide much more detailed species-level information about which particular species were driving the community trends. For example, cirriped and decapod larvae, which were found to strongly drive the community structure in terms of all three examined parameters, were each represented by 4–5 distinct species which were not identifiable via morphology. These results strongly support the notion that metabarcoding data can be used not only for estimates of biodiversity or mapping of presence/absence of species but also to estimate relative densities of species and to quantitatively describe community structure.

Gaps and errors in reference databases

When matching sequences to reference libraries, such as BOLD (Barcode of Life System, www.boldsystems.org), three main error types are commonly encountered: (a) absence of the species in the database, (b) incorrect identification or annotation of reference vouchers, and (c) insufficient, or excessive, divergence in the genetic marker of interest that will result in either multiple species-level matches, or absence of a species-level match at the assigned threshold (usually 97%). The first error type will produce “unknowns” in the final species list, resulting in a higher proportion of “false negatives,” although frequently these MOTUs can be assigned to higher ranks (i.e., family or order). Within our dataset, only about 65% of MOTU’s were assigned to species, and 15% could not be matched even at a phylum level. The second error type, incorrect identifications, is particularly problematic because it can result in the presence of both false positives and false negatives. With the more common species, it will typically result in several species level matches in the reference database, and the investigator must decide which of them are more reliable. We encountered this type of error with one of the Acartia species, which produced a species level match for both Acartia tonsa and Acartia hudsonica. After manually checking the primary sources and annotations of these vouchers, we concluded A. hudsonica to be the more likely correct identification. On the other hand, misidentification of rare species with only a single barcoded voucher are much more difficult to recognize, and until more barcoding efforts take place, will have to be accepted on faith. Since the morphological identification of the specimen represents the quality control for the DNA barcode, ideally each barcode should be linked to a voucher specimen, and this should become standard practice going forward (Rimet et al. 2021). Multiple species level matches can also be a result of insufficient divergence in the chosen barcode to reliably discriminate between related species. Within our data, such an example was the chaetognath Eukrohnia sp., which matched to both E. hamata and E. bathyantarctica. Based on the known geographical distribution of these species, we concluded that E. hamata was the more likely correct identification. On the contrary, the chaetognath Parasagitta elegans has an exceptionally high divergence in the mitochonodrial genome, and a threshold as low as 90% similarity was used to assign this species (Marlétaz et al. 2017).

For marine metazoan zooplankton, significant efforts have been made in recent decades to create and quality control reliable DNA barcode reference libraries. Building barcode libraries and associated voucher collections have been a primary goal of several individual projects as well as national campaigns (see Bucklin et al. 2021; Weigand et al. 2019 and reference therein). Additional attempts to improve the quality of barcode libraries have also been made by implementing QA measures in reference databases (Fontes et al. 2021). A major effort in curating marine barcode data was undertaken in the Barcode of Life Data Systems (BOLD), including annotation of barcode records according to a ranking system for concordance (Radulovici et al. 2021). Moreover, a new tool for enhanced quality control of DNA barcodes of marine zooplankton (holo- and meroplankton) has been made accessible through the MetaZooGene Barcode Atlas and Database (MZGdb) (Bucklin et al. 2021). Despite the aforementioned and other ongoing efforts, available open-access DNA reference libraries remain incomplete for some taxonomic groups and/or geographical regions (McGee et al. 2019). For example, only 22% of European marine invertebrate species had at least one barcode in BOLD in 2019 (Weigand et al. 2019). Reliable barcode reference libraries are particularly important if metabarcoding is to be implemented in biomonitoring and reports of ecological status (e.g., EU Marine Strategy Framework Directive, MSFD). Thus, there is continued need to build comprehensive and reliable barcode reference databases, both by adding more taxa and by extending the geographical coverage.

Dry oven preservation as an alternative to ethanol

Fixing zooplankton samples with 80–100% ethanol is the most common way of preserving zooplankton for genetic use and is the recommended method in most barcoding or metabarcoding protocols (Rey et al. 2020; van der Loos and Nijland 2021; Wiebe et al. 2017). Other, less common, preservation methods of zooplankton include freezing of whole samples or preserving in other fixatives, such as DESS or RNAlater (Creer et al. 2016; van der Loos and Nijland 2021). We suggest another alternative for zooplankton DNA preservation, which is dehydrating whole samples in a drying oven at 65 °C for 12–24 h. In our study, we expected absolute ethanol to be the superior preservation method, but found no differences between ethanol and dry samples in recovered DNA quality or quantity, sequencing depth, or overall recovered biodiversity. There were also no differences in relative taxonomic composition, with read abundance in both preservation methods equally well correlated to relative biomass for all taxa. Although ethanol preservation remains a “non-destructive” method, allowing for subsequent sub-sampling or screening for taxa/species of interest, it comes with a set of drawbacks. These include a higher reagent cost, additional HSE risks (fumes and fire hazard), logistical transport limitations, sample storage space, and longer/more complicated processing in the lab, since most DNA extraction methods are sensitive to ethanol and require its removal. These challenges contribute to the reluctance of many larger-scale zooplankton monitoring efforts to adopt a metabarcoding component. On the other hand, many monitoring programs, like the one implemented at IMR, already collect dry samples for zooplankton biomass estimation. Simple additional steps can be taken to ensure the suitability of such samples for genetic use. This includes proper cleaning of sampling gear and lab tools between samples to prevent cross-sample contamination, prompt removal of samples from the drying oven once they are dry, and proper long-term storage. With this approach, it is recommended to keep the overall sample volume low (< 2 ml volume of wet animals with water removed) to ensure rapid and even drying. For larger samples, sub-sampling or splitting the sample onto several drying trays is recommended. The dried samples can be transported at room temperature but should be stored at − 20 °C for long-term storage. As an added benefit, measuring the dry weight of the sample prior to sequencing creates a simple path forward to convert relative sequence reads to BSWR, as described in this work.

Conclusion

In this work, we present a framework for recovering biodiversity and quantitative estimation of biomass of zooplankton species from COI metabarcoding data, which we think is suitable for cost-effective large-scale monitoring. We successfully tested the suitability and accuracy of our methods using mock zooplankton samples from the North Sea. Although some biases remain, we conclude that the methods based on COI metabarcoding have currently reached a satisfactory degree of technology readiness, and their results are comparable and highly complementary to morphology-based zooplankton analysis methods. However, similar studies should be performed in other geographical regions, for which reference barcode databases might be not so complete as for European waters. Although our results highlight the strengths of metabarcoding, they also underline the value of an integrated taxonomic approach by applying combined morphological and molecular analyses, which has also been recommended by previous studies (Di Capua et al. 2021; Matthews et al. 2021; Pierella Karlusich et al. 2022).