Introduction

DNA metabarcoding is being increasingly integrated into dietary studies (e.g. van der Reis et al. 2020; Homma et al. 2022), as it can identify multiple species from mixed complex samples, irrespective of the morphological state of ingesta. Although this molecular technique has enhanced our capability to refine the taxonomic resolution of previously indistinguishable organic material, it is not without its limitations. DNA metabarcoding typically targets a conserved gene region in the DNA that also has adequate sequence variation to allow discrimination between closely related taxa, e.g. COI (Leray et al. 2013; Lobo et al. 2013), 16S rRNA (Herlemann et al. 2011; Klindworth et al. 2013) and 18S rRNA (Stoeck et al. 2010; Zhan et al. 2013). Arguably, these data provide qualitative diet analysis (i.e. which dietary taxa are eaten) but not quantitative diet analysis (i.e. the relative mass or volumetric proportions of different foods; e.g. Deagle et al. 2019; Lamb et al. 2019; Piñol et al. 2019). Other important limitations of this molecular technique include DNA extraction, primer choice suitability and completeness of DNA reference databases. Commercially available DNA extraction kits have typically been optimised for the taxa of interest and/or origin of collection for optimal DNA yield and biodiversity richness detection (Carrigg et al. 2007; Deiner et al. 2015; Pearman et al. 2020). More obvious limitations are associated with primers, as no one primer set is truly universal. In silico analyses of primers can indicate which groups of sample taxa may fail amplification (Parada et al. 2016; Tournayre et al. 2020), although these rely on potential prey taxa being represented in sequence databases. Database completeness also affects taxonomic assignment of sequences and can be a hindrance during data interrogation. Thus, species-level identification will not always be possible (Ficetola et al. 2010; Hestetun et al. 2020), but bioinformatics tools can be used to assign at a higher level (e.g. genus or family; e.g. RDP classifier–Wang et al. 2007).

The most pressing issue for DNA metabarcoding is likely to be completeness of reference databases, as reviewed in Keck et al. (2023). Database completeness influences primer design and choice, and subsequent taxonomic assignment of amplified sequences. Historically, the development for DNA reference databases was driven by specific informative gene regions (e.g. COI) that effectively represented a barcode for species identification, and the need to develop primers for those gene regions to amplify a range of taxa (Folmer et al. 1994; Hebert et al. 2003; Meyer 2003; Ratnasingham and Hebert 2007). Given the previous financial constraints of molecular work (Wetterstrand 2023), it made sense for the science community to target these gene regions in subsequent studies to build suitable reference databases. However, methodological issues can arise where other gene regions are more reliable for the amplification of certain taxa (e.g. animals versus plants; Newmaster et al. 2006). Thus, taxa reference sequences, compared across gene regions, are disproportionate within databases (e.g. COI versus Cytb; Ficetola et al. 2010). It is, therefore, critical to target more than one gene region for any DNA metabarcoding study to broaden taxonomic identification. In addition to this, reference databases are still largely incomplete, lacking critical metadata and species representativeness (McGee et al. 2019; Keck et al. 2023). Curated reference databases encourage higher quality data input, but infrequent updates impede data analyses and create new issues (e.g. outdated taxonomic assignments). Current sequencing technologies have helped to facilitate species representation within reference databases by decreasing sequencing costs (van der Reis et al. 2023a; Wetterstrand 2023), enabling more gene regions and/or allowing for longer sections of the DNA (or genomes) to be sequenced, permitting more gene regions to be incorporated in taxonomic identification. 
However, in studies where samples are likely to be degraded (e.g. dietary studies), it is still optimal to amplify shorter gene regions for taxonomic identification.

Together, these limitations create successive biases that not only influence identification but also quantification of prey taxa, as more intricate complexities associated with the DNA itself exist, such as starting biomass, primer bias, the DNA degradation often associated with dietary work, and variations among taxa in gene copy number (Braukmann et al. 2019; Gong and Marchetti 2019; Martin et al. 2022). DNA metabarcoding is considered to produce relatively robust qualitative results (i.e. frequency of occurrence—FOO) but is generally considered to produce less reliable results for quantification, generating semi-quantitative results (i.e. relative read abundance—RRA; e.g. Deagle et al. 2019; Lamb et al. 2019; Piñol et al. 2019). These limitations are further exacerbated in dietary studies where the biomass, and thus the quantity of DNA, of non-target items may be higher than that of the actual target items (i.e. peanut butter versus crackers concept; Smoot and Findlay 2010; Clements et al. 2017), as in many grazing fishes. Rates of digestion also differ among taxa (e.g. gelatinous versus invertebrate species) and will influence DNA quality and therefore subsequent DNA amplification.

Here, we examine some of the influential factors impacting DNA metabarcoding (including primer choice–in silico and in vitro, and database deficiencies) with a focus on the diet of the coral reef parrotfish Scarus rivulatus (Labridae). Parrotfishes are key agents of bioerosion and sediment dynamics in coral reef ecosystems (Bonaldo et al. 2014). They grind ingested material with their pharyngeal mill, which disrupts the cells of dietary items making cell contents available for digestion (Lobel 1981; Carr et al. 2006; Clements and Choat 2018). However, the exact predator–prey dynamics between these herbivorous fishes and their dietary targets remain unresolved. Parrotfishes have been categorised as generalist herbivores, consuming the epilithic algal matrix (EAM), by grazing on or excavating EAM-covered substrata (Bellwood and Choat 1990; Lefèvre and Bellwood 2011; Bonaldo et al. 2014; Steneck et al. 2017; Bellwood et al. 2018; Siqueira et al. 2019). EAM is defined as ‘a conglomeration of short, turf-forming filamentous algae (< 1 cm high), macroalgal spores, microalgae, sediment, detritus and associated fauna' (Wilson 1997; Wilson et al. 2003). This generalist dietary characterisation was mainly based on behavioural observations and gut content analysis, both of which have limitations in resolving parrotfish diets. For example, in behavioural observations, the visible and abundant taxa that are ingested incidentally can mask taxa (especially microscopic taxa) targeted for ingestion, and visual identification of dietary taxa in conventional gut content analyses is limited by pharyngeal trituration of ingested material. Other lines of evidence for parrotfish diets including stable isotope and fatty acid biomarkers, trophic morphology and macronutrient analysis indicate a dietary focus on protein-rich microscopic photoautotrophs, especially diazotrophic cyanobacteria (Clements et al. 2017; Clements and Choat 2018). 
More recent work on feeding substrata supports this hypothesis (Nicholson and Clements 2020, 2022, 2023a; Manning and McCoy 2023), which the present study seeks to test further using DNA diet metabarcoding.

Scarus rivulatus has been classified as a scraping species (Bellwood and Choat 1990) that grazes bioeroded substrata with an epilithic biota dominated by cyanobacterial tufts and diatoms with lower proportions of filamentous red algae (Gigartinales and Ceramiales; Nicholson and Clements 2020, 2021, 2023a). This study aims to characterise the biotic composition of samples of pharyngeal content from S. rivulatus collected on the Great Barrier Reef, Australia (1) by developing a bioinformatically- and laboratory-informed methodology to target the 18S V4 rRNA gene region to ensure maximal representation of phylogenetic diversity and (2) using 16S V3-4 and 18S V4 rRNA gene regions to assess the utility of diet metabarcoding in characterising the diet of this grazing fish species. We hypothesise that (1) the pharyngeal content of S. rivulatus specimens will be dominated by protein-rich microscopic photoautotrophs (Clements et al. 2017) and (2) the cyanobacteria taxa detected will reflect the quantitative results of biotic cover on bite cores extracted from S. rivulatus feeding substrata (Nicholson and Clements 2023a).

Materials and methods

Specimen collection and sample preparation

Scarus rivulatus prefers sheltered reefs, and on the Great Barrier Reef (GBR; Australia), they are most abundant on inner-shelf and mid-shelf reefs in lagoonal, reef flat and back-reef habitats that typically have high levels of sediment (Russ 1984; Gust et al. 2001; Hoey and Bellwood 2008; Gordon et al. 2016; Tebbett et al. 2017). Scarus rivulatus specimens were collected by spear on snorkel from the Lizard Island complex, GBR in December 2016 and December 2017. A total of eight specimens were collected from various locations: two between South and Palfrey Islands (specimens SP1 and SP2; 2016), three at Corner Reef (CR1, CR2, CR3; 2017), two at Martin Reef (MR1 and MR2; 2017), and one at North Direction Island (ND1; 2017; Fig. 1; Online Resource Table S1). The scale these fish feed over is large and the habitats where collection took place were a mixture of patch and fringing reefs. It should be noted that feeding substrata are selected at a much finer scale than that of habitat (e.g. Nicholson and Clements 2022, 2023a). Fish collection was covered by James Cook University Animal Ethics Committee approval A2237.

Fig. 1

Scarus rivulatus sample collection sites in the Lizard Island complex, Australia. Collection locations were South and Palfrey (SP; n = 2; December 2016), Corner Reef (CR; n = 3; December 2017), Martin Reef (MR; n = 2; December 2017) and North Direction (ND; n = 1; December 2017). Each red dot represents a sample

Fish were placed on ice immediately following capture. Upon return to the laboratory at Lizard Island Research Station, the pharyngeal baskets were removed by dissection and pharyngeal content was preserved in 70% ethanol. Some of the pharyngeal content was partially triturated.

DNA extraction

DNeasy Powersoil Kit (Qiagen; discontinued and replaced by DNeasy Powersoil Pro Kit) was used for the extractions, following the manufacturer’s protocol, with the only change to the protocol being a reduction of the vortex duration to 2.5 min for the bead-beating with lysis solution step. Preliminary testing of 10 (protocol standard), 5 and 2 min on additional parrotfish species pharyngeal content subsets indicated that continuous vortexing (using a horizontal vortex adapter) longer than 5 min affected the DNA extraction substantially (no amplification seen for any subsets vortexed for 10 min). Thus, further extractions had a vortex step of 2.5 min. This physical cell-lysis step has been shown to be more efficient for DNA extraction from cyanobacteria (which have a protective matrix) than methods without (e.g. Gaget et al. 2017) and greatly assists the disruption of algal cell walls (e.g. Fawley and Fawley 2004; Jagielski et al. 2017), which was essential to this study. A DNA extraction was also done for the same pharyngeal samples (subsets) without a physical cell-lysis step (Puregene Tissue Kit, Qiagen), and results were consistent with other studies that suggest a physical cell-lysis step is beneficial for extraction and subsequent amplification.

Primer selection and amplification

To ensure that various cyanobacteria species were represented in the data, as well as a greater spectrum of other bacterial species, 16S rRNA primers targeting the V3-4 gene region were selected for amplification (Herlemann et al. 2011; Klindworth et al. 2013). To target eukaryotes, the 18S V4 gene region was selected. Two common universal primer pairs were chosen (Stoeck et al. 2010; Zhan et al. 2013), with a third set (Bradley et al. 2016; Fiore-Donno et al. 2018) selected by using the PR2 primer database (Vaulot et al. 2021; Online Resource Fig. S1). To compare diatom detection when using the 18S universal primers, a primer set designed specifically for diatoms was chosen (Zimmermann et al. 2011). It is well known that diatoms are difficult to extract DNA from (e.g. Manoylov et al. 2016), and thus, this primer set was selected to act as a reference as to what extent the other 18S rRNA V4 universal primer sets would potentially miss these rarer DNA templates. Hereafter, 18S primer sets are referred to by first author (or an abbreviation thereof) from their respective published papers to distinguish them (Table 1).

Table 1 Gene regions targeted and respective primers sets and protocols used. Fiore-Donno forward sequences were combined (K underlined)

While the Stoeck primer set is often used, the forward and reverse primers have substantially different optimal polymerase chain reaction (PCR) annealing temperatures, which hinders their compatibility and potentially limits strong amplification (Stoeck et al. 2010). Although the original PCR profile called for a two-step PCR, recent research suggests that amplification on herbivorous fish dietary content is possible at a single annealing temperature of 55 °C (Lin et al. 2021). On the other hand, the Zhan primers amplify well at a single annealing temperature, but research has identified that they fail to amplify Ochrophyta, which is essential to this study (Briand et al. 2018). Thus, a variety of 18S V4 rRNA primers (Vaulot et al. 2021) was selected from the PR2 primer database and tested in silico to mitigate these primer sets’ caveats, focusing on Ochrophyta sequence matches for possible future use in conjunction with Zhan if needed (Online Resource Table S2). The forward and reverse primers were independently tested using cutadapt v2.10 (Martin 2011), with the curated DNA reference databases SILVA v138.1 (Quast et al. 2013) and PR2 v4.13.0 (Guillou et al. 2013). Cutadapt, although specifically designed for adapter/primer removal from next-generation sequences, offers a quick and easy way not only to test primers but also to customise databases.
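The idea behind in silico primer testing can be illustrated with a minimal Python sketch. This is not the cutadapt workflow used in the study: it only counts reference sequences that contain an exact, degeneracy-aware primer site, whereas cutadapt also permits a configurable error rate. The primer strings and reference sequences below are invented toy data, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Expand a degenerate primer into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    table = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
    return seq.upper().translate(table)[::-1]


def count_matches(primer, references, reverse=False):
    """Count reference sequences that contain the primer binding site.

    Reverse primers are searched as their reverse complement, because
    reference sequences are stored in the forward orientation.
    """
    site = reverse_complement(primer) if reverse else primer
    pattern = primer_to_regex(site)
    return sum(1 for seq in references.values() if pattern.search(seq.upper()))


# Toy references (assumption: invented for illustration only).
references = {
    "ref1": "TTGACCAGCAGCCGCGGTAATTCCAGG",
    "ref2": "TTGACCAGCAACCGCGGTAATTCCAGG",
    "ref3": "GGGGGGGGGG",
}

forward_hits = count_matches("GCASCYGCGG", references)               # toy forward primer
reverse_hits = count_matches("GGAATTACC", references, reverse=True)  # toy reverse primer
print(forward_hits, reverse_hits)
```

Testing the forward and reverse primers separately, as in the text, shows which primer of a pair is responsible when a taxonomic group is missed.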

PCRs were done using MyTaq Red Mix (Bioline; Meridian Bioscience) master mix: 6.25 μl MyTaq Red Mix, 0.5 μl of each primer (10 μM), 4.25 μl UltraPure DNase/RNase-Free Distilled Water (Invitrogen—Thermo Fisher Scientific, Massachusetts, USA), 1 μl DNA and 1 μl BSA (1%; when necessary for optimal DNA amplification). The PCR profile for the Herlemann primers followed Klindworth et al. (2013) (Table 1); however, for 18S, the original PCR profiles were assessed and a single-step PCR profile was developed by testing a range of annealing temperatures. Previous PCR profiles had annealing temperatures ranging from 50 °C to 57 °C; thus, 51 °C, 53 °C, 55 °C and 57 °C were tested. The optimal annealing temperature was found to be 53 °C for all primers (Table 1). If amplification did not occur due to low DNA concentration, the DNA volume was increased (e.g. 2 μl DNA) and the water volume was decreased proportionally (e.g. 3.25 μl water). Negative controls were included in every set of DNA extractions (extraction blank—no tissue added) and every PCR run (PCR blank—no DNA added) to check for possible cross-contamination. Primer pairs had Illumina Nextera library adapters added (Illumina 2013). The PCR products were run on a 1.6% agarose gel and visualised using Gel Red (Biotium), in a Gel Doc XR+ (Bio-Rad).
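The volume adjustment described above amounts to keeping the combined DNA and water volume constant at 5.25 μl. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and treating BSA as a simple optional 1 μl addition is an assumption, since the text does not state how the reaction was balanced when BSA was omitted.

```python
def reaction_volumes(dna_ul=1.0, with_bsa=True):
    """Return per-reaction component volumes in microlitres.

    Assumption: DNA and water always sum to 5.25 ul, matching the two
    combinations given in the text (1 + 4.25 and 2 + 3.25).
    """
    dna_plus_water = 5.25
    if not 0 < dna_ul <= dna_plus_water:
        raise ValueError("DNA volume must be between 0 and 5.25 ul")
    volumes = {
        "mytaq_red_mix": 6.25,
        "forward_primer": 0.5,
        "reverse_primer": 0.5,
        "water": dna_plus_water - dna_ul,
        "dna": dna_ul,
    }
    if with_bsa:
        volumes["bsa"] = 1.0  # assumption: BSA adds 1 ul when used
    return volumes


standard = reaction_volumes()
low_template = reaction_volumes(dna_ul=2.0)
print(standard["water"], low_template["water"], sum(low_template.values()))
```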

PCR clean-up and pooling

PCRs were performed in triplicate and were pooled together, by sample per primer. Agencourt AMPure XP (Beckman Coulter) was used following the Illumina protocol for PCR clean-up (Illumina 2013). The concentration of the purified PCR products was determined using Qubit dsDNA HS Assay Kit (Invitrogen, Thermo Fisher Scientific) following the manufacturer’s instructions. The PCR products were brought to equal molarity, 2 ng μl⁻¹ where possible, and primers were then pooled by sample in two groups due to overlapping primers (group 1: FD-Bra and Zim; group 2: Stoeck and Zhan). Sequencing was done through Auckland Genomics (Auckland, New Zealand), where indexing (using the Nextera DNA Library Prep Kit) and the second round of PCR clean-up occurred before sequencing on an Illumina MiSeq System. The 16S (2 × 300 bp paired-end run) and 18S (2 × 250 bp paired-end run) sequencing was done separately, as the 16S data were utilised from a separate study.

DNA databases and bioinformatics

Raw demultiplexed sequencing data underwent primer removal (cutadapt) and quality filtering (Qiime 2 v2021.2; Bolyen et al. 2019). DADA2 (within Qiime 2; Callahan et al. 2016) was used for sequence truncation (based on visualisation of raw read quality scores) and for filtering to retain only high-quality sequences that passed denoising and merging and did not form chimeras. This produced sequences that were then clustered at 100% identity (known as amplicon sequence variants – ASVs; Callahan et al. 2017). The Naïve Bayes Classifier (within Qiime 2; Wang et al. 2007) was used for assigning taxonomy at a minimum confidence threshold of 97% from the curated databases SILVA v138.1 (16S and 18S), PR2 v4.13.0 (16S—chloroplast source identification) and CyanoSeq v1.1.2 (16S—most up-to-date database for Cyanobacteriota assignment; Lefler et al. 2023).

The plastid 23S V5 gene region was also targeted (Sherwood and Presting 2007) and followed a similar workflow to 16S and 18S. However, taxonomic assignments with SILVA for algae were problematic (1016/1800 ASVs were assigned as ‘Chloroplast’) and assigned cyanobacteria genera were comparable to 16S (Online Resource Table S3 and Fig. S2). Thus, it was decided to focus this study on 16S and 18S only.

Data analyses

RStudio (v1.4.1106; R base v4.1.0) was used for further data filtering and analyses. The data were filtered to remove any cross-contamination using proportional subtraction of DNA and PCR negatives. To further increase confidence in ASVs detected within samples, ASVs were retained only if they had > 5 reads. Any fish ASVs detected were removed as they likely originated from the host (S. rivulatus). A phylogenetic tree was constructed for Cyanobacteriota-assigned ASVs (within Qiime 2) using MAFFT (Katoh et al. 2002) and FastTree (Price et al. 2009). The tree was then annotated in ggtree (v3.9.1; Yu et al. 2017). The 18S primer similarities and differences were investigated. The exact sequence matches among primer datasets were explored using global regular expression print (GNU grep; v2.20; command-line utility). The R package ggplot2 (v3.4.3; Wickham 2016) was primarily used for visualisations. Bacillariophyceae (diatoms) and Phaeophyceae (brown algae) were separated from other Ochrophyta taxa for the purpose of this study in visualisations (Azuma et al. 2022).

Results

Primer performance

In silico analyses identified FD-Bra as a primer set that matched a high number of Ochrophyta sequences within SILVA and PR2, while also having relatively comparable matches to the overall number of sequences in the databases to the other primers (Online Resource Table S2). This pair also had a melting temperature difference < 5 °C, which is within the recommended limit (Ye et al. 2012; see tmcalculator.neb.com) and allows for efficient annealing during a PCR. This difference was better than either of the other universal primers which were both outside the recommended limit. In vitro testing also provided further confidence in this primer set as it had stronger amplification across samples in comparison with the Stoeck primers which typically resulted in visibly lower amplification, across all annealing temperatures tested.

Data filtering

Post DADA2 (Online Resource Table S4), the 16S dataset had 116,778 reads and 2,089 ASVs (Online Resource Table S5), FD-Bra had 113,341 reads and 426 ASVs, Stoeck had 48,564 reads and 277 ASVs, Zhan had 30,547 reads and 293 ASVs and Zim had 50,706 reads and 575 ASVs (Online Resource Table S6). Only one ASV was identified in the 16S negative control (13 reads), and none were identified for 18S. Thus, after proportional subtraction took place, the 16S dataset had 116,674 reads. The datasets were filtered to retain only those ASVs within samples that had > 5 reads, which were taxonomically assigned with ≥ 97% confidence and were not assigned to fish (18S datasets). The 16S dataset had 115,198 reads and 1,686 ASVs after filtering, FD-Bra had 12,863 reads and 283 ASVs, Stoeck had 6,979 reads and 166 ASVs, Zhan had 5,358 reads and 169 ASVs and Zim had 43,754 reads and 438 ASVs.
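The three retention criteria (more than 5 reads within a sample, at least 97% assignment confidence, not assigned to fish) can be sketched in Python as follows. This is an illustrative reconstruction rather than the authors' R code: the data layout, the function name and the use of the string "Actinopterygii" to flag fish lineages are all assumptions, and the proportional subtraction of negative controls is deliberately left out because its exact formula is not given in the text.

```python
MIN_READS = 6           # "> 5 reads" per ASV per sample
MIN_CONFIDENCE = 0.97   # ">= 97% confidence"
FISH_MARKER = "Actinopterygii"  # assumption: how host (fish) lineages are labelled


def filter_asvs(counts, taxonomy):
    """Keep per-sample ASV counts that pass all three criteria.

    counts:   {asv_id: {sample_id: reads}}
    taxonomy: {asv_id: (lineage_string, confidence)}
    """
    kept = {}
    for asv, per_sample in counts.items():
        lineage, confidence = taxonomy[asv]
        if confidence < MIN_CONFIDENCE or FISH_MARKER in lineage:
            continue
        passing = {s: n for s, n in per_sample.items() if n >= MIN_READS}
        if passing:  # drop ASVs with no sample above the read threshold
            kept[asv] = passing
    return kept


# Toy data (assumption: invented for illustration only).
counts = {
    "asv1": {"CR1": 120, "CR2": 4},
    "asv2": {"CR1": 50},
    "asv3": {"MR1": 300},
    "asv4": {"ND1": 5},
}
taxonomy = {
    "asv1": ("Eukaryota;Dinoflagellata", 0.99),
    "asv2": ("Eukaryota;Rhodophyta", 0.80),
    "asv3": ("Eukaryota;Chordata;Actinopterygii", 1.00),
    "asv4": ("Eukaryota;Annelida", 0.98),
}
filtered = filter_asvs(counts, taxonomy)
print(filtered)
```

In the toy data, only `asv1` in sample CR1 survives: `asv2` fails the confidence threshold, `asv3` is a host sequence and `asv4` has exactly 5 reads, which does not exceed the threshold.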

18S V4 primers and taxonomic assignments

The overlap varied between FD-Bra and the other 18S V4 primers. There were 85 FD-Bra ASVs that overlapped with Stoeck (51%), 79 with Zhan (47%) and 92 with Zim (21%). The majority of these ASVs were given the same taxonomic assignment by the RDP classifier. Only four, five and three FD-Bra ASVs had different taxonomic assignments compared to Stoeck, Zhan and Zim, respectively (Table 2). Further investigation into these differences indicated that some of these differences were due to the taxonomic level of assignment and some miscellaneous labelling within the database (e.g. ‘uncultured eukaryote’ entered as species– FD-Bra ID ‘1’; Table 2).
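The text does not state the denominator for these percentages, but they are consistent with dividing the number of shared ASVs by the filtered ASV count of the compared primer set (166 for Stoeck, 169 for Zhan and 438 for Zim, as reported under Data filtering). The short Python check below works through that arithmetic under this assumption; the variable and function names are illustrative.

```python
# Filtered ASV counts per primer set, as reported under "Data filtering".
filtered_asvs = {"Stoeck": 166, "Zhan": 169, "Zim": 438}

# FD-Bra ASVs with an exact sequence match in each other dataset.
shared_with_fd_bra = {"Stoeck": 85, "Zhan": 79, "Zim": 92}


def overlap_percent(shared, total):
    """Shared ASVs as a whole-number percentage of a dataset's ASVs."""
    return round(100 * shared / total)


overlap = {
    primer: overlap_percent(shared_with_fd_bra[primer], filtered_asvs[primer])
    for primer in filtered_asvs
}
print(overlap)
```

Under this reading, the lower value for Zim reflects its larger number of ASVs rather than fewer shared sequences.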

Table 2 Taxonomic assignment mismatches among primer sets for 100% matching sequences. FD-Bra ASVs (FD-Bra ID) and respective taxonomic assignments were compared to taxonomic assignments for matching sequences for universal primers Stoeck and Zhan, and the diatom-specific primer set Zim

The general trend within individuals among the different universal primers (FD-Bra, Stoeck and Zhan) was similar when comparing the phyla captured (Fig. 2; Online Resource Table S7). CR2 had the highest average number of phyla (14), and MR1 had the lowest (2). Noticeable differences based on the RRA values were the smaller proportion of Rhodophyta in FD-Bra; however, the phylum was found to be categorised as ‘Other’ (i.e. RRA < 5%; Fig. 2). Rhodophyta was not present in MR1 (present with Stoeck) and MR2 and CR1 (present with Zhan). Zhan did not capture any ochrophyte taxa (e.g. Bacillariophyceae and Phaeophyceae). Zim indicated that Bacillariophyceae, Chlorophyta and Phaeophyceae were present in all individual samples, but this was not seen with the other universal primers.

Fig. 2

The composition of phyla within Scarus rivulatus samples identified using 18S V4 rRNA gene region and SILVA v138.1. Great Barrier Reef (Australia) collection locations were South and Palfrey (SP; December 2016), Corner Reef (CR; December 2017), Martin Reef (MR; December 2017) and North Direction (ND; December 2017). If the relative read abundance was < 5% for a phylum, it was categorised as ‘other’. The number of different phyla within other can be seen by multiple black borders

Overall, individual S. rivulatus specimens had ingested a variety of taxa (Figs. 2 and 3; Online Resource Fig. S3). Based on the universal primers, Dinoflagellata were present in all individuals and detected by all primers. Dinoflagellata were also dominant based on RRA values, with 75% of individuals having an RRA value of > 15% for each universal primer (FD-Bra, Stoeck and Zhan; Fig. 2; Online Resource Table S8). Specifically, Prorocentrum spp. were detected in all individuals (Fig. 3). Annelida, Arthropoda and Rhodophyta were detected in > 50% of specimens (Rhodophyta were detected in Zim, but < 50%). The number of individuals detected within some phyla was dependent on the primer, such as Porifera (FD-Bra: 75%; Stoeck: 37.5%; Zhan: 12.5%; Zim: 75%), Platyhelminthes (FD-Bra: 50%; Stoeck: 25%; Zhan: 0%; Zim: 0%) and Ciliophora (FD-Bra: 25%; Stoeck: 25%; Zhan: 12.5%; Zim: 62.5%).

Fig. 3

18S V4 rRNA amplicon sequence variants (ASVs) assigned at genus or species level (SILVA v138.1) for algae taxa within Bacillariophyceae, Chlorophyta, Dinoflagellata, other Ochrophyta, Phaeophyceae and Rhodophyta. Great Barrier Reef (Australia) collection locations were South and Palfrey (SP; December 2016), Corner Reef (CR; December 2017), Martin Reef (MR; December 2017) and North Direction (ND; December 2017). The relative read abundance (RRA) indicates the percentage calculated from all taxa identified within each sample per primer. All ambiguous genus and species assignments were removed (e.g. RDP classifier identified an ASV to be a genus within Eustigmatales)

16S V3-V4 taxonomic assignments

The majority of taxa identified from the 16S dataset were bacteria (Fig. 4). The phyla with the greatest relative read abundance across the eight pharyngeal samples were Proteobacteria (> 50% RRA in 75% of samples) and Bacteroidota, respectively.

Fig. 4

The composition of phyla identified within Scarus rivulatus samples using 16S V3-V4 gene region and databases SILVA v138.1, CyanoSeq v1.1.2 (cyanobacteria amplicon sequence variants - ASVs) and PR2 v4.13.0 (chloroplast ASVs). Great Barrier Reef (Australia) collection locations were South and Palfrey (SP; December 2016), Corner Reef (CR; December 2017), Martin Reef (MR; December 2017) and North Direction (ND; December 2017). If the relative read abundance was < 1% for a phylum, it was categorised as ‘other’. The number of different phyla within other can be seen by multiple black borders

There were 156 ASVs (of 1,686 ASVs in total) in the 16S dataset that were assigned to Cyanobacteriota and ‘chloroplast’ when using the SILVA database. For these ASVs, the lineage was identified as chloroplast throughout the taxonomic ranks (order to genus); however, at species level (if this resolution was possible), algal species were assigned. There were 16 species assigned to algae. Using the PR2 database, these ‘chloroplast’ ASVs were able to be assigned using the full algal lineage, and thus, higher level assignments could be made (e.g. to family). Phaeophyceae was detected in all eight samples, with three samples having a Phaeophyceae RRA value < 1%. Bacillariophyta (not detected in SP1) and Rhodophyta (not detected in MR1) were detected in seven samples. Chlorophyta was only detected in two samples, and both had ~ 7% RRA. From the SILVA dataset, 16 chloroplast ASVs could be assigned to species. There were six overlaps at genus or species level with the PR2 database (matching by ASVs), namely Cylindrotheca closterium, Prasinoderma coloniale, Ostreobium sp. (Ostreobium quekettii in SILVA), Erythrotrichia carnea, Porphyridium purpureum and Caulerpella ambigua. Of these six, only three overlapped with the 18S dataset. The remaining 10 typically matched PR2 at various taxonomic levels. For example, Dictyopteris undulata (SILVA; 99.9% confidence) matched Pylaiella littoralis (PR2; 99% confidence), both of which belong to the class Phaeophyceae. PR2 assigned 22 ASVs to species in total; however, some were ambiguous and lacked species-specific details (e.g. Pinguiochrysidaceae sp. and Dictyochophyceae sp.; Fig. 5), and when these were removed, only samples ND1, CR1 and CR2 had ASVs assigned to genus or species level.

Fig. 5

16S V3-V4 rRNA amplicon sequence variants (ASVs) assigned at genus or species level (PR2 v4.13.0) for algae taxa within Bacillariophyceae, Chlorophyta, Dinoflagellata, other Ochrophyta, Phaeophyceae and Rhodophyta. Great Barrier Reef (Australia) collection locations were South and Palfrey (SP; December 2016), Corner Reef (CR; December 2017), Martin Reef (MR; December 2017) and North Direction (ND; December 2017). The relative read abundance (RRA) indicates the percentage calculated from all taxa identified within each sample. All ambiguous genus and species assignments were removed (e.g. RDP classifier identified an ASV to be a species within Pinguiochrysidaceae)

A variety of Cyanobacteriota taxa was detected, including a mixture of filamentous and unicellular taxa (Fig. 6). Capilliphycus was the most frequently detected filamentous cyanobacterial genus among samples (> 50% of samples). Aegeococcus, Parasynechococcus and Xenococcus were the most frequently detected unicellular genera (≥ 50% of samples). Samples MR2 (n = 15) and ND1 (n = 16) had the greatest diversity of cyanobacteria genera. No cyanobacteria were detected in sample CR3.

Fig. 6

A phylogenetic tree of 16S V3-V4 rRNA amplicon sequence variants (ASVs) identified as Cyanobacteriota (CyanoSeq v1.1.2). Branch lengths have been incorporated to show evolutionary relationships and bootstrap values > 0.9 are indicated as red dots. The colours branching from the tree indicate the assigned cyanobacteria order for the ASV, if it could be assigned. The inner ring indicates the number of Scarus rivulatus pharyngeal samples that shared the same ASV, and the outer ring indicates the relative read abundance (RRA) averaged across samples. The RRA indicates the percentage calculated from all taxa identified within each sample. The tree tips are labelled if the RDP classifier assigned at genus level, and further categorised into algae type (i.e. filamentous or unicellular)

Discussion

In any dietary study, there are biases that can impact the interpretation of the results. It is important to identify and address these problems to ensure essential dietary data are accurately represented. Not all taxa identified are nutritionally valuable primary targets, and those that are need to be identified. This process generally should involve several different methodological approaches, i.e. a polyphasic approach. In this study, we aimed to highlight the complexity of using DNA metabarcoding to reveal primary dietary targets and why it is necessary to have a good understanding of the consumer and dietary species before drawing conclusions. We highlight several areas where biases could unknowingly occur and how these could affect the dietary assessment. What was apparent in this study was the difficulty of making straightforward conclusions about the dietary targets of S. rivulatus when considering the variation detected among replicate specimens, and about which dietary targets are likely nutritional targets. We discuss the results from both methodological and dietary viewpoints.

Methodological considerations needed to decipher diet

Using the Zim primer set (diatom-specific) showed that the mechanical lysis in the DNA extraction method was sufficient for penetrating species with fortified cell walls or resilient membranes (e.g. cyanobacteria—polysaccharide sheath and capsule, De Philippis and Vincenzini 1998; dinoflagellates—cellulosic theca, Lau et al. 2007; diatoms—silica frustules, Hamm et al. 2003). Mechanical lysis, such as bead beating, is time-efficient and convenient in DNA extractions, but as identified in this study ‘excessive’ treatment tended to fragment the DNA, thus hindering or completely preventing amplification. Given the wide range of taxa detected in the pharyngeal content, the DNA extractions may be improved by using a two-step lysis approach where an overnight lysis buffer is first applied and thereafter mechanical lysis (see Yuan et al. 2015), but this may not necessarily increase the biodiversity detected when taking into account the DNA extraction and PCR stochasticity (i.e. abundant taxa versus rarer; Alberdi et al. 2018, 2019; Ramírez et al. 2018; Liu et al. 2019).

The PCR results indicated that the Stoeck primers did not achieve optimal amplification, likely due to the difference in the annealing temperatures between forward and reverse primers. While the lower amplification results are not ideal, they did not appear to create a bias in taxa detection. Having specimens amplified with the respective primers diluted to equal molarity is key to a balanced sequencing run and ultimately provides an equal opportunity for taxa detection. Realistically, this is difficult to achieve, but the effort should be made to reduce sequencing biases (i.e. shorter amplicons being sequenced preferentially over longer ones). The read numbers post-DADA2 revealed that equal molarity was not achieved. FD-Bra generated more than double the number of reads achieved with the other 18S primer sets. This difference in initial molarity gave FD-Bra a greater read depth and thus, in theory, favoured the discovery of low template concentration/rarer taxa in the FD-Bra dataset. However, an increase in FD-Bra ASVs that reflected the inclusion of these ‘additional’ taxa was only apparent in specimen CR2 for non-algal taxa (Online Resource Fig. S3). Further investigation validated these findings as this specimen had the highest number of reads in the FD-Bra dataset. Irrespective of this, the results suggest that FD-Bra performed comparably to the other universal primers. Furthermore, while identifying low template concentration/rarer taxa is beneficial in community studies, their presence in dietary studies is typically considered insignificant because of the low FOO and RRA values among replicate samples. These taxa are not necessarily targeted dietary items and are more commonly categorised as having been consumed incidentally or as environmental DNA (free-floating DNA; Sheppard and Harwood 2005; Bowser et al. 2013; Oehm et al. 2017). Rates of digestion and how they may affect results also need to be considered in dietary studies (Berry et al. 2015; Devloo-Delva et al. 2018; de Sousa et al. 2019). Targeting the most proximal source for consumed matter, such as the pharyngeal content used in this study, is optimal for taxa detection with DNA metabarcoding (i.e. it avoids the digestive process). It is imperative that patterns in the FOO among replicate specimens be considered alongside summary RRA values (i.e. mean and median) of taxa when determining the possible importance of dietary items. For example, reporting only a mean can lead to inaccurate conclusions (e.g. see Chlorophyta in Fig. 2 and Online Resource Fig. S4). In addition, there was a noticeable difference in Rhodophyta for specimens SP2, ND1 and CR2, which had greater RRAs for both Stoeck and Zhan when compared to FD-Bra (Fig. 2). This highlights the issue of how to interpret RRA values quantitatively for complex mixtures when the primer used influences both the taxa detected and the variation in RRA (see Deagle and Tollit 2007; Pompanon et al. 2012; Nielsen et al. 2018; Lamb et al. 2019).
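
The contrast between FOO and summary RRA values can be illustrated with a minimal Python sketch. It assumes that FOO is the proportion of specimens in which a taxon has at least one read and that RRA is a taxon's share of the reads within a specimen. The specimen labels, read counts and function names are hypothetical and are not data or code from this study.

```python
from statistics import mean, median

# Hypothetical read counts per specimen and taxon (illustration only, not study data).
reads = {
    "S1": {"Chlorophyta": 900, "Dinoflagellata": 100},
    "S2": {"Chlorophyta": 0, "Dinoflagellata": 500},
    "S3": {"Chlorophyta": 0, "Dinoflagellata": 250},
    "S4": {"Chlorophyta": 0, "Dinoflagellata": 1000},
}


def rra_per_specimen(read_table):
    """Convert read counts to relative read abundance (proportions) within each specimen."""
    out = {}
    for specimen, counts in read_table.items():
        total = sum(counts.values())
        out[specimen] = {t: (c / total if total else 0.0) for t, c in counts.items()}
    return out


def summarise(read_table, taxon):
    """Return FOO plus the mean and median RRA of one taxon across specimens."""
    rra = [props.get(taxon, 0.0) for props in rra_per_specimen(read_table).values()]
    foo = sum(1 for value in rra if value > 0) / len(rra)
    return {"FOO": foo, "mean_RRA": mean(rra), "median_RRA": median(rra)}


summary = summarise(reads, "Chlorophyta")
```

In this made-up table a single specimen drives the mean RRA of Chlorophyta, while the median and the FOO show that the taxon is absent from most specimens, which is why the three values are best read together.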

The dissimilarities among the universal primers are more noticeable when comparing ASV overlap for the 18S universal primer sets; ~50% of Stoeck and Zhan ASVs overlapped with FD-Bra (and with each other). This indicates substantial variation among primers in ASV composition despite these primers being designed to be ‘universal’. This type of issue is likely to be resolved in the future as sequencing costs decrease and sequencing technologies become more capable (e.g. shotgun metagenomic sequencing using Oxford Nanopore Technologies; Duncan et al. 2022; National Human Genome Research Institute 2023; van der Reis et al. 2023a). However, database completeness will still be essential for taxonomic assignment (Keck et al. 2023). Our results suggest that using more than one primer set per gene region targeted is beneficial for PCR-based methods in capturing a wider spectrum of ASVs and thus taxa. This is crucial when the diet of the species in question is not well understood. In silico testing is useful to test primer sets and estimate their ability to capture a broad (or specific) range of dietary taxa, but incomplete databases are still an issue for testing primer selection.
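
As a rough illustration of how an overlap percentage between primer sets might be computed, the Python sketch below matches ASVs by a shared key and reports the fraction of one set that is also found in another. The matching key (here a taxonomic label), the labels themselves and the function name `shared_fraction` are assumptions made for the example; this section does not state how overlap was computed.

```python
def shared_fraction(query, reference):
    """Fraction of distinct labels in `query` that also occur in `reference`."""
    query, reference = set(query), set(reference)
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Hypothetical labels per primer set (illustration only, not study data).
stoeck_labels = {"Prorocentrum", "Sphacelaria", "Padina", "Symbiodinium"}
fd_bra_labels = {"Prorocentrum", "Sphacelaria", "Ulva"}

overlap_stoeck_in_fd_bra = shared_fraction(stoeck_labels, fd_bra_labels)
overlap_fd_bra_in_stoeck = shared_fraction(fd_bra_labels, stoeck_labels)
```

The measure is asymmetric: the share of one primer set's labels recovered by another depends on which set is used as the denominator, so both directions are worth reporting.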

The ability to assign taxonomy to ASVs at a low level (e.g. genus or species) is directly influenced by database completeness and can limit studies (Devloo-Delva et al. 2018; Keck et al. 2023; van der Reis et al. 2023a). With the confidence parameter for assignment set at 97%, ASV assignments at genus-level ranged from 30 to 40% for the 18S primer sets and 40% for the 16S primer set. This limits the identification of dietary items at a resolution that aids further investigation, as typically at a higher level (e.g. family), there is not enough information to draw links to the ‘why’ of an item being targeted for consumption (i.e. data interpretation). A solution is to use another gene region to broaden the taxonomic scope by accessing non-overlapping entries for different taxa (e.g. van der Reis et al. 2023b). For example, the COI barcode marker is often utilised in studies for species-level taxonomic assignments; however, algae have proven difficult to amplify (Kezlya et al. 2023). Other options that suit sequencing regarding amplicon size are the ITS (White et al. 1990) and 23S (Sherwood and Presting 2007) gene regions. As briefly mentioned, 23S was also targeted and initially investigated in this study, but the incompleteness of curated DNA reference databases limited comparing the resulting data across gene regions (e.g. cyanobacteria; Online Resource Fig. S2). This was somewhat mitigated in our 16S dataset by using the most up-to-date databases, e.g. CyanoSeq for cyanobacteria and PR2 for algae. Incorporating multiple databases is not ideal, but it allowed us to identify algae using chloroplast sequences originally assigned to cyanobacteria (cyanobacteria do not contain chloroplasts; Raven and Allen 2003; Hanshew et al. 2013; Sato 2021). There were few algal species assignments and little overlap between databases for the 16S ASVs (i.e. SILVA versus PR2), and of those that did overlap only half were identified in the 18S datasets. Not only does this highlight the disparity among databases and their respective gene region entries, but it also indicates the need for caution when investigating results at a species, and even genus, level.
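
The proportion of ASVs resolved to genus under a confidence threshold can be sketched as follows. The sketch assumes a classifier that reports one confidence value per rank for each ASV (as RDP-style classifiers do) and treats an ASV as resolved to a rank only if that rank and every rank above it meet the threshold. The rank list, the function names and the example values are hypothetical.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(confidences, threshold=0.97):
    """Return the lowest rank for which it and all higher ranks meet the threshold."""
    deepest = None
    for rank in RANKS:
        if confidences.get(rank, 0.0) >= threshold:
            deepest = rank
        else:
            break  # stop at the first rank that fails the threshold
    return deepest


def fraction_resolved(asvs, rank="genus", threshold=0.97):
    """Fraction of ASVs resolved to `rank` or a lower rank."""
    if not asvs:
        return 0.0
    target = RANKS.index(rank)
    hits = 0
    for confidences in asvs:
        found = deepest_rank(confidences, threshold)
        if found is not None and RANKS.index(found) >= target:
            hits += 1
    return hits / len(asvs)


# Hypothetical per-rank confidences for five ASVs (illustration only).
asvs = [
    {"kingdom": 1.0, "phylum": 1.0, "class": 0.99, "order": 0.99,
     "family": 0.98, "genus": 0.97, "species": 0.60},
    {"kingdom": 1.0, "phylum": 0.99, "class": 0.98, "order": 0.97,
     "family": 0.90, "genus": 0.99},
    {"kingdom": 1.0, "phylum": 1.0, "class": 1.0, "order": 1.0,
     "family": 1.0, "genus": 1.0, "species": 0.98},
    {"kingdom": 0.99, "phylum": 0.80},
    {"kingdom": 0.50},
]

genus_fraction = fraction_resolved(asvs, rank="genus", threshold=0.97)
```

Lowering the threshold would raise the fraction resolved to genus at the cost of less reliable assignments, which is the trade-off behind the caution urged above.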

An indication that protein-rich resources are nutritionally important for parrotfish

Scarus rivulatus is one of the most abundant parrotfishes on the GBR, Australia (Russ 1984; Fox and Bellwood 2007; Hoey and Bellwood 2008; Choat et al. 2012) and has a preference for fine-grained reefal sediments with high organic loads (Gordon et al. 2016), with minimal grazing on live coral (Bellwood and Choat 1990). We hypothesised that (1) the pharyngeal content of S. rivulatus specimens would be dominated by protein-rich microscopic photoautotrophs (Clements et al. 2017) and (2) that the cyanobacteria detected would reflect the general quantitative results from bite cores extracted from S. rivulatus feeding substrata (Nicholson and Clements 2023a). Overall, our results show that protein-rich epilithic and endolithic microscopic photoautotrophs were consistently present in pharyngeal contents. Dinoflagellates were readily detected in the pharyngeal content, among and within samples (FOO and RRA; Fig. 2), but the diets detected are complex mixtures that include a variety of food sources (e.g. Annelida and Arthropoda). The low FOO and RRA of cyanobacteria detected (Fig. 4) in the pharyngeal content among samples somewhat reflect the quantitative pattern found on the bite cores of S. rivulatus, which displayed relatively low cyanobacterial filament density compared to other syntopic parrotfish species at Lizard Island (Nicholson and Clements 2023a).

The dinoflagellates Prorocentrum spp. (18S data; microscopic algae) were present in all samples (FOO) and generally had a high relative abundance within samples (RRA). A prevalence of Prorocentrum 18S sequences was also reported in recent research on the gut content of Scarus globiceps and Scarus schlegeli (Lin et al. 2023). In contrast, macroalgae are typically absent or in low abundance on parrotfish bite cores (e.g. Padina spp.; Nicholson and Clements 2020), which is likely the reason that no single genus or species of macroalgae was consistently detected in our 18S data at high relative abundance. Microalgae typically have higher protein content than macroalgae (Bleakley and Hayes 2017; Clements and Choat 2018; Lim et al. 2018; Geada et al. 2021; Sheppard et al. 2023) and are therefore likely to support the protein demand required for the high growth rates seen in parrotfish (Bowen et al. 1995; Choat et al. 2002; Taylor and Choat 2014; Lin et al. 2023).

In the diatom-focused Zim primer dataset, RRA and FOO were in general higher for the epilithic and epiphytic filamentous phaeophyte Sphacelaria sp. than for diatoms (Figs. 2 and 3). The inconsistent prevalence of diatoms has also been noted in other parrotfish dietary studies (Cnudde et al. 2015; Clements and Choat 2018; Lin et al. 2023). Sphacelaria sp. was found in all our samples, although in six specimens, it was only detected when using the universal primers. This association between S. rivulatus and Sphacelaria, which is relatively rich in protein, was also found in other parrotfish studies (Lin et al. 2023; Nicholson and Clements 2023a,b). Tropical filamentous Sphacelaria species grow on rocks and dead coral and are also epiphytic on algae (e.g. Padina and Sargassum) and on seagrasses (Van Elven et al. 2004; Titlyanov et al. 2017). Our 18S sequence data closely matched Sphacelaria sp. UTEX LB 800, a sequence from a cultured strain most closely affiliated with Sphacelaria rigidula (Tsiamis et al. 2017). This filamentous alga is tiny, with microscopic filaments (Keum et al. 2005). Dinoflagellates, especially Prorocentrum species, occur in high numbers on phaeophytes and filamentous algae (Kohler and Kohler 1992; Delgado et al. 2006). Perhaps Sphacelaria helps create an ideal habitat (likely spatially and temporally dependent) for Prorocentrum and other protein-rich microscopic photoautotrophs, and thus is colonised more heavily, making this tiny alga even more attractive to S. rivulatus and other herbivores (Fricke et al. 2011; Hensley et al. 2013; Stanca and Parsons 2021; Nieder et al. 2022; Nicholson and Clements 2023a,b). We note that Kohler and Kohler (1992) pointed out that filamentous algae and associated epiphytic dinoflagellates were targeted by coral reef fish (including parrotfish).

Cyanobacteria often co-occur with dinoflagellates on benthic mats (Biessy et al. 2021) and are considered to be one of the main protein-rich dietary targets of parrotfish in recent studies (Clements et al. 2017; Nicholson and Clements 2023a,b). However, the relatively low FOO and RRA for cyanobacteria (genus-level) among our S. rivulatus pharyngeal samples indicate this sediment-tolerant parrotfish species (i.e. not deterred by fine-grained sediments; Gordon et al. 2016) may rely more on other microscopic protein-rich taxa, such as dinoflagellates. Scarus rivulatus bite cores (sampled from the Lizard Island complex) had lower densities of filamentous cyanobacteria compared to most other parrotfish species, although not significantly lower, and some genera were detected on more than 90% of the respective bite cores sampled (i.e. Lyngbya-morphotypes and Calothrix/Rivularia; Nicholson and Clements 2023a). Lyngbya-morphotypes, Capilliphycus and Okeania (Nuryadi et al. 2020) were detected in six of the eight pharyngeal content samples (FOO 75%). Calothrix and Rivularia were not detected in the pharyngeal content, but Calothrix is not monophyletic and includes Nunduva and Kyrtuthrix which were detected in 50% of our samples (Gonzalez et al. 2018; Johansen et al. 2021). The continuous taxonomic revision of cyanobacteria, due to widespread polyphyly, is problematic for directly comparing taxa detected among studies. Here molecular work offers an advantage: data can be compiled across studies and run together through the most recent database release to compare ASVs. Regardless, the need to resolve the relative abundance of cyanobacteria in comparison with dinoflagellates, diatoms and other microscopic algae indicates that microscopic work should be incorporated in future studies, especially since cyanobacterial 16S sequences cannot be quantitatively compared to 18S sequences from eukaryotic microalgae and macroalgae.

The complex dietary mixture identified from S. rivulatus pharyngeal content emphasises the difficulty of deriving the primary dietary targets of grazing fish using DNA metabarcoding alone. Ultimately, the feeding substrata for this parrotfish species are spatially complex microhabitats with early successional flora associated with invertebrates such as Annelida and Arthropoda (Fig. 2). The relative stochasticity of biotic composition at the small spatial scales at which feeding substrata are selected by grazing fish such as S. rivulatus would also influence the variation in DNA metabarcoding data. Invertebrates have also been identified in other parrotfish DNA metabarcoding diet studies (Lin et al. 2023). The presence of these taxa contributes to protein intake, but it is unlikely they are a primary nutritional target given that their relative contribution towards overall protein intake is likely negligible (Kramer et al. 2013). The nutritional contribution made by these minor dietary components requires further investigation using other methods such as fatty acid analysis and compound-specific stable isotope analysis. In addition, the ingested bacterial community characterised using 16S can provide insight into potential dietary variation, although it is more valuable for inter-specific diet comparisons than within a single species.

Fatty acids allow predator–prey relationships to be traced because the majority of fatty acids are acquired through the diet (Bergé and Barnathan 2005; Kelly and Scheibling 2012). Specific fatty acids are known as biomarkers as they are unique to their respective origin and thus can be traced through the successive consumers, helping identify trophic links in a food web (Galloway and Budge 2020). For example, C22:6n-3 and C20:5n-3 are polyunsaturated fatty acids (PUFAs) that are often used as biomarkers for dinoflagellates and diatoms, respectively (Pond et al. 2005; Kelly and Scheibling 2012). C22:6n-3 and C20:5n-3 PUFAs are present in substantially higher levels in dinoflagellates than cyanobacteria, whereas cyanobacteria are higher in C18:3n-3 (Strandberg et al. 2015; Zea-Obando et al. 2017; Jónasdóttir 2019; Taipale et al. 2020). Furthermore, some biomarkers (e.g. C20:5n-3) need to be taken into consideration with others for a holistic understanding of the various dietary signals (Bergé and Barnathan 2005; Kelly and Scheibling 2012; Cnudde et al. 2015). A study on five coral reef fishes (Thalassoma lunare, Lutjanus lutjanus, Abudefduf bengalensis, S. rivulatus and Scolopsis affinis) indicated elevated levels of PUFAs (C18:3n-3, C18:3n-6, C20:3n-3, C20:5n-3 and C22:6n-3) in S. rivulatus (Arai et al. 2015a), and these values were also relatively consistent among other parrotfish species (Arai et al. 2015b). While a larger array of fatty acids would need to be tested (alongside associated feeding substrata) to allow more robust interpretations of the complex diet, the avoidance of macroalgae (Bonaldo and Bellwood 2008) suggests that more protein-rich cyanobacteria and dinoflagellates (and other microalgae) are probable sources for some of these biomarker fatty acids (Clements et al. 2017; Clements and Choat 2018; Sheppard et al. 2023). Biochemical analyses (i.e. fatty acid and stable isotope analyses) and morphological content analyses remain fundamental components of dietary analyses that should be paired with DNA metabarcoding to quantify diet composition in grazing fish species.

Conclusion

EAM is a term used in the coral reef herbivory literature that fails to capture the level of detail needed to resolve the diet of grazing species, especially when evidence points to fine-scale resource partitioning among grazing fish species (e.g. Choat et al. 2002; Crossman et al. 2005; Clements et al. 2017; Nicholson and Clements 2021, 2023a, 2023b; Tebbett et al. 2022; Lin et al. 2023). In this dietary study on S. rivulatus, other lines of evidence indicate a protein- and lipid-rich diet with dinoflagellates appearing to be an important source of nutrients, especially PUFAs. The present study supports previous work on this species (Nicholson and Clements 2022, 2023a) suggesting that microscopic photoautotrophs are a consistently important dietary component. However, filamentous cyanobacteria were less well represented in the sequence data than indicated by microhistology of feeding substrata, as reported in previous studies (Nicholson and Clements 2023a). A likely cause is the thick exopolysaccharide sheaths of dietary Nostocales taxa interfering with DNA extraction, as found in previous studies (Sihvonen et al. 2007; Urrejola et al. 2019). Gene copy number is one of the prevalent factors in DNA metabarcoding that hinders quantitative interpretation of RRA if correction factors are not considered (Martin et al. 2022). However, unless the exact (relatively simple) diet is known a priori, it would be an immense undertaking to investigate gene copy number for multiple unknown dietary taxa, and even then the nutritional value to the consumer remains ‘unknown’. Thus, until DNA metabarcoding reaches a state of maturity where there is complete quantitative taxonomic coverage, additional support is required from other dietary analysis methods to provide robust interpretations and conclusions. At present, the use of diet metabarcoding for grazing fishes is probably of most value qualitatively for determining potential trophic partitioning among species assemblages, provided appropriate care is taken with taxonomic assignments.
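
To make the gene copy number issue concrete, the Python sketch below applies one simple form of correction: each taxon's reads are divided by an assumed copy number and the result is renormalised. This is an illustrative assumption rather than the correction used in any cited study, and the taxon names, read counts and copy numbers are hypothetical.

```python
def copy_corrected_proportions(reads, copies):
    """Divide each taxon's reads by its assumed gene copy number, then renormalise to sum to 1."""
    adjusted = {taxon: reads[taxon] / copies[taxon] for taxon in reads}
    total = sum(adjusted.values())
    if total == 0:
        return {taxon: 0.0 for taxon in reads}
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical values (illustration only): taxon_A carries eight gene copies, taxon_B one.
reads = {"taxon_A": 800, "taxon_B": 200}
copies = {"taxon_A": 8, "taxon_B": 1}

raw_rra = {taxon: count / sum(reads.values()) for taxon, count in reads.items()}
corrected = copy_corrected_proportions(reads, copies)
```

In this made-up case the taxon that dominates the raw RRA becomes the minor component after correction, which shows how strongly uncorrected RRA can mislead when copy numbers differ among taxa.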