Background & Summary

Diarrhetic Shellfish Poisoning (DSP) is a human illness caused by consumption of shellfish contaminated with okadaic acid and/or dinophysistoxins. The organisms responsible for producing these toxins include species in the marine dinoflagellate genus Dinophysis. Although a total of 137 Dinophysis species are taxonomically accepted, only 10 are known to produce DSP when humans consume filter-feeding shellfish that have concentrated these species1,2. An unusual feature of Dinophysis is that they are mixotrophic—that is, they rely on both photosynthesis and prey capture. They accomplish this by feeding on and stealing the chloroplasts from the ciliate, Mesodinium rubrum, which in turn has fed on and captured the chloroplasts of its prey, the cryptophyte Teleaulax amphioxeia. Many single-celled plankton are now recognized as mixotrophs3.

Until recently, DSP-related shellfish closures were reported primarily in Asian and European waters. The first incidence of Dinophysis occurrence at bloom levels in US was reported in 2008 for the Texas coast and lead to the closure of shellfish harvesting4,5. In the past decade, Dinophysis blooms have increased in frequency nationwide, so all coasts in the US now face closures of shellfish industries, but each event is linked to a different Dinophysis species. In the Gulf of Mexico, DSP and shellfish closures have been attributed to D. ovum4. Shellfish harvesting closures have been linked to blooms of D. acuminata and D. fortii in Puget Sound, WA6, to D. acuminata in Massachusetts7, and to D. norvegica in Maine8. Multiple species of toxigenic Dinophysis are present in the Chesapeake Bay9. Because of the morphological and genetic similarity of D. acuminata and D. ovum, counts of these two–along with D. sacculus and D. acuta—are often lumped together as “D. cf. acuminata-complex” in monitoring programs utilizing light microscopy9. Recent studies, however, have shown that D. acuminata and D. ovum have unique toxin profiles10. The diversity of Dinophysis species and toxigenicity in different regions of the US suggests that effective management will require examination of the environmental factors that influence their growth.

The focus of this study was to develop reference transcriptomes for each component of this unique “food chain” (Fig. 1a). Although results for members of the Dinophysis food chain have been reported previously11,12,13, our focus was on two new isolates of Dinophysis (D. acuminata from the Chesapeake Bay, D. ovum from the Gulf of Mexico) and additional strains of Mesodinium rubrum and Teleaulax amphioxeia (Table 1). The use of multiple strains of a single harmful algal species has been recommended to address the physiological variability within a species14. Using the bioinformatics tools illustrated in Fig. 2, a total of 112,955 transcripts were identified for D. acuminata, 198,405 for D. ovum, 64,115 for M. rubrum-DK2009, 75,531 for M. rubrum-JAMR, and 154,041 for T. amphioxeia (Tables 2 and 3). The different sequencing depth between D. acuminata and D. ovum may explain the larger number of transcripts discovered for D. ovum. A reciprocal BLAST between the two Dinophysis species and clustering at 95% similarity yielded a total of 85,968 shared transcripts (Fig. 1b). The number of transcripts shared between the prey item M. rubrum-DK2009 and D. acuminata was 350 compared to 6,759 with D. ovum (Fig. 1b). These low numbers were expected because cultures of Dinophysis were extracted for analysis after all prey were depleted. Additionally, the number of transcripts shared between M. rubrum-JAMR and D. acuminata was 5,221 compared to 7,503 with D. ovum. A total of 54,540 transcripts were shared between M. rubrum-DK2009 and its prey, T. amphioxeia (Fig. 1b), and 49,297 between M. rubrum-JAMR and T. amphioxeia. The number of shared transcripts between the two M. rubrum strains DK2009 and JAMR was 43,115.

Fig. 1
figure 1

(a) The food chain supporting the mixotrophic dinoflagellate Dinophysis, includes Mesodinium rubrum and Teleaulax amphioxeia39. Images from the Imaging FlowCytobot in the Gulf of Mexico, Texas coast4. Scale bar = 10 μm. (b) Venn diagrams showing the unique transcripts for each organism, with the shared transcripts shown in the overlapping areas. Note that the larger number of transcripts discovered for D. ovum was due to the higher sequencing depth, so the number of shared transcripts between D. ovum and M. rubrum also was higher.

Table 1 Identification and isolation information for the Dinophysis, Mesodinium, and Teleaulax strains used in this study. All were grown at 18 °C, L1 medium15 at salinity of 22 ppt, and 100 µmol quant m−2 s−1. Raw read data are deposited in the NCBI BioProject PRJNA880267, Sequence Read Archive (SRA)23,24,25,26,27 and the Transcriptome Shotgun Assembly (TSA) at DDBJ/ENA/GenBank28,29,30,31,32. Annotated transcript datasets are deposited in Zenodo33,34,35,36,37.
Fig. 2
figure 2

The bioinformatics tools used for assembly of the non-model organisms Dinophysis, Mesodinium, and Teleaulax. Quality trimming and filtering were accomplished with BBmap (https://sourceforge.net/projects/bbmap/) and SortMeRNA16, followed by normalization with the BBnorm function and interleaving the forward reads (fwd) and reverse reads (rev) using the BBrepair function in the BBMap package. Assemblies were generated with Trinity17 and Velvet-Oases18,19 and merged with cd-hit-est at 98%20. Open reading frames of coding regions were identified using TransDecoder (https://github.com/TransDecoder/TransDecoder) and functional annotation of the resulting transcripts was performed using BLAST21 against the NCBI NR database and predicted pathways were identified using InterProScan22.

Table 2 Summary of RNA-seq results and number of reads after quality trimming, after removal of non-mRNA, and the final sequence reads used for assembly after normalization.
Table 3 Properties of the transcriptome assemblies.

The assembled de novo transcriptomes for D. acuminata and D. ovum will serve as a reference for future experiments to assess the effect of different abiotic and biotic conditions and will also provide a useful resource for screening potential genes of interest to differentiate among the closely related species within the D. cf. acuminata-complex. The generated de novo transcriptomes for this collection of mixotrophic organisms will be a valuable resource for further downstream bioinformatics applications, including validation of gene expression, quantitative RNA-Seq analysis and comparative transcriptomics among strains of these harmful algal bloom species14.

Methods

Cell culturing and collection

Cultures of the kleptoplastic, mixotrophic species of Dinophysis, D. acuminata and D. ovum, the prey ciliate Mesodinium rubrum, and its prey, the cryptophyte Teleaulax amphioxeia (Table 1), were grown following the method described in Fiorendino et al. (10). Briefly, cultures were grown in L1-Si seawater medium15 at a salinity of 22, 18 °C, and under 100 µmol quanta m−2 s−1 on a 14: 10 light: dark cycle. Cultures were harvested by centrifugation at 3000 g for 15 mins. The cryptophyte T. amphioxeia was harvested at mid-exponential stage (~day 6). The M. rubrum and Dinophysis cultures were fed their respective prey at a 1:10 (predator: prey) ratio and harvested after the complete consumption of their cryptophyte or ciliate prey, respectively.

RNA Extraction and sequencing

Total RNA was extracted from cell pellets using Extracta Plus RNA (QuantaBio, USA). Total RNA extraction was performed following the manufacturer′s guide. RNA concentration was measured using a Qubit RNA HS Assay kit (ThermoFisher Scientific, USA), and RNA integrity was evaluated using Agilent Fragment analyzer system (Agilent, USA).

Poly-A selected RNA libraries were prepared using the NEXTFLEX Rapid Directional RNA-seq kit 2.0 (Perkin Elmer, Waltham, MA) as per the manufacturer′s instructions. Each library was prepared with a unique barcode and pooled at equimolar concentrations. The pooled samples were sequenced on an Illumina NextSeq. 500 (Illumina, San Diego, CA) at a read length of 2 × 150 bp, targeting 60 million read pairs per sample.

De novo assembly and gene annotation

High quality RNA-Seq reads (sequences) were used to generate the de novo transcriptome assemblies using the bioinformatics tools illustrated in Fig. 2. Raw sequence reads in fastq format were processed to remove adapters, poly-N (⩾10% read length), low-quality bases (Phred score < 10) and the last 10 bases were trimmed using the bbduk function in BBMap tool v. 38.90 (https://sourceforge.net/projects/bbmap/). Reads shorter than 125 bp were also discarded. Forward and reverse reads were concatenated using the bbrepair function. Non-mRNA reads were removed using SortMeRNA v. 4.3.4 with rRNA databases as reference16. The mRNA reads were normalized for depth based on kmer counts using the BBNorm function. Summary statistics for the number of total reads before and after precleaning are presented in Table 2. De novo transcriptomes were generated using Trinity v. 2.12.017 with default settings and Velvet-master v. 1.2.1018-Oases-master v. 0.2.0919 with default settings, except for minimum length criterion set as 300 bp for the shortest transcripts. Both de novo transcriptomes were merged using cd-hit-est v. 4.8.120 to reduce the transcript redundancy by 98% similarity and generate unique gene clusters. TransDecoder (https://github.com/TransDecoder/TransDecoder) was used to identify coding regions (ORF) of the assembled transcripts. The generated de novo assemblies were functionally annotated using the NCBI non-reductant protein database (NR) using BLAST tool v. 2.11021. InterProScan v. 5.55-88.022 was used to identify potential proteins in pathways using the Pfam, PANTHER, Gene3D, SUPERFAMILY, TIGRFAM, HAMAP, SFLD, PRINTS datasets.

Data Records

Three datasets were generated during the study. The first dataset consists of RNA-Seq raw reads from D. acuminata (DAVA01)23, D. ovum (DoSS3195)24, M. rubrum (DK2009)25, and (JAMR)26 and T. amphioxeia (K-0434)27, which were deposited in the NCBI Sequence Read Archive database (https://www.ncbi.nlm.nih.gov/bioproject/) under project identification number PRJNA880267 (Table 1). The second dataset contains the transcriptome assemblies for each of the five organisms which were deposited in the NCBI Transcriptome Shotgun Assembly (https://www.ncbi.nlm.nih.gov/genbank/tsa/) (Table 1)28,29,30,31,32. The third data set includes the annotated files that were deposited in Zenodo (Table 1)33,34,35,36,37 as XML files (Type 5 format of BLAST output). Headings in the Zenodo files include query sequence, query length, statistics for BLASTp, reference sequence and alignment.

Technical Validation

After the initial FastQC check and precleaning steps, we assembled the de novo transcriptome assemblies with Trinity17 and Velvet-Oases18,19 (Table 3). We found that Trinity and Velvet-Oases produced different numbers of transcripts. The number of transcripts generated by Trinity was twice the number of transcripts from Velvet-Oases. The Trinity-Velvet-Oases merged strategy resulted in longer transcripts. Transcriptome assembly validation was done using Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 4.1.438. BUSCO core genes provide a qualitative estimate of the de novo transcriptome quality and completeness based on the evolutionarily informed expectation of the gene content from the near-universally conserved eukaryotic protein database (eukaryote_odb90). All five de novo transcriptome assemblies indicated high-quality assemblies with BUSCO coverage of 60–89% (Table 3). The CoDing sequences (CDS) obtained using TransDecoder revealed the highest number of genes in D. ovum (DoSS3195) while M. rubrum (DK2009) had the lowest number of genes (Table 3). N50 statistics appropriate for the de novo transcriptome assemblies were generated using the Trinity accessory scripts (Table 3). Functional annotation for these genes was performed using BLASTp with the maximum 3 best hits per gene and an e-value cutoff of 1e-20. The number of annotated genes ranged from 55–82% of the total transcripts (Table 3).

Using the bioinformatics tools illustrated in Fig. 2, the total number of transcripts for D. ovum exceeded the number for D. acuminata; this was probably due to the greater sequencing depth for D. ovum (Table 2). Note that although the number of transcripts in this analysis exceeded a previous report for M. rubum12, likely because of the increased depth of sequencing here, it is less than the number of transcripts identified by others13. To determine the number of transcripts shared between the two Dinophysis species, a reciprocal BLAST was performed and results clustered at 95% similarity (Fig. 1b).