Background

One major question in biology is how novel phenotypes originate from conserved genes, processes, and tissues. Research in evolutionary developmental biology often focuses on the conserved modules that have been co-opted for new phenotypes, so-called toolkit genes [1,2,3]. This approach has also been used to investigate the evolution of homologous adult phenotypes, such as in studies of sensory system evolution (e.g., G protein-coupled receptors [4]). However, a conservation-based approach explicitly excludes genetic elements that may be classified as clade-specific (i.e., taxonomically restricted, lineage-specific, lineage-restricted, or clade-restricted) that contribute to the development or function of a particular phenotype [1, 5,6,7]. Here we present the chromosome-level genome for the gastropod mollusk, Berghia stephanieae (Valdés, 2005) [8], which we used to identify clade-specific genes that may be important for both novel and conserved phenotypes, but have largely remained under investigation.

Many clade-specific genes are known to be involved in novel, or clade-restricted, phenotypes, including a number of cnidarian-specific genes exclusively expressed in specialized cell types called cnidocytes [9,10,11]; the spiralian-specific gene trochin, expressed in the primary ciliated band [12]; and spidroins in spiders, used for creating spider silk [13, 14], among others (further examples in [5]). A recent review by Wu and Lambert [5] highlighted that clade-specific genes deserve more attention when investigating evolutionary novelties.

In addition to their value in understanding phenotypic novelties, clade-restricted genes also play important roles in what might be considered more conserved phenotypes, and can quickly become essential to the viability of the organism (e.g., Drosophila, [15]). These clade-specific genes can become integrated into more conserved systems via some version of system drift (e.g., developmental system drift [16]), which may not result in a drastic change in function that we would classify as an evolutionary novelty. Most research on the impact of clade-specific genes has focused only on the presence or absence of clade-specific gene expression in novelties and how those genes have evolved [17,18,19,20], but has not provided comparisons with more conserved phenotypes in the same organism. A few studies have identified more clade-specific gene expression in novel tissues or cell types compared to those that are more conserved [21, 22], moving us closer to appreciating how clade-specific genes—and their expression—impact phenotypic evolution. Based on these previous studies, we can hypothesize that clade-specific genes are likely to be disproportionately upregulated in novel tissues. However, much of this research has centered on well-studied model systems, which limits our ability to generalize across other metazoan lineages.

Investigations into clade-specific genes in understudied groups or phenotypes have a high potential for generating exciting new hypotheses or expanding our technical creativity. Multiple excellent examples of this potential come from the phylum Mollusca, a clade containing taxa such as snails and slugs, cephalopods, bivalves, and chitons. Mollusca is the second most speciose metazoan phylum (after Arthropoda) and contains a great diversity of phenotypes that have already provided many useful insights, including cephalopods as alternative models to vertebrates for the evolution of complex brains and intelligence [23,24,25], bivalves as a means of understanding the nature of transmissible cancer [26], and gastropods for neuroscience research [27, 28] and as models of parasitism and immunity [29, 30]. However, a lack of genomes and functional biology resources for many members of this phylum has long hindered assessments of genetic homology and function [31]. This lack of resources has limited our ability to even identify clade-restricted genes in mollusk lineages, let alone characterize their expression or test their impact on phenotypes of interest. Luckily, some genomic resources, such as transcriptomes, are being sequenced at a much higher rate than whole genomes [32]. We propose that these resources can also be used to more accurately infer whether genes are clade-specific, so that we might further characterize their impact on the evolution of novel phenotypes.

In this paper, we present a chromosome-level genome for Berghia that we use to investigate clade-specific gene expression across multiple tissues. Berghia stephanieae (hereafter referred to as Berghia) is a species of gastropod in the order Nudibranchia, a clade of marine slugs that lose their shell during metamorphosis [33]. This species has been used as a model for the study of both more conserved systems, such as neurodevelopment [34] and reproductive development [35], as well as clade-restricted phenotypes such as the sequestration of cnidarian nematocysts [36, 37] and endosymbiosis [38, 39]. We combined an inferred proteome from Berghia with available genome and transcriptome data from other metazoan species—including mollusks such as cephalopods, bivalves, and other gastropods—to identify clade-specific Berghia genes (i.e., restricted to Mollusca, Gastropoda, Heterobranchia, Nudibranchia, Aeolidina, or Berghia alone) (Fig. 1). We then describe expression profiles of clade-specific and non-clade-specific genes among adult tissue samples in Berghia, including more ancient tissue types (like nervous system tissues; [40]) and clade-restricted ones (such as those associated with nematocyst sequestration; [41]). We find highly upregulated genes that are clade-specific in every tissue investigated, some of which are also highly upregulated in the same tissues during development. The proportion of clade-specific genes upregulated varied across tissues, but with no clear trend between the proportion of clade-specific genes and the novelty of the tissue. For example, the Aeolidina-specific distal ceras did not express a significantly higher proportion of clade-specific genes compared to more “conserved” tissues, such as the foot or tail. Our results support previous assertions that clade-specific genes are important for our understanding of phenotypic evolution, and highlight that future studies in emerging model systems must account for these to truly describe how new phenotypes evolve.

Fig. 1
figure 1

Cladogram showing broadly where the nudibranch Berghia stephanieae falls in the metazoan phylogeny. Colors indicate clades that we investigated for clade-specific genes in the Berghia genome: Mollusca (purple); Gastropoda (blue); Heterobranchia (teal); Nudibranchia (green); Aeolidina (gold); and (B) Berghia stephanieae (salmon). External tissues used in our analyses are indicated in (B). Major clades for outgroups included in our analyses are also shown, with the numbers in each collapsed clade or next to each name indicating the number of species from that group included in our analysis

Results

A high-quality Berghia stephanieae genome

We constructed a highly contiguous chromosome-level assembly of the 1.05 Gb long Berghia stephanieae genome using PacBio long read sequencing (16,476,079 total reads, 166,997 Gigabases, mean read length ~10 kb, N50 = 15,977) that corresponds to ~160x theoretical coverage based on the final assembly size. The assembly was scaffolded with Omni-C Illumina short reads (Additional file 1). Post-assembly but pre-scaffolding, the assembly was 1.1 Gb in length with a contig N50 of 6,921,793 bp and L50 of 46 contigs (Additional file 1). Our scaffolded assembly contained 7945 scaffolds (Table 1) with an N50of 86 Mb (L50 = 5 scaffolds) and N90 of 34 Mb (L90 = 15) and is available through NCBI (Genome Accession JAWQJI000000000). The rest of the scaffolds (7930 scaffolds) encompass less than 8% of the original genome assembly. A GenomeScope 2.0 [42] analysis of our Omni-C data indicated relatively low heterozygosity (0.693%) compared to many other mollusks, including gastropods such as Elysia chloritica (3.66% heterozygosity [43]), and many bivalves (1–3% [44,45,46,47]). The K-mer spectra and fitted models for Berghia are also consistent with a diploid genome (Additional file 2: Fig. S1). The final genome, filtered by GC content, BLASTn hits, and sequence length (see methods for details), contains 18 scaffolds (Table 1) with an N50 length of 85.5 Mb (L50 = 5 scaffolds) and N99 length of 26.7 Mb (L99 = 15 scaffolds). We found 93.3% complete, and 95.9% complete+fragmented, BUSCO core genes represented from the Metazoa (odb10) BUSCO database in the final dataset (Additional file 2: Fig. S2), and 76.6% of PacBio reads map to the final assembly (compared to 77.2% in the unfiltered assembly; Table 1). The 18 scaffolds in the final Berghia genome likely represent 15 chromosomes, based on the length distribution of the scaffolds and the linkage map (Fig. 2 A, B) from our Omni-C analyses. This is on the high end of the range for nudibranchs, which are known to have between 12 and 15 chromosomes in their haploid genomes [48]. All further analyses were performed on these 18 scaffolds.

Table 1 Genome assembly statistics for the Berghia stephanieae genome initial assembly (pre-filtering) and final assembly (post-filtering)
Fig. 2
figure 2

Summary statistics for the chromosome-level Berghia stephanieae genome, with comparisons to other mollusk genomes. A Bar chart showing the length (in bp) for each of the retained scaffolds, including 15 putative chromosomes. B Hi-C linkage plot showing identified links within and among chromosomes. Darker color of a block indicates a higher frequency of contacts. C Plot visualizing the summary statistics for the Berghia stephanieae genome compared to assembled genomes of other mollusks by BUSCO completeness score (to the metazoa_odb10), log of the scaffold N50, and total genome assembly length. The Berghia genome is nearly average in size but is among the best genomes in terms of contiguity (scaffold N50) and BUSCO completeness score

The Berghia genome compares favorably to other mollusk genomes in NCBI, with both a very high BUSCO score (when compared to the metazoa_odb10 database) and scaffold N50 (Fig. 2C; Additional file 3: Table S1). This analysis includes genomes from NCBI classified as either “scaffold,” “chromosome,” or “complete” from the phylum Mollusca. Of the 190 non-Berghia genomes analyzed, only 23 have a higher scaffold N50, the majority of which are larger genomes from cephalopods, bivalves, and scaphopods. Five are other gastropods, including the caenogastropods Sinotaia purificata, Conus ventricosus, and Monoplex corrugatus and the patellogastropods Patella vulgata and P. pellucida, three of which have higher BUSCO scores. Overall, 61 of the 190 species have higher BUSCO completeness scores, but the Berghia genome falls within a cluster of the most contiguous and highest-quality genomes on NCBI (Fig. 2C).

The Berghia stephanieae genome is also well-annotated. Our RepeatModeler analysis identified 46.68% of the genome is repetitive elements, with the majority of bases characterized as unclassified repeats (27.45%, further details in Additional file 4: Table S2). BRAKER2 initially predicted 61,662 proteins (covering 59,494 genes; Table 2), which we subsequently filtered to a data set of 26,595 proteins and 24,960 genes for annotation and analysis using a script included with the BRAKER installation (selectSupportedSubsets.py) and the --anySupport flag to only include genes at least partially supported by hints. Prediction filtering resulted in a slightly lower BUSCO score (for both the Metazoa and Mollusca databases; Metazoa—87.2 to 86.0% complete, Mollusca—73.8 to 72.2% complete), though both scores were lower than the original BUSCO result using the whole genome, suggesting that gene predictions from BRAKER2 are incomplete. Prediction filtering also very slightly lowered IsoSeq mapping percentage (95.57 to 95.23% mapped reads), but did not change the percentage of short reads mapped to gene models (74.81% for both). Functional prediction rates, however, were much improved in our filtered data set, for both BLASTP hits (20,820 proteins, 78.3%, in filtered predictions) and InterProScan results (24,469 proteins, 92.0%, in filtered predictions) compared to 58.8% and 78.7%, respectively, in our initial predictions (Table 2). We used our filtered BRAKER2 predictions and functional annotations in subsequent analyses.

Table 2 Gene prediction and annotation statistics for the Berghia stephanieae genome, including initial gene models predicted from BRAKER (pre-filtering) and filtered gene models intended to include only those models with external support (post-filtering)

Identification of clade-specific genes in Berghia genome

Our OrthoFinder analysis compared the predicted Berghia stephanieae proteins from BRAKER2 with proteomes from 58 other metazoan species (Additional file 5: Table S3 [24, 49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80]), including 27 gastropods, one scaphopod, 12 bivalves, two cephalopods, one monoplacophoran, 3 polyplacophorans, two aplacophorans, and 11 non-molluscan species. The goal of this analysis was to generate orthologous groups among proteins from all proteomes to assess which Berghia genes are restricted to certain clades. It is important to note that genes classified as restricted to narrower taxonomic designations (e.g., Gastropoda) are also by definition restricted to higher taxonomic clades (e.g., Mollusca). We found 25,338 (95.2%) Berghia stephanieae proteins clustered into orthogroups, 1027 (3.9%) of which were in Berghia-specific clusters of two or more sequences.

Our KinFin analysis, which provides taxon-aware annotation of inferred orthologous groups, identified Berghia genes restricted to Mollusca (n = 1067, 4.3% of genes), Gastropoda (n = 463, 1.9% of genes), Heterobranchia (n = 1154, 4.6% of genes), Nudibranchia (n = 1030, 4.1% of genes), and Aeolidina (n = 108, 0.4% of genes), as well as those genes only found in Berghia (n = 2188, 8.8% of genes; Fig. 3A). These Berghia-specific genes include those clustered in Berghia-specific clusters using OrthoFinder in addition to singletons that were not clustered. This level of species-specificity is on the lower end compared to other nudibranchs and fairly average for mollusks more broadly (Additional file 2: Figs. S3-S4) This means that 18,957 Berghia genes (75.9% of genes) are not clade-specific at the levels we investigated. Rarefaction curves for each taxonomic level (Additional file 2: Figs. S5-S9) suggest sufficient sampling in all major clades investigated. Both clade-specific and non-clade-specific (Other) genes are well distributed across the genome (Fig. 3B), with higher numbers of both types of genes per Mb in chromosomes 3 and 9 compared to other chromosomes (Additional file 2: Fig. S10). The proportion of clade-specific genes appears to be enriched on smaller chromosomes (Additional file 2: Fig. S11).

Fig. 3
figure 3

Results for clade-specific genes found within Berghia stephanieae. A A bar chart showing the total number of genes in Berghia stephanieae parsed by whether they fall into one of the clade-specific groupings or not (Other). B A bar chart showing the distribution of genes across the genome, parsed by whether they are clade-specific or not (a size normalized version of this chart is available in Additional file 5: Fig. S5). C A bar chart indicating what proportion of genes within each group (clade-specific or Other) were annotated using BLASTP. In some cases, BLASTP found a match to an uncharacterized, hypothetical, or putative protein. These are separated into a different category (uncharacterized)

As expected, non-clade-specific genes (Other, Fig. 3C) were much more likely to be annotated via UniProt (79.9%) than genes identified as clade-specific. As we get phylogenetically deeper, the percentage of clade-restricted genes that are annotated increases: Berghia-specific (7.8% of genes annotated), Aeolidina-specific (17.5% of genes annotated), Nudibranchia-specific (24.2% of genes annotated), Heterobranchia-specific (24.2% of genes annotated), Gastropoda-specific (32.0% of genes annotated), and Mollusca-specific (39.8% of genes annotated). The proportion of matched but uncharacterized genes (including hypothetical proteins) ranged from 1.2% (of Heterobranchia-specific genes) to 18.1% (of Gastropoda-specific genes) across investigated clades.

Clade-specific gene expression across tissues

Bulk tissue RNA-seq data mapped to the genome

We included seven Berghia tissues in our differential expression analyses, including (1) the brain, which consists of the paired cerebral-pleural, pedal, buccal, and rhinophore ganglia; (2) rhinophores, which are chemosensory structures restricted to nudibranchs; (3) oral tentacles, which are sensory-motor appendages restricted to the clade Aeolidina; (4) distal cerata, which are also restricted to the nudibranch clade Aeolidina and contain the organ where nematocyst sequestration occurs; (5) proximal cerata, which are common in Aeolidina and a few other nudibranchs and contain branches of the digestive system; and (6) foot; and (7) tail, which are tissues associated with mollusks more broadly. Our RNA-seq samples ranged from 2.9 million (SRR14337001, brain) to 36.5 million (SRR12072210, oral tentacle) read pairs ( = 25.3 ± 8.6 million reads per sample; Additional file 6: Table S4). On average, 72.1% (± 7.8%) of read pairs mapped uniquely to the Berghia stephanieae genome, and the mapping percentage ranged from 53.4% of read pairs (SRR14337002, brain) to 80.6% of read pairs (SRR12072207, distal ceras). We identified expression (counts >10 across all tissues [81]) in ~97.0% of genes (24,178 out of 24,960 predicted genes).

Differential expression among Berghia tissues

Our differential expression analyses compared the expression of each gene in each tissue to an average of normalized expression of that gene across all other tissues. Genes with a Log2 Fold Change > 2 and adjusted p-value < 0.05 were considered upregulated in a given tissue. We identified 16,691 genes upregulated across all tissues, with the highest number of upregulated genes (Table 3; Additional file 7: Table S5) in brain tissue (15,210 genes), followed by proximal ceras (678 genes), foot (205 genes), distal ceras (188 genes), tail (169 genes), rhinophore (147 genes), and oral tentacle (94 genes), respectively. The proportion of upregulated genes that were clade-specific (Fig. 4A; Additional file 7: Table S6) was variable across tissues, with rhinophore, oral tentacle, distal ceras, and tail having the most similar proportions of clade-specific upregulated genes (28.8–34.4% of upregulated genes), followed by foot (21.0% of upregulated genes). The proportion of clade-specific upregulated genes (Fig. 4A) was much lower in brain tissue (12.0% of upregulated genes). Expression data from clade-specific genes with more recent homologs (i.e., Gastropoda-Berghia) were more useful for distinguishing all tissues except for the brain (Additional file 2: Figures S9-S14). Some genes upregulated in one tissue were also upregulated in another tissue (1779 genes, or 10.7% of upregulated genes; Fig. 4C).

Table 3 Number upregulated genes for each tissue in Berghia stephanieae, by clade-specificity designation. Total numbers of upregulated genes are also reported for each tissue and clade
Fig. 4
figure 4

Results for upregulated genes for each tissue found within Berghia stephanieae. A A bar chart showing the proportion of upregulated genes that are clade-specific, or not (Other), for each tissue type. B A bar chart indicating what proportion of genes upregulated for each tissue type were annotated using BLASTP. In some cases, BLASTP found a match to an uncharacterized, hypothetical, or putative protein. C Venn diagram showing the tissues in which all upregulated genes were determined to be upregulated. Some genes are only upregulated in certain tissues, while others are upregulated in multiple tissues (maximum of 4 out of 6, 11 genes upregulated among the brain, distal ceras, proximal ceras, and tail). These are separated into a different category (uncharacterized). Abbreviations: BR, brain; DC, distal ceras; FO, foot; OT, oral tentacle; PC, proximal ceras; RH, rhinophore; TA, tail

We then removed genes that were upregulated in multiple tissues and focused only on those genes upregulated in a single tissue (uniquely upregulated genes; 14,912 genes, or 89.3% of upregulated genes). The highest number of uniquely upregulated genes (Fig. 4C was still in brain tissue (14,522 genes), followed by proximal ceras (229 genes), foot (80 genes), distal ceras (43 genes), rhinophore (27 genes), oral tentacle (7 genes), and tail (4 genes). The clade-specificity and annotation distributions of uniquely upregulated genes differed in some tissues (Fig. 5; Additional file 8: Tables S7-S8), including the rhinophore, oral tentacle, and tail when compared to the distributions of all upregulated genes in those tissues. The proportions of clade-specific and annotated genes changed more significantly in those tissues where far fewer genes were uniquely upregulated (Fig. 5).

Fig. 5
figure 5

Results for genes only found upregulated in a single tissue of Berghia stephanieae (see bold values in Fig. 4C). A A bar chart showing the proportion of upregulated genes that are clade-specific, or not (Other), for each tissue type. B A bar chart indicating what proportion of genes upregulated for each tissue type were annotated using BLASTP. In some cases, BLASTP found a match to an uncharacterized, hypothetical, or putative protein. Abbreviations: BR, brain; DC, distal ceras; FO, foot; OT, oral tentacle; PC, proximal ceras; RH, rhinophore; TA, tail

With regard to annotation, we noted that in addition to having the highest number of upregulated genes, brain tissue also had the highest proportion of upregulated genes (78.3%) that were annotated via BLASTP (Fig. 5B; Additional file 9: Table S9). The other tissues had slightly lower levels of annotation ranging from 55.0% of upregulated genes (tail) to 67.0% of upregulated genes (oral tentacle) with multiples included, though for some tissues this proportion of upregulated, annotated genes dropped significantly when considering genes upregulated in only one tissue (Fig. 5B). Of the upregulated genes with annotations (Fig. 4B), GO term enrichment analyses were consistent with what might be expected for particular tissues (Additional file 10: Tables S10-S16). For example, we noted enrichment of signal transduction, transmembrane transport, and ion binding GO terms in the brain; G protein-coupled receptor activity and sodium ion transport terms in the sensory oral tentacles and rhinophores; and transmembrane transporter activity, transmembrane transport, and extracellular region terms in the distal ceras, where nematocyst sequestration occurs.

Confirmation of tissue-restricted expression of clade-specific genes

In order to confirm the expression of clade-specific genes inferred to be upregulated in certain tissues, we localized the expression of at least one gene from each of two tissues (rhinophores and distal ceras) using in situ hybridization chain reaction (HCR) techniques [82] in Berghia juveniles. We also compared our list of genes upregulated in the Berghia brain to HCR gene expression profiling in the brain of adult Berghia available from Ramirez et al. [83] Juveniles were selected for our experiments because adult Berghia contain pigments that make localizing expression more difficult, and both the distal ceras and rhinophores are identifiable and functional at an early juvenile stage [34, 36]. We localized a distal ceras upregulated Nudibranchia-specific gene (jg13556; annotated as collagen alpha-1, match to UniProt ID Q7LZR2, e-value = 0.000129) in juvenile distal cerata (Fig. 6A–A′′′). This gene appears to be exclusively expressed inside the cnidosac of the juvenile Berghia, where nematocyst sequestration is known to occur [36]. We also identified expression of a rhinophore upregulated Berghia-specific gene (small domain annotated as a Pancreatic trypsin inhibitor, match to UniProt ID P00974, e-value = 1.52E−07) in the juvenile rhinophores where it is expressed in patches on the external epithelium (Fig. 6B–B′′′). For the brain, we found numerous clade-specific genes are expressed in the Berghia brain based on single-cell RNA-seq data [83]. These include (1) two genes exclusively upregulated in the brain (jg44129, an unannotated Berghia-specific gene expressed in a cluster of glial cells, and jg54950, an unannotated Heterobranchia-specific gene upregulated in nitric oxide synthase (Nos) and pigment dispersing factor (Pdf)-expressing cells in the rhinophore ganglia (rhg)); (2) one gene upregulated in the brain, rhinophore, and oral tentacle in our analysis (jg22847, an unannotated Berghia-specific gene in Nos/Pdf- rhg cells; and (3) two genes not considered upregulated in the brain in our analysis but appear to be upregulated in certain cell clusters in the brain (jg57406, an unannotated Heterobranchia-specific gene that is upregulated in the distal ceras in our analysis but expressed in mature neurons in all Berghia ganglia; jg56194, an unannotated Nudibranchia-specific gene that is not upregulated in any tissue in our analysis but is found in a cluster of glial cells in the brain) [83].

Fig. 6
figure 6

HCR results in Berghia stephanieae juveniles for selected clade-specific genes upregulated in particular adult tissues. AA′′′ Juveniles stained for a collagen alpha 1 VIII gene found upregulated in the distal ceras (COL8A1; jg13556) (AA′) DAPI and Alexa 647 stained tissues in whole animal, and A′′A′′′ close up of cnidosac. BB′′′ Juveniles stained for a gene found upregulated in the rhinophores (annotated as a Pancreatic trypsin inhibitor, PTI; jg18351) (BB′) DAPI and Alexa 647 stained tissues in whole animal, and B′′B′′′ close up of rhinophore. Abbreviated: cr, cerata; cs, cnidosac; br, brain; rh, rhinophore; ot, oral tentacle

Discussion

The Berghia stephanieae genome is highly contiguous

The Berghia stephanieae genome is among the most contiguous and highest quality gastropod genomes to date (Fig. 2C). The final assembled and filtered Berghia genome is comparable to other mollusk genomes [56] in terms of size (1.05 Gb) and number of predicted genes (24,960 genes). The Berghia genome also has high Metazoa, and moderate Mollusca, BUSCO scores (Table 2), both comparable to the scores of other high-quality mollusk genomes [84, 85]. Our analysis also identified a high percentage of repetitive elements in the Berghia genome (46.68%), similar to rates found in other mollusk species [32]. However, given that only 76.6% of PacBio CLR reads mapped to the final genome assembly (Table 1), it is likely that many repeat regions remain unresolved. The proportion of annotated genes in the Berghia genome was also quite high (77.8% with BLASTP hits), consistent with other published gastropod genomes [86,87,88].

Clade-specific genes are a small percentage of the Berghia genome

The vast majority of predicted genes in the Berghia genome (18,957 genes, 75.9%; Fig. 3A) are not restricted to the clades of interest in our analysis (Mollusca, Gastropoda, Heterobranchia, Nudibranchia, Aeolidina, and Berghia). Of those identified as clade-specific, most (2188 genes, ~8.8%) were classified as Berghia-specific genes. This percentage of “orphan” or species-specific genes in Berghia is on the lower end of the range compared to other species from across Metazoa (~1 to >30%; [7]), which may simply be a feature of the Berghia genome. This may also be because our strategy to perform orthologous gene inference with a clustering-based method using both unannotated transcriptome data and well-annotated, high-quality genome data increased our chances of detecting homologs to Berghia genes. Clustering-based algorithmic approaches for inferring gene orthology apply normalization to pairwise similarity scores to account for the sequence length of the query and the length of its hits, ensuring that distantly related sequences receive comparable scores compared to the best-scoring sequences from closely related species [89,90,91]. This strategy, combined with more data from more closely related taxa, provides greater opportunity for similarity matches and limits the impacts of homology detection failure [92], which occurs when homologs have become undetectable by search algorithms even though they exist. Our analysis suggests that homology detection failure is likely to cause inflated estimates of the proportion of species-specific genes in analyses that (1) include species from less widely distributed taxonomic levels (e.g., only including closely related taxa [19] or having limited outgroup sampling [93]); (2) rely only on pairwise similarity scores for assessing homology [21, 93]. In our analysis, this could mean that a subset of the ~8.8% of Berghia-specific genes are likely to be restricted to clades that we have not explicitly investigated, such as those at the genus (Berghia) or family (Aeolidiidae) levels. This would mean that the actual percentage of species-specific Berghia genes is perhaps even lower than reported here.

As might be expected, we also note that the annotation rate goes down with an increase in clade specificity of the genes, meaning that non-clade-specific genes (“Other”) had the highest annotation rate and Berghia-specific genes the lowest (Fig. 3C). However, some Berghia genes matched to proteins in the NCBI RefSeq database that do not have functional data associated with them (i.e., are predicted, hypothetical, or unidentified proteins; purple color in Fig. 3C). It is possible that some of these putative annotations are due to protein domain-level similarities, which suggests that these clade-specific genes do share some homologous regions with genes in other taxonomic groups.

Novel tissues do not express more novel genes

Clade-specific genes have long been thought to be a driver of morphological novelties [5, 94], and some researchers have hypothesized that novel phenotypes might require a higher frequency of clade-specific genes [22, 94, 95]. This type of increase in the expression of clade-specific genes has been identified in some taxa. These include cnidarians like Nematostella in novel cells called nematosomes [22], the mollusk radula [96], and non-morphological novelties like eusocial evolution in honey bees [95]. Our results are inconsistent with this hypothesis. In Berghia, we did not see a clear increase in the proportion of upregulated clade-specific genes in morphological novelties (Fig. 4A, B). For example, the distal ceras, where nematocyst sequestration occurs inside a novel organ called the cnidosac [41, 97], appears to have an average level of clade-specific gene upregulation compared to other tissues (Fig. 4A–B). This may be due to the fact that nematocyst sequestration relies on a largely conserved process, namely phagocytosis [36, 41, 98]. Our results also did not show a higher proportion clade-specific genes upregulated in the sensory rhinophores or oral tentacles compared to other tissues, which are functionally similar and homologous to tentacles in other heterobranch gastropods and caenogastropods [34, 99, 100]. However, the clade-specific genes that are upregulated in some of these tissues do appear to be functionally important, as our HCR results indicate that some genes (as shown in Fig. 6) are not only upregulated in the adult tissues but are also highly expressed in those same tissues in early-stage juveniles. The expression of these genes in juveniles suggests that these clade-specific genes may be especially crucial for the core function of these tissues as soon as they form. Alternatively, these genes may be important for upstream developmental processes in addition to downstream functions.

So why are our results inconsistent with the hypothesis that clade-specific genes are likely to be upregulated in novel tissues? For one, we have only collected gene expression data under standard conditions, which means that we are largely capturing constitutive gene expression. It is possible that critical, clade-specific genes for certain tissues may be only expressed in response to certain stimuli. We have likely not captured many of these genes in our analyses. Second, it is known that clade-specific genes are involved in both novel and conserved phenotypes [15, 21]. However, investigating gene expression at the level of tissues likely masks hidden differences in cell type diversity and complexity within tissues. We hypothesize that differences in the number and expression levels of clade-specific genes among novel tissues may be more related to the type of novelty (i.e., functional vs morphological) rather than the level of biological organization. For example, novelties that arise from a loss or modification of function may only require the loss of expression of certain genes. Future investigations focused on the expression of clade-specific genes (facultative and constitutive) at the cellular level—and on the function of these genes—will provide the necessary data to assess the relative impacts of these possibilities.

Tissue complexity—the Berghia brain

Animal nervous systems are perhaps the most complex biological systems, with their diverse components, cell types, and functions (e.g., [101, 102]). Although neurons in general express more genes than many other cell types in a variety of metazoan lineages [103], it has also been noted that genetic novelties appear to underlie cell diversification in multiple lineages (including octopuses [25] and teleost fish [104]). In these cases, the proportion of clade-specific genes expressed in novel cell types is higher than that in cell types with clear homologs. However, these differences would not be detectable in tissue-level analyses, such as those presented here.

Despite lower read counts and mapping percentages (Additional file 6: Table S4), we identified a large number of genes upregulated in Berghia brain tissue (> 15,000; Fig. 4A), which is roughly two-thirds of predicted Berghia genes. This is consistent with the hypothesis that neurons express more genes than other cell types [103]. However, our results also show lower proportions of clade-specific genes in the brain (~16.8%; Fig. 4A) compared to other tissues, even though the number of clade-specific genes upregulated in the brain (2,562 genes) is higher overall than the total number of upregulated genes in any other tissue (94–678 genes; Fig. 4B). It is possible then that consideration of clade-specific expression at the tissue level may not provide a comprehensive understanding of novelty.

Although some prior studies identified the expression of more clade-specific genes in novel tissues or structures [21], others have found a higher frequency of clade-specific expression in cell-type novelties [22, 105]. This suggests that the expression of clade-specific genes may have a higher impact on the evolution of novel cell types rather than novel tissues as a whole. This hypothesis is supported by the expression patterns of clade-specific genes in the Berghia brain [83]. For example, an unannotated gene that serves as a neuron-specific cell type marker is also a Heterobranchia-specific gene (jg57406). Similarly, serotonergic neurons in the brain express an unannotated gene that is Nudibranchia-specific (jg38442) [83].

Type of novelty—the Berghia cnidosac

The novel-genes-drive-innovation hypothesis might suggest that the novel cnidophage cell type, in the distal ceras [36], may use more (or a higher proportion of) clade-specific genes than other, more conserved cell types in Berghia. However, it was previously hypothesized that the cnidophage cell type may simply be a more specialized homolog of a digestive cell type, due to the apparent endodermal nature of cnidophages [36, 41, 97, 106]. A logical inference from this hypothesis is that the novel function of sequestration in cnidophages does not necessarily require novel molecular processes. This is not a new idea. For example, some morphological novelties in plants have been shown to evolve via regulatory evolution [107] and social behavior evolution in ants has been tied to both conserved and novel genetic elements [108]. Under similar conditions, we might not expect an increase in the frequency of clade-specific genes expressed in cnidophages. However, these clade-specific genes may still be functionally important given their high levels of expression in the cnidosac in Berghia juveniles (Fig. 6B–B′′).

Conclusions

The Berghia stephanieae genome is the first high-quality published genome for the order Nudibranchia and is one of the most contiguous and highest-quality gastropod genomes to date. However, it is likely that many repeat regions remain somewhat unresolved. We used this genome to investigate how clade-specific gene expression is distributed across functionally and evolutionarily diverse tissue types in adult Berghia and showed that upregulated genes in novel tissue types are not necessarily more likely to be classified as clade-specific. The proportion of clade-specific genes upregulated varied across tissues, with novel tissues like the distal ceras unexpectedly expressing a fairly average frequency of clade-specific genes compared to other tissues. Our results, when combined with previous research on the impact of novel genes on phenotypic evolution, highlight the value of a more holistic approach to investigating how phenotypes arise and diversify. In particular, the complexity of the novel tissue or behavior, type of novelty [22], and where across development changes may have occurred [109] will all influence how novel and conserved interact to generate new phenotypes.

Methods

Sample preparation and genome sequencing

We isolated one Berghia juvenile from the Lyons lab culture prior to mating to minimize genomic contamination. While isolated, we fed the animal ~½ of a medium Exaiptasia diaphana (defined by Taraporevala et al. [35]) each day for 34 days. We then starved the animal for 44 days prior to shipping. To minimize residual food in the gut diverticula, cerata were removed with forceps and the remaining body was blotted on a Kimwipe to remove excess water, then the animal was placed in a cryotube and flash frozen in liquid nitrogen and stored at −80 until shipping to Dovetail Genomics (now Catana Bio, Scotts Valley, CA). Dovetail Genomics used an input of ~101 mg into a slow CTAB protocol to extract high molecular weight DNA. They measured the efficiency of DNA extraction using a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) High Sensitivity Kit. Overall, they obtained 12.1 ug of high molecular weight DNA. They then used a Mini Column for cleanup and resuspended the pellet in 75 μl TE. They then quantified DNA samples using the Qubit. They constructed the PacBio SMRTbell library (~20kb) for PacBio Sequel using SMRTbell Express Template Prep Kit V 2.0 (PacBio, Menlo Park, CA, USA) using the manufacturer-recommended protocol. They then bound the library to polymerase using the Sequel II Binding Kit 2.0 (PacBio) and loaded onto PacBio Sequel II (PacBio) on 8M SMRT cells (SRR25687008).

For scaffolding, Dovetail fixed chromatin in place with formaldehyde in the nucleus for extraction and analysis via Dovetail® Omni-C® proximity ligation. They then digested the fixed chromatin with DNAse I, repaired the chromatin ends, and ligated to a biotinylated bridge adapter followed by proximity ligation of adapter-containing ends. After proximity ligation, they reversed crosslinks and purified the DNA. They treated purified DNA to remove biotin that was not internal to ligated fragments. They generated sequencing libraries using NEBNext Ultra enzymes and Illumina-compatible adapters. They then isolated biotin-containing fragments using streptavidin beads before PCR enrichment of each library. Technicians then sequenced the library on an Illumina HiSeqX platform to produce approximately 30× sequence coverage. They then used HiRise MQ>50 reads for scaffolding (see “read-pair” above for figures).

Short-read RNA sample collection and sequencing

We obtained Berghia adult tissue samples, including the (1) brain (2 samples; SRR14337001-SRR14337002); (2) oral tentacles (3 samples; SRR12072210, SRR25598600-SRR25598601); (3) rhinophores (3 samples; SRR12072209, SRR25598592-SRR25598593); (4) foot (2 samples; SRR12072206, SRR25598598); (5) tail (2 samples; SRR12072205, SRR25598599); and (6) proximal (3 samples; SRR12072208, SRR25598594-SRR25598595) and (7) distal ceras (3 samples; SRR12072207, SRR25598596-SRR25598597). We also obtained earlier stage transcriptome data from (8) multiple embryonic stages (bulk sample of 500–600 individual embryos from each time point reared at 27 °C (12, 24, 36, 48, 60 h post oviposition and 4, 7, and 9 d post oviposition; SRR12072213) and (9) juveniles 15 d post oviposition at 27 °C (500 individuals from 3 egg masses laid the same day; SRR12072212). We starved adults for ~1 week prior to the removal of some tissues (all but the brain) to reduce symbiont presence and minimize contamination. We extracted total RNA from most adult tissues (minus the brain) using the RNeasy Kit (QIAGEN, Redwood City, CA) and submitted the extracted total RNA to Novogene Ltd. (Sacramento, CA) for quality assessment, library preparation, and sequencing (Illumina NovaSeq 6000; 150bp paired-end reads). We prepared the adult brain total RNA using the Clontech SmartSeq v4 Ultra-Low Input RNA Kit (Takara). We prepared libraries with the Nextera XT DNA Library Preparation Kit and 96-Sample Index Kit (Illumina, San Diego, CA) and quantified them using Qubit (ThermoFisher Scientific, Waltham, MA) and assessed quality using a Bioanalyzer (Agilent, Santa Clara, CA). We sequenced the brain sample on the Illumina NextSeq 500 (75bp paired-end reads) at the Genomics Resource Laboratory, University of Massachusetts, Amherst. For the first two samples (bulk embryonic stages and juveniles), total RNA was extracted using TRIzol (Ambion) following the standard protocol, quality was assessed using Tapestation (Agilent) and sent to the IGM UCSD Genomic Center for library preparation (TruSeq mRNA stranded library) and sequencing (Illumina NovaSeq 6000; 150bp paired-end reads).

Reference transcriptome construction

Berghia samples used for reference transcriptome construction included a subset of samples to minimize computational cost while maximizing read breadth. These included single samples from multiple adult tissues, selected at random, including the (1) brain (SRR12072211), (2) oral tentacle (SRR12072210), (3) rhinophore (SRR12072209), (4) foot (SRR12072206), (5) tail (SRR12072205), and (6) proximal (SRR12072208) and (7) distal ceras (SRR12072207), as well as samples from (8) embryos (SRR12072213) and (9) juveniles (SRR12072212). We merged all FASTQ output files for the above samples into two files (Read 1 and Read 2) for downstream analysis. We used default parameters for all programs unless otherwise specified. We trimmed and filtered reads using fastp (version 0.20.0; [110]), and assembled transcripts using Trinity (version 2.9.1; [111]). We predicted open reading frames (ORFs) with TransDecoder (version 5.5.0; [112]). Duplication levels were quite high (~56%), so we clustered predicted ORFs using CD-HIT-EST (version 4.8.1; [113, 114]) at 95% identity and word size of 11 (-c 0.95, -n 11). Post-clustering, we filtered transcripts with alien_index (https://github.com/josephryan/alien_index), based on an algorithm described in [115]. We constructed alien index databases using previously constructed metazoan and non-metazoan databases (obtained from http://ryanlab.whitney.ufl.edu/downloads/alien_index) and all “Symbiodinium” sequences present on UniProt [116] as of 31 March 2020. We removed all sequences with an alien index greater than 45 from the transcriptome. We then compiled full transcripts for each predicted ORF sequence remaining from the assembled transcriptome using a custom Python script (full_transcripts.py). We then scanned the transcriptome for vectors and possible contaminants via the NCBI VecScreen (https://www.ncbi.nlm.nih.gov/tools/vecscreen/). We removed vectors using a small script (trim_adapters.pl) available through the Trinity Community Codebase (https://github.com/trinityrnaseq/trinity_community_codebase). We removed or trimmed sequences containing contamination using the Contaminants.txt file provided by NCBI and a custom script (remove_contamination.py). Custom scripts are available at https://github.com/lyons-lab/berghia_reference_transcriptome). We assessed transcriptome quality across all steps using BUSCO v5.1.2 [117,118,119] scores by comparing assembled transcripts to the metazoa_odb10 (C:98.1%[S:83.1%,D:15.0%],F:0.9%,M:1.0%,n:954) and mollusca_odb10 (C:93.4%[S:76.9%,D:16.5%],F:1.3%,M:5.3%,n:5295) databases. This transcriptome was intentionally generated de novo, in the absence of input from the genome, to provide a completely independent proteome with which to assess potential repeats in the genome.

Long-read RNA sample collection and sequencing

We obtained Berghia adult tissue samples from animals starved for at least 4 weeks to minimize gut contaminants, including the (1) head (one animal), (2) oral tentacles (two animals), (3) rhinophores (three animals), (4) cerata (one animal), (5) mantle (one animal), and (6) homogenized mid-body tissue (one animal). We also collected two developmental samples, including (1) embryos from the trochophore (72 h post oviposition; 300 animals) and eyed veliger stages (9–10 days post oviposition; 120 animals), and (2) post-metamorphic and post-feeding juveniles (34 animals). We extracted total RNA using the standard TRIzol Reagent (Life Technologies, Carlsbad, CA, USA) protocol, with some modifications: After the addition of chloroform, we centrifuged samples for 20 min at max speed (16,000 RCF) and precipitated samples in 100% isopropanol for ~1 h at −20 °C. We assessed total RNA sample quality on a 1% agarose gel and quantified the RNA in each sample with a Qubit 2.0 High Sensitivity kit (ThermoFisher Scientific, Waltham, MA). We then pooled developmental stages (embryos and juveniles; DEV), adult rhinophore and oral tentacle samples (RHOT), and adult mantle and cerata samples (MCE) in equivalent amounts. We sent these five total RNA samples (DEV, MCE, RHOT, head, mid-body) to the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign for IsoSeq library construction (5 libraries) and sequencing. They performed sequencing (on the five pooled libraries) on a single SMRT 8M cell with the PacBio Sequel II (PacBio, Menlo Park, CA) and a 30-h movie. They then clustered the raw subreads using the IsoSeq v3 clustering workflow (https://github.com/PacificBiosciences/IsoSeq/blob/master/isoseq-clustering.md).

Genome assembly and scaffolding

Dovetail used 167 gigabase pairs of PacBio CLR reads as an input to WTDBG2 v2.5 [120] with genome size 2.0g, minimum read length 20,000, and minimum alignment length 8192. Additionally, they enabled realignment with the -R option and set read type with the option -x sq. They then used BLASTn results of the WTDBG2 output assembly against the nt database as input for blobtools v1.1.1, and scaffolds identified as possible contamination were removed from the assembly. Finally, they used purge_dups v1.2.3 [121] to remove haplotigs and contig overlaps.

Dovetail used input de novo assembly and Dovetail Omni-C library reads (3 samples; SRR25687005-SRR25687007) as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies [122]. They aligned Dovetail Omni-C library sequences to the draft input assembly using BWA with default parameters [123]. They then analyzed separations of Dovetail Omni-C read pairs mapped within draft scaffolds by Hi-Rise to produce a likelihood model for genomic distance between read pairs and used the model to identify and break putative misjoins, to score prospective joins, and make joins above a threshold.

We initially filtered the Berghia stephanieae genome with purge_dups v1.2.5 [121] to automatically identify and remove haplotigs and contig/scaffold overlaps from heterozygous sites. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAWQJI000000000. The version described in this paper is version JAWQJI010000000. Following duplicate purging, we assessed completeness with BUSCO v5.1.2 [117,118,119] by comparing to metazoa_odb10 and mollusca_odb10. BUSCO further used the programs HMMER v3.1 [124] and MetaEuk v4.a0f584d [125] for gene prediction and analysis. We then used Nucleotide-Nucleotide BLAST 2.11.0+ [126, 127] to compare our scaffolds to the nt database (downloaded April 2021) and mapped the original PacBio reads used for assembly via minimap2 v2.18-r1015 [128]. With these results, we used BlobToolKit (Challis et al. 2020) (blobtools2 filter option) to remove additional scaffolds considered contamination. Scaffold selection for removal was based on GC content, PacBio read coverage results, BLASTn hits (we removed no-hit and bacterial contamination), and finally a minimum size threshold (150 kb). This size threshold was selected because it was the point at which the removal of sequences would not change the BUSCO score, as determined by the use of BlobToolKit Viewer v1.1 [129]. Most removed sequences contained differences in GC content and coverage compared to those that were retained in the final annotated genome, in addition to their smaller size. We also performed a Nucleotide-Nucleotide BLAST 2.11.0+ to compare the removed sequences with the final genome to assess duplication rates. Of those sequences removed from the final genome, 92.8% hit to one of the final 18 scaffolds (98.7% of which with an e-value of 0.0), and 1.6% of removed sequences were obvious contaminants. To compare the Berghia genome with other available Mollusca genomes, we downloaded assembled genomes from NCBI using the datasets command line function (v.15.24.0)[130] with flags for the taxon Mollusca and genomes at assembly levels “scaffold,” “chromosome,” or “complete.” We then assessed the completeness of each genome with BUSCO v5.1.2 compared to the database metazoa_odb10 and assessed scaffold N50 using the command line tool n50 in SeqFu, a suite of FASTX utilities [131].

Gene prediction and annotation

We analyzed filtered genome scaffolds with RepeatModeler v2.0.1 [132], which used Tandem Repeat Finder (TRF) v4.09 [133], RECON v1.08 [134], RepeatScout v1.0.6 [135], and RepeatMasker v4.1.2 (https://www.repeatmasker.org), to construct a de novo repeat library for Berghia stephanieae. This included the ---LTRStruct flag to run an LTR Structural Analysis, which used GenomeTools v1.6.1 (http://genometools.org), LTR_Retriever v2.9.0 [136], Ninja v1.10.2 (https://github.com/ninja-build/ninja), MAFFT v7.480 [137], and CD-HIT v4.8.1 [113, 114]. We then used this species-specific library to detect repeat sequences (via both soft and hardmasking) with RepeatMasker in the Berghia genome.

Following repeat masking, we mapped all short RNA-seq reads (unfiltered) to the hardmasked version of the genome using two separate read mapping software programs, HiSat2 v2.2.1 [138] and STAR v2.7.9a [139]. This was intended to account for mapping bias in order to maximize the possibility of support in our gene annotations. For long-read (IsoSeq) mapping, we first obtained Full Length Non-Concatemer (FLNC) reads from step three of the IsoSeq v3 workflow. These FLNC reads were mapped directly to the non-repeatmasked genome using minimap2 v2.22-r1105-dirty [128] with the recommended options according to PacBio (https://github.com/Magdoll/cDNA_Cupcake/wiki/Best-practice-for-aligning-Iso-Seq-to-reference-genome:-minimap2,-deSALT,-GMAP,-STAR,-BLAT). Post-mapping, sam output files were reformatted into bam files and indexed using samtools v1.11 [140]. We used BRAKER v2.1.6 [141,142,143] for preliminary gene prediction of the Berghia genome, which uses GeneMark-EP+ v4 [144, 145], DIAMOND v2.0.8 [146], spaln v2.4.3 [147, 148], and Augustus v3.4.0 [149]. We used long- and short-read RNA-seq mapping results as expression support input. We also used a protein hints file generated by combining the mollusca_odb10 database with B. stephanieae sequences identified as BUSCO hits from our initial mollusca_odb10 BUSCO run. We ran BRAKER with the additional flags --etpmode, --gff3, and --softmasking. After initial gene prediction, we generated a filtered predicted gene set using a script included with the BRAKER installation (selectSupportedSubsets.py) and the --anySupport flag to only include genes at least partially supported by hints. IsoSeq and short-read RNA-sequencing data were mapped to both sets of gene models to assess the impact of filtering. Both unfiltered and filtered gene prediction results are provided in Dryad (DOI: 10.6076/D1BS33), but we only used the filtered set (braker_annotations_anysupport.gff3) in subsequent functional annotation and clade-specific gene analyses.

For functional annotation of predicted genes, we used Protein-Protein BLAST 2.11.0+ (BLASTP) and InterProScan v.5.52-86.0. For BLASTP analyses, we used an e-value cutoff of 1e-3 with -max_target_seqs of one against three databases: (1) UniProtKB/Swiss-Prot, (2) RefSeq, and (3) Trembl (all downloaded April 2021). We then combined the hits to all three databases in a single blast annotation file. For InterProScan analyses, we used the default parameters with some additional flags, including -goterms to look up gene ontology, -dp to disable pre-calculated match lookup, and -t p to indicate protein sequences.

Assessment of clade-specific genes and their expression

To determine the distribution of clade-specific genes for Berghia, we created a proteome dataset containing 47 metazoan species using both published genomes and transcriptomes (Additional file 4: Table S3). Our final dataset included 36 mollusks, including fourteen nudibranchs (five from Anthobranchia and nine from Cladobranchia). We downloaded predicted proteomes from genome datasets from MolluscDB [56]. We downloaded the transcriptomes from the NCBI Sequence Read Archive (SRA), which we then filtered with fastp v0.20.0 [110] and assembled with Trinity v2.9.1 [111]. We predicted ORFs using Transdecoder v5.5.0 (https://github.com/TransDecoder/TransDecoder), with default parameters. Our final proteomes ranged from 17,606 to 72,541 proteins (x̅ = 33,691; Additional file 4: Table S3). We identified orthologous gene families among our metazoan proteomes using the OrthoFinder package (Emms and Kelly, 2019) with default parameters and a user-generated species tree as input (Additional file 11). Our user-generated species tree topology was based on recent metazoan phylogenies [150,151,152,153], the MolluscDB phylogeny provided on the website [56], and recent Mollusca [154] and nudibranch [67, 71, 155] phylogenetic analyses. We then analyzed orthologous groups using the program KinFin v1.0.3 [156] to determine which predicted genes in Berghia stephanieae are clade-specific (meaning that they only cluster with sequences from a particular clade). We used the default parameters, with additional flags (--infer_singletons --plot_tree -r phylum,class,order,superfamily).

To determine the expression patterns of clade-specific genes, we mapped our short-read RNAseq data (from the brain, oral tentacles, rhinophores, foot, tail, and proximal and distal ceras) to the Berghia genome using STAR v2.7.9a [139] with default parameters plus additional flags (--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopassMode Basic --sjdbGTFfeatureExon 'CDS'). We counted reads using the command htseq-count from the HTSeq framework v1.99.2 [157], which is a Python package for analysis of high-throughput sequencing data. We analyzed counts using the DESeq function from DESeq2 v1.26.0 [158] to perform differential analysis and generated results using the results function with contrasts comparing each focal tissue with an average of all other tissues. We considered genes upregulated if the adjusted p-value (padj) was greater than 0.05 and log2FoldChange was greater than 2.

In situ hybridization chain reaction (HCR) in Berghia juveniles

Probe design

We designed all probe sets using the HCR 3.0 probe maker [159]. The sequences generated by the software were used to order probe sets (50 pmol DNA oPools Oligo Pool) from Integrated DNA Technologies (Coralville, IA USA), which we resuspended to 1 pmol/μl in 50 mL TE buffer (Tris, EDTA).

Hybridization chain reaction

We cultured Berghia stephanieae juveniles using the same methods as prior B. stephanieae imaging work [35, 36]. We starved juveniles for 5 days prior to fixation to decrease autofluorescence from digestive contents. We relaxed juveniles in 1-part 7.3% MgCl2: 2-part fresh filtered sea water for 30 min prior to fixation. We then washed and incubated samples in 4% PFA (paraformaldehyde diluted in FSW from 16% ampules) overnight at 4 °C. We washed samples three times in 1X phosphate-buffered saline (PBS), followed by a 50% 1X PBS/50% methanol solution wash, followed by three 100% methanol washes. All washes were 10 min. We stored samples in methanol at −20 °C.

We performed in situ HCR using the buffers and protocols detailed in Choi et al. [82] with the following modifications. All steps were performed in 1.5-mL tubes and the volume of washes was decreased to 200-μL to better suit our samples. We rehydrated samples into 5X SSCT from methanol, immediately followed by the detection stage of the protocol. We prepared probe solutions using 100 μL of hybridization buffer and 1.0 pmol/oligo/μL of each probe. Following overnight probe hybridization, we washed samples with 30% probe hybridization wash buffer for 3 × 5 min, followed by 2 × 30 min. Following the 5X SSCT washes during the amplification stage, we placed samples in the hairpin solution (6pmol solution using 2μL of 3μM stock of snap-cooled hairpins in 100 μL of amplification buffer). After the completion of the HCR protocol, we incubated the samples in 1.0 μg/mL DAPI diluted in 5X SSCT for 30 min. Samples were then stored in 5X SSCT at 4 C for up to 5 days until mounting and imaging.

We mounted samples in a 20% 5X SSCT, 80% Glycerol solution, and imaged samples with a Zeiss LSM 710 inverted confocal microscope with an AxioCam HRm camera. We analyzed images using image processing software ImageJ FIJI and Adobe Photoshop [160]. Samples were stitched together using the FIJI Pairwise Stitching Plugin [161]. Figures were created in Adobe Illustrator.