Background

Sea anemones (order Actiniaria) rely on venom for their survival, deploying this chemical weapon during encounters with prey, predators and competitors [1]. In contrast to other studied cnidarian groups, however, the venom of Actiniarians is dominated by peptide neurotoxins [2, 3]. These neurotoxins confer substantial clinical utility and have been utilised as molecular probes in studies of the nervous system for more than 40 years [4]. Characterised sea anemone neurotoxins can be classified into one of 12 pharmacological toxin families, based on which receptors and ion channels they target [5,6,7,8]. Sea anemone neurotoxins, however, can also be classified according to their structural scaffold [9]. For example, the six sea anemone potassium channel toxin families are characterised by different structural scaffolds [9]. Furthermore, these structural scaffolds are not always associated with a single toxin family. The inhibitor cystine knot (ICK) fold is among the most widely recruited motifs in animal venoms [10, 11] and cysteine spacing characteristic of the ICK motif has been identified in both sea anemone type 5 potassium channel toxins and acid-sensing ion channel toxins [9, 12]. Similarly, the β-defensin fold is observed in multiple sea anemone neurotoxin families [9]. Thus, much of the pharmacological diversity of venoms, including those of sea anemones, is often accounted for by a relatively small number of peptide scaffolds. Reflecting their amenability to functional innovation, these scaffolds—sometimes referred to as privileged scaffolds—have been recruited to venoms on numerous occasions, including within the same lineage.

ShK represents another privileged scaffold, with 7,329 proteins containing 13,829 ShK-like (ShKT) domains spanning multiple kingdoms documented in the Simple Modular Architecture Research Tool (SMART) database [6, 13]. In Actiniarians, however, the ShK fold has only been observed within toxins belonging to the type 1 potassium channel toxin family [6, 9]. This motif is named after ShK, a 35-residue toxin isolated from the sea anemone, Stichodactyla helianthus, which potently blocks KV1.3 as well as other voltage-gated potassium channels [14,15,16,17,18]. An analogue of ShK (dalazatide, Shk-186) has shown remarkable efficacy in preclinical animal models of autoimmune disease [15, 19,20,21] and entered phase 1 clinical trials for the treatment of plaque psoriasis [22]. Because of the therapeutic potential of ShK, sea anemone proteins containing ShK-like domains have received substantial research attention [23,24,25,26,27], but not all ShK-like peptides block ion channels. More recently, it was reported that multiple toxin-like peptides with the cysteine spacing characteristic of ShK are localised to neurons rather than venom-related structures in the starlet sea anemone, Nematostella vectensis [28]. One of these neuropeptides was a homologue of U-actitoxin-Avd8e, a member of the sea anemone 8 (SA8) toxin family [28].

SA8 putative toxins were first identified computationally in Anemonia viridis [29] and have since been noted in several other taxa [30,31,32,33]. The SA8 gene family appears to be restricted to the order Actiniaria and thus far it has been identified in 14 sea anemone species, spanning the Actinioidea, Metridioidea and Edwardsioidea superfamilies [33]. Five peptides with a conserved motif and relatively low similarity to known actiniarian toxins were identified in the A. viridis expressed sequence tag (EST) database [29], and together form the curated sea anemone 8 toxin family [5]. One of these SA8 precursors (P0DMZ3) was associated with 103 identical sequences, placing it as the most abundant novel toxin-like transcript in A. viridis [29]. Since then it has been demonstrated that SA8 transcripts are highly expressed in Stichodactyla haddoni [30] and upregulated in the acrorhagi and tentacles of Actinia tenebrosa [33]. Despite this, peptides from the SA8 putative toxin family have not been functionally validated as toxins and the tertiary structure of this family has not been resolved.

Given the recent discovery of the ShKT cysteine motif in a N. vectensis SA8-like peptide, as well as the ongoing recruitment of neuropeptides into sea anemone venom [28], SA8 putative toxins could potentially function as toxins and/or neuropeptides in different species. Here, we characterise the genomic arrangement, evolution and expression patterns of SA8 genes in two species, A. tenebrosa (superfamily Actinioidea) and Telmatactis stephensoni (superfamily Metridioidea) using a combined proteomic, transcriptomic and genomic approach. Furthermore, we investigated the structure and function of the first reported SA8 peptide recruited to the venom of T. stephensoni. Finally, we examined whether the SA8 gene family can be distinguished from the ShK gene family or whether it represents an extension of this gene family.

Results and discussion

Genome assemblies of Actinia tenebrosa and Telmatactis stephensoni

The genome assembly for A. tenebrosa generated a 312-Mb assembly scaffolded into 18 pseudomolecules with a N50 of 16.5 Mb, with 38 contigs not assigned to scaffolds (Table 1). The genome assembly for T. stephensoni resulted in a 484-Mb assembly with a contig N50 of 602 Kb. Completeness scores (BUSCO: Benchmarking Universal Single-Copy Orthologs) for A. tenebrosa and T. stephensoni assemblies are on par with other long-read actiniarian genomes [34,35,36,37]. In contrast, contig N50 values for the current A. tenebrosa and T. stephensoni genomes are higher than both previously assembled short- and long-read genomes.

Table 1 Comparison of assembly metrics among actiniarian genomes

Repeat content analysis of the A. tenebrosa, and T. stephensoni genomes found that 32.3% (101 Mb) and 27.3% (122 Mb), respectively, of these assemblies are repetitive. Similar proportions of repetitive DNA were observed in the N. vectensis and Exaiptasia diaphana genome assemblies [35, 36]. Long terminal repeats (LTRs), miniature inverted-repeat terminal elements (MITEs) and unknown repeats were the most observed repeat classes found in both sea anemone species, but the relative abundance of each class varied between species (Additional file 1: Table S1). LTRs were the most abundant repeat type found in the A. tenebrosa assembly, followed by MITEs and unknown repeats. In contrast, Surm et al. [38] reported that MITEs were the most common repeat type in the draft genome of A. tenebrosa, followed by unknown repeats. MITEs were the most abundant repeat class observed in the T. stephensoni assembly followed by LTRs and unknown repeats.

For A. tenebrosa, we predicted 53,560 protein-coding genes (59,465 with isoforms included) with a BUSCO genome completeness score of 97.3%. We predicted 50,712 protein-coding genes (54,933 with isoforms included) in T. stephensoni with a BUSCO genome completeness score of 96.6%. Based on these BUSCO scores, the completeness of our assemblies is comparable to that of the A. equina genome (97%) [34]. We predicted twice as many genes compared to previous sea anemone genome assemblies [34,35,36,37,38]. Furthermore, we predicted more protein-coding genes in A. tenebrosa compared to the other assembly for this species as Surm et al. [38] produced a draft genome using short read sequences only. This resulted in a significant underestimation of gene number in the previous A. tenebrosa genome assembly.

Expansion of gene families containing putative toxins within T. stephensoni

The phylogenetic tree generated for the five species is consistent with previous phylogenies [39], with N. vectensis sister to a clade containing A. tenebrosa and the three Metridioidean species. Computational Analysis of gene Family Evolution (CAFE) was then used to model the evolution of gene family sizes across five actiniarian species spanning the superfamilies Actinioidea, Edwardsioidea and Metridioidea (Fig. 1). This analysis revealed T. stephensoni had the largest number of expansions and the fewest contractions of gene families. In fact, all three Metridioidean species had a greater number of gene family expansions compared to A. tenebrosa and N. vectensis. Gene families that encode putative toxins were common in the expanded gene families, with 21 expanded toxin gene families in T. stephensoni compared to 15, 10, 4 and 2 for E. diaphana, Paraphelliactis xishaensis, A. tenebrosa and N. vectensis, respectively (Additional file 2: Table S2). Orthogroups containing ShK-like, peptidase M12A, phospholipase A2 (PLA2) and other putative toxins were expanded in T. stephensoni, while orthogroups containing insulin-like growth factor binding protein (IGFBP) like and Factor V-like putative toxins were expanded in A. tenebrosa. Furthermore, CAFE results detected an expansion of the SA8 family only in Telmatactis, a pattern also observed by Surm et al. [33], with a contraction of this gene family occurring in N. vectensis. Thus, the SA8 family appears to have a unique expansion event in Telmatactis and the potential mechanisms for this expansion should be examined.

Fig. 1
figure 1

CAFE expansion/contraction tree highlights increased number of expanded gene families in T. stephensoni relative to other sea anemone species. There were 12,258 gene families found in more than one of the five sea anemone species that CAFE predicts were present in their most recent common ancestor (MRCA). Species: Nematostella, N. vectensis; Actinia, A. tenebrosa; Exaiptasia, E. diaphana; Paraphelliactis, P. xishaensis; Telmatactis, T. stephensoni

Size and sequence divergence within the SA8 family coincides with phylogeny of actiniarian superfamilies

Multiple SA8 genes were identified in the genomes of A. tenebrosa and T. stephensoni, but only a single SA8 gene was identified and manually annotated in the most recent N. vectensis genome [37]. This single N. vectensis SA8 gene was comprised of three regular exons, ranging from 29 to > 120 bp. The number of genes and their arrangement also differed significantly between A. tenebrosa and T. stephensoni (Fig. 2A). In T. stephensoni, 10 SA8 genes were identified, nine of which were tandemly clustered on a single contig. In contrast, only six SA8 genes were identified in the A. tenebrosa genome on multiple pseudomolecules. Additionally, we noted distinct differences in the genomic arrangement of the SA8 gene family between these two species. The six A. tenebrosa genes were located across four scaffolds, with multiple SA8 genes present on both scaffolds 1 and 11. The two SA8 genes on scaffold 11 were arranged tandemly, but in scaffold 1, the SA8 genes were 1.5 Mb apart and had different orientations (non-tandem). MCScanX analysis indicated that the cluster of nine SA8 genes in T. stephensoni originated from two episodes of tandem duplication involving eight genes collectively, with the remaining clustered SA8 gene originating from a proximal duplication event. The non-clustered T. stephensoni SA8 gene appears to have originated from a dispersed duplication event. Similarly, four of the A. tenebrosa SA8 genes originated from dispersed duplication events. However, there was no evidence of tandem duplication of SA8 genes in this species, with the two clustered SA8 genes on scaffold 11 arising from proximal duplication. Instead, a genome doubling event in the genus Actinia [34] may have resulted in the distribution of SA8 genes across double the number of chromosomes compared to T. stephensoni.

Fig. 2
figure 2

A Genomic arrangement and orientation of SA8 genes in T. stephensoni and A. tenebrosa. The inverted SA8 gene identified in the genome of T. stephensoni is depicted in blue. The first collinear block is depicted in orange while the second collinear block is shown in purple. Numbers in square brackets indicate the scaffold number for A. tenebrosa SA8 genes. B Sequence space of the SA8 family using 102 SA8 sequences from 14 actiniarian species spanning three superfamilies. Sequence space analysis of mature SA8 peptides reveals that they fall into four clusters (yellow, black, pink, red), with the T. stephensoni venom SA8 peptide found in the largest cluster (indicated by blue arrow)

Two separate instances of collinearity were detected between T. stephensoni and A. tenebrosa SA8 genes. The first of these collinear gene blocks included the A. tenebrosa SA8 gene and surrounding genes on scaffold 4 and the tandemly clustered T. stephensoni SA8 genes (orange arrows in Fig. 2A). The second SA8 gene on scaffold 1 of A. tenebrosa and the non-clustered T. stephensoni SA8 gene were the second instance of SA8 and other homologous genes evolving in a conserved order across both species (purple arrows in Fig. 2A). This indicates that collinearity can be observed in a subset of SA8 loci and suggests whole genome and/or segmental duplication has played a role in the evolution of the SA8 gene family in A. tenebrosa, with genes located on double the number of chromosomes.

Consistent with a recent expansion in T. stephensoni, a high degree of sequence similarity and conserved gene structure present can be observed in T. stephensoni, but not A. tenebrosa, SA8 genes. While SA8 genes are comprised of 3–5 microexons in both species, high levels of variability are observed in the second exon in A. tenebrosa only. In T. stephensoni, there is high sequence similarity between the clustered SA8 genes, with the greatest sequence divergence associated with the non-clustered SA8 gene. Additionally, sequence space analysis revealed that SA8 peptides cluster into four groups across sea anemones (Fig. 2B). SA8 peptides from A. tenebrosa were found in all four groups while most SA8 peptides from T. stephensoni (including a venom peptide that mapped to the inverted SA8 gene; referred to hereafter as venom SA8 peptide) were confined to a single group. The clustering patterns observed for A. tenebrosa and T. stephensoni sequences are reinforced in other species of Actinioidea and Metridioidea. Actinioidea species Anthopleura dowii, Anthopleura buddemeieri, S. haddoni and Aulactinia veratra have SA8 peptides distributed across all four clusters. Conversely, SA8 peptides from Metridioidea species E. diaphana, Nemanthus annamensis and Calliactis polypus are restricted to one cluster, as are SA8 peptides identified in Edwardsia carnea and N. vectensis of superfamily Edwardsioidea. Therefore, despite the high copy numbers of SA8 genes in T. stephensoni, levels of sequence variation are lower in Metridioidea relative to Actinioidea. This finding favours the hypothesis that clustered SA8 genes observed in the T. stephensoni genome can be explained by a recent tandem duplication event and provides support for the hypothesis that tandem duplication may have played a key role in the expansion of the SA8 family across the superfamily Metridioidea; however, further investigation using other actiniarian genomes will be needed to verify this.

Gene duplication has been shown to play a major role in the expansion of protein families in most venomous phyla. The snake venom metalloproteinase (SVMP) family in rattlesnakes originated from a single ancestral gene via a series of gene duplication events, resulting in 31 tandem SVMP genes in Crotalus atrox [40]. Similarly, venom PLA2 genes cluster together in the genomes of multiple rattlesnake species, however, the number of genes in the cluster differs among C. scutulatus, C. atrox, and C. adamanteus [41]. A comparable pattern may emerge as SA8 genes are annotated in the genome of other Metridioidea species. As with SA8 clustered genes in T. stephensoni, the exons of clustered venom PLA2 genes in rattlesnakes are characterised by a conserved length and low levels of intron sequence variability [41]. Furthermore, tandem duplications are responsible for the evolution of defensin-like peptides in platypus venom [42,43,44,45]. Within order Actiniaria, a cluster of twelve genes encoding the Nv1 toxin were identified in the genome of N. vectensis [46]. These clustered Nv1 genes were proposed to have evolved via a birth and death mechanism [46, 47], a pattern consistent with the levels of nucleotide diversity found in the T. stephensoni SA8 genes. Furthermore, expansion events were reported to be more common in actiniarian-specific gene families — such as SA8 — compared to shared gene families [38], and tandem duplication was recently implicated in the evolution of the actiniarian ‘dominant toxin’ families [48]. Thus, there is a strong precedence for the expansion of the toxin gene families via duplication — both within sea anemones and other venomous lineages — and our data support the involvement of duplication events in the expansion and evolution of the SA8 family in T. stephensoni.

SA8 peptides are associated with one of two expression patterns in T. stephensoni

The tissue distribution patterns of SA8 genes in T. stephensoni and A. tenebrosa were examined using several approaches. Differential gene expression analysis utilising previously generated tissue-specific reads for three envenomating structures of A. tenebrosa and three functional regions of T. stephensoni revealed distinct expression patterns of SA8 genes in each species. The tentacles and club-tips facilitate prey capture and immobilisation prior to transport through the actinopharynx to the mesenterial filaments for prey killing and digestion [49, 50]. In contrast, the epidermis (body column and pedal disc) likely have a key role in predator deterrence, with the pedal disc also mediating substrate attachment [32]. Acrorhagi are inflatable organs used during intraspecific competition and are only observed in Actiniidae species [51].

Only four of the six A. tenebrosa SA8 genes were differentially expressed (false discovery rate (FDR) ≤ 0.05, four-fold) across the structures examined. Of these four differentially expressed A. tenebrosa SA8 genes (Fig. 3A), two were upregulated in the acrorhagi (SA8_scaffold1_1 and SA8_scaffold3), one was upregulated in the mesenterial filaments (SA8_scaffold11_1), and one was upregulated in the tentacles (SA8_scaffold1_2). Conversely, in T. stephensoni, the nine transcripts corresponding to the 10 SA8 genes are differentially expressed across the tentacles (club-tips and tentacles), gastrodermis (actinopharynx and mesenterial filaments) and epidermis (body column and pedal disc) (Fig. 3B). The majority of T. stephensoni SA8 genes are upregulated in the epidermis (four transcripts, five genes; SA8 clustered genes 1, 2, 6, 7 and 8) or the tentacles (three transcripts/genes; SA8 clustered genes 3, 4 and 5), although the non-clustered SA8 gene is upregulated in both the epidermis and tentacles.

Fig. 3
figure 3

Expression of SA8 genes across functional regions in A A. tenebrosa (n = 3) and B T. stephensoni (n = 3). Each graph shows the tissue-specific expression pattern of a transcript that corresponds to a SA8 gene. One transcript corresponds to two SA8 genes (SA8_clustered_2&7) in T. stephensoni. Bars represent the mean trimmed mean of the M-values (TMM) normalised expression value of the SA8 transcript in each anatomical structure, with error bars representing the standard deviation. Supporting data values are available in Additional File 3: File S1. Abbreviations: A, acrorhagi; C, club-tips; T, tentacle; P, actinopharynx; M, mesenterial filaments; B, body column; PD, pedal disc

Co-expression toxin module membership of all transcripts with significant BLASTp matches to SA8 peptides was assessed using weighted gene co-expression network analysis (WGCNA). In T. stephensoni, 11 transcripts (from eight genes) of the 15 transcripts (from ten SA8 genes) are assigned to two modules, each with a characteristic tissue-specific expression pattern (Fig. 4A); the turquoise module is characterised by upregulated expression in the tentacle tissue while the brown module is composed of putative toxins upregulated in column and pedal disc tissue. In A. tenebrosa SA8 transcripts are found in all four of the network modules identified (blue, brown, turquoise and yellow; Fig. 4B). Thus, the co-expression toxin modules associated with SA8 are defined by a specific expression pattern, and SA8 transcripts from A. tenebrosa are characterised by a greater range of expression profiles than those from T. stephensoni.

Fig. 4
figure 4

Cluster dendrogram of putative toxin transcripts from A T. stephensoni and B A. tenebrosa constructed using WGCNA. Each colour represents one module. A Most T. stephensoni SA8 sequences were assigned to the turquoise and brown modules: turquoise = higher expression in the tentacles; brown = higher expression in the epidermis. The inverted T. stephensoni SA8 sequence was assigned to the black module, which was not associated with a specific expression pattern. Other modules: blue = higher expression in the mesenterial filaments; green = higher expression in the epidermis and tentacles; red = higher expression in the actinopharynx; yellow = higher expression in the tentacles and mesenterial filaments. B A. tenebrosa SA8 sequences were assigned to all four modules: blue = higher expression in the acrorhagi; brown = higher expression in the acrorhagi and tentacles; yellow = higher expression in the tentacles; turquoise = no distinct expression pattern. Putative toxin transcripts not assigned to a module are shown in grey

Following gene duplication, duplicates can be translocated, pseudogenised or retained, with expression divergence often occurring in those duplicates that are retained [47, 52,53,54]. One of the two major models to explain the different functional roles of duplicated genes is subfunctionalisation, where the roles of the ancestral gene are partitioned between the duplicates [52, 55]. In venomous snakes, subfunctionalisation coincides with changes in the tissue distribution of toxin duplicates derived from salivary proteins [56]. By examining the tissue distribution of duplicated toxin gene families across venomous and non-venomous lineages, it was demonstrated that while toxin duplicates are restricted to the venom gland, their non-toxic counterparts continue to be expressed in the salivary glands of non-venomous reptiles [56]. Likewise, in the current study, we identified two major tissue-specific expression patterns for the SA8 family of T. stephensoni. Furthermore, evidence of subfunctionalisation was observed in A. tenebrosa with distinct tissue-specific expression patterns of duplicate SA8 genes, with subfunctionalisation of globin and toxin genes previously documented in this species [38, 57].

A gene inversion event leads to recruitment of a SA8 peptide into the venom of T. stephensoni

An inversion event was also identified within the tandemly duplicated SA8 gene cluster of T. stephensoni. This inverted SA8 gene contains an N-terminal addition (YGKGR) not observed in other clustered SA8 genes. Like other T. stephensoni SA8 genes, the inverted SA8 gene is upregulated in the epidermis; however, it is unique in that elevated expression can also be observed in gastrodermis structures, specifically the mesenterial filaments. Furthermore, the transcript corresponding to the inverted SA8 gene is found in a separate co-expression module (black) to other T. stephensoni SA8 peptides. In addition, two peptides (TR19056) from the milked venom proteome of T. stephensoni [58] — with similarity to previously identified A. viridis SA8 peptides [29] — were mapped back to the inverted SA8 gene (Additional file 4: Figure S1A). These peptides represent alternate splice isoforms of the inverted SA8 gene, with different transcriptional start sites that maintain the open reading frame and have identical mature peptides.

Mass spectrometry imaging (MSI) indicates that the peptide corresponding to the calculated mass of the mature T. stephensoni venom SA8 peptide is abundant within internal structures located adjacent to the pedal disc (Fig. 5). While this may reflect the localisation of this peptide within the pedal disc and distal mesenterial filaments, the spatial distribution pattern of the venom SA8 peptide is consistent with the principal component corresponding to peptides with higher intensity in the acontia (defensive thread-like structures formed by the free end of mesenterial filaments) [58,59,60]. Previous research provided evidence for minimal transcriptional activity within T. stephensoni acontia, with maturation of nematocysts occurring in proximity to the acontia-mesenterial filament junction [58, 59, 61], which may explain why the venom SA8 peptide is associated with high gene expression in the mesenterial filaments. Furthermore, a SA8 peptide was recently discovered in the acontia proteome of C. polypus [62]. Overall, MSI data provide evidence for acontial, mesenterial and pedal disc localisation of the venom SA8 peptide in T. stephensoni and are consistent with the higher expression of the inverted SA8 gene in the gastrodermis and epidermis.

Fig. 5
figure 5

Peptide corresponding to the calculated mass of the T. stephensoni venom SA8 peptide is abundant in acontia, gastrodermis and epidermis. A Haematoxylin and eosin stain of T. stephensoni section. B Putative peptide observed at 5409 m/z with the highest abundance in internal structures adjacent to the mesenterial filaments (m) and pedal disc (pd), which corresponds to the location of acontial filaments

As none of the peptides identified in the milked venom of A. tenebrosa had significant similarity to SA8 sequences, the SA8 sequences in this species are probably non-venom peptides. This suggests that the inversion event, in concert with tandem duplication, may have facilitated the recruitment of this SA8 peptide into the venom of T. stephensoni, with SA8 only previously found in the acontial proteome of C. polypus [62]. Furthermore, the inverted gene is found in a well-supported clade on a branch sister to a Metridioidea-specific clade (Fig. 6), indicating that the recruitment of this orthologue into the venom may have only occurred in the Telmatactis genus or related acontiarian sea anemones.

Fig. 6
figure 6

Phylogenetic relationships among 71 sea anemone sequences from the SA8 gene family were inferred under maximum likelihood in IQ-TREE. The inverted SA8 gene from T. stephensoni is found in a well-supported clade (support value > 80) on a branch sister to a Metridioidea-specific clade (purple). Support values are from 1000 ultrafast bootstrap replicates

Neofunctionalisation occurs when the duplicated gene assumes a role distinct from that of the original gene, and it is the second major model used to account for functional divergence following gene duplication [54, 55]. Although considered a rare event [56], neofunctionalisation has occurred in multiple snake venom gene families [41, 63,64,65,66,67]. For example, in the snake-venom PLA2 gene family, duplication and subsequent inversion of a basic PLA2 gene resulted in the first acidic-PLA2 gene, and thus this combination of events can represent a scenario that enables neofunctionalisation [41]. We propose an inversion event and subsequent escape from a gene regulatory network that has allowed the venom T. stephensoni SA8 gene to undergo neofunctionalization and a novel gene expression program. This gene also possesses a unique N-terminus, not observed in the other clustered SA8 genes in T. stephensoni.

The SA8 family is distinct from the ShK and possesses a unique disulphide network

All SA8 putative toxins in T. stephensoni had a cysteine spacing consistent with motif 3, described by Kozlov and Grishin [29], which has previously been observed in SA8 putative toxins from A. viridis (P0DMZ3, P0DMZ4, P0DMZ5, P0DMZ6, P0DMZ7), sea anemone type 1 potassium channel toxins with a ShKT domain (P29187, P29186, P81897, Q9TW91) [29] and cysteine-rich domain (CRD) of snake venom cysteine-rich secretory proteins (CRISPs) [68]. Sequence space analysis [27] revealed that ShK-like proteins, CRISPs and SA8 putative toxins are clearly delineated into three groups (Fig. 7A), reinforcing the unique sequence composition of SA8 putative toxins and indicating that they represent a single distinct protein family.

Fig. 7
figure 7

Sequence space of the SA8 family (A) and disulphide connectivities of venom SA8 (B) and ShK (C). A The SA8 family (yellow) forms a distinct cluster from the ShKT domains of ShK-like (red) and CRISP (black) peptides [27]. B The C1–C5, C2–C6, C3–C4 disulphide connectivity of the mature venom SA8 peptide from T. stephensoni. C The C1–C6, C2–C4, C3–C5 disulphide connectivity of ShK toxin from S. helianthus

All SA8 peptides are characterised by a conserved glycine residue and FA dyad (Additional file 4: Figure S1A). This dyad has not been reported in other toxin groups and further research is required to determine its functional significance. Few SA8 peptides contain the KY dyad characteristic of ShK, although most contain a KY dyad alternative (QY, KW, TY, RF); however, the two protein families possess distinct sequence patterns and exon-structure. SA8 peptides contain more residues between the pro-cleavage site and final cysteine, resulting in larger proteins, but only ShK-like sequences have an extension at the C terminus. SA8 genes are consistently composed of 3–5 microexons (Additional file 4: Figure S1B), while ShK genes are composed of 2–3 exons. Additionally, large introns are present in all SA8 genes, but particularly so in A. tenebrosa, with a 17-kb intron identified in this species. Therefore, although ShK-like and SA8 peptides have a similar cysteine spacing pattern, these two families are resolved into discrete groups when examining their other characteristics.

To examine the structure of the SA8 peptide present in milked venom and determine if it is similar to that of ShK, we expressed this venom peptide and characterised it using liquid chromatography-tandem mass spectrometry (LC–MS/MS) and nuclear magnetic resonance (NMR) spectroscopy. The His6 -MBP-SA8 (polyhistidine; maltose binding protein) fusion protein was the dominant protein expressed in transformed E. coli cells following Isopropyl β-D-1-thiogalactopyranoside (IPTG) induction. SA8 was liberated from the 50.5 kDa fusion protein using tobacco etch virus (TEV) protease (Additional file 5: Figure S2), then purified by affinity chromatography followed by reversed-phase high-performance liquid chromatography (RP-HPLC) (Additional file 6: Figure S3A). The liquid chromatography-mass spectrometry (LC–MS) profile of the fraction corresponding to SA8 showed a pure peak whose molecular mass (MH+ m/z = 5438 Da) was in good agreement with the expected mass (theoretical MH+ m/z = 5438.23 Da) (Additional file 6: Figure S3B). The recombinant SA8 co-elutes on RP-HPLC with native SA8, with only a very slight difference in elution time (11.44 min and 11.38 min, respectively) due to the presence of an additional C-terminal glycine on the recombinant peptide (Additional file 7: Figure S4). Furthermore, the observed mass of the recombinant SA8 was 6 Da less than that of the reduced form of the peptide (MH+ m/z = 5444.23 Da), indicating formation of three disulphide bonds. LC–MS/MS analysis of pepsin-digested SA8 revealed the disulphide-bond connectivity to be C1–C5, C2–C6, and C3–C4 (Fig. 7B, Additional file 8: Table S3). This pattern differs from the ShK disulphide framework (C1–C6, C2–C4, C3–C5) (Fig. 7B), supporting the hypothesis that SA8 is not a simple extension of the ShK fold.

A one-dimensional (1D) 1H NMR spectrum of recombinant SA8 acquired at pH 3.5 and 25 °C revealed poor dispersion of amide-proton resonances region (Additional file 9: Figure S5). Subsequently, several 1D spectra were recorded at temperatures over the range 5–30 °C in 5 °C increments and pH values 2–7 (Additional file 10: Figure S6 and Additional file 11: Figure S7), but no significant improvement in peak dispersion was observed. Moreover, a sample of the recombinant peptide that was reduced and unfolded, then oxidatively refolded in vitro gave the same spectral dispersion as the recombinant peptide, indicating that we are dealing with the most stable fold of this sequence (Additional file 12: Figure S8). Thus, the NMR data indicate that the T. stephensoni venom SA8 peptide adopts a partially disordered structure in solution, unlike ShK which has a well-defined structure [18]. Overall, we report the SA8 family is associated with unique disulphide-bond connectivities, exon–intron structure, and sequence spaces that distinguish it from ShK.

Functional activity of SA8 peptides remains unknown

ShK is a 35-residue peptide isolated from the sea anemone S. helianthus [14] that potently inhibits KV1 channels through a conserved KY functional dyad [69,70,71]. The presence of KY dyad alternatives in multiple SA8 sequences suggests this putative toxin family may have some activity against K+ channels [72]. First, we tested the effects of 100 nM recombinant SA8 on the following human KV channels: KV1.1, KV1.2, and KV1.3. Whole-cell KV1.2 currents were recorded sequentially on the same transiently transfected Chinese hamster ovary (CHO) cell, before (control trace, black) and after perfusion of the cell with 100 nM SA8 (red trace) (Fig. 8A). At the equilibrium block, SA8 caused ~ 10% reduction in current amplitude. This weak block was almost fully reversible upon washing the perfusion chamber with inhibitor-free solution. Both the association and dissociation rates of SA8 were rapid; development of equilibrium block and recovery up to ~ 85% of control current took ~ 1 min. In contrast, 100 nM SA8 did not inhibit KV1.1 or KV1.3 (Additional file 13: Figure S9A and B). We performed a concentration–response experiment for the inhibition of KV1.2 channels by SA8 (Fig. 8B) and determined the reciprocal of the slope of the fitted line yielded an IC50 of 40.8 ± 4.9 µM (n = 4).

Fig. 8
figure 8

SA8-induced inhibition of hKV1.2 and selectivity profile of SA8. A Representative whole-cell hKV1.2 current traces were recorded using the voltage protocol shown above the raw current traces. Depolarizing pulses were repeated every 15 s, selected traces are shown in the absence of blockers (black, control) and upon reaching equilibrium block in the presence of 100 nM SA8 (red) or 14 nM charybdotoxin (ChTx, green, positive control). B Low affinity, concentration-dependent block of hKV1.2 channels by SA8 was determined by fitting a straight line to the reciprocal of the remaining current fraction (1/RCF) plotted as a function of SA8 concentration. The remaining current fraction (RCF) was calculated as I/I0, where I0 is the peak current in the absence and I is the peak current at the equilibrium block in the presence of SA8 at concentrations of 0.01, 0.1, 1, and 10 µM (filled circles). Points on the linear concentration–response curve represent the mean of four independent measurements where the error bars represent the standard error of mean (SEM). The line was drawn using linear least squares fit and the reciprocal of the slope of the best fit yielded an IC50 of 40.8 ± 4.9 µM. C The effect of SA8 (100 nM, except for KV10.1 and NaV1.7 which were tested at 1 µM) on the peak currents was reported as the RCF. Bars represent the mean of 3–6 independent measurements; error bars indicate the SEM. Data are shown for the following channels: HkV1.1 (n = 4), hKV1.2 (n = 4), hKV1.3 (n = 3), hKV10.1 (n = 4), hKV11.1 (n = 5), mKCa1.1 (n = 4), hKCa3.1 (n = 6), hTRPA1 (n = 5), hTRPV1 (n = 5), and hNaV1.7 (n = 5) (for details of the expression systems, solutions, and voltage protocols, see Materials and Methods, and for raw current traces see Additional File 13: Figure S9). SA8 did not inhibit any of the investigated channels at the applied concentrations. Supporting data values are available in Additional File 3: File S1

To reveal the selectivity profile of SA8 we assayed its effect on other K+ channels as well. We tested its activity on KV10.1 and KV11.1 due to their slightly different structure compared to KV1.x channels [73,74,75]. We also tested the effect of SA8 on KCa1.1, the large conductance voltage- and Ca2+-activated channel, and KCa3.1, a Ca2+-activated K+ channel. Moreover, as SA8 is suggested to be a defensive peptide and possibly involved in painful stings, channels involved in pain recognition were also tested including the voltage-gated sodium channel NaV1.7, and the transient receptor potential channels TRPA1 and TRPV1. We used positive controls (2 µM astemizole for KV10.1, 10 mM tetraethylammonium (TEA+) for KCa1.1, 20 nM TRAM-34 for KCa3.1, 50 µm HC-030031 for TRPA1, and 50 µm capsazepine for TRPV1) diluted freshly in extracellular solution (ECS) in order to confirm ion channel expression and proper operation of the perfusion system. We found that most of the channels shown in Fig. 8 and Additional file 13: Figure S9 are not inhibited significantly by SA8 at the indicated concentrations except for the modest block of the hKv1.2 current.

Injection of D. melanogaster with recombinant SA8 at a dose of 0.5 µg per fly did not result in paralysis or death within 24 h. In contrast, neurotoxic spider-venom peptides with insecticidal activity cause paralysis and death at dose of 0.005 µg per fly in D. melanogaster [76]. Thus, the recombinant SA8 peptide may not exert any toxic effects in insects and only causes a weak inhibition of human Kv1.2 channels. However, the presence of this peptide in the venom of T. stephensoni indicates that it probably has at least an auxiliary function in venom, and the abundance of the venom SA8 peptide within structures with defensive roles (acontia and epidermis), suggests that one potential role of this putative peptide may be targeting the neuronal channels of vertebrate predators in order to fulfil the ecological function of predator deterrence [58]. Further functional testing is required to determine the biological activity of this venom.

Conclusions

Based on sequence, structural and genomic data, we find that SA8 putative toxins are a distinct family that has only superficial similarities to ShK. Thus, characterising the SA8 family offers an opportunity to explore a novel group of putative toxins that may possess pharmacological properties that make them useful as pharmacological tools or therapeutic leads.

We report that the SA8 family is expanded in T. stephensoni but not A. tenebrosa relative to other actiniarian species. The two genomic locations of SA8 genes in T. stephensoni form two distinct collinear blocks with two of the four loci of SA8 genes in A. tenebrosa, indicating that segmental duplication may have played a role in evolution of the SA8 family. However, the expansion of clustered SA8 genes in T. stephensoni appears to be the consequence of proximal and tandem duplication events.

While members of this family are expressed in the neuronal cells of N. vectensis, a SA8 peptide has been recruited to the venom of T. stephensoni following an inversion event. This inverted SA8 gene is distinct from other members of the SA8 family across multiple levels and represents a potential neofunctionalisation event in the SA8 gene family. In contrast, the absence of non-inverted SA8 peptides from the venom of T. stephensoni and A. tenebrosa, as well as the previously reported neuronal localisation of SA8-like peptides suggests that other SA8 genes may encode non-venom neuropeptides.

Furthermore, while we have not yet determined the functional activity of venom SA8 peptides, we have shown that they are structurally distinct from ShK toxins and possess a unique disulphide bond pattern. We propose that the inverted T. stephensoni SA8 peptide may have a role in defence against predators, as either a toxin or an auxiliary venom protein, based on its distribution across the body plan.

Additional testing will be required to determine the exact function of the inverted SA8 and the SA8 family in general, but given the findings of the current study, the genetic origins and diversification of this putative toxin family should be investigated in other sea anemones. Future research should determine whether the patterns we report are conserved across Actinioidea and Metridioidea, particularly whether venom SA8 peptides are only found when there is an inversion within a cluster of tandemly duplicated genes.

Methods

Animal care

T. stephensoni were sourced by Cairns Marine Pty Ltd from the Great Barrier Reef (QLD, Australia) while A. tenebrosa were collected from the intertidal zone at Point Cartwright (QLD, Australia). Both species were housed in holding tanks at the Queensland University of Technology until experimental use. Tank salinity and temperature were maintained at 33–37 ppt and 20–28 °C, respectively.

Genome sequencing

Short-read DNA sequencing

DNA was extracted from A. tenebrosa using the E.Z.N.A Mollusc DNA Kit (Omega Bio-Tek). Tentacle tissue was homogenised in a tube containing ML1 buffer (Omega Bio-Tek) and a stainless-steel ball bearing (Qiagen) using a Tissuelyser II (Qiagen), before the remaining steps were performed following the manufacturer’s protocol. To extract DNA from T. stephensoni, tentacle tissue was homogenised in liquid nitrogen with a mortar and pestle. The sample was then processed following the QIAGEN Genomic-tip procedure. DNA quantity was determined using a Qubit 4 fluorometer. Genomic DNA for both A. tenebrosa and T. stephensoni was sequenced using 150-bp paired-end chemistry on an Illumina HiSeq 2500 and X10, respectively.

PacBio long-read sequencing

For A. tenebrosa, high molecular weight (HMW) DNA was extracted using two rounds of the method described for short-read DNA-seq from the same individual. HMW DNA was extracted from T. stephensoni according to the 10X Genomics salting-out protocol for DNA extraction from single insects [77]. In order to extract HMW DNA, minor modifications were made to this protocol, including the use of wide-bore pipette tips, low bind tubes and a gentle bead clean-up. DNA molecular weight was assessed using PippinPulse pulsed-field gel electrophoresis, while purity and quantity were determined using a Qubit fluorometer. Samples of sufficient purity (260:280 ratio of 1.8–1.9 and 260:230 ratio of 1.6–2.0) and quantity were sequenced.

Five micrograms of genomic DNA from each species was prepared using needle shearing and the BluePippin (SageScience) size-selection system. The SMRTBell® Template Prep Kit 1.0 (PacBio) was then used to prepare 14-kilobase libraries. Sequencing was undertaken on four sequencing chips using a 16-h run time on a PacBio Sequel®.

Hi-C sequencing

Hi-C sequencing was performed to generate a chromosomal-level assembly for A. tenebrosa. Tentacle tissue from A. tenebrosa was crosslinked using 1% formaldehyde. Chromatin extraction, library preparation and sequencing were performed by Phase Genomics (Seattle, WA).

Genome assembly and polishing

PacBio reads from A. tenebrosa and T. stephensoni were assembled using SMART denovo [78] and HGAP4 (Pacific Biosciences, SMART Link), respectively. Assemblies were polished using raw PacBio reads and a custom Arrow pipeline [79]. Four iterations of Arrow polishing were carried out for A. tenebrosa, while three iterations were conducted for T. stephensoni. Subsequently, both assemblies underwent two rounds of polishing with a custom Pilon pipeline [80] using Illumina short reads. Indel errors were removed from the polished A. tenebrosa genome assembly with deGRIT [81].

Redundancy removal for A. tenebrosa was performed using the Purge Haplotigs pipeline [82]. This pipeline involved the alignment of raw reads to the genome using minimap2 [83] and sorting using samtools [84]. Repeat regions were accounted for by providing Purge Haplotigs the results of our repeat masking (see the “Repeat library generation” section) converted to BED file format with a custom script. Program parameters were set by manual inspection of the histogram produced by the pipeline (low: 20, mid: 75, high: 170).

Finally, proximity-guided assembly was carried out on our A. tenebrosa assembly by Phase Genomics (Seattle, WA) using the Proximo platform. The quality of genome assemblies was assessed using BUSCO [85] and a custom script [86].

Repeat library generation

A custom repeat library (CRL) was generated using homology and structure-based prediction in addition to de novo repeat prediction. MITEs were predicted using MITE-HUNTER v.11–2011 [87] and detectMITE v.20170425 [88]. The output MITE models of both programs were clustered using CD-HIT v.4.6.4 [89] with the same parameters employed by the detectMITE program to produce a non-redundant set of MITE models (cd-hit-est -c 0.8 -s 0.8 -aL 0.99 -n 5). In addition, a structure-based prediction of long terminal repeat retrotransposons (LTR-RTs) was performed using LTRharvest (GT 1.5.10) [90] and LTR_FINDER v.1.06 [91] following the recommended protocol indicated by LTR_retriever commit 8180c24 [92] to identify canonical and non-canonical (i.e. non-TGCA motif) LTR-RTs. The MITE and LTR-RT libraries were used to mask the genome assembly using RepeatMasker open-4.0.7 [93] with settings ‘-e ncbi -nolow -no_is -norna’. After this homology and structure-based modelling, de novo repeat prediction was performed using RepeatModeler open-1.0.11 [94] with the masked genome as input.

All repeat models predicted by the aforementioned programs were then curated to remove models that were potentially part of genuine protein-coding genic regions. This process first involved the removal of any models that were confidently annotated by either LTR_retriever or by RepeatModeler (i.e. were not classified as “Unknown”) from consideration as these were assumed to be true positives. The remaining nucleotide repeat models were translated into six reading frames and were searched using HMMER 3.1b2 [95] for a list of domain models associated with transposable elements (TEs). This list was generated by adding Pfam [96] and NCBI CDD [97] domains to a list of domains identified by a previous study [98]. The added Pfam domains were based upon visual inspection of domain prediction results for putative transposable elements and comparison to the “Domain organisation” graphics provided by the Pfam website to find models that were not likely to appear in non-TEs. The CDD domains were part of the cl02808 RT_like superfamily. Any repeat models that obtained a TE-associated domain prediction were assumed to be true positives and were removed from consideration.

To remove repeat models that may be part of genuine protein-coding genes, we generated a database of known genes. This database consisted of UniProtKB/Swiss-Prot proteins (version 10/26/17) [99] in conjunction with the gene models of N. vectensis (v.2.0) [100], E. diaphana (v.1.1) [35], Acropora digitifera (v.0.9) [101], and Hydra vulgaris (NCBI annotation release 102, June 8, 2015) [102]. Putative transposons were removed in this database by the same process detailed above using HMMER 3.1b2 and the list of TE-associated domains (i.e. any sequences with TE-associated domain hits were removed). Repeat models that remained after this curation process were removed from the initial CRL if they obtained a BLASTx result with E-value more significant than 1E−2 when submitted as a query against the gene model database. This process resulted in a high-quality CRL which was used to soft-mask the A. tenebrosa genome using RepeatMasker (-e ncbi -s -nolow -no_is -norna -xsmall) for subsequent gene model prediction. Scripts were produced to automate this process [103].

Gene model prediction

Gene model prediction was performed using two complementary approaches: transcriptome-based gene model creation and ab initio gene model prediction.

RNA-seq read quality control and mapping

RNA-seq reads generated from previous studies involving A. tenebrosa [104] and T. stephensoni [58] were used to assist in gene model prediction. Raw reads were quality trimmed using Trimmomatic [105] with parameters “ILLUMINACLIP:$TRIMDIR/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25” based on those used by the Trinity de novo assembler [106, 107]. Trimmed sequences were aligned to each genome file using STAR 2.5.4b (commit 5dbd58c) using the 2-pass procedure to assist in the identification of intron splice sites. The resulting SAM files were converted into sorted BAM files using samtools [84].

Transcriptome-based gene model prediction

Multiple transcriptome building programs were used to create a ‘master’ transcriptome file for subsequent curation using the EvidentialGene tr2aacds pipeline [108] following the parameters and methodology of Visser et al. [109]. This involved de novo transcriptome assembly using SOAPdenovo-Trans v.1.03[110] and Velvet/Oases [111, 112] with multiple k-mer lengths (23, 25, 31, 39, 47, 55, and 63 for both plus 71 for SOAPdenovo-Trans only) alongside Trinity v.2.5.1 [106]. We differed from Visser et al. [109] from here on. For Trinity, we modified the parameters of Visser et al. [109] to not specify min_contig_length and we set min_kmer_cov = 2 and SS_lib_type = RF. All resultant de novo assemblies had sequences shorter than length of 350 bp removed with a custom script. Genome-guided transcriptomes were constructed for each genome assembly using Trinity and Scallop v.0.10.2 [113] with the results of the STAR spliced alignment. For Trinity, we specified genome_guided_max_intron = 21,000 to reduce false positives. This number was based on preliminary analysis of an Illumina-based A. tenebrosa assembly [38] and the E. diaphana genome for which, of 152,518 total introns, only 51 are > 21 kb (0.03%) in the v.1.1 genome.gff. Scallop was run with default settings except where library_type = first. No minimum size cut-off was enforced for genome-guided transcriptomes as we assumed any short predictions would likely be more accurate than those built from de novo assembly. All transcriptomes were then concatenated and subjected to the EvidentialGene tr2aacds pipeline which rendered a master transcriptome consisting of non-redundant genes including alternative isoforms.

PASA v.2.2.0 [114] (commit af03820) was used to align the master transcriptome against the assembled genomes (using BLAT v.35 [115] and GMAP [116]) and to generate gene model predictions using default recommended parameters (i.e. transcriptome files were ‘cleaned’ using the seqclean tool, and validate_alignments_in_db.dbi were provided arguments MIN_PERCENT_ALIGNED = 75, MIN_AVG_PER_ID = 95, and NUM_BP_PERFECT_SPLICE_BOUNDARY = 0).

BRAKER1 ab initio gene model prediction

Gene models were predicted by BRAKER v.2.0.6 [117] using the repeat library soft-masked genome assembly (–softmasking) and the BAM file produced by STAR for use as ‘hints’ during Augustus v.3.2.3 [118] gene prediction.

Combined gene model prediction

EvidenceModeler [119] was used to combine transcriptome-based gene model predictions with BRAKER1-derived ab initio gene models with parameters set to favour transcriptome models above ab initio models (Augustus model weight = 1, Transcript-based model weight = 10). Following this, a custom pipeline to curate gene models was generated [120]. This pipeline finds additional gene copies missed by prior annotation steps and removes spurious models derived from transposable elements and from ribosomal RNA.

Phylogenetic and comparative genomic analysis

Genomic data resources

In addition to the two genomes reported herein, publicly available genomic resources for sea anemones were obtained for comparative genomic analyses. These included a chromosome-level N.vectensis genome (Nvec200) [37], the E. diaphana genome (GCF_001417965.1) [35], and the P. xishaensis genome (version 3) [121]. The annotation of N. vectensis was modified by manually annotating a putative SA8 gene into its genome prior to further analysis.

Gene family expansion and contraction

Gene families were predicted for the five sea anemone species using OrthoFinder (version 2.5.4) [122, 123]. Single copy genes were aligned with MAFFT (version 7.311) [124] and used as input to IQ-TREE 2 [125] with an estimated divergence date of 500 mya for Actinia and Nematostella. Gene family counts and our time-calibrated tree were used as input to CAFE 5 [126] to identify gene families with expansion/contracted predicted to a P-value threshold of 0.05 (fixed orthogroups).

Enrichment analysis

Gene ontologies (GOs) [127] and PFAMs [128] were predicted for each of the five sea anemone species’ representative gene models using custom scripts [129]. In short, Mmseqs2 [130] was used to query gene models against the UniRef90 database [131] and sequence matches had associated GO terms obtained from the id_mapping.tab file provided by the UniProt Knowledgebase [99]. PFAM 35.0 domain models were downloaded, and sequences were searched for these domains using HMMER3 [95]. Chi-squared tests were used to identify annotation terms enriched in expanded gene families; these tests were performed using a custom script [132]

Phylogenetic analysis

Relationships among 71 sea anemone full-length sequences (alignment length 121 amino acid residues) from the SA8 gene family were inferred under maximum likelihood (ML) in IQ-TREE v2.1.3[125]. The best fitting model of amino acid evolution was identified by ModelFinder within IQ-TREE as LG [133], with rate variation among sites accommodated by a proportion of invariant sites and a gamma distribution. This LG + I + G model was favoured by both BIC and corrected AIC (AICc). Default tree search parameters were employed within IQ-TREE and following reconstruction of the ML tree, clade support values were evaluated with 1000 ultrafast bootstrap samples.

Analysis of genome collinearity and classification of gene duplications

To identify similar proteins in inter-genome and self-genome comparisons, homology of amino acid sequences between the two species and within a single genome was examined using BLASTp (1xE−5). These BLASTp alignments were leveraged by MCScanX (default parameters) to identify collinear blocks. Additionally, the duplicate_gene_classifier program from the MCScanX package was used to classify the origins of duplicate genes in the genomes of A. tenebrosa and T. stephensoni, independently.

Expression analysis of SA8 genes and identification of toxin-like transcripts

To examine the expression patterns of SA8 genes, we made use of tissue-specific reads for three individuals of A. tenebrosa [33] and T. stephensoni [58]. Transcriptome assembly and differential gene analysis were conducted using the Trinity short-read de novo assembler v2.5.1 [134] and edgeR Bioconductor package v3.3.1 [135], respectively, as described in Ashwood et al. [58]. Similarly, Trinotate v3.1.1 was used for functional annotation as per Ashwood et al. [58].

Transcripts with homology to known animal toxins were identified from BLASTp searches against the UniProt database by filtering for transcripts annotated with the “toxin activity” GO term (GO:0,090,729), excluding bacterial toxins. Consequently, WGCNA one-step network construction and module detection (power = 7, mergeCutHeight = 0.25, minModuleSize = 10, deepSplit = 3) was used to identify correlation networks of toxin-like transcripts present in T. stephensoni and A. tenebrosa [136].

Milked venom peptide identification and mass spectrometry imaging

Peptides present in the milked venom of T. stephensoni and A. tenebrosa were previously identified by Ashwood et al. [58] and Surm et al. [33], respectively. Using these signal peptide-containing milked venom peptides as queries, homology to transcriptomic sequence and ToxProt entries was determined using tBLASTn and BLASTp, respectively. Significant BLAST hits were defined by a threshold e-value of 1 × E−5. Similarly, the mass spectrometry imaging (MSI) data for T. stephensoni utilised in the current study were generated as detailed previously [58]. The spatial distribution of peptides of interest was determined from the MSI dataset using SciLS Lab.

Chemical/sequence space analysis

A database of 102 SA8 peptides was generated using SA8 sequences identified from 14 actiniarian species spanning three superfamilies [33], as well as the SA8 translated ORFs of T. stephensoni from the current study. We analysed these SA8 sequences by position-specific biophysical property distance analyses, or sequence-space analyses [137]. Briefly, the multiple sequence alignment was converted to a matrix based on the physiochemical properties of the amino acids at each position in the alignment (gaps assigned column average values for each property), before dimensionality reduction (by principal component analysis) and group assignment (by Bayesian model based clustering) [137]. Additionally, SA8 sequences were aligned against ShK-like proteins (PF01549) and the homologous C-terminal regions of CRISPs (PF08562) available in the Pfam database [96], allowing SA8 putative toxins to be visualised in a sequence space with ShKT-domain containing sequences. Data were visualised with custom [R] scripts based on rgl [138]. WGCNA modules were mapped to the clusters identified in the sequence space of mature SA8 peptides, and the correlation between the two categorical variables was assessed using the GoodmanKruskal package [139].

Peptide synthesis and structure characterisation

Design of vector used for expressing SA8

A nucleotide sequence encoding the mature peptide sequence of SA8, containing codons optimised for expression in Escherichia coli, was synthesised by GenScript (Piscataway, NJ, USA). The sequence was subcloned into a modified pET21a vector (Novagen, Madison, WI, USA) using BamHI and EcoR1 restriction sites (Fig. 9). This modified vector encodes a MalE signal sequence (for targeting the fusion protein to the E. coli periplasm), a His6 affinity tag (for nickel affinity chromatography used for peptide purification), MBP fusion tag (to enhance the solubility of the expressed peptide), and a TEV protease recognition site (ENLYFQ) immediately preceding the coding region for SA8.

Fig. 9
figure 9

Schematic of the expression vector used for expression of SA8 in the periplasm of E. coli. Abbreviations: RBS, ribosome binding site; MalESS, MalE signal sequence; His6, poly-histidine affinity tag; TEV, TEV protease recognition site; SA8, sea anemone 8

Recombinant expression of SA8

The pET21a-SA8 plasmid was transformed into E. coli strain BL21 (DE3; Thermo Fisher) using a standard heat shock protocol [140]. Single colonies were inoculated into a 5-mL culture medium (Luria–Bertani (LB) plus 100 μg/mL of ampicillin) and grown overnight at 37 °C with shaking at 150 rpm. The overnight starter culture was used to inoculate LB medium containing 100 μg/mL of ampicillin, then the culture was grown until it attained an optical density at 600 nm (OD600) of 0.6–0.8. IPTG was added to a final concentration of 0.5 mM to induce expression, and the culture was grown overnight at room temperature with shaking at 180 rpm. Induced cells were harvested by centrifugation for 15 min at 2408 g and 4 °C, and the cell pellets were stored at − 80 °C.

Peptide purification

The His6-MBP-SA8 fusion protein was recovered from the bacterial periplasm by osmotic shock [141]. Cell pellets were thawed on ice to avoid lysis, resuspended in 30 mL 100 mM Tris, 30% sucrose, 2 mM EDTA pH 8.0, and incubated for 10 min at 4 °C. The suspension was centrifuged for 15 min at 4 °C and 5251 g. The pellets were resuspended in 30 mL of ice-cold water and 5 mM MgCl2, then incubated for 10 min at 4 °C before centrifugation for 15 min at 4 °C and 21,002 g. The supernatant containing soluble His6-MBP-SA8 fusion protein was diluted in 20 mM Tris, 150 mM NaCl, 5% glycerol pH 7.5 (TNG buffer) and incubated for 30 min with 5 mL Ni-nitrilotriacetic acid (Ni–NTA) Fast Flow resin (GE Healthcare, Uppsala, Sweden) in a gravity-fed column to capture the fusion protein via its affinity to Ni–NTA. The resin was washed with 25 mM imidazole to remove weakly bound proteins. The His6-MBP-SA8 fusion protein was then eluted with 200 mM imidazole, and then it was concentrated by centrifugal filtration (Amicon® Ultra, Millipore) and desalted using a PD-10 column (GE Healthcare, Pittsburgh, PA, USA) to remove imidazole. The fusion protein was incubated with His6-TEV protease (produced in-house; 40 μg per mg of fusion protein) for 16 h with shaking at 100 rpm and 30 °C. The cleavage mixture was then passed over Ni–NTA Fast Flow resin to remove His6-MBP and His6-TEV protease, and the eluate containing the liberated SA8 was collected for further purification using RP-HPLC. RP-HPLC was performed on a Vydac C18 column (250 × 10 mm) using a flow rate of 1 mL/min and a gradient of 30–50% solvent B (0.1% trifluoroacetic acid (TFA) in 90% acetonitrile) in solvent A (0.1% TFA in water) over 40 min with absorbance monitored at 214 nm. Fractions containing SA8 were collected and assessed by LC–MS. Fractions containing SA8 (> 99%) were lyophilised for NMR studies, mapping disulphide connectivity, and functional assays.

NMR spectroscopy and disulphide connectivity

A SA8 sample for NMR experiments was prepared by dissolving lyophilised peptide in 90% H2O and 10% 2H2O and adjusting the pH by addition of 0.1 N HCl and 0.1 M NaOH. NMR spectra were acquired on a 600-MHz Bruker NMR spectrometer (Billerica, MA) equipped with a cryogenically cooled TCI probe. One-dimensional 1H NMR spectra were acquired over the temperature range 5–30 °C in increments of 5 °C to investigate the effects of temperature on the spectrum before selecting conditions for structure determination. Bruker TopSpin, USA, version 3.6.1, was used for processing all spectra.

The disulphide connectivity was determined by analysing pepsin-digested recombinant, oxidised SA8 peptide using LC–MS/MS. 5 µg SA8 was dissolved in 10 µL 10 mM hydrochloric acid and digested with 20 ng/µL pepsin (Merck, cat. No. 10108057001) at 37 °C for 6 h. The digested sample was diluted with formic acid (FA) to yield a final concentration of 0.3 ng/µL in 15 µL 0.5% FA, from which 1 µL was analysed by LC–MS/MS. The sample was separated with a nanoElute nano-HPLC (Bruker, Germany) across a Bruker TEN C18 nano separation column (100 mm length, 75 µm inner diameter, 1.9 µm particle size, 120 Å pore size) heated to 50 °C, using a gradient of 5–30% solvent B (90% acetonitrile [ACN] in 0.1% formic acid [FA]) in solvent A (0.1% FA) over 18 min before a step gradient to 95% B for 18 min at a flow rate of 0.5 µL/min. The eluting peptides were analysed with an in-line Bruker timsTOF Pro operated in positive, data-dependent analysis, parallel accumulation-serial fragmentation (DDA-PASEF) mode with a cycle time of 0.5 s and an m/z range of 350–2200. The resulting MS and MS2 spectra were manually compared to a list of all possible peptic fragment precursors and their fragment ions estimated from all theoretical disulphide permutations. The disulphide bond connectivities identified in the recombinant peptide were confirmed with a sample of a natural peptide from T. stephensoni.

Co-elution of native and recombinant SA8

Crude extract containing the native SA8 peptide was isolated using a protocol based on that of Honma et al. [142]. Briefly, tissue from the body column of T. stephensoni was placed into tubes containing a single stainless-steel ball bearing (Qiagen) and homogenised using a Tissuelyser II (Qiagen). Subsequently, Milli-Q H2O was added to the tubes containing the macerate, which were then centrifuged for 15 min at 18,000 g, allowing the supernatant to be recovered. A second round of homogenisation and centrifugation was conducted after additional H2O was added to the tubes containing pelleted material. The supernatant from both rounds of extraction was combined, then flash-frozen in liquid nitrogen and lyophilized.

The toxin extract sample was diluted with 500 µL of Milli-Q water and centrifuged for 1 min at 17,680 g. The sample was loaded onto Sep-Pak C18 Vac cartridge/5000 mg (Waters) that was previously equilibrated with 1 mL methanol and 1 mL Milli-Q water. The sample was then washed with 500 µl of Milli-Q water and eluted with 750 µl of TFA/Milli-Q/acetonitrile (1:70:29). The eluate was evaporated under a constant stream of N2. The residue was reconstituted in Milli-Q water (80 µL) and analysed using LC–MS. An analytical Luna C8 LC column (100 mm × 2 mm) was utilised for co-elution of native and the recombinant peptides with a gradient of 0–60% buffer B (90% ACN, 0.1% trifluoroacetic acid [TFA]) in buffer A (0.1% TFA) and constant flow rate of 0.2 mL/min over 10 min.

In vitro refolding of reduced recombinant SA8 peptide

The recombinant SA8 peptide was reduced and unfolded in 10 mM dithiothreitol (DTT), 20 mM Tris at pH 8. Complete reduction occurred after approximately 30 min, as confirmed using LC–MS. Following reduction, a PD-10 column was used to desalt the sample and remove DTT. The reduced SA8 was then oxidised using 0.1 M NH4HCO3 buffer at pH 8, with complete oxidation occurring after 8 h, as verified using LC–MS. 1D 1H NMR spectra were acquired at pH 3.5 and 25 °C for the major re-oxidised product to check folding.

In vitro and in vivo functional testing

Patch clamp electrophysiology

CHO and human embryonic kidney (HEK 293) cells were grown in DMEM-high glucose supplemented with 10% FBS, 2 mM L-glutamine, 100 U/mL penicillin-g, and 100 μg/mL streptomycin (Invitrogen) at 37 °C in a 5% CO2 and 95% air humidified atmosphere. Cells were passaged twice per week following a 5-min incubation in PBS containing 0.2 g/L EDTA (Invitrogen, Carlsbad, CA).

CHO cells (ATCC) were transiently transfected with plasmids encoding hKV1.1, hKV1.2, hKV10.1, hKCa3.1, hTRPA1, or hNaV1.7 using Lipofectamine 2000 (Invitrogen, Carlsbad, CA) following the manufacturer’s protocol, then cultured under standard conditions. The cells were co-transfected with a plasmid encoding green fluorescent protein (GFP) at a molar ratio of 10:1 except for hKV1.1, hKV1.2, and hKCa3.1 for which GFP-tagged ion channel vectors were used. Currents were recorded 24–36 h after transfection. GFP-positive transfectants were identified with a Nikon Eclipse TS100 fluorescence microscope (Nikon, Tokyo, Japan) using bandpass filters of 455–495 nm and 515–555 nm for excitation and emission, respectively, and these cells were used for current recordings (> 70% success rate for transfection). HEK 293 cells stably expressing the hKv11.1 channel (hERG, hKCNH2 gene, a kind gift from H. Wulff, University of California, Davis, CA) and mKCa1.1 channel (BKCa, hKCNMA1, a kind gift from C. Beeton, Baylor College of Medicine, Houston, TX) were used. hTRPV1 channels were expressed in a stable manner in CHO cells. Transfected cells were washed twice with 2 mL of ECS (see below) and re-plated onto 35-mm polystyrene cell culture dishes (Cellstar, Greiner Bio-One). hKV1.3 currents were recorded from activated peripheral blood mononuclear cells (PBMCs, see below) 3–4 days after activation. Human veinous blood was obtained from anonymised healthy donors. PBMCs were isolated using Histopaque1077 (Sigma-Aldrich Hungary, Budapest, Hungary) density gradient centrifugation. Cells obtained were resuspended in RPMI 1640 medium containing 10% foetal calf serum (Sigma-Aldrich Hungary, Budapest, Hungary), 100 μg/mL penicillin, 100 μg/mL streptomycin, and 2 mM L-glutamine, seeded in a 24-well culture plate at a density of 5–6 × 105 cells/mL, and grown in a 5% CO2 incubator at 37 °C for 3–5 days. Phytohemagglutinin A (PHA, Sigma-Aldrich Hungary, Budapest, Hungary) was added to the medium at 10 μg/mL to amplify the KV1.3 expression. Cells were washed gently twice with 2 mL of ECS for the patch-clamp experiments.

Conventional whole-cell patch-clamp electrophysiology [143] was used to record ionic currents. Micropipettes were pulled in four stages using a Flaming Brown automatic pipette puller (Sutter Instruments, San Rafael, CA) from Borosilicate Standard Wall with Filament aluminium–silicate glass (Harvard Apparatus Co., Holliston, MA) with tip diameters between 0.5 and 1 μm and heat polished to a tip resistance of 2–8 MΩ. All measurements were carried out by using Axopatch 200B amplifier connected to a personal computer using Axon Digidata 1550A data acquisition hardware (Molecular Devices, Sunnyvale, CA). In general, the holding potential was − 120 mV. Records were discarded when a leak at the holding potential was > 10% of the peak current at the test potential. Series resistance compensation up to 70% was used to minimise voltage errors and achieve proper voltage-clamp conditions. Experiments were performed at room temperature (20–24 °C). Data were analysed using Prism 8 (GraphPad, CA, USA) and ClampFit 10.5 software (Molecular Devices Inc., Sunnyvale, CA). Before analysis, whole-cell current traces were corrected for ohmic leakage and digitally filtered with a three-point boxcar smoothing filter. For hKCa3.1, hTRPA1 and hTRPV1, the reversal potential for K+ was determined and only those currents were analysed for which the reversal potential fell into the range of the theoretical reversal potential ± 5 mV (− 86.5 ± 5 mV for KCa3.1, and 0 ± 5 mV for hTRPA1 and hTRPV1).

Solutions: All salts and components of the solutions were purchased from Sigma-Aldrich, Budapest, Hungary. For hKV1.1, hKV1.2, hKV1.3, hKV10.1, mKCa1.1, and hNaV1.7, the ECS contained 145 mM NaCl, 5 mM KCl, 2.5 mM CaCl2, 1 mM MgCl2, 10 mM HEPES, and 5.5 mM glucose (pH 7.35 with NaOH), while the intracellular (pipette) solution (ICS) contained 140 mM KF, 2 mM MgCl2, 1 mM CaCl2, 11 mM EGTA, and 10 mM HEPES (pH 7.22 with KOH). For recordings of hNav1.7 currents, the composition of ICS was 105 mM CsF, 10 mM NaCl, 10 mM EGTA, and 10 mM HEPES, (pH 7.2 with CsOH). For recordings of hKv11.1 currents, the ECS contained 140 mM choline-chloride, 5 mM KCl, 2 mM MgCl2, 2 mM CaCl2, 0.1 mM CdCl2, 20 mM glucose, and 10 mM HEPES (pH 7.35 with NaOH) and the ICS contained 140 mM KCl, 10 mM EGTA, 2 mM MgCl2, and 10 mM HEPES (pH 7.3 with KOH). hKCa3.1 currents were recorded with an ECS of the following composition: 160 mM L-aspartic acid sodium salt, 5 mM KCl, 2.5 mM CaCl2, 1.0 mM MgCl2, 5,5 mM glucose and 10 mM HEPES (pH 7.4 with NaOH), while the composition of ICS was 150 mM L-aspartic acid potassium salt, 5 mM HEPES, 10 mM EGTA, 8.7 mM CaCl2, and 2 mM MgCl2 (pH 7.22 with KOH) giving ~ 1.2 µM free Ca2+ to fully activate the KCa3.1 current [144]. For recordings of TRPA1 and TRPV1 currents, the ECS and ICS contained 150 mM NaCl, 2 mM Na-EDTA, and 10 mM HEPES (pH 7.35 with NaOH). hTRPA1 and hTRPV1 currents were activated by 100 µM allyl-isothiocyanate (AITC) and 1 µM capsaicin, respectively, and SA8 was added after complete activation of the current. SA8, activators and positive controls were dissolved in ECS, except for TEA+ that was stored in 100 mM stock at 4 °C. The osmolarity of the ECS and ICS were 302–308 mOsM and ~ 295 mOsM, respectively. SA8 and positive controls were dissolved in ECS supplemented with 0.1 mg/mL bovine serum albumin (BSA; Sigma-Aldrich Hungary, Budapest, Hungary). Bath perfusion around the measured cell with different extracellular solutions was achieved using a gravity flow microperfusion system using a rate of 0.5 mL/min. Excess fluid was removed continuously.

Voltage protocols: For measurement of hKV1.1–1.3 and hKV10.1, currents, voltage steps to + 50 mV, were applied from a holding potential of − 120 mV every 15 or 30 s and the peak current was measured. Channels were activated by a series of depolarisation pulses to + 50 mV from a holding potential of − 120 mV. For KV1.1, 50-ms-long activating stimuli were used every 15 s. Due to the highly variable activation kinetics of KV1.2 [145] 200-ms-long pulses were applied every 15 s to maximise the open probability of the channel. Due to the slow inactivation kinetics of KV1.2, the currents did not inactivate even at 200-ms-long depolarisation pulses. KV1.3 currents were evoked by 15-ms-long depolarisation pulses to + 50 mV from − 120 mV every 15 s. The use of such short pulses every 15 s was sufficient to fully activate the channels and ensured that there was no cumulative inactivation of KV1.3. Positive controls were applied at a concentration equivalent to their Kd values (14 nM ChTx for KV1.2, 0.3 mM TEA+ for KV1.1, 2 µM Astemizole for KV10.1 and 10 mM TEA+ for KV1.3 and KCa1.1). The approximate 50% reduction in the current amplitude in the presence of these compounds was an indicator of both the ion channel and the proper operation of the perfusion system. For mKCa1.1 channels, a voltage step to + 100 mV from a holding potential of − 100 mV was used. Pulses were delivered every 10 s. hKCa3.1 currents were elicited every 15 s with voltage ramps to + 50 mV from a test potential of − 120 mV at a rate of 0.85 mV/ms. The holding potential was set to − 85 mV. For hKV11.1 channels, currents were evoked with a voltage step to + 20 mV followed by a step to − 40 mV, during which the peak current was measured. The holding potential was − 80 mV, and pulses were delivered every 30 s. hTRPA1 and hTRPV1 currents were elicited every 5 s with 200-ms-long voltage ramps to + 50 mV from a test potential of − 50 mV. The cells were held at 0 mV during the subsequent pulses. The TRPA1 and TRPV1 currents were activated by 100 µM AITC and 1 µM capsaicin, respectively. Currents through hNaV1.7 channels were evoked every 15 s with a voltage step to + 50 mV from a holding potential of − 120 mV. The RCF at a given molar concentration was calculated as I/I0, where I0 is the peak current in the absence, and I is the peak current at equilibrium block or in the absence of inhibition after ~ 2 min perfusion by SA8. Data points on concentration–response curves represent the mean of four individual measurements. These curves were fitted via simple linear regression according to the equation Y = 1 + slope × [toxin], where Y is reciprocal of the RCF and [toxin] is the molar concentration of SA8. The reciprocal of the slope yielded the IC50 value.

Injection activity assay in Drosophila melanogaster

The activity of the recombinant SA8 peptide in Drosophila melanogaster was assessed using an injection activity assay [76]. The mass of female D. melanogaster used for this experiment ranged from 0.8 to 1.1 mg. Eight D. melanogaster were used as an injection control group and administered 50 nL of water. Five doses of SA8 (0.00005, 0.0005, 0.005, 0.05 and 0.5 µg per fly; 50 nL) were each administered to eight D. melanogaster (n total = 40), with observations of paralysis and mortality made at 2 and 24 h post-injection (Additional file 14: File S6). Dose–response data were analysed as described previously [76].