INTRODUCTION

Adenosine deaminases of the ADAR (adenosine deaminases acting on RNA) family have been found in most multicellular animals. Enzymatically active ADAR isoforms deaminate adenosine residues in RNA at position 6 of the purine heterocycle, converting adenosine into the non-canonical nucleoside inosine. This natural post-transcriptional modification of RNA is common in animal cells [1].

The activity of ADARs depends on the presence of double-stranded (ds) segments in the RNA structure. ADARs have a domain that selectively binds dsRNA sequences. In humans and rodents, as well as some other vertebrates, enzymatically active ADAR forms have been classified into two functional groups. In mice and humans, each of the groups includes one enzyme encoded by a single gene. Thus, human ADAR1 enzyme (ADAR gene) edits continuous regular stretches of dsRNA typically found in non-coding RNAs or non-coding parts of mRNAs, whereas human ADAR2 (ADARB1 gene) is active towards shorter and irregular duplexes often located in the mRNA coding sequences [2].

How do these two types of ADAR enzymes function? The presence of double-stranded DNA or RNA in the cytoplasm elicits an innate immune response even at comparatively low concentrations of these nucleic acids, since it may signal the onset of a viral infection. The activity of ADARs towards continuous dsRNA structures neutralizes these sequences. Because inosine pairs with cytidine rather than with uridine (in contrast to adenosine), dsRNA structures are disrupted after editing by ADARs. The main function of ADAR1 is to neutralize long duplexes in the nucleus and cytoplasm, which provides negative feedback on the type I interferon response elicited by an increase in the cytoplasmic levels of dsRNA. In this context, mammalian type I interferons induce the production of an alternative long ADAR1 isoform that is transported to the cytoplasm, where it exerts its anti-interferon activity [3].

Unlike ADAR1, ADAR2 acts more locally. It fine-tunes the splicing of mRNAs and, in some cases, changes amino acid sequences of encoded proteins. Indeed, changes of the complementary base in mRNA can change the codon meaning if the corresponding position provides for such a substitution. Inosine residue behaves as guanosine, which results in single amino acid polymorphisms or, in rare cases, in stop codon skipping. Therefore, RNA editing by ADAR enzymes, more specifically, ADAR2 (at least in vertebrates), provides proteome recoding [3].
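The codon logic described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and treats inosine as guanosine during translation; the function name `recode` and its 0-based `position` argument are hypothetical and serve illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def recode(codon, position):
    """Return (genomic, edited) amino acids for A-to-I editing of one codon.

    `position` is the 0-based index of the edited adenosine within the codon.
    Inosine is read as guanosine, so the edit is modeled as A -> G.
    """
    codon = codon.upper().replace("U", "T")
    if codon[position] != "A":
        raise ValueError("ADAR editing targets adenosine only")
    edited = codon[:position] + "G" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited]


print(recode("CAG", 1))  # Gln codon edited at the second position
print(recode("UAG", 1))  # stop codon edited at the second position
```

Under these assumptions, editing the second position of a CAG codon gives a Gln-to-Arg substitution, and editing a UAG stop codon gives a Trp codon, which corresponds to the rare stop codon skipping mentioned above.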

The extent of proteome recoding differs between organisms. For example, in cephalopods, transcriptomes are edited and proteomes are recoded at a significant level, with several hundred amino acid substitutions found in different molluscan species [4]. Drosophila, which has only one ADAR enzyme (an ortholog of vertebrate ADAR2), also demonstrates extensive RNA editing and proteome recoding: about seventy amino acid substitutions have been detected in the Drosophila proteome [5]. In comparison to mollusks and insects, mammals, such as primates and rodents, have far fewer protein recoding events (as predicted from the transcriptomes), with only ~20 of them reliably identified in the proteome [6].

Despite the limited scale of recoding in mammals, loss-of-function mutations in the mammalian ADAR2 genes are usually lethal. The lethal phenotype of transgenic mice with a knockout of the Adarb1 gene (ADAR2) was reversed by a single amino acid substitution encoded in the gene of the ionotropic AMPA glutamate receptor subunit 2 (Gria2). During embryonic development, residue 607 of this protein is recoded from Gln to Arg, with recoding reaching 100% in adult individuals. The recoding decreases the conductivity of the corresponding ion channel, which is essential for normal brain functioning. Introduction of the Q607R substitution into the receptor molecule by genome mutagenesis rescued the lethality in Adarb1-knockout mice [7].

Several other sites recoded by RNA editing have been functionally characterized in mice and various cell types. For example, an amino acid substitution in filamin A (FLNA) resulting from RNA recoding was found to affect the tone of vascular smooth muscles [8]. The evolutionarily conserved Ile-to-Val substitution in coatomer subunit alpha (COPA) has tumor suppressor effects and plays a role in the development of human hepatocellular carcinoma [9].

The scope of proteome recoding via RNA editing by ADARs has been typically estimated based on the transcriptome data. However, it is unclear how many editing events lead to the recoding of translated proteins. It is believed that many editing events are the side effects of enzymatic reactions and have no biological significance [10]. Apparently, the editing events that result in protein recoding and can be detected in the proteome are more likely to have the functional consequences. That is why the results of transcriptome-wide analysis of RNA editing by ADARs can be used for searching the amino acid polymorphisms in the proteomic data. Since a common approach to the identification of protein sequences is a search for matches between the mass-spectrometry data and protein sequences predicted from the genomes or transcriptomes, identification of editing events requires that the theoretical, consensus proteome should be supplemented with the recoded sequences [11].

Several reports have been published on the analysis of recoding sites at the proteome level. In the pioneering work, Liscovitch-Brauer et al. [4] demonstrated extensive transcriptome rearrangements by ADARs in cephalopods and confirmed the corresponding changes in the mollusk proteomes. We found recoded proteins in the Drosophila proteome [5] and revealed that the recoding patterns were different at different metamorphosis stages [12]. We also identified recoded sites in the murine and human brain proteins [6], which were conserved between these two mammalian species but did not overlap with the recoded sites in the Drosophila brain proteome. In another important work, Peng et al. [13] conducted a large-scale analysis of various cancer cell proteomes and identified the corresponding recoded sites.

In addition to the well-studied RNA editomes of cephalopods, Drosophila, mice, and humans, RNA editing by ADARs was recently described in the transcriptome of another well-established model species, zebrafish (Danio rerio) [14]. This freshwater fish of the order Cypriniformes, native to tropical Asia, has been used in laboratories since the 1960s for a wide range of biomedical applications. Studies in zebrafish have played an important role in developmental biology, neuroscience, and investigation of vertebrate diseases [15]. The aim of this study was to reliably identify recoded protein sites in the zebrafish proteome using the transcriptome data and a workflow implemented in our earlier studies. Identification of the recoding sites at the proteome level is essential for further research, as these sites are more likely to be functionally important than the sites detected only at the transcriptome level.

MATERIALS AND METHODS

Development of the database with predicted sites recoded by RNA editing for the proteomic search. As a reference, we used the zebrafish proteome available from UniProt (accession number UP000000437_7955 [16]). The non-synonymous substitutions in the mRNA exons predicted from the zebrafish transcriptome were taken from Buchumenski et al. [14]. A VCF file was composed from these substitutions and annotated with the snpEff program [17] using the zebrafish genome assembly, version GRCz11.99 [18]. In this way, the recoded residues were associated with the proteomic database records. In total, substitutions in the zebrafish proteins were predicted at 116 positions; 63 of them had two alternative variants.

Using in-house Python scripts, protein sequences were converted in silico into tryptic peptides with up to one missed cleavage site. Recoded proteins were then converted into peptides using the same approach. The resulting database contained both genome-encoded and recoded peptides. At the first stage, the 116 recoded sites produced 471 tryptic peptides (including peptides with one missed trypsin cleavage site). Since mass spectrometry does not allow identification of very short or very long peptides, the final list included only peptides 7 to 40 amino acid residues in length. After exclusion of 139 peptides that were too short or too long, the final list of recoded peptides contained 332 species.
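The digestion step can be sketched as follows. This is not the in-house script itself, which is not reproduced in the text; it is a minimal illustration that assumes the common trypsin rule (cleavage after K or R unless the next residue is P). The names `cleavage_pieces`, `tryptic_peptides`, and `recoded_peptides` are hypothetical.

```python
def cleavage_pieces(sequence):
    """Split a protein into fully cleaved tryptic pieces.

    Assumed rule: cleave after K or R, except when followed by P.
    """
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def tryptic_peptides(sequence, missed_cleavages=1, min_len=7, max_len=40):
    """Return the set of peptides with up to `missed_cleavages` missed sites,
    keeping only lengths within [min_len, max_len]."""
    pieces = cleavage_pieces(sequence)
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


def recoded_peptides(sequence, position, new_residue, **kwargs):
    """Peptides found only in the recoded protein (0-based `position`)."""
    edited = sequence[:position] + new_residue + sequence[position + 1:]
    return tryptic_peptides(edited, **kwargs) - tryptic_peptides(sequence, **kwargs)


# Toy protein with a K-to-R substitution at index 15.
print(sorted(recoded_peptides("AAAAAAAKGGGGGGGKCCCCCCCR", 15, "R")))
```

Taking the set difference keeps only peptides that distinguish the recoded protein from the genome-encoded one. Note that a substitution creating or removing K or R also changes the cleavage pattern itself, so the difference may include peptides that do not carry the recoded residue.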

Using the Pyteomics program library, the decoy sequences were generated as reversed target sequences with the conservation of the C-terminal residue and added to the database [19].
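For illustration, the decoy rule described above (reversal with the C-terminal residue kept in place) can be written in plain Python. The function `reversed_decoy` is a hypothetical stand-in and not the Pyteomics routine used in this work.

```python
def reversed_decoy(sequence):
    """Reverse a sequence while keeping its C-terminal residue in place."""
    return sequence[:-1][::-1] + sequence[-1:]


print(reversed_decoy("PEPTIDEK"))
```

Keeping the C-terminal residue fixed means that a decoy of a tryptic peptide still ends in K or R, so target and decoy peptides remain comparable in their cleavage properties.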

Proteomic datasets. Six shotgun proteomic datasets of zebrafish tissues [21-25] were selected from the ProteomeXchange repository for reprocessing [20] (see Table 1 for more details).

Table 1 Proteomic datasets of zebrafish tissues that were used to search for protein sites recoded via RNA editing by ADARs

Proteomic search. Original RAW files were converted into mzML files using ThermoRawFileParser [26]. The IdentiPy search engine [27] was tuned specifically for each dataset. Its configuration file was set as follows, according to the methods used for the acquisition of mass spectra in the original works:

  • PXD023967 – product accuracy: 0.01 Da, fixed: camC, variable: oxM, acetyl-

  • PXD030733 – product accuracy: 0.5 Da, fixed: camC, variable: oxM, acetyl-

  • PXD009612 – product accuracy: 0.01 Da, fixed: camC, variable: oxM

  • PXD005630 – product accuracy: 0.5 Da, fixed: camC, variable: oxM, camM

  • PXD016847 – product accuracy: 0.5 Da, fixed: camC, variable: oxM, acetyl-

  • PXD014228 – product accuracy: 20 ppm, fixed: camC, variable: oxM
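The practical difference between the absolute (Da) and relative (ppm) product accuracies listed above can be shown with a short sketch. The dictionary below only restates the values from the list; its layout and the function `within_tolerance` are hypothetical and do not reproduce the IdentiPy configuration format.

```python
# Product (fragment) accuracy per dataset, as listed above: (value, unit).
PRODUCT_ACCURACY = {
    "PXD023967": (0.01, "Da"),
    "PXD030733": (0.5, "Da"),
    "PXD009612": (0.01, "Da"),
    "PXD005630": (0.5, "Da"),
    "PXD016847": (0.5, "Da"),
    "PXD014228": (20, "ppm"),
}


def within_tolerance(observed_mz, theoretical_mz, tolerance, unit="Da"):
    """Check whether an observed fragment m/z matches a theoretical one."""
    delta = abs(observed_mz - theoretical_mz)
    if unit == "Da":
        return delta <= tolerance
    if unit == "ppm":
        # Relative tolerance scales with the theoretical m/z.
        return delta <= theoretical_mz * tolerance * 1e-6
    raise ValueError(f"unknown unit: {unit}")


# A 0.3 Da deviation is accepted only by the low-resolution settings.
for dataset, (value, unit) in PRODUCT_ACCURACY.items():
    print(dataset, within_tolerance(500.30, 500.00, value, unit))
```

At m/z 500, a tolerance of 20 ppm corresponds to 0.01 Da, so the 20 ppm setting is comparable to the 0.01 Da setting in this mass range and far stricter than 0.5 Da.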

The peptides were identified using IdentiPy in the automatic parameter optimization mode. The Scavager post-search tool [28] was used to validate the recoded peptides and to filter them using a group-specific false discovery rate (FDR) of 1%.
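The idea of group-specific filtering can be illustrated with a simplified target-decoy sketch. It is not Scavager's algorithm, whose scoring is not reproduced here; it only assumes that each peptide-spectrum match (PSM) has a score (higher is better), a decoy flag, and a group label. The function `filter_group_fdr` and the dictionary keys are hypothetical.

```python
def filter_group_fdr(psms, group, fdr=0.01):
    """Return target PSMs of one group that pass a target-decoy FDR cutoff.

    FDR is estimated within the group only, as decoys / targets among the
    PSMs scoring at or above the current one.
    """
    ranked = sorted(
        (p for p in psms if p["group"] == group),
        key=lambda p: p["score"],
        reverse=True,
    )
    targets = decoys = cutoff = 0
    for index, psm in enumerate(ranked, start=1):
        if psm["decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = index  # largest prefix still within the FDR limit
    return [p for p in ranked[:cutoff] if not p["decoy"]]


psms = [
    {"score": 10, "decoy": False, "group": "recoded"},
    {"score": 9, "decoy": False, "group": "recoded"},
    {"score": 8, "decoy": True, "group": "recoded"},
    {"score": 7, "decoy": False, "group": "recoded"},
    {"score": 12, "decoy": False, "group": "genomic"},
]
print(len(filter_group_fdr(psms, "recoded")))
```

Estimating FDR within the recoded group, rather than over all identifications, prevents the numerous genome-encoded matches from masking false positives among the much rarer recoded peptides.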

Visualization of mass spectra. The mass spectra of the peptides of interest were visualized with the xiSPEC spectrum viewer [29] and examined manually. The coverage of the corresponding amino acid substitution by the MS/MS fragments confirmed the validity of the peptide spectrum match.
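One way to reason about such coverage is to list which matched fragments actually contain the substituted residue, since only those fragments differ in mass between the genome-encoded and recoded peptide. The sketch below assumes standard b/y ion numbering (b ions counted from the N-terminus, y ions from the C-terminus); the function `ions_carrying_site` is hypothetical and is not part of xiSPEC.

```python
def ions_carrying_site(peptide_length, site, matched_ions):
    """Return matched b/y ion labels that contain the substituted residue.

    `site` is the 1-based position of the substitution in the peptide;
    `matched_ions` holds labels such as "b3" or "y5".
    """
    carrying = []
    for label in matched_ions:
        series, size = label[0], int(label[1:])
        if series == "b" and size >= site:
            carrying.append(label)  # b_i spans residues 1..i
        elif series == "y" and size >= peptide_length - site + 1:
            carrying.append(label)  # y_j spans the last j residues
    return carrying


# Ten-residue peptide with a substitution at position 4.
print(ions_carrying_site(10, 4, ["b2", "b4", "y3", "y7", "y9"]))
```

For a K/R substitution, each fragment carrying the site shifts by about 28.006 Da (the mass of N2), which exceeds even the 0.5 Da product tolerance used for the low-resolution datasets.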

Annotation of recoded sites in terms of the protein spatial structure. Amino acid substitutions were annotated using various open access resources. First, the recoded proteins were annotated using the UniProt [16] and neXtProt [30] databases. Zebrafish proteins lacking known spatial structure were aligned with the mammalian proteins using Blastp [31]. The spatial structures were analyzed with the PyMOL software [32]. The effects of the substitutions on the spatial structure stability were estimated using I-Mutant2.0 [33], MUpro [34], and iStable [35].

RESULTS AND DISCUSSION

Datasets and filtering of identified recoded sites. In contrast to humans and mice, few proteomic datasets are available for the zebrafish. Besides, these datasets should satisfy a set of criteria essential for successful implementation of the proteomics and genomics methods used in this and other works [6]. Accordingly, for reprocessing, we used the data of label-free shotgun proteomic analyses recorded using Orbitrap mass spectrometers (Table 1). Because of the scarcity of available proteomic datasets, we analyzed the high-resolution tandem mass spectrometry data acquired with an Orbitrap detector (as in most recent studies), as well as the data obtained using earlier hybrid instruments (Orbitrap Velos or Elite) with linear ion traps (LTQ). The MS/MS spectra obtained with the latter devices had lower mass accuracy, which was taken into account when setting the proteomic search parameters.

Because zebrafish is a common model in neuroscience, most of the selected datasets contained data from various brain preparations, including a synaptic fraction [21] and cultured motor neurons. One of the proteomic datasets used was from 6-day-old whole zebrafish embryos [22] (Table 1). According to the existing understanding of RNA editing, the central nervous system is a promising object of study, since in vertebrates, most editing events followed by protein recoding occur in the neural tissues, where the ADAR2 isoform is highly expressed [3].

The use of the datasets with the low-resolution MS/MS data prompted us to select the filtration criteria for the identification of recoded peptides more carefully. Thus, the following inclusion criteria were used: identification of the recoded site in at least two datasets; identification in two different peptides formed by incomplete tryptic cleavage; confirmation of identification by manual examination of the mass spectra visualized with the xiSPEC viewer [29]. In addition, peptides with incomplete tryptic cleavage were excluded from analysis if not confirmed by identification of peptides produced by complete cleavage, as such peptides are associated with higher false discovery rates (FDR) [36].

A broader distribution of the m/z values in the low-resolution tandem mass spectra makes it more difficult to curate the visualized mass spectra manually. Such mass spectra obtained by chromatography/mass spectrometry of complex mixtures usually contain multiple peaks unrelated to the target peptide. However, in some cases, the peaks of interest have a higher intensity. Thus, Fig. 1 illustrates the correspondence between the mass spectrum and the peptide of the glutamate receptor subunit gria4b. As one can see from Fig. 1, the most intense peaks of the mass spectrum are associated with the molecular masses theoretically predicted for the recoded peptide with the K/R substitution. The rest of the mass spectra representing recoded peptides and confirmed by manual examination are shown in Fig. S1 in the Online Resource 1.

Fig. 1. Tandem mass spectrum of the glutamate receptor subunit gria4b peptide visualized with the xiSPEC software [29]. The most intense peaks are associated with the molecular masses theoretically predicted for the corresponding peptide with K/R substitution.

Recoded sites identified in the zebrafish proteomes. Recoded protein sites were found in 4 of the 6 datasets processed in this work. All four datasets were generated for the central nervous system. In total, 10 of the 116 recoded sites predicted from the transcriptome [14] were found (one of the sites yielded two recoding variants; see Table 2). This number was even lower than the number of recoded sites reliably identified in the central nervous systems of mice and humans (14 and 18, respectively) [6]. However, the search database containing recoded sites predicted from the zebrafish RNA-seq data was significantly smaller. Moreover, the mammalian datasets were significantly larger in terms of the number of acquired mass spectra.

Table 2 Zebrafish protein sites recoded via RNA editing by ADAR enzymes and identified in the shotgun proteomic data

As mentioned above, the classical and best studied example of mammalian protein recoding is the substitutions in the AMPA glutamate receptor subunits. Among them, the Gln-to-Arg replacement influences the conductivity of the corresponding ion channel and is essential for the development and functioning of the central nervous system. At the same time, the existing shotgun proteomics methods, which use trypsin as the protease of choice, do not allow reliable detection of this substitution. However, the transcripts of the glutamate receptors of the same type contain other edited sites whose existence was confirmed by the analysis of mammalian proteomes [6].

Seven of the 10 recoded sites were identified in the zebrafish sequences homologous to the mammalian receptor subunits and partially identical to them. They were found in both flip and flop alternatively spliced isoforms [37] of the gria2a, gria2b, gria4a, and gria4b gene products (Table 2). These isoforms have been identified in many vertebrates [37]. Among the recoded sites, we found the R/G substitutions at positions 760-766 (depending on the specific gene), which have also been described for the mammalian proteomes. Another identified substitution was K493R in the gria4b gene product (the mass spectrum of the corresponding peptide is shown in Fig. 1). No recoded sites homologous to this substitution have been found in the proteomes of other species.

The homologs of other recoded sites observed in mammals were mostly absent in the zebrafish transcriptome [14], except the Q/G substitution in the cadpsa gene product, which is present in the editomes and recoded proteins of mice and humans. However, we did not detect this substitution in the zebrafish proteins.

It was found recently that the zebrafish editome contains many other recoding substitutions absent in mammals [14]. We were able to reliably identify three such sites, one in each of three proteins. Notably, all of them were situated in protein fragments that could not be resolved by spatial structure prediction systems, such as AlphaFold 2 [38] and SWISS-MODEL [39], since these fragments lacked sufficient similarity with experimentally resolved structures of the corresponding homologs from other organisms.

Astrotactin 1 (astn1) carried a substitution of Lys to Arg or Glu at position 935. This transmembrane receptor regulates neuronal adhesion and participates in the migration of juvenile neuroblasts during brain development [40]. The extracellular part of this receptor contains the EGF (epidermal growth factor)-like domains and the fibronectin domain, as well as the recoded site, which is situated outside of these domains. Presumably, the recoded site is located in close vicinity to the fragment of the molecule essential for ligand binding, as indirectly indicated by the proximity of glycosylation sites. The recoded site of astrotactin is exposed and not covered by other structures. Interestingly, the recoded residue is fairly well conserved and is also present in mammalian orthologs, where, however, it is not recoded. At the same time, the homology of the sequences surrounding the recoded residue is insufficient to predict their folding based on the resolved structures. Evaluation of the effect of this substitution on the protein stability by three different methods produced conflicting results. Although lysine replacement is expected to have little effect on the local conformation, it may modify the binding affinity toward potential ligands.

Neuregulin 3b (neu3b) is another neuronal transmembrane protein that participates in the development of neural tissue. Like astrotactin, it shares some features with EGF. Thus, in mammals, the neuregulin ortholog binds to a member of the EGF receptor family [41]. The Thr-to-Ala recoding was found at position 10, close to the N-terminus of the molecule. The N-terminal part of neuregulin is exposed to the extracellular space and, as often occurs in proteins, is structurally unstable. Analysis of protein stability after the single amino acid substitution with MUpro [34] and iStable [35] predicted a decrease in the protein structure stability, while I-Mutant2.0 [33] reported no difference in the stability after the substitution. Since the N-terminus of neuregulin is unstable, the substitution is expected to have no significant effect on the spatial structure of the receptor. However, as in astrotactin, it may still influence potential protein–protein interactions. In contrast to the recoded site in astrotactin, the N-terminus of neuregulin 3 is not conserved and has nothing in common with the mammalian orthologs.

While astrotactin 1 and neuregulin 3 have some common functions in providing adhesion and other receptor-mediated interactions in developing neurons, the role of the third recoded zebrafish protein is different. The product of the rims2b gene is unrelated to glutamate receptors; it is located under the presynaptic membrane and is associated with the presynaptic vesicles. It is involved in calcium-dependent exocytosis, i.e., in synaptic activity. The S489G substitution is in a non-conserved part of the sequence, which is absent in the mammalian orthologs and, therefore, has not been spatially characterized. All three methods used for the stability prediction indicated that this substitution most probably decreases the structural stability of the protein.

Interestingly, presynaptic membrane proteins participating in the same exocytosis pathway (including the ortholog of the cadpsa gene product mentioned above) are extensively recoded in the Drosophila nervous system [12]. It was suggested that RNA editing in this poikilothermic insect contributes to the adaptation of its nervous system to functioning at changing ambient temperature [42]. One may hypothesize a similar function of RNA editing and protein recoding in zebrafish, which is also poikilothermic. However, the confirmation of this hypothesis requires extensive and labor-intensive in vivo experimentation in transgenic fish.

CONCLUSIONS

RNA editing by ADAR enzymes is an evolutionarily ancient cellular process characteristic of most Eumetazoans. During evolution, enzymatic adenosine deamination has acquired two different functions. The first one is inactivation of potentially immunogenic RNA duplexes. The second function is recoding of selected proteins by introduction of amino acid substitutions in their sequence. At first sight, the purpose of recoding seems puzzling, because functional amino acid substitutions could be formed by mutations in the genome, and protein functions are often modulated by multiple post-translational modifications. However, recoding as a mechanism of structural and functional modification of some proteins has been fixed in the course of evolution and, for example, is used to regulate protein function during development.

We have systematically studied protein recoding at the protein level in different species. Analysis of six proteomic datasets of the zebrafish, a model fish species, yielded only ten recoding sites. According to the published data, mRNA editing in this species is less common than in mammals. At the proteome level, most recoding events were found in the glutamate receptor subunits, which should be considered a case of conserved protein recoding via RNA editing in vertebrates. The recoded sites identified in three other proteins were specific for the zebrafish (according to existing data). Although these amino acid substitutions are unlikely to affect the spatial structure of the corresponding proteins, they may influence intermolecular interactions. Among recently described examples, a conserved and at first sight insignificant Ile-to-Val substitution in human coatomer subunit alpha was shown to modulate the malignant potential of hepatocellular carcinoma [9]. Interestingly, all three zebrafish-specific amino acid substitutions were in the sequences less conserved in the zebrafish than in other vertebrates. It is possible that this taxon-specific recoding is more typical for the evolutionarily younger, changing protein fragments, although it is yet unknown whether this recoding is conserved among, e.g., Actinopterygii or other taxa. We believe that considerations on the evolution in the context of RNA editing and consequent protein recoding should be left for the transcriptomic studies, which provide more options for comparison between species [43]. The hypothesized location of recoded zebrafish proteins in the neuron is shown in Fig. 2.

Fig. 2. Hypothesized location of recoded proteins in the zebrafish neuron.

This work may be continued by experimental verification of the hypothesis on the functional significance of the recoded sites found in the adhesion and synaptic proteins. Besides, very few extensive zebrafish proteomic datasets are available for comparison with mouse or human data, so generating larger datasets is another direction for future work. At the moment, the dataset for the isolated synaptic components yielded the most results with respect to protein recoding via RNA editing, but unfortunately, this dataset was generated using low-resolution MS/MS spectra [21]. Further generation of high-quality proteomic data from different zebrafish cells and tissues will promote our understanding of protein recoding via RNA editing by the ADAR enzymes in this important model species.