Background

Tissue regeneration restores the function of damaged tissues and organs. Mammals can regenerate a limited number of tissues and organs, such as skin [1, 2], skeletal muscle [3, 4] and liver [5, 6]. Unfortunately, neurons lost to injury or disease of the central nervous system (CNS) are not regenerated in mammals [7,8,9,10,11,12]. In contrast, zebrafish (Danio rerio) can regenerate numerous tissues, including tissue of the CNS [10, 12,13,14,15,16,17,18,19]. For example, zebrafish can regenerate damaged retinal neurons, which restores visual function [20]. In all species examined, macrophage populations appear to be crucial to tissue regeneration [21,22,23,24,25,26,27,28,29,30], though in the mammalian CNS they appear instead to engage in pathological functions [31,32,33,34,35].

In vertebrates, the retina lies at the back of the eye and is a stereotypically organized part of the CNS, composed of neural and glial cell types that are laminated into three distinct nuclear layers. Evidence strongly indicates that Müller glia are the source of regenerated retinal neurons in zebrafish [12, 36,37,38,39,40,41,42]. In both zebrafish and mammals, resident microglia respond to retinal injury and degeneration. This may lead to immune-Müller glia crosstalk that shapes the Müller glia reaction to retinal injury [43,44,45]. The zebrafish is a relatively new and powerful vertebrate model in microglial biology [10, 30, 46,47,48,49,50,51]. In particular, microglia and macrophage functions in the regeneration of CNS tissue, such as the zebrafish retina, are just beginning to be explored.

Our recent work has examined microglia and macrophage responses to acute, widespread retinal lesion in zebrafish [30, 51]. In particular, our transcriptome analysis [30] has provided a rich dataset to facilitate an understanding of gene expression in microglia/macrophages in a context of successful CNS regeneration. In order to translate our transcriptome findings in zebrafish [30] to mammals, we examined predicted orthology of differentially expressed genes (DEGs) enriched in zebrafish microglia/macrophages during retinal regeneration. We found that nearly all of the genes examined had predicted orthologs in mouse and human. However, several of these genes did not. Further, the putative function of these genes is largely unknown. As these “non-orthologous” genes comprise a portion of the microglia/macrophage regeneration-associated transcriptome [30], a better understanding of their predicted gene products will facilitate a greater understanding of the similarities and differences in fish and mammalian responses to retinal injury. We reason that these genes could be functionally important in determining the outcome of tissue regeneration in zebrafish, and so functional predictions for these genes are necessary to inform future experimental work. This knowledge will also help us better understand evolutionary relationships between mammalian and teleost immunity.

For twelve selected genes without clear human or mouse orthologues, we performed a variety of bioinformatic analyses aimed at identifying functional protein domains. These analyses included identification of protein domains and Gene Ontology (GO) terms, sequence similarity comparisons, and prediction of protein structure. In addition, we used synteny analysis, which failed to find evidence of orthologous genes in the human and mouse genomes. However, sequence similarity comparisons to find similar genes in other vertebrate species with well-described regenerative capacity (Axolotl, Xenopus, Salamander) indicated possible orthologs for several of the genes of interest. We also examined several other published gene expression datasets to determine if these genes showed informative expression patterns in other contexts of tissue regeneration, or if these genes might also be differentially expressed in macrophages responding to microbial infection. The work presented here is informative for several zebrafish genes of previously unknown function, providing a foundation for future experimental work to test gene function in vivo. In addition, only one of these twelve genes was previously described to be differentially expressed in macrophages responding to microbial infection, suggesting that these genes are important to tissue regeneration specifically and not only to macrophage responses in general. These results provide further insight into the transcriptome of zebrafish macrophages in the context of tissue regeneration.

Results

Selection of genes expressed in zebrafish microglia/macrophages for further bioinformatics analyses

We previously described a set of 970 genes enriched in mpeg1+ cells (representing microglia and macrophage populations) compared to other retinal cell types in regenerating zebrafish retinas [30]. Of these genes, 409 comprised a list that we considered to be “regeneration-associated” transcripts. These 409 transcripts were considered to be “regeneration-associated” because they were enriched in microglia/macrophages isolated from regenerating retinal tissue, but were not found to be enriched in resting/steady-state zebrafish brain microglia in another published study [30, 52]. Each gene in this list of 409 “regeneration-associated” transcripts was examined for predicted orthology in mouse and human using the DRSC integrative ortholog prediction tool. Most genes returned predicted orthologues in mouse and/or human (Supplemental File 1). However, twelve of these genes did not show predicted orthology to human or mouse genes with this analysis and were therefore selected for further bioinformatic analysis (Table 1, denoted P1-P12 throughout the manuscript). We reasoned that these twelve transcripts could be part of a transcriptional program executed in microglia/macrophages during CNS regeneration, and therefore could be important in understanding similarities and differences in mammalian vs. zebrafish outcomes following tissue damage.
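The selection logic described above amounts to two successive filters. The following is a minimal sketch of that logic, not the pipeline actually used: the function name, the gene identifiers and the ortholog lookup table are hypothetical placeholders, and in practice the ortholog calls would come from the ortholog prediction tool's output.

```python
def select_candidates(enriched, resting_enriched, orthologs):
    """Return (regeneration_associated, without_ortholog) gene lists.

    enriched: genes enriched in mpeg1+ cells from regenerating retina
    resting_enriched: set of genes enriched in resting brain microglia
    orthologs: dict mapping gene -> list of predicted mouse/human orthologs
    """
    # Step 1: keep genes that are not also enriched in resting microglia.
    regeneration_associated = [g for g in enriched if g not in resting_enriched]
    # Step 2: keep genes with no predicted mouse or human ortholog
    # (a missing entry and an empty list are treated the same way).
    without_ortholog = [g for g in regeneration_associated if not orthologs.get(g)]
    return regeneration_associated, without_ortholog


# Toy input with made-up gene names (not data from the study).
enriched = ["geneA", "geneB", "geneC", "geneD"]
resting_enriched = {"geneB"}
orthologs = {"geneA": ["HUMAN_A"], "geneB": ["HUMAN_B"], "geneC": []}

regen, candidates = select_candidates(enriched, resting_enriched, orthologs)
print(regen)       # ['geneA', 'geneC', 'geneD']
print(candidates)  # ['geneC', 'geneD']
```

Keeping the two steps separate makes it easy to report the size of the intermediate “regeneration-associated” list as well as the final candidate list.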

Table 1 Transcripts enriched in zebrafish microglia/macrophages during retinal regeneration, without readily predicted human or mouse orthologs

Summary of results from bioinformatic analyses

A number of bioinformatic analyses were performed for the twelve genes of interest shown in Table 1 (methods summarized in Materials and Methods); the overall approach is summarized in Fig. 1. The species included in the results from these analyses are shown in Supplemental Figure 1. Protein domains and GO terms were found for nine genes and largely included terms related to the immune system (Table 2). Orthologs found by sequence similarity arise from several species, mainly vertebrates (Supplemental Figure 1, Table 3); several are associated with the immune system or soluble signaling (Table 3), and the best-matched proteins are most frequently from species of fish, with occasional hits in mouse or human (Table 4). Overall, the results of the sequence similarity and best-matched ortholog approaches are consistent with those of the protein domain and gene ontology (GO) term approach (Tables 2, 3, 4). The three-dimensional structure of a protein, or lack thereof, is known to determine protein function [56]. Of the genes studied here, two (P4 and P12 (pho)) are predicted to have greater than 50% disordered amino acids, and thus are likely to code for unstructured proteins (Supplemental Figure 2). We predicted three-dimensional (3D) structure using homology modeling (Table 5, Figs. 2, 3, 4, 5 and 6). The results are consistent with sequence similarity and protein domain/GO results for several genes of interest. In addition, structural similarity was informative for genes that did not return results with previous analyses (e.g. P2, P7, and P12). Synteny analysis against the human and mouse genomes returned a result for only one gene (P4, with a hit in the human genome, Supplemental Figure 3), though based on sequence comparison this gene did not align with the candidate gene in the identified human chromosomal region. Comparison to other vertebrate species with described capacity for tissue regeneration (Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis and Cynops pyrrhogaster) returned putative orthologs of several of these genes (Table 6 and Supplemental Table 1), indicating that they may have conserved function across these species. More detailed descriptions of findings regarding P1-P12 are provided next.
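The “greater than 50% disordered amino acids” criterion mentioned above can be illustrated with a short sketch. It assumes that a disorder predictor has already produced one disordered/ordered call per residue; the function names and the toy input are hypothetical and do not reproduce the output of any specific tool.

```python
def disordered_fraction(is_disordered):
    """Fraction of residues flagged as disordered (one boolean per residue)."""
    if not is_disordered:
        raise ValueError("empty sequence")
    return sum(1 for flag in is_disordered if flag) / len(is_disordered)


def likely_unstructured(is_disordered, cutoff=0.5):
    """True when strictly more than `cutoff` of the residues are disordered."""
    return disordered_fraction(is_disordered) > cutoff


# Toy example: a 10-residue protein with 6 residues called disordered.
calls = [True] * 6 + [False] * 4
print(disordered_fraction(calls))  # 0.6
print(likely_unstructured(calls))  # True
```

The comparison is strict, so a protein with exactly half of its residues disordered is not classified as unstructured under this sketch.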

Fig. 1 Overview of bioinformatic analysis for functional predictions. The diagram shows an overview of the bioinformatic analyses performed in order to make functional predictions about the genes of interest based on (a) the predicted amino acid sequence, (b) predicted protein structure, and (c) genomic comparisons with selected species. The bioinformatic tool used for each type of analysis is indicated. Multiple approaches were used in order to obtain informative results for each gene of interest and to increase confidence in the overall predictions

Table 2 Protein domain and gene ontology (GO) term
Table 3 Orthologs and their species of origin identified by amino acid sequence similarity using EGGNOG
Table 4 Best-matched orthologs and their species of origin identified using SmartBLAST protein sequence analysis
Table 5 Protein structure analysis
Fig. 2 Homology model of the P1 putative kinase domain. The kinase domain of receptor-interacting serine/threonine-protein kinase 2 (RIPK2, 6fu5.1.B in the RCSB protein database) was the template used for homology modelling of P1. The experimental structure of 6fu5.1 was determined by X-ray diffraction at 3.26 Å [60]. Blue indicates regions of the model where P1 was well modeled, and orange indicates regions where P1 was poorly modeled. The well-modeled regions (blue) are regions where P1 is likely to be similar to the experimental 3D structure of the template. The homology model pertains to the putative kinase domain of P1 and starts at P1 residue N°3 (GLN, Glutamine) and ends at residue N°284 (LYS, Lysine)

Fig. 3 Homology model of P3. T cell receptor beta chain (3of6.1.A in the RCSB protein database) was the template used for homology modelling of P3. The homology model starts at P3 residue N°32 (THR, Threonine) and ends at residue N°245 (THR, Threonine). The experimental structure of 3of6.1.A was determined by X-ray diffraction at 2.80 Å [61]. Blue indicates regions of the model in which P3 was well modeled by the template, and orange indicates regions where P3 was poorly modeled. The blue regions correspond to the T cell receptor beta chain immunoglobulin domains

Fig. 4 Homology model of the P10 chemokine interleukin-8-like domain. Lymphotactin (1j8i.1.A in the RCSB protein database) was the template used for homology modelling of P10. The homology model starts at P10 residue N°24 (GLU, Glutamic acid) and ends at residue N°102 (SER, Serine). The experimental structure of 1j8i.1.A was determined by NMR spectroscopy [62]. Blue indicates regions of the model where P10 was well modeled, and orange indicates regions where P10 was poorly modeled. The chemokine interleukin-8-like domain of the model starts at P10 amino acid N°27 (HIS, Histidine) and ends at amino acid N°86 (LEU, Leucine). This region includes both well-modeled (blue) and poorly modeled (orange) sections

Fig. 5 Homology model of P11. Maltase-glucoamylase, intestinal (3top.1.A in the RCSB protein database) was the template used for homology modelling of P11. The experimental structure of 3top.1.A was determined by X-ray diffraction at 2.9 Å [63]. The homology model starts at P11 residue N°922 (LYS, Lysine) and ends at residue N°1804 (PHE, Phenylalanine). The P-type trefoil domain (amino acid N°51–962), galactose mutarotase domain (amino acid N°114–1085), and glycoside hydrolase domain (amino acid N°225–1152) are not covered in the homology model. Blue indicates regions of the model where P11 was well modeled, and orange indicates regions where P11 was poorly modeled

Fig. 6 Expression level of selected zebrafish genes in other published studies. Expression level of selected zebrafish genes (P1, P9, and P12) in other published RNA-seq datasets of (a) zebrafish heart regeneration [64], and (b) zebrafish brain microglia [52], obtained using the Zf Regeneration Database (www.zfregeneration.org, [65]). The y-axis indicates the normalized transcript level expressed as FPKM (fragments per kilobase of exon per million reads). The x-axis shows the different experimental conditions. (a, dpa = days post injury. b, active microglia indicates microglia responding to acute damage; h = hours after acute damage)

Table 6 Orthologs found in the species Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis and Cynops pyrrhogaster

P1 (si:dkey-181f22.4)

The gene coding for P1 (si:dkey-181f22.4) is located on zebrafish chromosome 7 and is predicted to have exon/intron structure coding for a predicted 513 amino acid protein (Table 1). Protein domain and gene ontology (GO) term returned predicted “protein kinase domain” and “Caspase Activation and Recruitment (CARD) domain” (Table 2). The CARD domain is known to function in innate immunity, particularly in inflammation and the regulation of apoptotic process (Table 2, [66,67,68,69]). Amino acid sequence similarity analysis returned several kinases associated with immune function, and suggested that this gene may code for a receptor tyrosine kinase (Table 3). The best-matched ortholog analysis returned “Receptor-interacting serine/threonine-protein kinase 2 isoform 1” in both human and mouse (Table 4). Of note, human RIPK2 has been described to contain a C-terminal CARD domain [70,71,72]. In comparison to other selected species (Table 6), P1 returned receptor tyrosine kinase-like orphan receptor 2 (Axolotl), Threonine-protein kinase 2-like isoform X1 (Xenopus), and insulin-like growth factor receptor as well as receptor tyrosine kinase-like orphan receptor 2 (Salamander). Structure prediction (Table 5, Fig. 2) strongly indicated a kinase domain/function for P1.

The results strongly indicate that P1 has a kinase domain that may be activated by interactions with other proteins via the CARD domain, and this function may be acting in concert with receptor activity. Interestingly, the CARD domain of human RIPK2 facilitates interaction with NOD-like receptors [73, 74]. Collectively, these results indicate that zebrafish P1 may have orthologous function to human RIPK2. However, the amino acid substrate of phosphorylation (tyrosine vs. serine/threonine) by zebrafish P1 is not yet clear, as both classes of kinases were indicated in the hits.

P2 (si:ch73-112l6.1)

The gene for P2 (si:ch73-112l6.1) is located on zebrafish chromosome 21 and codes for a predicted 1025 amino acid protein (Table 1). Protein stability analysis (Supplemental Figure 2) indicates P2 is a structured protein, but with a large disordered domain. Such disordered regions often indicate a protein-protein binding interface [56]. However, collective analyses were largely uninformative for P2. For example, neither protein domains nor GO terms were returned (Table 2). A putative ortholog with unknown function from Branchiostoma floridae was returned based on amino acid sequence similarity (Table 3), and three uncharacterized zebrafish genes were returned as best-matched orthologs (Table 4).

P3 (zgc:174863)

The gene for P3 (zgc:174863) is located on zebrafish chromosome 6 and codes for a predicted 290 amino acid protein (Table 1). Protein domain and GO terms indicate an immunoglobulin-like domain, which is present in proteins involved in cell adhesion (Table 2). Consistent with this, sequence similarity analysis revealed 5 proteins from 4 species, several of which contain immunoglobulin folds (Table 3). Protein structure analysis (Table 5, Fig. 3) further indicated that the predicted protein contains immunoglobulin-like domains, as it was reasonably modeled by the T cell receptor beta chain in regions containing immunoglobulin folds (Fig. 3). Collectively, these results suggest that P3 could be a cell membrane receptor possibly involved in cell adhesion. In support of this, comparison to Xenopus tropicalis returned a predicted ortholog with putative cell adhesion function (Table 6). In addition, several hits for P3 were found by amino acid similarity in Xenopus tropicalis, Apis mellifera, Gadus morhua, and Latimeria chalumnae (Table 3), and based on phylogenetic relationships of these species (Supplemental Figure 1), it seems possible that the function of the gene coding for P3 was evolutionarily conserved in these species.

P4 (si:dkey-56m19.5)

The gene coding for P4 (si:dkey-56m19.5) is located on zebrafish chromosome 7 and codes for a predicted 526 amino acid protein (Table 1). As noted above, P4 is predicted to be a disordered protein (Supplemental Figure 2). Many intrinsically disordered proteins evolve rapidly [75,76,77,78], and therefore predicting a function for P4 based on amino acid sequence is difficult. Accordingly, analyses based on sequence similarity were minimally informative overall. An associated protein domain (Ribonuclease E/G) was returned for P4 (Table 2), and a possible ortholog (Brain abundant, membrane attached signal protein 1, BASP1) with unknown function in Oryzias latipes was a hit based on amino acid sequence similarity (Table 3). P4 returned four best-matched orthologs from other species, but these genes had widely varying predicted functions (Table 4). Protein structure analysis was uninformative for P4 (Table 5).

Synteny analysis indicated that the gene coding for P4 lies in a region syntenic with human chromosome 16 (Supplemental Figure 3). The gene for P4 is flanked by several neighboring genes that have apparent orthologs in human, and based on the orientations and locations of the neighboring genes in the two species, the gene for P4 lies in a relative location similar to that of human TERB1. However, using NCBI BLASTP to compare the sequences of zebrafish P4 and human TERB1 (with any scoring matrix) found no significant similarity between these two genes, and thus failed to provide evidence of orthology. We therefore consider that the gene coding for P4 could have been gained in zebrafish or lost in humans. Interestingly, several possible orthologs in various species of fish were returned for P4 (Table 4).
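The reasoning used here (locate flanking genes with orthologs in the other species, then ask which gene sits between them) can be sketched as follows. This is only an illustration of the idea under simplifying assumptions, not the synteny tool used in this work: gene order is represented as a plain list, all gene names are made-up placeholders, and the function name is hypothetical.

```python
def genes_between_flanking_orthologs(query_order, target, subject_order, ortholog_of):
    """Return subject genes lying between the orthologs of the target's
    nearest flanking neighbors, or None if either flank has no ortholog.

    query_order / subject_order: gene names in chromosomal order
    ortholog_of: dict mapping query gene -> subject ortholog name
    """
    i = query_order.index(target)
    # Nearest upstream neighbor whose ortholog is present in the subject region.
    left = next((ortholog_of[g] for g in reversed(query_order[:i])
                 if ortholog_of.get(g) in subject_order), None)
    # Nearest downstream neighbor whose ortholog is present in the subject region.
    right = next((ortholog_of[g] for g in query_order[i + 1:]
                  if ortholog_of.get(g) in subject_order), None)
    if left is None or right is None:
        return None
    # Sorting the two positions handles regions inverted between species.
    a, b = sorted((subject_order.index(left), subject_order.index(right)))
    return subject_order[a + 1:b]


# Toy example with placeholder names (not real loci).
query_order = ["q1", "q2", "target", "q3", "q4"]
ortholog_of = {"q1": "s1", "q2": "s2", "q3": "s3", "q4": "s4"}
subject_order = ["s1", "s2", "sX", "s3", "s4"]

print(genes_between_flanking_orthologs(query_order, "target", subject_order, ortholog_of))
# ['sX']
```

A gene returned by such a positional search is only a candidate; as in the P4 case, a direct sequence comparison is still needed before orthology can be inferred.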

P5 (si:ch211-105j21.9)

Protein domain and GO term analysis returned MGC-24 and Mucin15 domains (Table 2) for P5 (si:ch211-105j21.9). Amino acid sequence similarity returned three hits from three different species for genes with unknown and varying functions (Table 3), but the best-matched orthologs (Table 4) and protein structure analysis were uninformative. Although a hit was found in Xenopus laevis (Table 6), the protein has unknown function.

P6 (si:ch73-248e21.7)

P6 (si:ch73-248e21.7) did not return any hits for GO terms, but a putative complement regulatory protein from Xenopus tropicalis was identified as a hit by sequence similarity analysis (Table 3). Best-matched orthologs were found in four Sinocyclocheilus species of fish, two of which were Mucin 5AC_like proteins and two of which were cell wall-like proteins (Table 4). However, other analyses proved uninformative.

P7 (si:ch211-191j22.3)

Analyses for P7 were largely uninformative. Although some of these analyses returned hits indicating unknown, uncharacterized, or hypothetical proteins in six different fish species (Table 3, Table 4), their meaning was not interpretable.

P8 (LOC100535303)

Protein domain/GO term results suggest that P8 contains an immunoglobulin-like domain. This was further indicated by the amino acid sequence similarity results (Table 3), protein structure results (Table 5), and the putative “CD48 antigen” orthologue identified in Xenopus tropicalis (Table 6).

P9 (urp1)

The gene coding for P9 was previously annotated as urp1, suggesting that putative urotensin function is already recognized. Consistent with this, protein domain/GO term and amino acid sequence similarity analyses returned results for P9 indicating urotensin function (Table 2 and Table 3), which is involved in regulation of vasculature diameter. Specifically, Urotensin II is a secreted mediator known to function in vasoconstriction (Table 2, [79,80,81]). However, similar structures were not identified in our analyses (Table 5).

P10 (xcl32a.1)

The gene for P10 (xcl32a.1) is located on zebrafish chromosome 2 and is predicted to encode a protein of only 126 amino acids (Table 1). The protein domains/GO term search returned chemokine interleukin-8-like, which functions in immune response (Table 2). Other analyses also indicated that P10 is likely a cytokine/chemokine (Table 3, Table 4, Table 5, Table 6). The predicted amino acid length of P10 is consistent with short amino acid chains seen in cytokines/chemokines. Consistent with this function, regions of P10 were well modeled by regions of the chemokine Lymphotactin’s interleukin-8-like domain (Fig. 4).

P11 (si:ch211-287n14.3)

Collectively, results for P11 indicate that it could be an enzyme involved in carbohydrate metabolism (Table 2, Table 3, Table 4, Table 5, and Table 6). P11 could be well modeled by human intestinal maltase-glucoamylase (Table 5, Fig. 5), as well as sucrase-isomaltase and lysosomal alpha-glucosidase (Table 5). However, the predicted functional domains found previously (P-type trefoil, galactose mutarotase, and glycoside hydrolase domains, Table 2) were not covered in the homology model based on maltase-glucoamylase. The P-type trefoil domain found for P11 (Table 2) is present in several secreted proteins associated with mucins [82,83,84], many of which are involved in the response to gastrointestinal mucosal injury and inflammation [85]. The function of such a secreted protein in the CNS during tissue regeneration is not clear; perhaps it could be involved in extracellular matrix degradation.

P12 (pho)

The gene encoding P12 (pho) is located on zebrafish chromosome 5 and encodes a large predicted protein of 2798 amino acids (Table 1). Interestingly, P12 (pho) has been previously described to be required for the regeneration of zebrafish neuromasts [86], which are sensory patches located along the zebrafish body, but its function has not been studied otherwise. The coiled coil domain found in the protein domain/GO term analysis (Table 2) was described previously [86]. In addition, we find that P12 is predicted to have more than 50% of its amino acids disordered, and is therefore likely an unstructured protein (Supplemental Figure 2). This disorder is likely the reason that other analyses did not prove informative (Table 3, Table 4, Table 5, Table 6). Many studies have shown that disordered proteins evolve more rapidly than structured proteins [75,76,77,78] and that the disordered region of the protein drives this rapid evolution [77]. In addition, large proteins with coiled-coil domains appear to have functions in cell structure [56]. In spite of the predicted disordered structure, the previously cited study [86] found evidence for an ATPase and transmembrane domain; however, our analyses did not reveal these features. Given that P12 is reported to be required for neuromast regeneration in zebrafish [86], we considered that a syntenic relationship might be identified in genomes of other species known to have robust regenerative abilities. However, our synteny analyses did not return predicted syntenic regions compared to Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis, and Cynops pyrrhogaster (not shown).

Comparison to other published RNA-seq datasets

We were interested in determining to what extent transcripts mapping to selected genes might also be found in other zebrafish tissues/cells, such as regenerating heart [64], resting microglia [52], and microglia responding to acute damage [52]. We focused this comparison on P1, P9, and P12 because P1 had particularly informative analyses above (indicating kinase function), and P9 and P12 might have novel functions in regeneration. Interestingly, transcripts for both P1 and P9 were increased in regenerating heart tissue samples compared to uninjured samples (Fig. 6a). Transcripts mapping to P1 appeared slightly more abundant in resting microglia compared to other brain cells, but levels did not change significantly in microglia responding to acute damage (Fig. 6b). Since P1 was enriched in microglia in our study [30], which sampled microglia/macrophages during retinal regeneration, it is possible that expression and function of this putative kinase (P1) are upregulated during tissue regeneration. Transcripts for the P9 gene were present in microglia in the zebrafish brain, both in the resting state and in response to acute brain damage (Fig. 6b), though they did not appear to change significantly in such conditions. Thus, it is possible that P9 is a mediator produced by microglia/macrophages that acts on the local vasculature to control blood pressure locally, and perhaps this function is upregulated during tissue regeneration.

Examining expression levels of P12 did not demonstrate any apparent upregulation of P12 in regenerating heart compared to the very low transcript levels in uninjured heart tissue (Fig. 6a). However, P12 expression was observed in resting microglia from zebrafish brain, and the expression of P12 appeared to be reduced in context of microglial acute damage response [52] (Fig. 6b). This expression pattern, in combination with our dataset indicating expression by microglia/macrophages during retinal regeneration, suggests that P12 (pho) may have function in restoring and/or maintaining a “resting” microglial/macrophage state. However, such a hypothesis will require experimental testing.

We next examined a published RNA-seq dataset representing zebrafish macrophages responding to M. marinum infection [87], to determine if the genes of interest were also differentially expressed in zebrafish macrophages responding to microbial infection. Interestingly, although transcripts were detected in the Rouget et al. study for ten out of twelve of the genes, only one of these (P6, si:ch73-248e21.7, which may have complement regulatory function based on the results described above) was found to be differentially expressed in macrophages from infected fish compared to uninfected fish based on the authors’ cut-off criteria of Log2FC ≥ 1, p-adj < 0.05 (Table 7). This supports the idea that these genes could comprise part of a unique transcriptome that is expressed in microglia/macrophages during tissue regeneration compared to that in response to microbial infection.
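The cut-off criteria quoted above can be written as a simple predicate. The sketch below applies the thresholds exactly as stated (Log2FC of at least 1 and adjusted p-value below 0.05); whether the original authors applied the fold-change threshold to the absolute value is not stated here, so that is left out. The function name and the example rows are hypothetical and are not values from Table 7.

```python
def is_differentially_expressed(log2fc, padj, min_log2fc=1.0, max_padj=0.05):
    """Apply the cut-off as written: Log2FC >= 1 and adjusted p-value < 0.05."""
    return log2fc >= min_log2fc and padj < max_padj


# Made-up rows of (gene, log2 fold change, adjusted p-value).
rows = [
    ("gene1", 1.8, 0.001),   # passes both thresholds
    ("gene2", 0.4, 0.0001),  # fold change too small
    ("gene3", 2.5, 0.20),    # not significant after adjustment
    ("gene4", 1.0, 0.049),   # exactly at the fold-change boundary, passes
]

hits = [name for name, lfc, padj in rows if is_differentially_expressed(lfc, padj)]
print(hits)  # ['gene1', 'gene4']
```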

Table 7 Expression of zebrafish genes pertaining to P1-P12 in macrophages responding to microbial infection

Discussion

In this study, we analyzed twelve zebrafish genes with unknown function. These genes were selected from our previous transcriptome analysis of zebrafish microglia/macrophages isolated from regenerating retinal tissue [30]. We used bioinformatic analyses to analyze the twelve selected transcripts to suggest putative functions. These analyses included protein domain and gene ontology (GO) terms, amino acid similarity, predicted protein structure, and synteny comparisons. For some selected genes, we examined expression level in other published studies of gene expression in zebrafish [52, 64], and examined other published data sets involving macrophages responding to microbial infection [87] to determine if these genes might be regulated in different activation contexts.

Results for many of the genes analyzed indicate putative functions related to the immune system. Several of these functions may not be as well described in fish as in mammalian organisms. The predicted genes/predicted proteins yielding the most informative results include P1 (results strongly indicate receptor-associated kinase activity), P9 (previously annotated as urp1, for which results indicate urotensin-like activity), P10 (which may have chemokine activity), and P11 (which could be an enzyme involved in carbohydrate metabolism). Although only an immunoglobulin-like fold domain was revealed for P3 and P8, and a possible mucin domain for P5, these results provide at least some new insight into the structure of the predicted proteins, as these domains have not been previously noted for these genes. On the other hand, our analyses did not reveal significant functional information about P2, P4, P6, P7, and P12. Given that P12 (pho) is predicted to be a disordered protein, our analyses do not allow us to make predictions about the function of this particular protein, though it remains of interest due to its previously indicated role in neuromast regeneration [86]. It will be interesting to determine, experimentally, if phoenix (pho), or any of the other genes analyzed in this work, is required for retinal regeneration.

The lack of syntenic relationships between zebrafish and mouse/human for the majority of the genes analyzed is notable, suggesting that possibly these genes were not evolutionarily retained across these species or alternatively, that these genes may have appeared in certain species [88]. For the one zebrafish gene that did have syntenic relationship identified, sequence alignment did not indicate an evolutionary relationship to the candidate gene in the syntenic region. Orthologs were identified for some, but not all, of these zebrafish genes of interest in species which are also known to regenerate damaged tissue (Axolotl, Xenopus and Salamander, Table 6 and Supplemental Table 1). We therefore consider that, in future work, it is important to determine if the genetic program used by microglia/macrophages during zebrafish CNS regeneration is unique on a species level. Whether such a unique genetic program is required for successful regeneration also remains to be determined.

To begin to probe this question, we examined other published RNA-seq datasets for expression patterns of the genes examined in this work. For selected genes (P1, P9, and P12), we examined transcript abundance in samples from zebrafish regenerating heart tissue [64] and zebrafish brain microglia [52]. Both P1 and P9 showed upregulation in regenerating zebrafish heart, while P12 transcripts were apparently reduced in microglia responding to acute damage compared to resting microglia. When we examined the transcriptome of zebrafish macrophages responding to infection by the microbe M. marinum [87], only one of the twelve genes discussed here was found to be differentially expressed in this context. It is worth considering that the samples sequenced in our study [30] differ from those in these other studies with regard to the developmental age/stage of the animal, location in the body, sample preparation, and sequencing protocols, as well as other factors. However, these comparisons might still suggest that these genes are regulated in a tissue regeneration context rather than in response to microbial infection. Thus, it is possible that at least some of these genes comprise part of a general transcriptional program active in zebrafish microglia/macrophages responding to tissue damage and/or infection. However, further experimental studies involving at least some of these genes (e.g. P1, which bioinformatic predictions suggest could be a kinase, and P12 (pho)) are likely to increase our understanding of mechanisms involved in successful tissue regeneration. Indeed, harnessing such regenerative capacity in mammals must be better informed by a more thorough functional understanding of the genetic program, executed by organisms such as zebrafish, that underlies successful regeneration. Such work will also lead to a better evolutionary understanding of the vertebrate innate immune system.

Conclusions

In this study, we have predicted putative functions for several zebrafish genes with previously unknown function. Transcripts mapping to these genes were enriched in microglia/macrophages during retinal regeneration, suggesting they could have functional importance in tissue regeneration. We identified putative orthologs of several of these genes, mainly based on functional domains, which provide insight into possible protein function. In addition, comparison to other RNA-seq datasets suggests that most of these genes could be part of a transcriptional program expressed by microglia/macrophages during tissue regeneration. Our findings provide a foundation for future experimental work to determine the function of these genes in vivo.

Methods

RNAseq dataset and predicted orthology

The 3’mRNA Quant-seq experiment and the analysis of differentially expressed genes (DEGs) are described in Mitchell et al., 2019 [30]. This dataset is available on the Gene Expression Omnibus (GEO accession GSE120467). To identify putative mouse and human orthologs of the 986 transcripts found to be enriched in mpeg1+ cells compared to other cell types, the DRSC integrative ortholog prediction tool (DIOPT, v 7.0, www.flyrnai.org) was queried with the zebrafish ENSEMBL ID of each transcript.
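The step of separating genes with and without predicted mammalian orthologs can be illustrated with a short sketch. This is not the procedure used in the study, which relied on the DIOPT web tool; the function name, the (gene, species, ortholog) tuple layout, and the rule that a gene counts as "non-orthologous" only when it lacks a prediction in every listed species are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def genes_without_orthologs(
    gene_ids: Sequence[str],
    predictions: Iterable[Tuple[str, str, str]],
    required_species: Sequence[str] = ("mouse", "human"),
) -> List[str]:
    """Return genes lacking a predicted ortholog in all required species.

    `predictions` holds hypothetical (zebrafish_id, species, ortholog_symbol)
    tuples, e.g. parsed from an exported ortholog table. An empty
    ortholog_symbol is treated as "no prediction".
    """
    # Track, per gene, the species in which an ortholog was predicted.
    found: Dict[str, Set[str]] = {g: set() for g in gene_ids}
    for gene_id, species, ortholog in predictions:
        if gene_id in found and ortholog:
            found[gene_id].add(species)
    # Keep input order; a gene is reported only if no required species matched.
    return [g for g in gene_ids
            if not any(s in found[g] for s in required_species)]
```

Genes returned by such a filter would then be the candidates carried forward into the domain, sequence similarity, and structural analyses described below.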

Protein domains and gene ontology (GO) terms

Protein domains and gene ontology (GO) terms (Biological Process and Molecular Function) were determined from the universal protein knowledgebase (UniProt, [89]) and the integrative protein signature database (InterPro, [90]). The gene ID from Ensembl (https://www.ensembl.org/, [54]) was used to extract the predicted protein sequence of the gene from the National Center for Biotechnology Information database (NCBI, https://www.ncbi.nlm.nih.gov/). This amino acid sequence was then used to query UniProt and InterPro.

Sequence similarity

Two approaches, EggNOG and SmartBLAST, were used to find orthologs for each protein based on sequence similarity, because these two approaches use different protein databases. The bioinformatics web-server EggNOG 4.5.1 [55] compares the input protein sequence to the sequences available in several databases and displays the list of orthologs of the protein and the species in which those orthologs are found. The “default” settings of the web-server SmartBLAST (https://blast.ncbi.nlm.nih.gov/smartblast/) were used to identify the species of origin of the orthologs (and of paralogs within zebrafish) that best matched our genes, using the non-redundant protein sequence database [91].

To look for orthologs in species with described capacity for regeneration (Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis, Cynops pyrrhogaster), the protein sequences of zebrafish genes were compared to the NCBI database (http://blast.ncbi.nlm.nih.gov) using BLASTP with the BLOSUM45 scoring matrix and Gap Costs “Existence: 10 Extension: 3”. In addition, we used tBLASTn to identify putative unannotated orthologs in these species, and these results are reported in Supplemental Table 1.
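For readers who prefer the command line, the BLASTP settings described above can be approximated with the NCBI BLAST+ suite. The following is a minimal sketch and not the procedure used in this study, which relied on the NCBI web interface. The function name, the file names, the choice of the nr database queried remotely, the tabular output format, and the Entrez organism filter are all assumptions made for illustration.

```python
from typing import List, Optional


def build_blastp_command(query_fasta: str,
                         out_path: str,
                         organism: Optional[str] = None) -> List[str]:
    """Build a BLAST+ `blastp` argument list mirroring the settings in the text.

    Only the scoring matrix and gap costs come from the Methods; the
    database, remote execution and output format are assumptions.
    """
    cmd = [
        "blastp",
        "-query", query_fasta,
        "-db", "nr",              # assumed: non-redundant protein database
        "-remote",                # assumed: run the search on NCBI servers
        "-matrix", "BLOSUM45",    # scoring matrix stated in the text
        "-gapopen", "10",         # gap cost "Existence: 10"
        "-gapextend", "3",        # gap cost "Extension: 3"
        "-outfmt", "6",           # assumed: tabular output
        "-out", out_path,
    ]
    if organism is not None:
        # Restrict hits to one species, e.g. "Xenopus laevis".
        cmd += ["-entrez_query", f"{organism}[Organism]"]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine with BLAST+ installed; building the list separately keeps the parameters easy to inspect before any search is launched.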

Structural analysis

We inferred protein disorder using the default settings (5% false positive rate) of the server PrDOS (http://prdos.hgc.jp/cgi-bin/top.cgi, [92]), which predicts natively disordered regions of a protein chain from its amino acid sequence. PrDOS returns a disorder probability for each residue. Proteins with more than 30–50% predicted disordered residues are considered disordered proteins [92].
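The classification rule above can be sketched as a short calculation over per-residue disorder probabilities. This is an illustrative sketch only: the function names are hypothetical, the per-residue cutoff of 0.5 is an assumption (the appropriate value depends on the false positive rate chosen on the server), and the default fraction of 0.3 is simply the lower bound of the 30–50% range given in the text.

```python
from typing import Sequence


def disordered_fraction(probabilities: Sequence[float],
                        threshold: float = 0.5) -> float:
    """Fraction of residues whose disorder probability reaches `threshold`.

    `threshold` is an assumed per-residue cutoff, not a value from the text.
    """
    if not probabilities:
        raise ValueError("at least one residue probability is required")
    n_disordered = sum(1 for p in probabilities if p >= threshold)
    return n_disordered / len(probabilities)


def is_disordered_protein(probabilities: Sequence[float],
                          threshold: float = 0.5,
                          min_fraction: float = 0.3) -> bool:
    """Apply the 'more than 30-50% disordered residues' rule.

    `min_fraction` defaults to the lower bound of that range (an assumption);
    pass 0.5 for the stricter end.
    """
    return disordered_fraction(probabilities, threshold) > min_fraction
```

Because the text gives a range rather than a single cutoff, a protein whose disordered fraction falls between 0.3 and 0.5 would be classified differently depending on the bound chosen, so both values are worth reporting.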

We used the bioinformatics web-server SWISS-MODEL [57] to identify templates or homologs for our list of unknown proteins based on the predicted 3D structure of the proteins of interest, using a Global Model Quality Estimation (GMQE, [58]) score > 0.3 as the cut-off. Homology modeling, or comparative protein modeling, uses the experimentally determined 3D structure of a homologous protein (the template) to estimate a model for the target sequence [57].
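The GMQE cut-off can be expressed as a simple filter over candidate templates. The sketch below assumes each template has been recorded as a dictionary with hypothetical keys "template" and "gmqe"; this layout is for illustration and is not the export format of SWISS-MODEL.

```python
from typing import Dict, List, Union

TemplateRecord = Dict[str, Union[str, float]]


def filter_templates(templates: List[TemplateRecord],
                     gmqe_cutoff: float = 0.3) -> List[TemplateRecord]:
    """Keep templates with GMQE strictly above the cut-off, best first.

    Each record is assumed to carry a numeric "gmqe" value (hypothetical key).
    """
    kept = [t for t in templates if float(t["gmqe"]) > gmqe_cutoff]
    # Sort so the highest-scoring template is listed first.
    return sorted(kept, key=lambda t: float(t["gmqe"]), reverse=True)
```

Note that the comparison is strict, matching "GMQE > 0.3" in the text, so a template scoring exactly 0.3 is excluded.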

Synteny analysis

Synteny comparisons were performed using www.ensembl.org, because this database uses the most recent genome build for zebrafish (GRCz11). The ENSEMBL ID was used to identify the gene of interest, and the chromosomal region containing the gene was selected. In the Comparative Genomics menu option, synteny was selected to compare the chromosomal region of the zebrafish gene to the human (GRCh38.p13) and mouse (GRCm38.p6) genomes. Only one gene of interest was found to lie in a syntenic region (P4, Supplemental Figure 3). To look for similarity and orthology, the amino acid sequence of this zebrafish gene was compared to the candidate annotated gene found inside the syntenic region using BLASTP (http://blast.ncbi.nlm.nih.gov) against the National Center for Biotechnology Information (NCBI) database; the alignment was compared with each scoring matrix in the program [93].

Expression level in other RNA-seq datasets

We determined the expression level of selected zebrafish genes of interest in other published datasets of zebrafish heart regeneration [64] and zebrafish brain microglia [52] using the Zf Regeneration Database (www.zfregeneration.org) [65]. The gene’s symbol or ENSEMBL ID was used to plot the normalized expression level of transcripts of interest.

To probe the RNA-seq dataset from Rouget et al. [87], we searched for the ENSEMBL ID of each gene of interest in the raw datasets (GSE78954 and GSE68920) to determine if transcript counts were detected. To determine if the gene was considered to be differentially expressed in macrophages responding to infection, we examined the authors’ reported results of differential expression analysis comparing transcripts from sorted uninfected vs. M. marinum infected macrophages from zebrafish larvae [87].
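The search for gene IDs in a raw count table can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure used here: it assumes a tab-delimited table with a header row, the gene ID in the first column and one count column per sample, and it treats a gene as "detected" when any sample has a count above zero. The function name and the table layout are hypothetical, and the actual files deposited under the GEO accessions may be organized differently.

```python
import csv
from typing import Dict, Iterable, List


def detect_transcripts(count_rows: Iterable[str],
                       gene_ids: List[str]) -> Dict[str, bool]:
    """Report, for each gene ID, whether any sample shows a non-zero count.

    `count_rows` is an iterable of tab-delimited lines (e.g. an open file);
    the first line is assumed to be a header. Genes absent from the table
    are reported as not detected.
    """
    wanted = set(gene_ids)
    detected = {gene_id: False for gene_id in gene_ids}
    reader = csv.reader(count_rows, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if not row or row[0] not in wanted:
            continue
        counts = [float(value) for value in row[1:] if value != ""]
        detected[row[0]] = any(count > 0 for count in counts)
    return detected
```

Detection in the raw counts is a separate question from differential expression, which in this work was taken from the authors’ reported analysis rather than recomputed.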