Background

Iridoviruses are large DNA viruses (~120–200 nm in diameter) that replicate in the cytoplasm of infected cells. Iridovirus genomes are circularly permuted and terminally redundant, and range in size from 105 to 212 kbp [1, 2]. The family Iridoviridae is currently subdivided into five genera: Chloriridovirus, Iridovirus, Lymphocystivirus, Megalocytivirus, and Ranavirus [3].

Iridoviruses have been found to infect invertebrates and poikilothermic vertebrates, including amphibians, reptiles, and fish [4]. Iridovirus infections produce symptoms that range from subclinical to very severe and may result in significant mortality [5–9]. The high pathogenicity associated with some members of the iridovirus family has had a significant impact on modern aquaculture, fish farming, and wildlife conservation. For example, systemic iridovirus infections have been found in economically important freshwater and marine fish species worldwide. In addition, iridovirus infections have been implicated in amphibian population declines, representing a set of emerging infectious diseases whose spread has been accelerated by human activities [10–14].

Despite the economic and ecological significance of iridoviruses, very little is currently known about their molecular biology. One approach towards gaining a deeper understanding of iridoviral pathogenesis is to investigate the core set of essential genes conserved among all members of the family. The genomes of twelve iridoviruses, including at least one from each genus, have been completely sequenced (Table 1). According to the previously published annotations, these genomes contained only 19 core genes associated with a variety of viral activities: transcriptional regulation, DNA metabolism, protein modification, and viral structure. Definition of this core set of genes also highlights those genes that are conserved across some, but not all, genera, and unique genes found within a single species. These non-core genes may be involved in specific virus-host interactions, enhancement of virus replication, and augmented pathogenesis in certain species.

Table 1 Iridovirus Genomes

Despite the growing number of sequenced iridovirus genomes, no systematic comparative genomic analysis of the family has yet been performed. Thus, annotation of these genomes has been performed without standardization and has so far been guided primarily by the position of start/stop codons rather than the presence of homologous sequences. As a result, some long overlapping potential ORFs have been automatically designated as coding sequences, and smaller homologous ORFs have been overlooked. In this paper, we have taken a comparative genomics approach to re-examine the annotation of all twelve iridovirus genomes, using the Viral Orthologous Clusters (VOCs) [15] and Viral Genome Organizer (VGO) [16] software. These re-annotated genomes were then analysed further, both to define the core set of iridovirus genes more accurately and to provide deeper insight into the phylogenetic relationships among individual iridovirus species.

Results & discussion

Re-annotation of Iridovirus genomes

One objective of this project was to demonstrate the application of comparative genomics to annotating viral genomes, particularly those that have been poorly characterized experimentally. In an earlier study, we utilized comparative genomics to identify previously unannotated small viral ORFs in the Poxviridae [17]. Here, we focused our analysis on the Iridoviridae family, which represents a challenge in genome annotation since there is little experimental evidence available to confirm gene expression. Another problem is that iridovirus promoter elements have not been well characterized, and thus cannot be used as a reliable criterion for assigning ORFs. These combined factors made previous iridovirus gene annotation a somewhat arbitrary process, resulting in closely related iridovirus species with dramatic differences in their genomic annotations. Therefore, we decided to analyse all members of this family using a standardized comparative genomics approach, relying on the principle that ORFs conserved in more than one divergent species are likely to be functional genes.

We began the analysis with the Megalocytivirus genus, which contains three sequenced genomes: infectious spleen and kidney necrosis virus (ISKNV), rock bream iridovirus (RBIV), and orange-spotted grouper iridovirus (OSGIV). These three viruses display a co-linear arrangement of genes with an overall DNA sequence identity of greater than 90%. In the analysis of this genus, differences in gene content were examined in detail. Dotplots were used to determine the presence of orthologous DNA, and a variety of BLAST searches and the VGO genome visualization software were used to determine the reason (frameshifts, extra stop codons) behind the apparent absence of some ORFs.

Using this approach, a substantial number of ORFs were either added to, or deleted from, members of the Megalocytivirus genus (Table 2). OSGIV and RBIV share 99% DNA sequence identity, and thus are probably different strains of the same virus; however, the previous annotations described only 82 out of 118 total annotated ORFs as shared by the two genomes [18, 19]. After our re-analysis, the RBIV and OSGIV genomes had an identical complement of annotated genes. Furthermore, the re-annotated ISKNV genome contained 110 ORFs orthologous with both RBIV and OSGIV (compared to 71 in the old annotation; Table 2) [18, 20].

Table 2 Re-annotation of the Megalocytivirus genus

In the process of re-examining these genomes, we annotated a number of genes containing apparent frameshift mutations between species. In RBIV we annotated ten genes with potential frameshift mutations, while OSGIV had four such genes (Table 2). All of the genes containing potential frameshift mutations had orthologs in the other two members of the Megalocytivirus genus (Table 2). In some cases, these mutations may be the result of natural mutations within the viruses; however, it is also possible that these apparent frameshift mutations are actually sequencing errors. For both RBIV and OSGIV, PCR primers based on the ISKNV sequence were used to amplify genomic fragments, which were subsequently sequenced [18, 19]. It is possible that errors were introduced during the PCR process, leading to apparent frameshifts in the reported sequence. It is interesting to note that the genomic sequence of ISKNV, which was sequenced using subcloned fragments rather than PCR products [20], had significantly fewer annotation changes made during our re-analysis. Though we have not experimentally proven that the frameshift mutations in OSGIV and RBIV are the result of sequencing errors, it would be useful to focus future sequencing efforts on these regions to determine whether the reported sequences are indeed correct.

After re-annotating the Megalocytivirus genus, we applied the same comparative genomic analysis to the Ranavirus genus. The genus contains five sequenced members divided into two groups, each with a high degree of sequence conservation and a co-linear arrangement of genes. The first group comprises frog virus 3 (FV3), tiger frog virus (TFV), and Ambystoma tigrinum virus (ATV). The second group contains Singapore grouper iridovirus (SGIV) and grouper iridovirus (GIV).

The first step in the re-annotation of the Ranavirus genus was a comparative genomic analysis of FV3, TFV, and ATV. This resulted in an increase in the number of conserved annotated genes from 76 to 87 (Table 3). Subsequent re-analysis of the second Ranavirus group, containing SGIV and GIV, resulted in an increase from 131 to 138 conserved annotated ORFs (Table 4). It should be noted that two of the newly annotated ORFs, SGIV 0.5L and GIV 120.5L, appear to "wrap around", beginning at one end of the genome with the remainder of the ORF located at the opposite end [21, 22]. These apparent "split ORFs" are actually the result of the circularly permuted iridovirus genome being represented as a linear genomic sequence in which the arbitrarily chosen start point happens to fall in the middle of an ORF [23].
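The wrap-around effect can be illustrated with a minimal Python sketch. It is not the procedure used in VOCs or VGO; it simply assumes a forward-strand scan with ATG as the start codon and the three standard stop codons, and the function name `find_wraparound_orfs` and the toy sequence are hypothetical. Doubling the linear sequence lets a reading frame continue past the arbitrary end point.

```python
def find_wraparound_orfs(genome, min_codons=3):
    """Return (start, end) pairs for forward-strand ORFs that cross the
    arbitrary end of a linearised circular genome.

    Coordinates are 0-based. `end` is exclusive and reported modulo the
    genome length, so every ORF returned here has end < start.
    """
    stops = {"TAA", "TAG", "TGA"}
    n = len(genome)
    doubled = genome + genome  # lets a frame run past the linear end
    hits = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon,
        # never reading more than one full genome length.
        for pos in range(start + 3, start + n - 2, 3):
            if doubled[pos:pos + 3] in stops:
                end = pos + 3
                if end > n and (end - start) // 3 >= min_codons:
                    hits.append((start, end % n))
                break
    return hits


# Toy circular genome: the ORF ATG GCA AAC TAA starts at position 13
# and finishes at the opposite end of the linear representation.
toy = "GCAAACTAACCCCATG"
print(find_wraparound_orfs(toy))  # [(13, 9)]
```

A scan restricted to the linear sequence would see only a start codon with no in-frame stop at one end and an orphan coding fragment at the other, which is how such ORFs can be missed.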

Table 3 Re-annotation of FV3, TFV, and ATV of the Ranavirus genus
Table 4 Re-annotation of SGIV and GIV of the Ranavirus genus

As seen above, our comparative genomic approach was able to identify previously unannotated ORFs, homologous ORFs with potential frameshifts, and ORFs split between the two ends of a circular genome. Although this approach proved extremely successful for the Ranavirus and Megalocytivirus genera, we were unable to use it for the Chloriridovirus, Iridovirus, and Lymphocystivirus genera. This is due to the lack of co-linearity and the highly divergent sets of genes that exist between the members of these genera, as well as the low number of available genome sequences. However, we did modify the annotations of lymphocystis disease virus-China (LCDV-China) and invertebrate iridescent virus-6 (IIV-6). The previous annotations of both genomes contained a large number of overlapping ORFs [2, 24], which we decided to exclude on several grounds. First, LCDV-China and IIV-6 are the only iridoviruses, out of the twelve so far sequenced, in which overlapping ORFs have been annotated. In addition, the original sequencing paper for IIV-6 [2] and a follow-up paper by the same group [25] did not include a number of the overlapping ORFs reported in the database sequence, presumably due to their small size and lack of similarity with other viral and cellular genes. Finally, there is no experimental or bioinformatics evidence to suggest that any of these ORFs encode proteins. Therefore, to improve the overall consistency of the Iridoviridae family annotations, we removed the small overlapping ORF annotations from the LCDV-China and IIV-6 genomic sequences (Table 5, Additional Files 1 & 2).

Table 5 Overlapping ORFs deleted from the Iridovirus and Lymphocystivirus genera

Defining the conserved genes in Iridoviruses

As a result of this re-annotation of the Iridoviridae family, species within each genus now have a much greater consensus among their annotated ORFs. Prior to re-annotation, only 19 ORFs appeared to be conserved across all iridovirus species (Table 6). Although a previous report has suggested that 27 core genes exist within the Iridoviridae family [26], the core genes reported there are found in most, but not all, published iridovirus species. In light of our re-annotation results, we re-examined this core set of genes using the VOCs software. We identified seven novel core genes (Table 7), increasing the total number to 26 (Tables 6 & 7). This increase in the number of core genes was primarily due to the five new genes annotated during the re-analysis of RBIV (Table 7, genes highlighted in bold). As expected, most of the core genes are predicted to have essential functions required for transcription, replication, and virus formation. Interestingly, three core genes, the orthologs of FV3 12L, 41R, and 94L, have no predicted functions. Future research to determine the functions of these genes, which are also likely to be essential, will provide important data for understanding the replication cycle of iridoviruses. As noted above, Delhon et al. [26] identified 27 core genes, one more than we identified after our re-analysis. Delhon et al. report that the orthologs of FV3 20L represent a core gene [26]. However, our analysis shows that orthologs of FV3 20L exist in all genera except Megalocytivirus (Figure 1), suggesting that FV3 20L is not a core gene.
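The definition of a core gene used here (an ortholog family with a member in every sequenced genome) amounts to a set-containment test. The sketch below is only an illustration of that definition, not the VOCs query itself; the function name `core_families` and the toy genomes and families are hypothetical and are not taken from Tables 6 and 7.

```python
def core_families(families, genomes):
    """Return the ortholog families that have a member in every genome.

    families: dict mapping a family name to the set of genomes in which
              at least one member of that family is annotated.
    genomes:  iterable of all genome names under comparison.
    """
    required = set(genomes)
    return sorted(name for name, present in families.items()
                  if required <= set(present))


# Hypothetical toy data, not the paper's gene families.
genomes = ["virusA", "virusB", "virusC"]
before = {
    "fam1": {"virusA", "virusB", "virusC"},
    "fam2": {"virusA", "virusC"},   # ortholog not yet annotated in virusB
    "fam3": {"virusA"},
}
# Re-annotation adds the overlooked fam2 ortholog to virusB.
after = {**before, "fam2": before["fam2"] | {"virusB"}}

print(core_families(before, genomes))  # ['fam1']
print(core_families(after, genomes))   # ['fam1', 'fam2']
```

The toy example mirrors the effect described above: a single missing annotation in one genome is enough to keep a family out of the core set, so correcting annotations in one species can enlarge the core set for the whole family.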

Table 6 Iridoviridae Core Genes
Table 7 Additional Iridoviridae Core Genes Identified After Genome Re-analysis
Figure 1

Conserved Iridovirus Genes. Every Iridoviridae gene that has an ortholog in at least two Iridoviridae genera is shown. Orthologs share the same row on the table. The genes within each genus are color-coded for easier identification. As long as at least one member of the genus contains an ortholog, the entire genus is highlighted. Where multiple ORFs are listed for a particular gene name, the ORFs represent multiple orthologs of the gene in that viral species. The remainder of the figure, showing just the genes conserved between the Iridovirus and Chloriridovirus genera, is included in Additional File 3.

Identifying genes conserved between some, but not all, iridovirus species can provide important information about evolutionary relationships within the family. A number of past phylogenetic analyses of the Iridoviridae have used phylogenetic trees constructed from aligned protein sequences [1, 18–20, 22, 24, 27]. However, there are potential problems with phylogenetic analysis based on comparisons of single genes. This type of analysis is rarely consistent due to horizontal gene transfer [28] and variable rates of evolution [29]. Therefore, we decided to use a whole-genome comparative analysis to understand the relationships between iridoviruses. Our approach was to identify all the genes conserved between different genera, which yields an indication of how similar in gene content two genomes are. Our whole-genome comparative analysis grouped orthologous genes between genera (Figures 1 & 2 and Additional File 3) and was consistent with phylogenetic trees constructed from single protein sequences. Based on gene conservation, the Ranavirus and Lymphocystivirus genera appear to be most closely related to one another (Figure 2). In addition, the Iridovirus and Chloriridovirus genera are also closely related to one another based on the presence of orthologous genes (Figure 2). In contrast, the Megalocytivirus genus and the Iridovirus/Chloriridovirus genera are equally divergent from each other as well as from all other Iridoviridae family members (Figure 2).
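A gene-content comparison of this kind can be sketched as counting the ortholog families shared by each pair of genera. The Python below is an illustrative sketch only: it assumes a genus is credited with a family whenever at least one of its members carries an ortholog, and the function names and toy data are hypothetical rather than the values behind Figure 2.

```python
from itertools import combinations


def genus_gene_content(members, families_by_virus):
    """Union of the ortholog families found in any member of a genus."""
    content = set()
    for virus in members:
        content |= families_by_virus[virus]
    return content


def shared_family_counts(genera, families_by_virus):
    """Count the ortholog families shared by each pair of genera."""
    content = {genus: genus_gene_content(members, families_by_virus)
               for genus, members in genera.items()}
    return {(a, b): len(content[a] & content[b])
            for a, b in combinations(sorted(content), 2)}


# Hypothetical toy data.
genera = {"genusX": ["x1", "x2"], "genusY": ["y1"], "genusZ": ["z1"]}
families_by_virus = {
    "x1": {"f1", "f2"},
    "x2": {"f1", "f3"},
    "y1": {"f1", "f2", "f3"},
    "z1": {"f1"},
}
counts = shared_family_counts(genera, families_by_virus)
print(counts[("genusX", "genusY")])  # 3
print(counts[("genusX", "genusZ")])  # 1
```

In this toy case genusX and genusY share the most families and would be drawn closest together, while f1, present everywhere, plays the role of a core gene that contributes to every pairwise count.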

Figure 2

Phylogenetic relationships between the five iridovirus genera based on gene content. Individual viral species were compared within a genus to identify the number of orthologous genes. Orthologous genes between viral genera were then determined. The numbers on each line identify the number of orthologous genes shared between viral species or genera, including the 26 core genes. The Iridovirus and Chloriridovirus genera have a high degree of gene conservation, and a combined genera box (Iridovirus/Chloriridovirus) was used to compare orthologous genes between genera. In addition, two subgroups of the Ranavirus genus are shown. Each subgroup contains a virtually identical complement of genes. However, a comparison of the FV3/TFV/ATV subgroup with the SGIV/GIV subgroup revealed 72 orthologous genes.

As the list of sequenced iridovirus genomes grows, the non-co-linearity between many of these genomes becomes more apparent. The Megalocytivirus and Ranavirus, but not the Chloriridovirus, Iridovirus, and Lymphocystivirus genera, show a co-linear arrangement of genes within each genus. However, comparisons of genomic sequences from different genera suggest no co-linearity. This trend may be the result of the high recombination rates [30] seen in some iridovirus members [31]. For example, within the Ranavirus genus, ATV has two inversions relative to the FV3 and TFV sequences [30], reducing the co-linearity of these genomes to some degree. Figure 3A shows how two recombination events could convert FV3 to the ATV arrangement of genes. In contrast, a comparison between the more distantly related members within the Ranavirus genus (such as FV3 and GIV) demonstrates a much more dramatic loss of co-linearity. No long stretches of co-linear genes exist between these sequences, although a dotplot analysis of FV3 and GIV shows small regions of co-linearity scattered throughout the two genomes, visible as short diagonal lines (Figure 3B). A schematic representation of the co-linearity between FV3 and GIV demonstrates that co-linearity occurs in small clusters of genes, often only 2–4 genes in length (Figure 3C).

Figure 3

Co-linearity found within the Ranavirus genus. (A) FV3 and ATV, both members of the Ranavirus genus, possess almost complete co-linearity of orthologous genes as visualized by a dotplot. However, two inversions have occurred. The FV3 genes 10–52 and 77–88 have switched genomic locations as shown, potentially through two recombination events. The inversion has also resulted in the loss of the ortholog of FV3 9L in ATV. (B) There is a limited amount of co-linearity found between FV3/TFV/ATV and SGIV/GIV. The co-linearity has been visualized using a dotplot analysis between FV3 (horizontal sequence) and GIV (vertical sequence). Genes are colored either red or blue, representing right- or left-ward transcription respectively. (C) The co-linearity between FV3 and GIV is generally composed of stretches of 2 or 3 co-linear orthologous genes. Orthologous genes in a co-linear arrangement are schematically shown as blocks of the same color on either the FV3 or the GIV genomic sequence.

Conclusion

The Iridoviridae family can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their host. Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. For example, the re-analysis of the Iridoviridae family has increased the consistency of annotated sequences from viruses within the same genus. In addition, the re-analysis has helped create a much greater consensus among Iridoviridae family members and enhanced our understanding of this virus family as a whole. The updated annotations that we have produced for the iridovirus sequences can be found in the additional files to this paper; in addition, the databases and tools to analyse Iridoviridae genomes are available to all researchers [32]. This database will contain genomes from the original GenBank files and also the edited genomes described in this paper. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.

Methods

Re-annotation of the Iridoviridae

Annotated sequences for the twelve completely sequenced iridovirus genomes (Table 1) were obtained from GenBank files and imported into the Viral Orthologous Clusters (VOCs) database [15]. Species from the same genus were examined using VOCs to identify all of the orthologous genes. The analysis then focused on the differences found between genomes within the same genus. For those genomes that contained co-linear arrangements of genes (those in the Ranavirus and Megalocytivirus genera), we compared those regions containing annotated ORFs. If more than two sequenced genomes were available for a given genus, and the ORF was present in at least two of the genomes, then we set out to determine if that ORF was also present in the remainder of the genomes. By this method, we were able to re-annotate small segments of each genome without needing to re-analyse the entire genome. The Viral Genome Organizer (VGO) software [16] was used to visualize the annotated ORFs, as well as the start and stop codons found within each genome.
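The selection rule described above (an ORF annotated in at least two genomes of a genus is then sought in the remaining genomes) can be expressed as a short filter. This is a sketch of the rule only, not of the VOCs or VGO workflow; the function name `reannotation_candidates`, the `min_support` parameter, and the toy data are hypothetical.

```python
def reannotation_candidates(annotations, min_support=2):
    """List ORF families annotated in some, but not all, genomes of a genus.

    annotations: dict mapping genome name -> set of ortholog family ids
                 currently annotated in that genome.
    Returns a dict mapping family id -> sorted list of genomes in which
    the family is missing and the matching region should be re-inspected.
    """
    genomes = set(annotations)
    all_families = set().union(*annotations.values())
    candidates = {}
    for family in sorted(all_families):
        present = {g for g in genomes if family in annotations[g]}
        missing = genomes - present
        # Only families supported by enough genomes are worth chasing.
        if len(present) >= min_support and missing:
            candidates[family] = sorted(missing)
    return candidates


# Hypothetical toy genus with three genomes.
annotations = {
    "g1": {"orfA", "orfB", "orfC"},
    "g2": {"orfA", "orfB"},
    "g3": {"orfA", "orfC", "orfD"},
}
print(reannotation_candidates(annotations))
# {'orfB': ['g3'], 'orfC': ['g2']}
```

Each flagged genome and family pair marks a region to inspect by eye, for example for a frameshift or an extra stop codon; the filter itself makes no claim about why the ORF is absent.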

Analysis of orthologous genes

We used a combination of BLAST searches and queries using the VOCs software [32] to define orthologous genes between Iridoviridae genera. VOCs is a Java client-server application that accesses a Structured Query Language (SQL) database containing iridovirus genomes. This SQL database permits complex queries to be assembled in an easy-to-use graphical user interface. VOCs initially groups orthologous genes into families based on BLASTP scores; these groupings can be manually checked and altered if necessary.
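The exact criteria VOCs uses to build families from BLASTP scores are not described here. As an illustration of the general idea, the sketch below assumes single-linkage grouping: any two proteins whose pairwise score reaches a chosen threshold are placed in the same family. The function name `group_families`, the `min_score` threshold, and the toy hits are all hypothetical.

```python
def group_families(hits, min_score):
    """Single-linkage grouping of proteins from pairwise similarity hits.

    hits: iterable of (protein_a, protein_b, score) tuples.
    Returns a sorted list of families, each a sorted list of proteins.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        root_a, root_b = find(a), find(b)  # registers both proteins
        if score >= min_score and root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise scores.
hits = [("p1", "p2", 200), ("p2", "p3", 150), ("p3", "p4", 20)]
print(group_families(hits, min_score=50))
# [['p1', 'p2', 'p3'], ['p4']]
```

Single-linkage grouping can chain weakly related proteins together through intermediates, which is one reason a manual check of automatically generated families remains useful.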

Dotplot analysis

Dotplots of FV3 and GIV were generated using JDotter [33]. JDotter provides an interactive input window that links it to the VOCs database, from which the FV3 and GIV sequences were obtained.
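For readers unfamiliar with dotplots, the principle can be shown with a minimal Python sketch. It is a simplification and not JDotter's algorithm: it assumes exact matching of fixed-length words on the forward strand only, and the function name `dotplot_matches` and the toy sequences are hypothetical. Each match is one dot; runs of dots with a constant offset between the two coordinates form the diagonal lines that indicate co-linear regions.

```python
def dotplot_matches(seq_x, seq_y, word=4):
    """Return (i, j) pairs where the word at seq_x[i:] equals the word
    at seq_y[j:]. Each pair is one dot: i on the horizontal axis, j on
    the vertical axis.
    """
    # Index every word of the horizontal sequence by its start positions.
    index = {}
    for i in range(len(seq_x) - word + 1):
        index.setdefault(seq_x[i:i + word], []).append(i)
    matches = []
    for j in range(len(seq_y) - word + 1):
        for i in index.get(seq_y[j:j + word], []):
            matches.append((i, j))
    return matches


# Toy sequences sharing the segment ACGTC at different offsets.
dots = dotplot_matches("GGACGTCA", "TACGTCC", word=4)
print(dots)                         # [(2, 1), (3, 2)]
print({i - j for i, j in dots})     # {1}: both dots lie on one diagonal
```

An inverted segment would not appear in this forward-only sketch; detecting it would require also matching against the reverse complement of one sequence, where it would show up as a line running in the opposite direction.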