Findings

Dinoflagellates (Dinophyceae) are a highly abundant and ubiquitous group of unicellular eukaryotes ("protists") in marine environments [1]. They constitute a major class of eukaryotes within the Alveolata, a firmly established deep phylogenetic lineage that includes other diverse classes of protists, such as apicomplexans and ciliates [2]. Some dinoflagellates are autotrophic and rely on photosynthesis, some are heterotrophic and feed by endocytosis, and many are mixotrophic, combining both modes of nutrition. Blooms of certain photosynthetic dinoflagellates kill fish and bivalves, or contaminate shellfish intended for food with particular toxins, and can cause serious economic damage to aquaculture [3, 4]. Heterocapsa circularisquama forms blooms that cause mass mortality of shellfish such as pearl oysters and mussels, and is one of the most intensively studied dinoflagellate species [5].

HcDNAV is a marine giant virus (or "girus" [6, 7]) with a dsDNA genome that lytically infects H. circularisquama [8, 9]. HcDNAV is considered to play a significant role in the demise of H. circularisquama blooms [9, 10]. HcDNAV has a large icosahedral capsid (180-210 nm in diameter), which packs a ~356-kbp genome [8, 11]. During its multiplication, virions emerge from a specific cytoplasmic compartment, called the "viroplasm", which is created by the virus [9]. HcDNAV is the sole DNA virus isolated from dinoflagellates so far and, to our knowledge, the only DNA virus isolated from the superphylum Alveolata [12]. Based on its host range, genome type and size, and microscopic features, HcDNAV was previously suggested to be a member of the Phycodnaviridae [13]. However, no molecular data have so far supported this tentative classification.

Phycodnaviridae includes intensively studied algal viruses such as chlorella viruses and Emiliania huxleyi viruses [14-17], and belongs to a larger group of eukaryotic DNA viruses called nucleo-cytoplasmic large DNA viruses (NCLDVs) [18]. NCLDVs complete their replication cycle within the host cytoplasm, and share an array of conserved core genes for transcription, RNA processing, replication, DNA packaging, and structural components. The other NCLDV families are Mimiviridae, Poxviridae, Iridoviridae, and Asfarviridae. Mimiviridae is represented by the freshwater amoeba-infecting mimivirus [19] and its close relative mamavirus [20]. Based on the sequences of PolB (the B-family DNA polymerase), one of the most conserved NCLDV core genes, three algal viruses have been suggested to belong to Mimiviridae [21]. Poxviridae includes a number of successful pathogens known to infect a tremendous variety of terrestrial animals, such as insects, reptiles, birds, and mammals [22]. Iridoviruses infect invertebrate and cold-blooded vertebrate hosts, and include numerous emerging pathogens of fishes and amphibians [23]. The last family, Asfarviridae [24, 25], is currently represented by a sole species, African swine fever virus (ASFV), with a 170 kbp dsDNA genome [26]. ASFV is a large (~200 nm in diameter), intracytoplasmically replicating arbovirus, naturally maintained in a sylvatic cycle between wild swine (warthogs and bushpigs) and argasid ticks (Ornithodoros). In these hosts, ASFV infection is usually asymptomatic [27]. However, ASFV causes an acute hemorrhagic infection in domestic swine, with mortality rates of up to 100% for some viral isolates.

In an attempt to further characterize HcDNAV, we performed low-coverage shotgun sequencing of its genome. Specifically, virus particles were collected from 4 liters of HcDNAV suspension (lysate of HcDNAV-infected H. circularisquama at 6 dpi) as described in [11]. The viral genomic DNA was purified from a PFGE gel and subjected to shotgun sequencing (coverage = 0.11X). The resulting sequence reads covered part of a region containing a PolB-like sequence. Using the TAIL-PCR method [28], we determined a 5,800 bp sequence (DDBJ accession number AB522601) containing an open reading frame (ORF) for the complete HcDNAV PolB gene. A reverse transcription-PCR (RT-PCR) experiment showed that the PolB gene is transcribed into mRNA (additional file 1); thus, it is most likely crucial for the replication of HcDNAV.
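To give a feel for what a coverage of 0.11X implies, the following minimal sketch estimates the expected fraction of the genome touched by at least one read. It assumes the idealized Lander-Waterman (Poisson) model with uniformly random reads, which is an assumption introduced here for illustration and not a calculation reported in this study; the genome size is the approximate ~356 kbp value quoted above.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Assumes reads land uniformly at random (Lander-Waterman / Poisson model),
    so the probability that a base is missed by every read is exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


GENOME_SIZE_BP = 356_000  # approximate HcDNAV genome size quoted in the text
COVERAGE = 0.11           # shotgun sequencing depth quoted in the text

fraction = expected_fraction_covered(COVERAGE)
covered_bp = fraction * GENOME_SIZE_BP
print(f"Expected fraction covered: {fraction:.3f}")
print(f"Expected covered length: about {covered_bp:,.0f} bp")
```

Under these assumptions only about a tenth of the genome would be expected to be sampled, which is consistent with the reads covering just part of the PolB region and with the need for a PCR-based method to complete the gene.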

The HcDNAV PolB gene was found to be 3,675 bp long (forward strand, position = nt 1,913-5,590 in AB522601), delimited by normal start and stop codons; no intron or intein-like sequence was observed. The predicted protein product is 1,225 amino acids (aa) long. Unexpectedly, the translated amino acid sequence showed its closest BLASTP hits against PolB sequences from different ASFV isolates, with the best homolog being DPOL_ASFL6 (identity = 27%, bit score = 311, E-value = 4.10E-82) in the NCBI non-redundant sequence database. The best non-ASFV hit corresponded to the PolB sequence of Pyramimonas orientalis virus (DPOL_POV01, identity = 23%, bit score = 131, E-value = 4.10E-28). A multiple sequence alignment of the HcDNAV PolB and its close homologs confirmed the presence of conserved residues for exonuclease and polymerase activities [29] (additional file 2). Curiously, the HcDNAV PolB sequence exhibited a rarely observed amino acid substitution within the motif containing two highly conserved metal-binding aspartic acid residues: HcDNAV exhibits the motif YSDTDS instead of the YGDTDS sequence usually found in dsDNA viruses. In addition, we identified two ORFs upstream of the PolB ORF, in a divergent orientation. Their products were predicted to be 245 and 194 aa in length, respectively (positions = nt 463-1,200 and 1,255-1,839). The former showed significant similarity to HNH endonucleases, with its best BLASTP hit to mimivirus L245 (YP_142599, E-value = 4E-11); the latter showed significant similarity to hypothetical proteins from NCLDVs, with its best hit to mimivirus R325 (annotated as a metal-dependent hydrolase, YP_142679.1, E-value = 1E-12). Incidentally, R325 is located near the PolB gene (R322) in the mimivirus genome [30].
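The coordinates and lengths quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the nucleotide positions are 1-based, inclusive, and include the stop codon; this is an assumption made here, under which the three protein lengths match the text and the stated 3,675 bp for PolB corresponds to the coding codons without the stop codon. The ORF labels in the dictionary are illustrative names, not annotations from the database entry.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (span in bp, protein length in aa) for one ORF.

    Coordinates are assumed to be 1-based and inclusive, and the span is
    assumed to end with a stop codon that is not translated.
    """
    span_bp = end - start + 1
    if span_bp % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    codons = span_bp // 3
    return span_bp, codons - 1  # subtract the stop codon


# Positions in AB522601 as quoted in the text; labels are illustrative only.
ORFS = {
    "PolB": (1913, 5590),
    "HNH endonuclease-like ORF": (463, 1200),
    "R325-like ORF": (1255, 1839),
}

for name, (start, end) in ORFS.items():
    span_bp, protein_aa = orf_summary(start, end)
    print(f"{name}: span {span_bp} bp, coding {protein_aa * 3} bp, protein {protein_aa} aa")
```

For PolB this gives a 3,678 bp span, of which 3,675 bp encode the 1,225 aa product; the two upstream ORFs give 245 aa and 194 aa, in agreement with the values above.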

To examine the unexpected sequence similarity between the HcDNAV and ASFV PolBs, we conducted a series of maximum likelihood phylogenetic analyses. First, we aligned the HcDNAV PolB sequence with its homologs from NCLDVs. A phylogenetic tree based on the 362 amino acid residue sites from the alignment supported the monophyletic grouping of HcDNAV and ASFV with a 100% bootstrap value (Fig. 1). The grouping of each of the other four NCLDV families was also supported by a high bootstrap value (100% for Iridoviridae, 81% for Phycodnaviridae, 90% for Mimiviridae and 100% for Poxviridae). Next, we used a wider range of viral homologs including those of bacteriophages. The resulting tree based on 320 amino acid residues again supported the grouping of HcDNAV/ASFV with a 98% bootstrap value (Fig. 2).
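The trees are built from fewer sites than the full protein length because, as stated in the legend of Fig. 1, every alignment column containing a gap was removed before tree construction. The following minimal sketch illustrates that filtering step on a toy alignment. The function name and the short sequences are hypothetical and serve only as an example; they are not the actual alignment or the tool used in this study.

```python
def remove_gap_columns(alignment: dict[str, str], gap_chars: str = "-.") -> dict[str, str]:
    """Drop every alignment column in which at least one sequence has a gap."""
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the columns where no sequence carries a gap character.
    keep = [i for i in range(length) if all(seq[i] not in gap_chars for seq in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment with made-up residues, for illustration only.
toy_alignment = {
    "HcDNAV": "MK-LYSDTDS",
    "ASFV": "MKALYGDTDS",
    "PoV": "M-ALYGDTDS",
}

trimmed = remove_gap_columns(toy_alignment)
for name, seq in trimmed.items():
    print(f"{name}\t{seq}")
```

In this toy case, two of the ten columns contain a gap and are discarded, leaving eight gap-free sites per sequence. Adding more divergent sequences to an alignment tends to introduce more gapped columns, which is consistent with the smaller number of sites retained for the broader data set of Fig. 2 (320 versus 362).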

Figure 1

Maximum likelihood tree of PolB amino acid sequences from NCLDVs. The alignment was constructed with T-Coffee. All gap-containing amino acid residue sites were removed before tree construction. The phylogenetic tree was constructed with PhyML [38], available at Phylogeny.fr [39], using the WAG matrix and a gamma distribution. Branch labels indicate bootstrap percentages (≥ 50%) after 100 replicates. The tree is essentially unrooted; it is mid-point rooted for presentation purposes only. The same method was used for the phylogenetic trees in Fig. 2, Fig. 3 and additional file 3. HcDNAV and ASFV sequences are indicated by filled diamond marks. CeV: Chrysochromulina ericina virus; PoV: Pyramimonas orientalis virus; HaV: Heterosigma akashiwo virus.

Figure 2

Maximum likelihood tree of PolB amino acid sequences from diverse groups of viruses. HcDNAV and ASFV sequences are indicated by filled diamond marks.

In addition, we obtained a short sequence from HcDNAV genomic DNA that partially corresponds to an RNA polymerase II large subunit gene (AB522602), and it gave a similar result. The 892 bp sequence showed its best BLASTX hit against an ASFV RNA polymerase sequence (RPB1_ASFM2, E-value = 2E-12). The monophyletic grouping of the HcDNAV sequence (97 aa) with the ASFV RNA polymerase sequence again received a high bootstrap value of 87% (additional file 3).

Our homology search and phylogenetic analyses thus confirm that the newly determined HcDNAV sequences are most closely related to their ASFV homologs. This result is in clear contradiction with the previous proposal that HcDNAV may belong to the Phycodnaviridae [13].

A previous "phylogenetic mapping" survey of the metagenomic sequence data sets generated by the Global Ocean Sampling (GOS) expedition [31] revealed several PolB-like sequences most closely related to the PolB sequence of ASFV [32]. This observation suggested the presence of ASFV-related viruses in marine environments. In order to examine whether the "ASFV-like" marine PolB sequences were close to the HcDNAV PolB sequence, we retrieved 267 sequences from the environmental sequence collection of NCBI/GenBank using the PolB sequences of HcDNAV and ASFV as queries (E-value < 1E-10). These environmental sequences were in turn searched against the NCBI non-redundant sequence database and the HcDNAV PolB sequence. Of the 267 sequences, 15 showed their best hit to the ASFV PolB, one showed its best hit to HcDNAV (gi|136563424), and the remaining sequences had their best hit to other viruses or cellular organisms. Therefore, most of the ASFV/HcDNAV-like PolB sequences in the marine environmental collection are more closely related to the ASFV PolB than to the HcDNAV homolog. A phylogenetic tree using several environmental sequences supported their grouping with the terrestrial ASFV PolB (bootstrap value = 84%, Fig. 3).
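The classification described above amounts to assigning each environmental sequence to the reference that gives its highest-scoring hit and then counting the assignments. The sketch below shows that tallying logic. It assumes that "best hit" means the hit with the highest bit score, and the sequence identifiers, labels, and scores are invented placeholders, not data from this study.

```python
from collections import Counter


def best_hit_label(hits: list[tuple[str, float]]) -> str:
    """Return the label of the highest-scoring hit, or 'no hit' if there is none."""
    if not hits:
        return "no hit"
    return max(hits, key=lambda hit: hit[1])[0]


def tally_best_hits(all_hits: dict[str, list[tuple[str, float]]]) -> Counter:
    """Count how many query sequences have their best hit in each category."""
    return Counter(best_hit_label(hits) for hits in all_hits.values())


# Hypothetical (label, bit score) hits for three made-up environmental sequences.
toy_hits = {
    "env_seq_1": [("ASFV", 210.0), ("HcDNAV", 150.0), ("other", 90.0)],
    "env_seq_2": [("HcDNAV", 180.0), ("ASFV", 120.0)],
    "env_seq_3": [("other", 300.0), ("ASFV", 95.0)],
}

counts = tally_best_hits(toy_hits)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Applied to the counts reported above, 15 sequences fall in the ASFV category and one in the HcDNAV category, which leaves 267 - 15 - 1 = 251 sequences with a best hit to other viruses or cellular organisms.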

Figure 3

Maximum likelihood tree of PolB amino acid sequences from NCLDVs and several sequences from environmental samples (indicated by open diamond marks). HcDNAV and ASFV sequences are indicated by filled diamond marks.

PolB is one of the most reliable phylogenetic markers for large eukaryotic DNA viruses [32, 33]. The fact that the HcDNAV PolB did not group with the PolBs of phycodnaviruses strongly argues against the previous tentative classification of HcDNAV in the family Phycodnaviridae [13]. A definitive classification of HcDNAV will clearly require the complete sequencing of its genome. It may also turn out that the HcDNAV genome is a mosaic of NCLDV genes with different evolutionary histories, precluding a simple classification scheme. Pending complete genome sequencing, we recently proposed to the ICTV the creation of a new genus, "Dinodnavirus", in which to tentatively classify HcDNAV.

Our finding establishes an evolutionary link between a terrestrial pathogen and a marine girus. A recent metagenomic analysis of corals provided evidence for the existence of viruses related to herpesviruses [34], which have mostly been isolated as pathogens of terrestrial animals. So far, giruses infecting 7 algal classes [12, 35] have been isolated; still, we know next to nothing about viruses infecting other protists in aquatic environments. Given the huge diversity of protists [36, 37], a comparable diversity probably exists among the marine viruses that infect them. Exploring this hidden viral world is necessary for understanding the evolutionary relationships between aquatic viruses and their terrestrial relatives.