Introduction

The term “pararetrovirus” was introduced by Temin [1] for animal (Hepadnaviridae) and plant viruses (Caulimoviridae) that, in contrast to retroviruses, have a DNA genome and do not integrate into the host genome for replication. Like retroviruses, pararetroviruses use a reverse transcriptase for their replication.

Endogenous pararetroviruses (EPRVs) in plants represent counterparts of members of the virus family Caulimoviridae integrated in their host’s genome. Despite the non-integrative replication cycle of members of the Caulimoviridae, a growing number of integrated viral sequences have been reported and are still being identified in various plant genomes [2–5].

Most of the integrants are silent, repetitive genome components. However, some of these sequences may still be able to replicate and initiate viral infection under certain conditions, according to their structural and sequence integrity and their genomic and/or epigenetic context.

Suggestions for a uniform nomenclature of endogenous virus sequences

Given the rapidly growing diversity of EPRVs discovered in plant genomes, the need for a uniform nomenclature is obvious. Because EPRVs occur in multiple copies, important differences in the sequence composition and structure of integrants have been observed.

It would be highly desirable for a nomenclature system (1) to distinguish endogenous from episomal caulimovirid sequences, (2) to discriminate potentially functional integrants from passive and pseudogene host genome components and (3) to describe the element’s viral activity in a specific genomic context.

In some genomes, a wide variety of EPRVs has been identified, comprising viral sequences with or without exogenous virus counterparts [6, 7] as well as rearranged and functional forms of a specific virus genome [3].

Like their exogenous counterparts, EPRVs can be classified as petuvirus-like elements, badnavirus-like elements or as members of the genera Caulimovirus, Cavemovirus or Tungrovirus according to the number and arrangement of open reading frames (ORFs) and nucleotide sequence homologies with episomal viruses (see Table 1).

Table 1 Selection of EPRVs identified in plant genomes, their current nomenclature and their homologous exogenous viruses

So far, authors use the prefix “E-” or “e-” for “endogenous” (e.g. ERTBV [8], ePVCV [9]) or the suffix “-EPRV” (BSGFV EPRV; [3]) in connection with the virus name to distinguish integrated viral sequences from the homologous episomal virus. In other examples, they are named after the host plant from which they have been isolated, analogous to the nomenclature of transposons [e.g. Sotu (in Solanum tuberosum) or LycEPRV (in several Solanum subsection Lycopersicon species); [10–12]].

One major point for the nomenclature of plant endogenous virus sequences is to identify a significant relationship between viruses and integrated sequences. Usually, the highest matches of sequence identity are considered relevant. Based on existing sequence comparisons (Table 2), we suggest a threshold level of at least 80% nucleotide identity over 80% of the sequence within the polymerase (POL) reading frame (“ORF3” in Table 2) to confirm the affiliation of an endogenous sequence to a virus. This value is based on the suggestions of Wicker et al. [12] for the distinction of transposable elements and the ICTV rules for distinguishing species in the family Caulimoviridae [13]. It remains to be seen if this threshold value is appropriate when more EPRV sequences become available. Additionally, the comparison of integrated virus sequences can identify distinct EPRVs in the same host genome (e.g. NsEPRV and NtoEPRV in Nicotiana tabacum; [14]).
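The proposed threshold can be written as a simple check. The following is a minimal sketch, not a validated classification tool: the class `PolAlignment`, its field names and the function `is_affiliated` are hypothetical, and it assumes that “80% of the sequence” means that the aligned region covers at least 80% of the POL reading frame of the exogenous virus. The alignment itself is assumed to come from an external tool.

```python
from dataclasses import dataclass

# Threshold values taken from the proposal in the text (percent).
MIN_IDENTITY_PERCENT = 80
MIN_COVERAGE_PERCENT = 80


@dataclass(frozen=True)
class PolAlignment:
    """Summary of a nucleotide alignment within the POL reading frame.

    All names are illustrative assumptions, not part of the proposal.
    """
    identical_positions: int  # identical nucleotides in the aligned region
    aligned_length: int       # length of the aligned region (nt)
    pol_orf_length: int       # length of the virus POL reading frame (nt)


def is_affiliated(aln: PolAlignment) -> bool:
    """Return True if the integrant meets the 80%/80% threshold."""
    if aln.aligned_length <= 0 or aln.pol_orf_length <= 0:
        return False
    # Integer arithmetic avoids floating-point rounding at exactly 80%.
    identity_ok = (aln.identical_positions * 100
                   >= MIN_IDENTITY_PERCENT * aln.aligned_length)
    coverage_ok = (aln.aligned_length * 100
                   >= MIN_COVERAGE_PERCENT * aln.pol_orf_length)
    return identity_ok and coverage_ok
```

With these assumptions, an alignment of 2000 nt within a 2400 nt reading frame that contains 1700 identical positions (85% identity, about 83% coverage) would be affiliated to the virus, whereas the same alignment with only 1500 identical positions (75% identity) would not.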

Table 2 Observed sequence similarities between selected members of the family Caulimoviridae and EPRVs for identification and classification of the endogenous forms

Another important feature for the classification of an integrant is whether it is functional and can trigger a virus infection. Thus, sequences in question have to be isolated and their infectivity has to be proven experimentally by infection of the respective host plants.

These considerations led us to the following suggestions for a uniform distinction between integrated pararetrovirus sequences (EPRVs) and their homologous exogenous viruses (see also Table 3):

Table 3 Proposal for a uniform nomenclature of integrated viral sequences
  1. If the endogenous sequence can be affiliated to an exogenous virus, viral sequences integrated into the host genome should be marked by the prefix “e” (endogenous) in front of the virus abbreviation in cases where no information about the status of the integrant is yet available (eBSOLV, ePVCV). If further information shows that the integrant is functional, the prefix “ea” (endogenous and activatable) in connection with the virus abbreviation should be applied. Episomal viruses should be referred to following the ICTV nomenclature.

    1.1 Functional endogenous copies are able to release a replication-competent viral genome with high similarity to an exogenous virus and should be marked by the prefix “ea” followed by the virus name (eaPVCV, eaBSGFV-7).

    1.2 An integrant related to an exogenous counterpart, but not known to be functional as a virus per se, should be named with the prefix “e” and the virus name (e.g. eTVCV, eBSGFV-9). The integrant itself is incapable of making a functional virus: e.g. no transition from the endogenous to the episomal form is known, or the sequence lacks functional ORFs due to mutations. However, activation of “eEPRV” by recombination with exogenous counterparts cannot be ruled out. Moreover, “eEPRVs” may fulfill other purposes in the host, such as providing virus resistance [15].

    1.3 In cases where it is necessary to distinguish different integrated copies from each other, a numerical or alphabetical index (such as eaBSGFV-7 and eBSGFV-9) should be introduced.

  2. When no exogenous virus is currently known or when only small fragments of a viral genome have been identified, the host plant initials plus the suffix “EPRS” (for “endogenous pararetroviral sequence”) should be chosen (e.g. SotuEPRS for Solanum tuberosum endogenous pararetroviral sequence, Table 4).

    Table 4 Current nomenclature and suggested names according to the proposed guidelines of selected EPRVs

For some integrated viral sequences, homologous viruses may be unknown, e.g. because they are extinct or have not been discovered yet. Therefore, we suggest using e- or ea-(virus name) only in cases where the exogenous form is known. If there is any doubt about the existence of a corresponding pararetrovirus, (plant initials)-EPRS should be chosen; e.g. integrants in the rice genome that reveal weak homology to RTBV ([8], Table 2) were suggested to be classified as OsEPRS (Oryza sativa endogenous pararetroviral sequence) according to recent sequence alignments (see Table 3, Geering pers. comm., Table 4). As soon as more sequence information becomes available, existing EPRV names have to be changed accordingly.
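The naming rules proposed above can be summarised as a small decision procedure. The sketch below is only an illustration under stated assumptions: the function `suggest_name` and its parameters are hypothetical, the caller is assumed to have already decided whether an exogenous counterpart is known, and `activatable` is assumed to be set only when infectivity has been proven experimentally.

```python
from typing import Optional


def suggest_name(virus_abbreviation: Optional[str] = None,
                 host_initials: Optional[str] = None,
                 activatable: bool = False,
                 copy_index: Optional[str] = None) -> str:
    """Suggest a name for an integrated pararetroviral sequence.

    Hypothetical helper illustrating rules 1 and 2 of the proposal.
    """
    if not virus_abbreviation:
        # Rule 2: no (certain) exogenous counterpart, or only small
        # fragments: host plant initials plus the suffix "EPRS".
        if not host_initials:
            raise ValueError("host initials are required for an EPRS name")
        return f"{host_initials}EPRS"

    # Rule 1: "ea" for functional (activatable) copies, "e" when the
    # integrant is not known to be functional or its status is unknown.
    prefix = "ea" if activatable else "e"
    name = f"{prefix}{virus_abbreviation}"

    # Rule 1.3: optional numerical or alphabetical index for distinct copies.
    if copy_index:
        name += f"-{copy_index}"
    return name


print(suggest_name("BSGFV", activatable=True, copy_index="7"))
print(suggest_name("BSGFV", copy_index="9"))
print(suggest_name(host_initials="Sotu"))
```

Under these assumptions the three calls reproduce the examples given in the text: eaBSGFV-7, eBSGFV-9 and SotuEPRS. Treating a doubtful counterpart in the same way as an unknown one (by leaving `virus_abbreviation` empty) mirrors the recommendation to fall back on (plant initials)-EPRS whenever the existence of a corresponding pararetrovirus is uncertain.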

We hope that these suggestions will provide some guidance for developing a uniform scheme for the nomenclature of integrated viral sequences. We encourage a discussion leading to future improvements of the nomenclature. Anybody who wants to comment on this proposal is welcome to do so using the following website: http://talk.ictvonline.org/.