Introduction

Rotaviruses are a common cause of severe, acute gastroenteritis in infants and young children worldwide. By the age of 5 years, 95% of children will have experienced at least one rotavirus infection, with or without symptoms of gastroenteritis. It is estimated that, globally, 1 in 5 cases will present to a doctor, 1 in 65 will require hospitalization, and 1 in 293 will die [27, 28, 39].

Rotaviruses have a genome of 11 segments of double-stranded RNA encoding six structural viral proteins [VP] (VP1, VP2, VP3, VP4, VP6, VP7) and six non-structural [NS] proteins (NSP1–NSP6). With the exception of genome segment 11 which encodes two proteins (NSP5 and NSP6), the genome segments are monocistronic. Full gene-protein assignments have been achieved for several rotavirus strains ([8], http://www.iah.bbsrc.ac.uk/dsRNA_virus_proteins/Rotavirus.htm). The genome is enclosed in a triple-layered particle (TLP): the inner layer consisting of VP2 encloses the genome and two minor structural proteins, VP1 and VP3, thus forming the ‘core’; the middle layer consisting of VP6 surrounds the core, forming a double-layered particle (DLP); the outer layer, consisting of VP7 and spike-like projections of VP4, enwraps the DLP to form the TLP or infectious virion [8].

Rotaviruses (genus Rotavirus) are classified in the family Reoviridae, which comprises the genera Orthoreovirus, Orbivirus, Rotavirus, Coltivirus, Aquareovirus, Cypovirus, Fijivirus, Phytoreovirus, Oryzavirus, Seadornavirus, Idnoreovirus, and Mycoreovirus [21]. The Rotavirus genus includes at least seven serogroups (A–G) that may be distinguished on the basis of their antigenic relationships and the pattern of migration of the dsRNA segments in polyacrylamide gel electrophoresis [4]. Group A rotaviruses are important enteric pathogens in humans and in a wide range of mammalian and avian species [8].

Historically, VP6 was the first rotavirus protein used for classification. Both VP2 and VP6 are highly immunogenic proteins in the virion [4, 41]. After infection, antibodies to VP6 are easily detected [4, 41], and the most sensitive immunologic diagnostic assays are based on detection of this protein. VP6 bears different epitopes that allow differentiation of the subgroup (SG) specificities of group A rotaviruses: SG I, SG II, SG I + II, and SG non-I, non-II viruses can be distinguished according to their reactivities with two monoclonal antibodies (MAbs) [4]. More recently, based on molecular characterization, only two groups (termed genogroups) were distinguished among group A human rotaviruses (genogroup I: SG I; genogroup II: SG II, SG I + II, and SG non-I, non-II) [13]. In 1985, a classification scheme for rotaviruses was proposed to allow for the presence of multiple “groups” of rotaviruses and for the existence of “serotypes” that crossed species [10]. Later, in 1989, a binary classification system reminiscent of the one used for influenza viruses was established, derived from the immunological reactivities and gene structures of the two outer capsid proteins, VP4 and VP7, which independently elicit neutralizing antibodies [4]. Thus, rotavirus strains are classified into VP4 or P serotypes (P for protease-sensitive) and VP7 or G serotypes (G for glycoprotein) [8]. Classification of rotaviruses into VP4 or VP7 serotypes is performed by cross-neutralization assays using hyperimmune sera raised to prototype viruses and/or to laboratory-engineered mono-reassortants.
Since antigenic characterization is time-consuming and requires virus collections and proper immunological reagents that are not available in all laboratories, and due to the increasing ease of sequencing, the antigenic classification has gradually been replaced by a classification of rotaviruses into VP4 or VP7 genotypes, performed by sequence analysis and based on identities between sequences of cognate rotavirus gene segments. So far, 19 G genotypes (14 G serotypes) and 27 different P genotypes (14 P serotypes, 1A, 1B, and 2–14) have been identified [19]. G serotype designations largely coincide with G genotype designations. In contrast, a dual nomenclature has been adopted for the VP4 antigenic and genetic classification [8]. The P serotype (when known) is denoted by an Arabic number (sometimes followed by a capital letter), and the P genotype is denoted immediately after the P serotype number by an Arabic number within square brackets. Rotavirus strains belonging to 11 G types (G1–G6, G8–G12) and 12 P types (P1A[8], P1B[4], P2A[6], P2C[6], P3[9], P4[10], P5A[3], P6[1], P8[11], P11[14], P12[19], P[25]) have been isolated from humans [4, 7, 9, 19, 25, 33, 36]. During the late 1990s, sequence analyses of the rotavirus enterotoxin NSP4 gene from human and animal rotavirus strains revealed the presence of six (A–F) distinct NSP4 genotypes [3, 5, 8, 11, 15, 22].

The complete open reading frame (ORF) sequences of all 11 genome segments of 53 rotavirus strains have been determined (human Wa, DS-1, AU-1, D, KU, P, TB-Chen, ST3, IAL28, Se584, 69M, WI61, A64, L26, T152, B1711, B3458, B10925-97, 111/05-27, B4633-03, RV161-00, RV176-00, N26-02, Dhaka12-03, Dhaka16-03, Matlab13-03, Dhaka25-02, Dhaka6, DRC86, DRC88, Hun5, MG6, PA169, KTM368, B4106, simian SA11, RRV, TUCH, bovine B383, UK, RF, WC3, BRV033, porcine A131, A253, YM, OSU, Gottfried, avian PO-13, ovine OVR762, guanaco Chubut, Río Negro, and lapine 30/96) [12, 17–19, 34, 38 (personal communication)].

The simian rotavirus SA11 strain is considered the type species of group A rotaviruses, and its genome segments range in size from 3,302 (segment 1) to 667 (segment 11) base pairs (bp) [8, 35]. Sequences from different rotavirus strains show that each RNA segment starts with two 5′ guanines, followed by the 5′ untranslated region (UTR; 9–49 nucleotides [nt]), an open reading frame (or two in the case of genome segment 11), and the 3′ UTR (17–182 nt), and ends with two 3′ cytidines. In addition, the plus-stranded RNA is capped at the 5′ end with m7GpppG(m)Gpy, but there are no polyadenylated sequences at the 3′ end [8].

The overall genetic relatedness among homologous genome segments has been assessed by RNA–RNA hybridizations performed under high stringency conditions [23] and, more recently, by direct sequence comparisons. RNA–RNA hybridization has provided molecular evidence of close interspecies relationships between human and animal strains and has confirmed the existence of naturally occurring rotavirus reassortant strains. Three human genogroups, represented by the reference rotavirus strains Wa, DS-1, and AU-1, have been established. Several animal genogroups have also been identified, but the extent of their relationships to each other and to human strains has not been completely elucidated [24]. However, inter-genogroup reassortment may generate chimeric viruses with mixed constellations of dsRNA segments that are not easily assignable to a defined genogroup. In addition, viruses that are partially or completely divergent in their genome constellation from the prototype viruses may not be readily characterized. Sequence analysis of human and animal rotavirus strains has revealed a number of unusual rotavirus strains and has shown that some strains have a puzzling genome composition derived from repeated reassortment events that could not be tracked effectively using nucleic acid hybridization techniques [2, 14, 16, 18].

Sequence-based studies targeting all the rotavirus gene segments appear more appropriate to generate conclusive data and support studies on rotavirus evolution. When rotavirus strains are analyzed and compared to one another by partial or complete sequences of all 11 gene segments, as initiated by Maunula and Von Bonsdorff [20], the genetic relationships among all rotavirus strains can be determined [12, 17–19, 29, 34, 38]. Recently, comparison of sequenced genes provided evidence that the human rotavirus strains belonging to the Wa-like genogroup might have a common origin with porcine rotaviruses, while the human DS-1-like genogroup might have a common origin with bovine rotaviruses [19]. Sequence comparison of rotavirus genomes is critical to the assignment of genotypes and the elucidation of rotavirus evolutionary patterns [19]. One method used to study typical evolutionary distances between virus strains is the pairwise sequence identity profile [1]. This method resolves virus genotypes as well-distinguished frequency peaks and thereby provides a biological rationale for classification. This approach has been used successfully to classify astroviruses [26], sapoviruses [37], noroviruses [42], and papillomaviruses [6].
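As an illustration of how a pairwise sequence identity profile can be built, the following minimal Python sketch computes the percentage identity for every pair of pre-aligned sequences and bins the values into a histogram. The function names (`percent_identity`, `identity_profile`), the gap-handling rule, and the bin width are assumptions made for this example; they are not taken from the method described in [1].

```python
from collections import Counter
from itertools import combinations
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: alignment columns with a gap ('-') in either sequence
    are skipped rather than counted as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identity_profile(aligned: Dict[str, str], bin_width: int = 1) -> Counter:
    """Histogram of all pairwise identities, keyed by the lower bin edge."""
    profile: Counter = Counter()
    for (_, s1), (_, s2) in combinations(aligned.items(), 2):
        pid = percent_identity(s1, s2)
        profile[int(pid // bin_width) * bin_width] += 1
    return profile


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = {
        "strainA": "ACGTACGTAC",
        "strainB": "ACGTACGTAC",
        "strainC": "ACGTACGTAA",
        "strainD": "TTTTACGTAC",
    }
    for lower_edge, count in sorted(identity_profile(toy, bin_width=10).items()):
        print(f"{lower_edge:3d}% bin: {count} pair(s)")
```

On real data, peaks in such a histogram would correspond to within-genotype and between-genotype comparisons, and a cut-off value would be chosen in the valley between them.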

Accordingly, a classification system that encompasses all rotavirus gene segments and sorts individual genes into defined clusters/genotypes based on reliable percentage identity cut-off values is required in order to adopt a universal, standardized nomenclature suitable for reliable data analysis and for the ready and unequivocal exchange of information on the various rotavirus strains. Recently, phylogenetic analyses were performed and pairwise sequence identity profiles constructed for each of the 11 genome segments of many group A rotaviruses. Based on these analyses, a modified classification system was proposed for VP4, VP7, and NSP4, and a novel classification system for VP1, VP2, VP3, VP6, NSP1, NSP2, NSP3, and NSP5/6 [19] to be used for international standardization and implementation. The system provides an excellent framework to further analyze rotavirus interspecies evolutionary relationships [19], gene reassortment events (genetic shift), functional gene linkage in reassortant progeny, and emergence of new rotaviruses by interspecies transmission.

Review of nomenclature system

A summary of the calculated cut-off values for the 11 rotavirus RNA segments is shown in Table 1 [19]. The nucleotide cut-off percentages define different genotypes for all genes. To designate the complete genetic makeup of a virus, the following schematic nomenclature was proposed: Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx, representing the genotypes of the VP7-VP4-VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6 encoding gene segments, respectively, with x indicating the number of the genotype. A nomenclature overview of a large number of strains is provided in Table 2. It should be noted that no classification was obtained for NSP6, since the NSP6 ORF falls completely within the NSP5 ORF. A phylogenetic analysis (at the nucleotide level) of the NSP6 ORF would be nearly identical to the NSP5 phylogenetic tree. Moreover, not all strains possess an NSP6 ORF.
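The Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx notation lends itself to simple machine handling. The Python sketch below formats and parses such a constellation string, using the letter-to-gene order given above. The function names and the convention of keeping "x" for a genotype that has not yet been determined are assumptions made for this example, and the genotype numbers in the usage lines are arbitrary.

```python
import re
from typing import Dict, Optional

# Letter codes in the order given in the text, paired with the encoded protein.
SEGMENT_ORDER = [
    ("G", "VP7"), ("P", "VP4"), ("I", "VP6"), ("R", "VP1"), ("C", "VP2"),
    ("M", "VP3"), ("A", "NSP1"), ("N", "NSP2"), ("T", "NSP3"),
    ("E", "NSP4"), ("H", "NSP5/6"),
]


def format_constellation(genotypes: Dict[str, Optional[int]]) -> str:
    """Build a constellation string from a protein -> genotype number mapping.

    Assumption: a missing or None entry is written as 'x' (undetermined).
    """
    parts = []
    for letter, protein in SEGMENT_ORDER:
        number = genotypes.get(protein)
        text = "x" if number is None else str(number)
        # Only the VP4 (P) genotype is written within square brackets.
        parts.append(f"P[{text}]" if letter == "P" else f"{letter}{text}")
    return "-".join(parts)


def parse_constellation(text: str) -> Dict[str, Optional[int]]:
    """Parse a constellation string back into a protein -> number mapping."""
    tokens = text.strip().split("-")
    if len(tokens) != len(SEGMENT_ORDER):
        raise ValueError(f"expected {len(SEGMENT_ORDER)} genotypes, got {len(tokens)}")
    result: Dict[str, Optional[int]] = {}
    for token, (letter, protein) in zip(tokens, SEGMENT_ORDER):
        pattern = r"P\[(\d+|x)\]" if letter == "P" else rf"{letter}(\d+|x)"
        match = re.fullmatch(pattern, token)
        if match is None:
            raise ValueError(f"unexpected token {token!r} for {protein}")
        value = match.group(1)
        result[protein] = None if value == "x" else int(value)
    return result


if __name__ == "__main__":
    # Arbitrary numbers for illustration; only VP7, VP4, and VP6 are "known" here.
    partial = {"VP7": 3, "VP4": 9, "VP6": 2}
    label = format_constellation(partial)
    print(label)
    print(parse_constellation(label))
```

Keeping the gene order in a single table means that the formatter and the parser cannot drift apart if the notation is ever extended.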

Table 1 Nucleotide percentage identity cut-off values defining genotypes for 11 rotavirus gene segments
Table 2 Application of the proposed classification and nomenclature to the structural and non-structural protein encoding genes for known human, bovine, porcine, simian, avian, equine, lapine, murine, and ovine rotavirus strains. Green, red, and orange were used for Wa-like, DS-1-like and AU-like gene segments, respectively, while yellow, blue, and purple were used for the avian PO-13-like rotavirus gene segments, some typical porcine VP4, VP7, and VP6 genotypes, and the SA-11-like gene segments, respectively. It should be noted that the genotypes of more than 60% of the isolates in this table are partial, requiring additional work

Recommendations

In order to render the classification system useful and to explore its biological significance in full, a set of recommendations is summarized here. Many investigators already follow these recommendations when determining the VP4, VP7, or NSP4 genotype(s) of a new rotavirus strain; they now extend to all genes [19]. The new classification system [19] is based on the nt sequences of complete ORFs, and therefore the nt sequences of the entire ORFs of all genes of a new strain should preferably be obtained in order to assign each gene unequivocally to one of the established genotypes or to a new genotype. As the UTRs of group A rotavirus genes are relatively short (see above) and terminal nt sequences are partially conserved, they are, at this stage, excluded from classification attempts within group A rotaviruses. In addition, what drives the evolution of these gene portions is much less well defined.

Classification strategy

Complete ORF analysis

Once the complete ORF of a rotavirus gene under investigation has been determined, it is compared to other complete ORFs of cognate genes available in the GenBank database through a Basic Local Alignment Search Tool (BLAST) search (http://www.ncbi.nlm.nih.gov/BLAST/). If pairwise nucleotide identities between the gene of the novel strain under investigation (strain X) and strains belonging to an established genotype A are above the cut-off value of that gene segment (Table 1), strain X can be assigned to genotype A. The exact relationship between the gene of strain X and cognate genes of all established genotypes has to be determined phylogenetically. When all the pairwise nucleotide identities between a gene of a new strain Y and the cognate genes of all the established genotypes are below the cut-off value for that gene segment (Table 1), strain Y may be the prototype of a new genotype (Fig. 1).
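The decision rule for complete ORFs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated tool: the function name and return labels are invented, the cut-off is supplied by the caller (the value 80.0 in the usage lines is a placeholder, not a value quoted from Table 1), a genotype is treated as matching when its best-scoring reference strain lies above the cut-off, and an identity exactly equal to the cut-off is treated as not above it.

```python
from typing import Dict, List, Tuple


def classify_complete_orf(
    identities: Dict[str, List[float]], cutoff: float
) -> Tuple[str, List[str]]:
    """Apply the complete-ORF rule to one gene segment of a new strain.

    identities: established genotype -> pairwise nucleotide identities (%)
                between the new strain and reference strains of that genotype.
    cutoff:     cut-off value (%) for this gene segment, supplied by the caller.
    """
    if not identities or any(len(v) == 0 for v in identities.values()):
        raise ValueError("each genotype needs at least one identity value")
    # Assumption: use the best-scoring reference strain of each genotype.
    above = sorted(g for g, vals in identities.items() if max(vals) > cutoff)
    if not above:
        # No established genotype reaches the cut-off.
        return ("candidate new genotype", [])
    if len(above) == 1:
        return ("assigned", above)
    # More than one genotype exceeds the cut-off; identity alone cannot decide.
    return ("needs phylogenetic resolution", above)


if __name__ == "__main__":
    placeholder_cutoff = 80.0  # hypothetical; take the real value from Table 1
    strain_x = {"genotype A": [92.1, 88.0], "genotype B": [74.5]}
    strain_y = {"genotype A": [75.0], "genotype B": [74.5]}
    print(classify_complete_orf(strain_x, placeholder_cutoff))
    print(classify_complete_orf(strain_y, placeholder_cutoff))
```

A "candidate new genotype" result is only a flag for follow-up; the text above still requires phylogenetic analysis to establish the exact relationship to the established genotypes.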

Fig. 1 Flowchart summarizing the guidelines for genotype classification provided in this manuscript

In order to avoid duplications in the literature and identical numbering of different new genes, a Rotavirus Classification Working Group (RCWG) including molecular virologists, infectious disease physicians, epidemiologists, and public health specialists has been formed (Table 3). The RCWG will help validate new sequences and assign an appropriate successive genotype number, in close consultation with the Reoviridae Study Group of the ICTV.

Table 3 Current members of the RCWG

Partial ORF analysis

If only a partial ORF sequence of a rotavirus genome segment is available, assigning it to a genotype is less certain because genotypic diversity is not constant across the ORF. Some regions of the ORF may be highly variable, while others may be more conserved. Since the cut-off percentage values for each of the 11 genome segments have been calculated based on entire ORFs, applying these cut-off percentages to only a part of the ORF might lead to erroneous conclusions. A partial gene sequence might be used to assign a rotavirus gene to an established genotype only when all three of the following restrictions are met:

  1. At least 50% of the ORF sequence should be determined.
  2. At least 500 nt of the ORF should be determined.
  3. Identity between strain X and strains belonging to an established genotype A should be at least 2% above the appropriate cut-off value (Table 1) before strain X can be assigned to genotype A (Fig. 1).
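The three restrictions above can be checked mechanically. The Python sketch below returns the list of restrictions that a partial sequence fails; an empty list means all three are met. The function name and messages are invented for this example, the cut-off is supplied by the caller, and "2% above" is interpreted as two percentage points above the cut-off, which is an assumption.

```python
from typing import List


def partial_orf_problems(
    partial_len_nt: int,
    full_orf_len_nt: int,
    identity_pct: float,
    cutoff_pct: float,
) -> List[str]:
    """Return the restrictions that a partial ORF sequence fails to meet."""
    if partial_len_nt <= 0 or full_orf_len_nt <= 0:
        raise ValueError("lengths must be positive")
    if partial_len_nt > full_orf_len_nt:
        raise ValueError("partial sequence cannot be longer than the ORF")
    problems = []
    # Restriction 1: at least 50% of the ORF (integer test avoids rounding).
    if 2 * partial_len_nt < full_orf_len_nt:
        problems.append("less than 50% of the ORF determined")
    # Restriction 2: at least 500 nt of the ORF.
    if partial_len_nt < 500:
        problems.append("fewer than 500 nt determined")
    # Restriction 3: assumed to mean cut-off plus two percentage points.
    if identity_pct < cutoff_pct + 2.0:
        problems.append("identity less than 2 points above the cut-off")
    return problems


if __name__ == "__main__":
    placeholder_cutoff = 80.0  # hypothetical; take the real value from Table 1
    print(partial_orf_problems(600, 1000, 83.0, placeholder_cutoff))
    print(partial_orf_problems(400, 700, 90.0, placeholder_cutoff))
```

Returning the failed restrictions rather than a single yes/no makes it clear which additional sequencing or comparison would be needed before an assignment is attempted.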

Remarks

To assign a genotype to a new ORF sequence, whether complete or partial, the comparison should be made only to strains for which the genotype has been established based on analysis of the entire ORF, and not to other partial ORF sequences. Due to intra-segmental recombination [30–32, 40] or to different rates of diversification throughout a genome segment, classification assignments based on partial ORFs may yield misleading or incorrect results.

Practical recommendations

Any sequence of a rotavirus RNA segment that has been analyzed as described above and found to clearly belong to one of the established genotypes can be assigned to that genotype. When a potential new genotype is found, the complete ORF should be determined and can be sent (in confidence and out of competition) to a member of the RCWG (Table 3). The sequence will be analyzed and, if appropriate, a new successive genotype number will be assigned. The new number can be used for publication. Most reputable journals will require the GenBank accession number of the new sequence; this should be obtained as soon as possible, and the correct new genotype should be assigned to it. The whole RCWG process, comprising (a) receipt of the sequence, (b) appropriate phylogenetic analysis, (c) presentation to the RCWG members, and (d) approval by the RCWG, should not take more than 6 weeks. An annual report of the RCWG will be published summarizing all newly assigned genotypes.