Complete genome sequence of GII.9 norovirus

Norovirus is recognized as one of the leading causes of acute gastroenteritis outbreaks. Genotype GII.9 was first detected in Norfolk, VA, USA, in 1997. However, the complete genome sequence of this genotype has not yet been determined. In this study, a complete genome sequence of GII.9[P7] norovirus, SCD1878_GII.9[P7], from a patient was determined using high-throughput sequencing and rapid amplification of cDNA ends (RACE) technology. The complete genome sequence of SCD1878_GII.9[P7] is 7544 nucleotides (nt) in length with a 3’ poly(A) tail and contains three open reading frames. Sequence comparisons indicated that SCD1878_GII.9[P7] shares 92.1%-92.3% nucleotide sequence identity with GII.P7 (AB258331 and AB039777) and 96.7%-97.4% identity with GII.9 (AY038599 and DQ379715). The results suggested that SCD1878_GII.9[P7] is a member of P genotype GII.P7 and G genotype GII.9. This viral sequence fills a gap at the whole-genome level for the GII.9 genotype.


Introduction
Norovirus (NoV) is recognized as one of the leading causes of acute gastroenteritis outbreaks. NoV belongs to the family Caliciviridae and has a positive-sense ~7.5 kb RNA genome [1]. Phylogenetically, NoV can be segregated into 10 genogroups and further divided into genotypes based on amino acid sequence diversity in the VP1 gene. GII is the largest of the known genogroups, consisting of 26 genotypes, including 23 human NoV genotypes that are responsible for most epidemics, and three porcine NoV genotypes (GII.11/18/19) [2]. As the diversity of NoV increased through recombination, dual typing was proposed for NoV classification. Partial nucleotide sequences of the RNA-dependent RNA polymerase (RdRp) region of ORF1 are used for NoV P-type classification independently of genotype. A total of 37 P-types have now been identified in GII viruses [2].
The first strain of genotype GII.9 virus (VA97207) was detected in Norfolk, VA, USA, in 1997 [3]. A partial genome sequence of this strain (a 3290-bp fragment including the complete ORF2 region) was uploaded to the GenBank database in 2001 (accession number AY038599) [3]. Compared with other genotypes, GII.9 strains have rarely been reported. Gelaw et al. detected only one GII.9 strain in 450 clinical samples by RT-PCR and partially sequenced its VP1 gene (300 bp) [4]. The presence of GII.9 was also reported in wastewater in South Africa and oyster samples in Japan.

Materials and methods
In this study, a rare GII.9[P7] whole-genome sequence was obtained from a clinical sample. An anal swab and epidemiological data were collected through the acute gastroenteritis (AGE) outbreak surveillance system monitored by Shanghai Customs. The patient was a 22-year-old Japanese female who traveled from India and arrived at Shanghai Pudong Airport on March 19, 2018. The patient had diarrhea and vomiting and was diagnosed as having AGE. The majority of the whole viral sequence was determined using RNA-seq, and the ends of the viral genome were sequenced using a rapid amplification of cDNA ends (RACE) kit (Vazyme, Nanjing, China) (Supplementary Figs. S1 and S2) [8,9]. The whole genomic sequence was then assembled and validated using CLC Genomics Workbench (https://digitalinsights.qiagen.com). The assembled viral genome sequence was genotyped using a web-based genotyping tool [10], and a phylogenetic tree was constructed using MEGA X [11]. The complete sequence, named SCD1878_GII.9[P7], was deposited in the GenBank database with the accession number MZ312111.
A total of 1976 human NoV genome sequences (6400-8500 bp) were obtained from ViPR on March 10, 2021 [12]. BioAider was used to remove redundant sequences sharing more than 97% sequence identity [13]. PhyloSuite was used to conduct, manage, and streamline the analyses [14]. Sequences were aligned using MAFFT [15]. The best partitioning scheme and evolutionary models for one pre-defined partition were selected using PartitionFinder2 [16] with the greedy algorithm and the AICc criterion. Maximum-likelihood phylogenetic trees were constructed using IQ-TREE [17] with the GTR+I+G4+F model and 20,000 ultrafast bootstrap replicates, as well as the Shimodaira-Hasegawa-like approximate likelihood-ratio test [18].
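The length window applied to the downloaded genome sequences (6400-8500 bp) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on ViPR, BioAider, and PhyloSuite; the function names parse_fasta and filter_by_length are hypothetical and are introduced here only for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=6400, max_len=8500):
    """Keep records whose sequence length lies within [min_len, max_len].

    The default bounds mirror the 6400-8500 bp window stated in the text.
    """
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


if __name__ == "__main__":
    # Toy input: one near-full-length genome and one short fragment.
    toy = ">full\n" + "A" * 7544 + "\n>fragment\n" + "A" * 300 + "\n"
    kept = filter_by_length(parse_fasta(toy))
    print([header for header, _ in kept])
```

Removal of redundant sequences by identity and the subsequent alignment are not reproduced here, because they depend on the specific tools cited above.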
[P6]/[P7]/[P20]/[P15] were used to conduct evolutionary analysis by the maximum-likelihood method using the Kimura 2-parameter model. According to the "2-standard-deviation" (SD) criterion, where "the average distance between all sequences within a new genogroup or genotype and its nearest established cluster(s) should not overlap within 2 SD", an overlap was observed between the average distance of this sequence and P6 or P7 sequences. Thus, the RdRp region of the related GII.9[P7] sequence could not form a new cluster in the phylogenetic tree, and the criterion of 2×SD could not be fulfilled [19,20]. No significant difference was observed, and therefore, it could not be recognized as a new P-type (Supplementary Fig. S3).
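The Kimura 2-parameter distance and the 2×SD check described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis performed in the study. It assumes the two sequences are already aligned and of equal length, skips columns containing gaps or ambiguous bases, and uses one possible reading of the criterion: the mean ± 2 SD interval of within-cluster distances is compared with the mean ± 2 SD interval of distances between the candidate and the established cluster. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q), where P and Q are the
    proportions of transitions and transversions among compared sites.
    Raises ValueError if the sequences are too divergent for the formula.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for x, y in zip(a, b):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def interval_2sd(distances):
    """Return (mean - 2*SD, mean + 2*SD) for a list of distances."""
    m = mean(distances)
    s = pstdev(distances)
    return m - 2 * s, m + 2 * s


def overlaps_within_2sd(within_distances, between_distances):
    """True if the two mean +/- 2 SD intervals overlap.

    Under the reading assumed here, an overlap means the candidate cannot
    be separated from the established cluster.
    """
    lo_w, hi_w = interval_2sd(within_distances)
    lo_b, hi_b = interval_2sd(between_distances)
    return lo_w <= hi_b and lo_b <= hi_w
```

In this sketch, within_distances would hold pairwise distances among sequences of an established P-type, and between_distances would hold distances from the candidate sequence to each member of that cluster. An overlap, as reported in the text for P6 and P7, would argue against defining a new P-type.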
The rapid development of sequencing technology has greatly facilitated virus monitoring. With the development of second- and third-generation sequencing technologies, discovering and analyzing longer viral genomes has become practical. Additional complete RdRp sequences or, ideally, complete genome sequences for all reference strains will help to improve the robustness of the present classification system [19]. Obtaining whole genome sequences of rare genotypes will not only enrich the database but also provide valuable information for analysis of evolution, as well as reference genome sequences for analysis of diversity, and screening for drug and vaccine development.

Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Ethical approval for this study was obtained from the China CDC Ethical Review Committee (no. M202007) (Beijing, China).
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2
Maximum-likelihood phylogenetic tree for human NoV genome sequences (6400-8500 bp). The overall evolutionary relationship of SCD1878_GII.9[P7] to closely related NoV genogroups is shown in the tree on the left. An enlarged view of SCD1878_