Introduction

Noroviruses are major causative pathogens of nonbacterial acute gastroenteritis worldwide in all age groups (Ahmed et al. 2014). Human noroviruses are commonly transmitted through contaminated food, contaminated water, and person-to-person contact (Matthews et al. 2012; Lopman et al. 2012). Although human norovirus infection can be asymptomatic and self-limiting in healthy individuals, symptoms are often severe in immunocompromised individuals (Bok and Green 2012).

Noroviruses, members of the family Caliciviridae, are nonenveloped, single-stranded, positive-sense RNA viruses (Jiang et al. 1990; Zheng et al. 2006). The norovirus genome is 7.5–7.7 kb in length and contains three open reading frames: ORF1–ORF3 (ORF4 is also found in murine norovirus). ORF1 encodes a polyprotein that can be cleaved into six nonstructural proteins: N-terminal p48, NTPase, p22, VPg, protease, and RNA-dependent RNA polymerase (RdRp). ORF2 encodes a major structural protein (VP1), and ORF3 encodes a minor structural protein (VP2) (Hardy 2005; Green 2013).

Based on the amino acid sequences of ORF1 and ORF2, seven genogroups (GI–GVII) have been suggested (Vinjé 2015). Human infection is associated with GI, GII, and GIV, and GII is detected more frequently in clinical samples (Cho et al. 2014a; De Graaf et al. 2015; Bull et al. 2010). Based on the nucleotide sequence of RdRp in ORF1 and VP1 in ORF2, 27 genotypes of RdRp and 22 genotypes of VP1 in GII have been established (Vinjé 2015). By replacing ORF1 with other genotypes or changing the antigenicity of VP1, noroviruses obtain survival advantages (Eden et al. 2013). Among the 22 genotypes of VP1, GII.4 has become the most common genotype in the last two decades, and Sydney 2012 (RdRp genotype = GII.Pe; capsid genotype = GII.4) was the last reported GII.4 pandemic variant (Bull and White 2011; Cho et al. 2014b; Van Beek et al. 2013). However, since the latter half of 2014, new GII.17 variants have been reported as the main causes of outbreaks over GII.4 in East Asia and have also occurred in America and Europe (Chan et al. 2015; Chen et al. 2015; De Graaf et al. 2015; Dinu et al. 2016; Fu et al. 2015; Ham et al. 2015; Lu et al. 2015; Matsushima et al. 2015; Medici et al. 2015; Parra and Green 2015; Thanh et al. 2016).

Human noroviruses have often been detected in surface waters close to human communities (Kitajima et al. 2010; Wyn-Jones et al. 2011). Accordingly, in this study, we monitored the RdRp and capsid genes of norovirus GII in peri-urban stream water from two provinces in South Korea from March 2015 to May 2016. To the best of our knowledge, this is the first report describing the emergence and predominance of GII.P16–GII.4, overtaking GII.P17–GII.17.

Materials and Methods

Study Area

From March 2015 to May 2016 (March, May, July, September, and December in 2015; January, February, March, and May in 2016), peri-urban coastal stream waters were collected at 18 sampling sites located within South Gyeongsang province (11 sites in four streams) and South Jeolla province (seven sites in two streams) of South Korea (Fig. 1). Each water sampling site was located immediately downstream of independent drainage ditches for domestic sewage originating from neighboring dwellings.

Fig. 1

Study areas and sampling sites in this study. Study areas in South Korea (a) are described in enlarged maps of South Jeolla province (b) and South Gyeongsang province (c, d). Arrows in the maps throughout the streams indicate the location of each sampling site. Colors of the arrows indicate the types of putative contamination sources for the sampling sites, i.e., dwellings (black), school/nursing home (white), and farming area (gray)

Sample Collection and Preparation

A total of 158 samples were collected during the study period. One hundred liters of environmental water per sampling site was passed through Nano-Ceram cartridge filters (Argonide, Sanford, FL, USA) according to standard procedures (Fout et al. 1996; Parshionikar et al. 2003). Subsequent elution with 3.0% beef extract (pH 9.5; BD Bioscience, USA) and concentration processes followed USEPA method 1615 (Fout et al. 2015). Final water concentrates were stored at −80 °C until analysis.

Viral RNA Extraction

Viral RNA was extracted from 140 μL of water concentrates using a QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol to obtain the final extraction volume of 60 μL.

Nucleic Acid Amplification

Region C (0.3 kb) in ORF2, the near full length of VP1 (ORF2; 1.6 kb), and the ORF1–ORF2 junction (1.0 kb) were amplified from RNA extracts. The primers used in this study are described in Table 1, and all other materials used for the nucleic acid amplification process were as previously described (Kim et al. 2016). To amplify region C in ORF2, GII-F1M and GII-R1M primers were used for one-step reverse transcription polymerase chain reaction (one-step RT-PCR); GII-F3 and GII-R1M primers were used for the second semi-nested PCR. To amplify the near full length of the VP1 (ORF2) gene, GII-F1M and GIICR1450 primers were used for one-step RT-PCR; GII-F3 and GIICR1450 primers were used for the second semi-nested PCR. To amplify the ORF1–ORF2 junction, GIIPF800 and GIICR1450 primers were used for one-step RT-PCR; GIIPF750 and GII-R1M primers were used for the second nested PCR.

Table 1 Primers used in this study

For one-step RT-PCR, the RNA extract (5 μL) from each concentrate was mixed with 2 × 1-Step PCR ReddyMix, 2.5 μL RT enhancer, 40 μM forward/reverse primers, and 1 μL Verso Enzyme Mix (Thermo Fisher Scientific, Waltham, MA, USA). Deionized sterile water was added for a final reaction volume of 50 μL. The one-step RT-PCR conditions for region C included a reverse transcription step at 45 °C for 30 min and 94 °C for 5 min, followed by 35 cycles of amplification (94 °C for 30 s, 55 °C for 30 s, and 72 °C for 90 s) and a final extension at 72 °C for 7 min. The one-step RT-PCR conditions for the ORF1–ORF2 junction and nearly complete VP1 sequence included a reverse transcription step at 45 °C for 30 min and 94 °C for 5 min, followed by 35 cycles of amplification (94 °C for 1 min, 47 °C for 1 min, and 72 °C for 3 min) and a final extension at 72 °C for 7 min.

The one-step RT-PCR products were used as templates for the secondary (semi-nested or nested) PCR. The one-step RT-PCR products (5 μL) were mixed with 10× buffer, 4 μL of 2.5 mM dNTPs, 45 μM forward/reverse primers, and 5 U Top DNA polymerase (Bioneer, Daejeon, South Korea). Deionized sterile water was added for a final reaction volume of 50 μL. The secondary PCR conditions for region C included initial denaturation at 94 °C for 5 min, followed by 25 cycles of amplification (94 °C for 30 s, 55 °C for 30 s, and 72 °C for 90 s) and a final extension at 72 °C for 5 min. The secondary PCR conditions for the ORF1–ORF2 junction and the nearly complete VP1 included an initial denaturation step at 94 °C for 5 min, followed by 29 cycles of amplification (94 °C for 1 min, 47 °C for 1 min, and 72 °C for 2.5 min) and a final extension at 72 °C for 5 min.
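The four cycling programs described above can be written down as data, which makes it easier to check cycle counts and to estimate how long each run occupies the thermocycler. The following Python sketch is illustrative only: the names `Program` and `hold_minutes` are hypothetical, and the totals cover programmed hold times only, not ramp times, which depend on the instrument.

```python
from dataclasses import dataclass
from typing import List, Tuple

Hold = Tuple[int, int]  # (temperature in degrees C, duration in seconds)


@dataclass(frozen=True)
class Program:
    """One thermocycler program as described in the text (hypothetical structure)."""
    name: str
    pre_holds: List[Hold]      # reverse transcription and/or initial denaturation
    cycles: int                # number of amplification cycles
    cycle_steps: List[Hold]    # denaturation, annealing, extension
    final_extension: Hold

    def hold_minutes(self) -> float:
        """Sum of programmed hold times in minutes (ramp times are ignored)."""
        pre = sum(seconds for _, seconds in self.pre_holds)
        cycling = self.cycles * sum(seconds for _, seconds in self.cycle_steps)
        return (pre + cycling + self.final_extension[1]) / 60


RT_REGION_C = Program(
    "one-step RT-PCR, region C",
    [(45, 30 * 60), (94, 5 * 60)], 35, [(94, 30), (55, 30), (72, 90)], (72, 7 * 60),
)
RT_LONG = Program(
    "one-step RT-PCR, ORF1-ORF2 junction and VP1",
    [(45, 30 * 60), (94, 5 * 60)], 35, [(94, 60), (47, 60), (72, 180)], (72, 7 * 60),
)
SECOND_REGION_C = Program(
    "secondary PCR, region C",
    [(94, 5 * 60)], 25, [(94, 30), (55, 30), (72, 90)], (72, 5 * 60),
)
SECOND_LONG = Program(
    "secondary PCR, ORF1-ORF2 junction and VP1",
    [(94, 5 * 60)], 29, [(94, 60), (47, 60), (72, 150)], (72, 5 * 60),
)

if __name__ == "__main__":
    for program in (RT_REGION_C, RT_LONG, SECOND_REGION_C, SECOND_LONG):
        print(f"{program.name}: {program.hold_minutes():.1f} min of programmed holds")
```

Under these assumptions, the longer targets take roughly 3.6 h for the one-step RT-PCR and 2.3 h for the secondary PCR, before ramp times are added.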

Cloning and Sequence Analysis

Amplified products of the target genes (region C, VP1, and the ORF1–ORF2 junction) were purified using a MiniBEST Agarose Gel DNA Extraction Kit Ver 4.0 (Takara, Kusatsu, Japan) and cloned using a Mighty TA-cloning Kit (Takara) with chemically competent DH5α (Enzynomics, Daejeon, South Korea). Cloned genes from six colonies per amplicon were purified using a HiYield Plasmid Mini Kit (RBC, Banqiao, Taiwan) and were submitted for DNA sequencing (Macrogen, Seoul, South Korea) using a 3730xl DNA analyzer (Thermo Fisher Scientific).

Phylogenetic Analysis

Human norovirus sequences in environmental water were identified using BLAST-n (http://www.ncbi.nlm.nih.gov/) and aligned using ClustalX (version 1.81). Phylogenetic relationships of the aligned sequences were determined using Molecular Evolutionary Genetics Analysis version 6.0 (MEGA6).

Phylogenetic trees for region C (0.3 kb) and ORF1 (0.7 kb) sequences were constructed by the neighbor-joining method with 1000 bootstrap replicates (Kimura 2-parameter model with gamma shape distribution parameter of four). The maximum likelihood method was used to construct a phylogenetic tree of VP1 sequences (Kimura 2-parameter model with gamma shape distribution parameter of four). The genotypes of the identified norovirus sequences were reconfirmed using a web-based norovirus genotyping tool (RIVM; http://www.rivm.nl/mpf/norovirus/typingtool) (Kroneman et al. 2011).
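To make the distance model concrete, the sketch below computes a pairwise Kimura 2-parameter distance from two aligned sequences, with an optional gamma correction for rate variation among sites (shape parameter 4 in this study). It uses the standard textbook form of the model and is an illustration, not the MEGA6 implementation: the function name `k2p_distance` is hypothetical, and skipping every site with a gap or ambiguous base is an assumption made here (MEGA6 offers several gap-handling options).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2, gamma_shape=None):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    P is the proportion of transitions and Q the proportion of transversions
    among compared sites. Without gamma correction:
        d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q)
    With gamma shape parameter a:
        d = a/2 * [(1 - 2P - Q)^(-1/a) + 1/2 * (1 - 2Q)^(-1/a) - 3/2]
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P model")

    if gamma_shape is None:
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    return (gamma_shape / 2) * (
        w1 ** (-1 / gamma_shape) + 0.5 * w2 ** (-1 / gamma_shape) - 1.5
    )
```

For example, `k2p_distance(a, b, gamma_shape=4)` would give one entry of the distance matrix on which a neighbor-joining tree is built; the gamma-corrected value is never smaller than the uncorrected one.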

Quality Assurance

To minimize carry-over contamination, all processes involved in gene amplification were performed under controlled conditions, as previously described (Koo et al. 2012). Briefly, environmental sample preparation, one-step RT-PCR preparation, semi-nested PCR preparation, and gel electrophoresis were each performed in separate rooms or spaces. Pure reagents and disposable labware were used, and all locations used for nucleic acid sequence detection were sterilized by UV radiation for 30 min before every experiment.

Results

Monitoring of Norovirus GII in Streams Neighboring Human Communities in South Korea

From March 2015 to May 2016, stream water sample collection was carried out in peri-urban streams of South Gyeongsang province (95 samples) and South Jeolla province (63 samples). Based on the region C nucleotide sequence analysis in BLAST-n (http://www.ncbi.nlm.nih.gov/) and RIVM (http://www.rivm.nl/mpf/norovirus/typingtool), 58.9% (93/158) of all samples were positive for norovirus GII; 49.5% (47/95) were positive for norovirus GII in South Gyeongsang province, and 73.0% (46/63) were positive for norovirus GII in South Jeolla province. Genotyping using a neighbor-joining phylogenetic tree identified 12 genotypes (GII.1, GII.2, GII.3, GII.4, GII.5, GII.6, GII.8, GII.12, GII.13, GII.17, GII.21, and GII.22) from GII region C sequences in the study area (ESM1—Supplementary materials). Based on region C genotyping, the number of norovirus GII.17-positive cases (n = 24) was much higher than the number of GII.4-positive cases (n = 6) from March to September 2015. In December 2015, however, the number of water samples containing GII.4 sequences (n = 9) overtook that of GII.17 sequences (n = 7). More GII.4-positive cases than GII.17-positive cases were observed for both provinces from December 2015 to May 2016 (Fig. 2).
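The positivity rates above follow directly from the per-province counts. As a quick consistency check, the short Python sketch below recomputes them; the variable and function names are illustrative and not part of the study.

```python
# Region C-positive samples and total samples per province, as reported in the text.
counts = {
    "South Gyeongsang": (47, 95),
    "South Jeolla": (46, 63),
}


def percent_positive(positive, total):
    """Positivity rate in percent, rounded to one decimal place."""
    return round(100 * positive / total, 1)


total_positive = sum(positive for positive, _ in counts.values())
total_samples = sum(total for _, total in counts.values())

if __name__ == "__main__":
    for province, (positive, total) in counts.items():
        print(f"{province}: {positive}/{total} = {percent_positive(positive, total)}%")
    print(
        f"Overall: {total_positive}/{total_samples} = "
        f"{percent_positive(total_positive, total_samples)}%"
    )
```

The provincial counts sum to the overall figures (93 positives among 158 samples), and the rounded percentages match the reported 49.5%, 73.0%, and 58.9%.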

Fig. 2

Seasonal variations in the number of positive cases of norovirus genogroup II genotypes in peri-urban coastal stream water samples in South Korea from March 2015 to May 2016. The numbers of positive cases of region C genotypes GII.1, GII.2, GII.3, GII.5, GII.6, GII.8, GII.12, GII.13, GII.21, and GII.22 were combined and are shown as closed circles/dashed lines. The dashed lines passing through closed circles describe the trend of the detection frequency. The numbers of region C-positive cases of GII.4 (black) and GII.17 (gray) are shown as vertical bars. The identified GII region C sequences in this study were deposited in NCBI (GenBank ID: KX764646–KX764830)

Sequence Analysis of ORF1 and ORF2

As shown in Fig. 2, region C sequences of GII.4 and GII.17 in each of the two cold seasons (March–May 2015 and December 2015–May 2016) appeared to be prevalent in March 2015 and January 2016. Thus, a total of 23 GII.4 and GII.17 region C-positive samples (n = 11 in March 2015; n = 12 in January 2016) were selected and further analyzed using RT-PCR of the ORF1–ORF2 junction (1.0 kb) and nearly full-length VP1 (ORF2; 1.6 kb). Norovirus VP1 and ORF1–ORF2 junction sequences were successfully obtained from 13/23 and 19/23 water samples, respectively.

Phylogenetic Analysis and Amino Acid Alignment of the Near Full Length of VP1

A total of 17 VP1 sequences were obtained from 13 VP1-positive samples. In the maximum likelihood tree of the VP1 nucleotide sequences, 10 sequences (GenBank ID: KX764836–KX764845; Fig. 3a) clustered with the novel GII.17 reference sequences predominant in East Asia; seven sequences (GenBank ID: KX764831–KX764835, KX764846, and KX764847) clustered with reference sequences of GII.4 Sydney 2012 variants, forming two independent clusters (clusters A and B; Fig. 3b). Interestingly, GII.4 VP1 sequences from 2016 formed a subcluster independent of the VP1 sequences from 2015, coinciding with independent clustering of reference VP1 sequences from Japan and South Korea (LC066046/KU687025–KU687028/2015/Cluster B; LC153122/2016/Cluster A; Fig. 3b). To clarify whether amino acid sites known to be related to antigenicity in VP1 differed between the two clusters, five reference sequences of GII.4 Sydney VP1 that emerged in South Korea and seven GII.4 VP1 sequences isolated in the study area were compared (Table 2). Seven amino acid sites (95, 148, 264, 319, 333, 393, and 464) of a VP1 sequence from 2012 were changed, and the changes were maintained in the VP1 sequences that emerged in 2015–2016. Four amino acid sites (119, 145, 174, and 414) differed between sequences from 2015 and 2016. Another four amino acid sites (193, 276, 309, and 438) showed regional or temporal differences between sequences from 2015 and 2016. However, these variable sites were not located in HBGA binding sites or antigenic epitopes, except site 393; moreover, site 393 did not differ between sequences isolated in 2015 and 2016. Consequently, there were no amino acid sequence differences in HBGA binding sites or putative B cell epitopes between GII.4 Sydney 2012 VP1 sequences from 2015 and 2016 in the study area.
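The GenBank accession ranges quoted above can be expanded programmatically to confirm that they account for all 17 VP1 sequences. The Python sketch below is a minimal illustration; the helper name `expand_accessions` is hypothetical, and it assumes accessions consist of a letter prefix followed by digits, with ranges joined by an en dash or hyphen.

```python
import re
from typing import List

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accessions(spec: str) -> List[str]:
    """Expand 'KX764836–KX764845' into individual accessions.

    A single accession such as 'KX764846' is returned as a one-item list.
    """
    ends = [ACCESSION.fullmatch(part.strip()) for part in re.split(r"[–-]", spec)]
    if len(ends) > 2 or any(match is None for match in ends):
        raise ValueError(f"unrecognized accession range: {spec!r}")
    first, last = ends[0], ends[-1]
    if first.group(1) != last.group(1):
        raise ValueError("range endpoints must share the same prefix")
    start, stop = int(first.group(2)), int(last.group(2))
    if stop < start:
        raise ValueError("range end precedes range start")
    width = len(first.group(2))  # preserve zero padding, if any
    return [f"{first.group(1)}{n:0{width}d}" for n in range(start, stop + 1)]


# Ranges as quoted in the text.
gii17_vp1 = expand_accessions("KX764836–KX764845")
gii4_vp1 = (
    expand_accessions("KX764831–KX764835")
    + expand_accessions("KX764846")
    + expand_accessions("KX764847")
)

if __name__ == "__main__":
    print(len(gii17_vp1), "GII.17 VP1 accessions")
    print(len(gii4_vp1), "GII.4 VP1 accessions")
    print(len(set(gii17_vp1 + gii4_vp1)), "distinct VP1 accessions in total")
```

The ranges expand to 10 GII.17 and 7 GII.4 accessions with no overlap, consistent with the 17 VP1 sequences reported.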

Fig. 3

Maximum likelihood phylogenetic tree based on genogroup II genotype 17 (a) and 4 (b) viral major protein 1 (VP1; 1.6 kb) sequences in positive samples (open square: samples from March 2015; closed square: samples from January 2016). The scale bar indicates the number of nucleotide substitutions per site. Bootstrap values of 70 or more are shown next to the branches. Each sample label number on the tree indicates the sampling date (e.g., 20150325 = March 25, 2015). EL or FL in each sample name indicates South Gyeongsang province or South Jeolla province, respectively. The names of reference sequences are indicated by bold and italic font

Table 2 Comparison of VP1 amino acid sequences of the norovirus Sydney 2012 variant that emerged in South Korea

Identification of GII.P16 in Water Samples in 2016

For the ORF1–ORF2 junction, genotyping with the RIVM tool identified GII.P17–GII.17 (13/19 ORF1–ORF2 junction-positive samples; GenBank ID: KX764857–KX764869; data not shown) and two types of RdRp–GII.4 capsid sequences: GII.Pe–GII.4 (4/19 ORF1–ORF2 junction-positive samples; GenBank ID: KX764870–KX764873; data not shown) and GII.P16–GII.4 (9/19 ORF1–ORF2 junction-positive samples; GenBank ID: KX764848–KX764856; data not shown). To reconfirm the RdRp genotyping results (GII.Pe and GII.P16) from the RIVM tool, the ORF1 sequences joined to the GII.4 capsid gene were used to construct a neighbor-joining phylogenetic tree (Fig. 4). The phylogenetic analysis supported the emergence of a GII.P16–GII.4 norovirus in the study area in 2016.

Fig. 4

Neighbor-joining phylogenetic tree based on norovirus genogroup II polymerase genotype e and 16 sequences (GII.Pe and GII.P16; 0.7 kb) from the ORF1–ORF2 junction sequences bearing GII.4 region C (1.0 kb; n = 13) in the study area in March 2015 (open circle) and January 2016 (closed circle). The scale bar indicates the number of nucleotide substitutions per site. Bootstrap values of 95 or more are shown next to the branches. Each sample label number on the tree indicates the sampling date (e.g., 20150325 = March 25, 2015). EL or FL in each sample name indicates South Gyeongsang province or South Jeolla province, respectively. The names of reference sequences are indicated by bold and italic font

Phylogenetic Analysis of Whole GII.4 Region C Sequences in the Study Area

To determine whether the GII.4 VP1 sequences (clusters A and B) in the study area originated from norovirus genomes bearing GII.P16 or GII.Pe, and to identify which region C sequences (GII.Pe–GII.4 or GII.P16–GII.4) were closely related to the GII.4 region C amplicons whose detection frequency exceeded that of GII.17 from December 2015 onward, a neighbor-joining phylogenetic tree was constructed based on the GII.4 region C sequences (0.3 kb) of each ORF1–ORF2 junction amplicon, VP1 amplicon, and region C amplicon (Fig. 5). In the phylogenetic tree, region C sequences from GII.P16–GII.4 (ORF1–ORF2 junction) were closely related (cluster I) to region C sequences of GII.4 VP1 amplicons isolated in January 2016 (cluster A in Fig. 3b); region C sequences of GII.Pe–GII.4 fell outside cluster I and were closely related to region C sequences of GII.4 VP1 amplicons isolated in March 2015 (cluster B in Fig. 3b). Interestingly, most sequences (32/41) of GII.4 region C amplicons since December 2015 were grouped in cluster I (Fig. 5).

Fig. 5

Neighbor-joining phylogenetic tree based on norovirus genogroup II genotype 4 region C (0.3 kb; triangle) sequences from viral major protein 1 amplicons (VP1; 1.6 kb; n = 7; square), ORF1–ORF2 junction amplicons (1.0 kb; n = 13; circle), and region C amplicons (n = 41) in the study area from March 2015 to May 2016 (samples from March to September 2015: open triangle, open square, and open circle; samples from December 2015 to May 2016: filled triangle, filled square, and filled circle) in South Korea. VP1 and ORF1–ORF2 junction amplicons were isolated from GII.4 and GII.17 region C-positive samples in March 2015 and January 2016. The scale bar indicates the number of nucleotide substitutions per site. Bootstrap values of 60 or more are shown next to the branches. Each sample label number on the tree indicates the sampling date (e.g., 20150325 = March 25, 2015). EL or FL in each sample name indicates South Gyeongsang province or South Jeolla province, respectively. The names of reference sequences are indicated by bold and italic font

Discussion

For two decades, norovirus GII.4 had been a predominant genotype in gastroenteritis outbreaks by replacing established pandemic strains every 2–3 years, and ‘Sydney 2012’ was the last pandemic GII.4 strain, emerging in 2012 (Kim et al. 2013; Siebenga et al. 2007; Van Beek et al. 2013; Bull and White 2011; Cho et al. 2014b). Since late 2014, however, novel GII.17 variants have been reported as the main causes of outbreaks over GII.4 in East Asia and have also occurred in America and Europe (Fu et al. 2015; De Graaf et al. 2015; Thanh et al. 2016; Lu et al. 2015; Matsushima et al. 2015; Parra and Green 2015; Ham et al. 2015; Medici et al. 2015; Dinu et al. 2016; Chen et al. 2015; Chan et al. 2015). Moreover, monitoring of environmental waters from March 2014 to February 2015 demonstrated that GII.17 capsid sequences were detected at higher frequency than other genotypes since January 2015 in South Korea (Kim et al. 2016). Rapid evolution rates, broad HBGA binding spectra, and noticeable increases in the capsid genotype have been observed in recently hospitalized patients, suggesting that GII.17 may have replaced the previous GII.4 Sydney 2012 variants (GII.Pe–GII.4) in East Asia (De Graaf et al. 2015; Chan et al. 2015).

In this study, using 158 water samples collected from peri-urban coastal streams from March 2015 to May 2016, we found that the re-emerged GII.4 Sydney capsid variant overtook novel GII.17 variants in South Korea. The number of GII.17 region C-positive cases in this study was higher than that of GII.4 from March 2015 to September 2015. A recent report from South Korea described the predominance of GII.17 in the first half of 2015 in clinical specimens (Ham et al. 2015). In December 2015, the number of GII.4 region C-positive cases in this study was higher than that in early 2015 and exceeded the number of GII.17-positive cases in the beginning of 2016 (Fig. 2).

Phylogenetic analysis showed that an independent cluster (cluster I in Fig. 5) of GII.4 region C sequences appeared beginning in December 2015, and that the region C sequences in cluster I were joined to the ORF1 genotype GII.P16. Moreover, VP1 sequences from January 2016 were consistent with the Sydney 2012 variant and formed a new cluster with a reference sequence collected recently in Japan (GenBank ID: LC153122; March 2016), and region C in these sequences was also included in cluster I. Thus, GII.4 VP1 sequences isolated in 2016 may have combined with GII.P16 sequences. Notably, the new VP1 sequences of GII.4 noroviruses in the study area had no unique amino acid sequence differences in B-cell epitopes or HBGA binding sites compared with those of the Sydney 2012 sequences that emerged in 2015 in South Korea.

Different RdRp sequences of human noroviruses exhibit different nucleotide incorporation/substitution rates (Bull et al. 2010). Thus, renewal of RdRp may allow noroviruses to change their fitness in nature. Given the noticeable increase in GII.4 region C sequences in cluster I, GII.P16 may have conferred greater competitiveness on the norovirus genomes bearing these GII.4 capsid sequences. Unknown functions of other nonstructural proteins encoded in ORF1 of GII.P16 may also affect the prevalence of GII.P16–GII.4 sequences. Further cell-based studies are needed to determine the function of nonstructural proteins in ORF1. In addition to a recent norovirus outbreak caused by GII.P16–GII.4 in Japan (Matsushima et al. 2016), a recent report demonstrated that the GII.4 Sydney 2012 capsid sequence harboring the polymerase sequence of pandemic GII.4 New Orleans 2009 (GII.P4–GII.4) emerged and became predominant in Australia (Bruggink et al. 2016). These observations suggest that replacing the ORF1 associated with the pandemic GII.4 capsid may confer high fitness, and that such recombinants could trigger unexpected outbreaks in the future.

In conclusion, we report the emergence of, and a dramatic increase in, GII.P16–GII.4 sequences in environmental water neighboring human communities in South Korea. Owing to frequent international exchanges and transportation, new norovirus variants emerging locally may have the potential to spread nationwide and globally. Researchers should carefully monitor these new variants to prevent another potential pandemic.