Introduction

Sapovirus (SaV) is a small nonenveloped virus belonging to the family Caliciviridae. The SaV genome is a positive-sense, single-stranded RNA of approximately 7.1–7.7 kb, organized into two or three open reading frames (ORFs). Human SaV is classified into 4 genogroups (GI, GII, GIV and GV) based on complete VP1 nucleotide sequences, and these are further subdivided into 18 genotypes (GI.1–GI.7, GII.1–GII.8, GIV.1, GV.1 and GV.2) (Oka et al. 2015; Kagning Tsinda et al. 2017).

Human SaV is an important causative agent of nonbacterial gastroenteritis (Platts-Mills et al. 2018). According to recent data, human SaV accounts for about 2.2–22.6% of gastroenteritis cases worldwide (Mancini et al. 2019). All age groups, especially infants, are susceptible to human SaV (Pang et al. 2014; Rockx et al. 2002; de Wit et al. 2001). Although SaV-associated gastroenteritis is generally milder than norovirus- and rotavirus-associated gastroenteritis (Page et al. 2016; Sakai et al. 2001), human SaV infection can result in hospitalization (Lee et al. 2012; Medici et al. 2012). Human SaV has been identified in both sporadic cases and outbreaks of gastroenteritis (Oka et al. 2015). A meta-analysis reported that outbreaks worldwide were mainly caused by GI and GIV (Yu et al. 2019), although outbreaks associated with other genogroups have also been reported (Yu et al. 2019; Hergens et al. 2017; Oka et al. 2017). Outbreaks caused by human SaV have typically occurred in closed and semi-closed settings, such as kindergartens, hospitals, ships, long-term care facilities, and schools (Yamashita et al. 2010; Pang et al. 2009; Usuku et al. 2008; Yan et al. 2005). In China, 2 gastroenteritis outbreaks associated with human SaV occurred in Shenzhen during 2015–2016 (Wang et al. 2018). Human SaV poses a significant disease burden, which highlights its emerging role as a public health issue (Liu et al. 2016).

Because the positive rate of human SaV in gastroenteritis patients is relatively low (compared with norovirus and rotavirus), information on its genotype diversity in China is limited. Human SaV can be detected in sewage. Environmental surveillance of human SaV has been conducted in Thailand, Japan, Italy, Brazil, Tunisia, and other countries (Khamrin et al. 2020; Mancini et al. 2019; Ibrahim et al. 2019; Thongprachum et al. 2018; Fioretti et al. 2016; Murray et al. 2013; Kitajima et al. 2010) to study its molecular epidemiology. To the best of our knowledge, however, no studies on human SaV in sewage in China have been reported.

Generally, PCR amplicons of human SaV genomes from sewage contain multiple genotypes and variants. Previous studies relied on cloning and Sanger sequencing, which is laborious and inefficient (Kumthip et al. 2020; Ibrahim et al. 2019; Thongprachum et al. 2018; Fioretti et al. 2016; Murray et al. 2013; Kitajima et al. 2010). Recently, next-generation sequencing (NGS)-based amplicon sequencing has been used successfully to detect viruses in sewage, such as SARS-CoV-2 (Ahmed et al. 2020), enterovirus (Majumdar and Martin 2018; Montmayeur et al. 2017), norovirus (Fumian et al. 2019; Suffredini et al. 2018), adenovirus (Iaconelli et al. 2017), and human SaV (Mancini et al. 2019). NGS offers high sensitivity and high throughput for detecting viruses in mixed samples, and it can detect less prevalent genotypes that go undetected by Sanger sequencing (Mancini et al. 2019). Here, we collected sewage samples monthly during 2017–2019 in Jinan, China, and analyzed human SaV by quantitative PCR and NGS-based amplicon sequencing to study its genotypes and genetic diversity.

Materials and Methods

Sampling

Between 2017 and 2019, 36 raw sewage samples were collected monthly by grab sampling from the influent of a wastewater treatment plant (WWTP) in Jinan, the capital city of Shandong Province, China. The WWTP receives domestic sewage from approximately one million inhabitants. Each time, 1 liter of sewage was collected into a sterile container and stored at 4 °C before processing.

Sewage Concentration and RNA Extraction

The sewage samples were concentrated 100-fold by a mixed cellulose ester (MCE) membrane adsorption and ultrasonication elution method as described previously (Matsuura et al. 1984; Berg et al. 1971). Briefly, 1 liter of sewage was centrifuged at 3200×g for 30 min at 4 °C. The supernatant was adjusted to a final Mg²⁺ concentration of 0.05 M and a pH of 3.5 using MgCl₂ and hydrochloric acid. After the solution was filtered through an MCE membrane, the virus adsorbed on the membrane was eluted with 10 ml of 3.0% beef extract solution (pH 8.5, adjusted with NaOH) by 3-min ultrasonication. The eluate was centrifuged again at 3000×g for 30 min, filtered through a 0.22 μm filter, and adjusted to pH 7 with hydrochloric acid.

Total viral RNA was extracted from 420 μl of the concentrate and eluted in a final volume of 50 μl using the QIAamp Viral RNA Mini Kit (QIAGEN, USA), according to the manufacturer’s instructions.

qRT-PCR

The qRT-PCR assay was carried out using the SaV124F, SaV1F, SaV5F, and SaV1245R primers and the SaV5TP and SaV124TP probes, which target the polymerase-capsid junction region (Oka et al. 2006). Five microliters of RNA extract were amplified using AgPath-ID One-Step RT-PCR reagents (ABI) in a final volume of 25 μl. Each sample was tested in duplicate. The amplification conditions were reverse transcription at 45 °C for 10 min and denaturation at 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 45 s.

Pepper mild mottle virus (PMMoV) RNA in sewage was quantified as an internal control via qRT-PCR using AgPath-ID One-Step RT-PCR reagents (Kitamura et al. 2020). The primers and probes were those described previously (Haramoto et al. 2013; Zhang et al. 2006). Five microliters of RNA extract were amplified in a final volume of 25 μl with cycling conditions of reverse transcription at 50 °C for 30 min and denaturation at 95 °C for 30 s, followed by 45 cycles of 95 °C for 5 s and 60 °C for 60 s.
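The volumes given above are enough to back-calculate a per-reaction qRT-PCR result to genome copies per liter of sewage. The following is a minimal sketch of that arithmetic, not the authors' documented calculation: it assumes 100% recovery at every step, takes the concentrate volume to be the 10 ml eluate, and uses a hypothetical function name and a hypothetical per-reaction input.

```python
def copies_per_liter(copies_per_reaction,
                     template_ul=5.0,      # RNA extract added to each reaction
                     eluate_ul=50.0,       # final RNA volume after extraction
                     extracted_ul=420.0,   # concentrate volume used for extraction
                     concentrate_ml=10.0,  # concentrate volume (assumed: the 10 ml eluate)
                     sewage_l=1.0):        # volume of raw sewage processed
    """Scale copies per reaction up to copies per liter of sewage.

    Assumes no loss during concentration or extraction (an assumption,
    not something stated in the text).
    """
    # Copies in the whole 50 ul RNA eluate
    copies_in_eluate = copies_per_reaction * eluate_ul / template_ul
    # Copies per ml of concentrate (420 ul = 0.42 ml was extracted)
    copies_per_ml_concentrate = copies_in_eluate / (extracted_ul / 1000.0)
    # Copies in the whole concentrate, which represents the processed sewage
    copies_in_concentrate = copies_per_ml_concentrate * concentrate_ml
    return copies_in_concentrate / sewage_l


if __name__ == "__main__":
    # Hypothetical input: 100 genome copies measured in one reaction
    print(f"{copies_per_liter(100):.0f} copies per liter")
```

With these volumes, each copy measured in a reaction corresponds to roughly 238 copies per liter, so the scaling step matters as much as the standard curve when comparing concentrations across studies.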

Nested RT-PCR and NGS

A nested RT-PCR targeting the polymerase-capsid junction region of all human SaV was performed according to a previous report (Kitajima et al. 2010). The first-round PCR was performed using the SuperScript™ IV One-Step RT-PCR System in a final volume of 25 μl. The forward primers were SaV124F, SaV1F, and SaV5F, and the reverse primers were SV-R13 and SV-R14. The amplification conditions were 45 °C for 30 min and 98 °C for 2 min, followed by 40 cycles of 98 °C for 10 s, 50 °C for 10 s, and 72 °C for 1 min, with a final extension step of 72 °C for 5 min. The second round was performed using Platinum Taq DNA Polymerase in a final volume of 100 μl. The forward and reverse primers were SV-1245Rfwd and SV-R2, respectively. The amplification conditions were 94 °C for 2 min, followed by 35 cycles of 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 30 s, with a final extension step of 72 °C for 5 min. PCR products were analyzed by 1.5% agarose gel electrophoresis. The expected product lengths of the first- and second-round PCR were 800 bp and 430 bp, respectively. Positive products were forwarded for NGS analysis.

NGS library preparation and MiSeq sequencing with 2×150 bp paired-end reads were performed by Shanghai BioGerm Medical Biotechnology Company. Clean data were then assembled de novo into contigs using CLC Genomics Workbench 12.0 (QIAGEN, USA) with default parameters. Following trimming, contigs shorter than 200 bp were removed. Contigs with an average coverage > 30 were exported and classified into genotypes using BLAST against a local SaV database. Sequences with an E-value less than 1E-100 were forwarded for further analysis.
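The three contig filters described above (length, average coverage, BLAST E-value) can be expressed as a single predicate. The sketch below is illustrative only: the `Contig` record, its field names, and the example contigs are hypothetical, and the actual filtering was done in CLC Genomics Workbench and BLAST rather than with this code.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical record; field names are illustrative, not from any tool's output
    name: str
    length: int        # contig length in bp
    coverage: float    # average read coverage
    evalue: float      # E-value of the best BLAST hit in the local SaV database
    genotype: str      # genotype of the best hit


def passes_filters(contig, min_length=200, min_coverage=30.0, max_evalue=1e-100):
    """Apply the thresholds stated in the text.

    Contigs shorter than 200 bp are removed, average coverage must exceed 30,
    and the E-value must be less than 1E-100.
    """
    return (contig.length >= min_length
            and contig.coverage > min_coverage
            and contig.evalue < max_evalue)


if __name__ == "__main__":
    # Made-up contigs, for illustration only
    contigs = [
        Contig("c1", 393, 1500.0, 0.0, "GI.2"),     # passes all three filters
        Contig("c2", 180, 900.0, 1e-80, "GI.1"),    # too short, weak hit
        Contig("c3", 390, 12.0, 1e-150, "GV.1"),    # coverage too low
    ]
    kept = [c.name for c in contigs if passes_filters(c)]
    print(kept)
```

BLAST reports extremely small E-values as 0.0, which this comparison treats as passing.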

Nucleotide Diversity and Phylogenetic Analysis

The Simpson's diversity index, based on the numbers of NGS reads, was calculated to describe the diversity of human SaV. The nucleotide sequences were aligned with Bioedit 7.0.9.0. The best-fit nucleotide substitution model for our data was identified with MegaX as the Kimura 2-parameter model with gamma-distributed rates (K2+G). A phylogenetic tree including the sequences obtained in this study and those from GenBank was constructed from partial VP1 nucleotide sequences (nt 5179–5571 corresponding to strain Hu/SaV/Manchester/1993/UK with accession number X86560) by the Neighbor-Joining method with the K2+G model in MegaX. A bootstrap test with 1000 replicates was used to assess the robustness of each node (Kumar et al. 2016).
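The text does not say which form of Simpson's index was used. A common choice is 1 − Σ pᵢ², where pᵢ is the fraction of reads assigned to genotype i; this form equals 0 when a sample contains a single genotype, which is consistent with the lower bound reported in the Results. The sketch below assumes that form, and the function name and example read counts are hypothetical.

```python
def simpson_diversity(read_counts):
    """Return 1 - sum(p_i^2) for a mapping of genotype -> read count.

    This is the Gini-Simpson form, assumed here because the text does not
    specify the variant. 0 means one genotype only; values approach 1 as
    reads are spread evenly over many genotypes.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 - sum((n / total) ** 2 for n in read_counts.values())


if __name__ == "__main__":
    # Made-up monthly samples, for illustration only
    one_genotype = {"GI.2": 50_000}
    two_even = {"GI.2": 10_000, "GI.1": 10_000}
    skewed = {"GI.2": 9_000, "GI.1": 900, "GV.1": 100}
    for label, sample in [("single", one_genotype), ("even", two_even), ("skewed", skewed)]:
        print(label, round(simpson_diversity(sample), 3))
```

Because the index weights genotypes by read share, a month with many genotypes at very low read counts can still score close to 0, which is one way the index and the raw genotype count can disagree.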

Results

Human SaV Prevalence in Sewage

In the present study, 36 sewage samples were collected monthly from January 2017 to December 2019. By qRT-PCR, 35 of 36 (97.22%) sewage samples were positive for human SaV nucleic acid; the sample collected in February 2019 was negative. By nested RT-PCR, 33 (91.67%) samples were positive; the samples collected in September 2018, February 2019, and September 2019 were negative.

Quantitative RT-PCR (qRT-PCR)

According to the qRT-PCR, the peak human SaV concentration in sewage (4.8 × 10⁵ genome copies per liter) was observed in December 2017, whereas the sample from August 2018 had the lowest concentration (2.3 × 10³ genome copies per liter) (Fig. 1a). Human SaV concentrations in spring (March to May), summer (June to August), autumn (September to November), and winter (December to February) were compared, and no statistically significant difference among the four seasons was observed (Kruskal–Wallis test, P > 0.05) (Fig. 1b).
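The seasonal comparison above can be sketched in plain Python as follows. This is an illustration of the Kruskal–Wallis test with the season grouping stated in the text, not the authors' analysis script: the function names are hypothetical, the example concentrations are made up, and the p-value helper covers only the four-group case (3 degrees of freedom, chi-square approximation).

```python
import math


def season_of(month):
    """Map a calendar month (1-12) to the season grouping used in the text."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    if month in (12, 1, 2):
        return "winter"
    raise ValueError("month must be in 1..12")


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two observations")
    rank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    correction = 1.0 - tie_term / float(n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    x = max(x, 0.0)
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


if __name__ == "__main__":
    # Made-up (month, copies per liter) pairs, for illustration only
    samples = [(1, 2.1e4), (2, 8.0e3), (4, 5.5e4), (5, 1.2e4), (7, 3.0e3),
               (8, 9.5e3), (10, 4.4e4), (11, 6.1e3), (12, 4.8e5), (3, 7.7e4),
               (6, 2.9e4), (9, 1.9e4)]
    by_season = {}
    for month, conc in samples:
        by_season.setdefault(season_of(month), []).append(conc)
    h = kruskal_wallis_h(list(by_season.values()))
    print(f"H = {h:.3f}, P = {chi2_sf_df3(h):.3f}")
```

A rank-based test suits these data because concentrations span two orders of magnitude and each season contributes only about nine samples.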

Fig. 1 Sapovirus concentration (copies per liter) in sewage from January 2017 to December 2019, by month (a) and by season (b). The sample collected in February 2019 was negative in the SaV qRT-PCR assay and is not included in b

Quality Control

In this study, PMMoV RNA in all 36 samples was examined by qRT-PCR as an internal quality control. As shown in Supplementary Table S1, all samples tested positive for PMMoV, and the concentration of PMMoV in the sewage samples ranged from 1.26 × 10⁷ to 7.44 × 10⁸ genome copies per liter. This relatively stable level supports the reliability of genome quantification in this study.

NGS-Based Amplicon Sequencing and Genotypes

Nested RT-PCR and NGS analysis showed that human SaV sequences in sewage were classified into 10 genotypes belonging to 4 genogroups (GI, GII, GIV and GV). Of the total 301,501,545 reads, 195,852,867 reads were aligned to GI.2 reference sequences (65.0%), followed by 84,714,998 reads to GI.1 (28.1%), 16,624,161 reads to GV.1 (5.5%), and 4,205,026 reads to GI.3 (1.4%) (Fig. 2b). In addition, some rare genotypes (<0.05% of monthly identified sequences) were detected in this study, including GII.5 (72,332 reads, 0.024%), GII.1 (19,668 reads, 0.0065%), GII.NA1 (4,239 reads, 0.0014%), GII.3 (4,196 reads, 0.0014%), GI.6 (3,897 reads, 0.0013%), and GIV.1 (161 reads, 0.000053%) (Table 1, Fig. 2a).
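The read shares quoted above follow directly from the per-genotype counts. The short sketch below recomputes them from the numbers in the text. One simplification is an assumption for illustration: the 1% cutoff is applied to the pooled three-year totals, whereas the text defines major and rare genotypes by share of monthly identified sequences.

```python
# Read counts per genotype, as reported in the text
READS = {
    "GI.2": 195_852_867,
    "GI.1": 84_714_998,
    "GV.1": 16_624_161,
    "GI.3": 4_205_026,
    "GII.5": 72_332,
    "GII.1": 19_668,
    "GII.NA1": 4_239,
    "GII.3": 4_196,
    "GI.6": 3_897,
    "GIV.1": 161,
}

total_reads = sum(READS.values())

# Percentage of all reads assigned to each genotype
percent = {genotype: 100.0 * n / total_reads for genotype, n in READS.items()}

# Pooled-total version of the >1% "major genotype" cutoff (simplification)
major = [genotype for genotype, p in percent.items() if p > 1.0]

if __name__ == "__main__":
    print("total reads:", total_reads)
    for genotype, p in percent.items():
        print(f"{genotype:8s} {p:.6f}%")
    print("major genotypes:", major)
```

The counts sum exactly to the stated total, so the ten genotypes listed account for every classified read.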

Fig. 2 Sapovirus genotype distribution in sewage samples from NGS over three years (a). Monthly distribution of major genotypes (>1% of monthly identified sequences) in raw sewage in Jinan, China from 2017 to 2019 (b)

Table 1 The number of sapovirus reads detected in raw sewage, by month and by genotype

Major and Minor Genotypes

GI.2 and GI.1 were the predominant genotypes in sewage in Jinan. Among the 33 samples forwarded for NGS, GI.2 and GI.1 nucleic acids were detected in 30 (90.91%) and 20 (60.61%) samples, respectively. A switch of the predominant human SaV genotype in sewage was observed during the study period. Before May 2019, GI.2 was the predominant genotype in most months, accounting for 76.28% of total reads. After July 2019, however, the main genotype changed to GI.1, which accounted for 94.67% of total reads (Fig. 2b). Some genotypes appeared in only a few months. For example, 10 out of 33 samples were positive for both GII.3 and GII.5 nucleic acid (30.30%). GII.NA1 nucleic acid was found in only 4 samples (12.12%). GI.5 and GIV.1 nucleic acids were each found in a single sample (3.03%), collected in May 2019 and June 2019, respectively (Table 1). Compared with the common genotypes, these rare genotypes were detected in fewer months and with fewer reads.

Diversity

Generally, multiple genotypes coexisted in sewage in Jinan during the study period. The Simpson’s diversity index was calculated to analyze the richness and evenness of human SaV genotypes in the sewage samples. It ranged from 0 to 0.539, indicating that genotype diversity in the population varied over the three years. The Simpson’s diversity index was generally consistent with the number of genotypes, except for samples from several months (Fig. 3).

Fig. 3 The Simpson’s diversity index and the number of genotypes by month over three years

Homology and Phylogeny

A phylogenetic tree based on partial SaV VP1 sequences was constructed to investigate the relationship between the strains obtained in this study and those detected in human feces, wastewater, and shellfish throughout the world (Fig. 4). The nucleotide sequences of strains in the present study were close to reference strains from around the world. The phylogenetic tree was grouped into 10 main clusters. Nucleotide identity among the sequences obtained in this study ranged from 87.5 to 100% for GI.1 and from 87.9 to 100% for GI.2. The GII.5 nucleotide sequences obtained in this study had high identity (93–98.5%) with reference strains from Guatemala and Germany. For the newly detected genotype, GII.NA1, identity with strains from Kenya and Cameroon ranged from 87.9 to 94.5%. Moreover, we detected only one GIV.1 sequence in this study, and it shared high identity (99.4%) with reference strains from China in 2008 and Japan in 2011.
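The pairwise percentages reported above, and the distances underlying a Kimura 2-parameter tree, both come from counting matches, transitions, and transversions between aligned sequences. The sketch below shows that calculation for two pre-aligned DNA sequences. It is a simplified illustration with hypothetical function names and toy sequences: it uses the plain Kimura 2-parameter formula without the gamma rate correction, and it does not reproduce the MegaX computation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned sequences.

    Positions with gaps or ambiguous bases in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def percent_identity(seq1, seq2):
    sites, ts, tv = count_differences(seq1, seq2)
    return 100.0 * (sites - ts - tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance, without gamma correction."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


if __name__ == "__main__":
    # Toy aligned fragments, not real SaV sequences
    a = "ACGTACGTAC"
    b = "ACGTACGTAT"
    print(f"identity = {percent_identity(a, b):.1f}%")
    print(f"K2P distance = {k2p_distance(a, b):.4f}")
```

Identity treats every mismatch equally, while the K2P distance weights transitions and transversions separately and corrects for multiple substitutions at the same site, so the two measures diverge as sequences become more different.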

Fig. 4 Phylogenetic tree of sapovirus strains based on partial VP1 nucleotide sequences (nt position 5179–5571 corresponding to strain Hu/SaV/Manchester/1993/UK with accession number X86560). The tree was generated using the Neighbor-Joining method with the Kimura 2-parameter model and gamma-distributed rates in MegaX, with representative strains derived from sewage in Jinan, China, and reference strains from GenBank. The year of detection of sequences from the present study is indicated by shape: (filled square) for 2017; (filled circle) for 2018; and (filled triangle) for 2019. The origin of reference sequences in the tree is indicated by other shapes: (filled star) from human stool; (open star) from sewage; and (filled inverted triangle) from shellfish

Discussion

With the development of molecular techniques, human SaV can be detected with high diagnostic efficacy to investigate its prevalence in the population. Sporadic cases and outbreaks related to human SaV have been reported in Asia (Thongprachum et al. 2018; Kitajima et al. 2010), Europe (Mancini et al. 2019), North America (Kitajima et al. 2018), and Africa (Ibrahim et al. 2019; Murray et al. 2013). However, only a few studies in China have investigated human SaV in clinical samples, providing limited information on the genetic and genotype diversity of human SaV in the local population (Wang et al. 2014). The results of this study are useful for understanding human SaV circulation in China.

Human SaV can be discharged into the environment via sewage effluents, where it can remain infective for a long time (Sinclair et al. 2008). It is transmitted by the fecal-oral route, especially through feces-contaminated water. Studies have shown that asymptomatic individuals shed virus at levels comparable to those shed by gastroenteritis patients (Kobayashi et al. 2012; Yoshida et al. 2009). Environmental surveillance has the advantages of sensitivity, broad population coverage, and good correlation with the presence of viruses in the population, and it can serve as a strong supplement to clinical surveillance (Iwai et al. 2009; Ozawa et al. 2019). Our findings show a high detection rate of 97.22% with detection of multiple genotypes, reflect continuous circulation of human SaV in the local population, and demonstrate the high sensitivity and importance of environmental surveillance in monitoring enteric viruses.

Traditionally, environmental surveillance has been performed using Sanger sequencing. However, this approach is limited in that it can detect only the major genotypes in a mixed pool. In contrast, NGS improves the understanding of genetic diversity because it can recognize rare genotypes that are concealed by major ones when Sanger sequencing is used. For example, a study from Italy showed that NGS revealed 3 additional genotypes (GI.6, GII.6 and GV.1) beyond the 4 (GI.1, GI.2, GI.3 and GII.1) detected by Sanger sequencing (Mancini et al. 2019). In this study, NGS was performed on the 33 sewage samples that were positive by nested RT-PCR during the three-year period. Ten genotypes were identified, including 4 major genotypes (> 1% of monthly identified sequences) and 6 rare genotypes (< 0.05% of monthly identified sequences), demonstrating that NGS-based amplicon sequencing is an effective approach for analyzing complex samples. Extremely rare genotypes have also been observed in similar studies on enterovirus and norovirus (Tao et al. 2020; Fumian et al. 2019). During the process from nested PCR to NGS, some factors may influence the proportions of nucleic acids from different genotypes. Thus, it is reasonable to conclude that the composition of NGS reads cannot fully reflect the nucleic acid composition of the original sample.

Previously, several studies in China have investigated human SaV genotypes in clinical samples. Wang et al. detected human SaV in 42/1,125 (3.73%) samples collected from adult outpatients with acute gastroenteritis in Shanghai, China from April 2011 to March 2013 (Wang et al. 2014), and GI.2 was the most predominant genotype (78.5%; 33/42). Subsequently, Xue et al. detected SaV in 11/569 (1.93%) fecal samples from acute diarrhea patients in south China from 2013 to 2017, and GI was positive in 9 samples (Xue et al. 2019). In keeping with previous studies (Makhaola et al. 2020), GI was the most prevalent genogroup in the present study, detected in 32 out of 33 samples, and GI.2 was the most prevalent genotype, reflecting its high activity in the local population during the study period.

Previous studies from the USA (Kitajima et al. 2018) and Japan (Harada et al. 2013; Harada et al. 2012; Harada et al. 2009) observed dynamic changes in human SaV genotypes. Similarly, a switch of the major genotype from GI.2 to GI.1 was observed around June 2019 in this study. This phenomenon might result from changes in population immunity or in infectivity, which needs further investigation. In addition, GV.1 has rarely been detected in China. However, it was the most prevalent genotype in two months during the study period (Fig. 2b), suggesting high activity at that time. Moreover, we identified two genotypes, GII.NA1 and GII.5, that to the best of our knowledge had not been reported before in China, demonstrating the high sensitivity of NGS. GII.5 has been described in a food-borne gastroenteritis outbreak among adults in Japan (Oka et al. 2017), in pediatric patients with acute gastroenteritis in Thailand (Kumthip et al. 2020), and in children younger than 5 years of age in Guatemala (Diez-Valcarce et al. 2019a, b). According to the phylogenetic analysis, the GII.NA1 strains obtained in this study were closely related to strains detected in human stool samples in Kenya in 2005 and 2008 (Diez-Valcarce et al. 2019a, b) and in Cameroon in 2014 (Yinda et al. 2019). Continued clinical and environmental surveillance is needed to provide meaningful information on the prevalence and pathogenicity of these two genotypes in China.

Although PMMoV is a plant virus, it was one of the most abundant viruses found in a metagenomic survey of RNA viruses from human feces (Zhang et al. 2006). The abundance of PMMoV in stool samples does not depend on infection status in humans and does not change seasonally (Haramoto et al. 2013). Therefore, it is a potential indicator of the degree of human fecal pollution of water (Kuroda et al. 2015; Malla et al. 2019). In this study, PMMoV was used as an indicator to evaluate the accuracy of the quantified values of the human SaV genome in sewage. Substantial levels of PMMoV were detected in all sewage samples, indicating that samples with low or no human SaV detection were not due to the presence of PCR inhibitors in the quantification process. No statistically significant seasonality of viral concentration was observed in the present study, similar to a study from the USA that detected no clear seasonal pattern over a one-year period (Kitajima et al. 2014). However, studies in Brazil showed seasonal differences in virus concentration: among the four seasons, a higher rate of human SaV in wastewater was observed in the rainy seasons (summer and autumn) (Fioretti et al. 2016). These differences in the dynamics of human SaV may be due to geographic differences among the study regions. The viral concentration in this study ranged from 10³ to 10⁵ genome copies per liter, which is in accordance with that in Brazil, Japan, and the USA (Fioretti et al. 2016; Kitajima et al. 2014; Haramoto et al. 2007), suggesting that Jinan is also an endemic area of human SaV. Within GI, GI.1 was likely to be more prevalent during cold seasons, in line with studies reporting a higher positive proportion of clinical samples in winter (Varela et al. 2019).

This study has some limitations. We only detected human SaV in sewage supernatant and did not examine sewage sludge, which needs to be investigated in the future. Also, all samples tested in this study were collected from raw urban sewage, and stool samples from gastroenteritis patients were not included. The actual prevalence of human SaV infection in the population therefore could not be estimated exactly. Further studies testing both clinical and environmental specimens are needed to acquire a comprehensive understanding of human SaV.

In conclusion, this study provides a comprehensive picture of the genotypes and genetic characteristics of human SaV in sewage in Jinan, China through NGS-based environmental surveillance, which greatly improves our understanding of human SaV circulation in communities. NGS should be encouraged as a sensitive surveillance tool in the future.