Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses that belong to the subfamily Coronavirinae, family Coronaviridae, order Nidovirales. Based on genetic distance and serological characteristics, the subfamily consists of four genera: alpha-, beta-, gamma-, and delta-coronaviruses. Coronaviruses include important human pathogens, such as those responsible for outbreaks of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) (de Groot et al. 2013; Drosten et al. 2003). Six human coronaviruses have been identified: human coronavirus 229E (HCoV-229E), HCoV-OC43, HCoV-HKU1, HCoV-NL63, SARS-CoV, and MERS-CoV (Hu et al. 2015). HCoV-229E, HCoV-OC43, HCoV-HKU1, and HCoV-NL63 circulate widely in human populations and cause mild respiratory disease, whereas SARS-CoV and MERS-CoV have caused severe outbreaks (Channappanavar and Perlman 2017). Strong evidence indicates that the direct ancestor of SARS-CoV, and likely that of MERS-CoV, originated in bats.

Bats are the only mammals capable of true flight and represent approximately 20% of all mammal species (Hunter 2007). Based on diet, bats are broadly classified as insectivores or frugivores (Stuckey et al. 2017). In parts of Africa and Southeast Asia, frugivorous bats are hunted by local people as bushmeat because of their large bodies and fleshy meat (Mickleburgh et al. 2009). Fruit bats in African and Pacific countries also harbor diverse virulent viruses, such as Marburg virus, Hendra virus, and Nipah virus (Shi 2013). In China, antibodies cross-reactive with, or viruses phylogenetically related to, henipaviruses, ebolaviruses, and rabies virus have been detected in fruit bats (He et al. 2015; Jiang et al. 2010; Li et al. 2008; Yang et al. 2017; Yuan et al. 2012). In addition, genetically diverse reoviruses, adenoviruses, and coronaviruses have been detected in or isolated from fruit bats (Du et al. 2010; Li et al. 2016; Tan et al. 2017).

Ro-BatCoV HKU9 and Ro-BatCoV GCCDC1 are two closely related but distinct betacoronavirus species found in Guangdong and Yunnan provinces, respectively. Both were found in the Chinese brown fruit bat Rousettus leschenaulti (Huang et al. 2016; Lau et al. 2010; Woo et al. 2007). HKU9 comprises many genetically diverse variants, whereas GCCDC1 is less diverse. The most striking difference between the two species is the presence in the GCCDC1 genome of a p10 gene, which is thought to have been acquired from a reovirus (Huang et al. 2016; Lau et al. 2010). In Yunnan province, there are at least three fruit bat species: Eonycteris spelaea, R. leschenaulti, and an unclassified Rousettus species (He et al. 2015; Yang et al. 2017). These bats frequently share the same caves and can be distinguished only by bat experts or by molecular identification.

In this study, we conducted a longitudinal surveillance of the two betacoronaviruses in fruit bat samples collected during 2009–2016 in Yunnan province and reexamined the prevalence, genetic diversity, and host specificity of these viruses.

Materials and Methods

Sample Collection

Sampling was conducted as described previously (Li et al. 2005). Because of conservation concerns, for most captured bats, we collected fecal or anal samples and released the bats after sampling. Several bats were sacrificed for species identification and viral tissue tropism assays. Bat species were identified based on morphological characteristics and further confirmed by cytochrome b (Cytb) sequencing (Agnarsson et al. 2011). All samples were stored at −80 °C until further analysis. All animal sampling processes were performed by veterinarians with approval from the Animal Ethics Committee of the Yunnan Institute of Endemic Diseases Control and Prevention.

Viral Detection

RNA was extracted from bat fecal or anal samples using the High Pure Viral RNA Kit (Roche, Basel, Switzerland). A partial RdRp gene fragment was amplified by family-specific degenerate semi-nested RT-PCR (Luna et al. 2007) using the SuperScript III One-Step RT-PCR kit and Platinum Taq enzyme (Invitrogen, Carlsbad, CA, USA). PCR products of the expected size were gel-purified and sequenced on the Sanger ABI PRISM platform (Applied Biosystems, Foster City, CA, USA). To exclude PCR contamination, the viral and bat Cytb sequences of each positive sample were confirmed in two independent PCRs performed by different experimenters. The partial RdRp sequences obtained in this study were submitted to GenBank under accession numbers MG762619–MG762664 for BatCoV HKU9 and MG762606–MG762618 for BatCoV GCCDC1.
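Degenerate primers such as those used in family-specific semi-nested PCR carry IUPAC ambiguity codes so that a single oligonucleotide can anneal to many related viral templates. The sketch below is illustrative only and does not use the primer set of Luna et al. (2007); it counts how many distinct sequences a hypothetical degenerate primer encodes and finds where it matches a hypothetical template.

```python
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[code]) for code in primer.upper())


def match_positions(primer: str, template: str, max_mismatches: int = 0) -> list[int]:
    """0-based start positions on the given strand where the primer matches the template."""
    primer, template = primer.upper(), template.upper()
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        # A position mismatches when the template base is not allowed by the primer code.
        mismatches = sum(base not in IUPAC[code] for code, base in zip(primer, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical primer and template, for illustration only.
print(degeneracy("ACRTTNGG"))                       # 8 (R = 2 options, N = 4 options)
print(match_positions("ACRTTNGG", "TTACGTTAGGCC"))  # [2]
```

The scan checks only the strand it is given; a reverse primer would be tested against the reverse complement of the template.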

Quantitative PCR (qPCR)

qPCR was used to investigate the tropism of these viruses across tissues. Total RNA was extracted from the hearts, livers, spleens, lungs, kidneys, brains, and intestines of six bats infected with BatCoV HKU9 or GCCDC1 using the High Pure Viral RNA Kit. Partial RdRp fragments of HKU9 and GCCDC1 were cloned into the pGEM-T Easy Vector (Promega, Madison, WI, USA) and used as positive controls for quantification. Primers for the two viruses were designed using IDT online software (Supplementary Table S1). Each assay was performed in triplicate on a CFX Connect Real-Time System (Bio-Rad, Hercules, CA, USA) with the One-Step RT-PCR SYBR Green kit (Vazyme, Nanjing, China). The thermal cycling parameters were 50 °C for 5 min, 95 °C for 10 min, and 40 cycles of 95 °C for 5 min and 60 °C for 30 s. Viral genome copy numbers were determined by absolute quantification against a standard curve generated from the positive-control plasmids.
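Absolute quantification of this kind typically rests on two calculations: converting the plasmid standard's DNA concentration into copies, and fitting Ct against log10(copies) so that unknown samples can be read off the line. A minimal sketch follows, assuming an average mass of 660 g/mol per base pair of double-stranded DNA and a hypothetical tenfold dilution series; the plasmid length, concentrations, and Ct values are not from this study.

```python
import math

AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 660.0       # approximate g/mol per base pair of double-stranded DNA (assumption)


def plasmid_copies_per_ul(ng_per_ul: float, plasmid_bp: int) -> float:
    """Convert a plasmid DNA concentration (ng/uL) into copies per microlitre."""
    return ng_per_ul * 1e-9 * AVOGADRO / (plasmid_bp * BP_MASS)


def fit_standard_curve(log10_copies: list[float], ct_values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope: float) -> float:
    """Amplification efficiency implied by the slope (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate copies in one reaction."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of a plasmid standard.
log_copies = [7, 6, 5, 4, 3]
cts = [14.00, 17.32, 20.64, 23.96, 27.28]
slope, intercept = fit_standard_curve(log_copies, cts)
print(round(slope, 2), round(intercept, 2))  # -3.32 37.24
print(round(efficiency(slope), 3))
print(f"{copies_from_ct(22.0, slope, intercept):.3g}")
```

A slope near −3.32 corresponds to roughly 100% efficiency. Expressing results as copies per gram of tissue would additionally require scaling by the RNA elution volume, the template volume per reaction, and the tissue mass, which are not given here.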

Amplification of Full-Length S, N, and p10 Genes

Primers targeting the S, N, and p10 genes were designed based on alignments of the reported HKU9 and GCCDC1 sequences (primer sequences are available upon request). The first round of PCR amplification was performed in a total volume of 25 μL using SuperScript III One-Step RT-PCR (Invitrogen) with the following parameters: 50 °C for 30 min, 94 °C for 5 min; 35 cycles of 94 °C for 30 s, 50 °C for 30 s, and 68 °C for 3 min; and a final extension at 68 °C for 10 min. The second round of PCR amplification was performed in a total volume of 50 μL using the Platinum Taq Enzyme kit (Invitrogen) under the following conditions: 94 °C for 5 min; 35 cycles of 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 3 min; and a final extension at 72 °C for 10 min. PCR products of the expected size were gel-purified and sequenced directly with the target primers; products giving weak bands were cloned into the pGEM-T Easy vector and sequenced on the Sanger ABI PRISM platform. Full-length N and p10 sequences were deposited in GenBank under accession numbers MG762665–MG762673, MG762688–MG762692, and MG762675–MG762687.

Full-Length Genome Sequencing and Characterization

One positive sample (ID: 2202) was further sequenced on an Illumina platform at Novogene (Beijing, China). Briefly, the supernatant of the homogenized intestine was centrifuged at 10,000×g for 10 min at 4 °C and then filtered through a 0.45-μm polyvinylidene difluoride filter (Millipore, Billerica, MA, USA) to remove eukaryotic and bacterial-sized particles. The filtrate was centrifuged at 100,000×g for 2 h, the pellet was resuspended in 140 µL Hanks' solution, and RNA was extracted with the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. Sequence-independent PCR amplification was conducted as previously described (Ge et al. 2012). PCR products longer than 500 bp were excised and purified with a MinElute Gel Extraction Kit (Qiagen). The purified products were adaptor-anchored, pooled, and sequenced on the Illumina platform.
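The centrifugation steps are specified as relative centrifugal force (×g), which does not depend on the rotor; running them on a particular centrifuge means converting to rotor speed with the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the rotor radius in centimetres. A small sketch, using hypothetical radii rather than the rotors actually used:

```python
import math

RCF_CONSTANT = 1.118e-5  # constant in RCF = 1.118e-5 * r(cm) * rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given rotor radius and speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radii; substitute the maximum radius of the actual rotor.
for rcf, radius in [(10_000, 8.0), (100_000, 8.0)]:
    print(f"{rcf:>7,} x g at r = {radius} cm -> {rpm_from_rcf(rcf, radius):,.0f} rpm")
```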

The filtered reads were aligned against the NCBI nonredundant nucleotide (NT) and nonredundant protein (NR) databases, downloaded from the NCBI FTP server, using BLASTn and BLASTx, respectively. All reads matching coronaviruses were extracted and assembled using MEGAHIT and Trinity. Starting from the resulting partial genome sequences, the remaining genome sequence was determined by inverse PCR, genome walking, and 5′- and 3′-rapid amplification of cDNA ends (RACE). The full-genome nucleotide sequence (accession number MG762674) and the deduced amino acid sequences of the open reading frames (ORFs) were then compared with those of related betacoronaviruses. For coronavirus species demarcation, the seven conserved replicase domains in ORF1ab were selected for further analysis.
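The read-classification step can be sketched as a filter over BLAST+ tabular output. This sketch assumes the searches were run with `-outfmt "6 qseqid sseqid pident length evalue stitle"` (the study does not state its BLAST settings), treats the first listed hit for each read as its best hit, and uses a simple keyword match on the subject title; a taxonomy-ID-based filter would be more robust. The read IDs and hit titles in the example are made up.

```python
def coronavirus_read_ids(blast_lines, max_evalue=1e-5, keyword="coronavirus"):
    """Return IDs of reads whose best hit looks like a coronavirus sequence.

    Each line is tab-separated: qseqid, sseqid, pident, length, evalue, stitle.
    Hits for a query are assumed to be listed in order of significance, so the
    first line seen for each read is kept as its best hit.
    """
    best_hit = {}
    for line in blast_lines:
        if not line.strip():
            continue
        qseqid, _sseqid, _pident, _length, evalue, stitle = line.rstrip("\n").split("\t", 5)
        best_hit.setdefault(qseqid, (float(evalue), stitle))
    keyword = keyword.lower()
    return {
        read for read, (evalue, title) in best_hit.items()
        if evalue <= max_evalue and keyword in title.lower()
    }


example = [
    "read1\tsubj1\t98.0\t150\t1e-60\tExample bat coronavirus, partial genome",
    "read1\tsubj2\t90.0\t150\t1e-40\tUnrelated hit",
    "read2\tsubj3\t99.0\t150\t1e-70\tExample bat mitochondrion, partial sequence",
    "read3\tsubj4\t80.0\t60\t0.5\tExample human coronavirus",
]
print(sorted(coronavirus_read_ids(example)))  # ['read1']
```

The selected read IDs would then be used to pull the corresponding reads for assembly.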

Phylogenetic Analysis

Partial RdRp sequences, full-length N gene sequences, and full-length genome sequences obtained in this study were aligned with those of HKU9, GCCDC1, related coronaviruses, and representative betacoronaviruses using ClustalW. Phylogenetic trees were constructed by the neighbor-joining method in MEGA 7.0 with 1000 bootstrap replicates. Based on the lineages defined by the trees, pairwise identities among sequences were calculated using the ClustalW method in MegAlign.
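Pairwise identity over an alignment is the statistic reported for the lineages in the results. A minimal sketch follows; it assumes the sequences are already aligned and skips any column with a gap in either sequence, which is one common convention and not necessarily the one MegAlign applies. The toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, skipping gap-containing columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else float("nan")


def identity_range(alignment: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity across all sequence pairs."""
    values = [pairwise_identity(alignment[p], alignment[q])
              for p, q in combinations(alignment, 2)]
    return min(values), max(values)


toy = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "ACCTTA"}
print(identity_range(toy))  # (80.0, 100.0)
```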

Virus Isolation

Vero E6 cells and primary intestinal cells of E. spelaea and R. leschenaulti were used for virus isolation. Cultured cells were inoculated with tenfold-diluted viral RNA-positive samples and maintained in culture medium containing 5% fetal bovine serum. After three blind passages, the cell culture supernatant was tested for the presence of virus by nested RT-PCR.

Results

Prevalence of Betacoronavirus HKU9 and GCCDC1 and Related Viruses in Fruit Bats

A total of 555 fecal or anal samples were collected from fruit bats at four locations in Yunnan province, China, in 2009–2016 (Fig. 1). RT-PCR targeting the partial RdRp gene identified 46 (8.29%) samples positive for HKU9 and 13 (2.34%) positive for GCCDC1 or closely related viruses (Table 1). HKU9 detection rates varied with sampling time and site, and no positive samples were detected in Mengla in 2011 or in Mojiang in 2013 (Table 1). HKU9 detection rates in Chuxiong, Mengla, and Jinghong were 18.59% (29/156), 5.32% (10/188), and 6.14% (7/114), respectively. GCCDC1 was not detected until 2015, with a positive rate of 5.26% in 2015 and a markedly higher rate of 18.86% in 2016 in Mengla.
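Detection rates such as these are binomial proportions, and a confidence interval helps when comparing sites or years with small sample sizes. The study reports point estimates only, so the Wilson score interval below is an added illustration rather than an analysis from the paper; the counts are those given in the text.

```python
import math


def wilson_interval(positives: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (95% by default)."""
    p = positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


counts = {
    "HKU9, all sites": (46, 555),
    "GCCDC1, all sites": (13, 555),
    "HKU9, Chuxiong": (29, 156),
    "HKU9, Mengla": (10, 188),
    "HKU9, Jinghong": (7, 114),
}
for label, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.2f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```

The Wilson interval is used here rather than the simple normal approximation because it stays within [0, 1] and behaves better for low counts such as 7/114.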

Fig. 1

Map of sampling sites in Yunnan province of China. Red regions indicate the four districts where bat samples were collected.

Table 1 Detection of BatCoV HKU9 and BatCoV GCCDC1 by RT-PCR in bat fecal or anal samples collected from four districts in the Yunnan province of China during 2009–2016.

Phylogenetic Analysis

The partial RdRp sequences amplified in this study shared 74.4%–100% nucleotide (nt) identity. A phylogenetic tree was constructed from an alignment of these sequences with previously reported HKU9, GCCDC1, and related strains, as well as representative strains of other betacoronaviruses. The 59 sequences fell into two coronavirus species, HKU9 and GCCDC1 (Fig. 2A). All sequences from Rousettus bats were HKU9-related, and all those from E. spelaea were GCCDC1-related. In contrast to the GCCDC1 strains, which were highly similar to one another, the HKU9-related strains were highly diverse. Within the HKU9 species, the sequences from this study and previously reported sequences were divided into five lineages: Lineage 1, comprising 28 sequences and the previously reported HKU9-10-2, HKU9-5-2, and HKU9-2, exclusively from R. leschenaulti; Lineage 2, comprising 5 sequences and the previously reported HKU9-1 from R. leschenaulti; Lineage 3, comprising 10 sequences and the previously reported HKU9-4 from an unidentified Rousettus species (Rousettus sp.); Lineage 4, comprising the previously detected HKU9-3, 9-5, and 9-10 from R. leschenaulti; and Lineage 5, comprising 3 sequences from Rousettus bats. The remaining 13 sequences, all from E. spelaea, grouped with the previously reported BatCoV GCCDC1 (Huang et al. 2016).

Fig. 2

Phylogenetic analysis of the coronaviruses detected in this study. Partial RdRp sequences (A), complete nucleocapsid (N) gene sequences (B), and the full-length genome sequence of BatCoV HKU9-2202 (C) were aligned with corresponding sequences of representative viral species in the genus Betacoronavirus. Phylogenetic trees were constructed using the neighbor-joining method implemented in MEGA7, with bootstrap values calculated from 1000 replicates. Sequences obtained in this study are labeled in color and named by sample identifier, followed by bat species, location, and collection year.

To further characterize the relationships among the newly detected coronaviruses, we attempted to amplify the full-length S, N, and p10 genes from selected positive samples. We obtained N from 9 HKU9-related and 5 GCCDC1-related viruses and p10 from 13 GCCDC1-related viruses; amplification of S failed for all positive samples. The p10 sequences from this study shared 99%–100% similarity with previously reported sequences (Huang et al. 2016). The N sequences of HKU9-related viruses shared 74.5%–100% nt identity with one another, and those of GCCDC1-related viruses shared 95.2%–97.4%. The phylogenetic tree based on N had a topology similar to that of the RdRp tree (Fig. 2B).

Genomic Characterization of the Novel Strain BatCoV HKU9-2202

The full-length genome sequence of one lineage 5 sample (BatCoV HKU9-2202) was obtained by high-throughput sequencing and RACE. The HKU9-2202 genome is 29,118 nt long, excluding the poly(A) tail, with a G+C content of 42%. The main ORFs of HKU9-2202 were predicted to be arranged in the order 5′-ORF1ab-spike (S)-NS3-envelope (E)-membrane (M)-nucleocapsid (N)-NS7a-NS7b-3′ (Table 2). Putative transcription regulatory sequences (TRSs) and their genomic locations were predicted based on the conserved TRS core sequence of betacoronaviruses (5′-ACGAAC-3′). Notably, the putative TRS of E differed from this consensus core sequence by one nucleotide (Table 2).
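The G+C content and TRS prediction described above both reduce to simple sequence scans. A rough sketch that computes G+C content and lists every window matching the core sequence 5′-ACGAAC-3′ with at most one mismatch, the kind of one-nucleotide deviation noted for the E gene TRS. Unlike a real TRS prediction, it scans the whole sequence rather than only the regions upstream of ORFs, and the toy sequence is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def find_trs(genome: str, core: str = "ACGAAC", max_mismatches: int = 1) -> list[tuple[int, str, int]]:
    """Return (1-based position, matched window, mismatch count) for near-matches to the core."""
    genome, core = genome.upper(), core.upper()
    hits = []
    for i in range(len(genome) - len(core) + 1):
        window = genome[i:i + len(core)]
        mismatches = sum(a != b for a, b in zip(window, core))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


toy_genome = "TTACGAACTTACGATCGG"  # hypothetical sequence
print(f"{gc_content(toy_genome):.1f}%")  # 44.4%
print(find_trs(toy_genome))  # [(3, 'ACGAAC', 0), (11, 'ACGATC', 1)]
```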

Table 2 Amino acid identity, TRS, and sequence comparisons of BatCoV HKU9-2202 with BatCoV HKU9 and BatCoV GCCDC1.

Comparative genome sequence analysis showed that HKU9-2202 shared 83% nt identity with previously reported BatCoV HKU9 strains. The most divergent region was the S protein, which shared only 68% amino acid (aa) identity with those of other BatCoV HKU9 strains. The seven concatenated conserved replicase domains used by the International Committee on Taxonomy of Viruses (ICTV) for coronavirus species demarcation shared 93% aa identity with those of other BatCoV HKU9 strains, above the 90% species demarcation threshold. Thus, HKU9-2202 most likely belongs to the species BatCoV HKU9. To determine its evolutionary position, the full genome was subjected to phylogenetic analysis, in which HKU9-2202 formed a separate branch within the BatCoV HKU9 clade (Fig. 2C).
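The species assignment rests on a single threshold: as the ICTV criterion is commonly stated, viruses sharing more than 90% aa identity across the concatenated conserved replicase domains belong to the same species. A minimal sketch, assuming each domain has already been aligned between query and reference and skipping gap columns; the short domain strings are placeholders, not real HKU9 sequences.

```python
def aa_identity(a: str, b: str) -> float:
    """Percent identity of two aligned protein sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def same_species(domain_alignments: list[tuple[str, str]], threshold: float = 90.0) -> tuple[float, bool]:
    """Concatenate per-domain alignments and test identity against the threshold."""
    query = "".join(q for q, _ in domain_alignments)
    reference = "".join(r for _, r in domain_alignments)
    identity = aa_identity(query, reference)
    return identity, identity > threshold


# Placeholder alignments for two hypothetical domains.
domains = [("MKVLAG", "MKVLAG"), ("WYCDE", "WYCDF")]
identity, conspecific = same_species(domains)
print(f"{identity:.1f}% -> same species: {conspecific}")  # 90.9% -> same species: True
```

Concatenating before computing identity weights each domain by its aligned length, rather than averaging per-domain identities.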

Tissue Tropism of BatCoV HKU9 and GCCDC1-Related Virus

Viral RNA in tissues (heart, liver, spleen, lung, kidney, brain, and intestine) from six coronavirus-positive bats was quantified by qPCR (Fig. 3). Higher viral genome copy numbers were detected in the intestines of all six bats, and viral loads ranged from 4.89 × 10² to 5.67 × 10⁶ copies/g across tissues. Three HKU9-positive bats (Bt9431, Bt9446, and Bt9466) showed broader tissue tropism, with viral RNA also present in kidney, heart, and lung tissues (Fig. 3A). Three GCCDC1-positive bats (Bt9444, Bt9463, and Bt967) showed tropism restricted to the intestine (Fig. 3B). No viral RNA was detected in brain, spleen, or liver tissues.

Fig. 3

Tissue distribution of BatCoV HKU9 (A) and GCCDC1 (B) in positive bat samples.

Discussion

In this study, we conducted a longitudinal survey of BatCoV HKU9, BatCoV GCCDC1, and related coronaviruses in fruit bats from 2009 to 2016. Highly diverse HKU9-related CoVs were found in Rousettus bats, whereas the GCCDC1-related viruses found in E. spelaea were highly similar to one another. For HKU9-related CoVs, a novel lineage was identified in addition to the four previously reported lineages (Lau et al. 2010). Previous studies reported that all group 2d coronaviruses within the genus Betacoronavirus were from R. leschenaulti. Here, we identified the species of all coronavirus-positive bats by Cytb sequencing and found that HKU9 and GCCDC1 were carried by bats of two different genera, Rousettus and Eonycteris, respectively. HKU9 consists of five lineages: Lineages 1 and 2 are from R. leschenaulti, and Lineages 3–5 are from an unidentified Rousettus species (Rousettus sp.). These results suggest that these coronaviruses may be host-restricted and have a long evolutionary history with their hosts.

We amplified multiple N genes and obtained the full-length genome sequence of a novel lineage 5 HKU9 strain (BatCoV HKU9-2202). The most notable difference between this strain and previously identified BatCoV HKU9 strains lies in the S gene: the S protein of HKU9-2202 shares only 61%–68% aa identity with those of previously identified HKU9 strains. Because the S protein plays a pivotal role in mediating coronavirus entry into host cells, whether these differences in S affect the virulence and tissue tropism of HKU9-2202 requires further investigation.

Coronaviruses are known to infect hosts through the respiratory and intestinal tracts (Masters and Perlman 2013). In this study, we found that the intestine is the major target tissue of BatCoV HKU9 and GCCDC1. However, HKU9 RNA was also detected in the kidney, heart, and lung, suggesting that BatCoV HKU9 has a wide tissue tropism and the potential to be transmitted to other animals by the fecal-oral and respiratory routes.

There are at least five fruit bat species in China, all of which are distributed in tropical regions. These bats feed on fruits and flowers and have frequent contact with people and farms, increasing the risk of spillover of bat viruses to domestic animals and humans. In our previous studies, we also found that these bats harbor novel, genetically diverse filoviruses, some of which co-infected individual bats together with BatCoV HKU9 or GCCDC1 (Huang et al. 2016; Yang et al. 2017). Our results improve the understanding of the diverse viruses carried by fruit bats in China. Further studies are needed to characterize the virome of these bat populations and to assess the potential for these viruses to spill over to other animals and humans.