Genetic diversity of coronaviruses in Miniopterus fuliginosus bats

Coronaviruses, such as severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus, pose significant public health threats. Bats have been suggested to act as natural reservoirs for both these viruses, and periodic monitoring of coronaviruses in bats may thus provide important clues about emergent infectious viruses. The Eastern bent-wing bat Miniopterus fuliginosus is distributed extensively throughout China. We therefore analyzed the genetic diversity of coronaviruses in samples of M. fuliginosus collected from nine Chinese provinces during 2011–2013. The only coronavirus genus found was Alphacoronavirus. We established six complete and five partial genomic sequences of alphacoronaviruses, which revealed that they could be divided into two distinct lineages, with close relationships to coronaviruses in Miniopterus magnater and Miniopterus pusillus. Recombination was confirmed by detecting putative breakpoints of Lineage 1 coronaviruses in M. fuliginosus and M. pusillus (Wu et al., 2015), which supported the results of topological and phylogenetic analyses. The established alphacoronavirus genome sequences showed high similarity to other alphacoronaviruses found in other Miniopterus species, suggesting that their transmission in different Miniopterus species may provide opportunities for recombination with different alphacoronaviruses. The genetic information for these novel alphacoronaviruses will improve our understanding of the evolution and genetic diversity of coronaviruses, with potentially important implications for the transmission of human diseases.


INTRODUCTION
Coronaviruses (CoVs; order Nidovirales, family Coronaviridae, subfamily Coronavirinae) are enveloped RNA viruses with unusually large, positive-stranded RNA genomes of 26-32 kb (Lai, 2001). The viral genome contains five major open reading frames (ORFs) that encode the replicase polyproteins (ORF1a and ORF1b), spike (S), envelope (E), and membrane (M) glycoproteins, and the nucleocapsid protein (N) (Gonzalez et al., 2003; Holmes and Enjuanes, 2003). According to a proposal submitted to the International Committee on the Taxonomy of Viruses, CoVs can be classified into four genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus, which replace the traditional CoV groups 1, 2, and 3 (King et al., 2011; Woo et al., 2012). CoVs are known to cause upper and lower respiratory diseases, gastroenteritis, and central nervous system infections in a number of avian and mammalian hosts, including humans (Weiss and Navas-Martin, 2005). Bats have been increasingly recognized as important natural reservoirs for CoVs. In particular, previously unknown CoVs related to severe human pathogens, such as severe acute respiratory syndrome (SARS) CoV and Middle East respiratory syndrome CoV (van Boheemen et al., 2012), were discovered in bats from China and other countries, with consequent recent increases in research into the biodiversity and genomics of CoVs in different bat species.
The diversity of CoVs arises from the infidelity of RNA-dependent RNA polymerase (RdRp), the high frequency of recombination, and the large genomes of CoVs. These factors have generated diverse strains and genotypes of the CoV lineage, and have given rise to new lineages able to adapt to new hosts. These new lineages have occasionally caused major zoonotic outbreaks with disastrous consequences (Woo, 2006).
Previous studies reported the detection of several novel bat CoVs (BtCoVs) in Miniopterus magnater and Miniopterus pusillus from Hong Kong (Chu et al., 2008), and in Miniopterus fuliginosus from Japan (Shirato et al., 2012). However, although M. fuliginosus (the Eastern bent-wing bat) is the most extensively distributed Miniopterus species in China, the CoVs it harbors have not been systematically studied. M. fuliginosus bats are known to migrate long distances and typically roost with large numbers of bats from different genera, including Rhinolophus, Hipposideros, and Myotis (Cui et al., 2007; Miller-Butterworth et al., 2003); these habits may facilitate viral exchange between different bat species. Furthermore, our understanding of the diversity of CoVs in the genus Miniopterus remains limited. We therefore launched a survey to determine the dynamics and prevalence of CoVs in M. fuliginosus living in different geographical regions. In the current study, we explored the genetic diversity of CoVs in M. fuliginosus in China by analyzing 194 bat samples collected from nine Chinese provinces during 2011-2013.

Bat surveillance and identification of CoVs
A total of 194 M. fuliginosus bats were captured in nine provinces of China from October 2010 to October 2013, and pharyngeal and anal swabs were collected (Figure 1). All sampling sites were in or close to human gathering places. Only the anal swab samples harbored CoVs according to single-strain screening with conserved primers, and the positivity rates for each province are shown in Figure 1. Sequence analysis of the PCR amplicons identified alpha-CoV-positive bats in six provinces (Guangdong, Hubei, Fujian, Henan, Anhui, and Jiangxi), but no other CoV genera were found. Interestingly, co-infections with different CoVs were detected in two M. fuliginosus anal specimens: one from Guangdong and one from Henan.
We selected samples positive for CoVs that were representative of each province for genomic sequencing and established the complete genomic sequences of six alpha-CoVs: BtMf-AlphaCoV/Guangdong2012 (GD), BtMf-AlphaCoV/Hubei2013 (HB), BtMf-AlphaCoV/Fujian2012 (FJ), BtMf-AlphaCoV/Henan2013 (HN), BtMf-AlphaCoV/Anhui2011 (AH), and BtMf-AlphaCoV/Jiangxi2012 (JX). We also established partial genomic sequences of five other alpha-CoVs: BtMf-AlphaCoV/Guangdong2012-a (GD-a), BtMf-AlphaCoV/Guangdong2012-b (GD-b), BtMf-AlphaCoV/Hubei2013-a (HB-a), BtMf-AlphaCoV/Henan2013-a (HN-a), and BtMf-AlphaCoV/Henan2013-b (HN-b). The GD and GD-b sequences were identified in the same sample from Guangdong, and the HN and HN-b sequences were identified in the same sample from Henan.
Figure 1 The nine provinces (indicated in blue) in China where bats were captured and samples were collected. The numbers on the right indicate the numbers of samples positive for Lineage 1 (L1) and Lineage 2 (L2) and the total number of samples collected in each province. The red shading on Guangdong and Henan indicates the regions where co-infections of two lineages were detected.

Genomic sequences
The sizes of the BtCoV GD, HB, FJ, HN, AH, and JX genomes, excluding the 3′ poly(A) tails, were 28,748, 28,745, 28,755, 28,725, 28,300, and 28,301 nt, respectively, with G+C contents of 41.8%, 41.85%, 41.87%, 41.98%, 38.17%, and 38.19%, respectively. The genomic organization of these CoVs was similar to that of other alpha-CoVs (Table 1). The main difference among genomes was in ORF7, which was present in GD, HB, FJ, and HN, but absent in AH and JX. We then compared the complete genomes (Table 2). The full-length genomic sequences of HB, FJ, and HN showed 91.9%-97.0% nt identities with each other, and lower identity with the GD genome (82.1%-85.7%). In contrast, AH and JX exhibited 96.2% overall nt identity with each other, and lower identities with the other four genomes (68.0%-68.8%). The sizes of the 5′ untranslated regions of GD, HB, FJ, HN, AH, and JX were 270, 269, 268, 268, 272, and 273 nt, respectively. The core sequences of the leader transcription regulatory sequence (TRS; 5′-CUAAAC-3′) were identified in the 5′ untranslated sequences (Table 3). The TRSs of ORF3 and the E genes in AH and JX differed from those of the other four CoVs. The TRS of ORF7 in FJ and GD (CUGAAU) differed by 1 nt from that in HB and HN (CUGAAC). Apart from ORF3, E, and ORF7, the TRSs for the other ORFs were predicted in these six CoV genome sequences.
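As an illustration of how the G+C contents and TRS core positions reported above can be derived from a genome sequence, the following minimal Python sketch computes both quantities. It is not the software used in this study; the function names and the toy sequence are hypothetical, and the search assumes exact matches to the core sequence 5′-CUAAAC-3′.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def find_core_sequence(seq: str, core: str = "CUAAAC") -> list:
    """Return the 1-based start positions of every exact match to the TRS core."""
    # Genomes are often stored as cDNA (T instead of U), so normalise to RNA.
    rna = seq.upper().replace("T", "U")
    core = core.upper().replace("T", "U")
    positions = []
    start = rna.find(core)
    while start != -1:
        positions.append(start + 1)
        start = rna.find(core, start + 1)
    return positions


# Toy sequence for illustration only; it is not taken from any genome in this study.
toy = "ACTTAAGTCTAAACGGATCGCTAAACTT"
print(round(gc_content(toy), 2))
print(find_core_sequence(toy))
```

Applied to a full genome, the same search would also report matches outside the 5′ untranslated region, so candidate body TRSs would still need to be checked against the position of the downstream start codon.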
The most striking differences among CoVs were observed in the S protein sequence. The S gene sequence had five nt (AAAAU) inserted between the TRS and AUG in all CoVs except HB CoV (Table 3). Interestingly, the S protein (1,378 aa) was the same size in all members of Lineage 1, except HB (1,374 aa). However, the HB S protein shared only about 52.5%-52.8% aa identities with the S proteins of other Lineage 1 CoVs. Among the other Lineage 1 CoVs, the S proteins of FJ and HN were 98.0% identical, but they shared only 87.5% and 87.8% aa identity, respectively, with GD. In Lineage 2, AH and JX S proteins were 93.2% identical. Notably, the S proteins of GD, FJ, and HN in Lineage 1 appeared to be more closely related to the S proteins of Lineage 2 CoVs (59.6%-61.0%) than to the S protein of HB (52.5%-52.8%). InterProScan analysis predicted that all six CoVs included type I membrane glycoproteins, where most of the protein (prior to residues 1318/1319/1322) was exposed on the outside of the virion, and the C terminus comprised a transmembrane domain (residues 1319/1320/1323-1341/1342/1345), followed by the internal region in the virion, which was rich in cysteine residues. The S protein responsible for virus entry was divided into two domains: the S1 domain involved in receptor binding and the S2 domain for cellular membrane fusion. The putative S1 region was located at residues 229-741 for HB, 227-739 for GD and AH, 228-740 for JX, and 224-739 for FJ and HN. The diversity of S proteins was mainly within the S1 domain. HB S1 showed 93.3% aa identity with BtCoV-HKU8 and 39.6%-41.5% with other Lineage 1 and Lineage 2 CoVs. AH shared high aa identities with Lineage 2 CoVs in the S1 region (86.8%-93.7%), and GD had 85.1%-85.7% aa identities with FJ and HN. Analysis of the aa identities of the S1 region was consistent with the phylogenetic trees for the whole S region (Figure 2).
S2 included two putative heptad repeat regions, important for membrane fusion and viral entry (Bosch et al., 2003), located at residues 977-1122 and 1264-1320 in GD, FJ, and HN, 975-1120 and 1260-1316 in HB, and 973/974-1122/1123 and 1252/1253-1311/1312 in AH and JX.
ORF3, which encoded putative 222-aa and 219-aa proteins in Lineage 1 and Lineage 2 CoVs, respectively, was located between the S and E sequences in all six genomes.
a) For putative ORFs, we aligned the TRS that preceded the start codon AUG with the leader TRS. The core sequence is indicated in a box. The start codons of genes are in bold type.
The E, M, and N proteins were highly conserved within CoVs of the same lineage (>90% identity) and were diverse between lineages (63.6%-73.6%). ORF7 was located at the 3′ end of the Lineage 1 virus genome, and overlapped with the N gene. ORF7 encoded a putative nonstructural protein of 239-248 aa residues in FJ, HN, and HB. Interestingly, ORF7 in GD possessed two small ORFs, encoding putative proteins of 56 and 164 aa residues, respectively (Table 1).

Phylogenetic analyses
We performed phylogenetic analyses based on the aa sequences of the RdRp, S, E, M, and N proteins of these BtCoVs, including the RdRp and S proteins in the five partial CoV sequences (GD-a, GD-b, HB-a, HN-a, and HN-b). Phylogenetic trees were constructed using MEGA5.0 software, based on the deduced aa sequences. Several reference CoV genome sequences were downloaded from GenBank and aligned with the fragments of the newly discovered CoVs (Figure 2). The results of the phylogenetic analyses were consistent with those of the sequence identity analyses, and confirmed that the newly identified alpha-CoVs could be divided into two lineages. The aa sequences of the RdRp, E, M, and N proteins in Lineage 1 viruses always clustered with BtCoV HKU8, found in M. pusillus. In contrast, phylogenetic analysis based on the S proteins showed a different tree structure, in which GD, FJ, and HN in Lineage 1 clustered together in a clade with Lineage 2 viruses, and HB and BtCoV HKU8 formed a relatively distant cluster, sharing 95.7% aa identity with each other and only 52.7%-53.5% identity with the other three Lineage 1 CoVs. Phylogenetic analysis of the S protein thus indicated that Lineage 1 CoVs could be further divided into two types: type I (HB and HKU8) and type II (FJ, HN, and GD). According to the phylogenetic trees, Lineage 2 viruses (AH, JX, GD-a, HB-a, and HN-a) always clustered with BtCoV 1A, found in M. magnater (>99.7% nt identity in RdRp and >91.4% aa identity in S protein), and GD-b and HN-b clustered with BtCoV 1B, found in M. pusillus (98.7% aa identity in RdRp and about 92.0% in S protein). These tree branches were very short, reflecting the high sequence similarities.

Recombination analyses
Co-infection with different CoVs in the same bat may create opportunities for recombination, potentially resulting in the emergence of new viruses. Co-infections with different lineages in M. fuliginosus were detected in two anal specimens collected in Guangdong and Henan (Wu et al., 2015). Previous studies have shown that CoVs have a tendency to undergo RNA recombination (Herrewegh et al., 1998; Lai and Cavanagh, 1997; Lau et al., 2012b; Makino et al., 1986; Zeng et al., 2008). In this study, we found that recombination events had occurred among the four Lineage 1 sequences (FJ, GD, HN, HB) and BtCoV HKU8. GD showed the highest degree of similarity to BtCoV HKU8 in the ORF1ab region, with an aa identity >99% (Table 2). The ORF1ab region of GD may have originated from BtCoV HKU8 during a co-infection event in the same bat species. However, HB showed the highest degree of similarity to BtCoV HKU8 in the S region, with an aa identity of 95.7% (Table 2). The S region of HKU8 may be the parental sequence of the equivalent region in HB. Considering the diversity of the S region in Lineage 1 CoVs, we analyzed possible recombination events in Lineage 1 BtCoVs from different sites in China by detecting putative breakpoints and using SimPlot software (Wu et al., 2015). The results of the Genetic Algorithms for Recombination Detection (GARD) analysis were consistent with the bootscan analysis results, and three recombination breakpoints were found in the alignments of GD, HB, HN, FJ, and BtCoV HKU8 from M. pusillus (nt 20,930, nt 26,861, and nt 28,128) (Wu et al., 2015). The positions of the detected breakpoints corresponded to the areas of recombination.

DISCUSSION
In this study, we detected and characterized alpha-CoVs carried by M. fuliginosus bats in China. M. fuliginosus-related alpha-CoVs were detected in six different provinces (Guangdong, Hubei, Fujian, Henan, Anhui, and Jiangxi), representing the middle, eastern, and southern parts of China. Based on genetic and phylogenetic analyses, these alpha-CoVs could be classified into two distinct lineages, Lineage 1 and Lineage 2. Lineage 1/Lineage 2 co-infections were detected in two specimens collected from Guangdong and Henan (Wu et al., 2015).
Lineage 1 and Lineage 2 CoVs showed high intra-lineage genomic similarities, except in the S region. This high similarity suggests each lineage shared a common ancestor. However, Lineage 1 genomes (GD, HB, FJ, and HN), isolated from Guangdong, Hubei, Fujian, and Henan provinces, presented marked differences in the S region, and phylogenetic analysis of S proteins showed that Lineage 1 CoVs formed two distinct clusters, comprising GD, FJ, and HN in one cluster, and HB in a relatively distant cluster. The same CoV in one bat species had thus evolved diverse S proteins in different provinces. Different environmental pressures, including food availability, climate, shelter, and predators, may have exerted different selection pressures on the CoVs in the same bat species in different locations, leading to the emergence of a novel S protein subtype in the same CoV isolated from different regions.
The S protein in CoV is responsible for receptor binding and host-species adaptation, and is one of the major determinants of specificity of host-species infection (Dveksler et al., 1991; Lau et al., 2005, 2007). The S protein gene therefore constitutes one of the most variable regions within the CoV genome. GD in M. fuliginosus and BtCoV HKU8 in M. pusillus showed a higher degree of genomic similarity than any of the other CoVs, except in the S region. Phylogenetic analysis of the S protein revealed that BtCoV HKU8 clustered with HB, rather than with GD; indeed, the BtCoV HKU8 S protein exhibited higher identity with HB than with the other three Lineage 1 CoVs, including GD. Phylogenetic analysis, similarity plots, bootscan analysis, and recombination-breakpoint analysis suggested that recombination occurred around the S region among BtCoV HKU8, GD, and HB (Wu et al., 2015), which may have facilitated adaptation of the virus to a new bat species, finally leading to interspecies transmission (Graham and Baric, 2010; Song et al., 2005). Furthermore, within the complete genome (including the S region), some of the established Lineage 2 CoVs (AH, JX, GD-a, HB-a, and HN-a) showed high similarity to BtCoV 1A found in M. magnater, while other Lineage 2 CoVs (GD-b and HN-b) showed high similarity to BtCoV 1B found in M. pusillus. Overall, bat migration and roosting habits provide opportunities for large numbers of bats to gather together (Cui et al., 2007; Woo et al., 2006a, 2006c; Woo, 2006), and could explain the mechanisms whereby Miniopterus acquires various viruses and transmits them to other bat species. In addition, our findings suggested that the S protein had undergone varying degrees of modification in response to the evolutionary pressure of adapting to a new host.
Previous studies found that CoVs are particularly host-specific, though host-shifting has also been demonstrated (Jonassen et al., 2005; Lai, 1990; Liu et al., 2005; Rest and Mindell, 2003). A larger-scale study including different geographic regions will be necessary to confirm the phenomenon of host specificity. The results of the present study showed that a single bat species (M. fuliginosus) could harbor more than one species of CoV (Lineage 1 and 2 CoVs), and that one CoV could be found in different species of bats, indicating no strict association between BtCoVs and bat species. The availability of genomic sequence data for CoVs from bat species from different locations will allow analysis of the relationships between these viruses and the geographic distribution of their hosts. Further characterization of novel CoVs revealed high genetic diversity across a large geographic distribution. Moreover, we found that the same species of bat from different geographic locations contained the same species of CoV, but with distinct S proteins.
The novel genomes described in this study represent the first genomic data for CoVs in M. fuliginosus bats in China. The results also provide the first evidence for the high diversity of S proteins within a given CoV carried by the same bat species at different locations. This diversity most likely arose as a result of environmental pressures, migration abilities, and roosting behaviors (Lau et al., 2012a). Conversely, highly similar CoV genomes, including similar or diverse S regions, were found in different bat species from different regions, suggesting that recombination and interspecies transmission may occur among BtCoVs. Recombination may create opportunities for the emergence of new viruses that might drive CoV evolution (Vijaykrishna et al., 2007; Woo et al., 2006b). Previous studies demonstrated that SARS and a number of other new human diseases have emerged as a result of interspecies transmission of viruses carried by bats. The genetic features and host restriction of BtCoVs thus remain important subjects for global public health studies. Further studies and genomic analyses of CoVs from different Miniopterus species in different regions will contribute to a better understanding of the diversity and evolution of CoVs, and periodic studies could provide genetic clues regarding potential emergent infectious viruses.

Ethics statement
The field studies did not involve endangered or protected species. Bats were treated according to the guidelines set out in the Regulations for the Administration of Laboratory Animals (Decree No. 2 of the State Science and Technology Commission of the People's Republic of China, 1988). The sampling procedures were approved by the Ethics Committee of the Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College (Approval number: IPB EC20100415).

Bat samples
Pharyngeal and anal swabs were collected from 194 captured M. fuliginosus bats from nine provinces in China. No specific permissions were required for these procedures at these locations. All bats trapped for this study were released back into their habitat after sample collection. The bat species was initially determined morphologically and subsequently confirmed by sequence analysis of mitochondrial cytochrome b DNA, as described previously (Tang et al., 2006). The samples were immersed in maintenance medium in virus-sampling tubes (Yocon, China), temporarily stored at −20°C, and then transported to the laboratory and stored at −80°C.

RNA extraction and virus detection
Viral RNA was extracted from the pharyngeal and anal swab samples using a QIAamp viral RNA minikit (Qiagen, Germany). Reverse transcription was performed using a SuperScript III kit (Invitrogen, USA). CoV screening was performed by amplifying a 440-bp fragment of the RdRp gene of CoVs using conserved primers (5′-GGTTGGGACTATCCTAAGTGTGA-3′ and 5′-CCATCATCAGATAGAATCATCATA-3′), as described previously (Lau et al., 2012a, 2012b). Polymerase chain reaction (PCR) products were gel purified using a QIAquick gel extraction kit (Qiagen). Both strands of the PCR products were sequenced twice with an ABI Prism 3700 DNA analyzer (Applied Biosystems, USA), using the two PCR primers. The sequences of the PCR products were compared with known CoV RdRp gene sequences in the GenBank database. After screening single samples with conserved primers, we confirmed the positivity rates of CoVs in each province (Figure 1).
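To make the primer-based screening step concrete, the following minimal Python sketch locates the two conserved primers on a template sequence and reports the predicted product. It is an illustration only and not part of the laboratory protocol: the function names and the toy template are hypothetical, and the search assumes exact primer matches, whereas real PCR tolerates some mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Conserved RdRp screening primers quoted in the text above.
FORWARD = "GGTTGGGACTATCCTAAGTGTGA"
REVERSE = "CCATCATCAGATAGAATCATCATA"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predict_amplicon(template, forward, reverse):
    """Return (start, end, length) of the predicted product, 1-based inclusive.

    Returns None when either primer site is missing. Exact matching is a
    simplifying assumption of this sketch.
    """
    template = template.upper()
    fwd_start = template.find(forward.upper())
    if fwd_start == -1:
        return None
    # The reverse primer anneals to the plus strand as its reverse complement.
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, fwd_start + len(forward))
    if rev_start == -1:
        return None
    end = rev_start + len(rev_site)
    return fwd_start + 1, end, end - fwd_start


# Hypothetical toy template: flank + forward site + filler + reverse site + flank.
toy_template = "AAAA" + FORWARD + "ACGT" * 10 + reverse_complement(REVERSE) + "TTTT"
print(predict_amplicon(toy_template, FORWARD, REVERSE))
```

On a real RdRp sequence the returned length could be compared with the expected 440-bp fragment as a quick sanity check.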

Complete genome sequencing
We selected samples positive for CoVs that were representative of each province for genomic sequencing. The initial results revealed that they belonged to the genus Alphacoronavirus and showed close relationships with BtCoV HKU8, 1A, or 1B. We therefore amplified the cDNA using degenerate primers designed by multiple alignment of the genomes of BtCoV HKU8 (NC010438), BtCoV 1A (NC010437), and BtCoV 1B (NC010436). Based on the genetic sequences obtained, sequence-specific primers were used in the subsequent PCR amplifications. The primers used to amplify the fragments of each virus are available upon request. The 5′/3′ ends of the viral genomes were confirmed by rapid amplification of cDNA ends (RACE) using a 5′ RACE kit (Invitrogen) and 3′ RACE kit (TaKaRa, Japan). For PCRs with weak or non-specific products, the desired DNA fragments were cloned in DNA vectors (pGEM-T Easy vector; Promega, USA). Multiple clones from a PCR were selected for standard DNA sequencing. Sequences were assembled and edited manually to produce the final viral genome sequences. Each full genome was deduced from a single specimen.

Sequencing complete RdRp and S genes
Some positive samples did not undergo complete genome sequencing because of limited amounts of sample. To increase the accuracy of subsequent phylogenetic analyses, we amplified the complete RdRp genes of four strains and the complete S genes of three strains, in addition to the complete genomes of six strains. Sequencing was performed using the primers available from the genomic sequencing, as previously described. The sequences of the PCR products were assembled manually to produce the complete RdRp and S gene sequences.

Genomic analysis
The nucleotide (nt) sequences of the genomes and the deduced amino acid (aa) sequences of the ORFs were predicted using Vector NTI software (Invitrogen) or the ORF Finder tool of NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Pairwise genome sequence alignment was conducted with EMBOSS Needle software (www.ebi.ac.uk/Tools/psa/emboss_needle/) using the default parameters. MEGA5.0 (Tamura et al., 2011) was used to align nt and deduced aa sequences with the MUSCLE package and default parameters. The best substitution model was then evaluated using the Model Selection package implemented in MEGA5. Phylogenetic analyses were processed by the maximum-likelihood method with an appropriate model, to create phylogenetic trees with 1,000 bootstrap replicates (Guindon et al., 2010). Protein-family analysis was performed with PFAM (Bateman et al., 2002) and InterProScan (Apweiler et al., 2001). Predictions of transmembrane domains were performed with TMHMM (Sonnhammer et al., 1998).
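The percentage identities reported in this study were taken from the alignment software itself. As a minimal sketch of what such a figure represents, the Python function below computes percent identity from two already-aligned sequences. The function name and the example strings are hypothetical, and dividing by the full alignment length (gap columns included) is one common convention that is assumed here; other tools may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Identical non-gap columns are counted as matches, and the denominator
    is the full alignment length (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments, for illustration only.
print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

The same function works for aligned aa sequences, because it only compares characters column by column.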

Recombination analysis
Recombination events among five genomes were detected with SimPlot software (version 3.5.1). We used a sliding window of 1,000 nt, which moved in steps of 300 nt, and applied the Genetic Algorithms for Recombination Detection (GARD) program in the DataMonkey software package (http://www.datamonkey.org) (Kosakovsky Pond et al., 2006). When multiple breakpoints were detected between the non-recombinant and recombinant models, they were assessed by comparing the corrected Akaike's Information Criterion scores. The Kishino-Hasegawa test was applied to verify if the adjacent sequence fragments yielded significant topological incongruence.
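The sliding-window scan described above can be sketched in a few lines of Python. This is a simplified stand-in for a similarity plot and not SimPlot's own algorithm: it reports raw percent identity per window rather than a model-corrected distance, skips alignment columns that contain a gap (an assumption of this sketch), and uses hypothetical function and variable names.

```python
def sliding_window_identity(aligned_query, aligned_ref, window=1000, step=300):
    """Return (window midpoint, percent identity) pairs along an alignment.

    Defaults follow the 1,000-nt window and 300-nt step stated in the text.
    Columns with a gap in either sequence are ignored.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    query = aligned_query.upper()
    ref = aligned_ref.upper()
    results = []
    for start in range(0, len(query) - window + 1, step):
        compared = 0
        matches = 0
        for a, b in zip(query[start:start + window], ref[start:start + window]):
            if a == "-" or b == "-":
                continue
            compared += 1
            if a == b:
                matches += 1
        identity = 100.0 * matches / compared if compared else float("nan")
        results.append((start + window // 2, identity))
    return results


# Hypothetical toy alignment: identical first half, divergent second half.
toy_query = "A" * 2000
toy_ref = "A" * 1000 + "C" * 1000
for midpoint, identity in sliding_window_identity(toy_query, toy_ref):
    print(midpoint, identity)
```

A sharp drop in identity between neighbouring windows, as in the toy example, is the kind of signal that would prompt a formal breakpoint test.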

Nucleotide sequence accession numbers
All genome sequences have been submitted to GenBank. The accession numbers for the bat alpha-CoVs are KJ473795 to KJ473805.

Compliance and ethics
The author(s) declare that they have no conflict of interest.