The name "bocaparvovirus" (BOV) is derived from the combination of “bovine parvovirus” and “canine minute virus” [1]. These two parvoviruses were first identified in cattle and dogs in the early 1960s and were both placed in the same genus, "Bocavirus", because of their similar genomic organization and high level of sequence similarity. Like RNA viruses, single-stranded DNA viruses evolve rapidly because they mutate faster than double-stranded DNA viruses, and genome size appears to correlate negatively with mutation rate [2].

BOVs are commonly present in mixed infections and are frequently associated with gastrointestinal and respiratory symptoms in children and neonatal animals, causing serious public-health and economic problems. In recent years, various BOVs have been detected in different hosts with different geographical distributions. There is a high prevalence of feline bocaviruses in cats with diarrhea in northeastern China [3]. In China, a two-year molecular epidemiology study showed that human bocaparvovirus (HBOV) and human metapneumovirus (HMPV) are frequently found in children with pneumonia: pneumonia was diagnosed in 96.2% of HBOV-positive and 98.5% of HMPV-positive patients [4]. Porcine bocaviruses, first identified in swine with post-weaning multisystemic wasting syndrome, were also detected in fecal samples of piglets with clinical diarrhea in southeastern China [5, 6].

The genus Bocaparvovirus belongs to the family Parvoviridae [1]. Parvoviruses are non-enveloped viruses with a round shape, T=1 icosahedral symmetry, and a diameter of 18-26 nm [7]. Their linear single-stranded DNA (ssDNA) genome is approximately 4-6 kilobases (kb) in length and contains two major open reading frames (ORFs) encoding a non-structural protein (NS1) and a viral protein (VP1) [8]. BOVs share a number of characteristic features with other parvoviruses, but unlike most parvoviruses, they have three ORFs in their genome [1, 9]. The third ORF, which encodes a nuclear phosphoprotein, NP1, lies between the ORFs encoding NS1 and VP1.

Tufted deer (Elaphodus cephalophus) are small deer belonging to the genus Elaphodus in the subfamily Muntiacinae, family Cervidae, and are found in the southwestern and southern provinces of China [10]. The tufted deer is the only member of the genus Elaphodus, so its genetic resources are an important part of the gene pool of the family Cervidae. Tufted deer are listed as near threatened on the IUCN Red List of Threatened Species (2013, version 3.1) and are vulnerable to habitat loss and illegal hunting and trade [11].

In October 2019, we collected fecal samples from four healthy wild tufted deer at the Wannan Wild-Animal First-Aid Center in the south of Anhui province, China. All samples were shipped on dry ice. Prior to viral metagenomics analysis, the collected fecal samples were diluted in 1 mL of Dulbecco’s phosphate-buffered saline (DPBS) and vortexed vigorously for 5 min. After centrifugation for 10 min at 15,000 × g, the supernatants were collected in 1.5-mL centrifuge tubes and stored at -80 °C for later use.

The four samples were combined into one pool for next-generation sequencing. The supernatant enriched in viral particles was purified using a 0.45-μm syringe filter (Millipore), and the filtrate was treated at 37 °C for 90 min with a mixture of DNase, RNase, Benzonase, and Baseline-ZERO to digest unprotected nucleic acid [12]. Total nucleic acid was then extracted using a QIAamp MinElute Virus Spin Kit (QIAGEN) as per the manufacturer’s instructions. A Nextera XT DNA Sample Preparation Kit (Illumina) was then used to construct a 250-bp paired-end cDNA library, and the sample pool was sequenced on an Illumina MiSeq platform with dual barcoding. Paired-end reads generated by MiSeq were debarcoded using Illumina’s vendor software and used for further bioinformatics analysis. The data were processed on a 32-node Linux cluster. Low-quality reads, tag sequences, and duplicates were removed, and the cleaned reads were assembled de novo within each barcode using the ENSEMBLE assembler [13]. Unassembled reads and contigs were then compared to a custom-built viral proteome database using BLASTx, with an E-value cutoff of <10⁻⁵. To eliminate false-positive viral reads, the candidate viral reads were then compared to a non-virus non-redundant protein database consisting of non-viral protein sequences extracted from an NCBI nr fasta file (based on taxonomy annotation, excluding the virus kingdom). Contigs without significant BLASTx similarity to the viral proteome database were searched against viral protein families in the vFam database [14] using HMMER3 [15, 16, 17] to detect remote viral protein similarities.
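The E-value filtering step described above can be illustrated with a minimal sketch. It assumes the BLASTx results were written in the standard 12-column tabular format (`-outfmt 6`); the hit lines, query names, and helper functions are hypothetical and are not taken from the pipeline used in this study.

```python
# Illustrative sketch only: filter BLASTx hits (tabular "-outfmt 6") by E-value.
# The hit lines below are made-up examples, not data from this study.

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text

# Standard 12 columns of BLAST tabular output (-outfmt 6)
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hits(text):
    """Parse tab-separated BLAST lines into a list of dicts."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def best_significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep, per query, the highest-bitscore hit with an E-value below the cutoff."""
    best = {}
    for hit in hits:
        if hit["evalue"] >= cutoff:
            continue  # not significant at the chosen cutoff
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical example input: two hits for read_1, one weak hit for read_2,
# and one strong hit for contig_7.
EXAMPLE = "\n".join([
    "read_1\tviral_protein_A\t78.5\t80\t17\t0\t1\t240\t10\t89\t2e-30\t120.0",
    "read_1\tviral_protein_B\t55.0\t60\t27\t0\t1\t180\t5\t64\t1e-8\t60.0",
    "read_2\tviral_protein_C\t40.0\t30\t18\t0\t1\t90\t1\t30\t0.003\t25.0",
    "contig_7\tviral_protein_A\t90.1\t300\t30\t0\t1\t900\t1\t300\t1e-50\t500.0",
])

if __name__ == "__main__":
    for query, hit in best_significant_hits(parse_hits(EXAMPLE)).items():
        print(query, hit["sseqid"], hit["evalue"])
```

In this toy input, read_2 is discarded because its only hit does not pass the cutoff, and read_1 keeps only its higher-scoring hit.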

Reads categorized as derived from cellular organisms (archaea, bacteria, and eukaryotes), as well as reads with no obvious resemblance to amino acid sequences in the non-redundant protein database, were removed. A total of 747 Mb of viral nucleotide sequence data was obtained, consisting of 6,134,914 reads. Of these, 119 showed the best match with bocaparvovirus proteins, with a cutoff E-value of 10⁻⁵. De novo assembly using Geneious Prime generated four genome fragment contigs of 1517, 1133, 885, and 251 nt. The approximate locations of the four contigs in the genome were determined by BLASTn, and the sequence identity of the contigs to their BLASTn matches ranged from 77.78% to 93.52%.
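As a rough plausibility check, the figures reported above can be combined into a few summary values. This is a back-of-the-envelope sketch: it assumes that "747 Mb" means 747 million bases, and the variable names are illustrative.

```python
# Back-of-the-envelope summary of the read statistics reported in the text.
TOTAL_BASES = 747_000_000   # assumption: "747 Mb" = 747 million bases
TOTAL_READS = 6_134_914     # reads reported in the text
BOV_READS = 119             # reads with a best match to bocaparvovirus proteins
CONTIG_LENGTHS_NT = [1517, 1133, 885, 251]

mean_read_length = TOTAL_BASES / TOTAL_READS
bov_fraction = BOV_READS / TOTAL_READS
combined_contig_length = sum(CONTIG_LENGTHS_NT)

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.1f} nt")
    print(f"bocaparvovirus-like reads: {bov_fraction:.2e} of all reads")
    print(f"combined contig length: {combined_contig_length} nt")
```

The combined contig length is only a sum of the four fragments; it says nothing about overlaps between them or about how much of the genome they cover.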

To determine the genome sequence of the novel BOV, nested PCR and primer walking were performed. Primers were designed based on the sequences of the four contigs using Geneious Prime [18], and their sequences are shown in Supplementary Table S1. The PCR thermocycler program consisted of 5 min at 95 °C; 31 cycles of 30 s at 95 °C, 30 s at 50 °C (first round) or 55 °C (second round), and 60 s or 40 s at 72 °C; and a final elongation step at 72 °C for 5 min. The premixed enzyme rTaq (TaKaRa) was used in the reaction system. In addition, primer walking was performed to obtain the sequences of the 5’UTR and the 3’ end of the VP gene. The PCR conditions were as follows: 95 °C for 5 min; 31 cycles of 95 °C for 30 s, 46 °C (first round) or 48 °C (second round) for 30 s, and 72 °C for 40 s; and a final elongation step at 72 °C for 5 min. The PCR results showed that only one of the four fecal samples was positive for the novel BOV, and the viral genome from this positive sample was sequenced. At this point, the genome sequence was still incomplete (4,884 nt in length).
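The cycling conditions above can be written down as a small data structure, which makes the two programs easy to compare. This is a sketch only: the class and field names are hypothetical, the computed time counts programmed hold steps and ignores ramp times, and because the text does not state which extension time (60 s or 40 s) applies to which reaction, both are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedPcrProgram:
    """Hold steps of one thermocycler program (ramp times are not modeled)."""
    name: str
    anneal_temps_c: tuple      # (first round, second round) annealing temperatures
    extension_s: int           # extension time at 72 °C per cycle
    cycles: int = 31
    initial_denature_s: int = 300   # 5 min at 95 °C
    denature_s: int = 30            # 30 s at 95 °C per cycle
    anneal_s: int = 30              # 30 s annealing per cycle
    final_extension_s: int = 300    # 5 min at 72 °C

    def hold_seconds(self):
        """Total programmed hold time for one round of PCR, in seconds."""
        per_cycle = self.denature_s + self.anneal_s + self.extension_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extension_s


# Values taken from the text; the pairing of names and extension times is illustrative.
contig_pcr_long = NestedPcrProgram("contig PCR, 60 s extension", (50, 55), 60)
contig_pcr_short = NestedPcrProgram("contig PCR, 40 s extension", (50, 55), 40)
primer_walking = NestedPcrProgram("primer walking", (46, 48), 40)

if __name__ == "__main__":
    for program in (contig_pcr_long, contig_pcr_short, primer_walking):
        print(f"{program.name}: {program.hold_seconds() / 60:.1f} min per round")
```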

To complete the genome sequence of the novel BOV, hereafter referred to as ECBOV after its host, Elaphodus cephalophus, a next-generation sequencing library based on the ECBOV-positive sample was constructed and sequenced. Using the new metagenomic data, a total of 773 reads were mapped to the partial ECBOV sequence (4,884 nt in length) using Geneious, and a consensus sequence of 5,354 nt was obtained. In addition, we identified sequences of eukaryotic viruses of the families Parvoviridae and Genomoviridae, as well as bovine serum-associated circular virus and numerous bacteriophages (Supplementary Fig. 1). The nearly complete genome sequence of ECBOV had a G+C content of 54%. Three putative ORFs were predicted using Geneious Prime, and the sequence contained a 315-nt 5’UTR, a complete ORF1, a small ORF2, a complete ORF3, and a 114-nt 3’UTR (Fig. 1). ORF1 encodes a putative nonstructural protein, NS1, of 860 amino acids. ORF2 encodes a putative nonstructural protein, NP1, of 210 amino acids that is predicted to be highly phosphorylated, and ORF3 encodes a putative structural protein, VP1, of 662 amino acids. In NS1, several conserved domains were identified, including a replication initiator domain (xxHxHxxxxx) and an SF3 helicase domain with an ATP- or GTP-binding Walker A loop (GxxxxGKT), a Walker B loop (xxxxEE), and a Walker B’ loop (KxxxxGxxxxxxxK) (Fig. 1A). VP1 was found to contain a calcium-binding loop (YLGPF) and the putative catalytic residues (DxxAxxHDxxY) of a phospholipase A2 (PLA2) domain in its unique region, which, in other parvoviruses, is necessary for infectivity [19] (Fig. 1B).
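The conserved motifs listed above are written in a simple pattern notation in which "x" stands for any residue. A minimal sketch of how such patterns could be located in a protein sequence is shown below. The helper functions and the short test peptide are hypothetical; the peptide was constructed for illustration and is not part of the ECBOV sequence. Short patterns such as xxxxEE match very often by chance, so hits of this kind are only meaningful in the context of an alignment.

```python
import re

# Motif patterns as given in the text ("x" = any amino acid).
MOTIFS = {
    "Walker A": "GxxxxGKT",
    "Walker B'": "KxxxxGxxxxxxxK",
    "PLA2 calcium-binding loop": "YLGPF",
    "PLA2 catalytic residues": "DxxAxxHDxxY",
}


def motif_to_regex(motif):
    """Convert the 'x = any residue' notation into a regular expression."""
    return "".join("." if ch == "x" else re.escape(ch) for ch in motif)


def find_motif(sequence, motif):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [match.start() + 1 for match in pattern.finditer(sequence)]


# Hypothetical toy peptide, constructed only to exercise the patterns.
TOY_PEPTIDE = "MAAGPSGKGKTLLYLGPFNGDKAARDHDEAYKK"

if __name__ == "__main__":
    for name, motif in MOTIFS.items():
        print(f"{name} ({motif}): positions {find_motif(TOY_PEPTIDE, motif)}")
```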

Fig. 1

The genome organization of ECBOV. The NS1 ORF is shown in pink, and the NP1 ORF is shown in purple. The right end of VP1 is truncated and shown in blue. (A) Identification of the replication initiator domain and SF3 helicase domain in the NS1 protein. (B) Identification of the phospholipase A2 (PLA2) domain in the N-terminal portion of the VP1 protein.

Phylogenetic analysis was performed using the ECBOV sequence, 21 genomic sequences retrieved from the GenBank database that exhibited sequence similarity in a BLASTx search, bocaparvovirus reference sequences, and outgroup sequences. Amino acid sequence alignments used for all trees were made using MUSCLE, implemented in MEGA 10.2.2. The resulting alignment was trimmed manually, and phylogenetic analysis was performed using MrBayes v3.2 by the mixed-models and Markov chain Monte Carlo (MCMC) methods [20]. All trees were visualized in FigTree v1.4.3 and midpoint-rooted for clarity. Bayesian consensus trees were constructed based on the amino acid sequences of the NS and VP proteins. As shown in Fig. 2, ECBOV branched together with ungulate bocaparvovirus 6 (accession number NC_030402), bovine parvovirus (NC_001540), and bovine parvovirus 1 (NC_038895 and MW032436). The NS1 protein of ECBOV had 62.66% amino acid sequence identity to NS1 of bovine parvovirus 1 (NC_038895, query coverage: 99%). The VP protein had 63.57% amino acid sequence identity to the corresponding structural protein of ungulate bocaparvovirus 6 (KU172422, query coverage: 100%). The putative NP had 45% to 60% amino acid sequence identity to the NPs of other members of the family Parvoviridae. In 2019, new taxonomic guidelines were established for the family Parvoviridae [21]. Members of a genus should belong to the same monophyletic lineage based on their NS1 protein sequence, with at least 35-40% amino acid sequence identity between any two members and a coverage rate of at least 80%. Members of the same species should have >85% sequence identity in the NS1 protein. With 62.66% NS1 identity (99% query coverage) to its closest relative, ECBOV exceeds the genus threshold but falls well below the species threshold. Based on these criteria, ECBOV should be classified as a member of a novel species in the genus Bocaparvovirus.
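The identity thresholds quoted above can be expressed as a simple decision rule. The sketch below is a deliberate simplification: it applies only the pairwise NS1 identity and coverage thresholds to a single comparison, uses the lower bound (35%) of the "35-40%" range, and does not model the requirement that members of a genus form a monophyletic lineage. The function name and return labels are hypothetical.

```python
# Thresholds as quoted in the text for NS1 amino acid sequences.
GENUS_MIN_IDENTITY = 35.0     # assumption: lower bound of the "35-40%" range
SPECIES_MIN_IDENTITY = 85.0   # same species requires >85% identity
MIN_COVERAGE = 80.0           # minimum coverage for the comparison to count


def classify_by_ns1(identity_pct, coverage_pct):
    """Classify one pairwise NS1 comparison using the quoted thresholds only."""
    if coverage_pct < MIN_COVERAGE:
        return "undetermined (coverage below threshold)"
    if identity_pct > SPECIES_MIN_IDENTITY:
        return "same species"
    if identity_pct >= GENUS_MIN_IDENTITY:
        return "same genus, different species"
    return "different genus"


if __name__ == "__main__":
    # ECBOV NS1 versus bovine parvovirus 1 NS1, values taken from the text.
    print(classify_by_ns1(identity_pct=62.66, coverage_pct=99.0))
```

Applied to the values reported here, the rule places ECBOV in the same genus as bovine parvovirus 1 but in a different species, which agrees with the conclusion drawn in the text.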

Fig. 2

Bayesian consensus trees based on the NS1 (A) and VP (B) amino acid sequences of parvoviruses. Posterior probability values are shown for each node. The bar indicates 0.3 amino acid substitutions per site. The percentage of sequence identity to ECBOV is shown for each virus.

The nearly complete genome sequence of ECBOV was deposited in the GenBank database under accession number MW824450.2. The biological data and information about the tufted deer fecal samples have been deposited in the NCBI database under the BioProject accession number PRJNA705580. The first and second next-generation sequencing data sets are available in the SRA database under the accession numbers SRX10243189 and SRX11527391, respectively.

Knowledge about viral diversity in the family Parvoviridae has increased greatly in the past few years, partly due to the application of viral metagenomics technology, which has provided many new genome sequences and revealed entirely new lineages. The traditional taxonomy based on host specificity has not been an appropriate framework for classifying members of the family Parvoviridae. A new system was therefore established [21] by the International Committee on Taxonomy of Viruses (ICTV) that divided the parvoviruses into three subfamilies: Parvovirinae, Densovirinae, and Hamaparvovirinae.

BOVs infect a wide range of hosts, with documented cross-species transmission between humans and non-human primates [22]. According to the hosts they infect, BOVs have been assigned to 28 species, including Carnivore bocaparvovirus 1–6, Chiropteran bocaparvovirus 1–5, Lagomorph bocaparvovirus 1, Pinniped bocaparvovirus 1–2, Primate bocaparvovirus 1–3, Rodent bocaparvovirus 1–2, and Ungulate bocaparvovirus 1–9. Phylogenetic analysis showed that ECBOV is most closely related to bovine-associated bocaparvoviruses (BOBOV). BOBOV was first isolated in 1961 from the gastrointestinal tract of normal calves by Abinanti and Warfield [23]. BOBOV has consistently been found in coinfections with other well-characterized viruses, such as bovine rotavirus or bovine viral diarrhea virus in cases of gastrointestinal disease [24], and bovine coronavirus or bovine rhinitis A and B viruses in cases of respiratory disease [25]. BOVs have attracted growing attention in veterinary and clinical medicine since the discovery in 2005 of HBOV, which can cause lower respiratory disease in children [26]. When the virus is released from the respiratory or gastrointestinal tract of an infected host, it can be transmitted horizontally to other individuals. In addition, bocaparvoviruses are able to infect mammalian embryos through vertical placental transmission and may cause fetal death and repeated abortion [27].

In this study, the virome was found to include numerous bacteriophages and eukaryotic viruses, but no additional common mammalian viruses were found. The large number of phages suggests the presence of numerous bacterial species in the intestines of tufted deer, but whether any of them are associated with disease remains to be investigated. Due to the increasingly frequent interactions between humans and wildlife, further surveillance studies are needed to identify any threats to human and animal health. Although the fecal samples in this study were collected from seemingly healthy tufted deer, previous studies have suggested that HBOV1 is an important pathogen associated with lower respiratory tract illnesses [28]. Persistent infection with BOVs is thought to be responsible for coinfection with other viruses. It is speculated that BOVs may be both passengers and pathogenic agents. In a study of acute wheezing disease in children, Bodewes et al. [29] found that rhinoviruses were associated with Th2-type cytokine responses or systemic proinflammatory responses, while coinfection with HBOV1 and rhinoviruses led to non-Th2-type cytokine responses. Another study showed that HBOV NP1 inhibited the production of IFN-β in vitro, which may increase susceptibility to viral infection [30]. Because multiple viruses are often present, it is still unknown whether other viruses act as helper viruses for BOVs. The association of BOVs with specific diseases remains inconclusive, and larger prospective studies are needed to investigate the pathogenicity of BOVs.

Although serological evidence and in vitro virus replication studies are still needed, we propose that this novel virus should be considered a member of a new species, considering that ECBOV was found in an ungulate, that its genomic structure is typical of BOVs, and that it shares 45 to 65% amino acid sequence identity with BOBOV. Although this parvovirus may cause only inapparent or subclinical infections, early detection of such viruses in wild animals provides an opportunity to investigate potential threats. Recent human activities, ranging from concentrated animal feeding operations to illegal trade in wild animals, have triggered epidemics caused by zoonotic viruses. Future studies are needed to elucidate the distribution, host range, and cross-species transmission potential of ECBOV to better understand its epidemiological significance.