Introduction

The family Circoviridae was established in the mid-1990s when it was recognized that animal viruses with circular, single-stranded DNA (ssDNA) genomes were distinct from other eukaryotic ssDNA viruses classified at the time, including plant viruses with circular genomes (Geminiviridae) and animal viruses with linear genomes (Parvoviridae) [47, 83]. Originally, all the known animal viruses with covalently closed circular ssDNA genomes were classified in a single genus, Circovirus, within this family [47]. These animal viruses included avian and swine pathogens, namely beak and feather disease virus (BFDV), chicken anemia virus (CAV) and porcine circovirus (PCV, specifically the virus currently known as PCV-1). However, it is now recognized that animal viruses with circular ssDNA genomes are highly diverse and genome structure alone cannot be used for their taxonomical classification.

The last published report from the International Committee on Taxonomy of Viruses (ICTV), the 9th report reflecting the taxonomy from 2009, lists 12 viral species in the family Circoviridae [4]. These species are divided into two genera: Circovirus, which contains 11 species infecting birds and pigs, and Gyrovirus, which only includes CAV. However, the progressive identification of new genomic sequences similar to members of the genus Circovirus in various animals and environmental samples subsequent to the release of the 9th report has prompted a revision of the classification framework used for this family. Furthermore, structural and genomic data indicate that CAV may represent a different lineage of ssDNA viruses and, thus, that the genus Gyrovirus needed to be removed from the family Circoviridae. Because the genomic features of CAV are more closely aligned with those of ssDNA viruses in the family Anelloviridae, the genus Gyrovirus was reassigned to that family.

In this article we re-visit genomic features that characterize members of the family Circoviridae and provide an update to the taxonomy reported in the ICTV 9th report by: 1) establishing a new genus, Cyclovirus, that accommodates a distinct group of viruses closely related to members of the genus Circovirus; 2) reassigning the genus Gyrovirus to the family Anelloviridae; and 3) redefining the species demarcation criteria for members of the family Circoviridae and implementing a genome-wide pairwise identity based approach to classify known as well as new viruses that have been reported within the past six years.

Conserved genomic features among members of the family Circoviridae

Genus Circovirus

Most of what is known about circoviruses, or members of the genus Circovirus, comes from veterinary science-related research since these viruses are responsible for fatal diseases that affect birds (e.g., BFDV) and swine (e.g., PCV-2) [83]. In fact, until 2010, pigs were the only mammals known to be affected by circoviruses and most of the diversity for this group of viruses was reported from avian species (Table 1). However, studies employing viral metagenomic-based strategies and degenerate PCR for circoviruses in unconventional hosts have since identified the presence of circovirus genomes in freshwater fish [45, 46] and various mammals, including bats [34-36, 42, 91], chimpanzees [34], dogs [37], humans [34] and minks [41]. Although a definitive host has not been confirmed for some of these newly detected circoviruses (e.g., bat-associated circoviruses), phylogenetic analyses indicate that circovirus genomes detected in mammals are, in general, more closely related to each other than to avian circoviruses (Fig. 1).

Table 1 List of viral species classified within the family Circoviridae
Fig. 1

Maximum likelihood (ML) phylogenetic trees of representative sequences from members of viral species within the genus Circovirus (top) and the genus Cyclovirus (bottom). The ML trees were inferred using PhyML [21] with the GTR+G model of substitution after aligning complete genome sequences using the MUSCLE algorithm [15]. Branches with <80% SH-like support have been collapsed. The phylogenetic trees of circoviruses and cycloviruses were rooted after using cyclovirus or circovirus reverse complemented genome sequences as an outgroup, respectively. Species acronyms are defined in Table 1

Circovirus genomes range in size from ~1.8 to ~2.1 kb and are packaged within non-enveloped virions that have an icosahedral T = 1 structure and have an average diameter of ~15 - 25 nm [10, 69, 82, 83]. All circovirus genomes have an ambisense organisation containing open reading frames (ORFs) arranged on different strands of a dsDNA replicative form. Two major ORFs (>600 nt), encoding the replication-associated protein (Rep) on the virion strand and capsid protein (Cp) on the complementary strand of the replicative form, can be readily identified in circovirus genomes [4] (Fig. 2). The Rep, which has sequence motifs characteristic of proteins involved in rolling circle replication (RCR; see below) [28], is the most conserved circovirus protein. On the other hand, the Cp is significantly divergent and is only characterized by an N-terminal region rich in basic amino acids that may provide DNA binding activity [57]. Although Rep- and Cp-encoding ORFs are present in all circovirus genomes, other proteins may also be expressed by several circovirus species. For example, more than six ORFs have been identified in both avian and porcine circovirus genomes (e.g., [2, 23]) and BFDV virions have been consistently found associated with up to three proteins [68, 69]. Notably, porcine circoviruses (PCV-1 and PCV-2) are known to encode a third protein, denominated VP3, with apoptotic capacity [26, 32, 44] as well as a fourth one, ORF4, with a potential anti-apoptotic function [48].

Fig. 2

Genome schematics illustrate the major open reading frames (ORFs) characteristic of members of the Circoviridae (left) and Anelloviridae families (right). Members of the family Circoviridae, including the Circovirus and Cyclovirus genera, have two major ORFs encoding replication-associated (Rep) and capsid (Cp) proteins as well as a conserved nonanucleotide motif marking the origin of replication. The nonanucleotide motif sequence is depicted through sequence probability logos generated in Weblogo 3 [9]. Note that the orientation of major ORFs relative to the nonanucleotide motif differs between genomes representing the Circovirus and Cyclovirus genera. The rep of the Cyclovirus type species, Human associated cyclovirus 8 (HuACyV-8), is interrupted by an intron. Although the presence of introns has been observed in various cyclovirus genomes, this has not been reported for circoviruses (Supplemental Figures 1 and 2). In contrast to the Circoviridae, members of the family Anelloviridae consistently have three major ORFs in a unisense organization. The genera Gyrovirus, which currently includes Chicken anemia virus (CAV) alone, and Alphatorquevirus, with Torque teno virus 1 (TTV-1) as the type species, are shown to highlight similarities among these ssDNA viruses. The CAV-VP1 and TTV-1 ORF1 are thought to represent capsid proteins. Proteins labelled as VP2 and ORF2 in CAV and TTV-1 genome schematics, respectively, contain a motif characteristic of protein tyrosine phosphatases. In addition, the CAV-VP3 and TTV-1 ORF3 proteins exhibit apoptotic activity

The presence of a well-conserved Rep in circovirus genomes suggests that these viruses replicate through RCR, and recombinant expression studies support this idea [7, 16, 51, 77]. Circovirus Reps are characterized by the presence of three RCR motifs at the N-terminus: motif I [FT(L/I)NN], motif II [PHLQG] and motif III [YC(S/x)K], where “x” represents any residue [72] (Fig. 3). In addition to the RCR domain, circovirus Reps contain dNTP-binding or P-loop NTPase domains characteristic of superfamily 3 (SF3) helicases, which are distinguished by the presence of three conserved motifs tightly packed in a domain containing ~100 amino acids [19, 20]. The SF3 helicase motifs found in circovirus Reps include Walker-A [G(P/x)(P/x)GxGK(S/t)], Walker-B [uuDDF], and motif C [uTSN], where “x” represents any residue, “u” represents a hydrophobic amino acid (i.e., F, I, L, V, M), and residues in lower case are observed at lower frequency [72].
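The consensus motifs listed above can be turned into a simple screen for candidate Rep sequences. The following Python sketch is illustrative only: the regular expressions are our own transcription of the consensus notation (every "x" or "(P/x)"-style position is treated as any residue), and the function and variable names are hypothetical. A hit does not by itself establish that a protein is a Rep.

```python
import re

HYDROPHOBIC = "FILVM"  # "u" in the consensus notation

# Assumed transcription of the circovirus Rep consensus motifs into regexes.
# "x" and low-frequency alternatives written as "(P/x)" become "." (any residue).
CIRCOVIRUS_REP_MOTIFS = {
    "RCR I": r"FT[LI]NN",
    "RCR II": r"PHLQG",
    "RCR III": r"YC.K",
    "Walker-A": r"G..G.GK[ST]",
    "Walker-B": rf"[{HYDROPHOBIC}]{{2}}DDF",
    "Motif C": rf"[{HYDROPHOBIC}]TSN",
}


def scan_rep_motifs(protein, motifs=CIRCOVIRUS_REP_MOTIFS):
    """Return {motif name: 1-based start of the first match, or None}."""
    protein = protein.upper()
    hits = {}
    for name, pattern in motifs.items():
        match = re.search(pattern, protein)
        hits[name] = match.start() + 1 if match else None
    return hits
```

Because only the first match of each pattern is reported, short patterns such as motif III may match by chance; checking that the hits occur in the expected N- to C-terminal order would be a sensible additional filter.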

Fig. 3

Sequence probability logos generated using Weblogo 3 [9] highlighting conserved amino acid residues characteristic of rolling circle replication (RCR), including RCR motifs I through III, and superfamily 3 (SF3) helicase motifs, including Walker A and B as well as motif C, found in replication-associated proteins (Rep). The logos were generated using representative sequences from members of the 70 species in the family Circoviridae (Circovirus, n=27; Cyclovirus, n=43). Numbers below Circovirus and Cyclovirus logos indicate the relative amino acid position for each motif based on the type species, including Porcine circovirus 1 (accession number AF071879) and Human associated cyclovirus 8 (accession number KF031466), respectively

In addition to conserved gene synteny and the presence of a Rep with characteristic RCR and helicase motifs, circovirus genomes have two intergenic regions (IR) and a conserved origin of replication (ori) (Fig. 2). The ori lies within the IR between the 5′ ends of the Rep- and Cp-encoding ORFs [50]. The circovirus ori is characterized by a conserved nonanucleotide motif ‘(T/n)A(G/t)TATTAC’ (Fig. 2), where lower case nucleotides are observed at low frequency and ‘n’ represents any nucleotide, located at the apex of a potential stem-loop structure [50, 72]. The Rep introduces a nick in the virion-sense strand between positions 7 and 8 of the nonanucleotide motif (i.e., TAGTATT↓AC), presumably initiating circovirus genome replication through RCR [77].
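Locating the nonanucleotide motif in a circular genome is a small string-search problem, sketched below in Python. The sketch makes several assumptions that are ours rather than part of any published pipeline: the consensus is transcribed as the regular expression `[ACGT]A[GT]TATTAC`, only the first match is used, and the function names are hypothetical. It searches both strands, handles a motif that spans the linearization point, and returns the motif-bearing strand rotated so that it begins at position 1 of the nonamer.

```python
import re

NONAMER = re.compile(r"[ACGT]A[GT]TATTAC")  # assumed form of (T/n)A(G/t)TATTAC
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_nonamer(genome):
    """Search both strands of a circular genome for the nonamer.

    Returns (strand, 0-based start on that strand), or None if absent.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        # Append the first 8 nt so a motif spanning the junction is found.
        match = NONAMER.search(seq + seq[:8])
        if match:
            return strand, match.start()
    return None


def rotate_to_nonamer(genome):
    """Return the motif-bearing strand, starting at motif position 1."""
    hit = find_nonamer(genome)
    if hit is None:
        raise ValueError("no nonanucleotide motif found")
    strand, start = hit
    seq = genome.upper()
    if strand == "-":
        seq = reverse_complement(seq)
    return seq[start:] + seq[:start]
```

In the rotated sequence the nick site described above falls between indices 6 and 7 (0-based), that is, between `seq[:7]` and `seq[7:]`. A chance occurrence of the nonamer elsewhere in the genome is possible, so a hit should be confirmed against the surrounding stem-loop context.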

Genus Cyclovirus, a newly established taxon

In 2010, a group of viruses most closely related to circoviruses was discovered through viral metagenomic analysis and degenerate PCR in stool samples from primates (humans and chimpanzees) and meat products from various animals (camels, chickens, cows, goats, and sheep) [34]. These viruses were tentatively named cycloviruses to distinguish them from the already known circoviruses, while still highlighting the circular topology of their genomes [34]. Cycloviruses are closely related to circoviruses and share genomic features with this well-known group of animal pathogens [14]. However, phylogenetic and genomic differences between circoviruses and cycloviruses justified the creation of a second taxonomic unit, the newly established genus Cyclovirus, within the family Circoviridae.

In contrast to circoviruses, cycloviruses have been found associated with both vertebrates and invertebrates. Although cycloviruses were discovered in stool samples from primates [34], cyclovirus genomes have now been reported from a diversity of specimens including mammals (bats, cats, cows, goats, horses, squirrels, sheep) [18, 34, 36, 38, 43, 49, 73, 91, 92], birds (chickens) [34, 36], and insects (cockroaches and dragonflies) [11, 59, 70, 71] (Table 1). Additionally, a diversity of cyclovirus genomes have been recovered from human samples other than faeces [17, 34, 80], including cerebrospinal fluid [80], blood serum [76], and respiratory secretions [63]. However, it has been difficult to identify a definitive host for most, if not all, cycloviruses since these viruses have only been identified through metagenomic analysis and degenerate PCR. This is further complicated by the fact that many cycloviruses have been discovered from guts and fecal samples (Table 1), which may include viruses that are dietary in origin. Moreover, phylogenetic analysis of cyclovirus genomes did not reveal any clusters by the type of organism from which they were identified (Fig. 1).

Similar to circoviruses, cycloviruses have small genomes (~1.7 to 1.9 kb) that contain two major ORFs encoding the Rep and Cp, organized on different strands of a dsDNA form (Fig. 2) [34, 72]. Both the Rep and Cp of cycloviruses have features similar to those of their corresponding circovirus proteins (reviewed by [14, 72]). The cyclovirus Rep contains motifs similar to circovirus RCR and SF3 helicase motifs, including RCR motifs I [FT(L/W)NN], II [(P/x)HLQG] and III [Y(C/l)(S/x)K] and SF3 helicase motifs Walker-A [G(P/x)(P/t)(G/x)xGKS], Walker-B [uuDDF], and motif C [uTS(N/e)], where “x” represents any type of residue and “u” represents a hydrophobic amino acid (i.e., F, I, L, V, M). The cyclovirus putative Cp is also characterized by an N-terminal region rich in basic amino acids, as seen in circovirus Cps. Furthermore, the putative cyclovirus ori is marked by the same conserved nonamer observed in circoviruses (‘TAGTATTAC’) located at the apex of a potential stem-loop structure (Fig. 2) [72].

Despite these similarities, phylogenetic analyses of the Rep and Cp show that circoviruses and cycloviruses form distinct clades (Fig. 4). Furthermore, there are key genomic features that distinguish cycloviruses from circoviruses (Fig. 2). Cyclovirus genomes contain an IR between the 5′ ends of Rep- and Cp-encoding ORFs; however, the IR between the 3′ ends of these major ORFs is either absent or consistently smaller than that observed in circovirus genomes [34, 72]. Additionally, cycloviruses and circoviruses might employ slightly different replication and transcription strategies since their genome coding regions are organized differently. Although the cyclovirus virion strand has not been empirically determined, it is predicted that these viruses encapsidate the strand containing the conserved nonanucleotide motif based on what has been shown for other eukaryotic Rep-encoding, ssDNA viruses [72]. Using the nonanucleotide motif as a point of reference, the genome organization of cycloviruses appears to be a mirror image of that seen in circoviruses. The putative ori, which is marked by the nonanucleotide motif, is located on the Rep-encoding strand of circoviruses, whereas in cycloviruses it is located on the Cp-encoding strand [72]. In addition to differences in genomic arrangement, introns have been identified within the ORFs of several cyclovirus genomes, whereas this has not been reported for circoviruses (Supplemental Figures 1 and 2).
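The mirror-image arrangement described above reduces to a single comparison once the strand of the nonanucleotide motif and the coding strands of the two major ORFs are known. The Python sketch below is a hypothetical helper (its name, inputs and return labels are ours) and deliberately returns "-like" labels, since genus assignment also depends on Rep identities and phylogenetic support.

```python
def genus_from_orientation(nonamer_strand, rep_strand, cp_strand):
    """Suggest a genus from the position of the ori relative to the ORFs.

    Strands are '+' or '-' relative to the submitted sequence; for each
    ORF, the strand given is the one that reads as the coding strand.
    """
    for strand in (nonamer_strand, rep_strand, cp_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand label: {strand!r}")
    if rep_strand == cp_strand:
        # Not ambisense, so the circovirus/cyclovirus scheme does not apply.
        return "unresolved"
    if nonamer_strand == rep_strand:
        return "Circovirus-like"  # ori on the Rep-encoding strand
    return "Cyclovirus-like"      # ori on the Cp-encoding strand
```

For example, a genome with the nonamer and rep on the same strand and cp on the opposite strand would be flagged as Circovirus-like.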

Fig. 4

Maximum likelihood (ML) phylogenetic trees of representative replication-associated protein (Rep; left) rooted with Reps of closely related sequences (GenBank accession #s JX904473, KC248418 and KJ547623) and mid-point rooted capsid protein (Cp; right) amino acid sequences of representatives from viral species within the Circovirus (blue font) and Cyclovirus (purple font) genera. The ML trees were inferred using PhyML [21] with the LG model of substitution after aligning amino acid sequences using the MUSCLE algorithm [15]. Branches with <80% SH-like support have been collapsed. Species acronyms are defined in Table 1

Reassignment of the genus Gyrovirus to the family Anelloviridae

Since CAV, BFDV, and PCV were the first described animal viruses with circular ssDNA genomes, studies sought to better understand the relationship between these three viruses. As early as 1991 it was noted that CAV virions had marked physicochemical and morphological differences when compared to BFDV and PCV, which were indistinguishable under the same conditions [82]. Therefore, even before animal viruses with circular ssDNA genomes were classified, it was considered ‘inadvisable’ to place CAV in the same family as BFDV and PCV based on comparative electron microscopy [82]. Despite these observations, CAV became the type species of the family Circoviridae when it was originally reported as an official virus family in 1995 with the release of the ICTV 6th Report [47]. Nevertheless, early on it was recognized that CAV did not share the same characteristics as BFDV and PCV and thus a second genus, Gyrovirus, was created within the family Circoviridae in 1999 to accommodate CAV [27]. The genus Gyrovirus, with CAV as its sole member, remained in the family for more than a decade. However, mounting genomic, molecular, and structural data indicated that CAV represents a different lineage of ssDNA viruses as originally suspected, thus warranting the placement of the genus Gyrovirus in a separate family.

Although CAV virions are non-enveloped and icosahedral, they are larger than circovirus virions and have a unique structure with protruding pentagonal shaped units compared to flat pentameric units observed in circoviruses [10, 82]. In addition to appearing structurally unrelated to circoviruses, CAV has a genomic architecture that is radically different from that of circoviruses. The CAV genome has a negative-sense organization, containing three major overlapping ORFs in the virion or genomic strand [31, 58] (Fig. 2). Moreover, CAV genomes do not encode an identifiable Rep and lack the conserved nonanucleotide motif that marks the ori of circoviruses and cycloviruses [53, 57, 72].

Gyroviruses do not appear to be either structurally or genetically related to members of the family Circoviridae. Instead, the genomic features of CAV are reminiscent of members of the family Anelloviridae (Fig. 2). The family Anelloviridae, which accommodates 12 genera of vertebrate-infecting viruses, currently represents the only other family of animal viruses with covalently closed, circular ssDNA genomes [5]. Although sequence variability within anelloviruses is extremely high, both at the nucleotide and amino acid levels, their genomes have common features [3]. It is important to note that the common genomic features that justified the creation of the family Anelloviridae [3] are also observed in CAV genomes. Similar to CAV, anellovirus genomes have a negative-sense organization with overlapping ORFs of similar relative sizes [3, 12, 54, 79]. In addition, anellovirus and CAV genomes contain GC-rich stretches in non-coding regions where replication is thought to initiate due to the presence of potential stem-loop structures [2, 13]. The non-coding or untranslated regions of CAV and some anellovirus genomes also contain similar regulatory elements and share high identity within a 36-nt stretch [25, 54]. Moreover, CAV and some anelloviruses have similar transcription patterns [25, 30].

Although CAV and anelloviruses do not have significant sequence similarities, these viruses may encode proteins with similar functions. The largest ORF observed in CAV and anellovirus genomes encodes a putative structural protein, named VP1 and ORF1, respectively. Both VP1 and ORF1 are characterized by an N-terminal region rich in basic amino acids, which resembles Cps encoded by members of the family Circoviridae [3, 57]. Biochemical characterization of CAV virions revealed VP1 as the sole Cp component for this gyrovirus [81]; however, this has not been conclusively demonstrated for the anellovirus ORF1. CAV and anellovirus genomes also have two other ORFs that encode proteins with similar characteristics and are of comparable sizes. The VP2 and ORF2 of CAV and anellovirus genomes, respectively, encode a protein containing a motif characteristic of protein tyrosine phosphatases (i.e., WX7HX3CXCX5H, where ‘X’ represents any residue and each number gives the count of consecutive X residues) [60]. In addition, the CAV-VP3 and ORF3 of some anelloviruses encode a protein with comparable apoptotic capacity [33, 65].

Based on similarities in genome architectures and transcription profiles, gyroviruses are more closely related to members of the family Anelloviridae than to those of the Circoviridae. Therefore, the genus Gyrovirus has been reassigned to the former family. Until 2011, CAV was the only described gyrovirus species. However, other potential members of the genus Gyrovirus have been recently discovered in humans and birds. These novel gyroviruses (GyVs) have been identified in human skin (human gyrovirus (HGyV) [74]) and feces (GyV3 through GyV6; [8, 61, 62]), chicken serum (avian gyrovirus 2 (AGV2) [66]) and meat (GyV4 and GyV7; [8]) as well as spleen and uropygial gland tissues from sea birds (GyV8; [39]). Since HGyV and AGV2 may represent the same species [61], the genus Gyrovirus should soon be updated to reflect at least eight viral species.

Revised species demarcation criteria for the family Circoviridae

The ICTV 9th Report specified species demarcation criteria for members of the family Circoviridae (i.e., genus Circovirus) of <75% genome-wide identity and <70% amino acid identity for the Cp [4]. However, these criteria were based on the distribution of pairwise identities derived from global alignments, which may be inaccurate due to inconsistencies introduced during alignment and gap-handling issues [55]. Furthermore, the above-mentioned species demarcation criteria did not include members of the newly established genus Cyclovirus and circovirus-related genomes that have been reported since 2009. Therefore, there is a need to re-evaluate the species demarcation criteria for members of the family Circoviridae. Here we use a pairwise identity distribution analysis to establish species demarcation criteria to classify viruses within the family Circoviridae.

Pairwise identities derived from aligning pairs of genome sequences individually have been recently used as a more standardized and accurate method to calculate genome identity scores [55, 56]. Hence, we reanalyzed and calculated genome-wide pairwise identities for all the circovirus and cyclovirus genomes available in public databases through June 2016 using SDT v1.2 [56] (Supplemental Figures 3 and 4; Supplemental File 1). The analysis shows that a pairwise identity species cut-off between 78% and 80% is best suited for both circoviruses and cycloviruses (Fig. 5). Since PCV-1 and PCV-2 share 79% genome-wide pairwise identity (Supplemental Figure 3) and these viruses have been historically considered to represent separate species due to marked biological differences (i.e., pathogenicity), we have delineated the species demarcation threshold at 80%. Therefore, the newly established species demarcation criteria will maintain the current classification for all previously characterized circovirus species, including the closely related PCV-1 and PCV-2. The 80% genome-wide pairwise identity cut-off is well supported phylogenetically for both circoviruses and cycloviruses (Fig. 1). As a general rule, viruses that have <80% genome-wide pairwise identities with other members of the family Circoviridae coupled with phylogenetic support should be considered distinct species.
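To make the threshold concrete, the Python sketch below scores an already-aligned pair of sequences and applies the 80% cut-off. It is a minimal illustration under stated assumptions, not a reimplementation of SDT: identity is taken as the percentage of matching positions among columns where neither sequence has a gap, the pair is assumed to have been aligned beforehand (e.g., with MUSCLE), and a value of exactly 80% is treated as the same species. The function and constant names are hypothetical.

```python
SPECIES_THRESHOLD = 80.0  # percent genome-wide pairwise identity


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped.

    Assumption: gapped columns are ignored rather than counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_species(identity_percent, threshold=SPECIES_THRESHOLD):
    """True when the identity meets the species demarcation threshold."""
    return identity_percent >= threshold
```

Under this rule a pair sharing 79% identity, as reported above for PCV-1 and PCV-2, falls below the threshold and is treated as two species.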

Fig. 5

Distribution of pairwise identities among members of the genus Cyclovirus (purple bars; left) and the genus Circovirus (blue bars; right). Plots reflect pairwise identities based on calculations for complete genome sequences (top) as well as the replication-associated (rep; middle) and capsid (cp; bottom) genes. All pairwise identities were calculated using the Sequence Demarcation Tool version 1.2 [56] with the MUSCLE alignment algorithm [15]

The 80% identity threshold for distinct species generally holds true for pairwise comparisons of either the cp or rep gene sequences (Fig. 5). However, only complete genomes should be considered for assignment of new species in the event that they share <80% pairwise identity with any classified species within the family Circoviridae. Notably, pairwise comparisons between circoviruses and cycloviruses indicate that members of the family Circoviridae can have overlapping pairwise identity ranges based on the Rep. Cyclovirus Reps may share up to 50% amino acid pairwise identity with those of some circoviruses, which is higher than pairwise identities observed among cycloviruses (as low as 36%) and circoviruses (as low as 40%) alone (Supplemental Figure 5). Therefore, the classification of newly identified viruses with significant similarities to either circoviruses or cycloviruses should be primarily based on genomic characteristics and genome-wide pairwise identities. Rep identities and genome organization (specifically the location of the ori relative to coding regions) should be used to identify a genome as belonging to either the Circovirus or Cyclovirus genus. Once this distinction is made, circovirus and cyclovirus nucleotide sequences should be analysed independently for pairwise identity comparisons and taxonomic classification. Otherwise, obtaining a reliable alignment of complete genome sequences is extremely difficult since cycloviruses and circoviruses have a different genome organization. Even if cyclovirus genomes could be aligned to circoviruses to some degree by reverse complementing cyclovirus sequences, such an approach is not recommended since the Cp is highly divergent compared to the Rep.

After determining that 80% genome-wide pairwise identity was a suitable threshold for species demarcation, we classified all of the Circoviridae-related genomes available in GenBank through the end of June 2016 (Table 1). Note that the original published names for many of the genomes have been changed for consistency and that viruses for which a definitive host has not been identified are denoted by the presence of the word ‘associated’ in the species name. The latest analysis indicates that there are 27 distinct circovirus species. PCV-2 is, by far, the species with the most reported genome sequences, followed by BFDV, reflecting the interest in these animal pathogens over the years (Fig. 6). After the ICTV ratification of circovirus species listed here, a third porcine circovirus species, named PCV-3, was discovered [40]. The PCV-3 genome sequence was not included in the analyses presented in this taxonomy update; however, preliminary analyses indicate that PCV-3 indeed represents a new circovirus species. Although cycloviruses were discovered more recently than circoviruses, considerably more species have been reported to date from the genus Cyclovirus (n = 43) than from the genus Circovirus (n = 27). It is possible that cycloviruses are more diverse and widespread than circoviruses since the former have been reported from both vertebrate and invertebrate animals, whereas circovirus genomes have only been found associated with birds, freshwater fish, and mammals.

Fig. 6

Diversity of variants within each assigned species in the Circovirus (blue font) and Cyclovirus (purple font) genera. The total number of isolates within the species is given in parentheses by each species name. Details regarding genome sequences used for this analysis are provided in Supplemental File 1

Recommendations for classifying Circoviridae sequences

As more metagenomic analyses are performed on a diversity of organisms and environments, it is expected that novel circoviruses and cycloviruses will continue to be identified. In an effort to standardize the taxonomy framework for the family Circoviridae, here we outline a few guidelines regarding how to analyse and report genomes representing members of this family.

  1) Genome sequence verification. ICTV will now consider classifying viral species based on complete genome sequences derived from metagenomic analyses [75]. Since the genomes of members of the family Circoviridae are relatively small (~2 kb) and have a circular topology, efforts should be made to close genomes and, whenever possible, verify genomes through inverse PCR. One of the biggest limitations of metagenomically-derived genomes is that there are no isolates of the viruses being described. Amplifying complete genomes through inverse PCR and cloning before sequencing will provide a biological archive of described viral genomes within plasmids. Additionally, working with PCR-verified genomes will ensure that the reported sequences are not chimeric entities assembled from metagenomic data.

  2) Submitting sequences to GenBank. It is important to provide the correct annotation for major proteins (Rep and Cp) when submitting genome sequences to GenBank, including the identification and removal of introns. In addition, for consistency, all genomes should be submitted starting with the first nucleotide of the nonanucleotide motif. This will ensure that the genomes are reported in the correct orientation, which is a key feature for distinguishing between circovirus and cyclovirus genomes. All species level representative sequences analysed for this manuscript have been provided as supplemental material following the format indicated above, including complete genomes (Supplemental Files 2A and 2B) as well as Rep (Supplemental Files 3A and 3B) and Cp (Supplemental Files 4A and 4B) coding sequences.

  3) Naming species. The following basic rules should be implemented when naming viral species:

    a. If a given genome has >80% genome-wide pairwise identity with a genome sequence from a member of a previously described viral species, the name from the existing species should be adopted and a specific isolate name should be provided. For example, a newly identified virus in foxes that has a genome-wide pairwise identity of >80% with sequences from members of the species Canine circovirus could be named ‘Canine circovirus [fox-associated isolate 1]’.

    b. If a novel virus representing a new species is being described (i.e., the genome has <80% genome-wide pairwise identity with known viral sequences) the word ‘associated’ should be added as a modifier to the species name, unless there is biological data identifying a definitive host (e.g., Bat associated cyclovirus # species).

  4) Resolving species assignment conflicts. There might be instances when it is difficult to classify a new sequence based on the established species demarcation criteria. Here we provide guidelines on how to approach conflicts based on recommendations for classifying species from the family Geminiviridae, another group of Rep-encoding circular, ssDNA viruses [6, 55, 87, 88]. Species assignments might be uncertain when:

    a. A given sequence shares >80% genome-wide pairwise identity with viral sequences within a particular species, but <80% identity with other isolates of that same species. This will lead to more than 20% sequence divergence among variants of a single species (e.g., HuACyV-8, see Fig. 6). To resolve this conflict, the new sequence should be classified within any species with which it shares >80% identity based on a given classified isolate, even if it is <80% identical to other isolates within that species.

    b. A given sequence shares >80% genome-wide pairwise identity with viral sequences assigned to two or more different species. In such cases, it is suggested that the new sequence be considered as belonging to the species with which it shares the highest degree of similarity.

  5) Reporting non-Circoviridae genomes. A growing number of circular Rep-encoding ssDNA (CRESS DNA) viruses have been reported from various organisms and environments [14, 72] and it is expected that more genomes will continue to be reported. Many of these novel CRESS DNA genome sequences have similarities with members of the family Circoviridae based on the Rep. It is important to note that only sequences that clearly represent members of the Circovirus or Cyclovirus genera, based on genome features discussed here, should be classified as part of the family Circoviridae. If novel sequences with best matches to circovirus or cyclovirus genomes in GenBank share <55% genome-wide pairwise identity (Supplemental Figures 3 and 4) with those sequences, they should be considered as unclassified CRESS DNA viral sequences for the time being. Although additional genera in the family Circoviridae may be created in the future, this level of diversity (i.e., 45% divergence) is similar to the diversity that has been observed in other CRESS DNA viral families, such as the Geminiviridae and Genomoviridae [89].
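The identity-based parts of guidelines 3 to 5 above can be summarized as a short decision procedure. The Python sketch below is one possible reading of those rules and is offered only as an illustration: the function name, the input layout (a mapping from species name to the identities shared with each classified isolate of that species) and the returned labels are hypothetical, boundary values of exactly 80% are treated as meeting the threshold, and the sketch ignores the genome features and phylogenetic support that the guidelines also require.

```python
def classify(identities, species_cutoff=80.0, family_floor=55.0):
    """Suggest a species-level outcome from genome-wide pairwise identities.

    identities: {species name: [percent identity to each classified isolate]}
    Comparisons are assumed to be restricted to the genus (Circovirus or
    Cyclovirus) to which the query genome was already assigned.
    """
    # Rule 4a: a single isolate above the cut-off is enough, so keep the
    # best identity per species.
    best = {sp: max(vals) for sp, vals in identities.items() if vals}
    if not best:
        raise ValueError("no pairwise identities supplied")
    # Rule 4b: when several species exceed the cut-off, take the closest one.
    top_species = max(best, key=best.get)
    top_identity = best[top_species]
    if top_identity >= species_cutoff:
        return ("existing species", top_species)
    if top_identity < family_floor:
        return ("unclassified CRESS DNA virus", None)
    return ("candidate new species", None)
```

For instance, a genome sharing 85.2% identity with one Canine circovirus isolate and 78.0% with another would still be assigned to Canine circovirus, in line with guideline 4a.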

Concluding remarks

The growing number of genomes representing members of the family Circoviridae in the database required a re-evaluation of the taxonomy classification framework for this family. Here we have reviewed genomic features that characterize members of the family Circoviridae, outlined guidelines on how to analyse genome sequences, and provided a current list of circovirus and cyclovirus species based on the newly established species demarcation criteria. These efforts should facilitate future analyses geared towards elucidating evolutionary relationships among classified as well as newly identified members of the family Circoviridae.