With the advent of metagenomics approaches, a large diversity of unknown viruses has been uncovered in various environmental, plant, and animal samples [23]. Sampling of animal faecal matter has proved to be particularly efficient for the discovery of a wide variety of novel viral types, in particular those with small DNA genomes. Until recently, the circular replication-initiation protein-encoding single-stranded (CRESS) DNA viruses associated with eukaryotic hosts were classified by the International Committee on Taxonomy of Viruses (ICTV) into four families, namely Circoviridae, Genomoviridae, Geminiviridae and Nanoviridae. In 2018, the ICTV created two new families for the classification of CRESS DNA viruses, Bacilladnaviridae and Smacoviridae. The family Bacilladnaviridae, which includes viruses infecting diatoms, a major group of unicellular algae widespread in aquatic habitats, has been described elsewhere [7]. Here, we introduce the family Smacoviridae (smaco- stands for small circular DNA viruses) and describe the ICTV-approved sequence-based taxonomic framework for the classification of these viruses.

Smacoviruses [15, 17], previously also referred to as chipoviruses [22, 24], have been identified in the faecal matter of various vertebrates, including humans, as well as in the abdomina of dragonflies of two species (Table 1). Thus far, none of these viruses have been cultured or found in animal tissue samples. Nonetheless, the viruses have been discovered using viral metagenomics approaches, and for the majority of the smacoviruses the validity of the genome sequences has been verified either by PCR amplification using abutting primers followed by Sanger sequencing of the products, or by amplification, cloning and Sanger sequencing of the recombinant plasmids [1, 3, 4, 9, 15, 22, 24].

Table 1 Summary of taxa that are part of the family Smacoviridae

The genomes of currently identified smacoviruses are ~2300-2900 nucleotides long and contain two major open reading frames (ORFs), encoding the rolling circle replication-associated protein (Rep) and the capsid protein (CP; Figure 1). In all 83 smacovirus genomes, the two ORFs are bidirectionally organised and separated by two intergenic regions. Similar to other CRESS DNA viruses, smacoviruses contain a conserved nonanucleotide sequence located at a putative stem-loop structure at the origin of replication (Figure 1), where nicking of the dsDNA replicative intermediate is predicted to occur. The Reps of smacoviruses are homologous to, but phylogenetically distinct from, those of classified CRESS DNA viruses (Figure 2). Phylogenetic analysis and comparison of the conserved sequence motifs suggest a closer evolutionary relationship between the smacovirus and nanovirus Reps [7]. By contrast, although conserved among smacoviruses, the CPs do not display recognizable sequence similarity to the CPs of other known viruses.

Fig. 1 Genome organization of a representative smacovirus (chimpanzee associated porprismacovirus 1 [GQ351272]) and a WebLogo of the nonanucleotide motif found in smacoviruses

Fig. 2 Unrooted approximate maximum likelihood phylogenetic tree of Reps of CRESS DNA viruses inferred using FastTree [18]. Major groups of classified CRESS DNA viruses as well as alphasatellites associated with geminiviruses and nanoviruses are colour coded

Analysis of the genome-wide pairwise identities of the 83 smacoviruses (Figure 3) shows 45% diversity amongst these genomes, which is similar to values determined for members of the families Geminiviridae [27] and Genomoviridae [12, 25]. The plot of the distribution of pairwise identities shows a trough between 76 and 88%. Hence, for this group of viruses, 77% genome-wide pairwise identity is chosen as a species demarcation threshold. Using this approach, the 83 smacoviruses were assigned to 43 species (Table 1).
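The species assignment described above can be illustrated with a minimal sketch. It assumes that a table of genome-wide pairwise identities (for example, as exported from a tool such as SDT) is already available, and it groups genomes by single-linkage clustering at the 77% threshold. The clustering strategy, the use of ">=" at the boundary, the function name `cluster_by_identity` and the toy identity values are assumptions made for illustration; they are not taken from the study.

```python
from itertools import combinations

# Species demarcation threshold from the text (percent genome-wide identity).
SPECIES_THRESHOLD = 77.0


def cluster_by_identity(names, identity, threshold=SPECIES_THRESHOLD):
    """Group genomes by single linkage: any pair at or above the threshold
    is merged into the same candidate species (illustrative assumption)."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        # The table may store a pair in either order.
        score = identity.get((a, b), identity.get((b, a)))
        if score is not None and score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical genomes and identity values, for illustration only.
names = ["g1", "g2", "g3", "g4"]
identity = {
    ("g1", "g2"): 91.5, ("g1", "g3"): 62.0, ("g1", "g4"): 55.0,
    ("g2", "g3"): 60.3, ("g2", "g4"): 54.1, ("g3", "g4"): 80.2,
}
species = cluster_by_identity(names, identity)
print(species)
```

With these toy values, g1 and g2 fall into one candidate species and g3 and g4 into another. Single linkage is only one possible way to resolve borderline cases, where a genome is above the threshold with some members of a group and below it with others.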

Fig. 3 Distribution of pairwise identities of the full genome (upper panel), the replication initiation protein (middle panel) and the capsid protein (lower panel) sequences determined using SDT v1.2 [14]. The arrows indicate the thresholds of the full genome (upper panel) and Rep protein (middle panel) pairwise sequence identities used as the species and genus demarcation criteria, respectively

Maximum likelihood phylogenetic analysis of the Rep sequences of all 83 smacoviruses reveals four main clusters with >90% branch support and two singletons (Figure 4). Rep sequences within each of the four clades generally share >40% pairwise identity, whereas sequences from different phylogenetic clades show less than 40% identity to each other. We note that phylogenetic trees produced using complete genome (Figure 5) and CP (Figure 6) sequences are not congruent with the Rep phylogeny, presumably due to intra-familial recombination between different smacovirus genomes, resulting in chimeric entities encoding Rep and CP with different evolutionary histories, as has also been observed for genomoviruses [25]. Given that smacovirus Reps are considerably more conserved than CPs (Figure 3) and that Reps are the only proteins shared across all CRESS DNA viruses [11, 20], genera were established based on the phylogenetic analysis of the Rep sequences coupled with their pairwise sequence identity. Accordingly, 40% Rep amino acid sequence identity coupled with strong phylogenetic support is proposed as a genus-level demarcation threshold.
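The identity part of the genus criterion can be sketched as follows. The example computes percent identity between two already-aligned Rep sequences, skipping alignment columns in which either sequence has a gap, and compares the result with the 40% threshold. This gap handling is one common definition and is assumed here; the exact calculation performed by SDT may differ. The function names and the short toy sequences are hypothetical, and the check does not cover the phylogenetic support that the genus criterion also requires.

```python
# Genus demarcation threshold from the text (percent Rep amino acid identity).
GENUS_THRESHOLD = 40.0


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_genus_identity(aligned_a, aligned_b, threshold=GENUS_THRESHOLD):
    """True if the pair reaches the identity threshold; tree support is checked separately."""
    return pairwise_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments, invented for illustration (not real Rep sequences).
rep_a = "MKT-AYLLV"
rep_b = "MRTQAY-IV"
score = pairwise_identity(rep_a, rep_b)
print(round(score, 1), meets_genus_identity(rep_a, rep_b))
```

In the toy alignment, seven columns are ungapped and five of them match, so the pair passes the identity check. Whether two viruses belong to the same genus would still depend on their placement in the Rep phylogeny.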

Fig. 4 Maximum likelihood phylogenetic tree of the Rep amino acid sequences of the smacoviruses inferred using PhyML [6] with the LG+G+I+F substitution model. The tree is rooted with the Rep sequences of nanoviruses

Fig. 5 Maximum likelihood phylogenetic tree of the genome sequences inferred using IQ-TREE [16] with the K3Pu+I+G4 substitution model. Branches with <60% bootstrap support have been collapsed and the tree is mid-point rooted

Fig. 6 Maximum likelihood phylogenetic tree of the CP amino acid sequences of the smacoviruses inferred using PhyML [6] with the LG+G+I+F substitution model. The phylogenetic tree is mid-point rooted

The naming practice for smacoviruses and other uncultivated CRESS DNA viruses, such as genomoviruses [25], typically involves adoption of the name of an organism or material from which the virus genome has been sequenced. In the absence of evidence of actual infection, the word “associated” is usually added to the potential host name to emphasize that the organism may or may not be the actual host. As a case in point, it has been recently suggested that dsRNA viruses of the family Picobirnaviridae, which for three decades were considered to infect eukaryotes [5], might instead replicate in bacteria that populate the enteric tract of animals [10]. Thus, utmost caution should be exercised when assigning viruses to potential hosts.


The following names for the six genera within the Smacoviridae have been adopted:

Bovismacovirus: Bovine smacovirus; 3 species (Table 1)

Drosmacovirus: Dromedary smacovirus; 3 species (Table 1)

Huchismacovirus: Human and chicken smacovirus; 7 species (Table 1)

Porprismacovirus: Porcine and primate smacovirus; 28 species (Table 1)

Cosmacovirus: Cow smacovirus; 1 species (Table 1)

Dragsmacovirus: Dragonfly smacovirus; 1 species (Table 1)


We note that the species Sheep associated porprismacovirus 3, Bovine associated huchismacovirus 1 and Bovine associated huchismacovirus 2 have only tentatively been assigned to the genera Porprismacovirus and Huchismacovirus. It is highly likely that, as more sequences become available, new genera will have to be created for these divergent smacoviruses (Figures 4 and 5).

The sequence-based taxonomic framework employed here for smacoviruses, and previously applied to genomoviruses [25], should guide the classification of the astonishing diversity of other uncultured CRESS DNA viruses described by metagenomic approaches.