The application of pyrosequencing technology to the study of the pig virome has led to the discovery of novel viruses that may or may not play a role in disease [8]. Recently, previously unknown single–stranded (ss) circular DNA viruses similar to chimpanzee stool–associated circular virus (ChiSCV) [1] were identified in fecal samples collected from sick or healthy pigs. The genomes of these pig (or porcine) stool–associated circular viruses (PigSCV and PoSCV1) [7, 9] are about 2.5 kilobases (kb) in size and contain two major open reading frames (ORFs), which encode the capsid protein (Cap) and the replication initiator protein (Rep). Both PigSCV and PoSCV1 contain a palindromic sequence capable of forming a stem–loop structure in the small intergenic region (SIR). This suggests that they may replicate their genomes by the rolling–circle mechanism. The Rep and Cap of PigSCV are encoded on the same DNA strand. In contrast, the Rep and Cap genes of PoSCV1 are transcribed in opposite orientations, diverging from the large intergenic region (LIR).

In this study, diarrheal fecal samples were collected from sick pigs (1 day to 6 weeks of age) with no common background on various Midwest farms in the United States. The samples had been submitted to the Indiana Animal Disease Diagnostic Laboratory, Purdue University, West Lafayette, Indiana, between December 2009 and June 2010. Routine diagnostic analysis detected rotaviruses, coronaviruses and enteroviruses in these samples, which were subsequently submitted to the National Animal Disease Center for additional study. Fecal samples from six pigs were pooled and processed to prepare a viral nucleic acid library. Briefly, viral particles were first purified by size filtration and nuclease treatment [8]. The extracted viral nucleic acids were amplified by random PCR incorporating a specific nucleotide sequence tag for identification. Several libraries, each prepared with a different sequence tag, were combined, subjected to 454 pyrosequencing and analyzed as described previously [4]. Sequencing was performed on a Roche FLX sequencer using Titanium chemistry (Roche, Branford, CT). For comparative purposes, the best BLASTx results were used to assign the sequences (contigs and singletons) to virus families and genera.
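Because several tagged libraries were sequenced together, each read has to be assigned back to its source library by its tag. The sketch below illustrates this step. It assumes that the tag is an exact‑match prefix at the start of each read. The tag sequences and library names are hypothetical, because the text does not give them, and real pipelines usually tolerate sequencing errors in the tag.

```python
from collections import defaultdict

def demultiplex(reads, tags):
    """Assign reads to libraries by an exact-match tag prefix (an assumption).

    reads: iterable of read sequences
    tags: dict mapping tag sequence -> library name (hypothetical values)
    Returns (dict of library -> list of tag-trimmed reads, list of unassigned reads).
    """
    by_library = defaultdict(list)
    unassigned = []
    for read in reads:
        for tag, library in tags.items():
            if read.startswith(tag):
                # Trim the tag so downstream analysis sees only the viral insert
                by_library[library].append(read[len(tag):])
                break
        else:
            # No tag matched this read
            unassigned.append(read)
    return dict(by_library), unassigned

# Hypothetical tags and reads, for illustration only
tags = {"ACGAGTGC": "pooled_fecal", "ACGCTCGA": "other_library"}
reads = ["ACGAGTGCTTGACCA", "ACGCTCGAGGGTTT", "TTTTTTTTGGG"]
libs, unknown = demultiplex(reads, tags)
print({name: len(seqs) for name, seqs in libs.items()}, len(unknown))
```

Exact prefix matching only works when no tag is a prefix of another tag. Error‑tolerant matching would be needed for noisy reads.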

Of the 1,296,370 total keypass reads generated, 125,282 carried sequence tags belonging to the six pooled fecal samples. Reads were assigned to a taxonomic group on the basis of deduced protein sequence similarity, using a stringent cutoff: a best BLASTx expectation value (E–value) of ≤ 10⁻¹⁰. Reads passing this cutoff showed highly significant protein sequence similarity to known viruses in the database. Viral sequences (coronavirus, enterovirus, rotavirus) corresponding to the viruses identified by the Indiana Animal Disease Diagnostic Laboratory (West Lafayette, IN) were detected. Sequences belonging to other RNA viruses (astrovirus, picobirnavirus, teschovirus, torovirus and sapelovirus) and DNA viruses (anellovirus, circovirus and parvovirus) were also observed. In addition, several sequences encoding proteins related to the Rep proteins of ChiSCV and PoSCV1 were identified.
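The E–value cutoff and best‑hit assignment described above can be sketched as follows. The sketch assumes that BLASTx results are in BLAST's standard 12‑column tabular format (`-outfmt 6`), where the 11th column is the E–value. The subject names and the subject‑to‑taxon lookup table are hypothetical placeholders. The study's actual pipeline [4] may differ.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10  # the <= 10^-10 threshold used in the text

def best_hits(blast_tab, cutoff=EVALUE_CUTOFF):
    """Keep, per query read, the lowest-E-value hit if it passes the cutoff.

    blast_tab: text in BLAST tabular format (-outfmt 6); column 11 is the E-value.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best

def classify(best, subject_to_taxon):
    """Count reads per taxon; subject_to_taxon is a hypothetical lookup table."""
    counts = {}
    for subject, _ in best.values():
        taxon = subject_to_taxon.get(subject, "unclassified")
        counts[taxon] = counts.get(taxon, 0) + 1
    return counts

def make_row(query, subject, evalue):
    # Placeholder values for the tabular columns this sketch ignores
    return "\t".join([query, subject, "45.0", "100", "50", "2",
                      "1", "300", "1", "100", evalue, "120"])

blast_tab = "\n".join([
    make_row("read1", "rep_hit", "1e-30"),
    make_row("read1", "cap_hit", "1e-12"),  # weaker hit for the same read
    make_row("read2", "rep_hit", "1e-5"),   # fails the cutoff
    make_row("read3", "cap_hit", "1e-11"),
])
best = best_hits(blast_tab)
counts = classify(best, {"rep_hit": "taxon_A", "cap_hit": "taxon_B"})
```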

The ChiSCV– and PoSCV1–related nucleotide sequences detected by deep sequencing (designated Tp1 and Tp2) were used to design PCR primers. PCR with converging primers (conventional PCR) was used to confirm the presence of the contig sequences in the sample. PCR with diverging primers (inverse PCR) was used to amplify and clone the complete circular viral genomes. Nucleic acids were extracted directly from fecal samples using a QIAamp MinElute Virus Vacuum Kit (QIAGEN, Valencia, CA). Circular DNA molecules were then amplified by rolling–circle amplification (Illustra GenomiPhi V2 DNA Amplification Kit, GE Healthcare Biosciences, Piscataway, NJ). The amplified DNA served as the template for PCR with converging or diverging primers designed from the 454 pyrosequencing results. The amplicons were resolved on and excised from agarose gels, cloned into plasmid TOPO–CLX104 and introduced into Escherichia coli TOP10 (Invitrogen, Carlsbad, CA) by transformation. Multiple clones were picked and sequenced by the Sanger method. Three clones from the Tp1 PCR product were analyzed, and all yielded identical sequences. This viral genome was designated porcine stool–associated circular virus 2 (PoSCV2; GenBank accession number KC545226). Four variant genomes were obtained from the Tp2 PCR product. They were designated PoSCV3–4L5, –3L7, –LT2 and –4L13 (GenBank accession numbers KC545229, KC545227, KC545230 and KC545228, respectively).
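The difference between converging and diverging primers can be illustrated with a naive in silico PCR on a circular template. Converging primers point toward each other and amplify only the contig itself. Diverging primers point away from each other, so they yield a product only when the template is circular. That product spans the remainder of the genome.

The sketch makes several simplifying assumptions. Primers are simply the terminal 20 nt of the contig, and matches must be exact. The "genome" is a random sequence whose 2,495‑nt length is borrowed from the 3L7 genome only for scale. Real primer design also considers melting temperature and specificity.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def converging_primers(contig, n=20):
    """Forward primer at the contig's 5' end, reverse primer at its 3' end (pointing inward)."""
    return contig[:n], revcomp(contig[-n:])

def diverging_primers(contig, n=20):
    """Primers pointing outward from the contig, toward the unknown rest of the circle."""
    return contig[-n:], revcomp(contig[:n])

def pcr_product(template, fwd, rev, circular=False):
    """Naive in silico PCR with exact primer matches; returns None if no product forms."""
    site = revcomp(rev)  # where the reverse primer anneals, written on the top strand
    search = template + template if circular else template  # doubling models the circle
    start = search.find(fwd)
    if start == -1:
        return None
    end = search.find(site, start)
    if end == -1:
        return None
    return search[start:end + len(site)]

rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(2495))  # random stand-in genome
contig = genome[100:200]  # hypothetical 100-nt contig from deep sequencing

conventional = pcr_product(genome, *converging_primers(contig))
inverse_circular = pcr_product(genome, *diverging_primers(contig), circular=True)
inverse_linear = pcr_product(genome, *diverging_primers(contig), circular=False)
```

Here, `conventional` reproduces the contig. `inverse_circular` is 2,435 nt long, running from the contig's 3′ primer around the circle to its 5′ primer. `inverse_linear` is `None`, because diverging primers cannot produce a product on a linear template.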

Like other SCVs, the Tp clones (PoSCV2 and all four PoSCV3 clones) have genomes of about 2.5 kb (Fig. 1a). The viral genomes can be divided into four regions:
- two large ORFs whose deduced amino acid sequences are homologous to the Rep and Cap of ChiSCV;
- an LIR that encodes multiple overlapping ORFs; and
- an SIR that contains a palindromic sequence capable of forming a stem–loop structure.

The Rep and Cap ORFs are transcribed divergently from the LIR and converge at the SIR. In PoSCV1, the LIR encodes two small ORFs (ORF3 and ORF4) in the same orientation as the Cap gene. The LIRs of PoSCV2 and PoSCV3 likewise encode ORF3 and ORF4. Unlike PoSCV1, however, they also contain an additional ORF (ORF5) oriented opposite to the Cap gene.
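Identifying ORFs in both orientations on a circular genome means scanning both strands. It also means letting ORFs run across the arbitrary position where the sequence file starts. The sketch below shows one approach: it reports the longest ATG‑to‑stop ORF for each stop codon. The minimum length of 50 codons and the toy genome are illustrative assumptions, not values from the study. Minus‑strand start positions are given as offsets into the reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(genome, min_codons=50):
    """Longest ATG-initiated ORF per stop codon on both strands of a circular genome.

    Returns sorted (strand, start, length_nt) tuples; start is 0-based on the given
    strand, and length includes the stop codon. ORFs with no stop are ignored.
    """
    n = len(genome)
    longest = {}
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        ext = seq + seq  # doubling lets an ORF run across the origin of the sequence
        for start in range(n):
            if ext[start:start + 3] != "ATG":
                continue
            for pos in range(start, start + n, 3):
                if ext[pos:pos + 3] in STOP_CODONS:
                    length = pos + 3 - start
                    key = (strand, pos % n)  # one entry per stop codon
                    if length // 3 - 1 >= min_codons and length > longest.get(key, (0, 0))[1]:
                        longest[key] = (start, length)
                    break
    return sorted((strand, start, length) for (strand, _), (start, length) in longest.items())

# Toy genome: the spacer carries stop codons in all three frames on both strands
SPACER = "TTAG" * 3
orf_plus = "ATG" + "GCT" * 60 + "TAA"   # same orientation as the sequence file
orf_minus = "ATG" + "GCC" * 55 + "TAG"  # encoded on the opposite strand
genome = SPACER + orf_plus + SPACER + revcomp(orf_minus) + SPACER
orfs = find_orfs(genome)
```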

Fig. 1

Genome organization of selected SCVs. (a) The circular genomes of PoSCV2 and PoSCV3 (clones 3L7 and L2T) illustrated alongside several distantly related reference genomes (BoSCV–JN634851, ChiSCV–GQ351277, PigSCV–JQ23166 and PoSCV1–JQ274036). The Rep, Cap, LIR and SIR (with stem–loop) regions are indicated. (b) RCR motifs found in the Rep proteins of selected SCVs

The four PoSCV3 genomes were aligned, and a schematic representation is shown in Fig. 2a. The LIR, Cap region and 5′ portion of the Rep region exhibited few to no nucleotide differences. Genetic differences were concentrated around the stem–loop structure in the SIR and the 3′ portion of the Rep ORF. The four genome regions (SIR, Rep–ORF, Cap–ORF and LIR) are described individually in greater detail below.
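One simple way to see where differences concentrate between two aligned variant genomes is to count differing alignment columns in sliding windows. The sketch below does this. The window size and step are arbitrary assumptions. A gap aligned against a base counts as a difference. Windows do not wrap around the circular genome.

```python
def window_differences(seq_a, seq_b, window=100, step=50):
    """Count differing columns in sliding windows over two aligned sequences.

    Returns a list of (window_start, difference_count) tuples, 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = []
    for start in range(0, max(len(seq_a) - window, 0) + 1, step):
        a, b = seq_a[start:start + window], seq_b[start:start + window]
        counts.append((start, sum(x != y for x, y in zip(a, b))))
    return counts

# Hypothetical aligned variants differing only in a 10-nt stretch
variant_a = "A" * 300
variant_b = "A" * 200 + "C" * 10 + "A" * 90
profile = window_differences(variant_a, variant_b)
```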

Fig. 2

Comparative analysis of the PoSCV3 variant genomes. (a) Schematic representations of the circular genome. The 3L7 genome (2495 nt) was used as a reference, and gaps were introduced for sequence alignment. Termination sites of the Rep ORF are indicated by black triangles. Point mutations are indicated by black dots, and the long stretches of nucleotide differences are indicated in brackets. The areas showing identical mutations in 4L5 and L2T are indicated by black boxes. (b) Sequence analysis showing the SIR. Nucleotide differences are colored. Dashes were introduced to align the sequences. The stem–loop sequences are indicated by shaded boxes, and C–terminal coding sequences of Rep and Cap are indicated by black arrows. (c) A phylogenetic tree derived from the deduced amino acid sequences of the Rep genes from selected single–stranded DNA viruses

SIR: The SIR sequences of PoSCV2 and PoSCV3 are shown in Fig. 2b. The Rep ORF of PoSCV3–4L5 overlaps the stem–loop structure, whereas those of the other four genomes (PoSCV2 and the remaining three PoSCV3 variants) do not. All five genomes contain a palindromic sequence in the SIR that can form a stem–loop structure with a well–conserved nucleotide sequence. This stem–loop structure may be part of the origin of DNA replication. Among the PoSCV3 genomes, SIR sequences on the Cap–gene side are more conserved, whereas those on the Rep–gene side show the greatest differences.
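A palindromic sequence that can form a stem–loop is an inverted repeat: a stem, a short loop, and the reverse complement of the stem. The sketch below scans for perfect inverted repeats. It is a deliberate simplification. It allows no mismatches and does no free‑energy folding, and the stem and loop length limits are arbitrary assumptions. The test sequence is a hypothetical toy, not an SCV sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_stem_loops(seq, stem_len=8, min_loop=3, max_loop=12):
    """Find perfect inverted repeats: stem + loop + reverse complement of the stem.

    Returns (stem_start, stem_len, loop_len) tuples with 0-based starts.
    """
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        target = revcomp(seq[i:i + stem_len])  # sequence the 3' arm must match
        for loop_len in range(min_loop, max_loop + 1):
            j = i + stem_len + loop_len
            if seq[j:j + stem_len] == target:
                hits.append((i, stem_len, loop_len))
    return hits

stem = "GACCTGAC"
toy_sir = "AAAA" + stem + "TTCTT" + revcomp(stem) + "AAAA"  # hypothetical SIR
hits = find_stem_loops(toy_sir)
```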

Rep ORF: Phylogenetic and pairwise identity analyses were conducted to determine the relationship of the Tp clones to other viruses. A phylogram was constructed from the deduced amino acid sequences encoded by the Rep gene (Fig. 2c). The amino acid sequences were aligned using MAFFT 5.8 [2] with the E–INS–I strategy and previously described parameters [5, 6]. A maximum–likelihood tree was inferred from the MAFFT alignment using RAxML with previously described parameters [6, 10]. The resulting tree was midpoint rooted using MEGA4 [11]. Pairwise identity analysis of the PoSCV genes and ORFs was also performed using MEGA4 [11]. The Tp clones were most closely related to ChiSCV and PoSCV1. They formed a distinct clade in which PoSCV2 and PoSCV3 separated into two sub–groups. Amino acid sequence identity between the Tp clones and bovine SCV (BoSCV) [3] or PigSCV was limited (23–32 %).
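Pairwise identity values of the kind reported here are computed from aligned sequences. The sketch below uses one common convention: identity over the alignment columns where neither sequence has a gap. MEGA offers several gap‑handling options, so this convention may not match the settings used in the study. The sequence names and aligned sequences are hypothetical toys.

```python
from itertools import combinations

def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)

def identity_table(aligned):
    """Pairwise identities for a dict mapping name -> aligned sequence."""
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p, q in combinations(sorted(aligned), 2)}

# Hypothetical aligned protein fragments
aligned = {"virus_A": "MKV-LA", "virus_B": "MKIALA", "virus_C": "MRIALA"}
table = identity_table(aligned)
```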

The Rep amino acid sequence identities of the Tp clones to PoSCV1 and to ChiSCV were approximately 50 % and 40 %, respectively (Table 1a). The nucleotide and amino acid sequence identities between PoSCV2 and PoSCV3 were both approximately 87 %, and the identity among the PoSCV3 variants was 93–100 %. The Rep proteins also contained the amino acid sequence motifs commonly found in Rep proteins involved in rolling–circle replication (RCR) [9] (Fig. 1b):
- RCR–I, RCR–II and RCR–III;
- Walker A; and
- Walker B.

These motifs were conserved among members of this new clade.
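Conserved Rep motifs are often located by pattern matching on the deduced protein sequence. The sketch below covers only the two Walker motifs. For Walker A, it uses the generic P‑loop pattern [AG]‑x(4)‑G‑K‑[ST] (PROSITE PS00017). For Walker B, it uses a rough approximation: four hydrophobic residues followed by D‑E. The RCR–I to RCR–III motifs vary among viral families and are not modeled. The test protein is a hypothetical toy, not an SCV Rep.

```python
import re

# Simplified motif patterns (assumptions for illustration, not curated SCV motifs)
MOTIFS = {
    "Walker A": re.compile(r"[AG].{4}GK[ST]"),  # generic P-loop pattern (PROSITE PS00017)
    "Walker B": re.compile(r"[ILMFVA]{4}DE"),   # rough hydrophobic-run + D-E approximation
}

def find_motifs(protein):
    """Return (motif name, 0-based start, matched residues) for every pattern hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.group()))
    return hits

toy_rep = "MSTGPSGTGKSLLARQIIVIDEF"  # hypothetical protein fragment
motif_hits = find_motifs(toy_rep)
```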

Table 1 Deduced amino acid sequence identities of the (a) Rep proteins, (b) Cap proteins and (c) ORF3, ORF4 and ORF5 products of selected SCVs

Cap ORF: The deduced Cap protein sequences of selected SCVs were compared (Table 1b). Amino acid sequence identity between the Tp clones and BoSCV, ChiSCV, PigSCV or PoSCV1 was limited (17–26 %). Between PoSCV2 and PoSCV3, the nucleotide sequence identity of the Cap gene (46–48 %) was lower than the amino acid sequence identity (60–62 %). In general, the Rep gene is more conserved than the Cap gene among ssDNA viruses. It was therefore unusual that the nucleotide and amino acid sequence identities among the PoSCV3 Cap genes (99–100 %) were higher than those among the Rep genes (94–100 %).
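Nucleotide identity can fall below amino acid identity when many differences are synonymous, that is, when they change a codon without changing the encoded amino acid (most often at the third codon position). The toy comparison below shows this effect. Four different leucine codons give 7/12 nucleotide identity but 100 % amino acid identity. The sketch assumes gap‑free, equal‑length, in‑frame coding sequences and the standard genetic code. The sequences are illustrative, not taken from PoSCV2 or PoSCV3.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds):
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

def nt_and_aa_identity(cds_a, cds_b):
    """Fractional nucleotide and amino acid identity of two in-frame coding sequences."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("expected equal-length, codon-aligned sequences")
    nt = sum(x == y for x, y in zip(cds_a, cds_b)) / len(cds_a)
    aa_a, aa_b = translate(cds_a), translate(cds_b)
    aa = sum(x == y for x, y in zip(aa_a, aa_b)) / len(aa_a)
    return nt, aa

# Four leucine codons in each sequence: every codon differs, every amino acid matches
nt_identity, aa_identity = nt_and_aa_identity("CTGCTGCTGCTG", "TTACTTCTCTTG")
```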

LIR: The LIR nucleotide sequence identity between PoSCV2 and PoSCV3 was 70.2 %, and the LIR sequences of the PoSCV3 variants were identical. Both PoSCV1 and the Tp clones have two overlapping ORFs, ORF3 and ORF4, transcribed in the same orientation as the Cap gene. However, no amino acid sequence homology was detected between these ORFs in PoSCV1 and in the Tp clones. For ORF3, the amino acid sequence identity between PoSCV2 and PoSCV3 was approximately 68 %, and the identity among the PoSCV3 variants was 99–100 % (Table 1c). For ORF4, the amino acid sequence identity between PoSCV2 and PoSCV3 was approximately 64 %, and the identity among the PoSCV3 variants was 99–100 % (Table 1c). The near–identity of ORF3 and ORF4 among the PoSCV3 variants is expected, because their LIR nucleotide sequences are identical. More surprising is that the amino acid identity of these two ORFs between PoSCV2 and PoSCV3 (64–68 %) was slightly higher than that of the capsid protein (61 %). This finding lends credence to the speculation that either ORF may encode an important functional domain or protein.

The LIRs of PoSCV2 and PoSCV3 also encode an additional ORF, ORF5, which is transcribed opposite to the Cap gene and overlaps ORF3 and ORF4. The ORF5 amino acid sequence identity between PoSCV2 and PoSCV3 was approximately 59 %, and the identity among the PoSCV3 variants was 99–100 % (Table 1c). Thus, the ORF5 amino acid sequence identity between PoSCV2 and PoSCV3 was almost as high as that of the capsid protein (61 %).
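To determine how an ORF on one strand overlaps ORFs on the other strand, both must be placed in a common coordinate frame on the circular genome. The sketch below does this under one assumption: minus‑strand starts are reported as 0‑based offsets into the reverse‑complemented genome. Reverse‑complement position i then maps to plus‑strand position L − 1 − i, where L is the genome length. The feature coordinates and the 100‑nt genome length are hypothetical.

```python
def plus_strand_positions(start, length, genome_len, strand="+"):
    """Set of 0-based plus-strand positions covered by a feature on a circular genome."""
    if strand == "-":
        # Reverse-complement positions start..start+length-1 map to this plus-strand span
        start = genome_len - start - length
    return {(start + k) % genome_len for k in range(length)}

def overlap_nt(feature_a, feature_b, genome_len):
    """Nucleotides shared by two features given as (start, length, strand) tuples."""
    start_a, length_a, strand_a = feature_a
    start_b, length_b, strand_b = feature_b
    return len(plus_strand_positions(start_a, length_a, genome_len, strand_a)
               & plus_strand_positions(start_b, length_b, genome_len, strand_b))

genome_len = 100                 # hypothetical circular genome length
orf5_like = (10, 30, "-")        # hypothetical opposite-orientation ORF
orf3_like = (70, 50, "+")        # hypothetical ORF that wraps past position 99
shared = overlap_nt(orf5_like, orf3_like, genome_len)
```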

In this work, we report a clade of novel viruses, including PoSCV2 and PoSCV3, whose genomes encode a Rep–like protein and contain a palindromic sequence capable of forming a stem–loop structure (in the SIR). This suggests that they may replicate via a common RCR mechanism. Interestingly, these viruses encode three overlapping “conserved” ORFs (ORF3, ORF4 and ORF5) in the LIR. The amino acid sequence identities between PoSCV2 and PoSCV3 for these ORFs (58.9 % to 68.6 %) are comparable to those of the capsid proteins (60.7 % to 64.1 %). Whether these additional ORFs encode functionally important proteins is not known, and the role of these viruses in disease is likewise unknown. SCV–related genomes of growing diversity have been reported in the stool of chimpanzees, cows and pigs, which suggests that related viruses will be identified in other mammalian species. However, it remains to be determined whether these stool–associated viruses replicate in the host or are merely passing through from the diet. Confirming their host and organ tropisms will require detection of SCV–specific antibodies or of virions in animal tissues. A high level of co–infection involving numerous known viruses (coronavirus, enterovirus, rotavirus, astrovirus, picobirnavirus, teschovirus, torovirus, sapelovirus, anellovirus, circovirus and parvovirus) was detected in pooled samples from just six animals in this study. This report, together with the work of others, illustrates the growing complexity of the pig virome and the challenge of understanding the biology, interactions and significance of these newly discovered viruses.