Findings

Guinea pig cytomegalovirus (GPCMV) serves as a useful model of congenital infection because the virus can cross the placenta and infect the fetus in utero [1–3]. This model is well suited to vaccine studies aimed at preventing congenital cytomegalovirus (CMV) infection, a major public health problem and a high-priority area for new vaccine development [4]. However, studies in this model have been impeded by the lack of detailed DNA sequence data: although a number of reports have identified specific gene products or clusters of genes [5–11], a full genomic sequence has not been available to date.

We recently reported the construction and preliminary sequence map of a GPCMV bacterial artificial chromosome (BAC) clone maintained in E. coli [12, 13]; this clone served as the initial template for sequence analysis of the full GPCMV genome. BAC DNA was purified using Clontech's NucleoBond® Plasmid Kits as described previously [14], and both strands were sequenced on an ABI PRISM® 377 DNA Sequencer, with primers synthesized as needed to "primer-walk" the nucleotide sequence. In parallel, HindIII- and EcoRI-digested fragments were gel-purified and cloned into pUC- and pBR322-based vectors as previously described [15]. Plasmid sequences were determined from overlapping HindIII and EcoRI fragments using the map coordinates originally described by Gao and Isom [16], and were compared with the BAC sequence to facilitate assembly of a full-length contiguous sequence. Because cloning of the BAC in E. coli involved insertion of BAC origin sequences into the HindIII "N" region of the viral genome, sequence obtained from this restriction fragment cloned in pBR322 was used for assembly of the final contiguous sequence; analysis of this sequence confirmed that no adventitious deletions were generated in the HindIII "N" region during the original BAC cloning process. Likewise, because a deletion in the HindIII "D" region occurred during cloning of the GPCMV BAC in E. coli [17], sequence from a plasmid containing the full-length HindIII "D" fragment was obtained and used for assembly of the final contiguous sequence. The GPCMV genomic sequence has been deposited with GenBank (Accession Number FJ355434).
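Assembling the genome from overlapping HindIII and EcoRI clones depends on knowing where each enzyme cuts. As an illustration only, the in silico digest below is a minimal sketch that assumes a linear genome and the standard recognition sites (HindIII A^AGCTT, EcoRI G^AATTC). The function names are hypothetical and do not represent the software used in this study. The letter assignment assumes the common convention of naming fragments from largest to smallest, and it ignores the terminal repeat and fragments of equal size.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# HindIII cuts A^AGCTT and EcoRI cuts G^AATTC; both sites are palindromic,
# so scanning the top strand alone finds every site.
ENZYMES = {"HindIII": ("AAGCTT", 1), "EcoRI": ("GAATTC", 1)}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """0-based top-strand cut coordinates for one enzyme."""
    site, offset = ENZYMES[enzyme]
    # The zero-width lookahead also reports overlapping matches.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq: str, enzyme: str) -> list[tuple[int, int]]:
    """Half-open (start, end) fragments, assuming a LINEAR molecule."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def label_by_size(fragments: list[tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Letter fragments A, B, C, ... from largest to smallest (hypothetical
    convention; handles at most 26 fragments and does not resolve ties)."""
    ordered = sorted(fragments, key=lambda f: f[1] - f[0], reverse=True)
    return {chr(ord("A") + i): frag for i, frag in enumerate(ordered[:26])}


if __name__ == "__main__":
    toy = "GGAAGCTTCCGAATTCAAAAGCTTG"
    print(digest(toy, "HindIII"))  # [(0, 3), (3, 19), (19, 25)]
    print(digest(toy, "EcoRI"))    # [(0, 11), (11, 25)]
```

One way to check that an assembled contig agrees with the cloned fragments is to compare predicted fragment coordinates like these with the published restriction map.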

Sequence analysis of GPCMV revealed a genome length of 232,678 bp with a GC content of 55%, in agreement with the value of 54.1% determined previously by CsCl buoyant density centrifugation [18]. A total of 326 open reading frames (ORFs) capable of encoding proteins of ≥ 100 amino acids (aa) were identified. Where a predicted ORF substantially overlapped an adjacent or complementary GPCMV ORF whose product appeared highly conserved among other cytomegaloviruses, the predicted ORF was analyzed further only if it overlapped the conserved ORF by < 60%. ORFs of ≥ 100 aa with homologs encoded by other CMVs (e-value < 0.1) were identified by comparisons using NCBI BLAST (blastall, version 2.2.16). Of the ORFs so identified, 104 had sequence and/or positional homology to one or more ORFs encoded by human (HCMV), murine (MCMV), rat (RCMV), rhesus (RhCMV), or chimpanzee (CCMV) cytomegaloviruses or by tupaia herpesvirus (THV) (Table 1). Of note, homologs of HCMV ORFs UL23 through UL122 were identified [19]. For ease of nomenclature, we have designated these ORFs with an upper case "GP" prefix (GP23 through GP122); ORFs with homologs in other CMVs that do not correspond to HCMV UL23 through UL122 have been designated with a lower case "gp" prefix. Homologs of HCMV UL41a (69 aa; gp38.2), UL51 (99 aa; GP51), and UL91 (87 aa; GP91), each shorter than the 100-aa cutoff, were annotated in these initial analyses based primarily on positional, rather than sequence, homology to the respective HCMV ORFs. Three ORFs that are homologs of MHC class I genes known to be encoded by multiple other CMVs (gp147–149; Table 1) were also identified. One ORF, gp1 (a CC chemokine homolog), had no positional or sequence homolog in other CMVs but was included in the annotation because of its previous molecular characterization [9].
Including ORFs with mapped exons, the total number of ORFs annotated in this preliminary analysis was 105 (Table 1).
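The GC-content calculation, the ORF length cutoff, and the overlap rule described above can be sketched in a few lines. This is a hypothetical illustration, not the pipeline used here. It assumes ATG-initiated ORFs on both strands of a linear sequence, measures length in codons excluding the stop, and skips ORFs that run off either end. It also interprets "< 60% overlap" as the fraction of the candidate ORF's length covered by a conserved ORF, because the text does not state which length is the denominator.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G + C over all positions."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def find_orfs(seq: str, min_aa: int = 100) -> list[tuple[int, int, str]]:
    """ATG-initiated ORFs of >= min_aa codons (stop excluded) on both strands.

    Returns (start, end, strand) as 0-based half-open + strand coordinates
    that include the stop codon. Each stop codon yields at most one ORF,
    starting at the most upstream in-frame ATG after the previous stop.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOPS:
                    if (i - start) // 3 >= min_aa:
                        lo, hi = start, i + 3  # includes the stop codon
                        # Map reverse-strand coordinates back onto the + strand.
                        orfs.append((lo, hi, "+") if strand == "+" else (n - hi, n - lo, "-"))
                    start = None
    return sorted(orfs)


def overlap_fraction(a, b) -> float:
    """Fraction of ORF a's length lying inside ORF b (strand ignored)."""
    ov = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return ov / (a[1] - a[0])


def keep_candidate(orf, conserved, max_overlap: float = 0.60) -> bool:
    """Keep an ORF only if it overlaps every conserved ORF by < max_overlap."""
    return all(overlap_fraction(orf, c) < max_overlap for c in conserved if c != orf)
```

For example, `[o for o in find_orfs(genome) if keep_candidate(o, conserved)]` would return the candidates that pass both filters, where `genome` and `conserved` are placeholders for the assembled sequence and the coordinates of the highly conserved ORFs.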

Table 1 GPCMV Open Reading Frames (ORFs)

A map of the GPCMV genome illustrating the relative positions of these ORFs is shown in Fig. 1. ORFs that represent homologs of individual exons of spliced HCMV genes, in particular UL89 (terminase) and UL112/UL113 (replication accessory protein), are annotated separately. The splice junction for the GP89 mRNA was predicted based on comparisons with other CMVs. For the UL112/113 region, further studies will be required to map the precise splicing patterns of the putative transcripts encoded by this region of the GPCMV genome. Similarly, the ORF encoding the sequence homolog of the HCMV IE transactivator, UL122, has been annotated without regard to the splicing events previously shown to take place in this region of the genome [20]; further analyses of cDNA from this and other regions of IE transcription in the GPCMV genome, including the HindIII "D" region, will likely lead to the annotation of multiple as yet unidentified ORFs. A comprehensive table of all ORFs > 25 aa and their homology to other CMV genomes is provided in additional files 1 and 2. As RNA analyses are completed, the number of annotated GPCMV ORFs is expected to grow.

Figure 1

Protein Coding Map of GPCMV Genome. Schematic representation of the GPCMV genome demonstrating ORFs described in the text. GPCMV ORFs with positional and/or sequence homology to HCMV ORFs are indicated in bold with upper case prefixes (GP23 through GP122). ORFs that lack sequence or positional homologs in HCMV but share homology with ORFs in other CMVs are indicated with lower case prefixes (see Table 1). Only the 5' terminal repeat (TR) is shown; however, in about 50% of genomes the TR is duplicated at the 3' end [18]. Color-coding indicates ORFs of interest for vaccine and pathogenesis studies: blue, envelope glycoprotein homologs; green, putative immune evasion/immune modulation gene homologs; red, US22 superfamily homologs.

The schematic representation of GPCMV ORFs in Fig. 1 highlights several gene families of particular interest. Most important for vaccine studies in the guinea pig model are the conserved homologs of the ORFs encoding the major envelope glycoproteins gB, gH/gL/gO, and gM/gN. These glycoproteins are important determinants of humoral immune responses to CMV infection and are potential subunit vaccine candidates; the gB homolog, in particular, has been shown to confer protection against congenital GPCMV infection in subunit vaccine studies [21–23]. Homologs of putative HCMV immune modulation genes, including G-protein coupled receptors and major histocompatibility complex (MHC) class I homologs, were also identified [24]. Also of interest was the presence of multiple US22 gene family homologs, heavily clustered near the rightward terminus of the GPCMV genome. These ORFs are predicted to encode products analogous to the MCMV dsRNA-binding proteins M142 and M143, which have been shown to inhibit dsRNA-activated antiviral pathways [25, 26]. Members of this family have also been implicated in macrophage tropism in MCMV [27]. Our sequence analysis also confirmed the findings of Liu and Biegalke [8] that the GPCMV genome does not encode a positional homolog of the antiapoptotic HCMV UL36 gene [28]. However, an ORF with homology to R36, which encodes the presumed RCMV cell death suppressor, was identified (gp29.1, Table 1). Further studies will be required to determine whether this putative gene supplies a UL36-like function.

It was also of interest to note the presence of ORFs with apparent homology to the MCMV M129–133 region. This region has positional homologs in human and primate CMVs [29–31] but is absent in THV [32]. Recently, it was determined that passage of GPCMV in cultured fibroblasts promotes the deletion of a ~1.6-kb locus containing potential positional homologs of this gene cluster, and Inoue and colleagues found that the presence of this locus was associated with enhanced pathogenesis of GPCMV in vivo [33]. We independently confirmed the presence and sequence of this locus in our salivary gland-derived viral stocks and have included this sequence in our GenBank annotation (Accession Number FJ355434). Further studies will be required to fully annotate the transcripts encoded by this region of the GPCMV genome. Interestingly, the original GPCMV BAC clone that we sequenced was derived from GPCMV viral DNA obtained after long-term tissue culture passage of ATCC 2122 viral stock, and, not surprisingly, this BAC was found to lack the 1.6-kb virulence locus [12]. Subsequently, PCR and preliminary sequencing of a more recently obtained GPCMV BAC clone with an excisable origin of replication [17] revealed that the 1.6-kb sequence was retained in this clone. The apparent modifications of this locus following viral passage on fibroblasts are reminiscent of the mutations and deletions that occurred during fibroblast passage of HCMV [34] and rhesus CMV [35]. The congruence of these events suggests that the selective pressures promoting mutational inactivation of genes in this region may be similar across viral species. Additional analyses, including sequencing of a full-length GPCMV genome derived from virus replicating in vivo, will be required to determine what other deletions or mutations are present in genomes from tissue culture-passaged viruses.
Because additional ORFs are likely to be identified by these analyses, we have annotated the first ORF identified in the BAC sequence to the right of this 1.6-kb region as gp138 (Fig. 1), to simplify nomenclature as ORFs in this virulence locus are better characterized. Application of other genome sequence analysis methods, including identification of small or overlapping genes and further assessment of mRNA splicing or unconventional translation signals, will likely result in the identification of additional putative ORFs in future studies [36].

Comparisons of GPCMV ORFs with sequences from other CMV genomes shed light on the phylogenetic relationships of GPCMV. ORF translations were compared with all proteins from the six sequenced CMV genomes (HCMV, MCMV, RCMV, RhCMV, THV, and CCMV), and hits with e-values < 1e-5 were aligned individually for each protein using both ClustalW (version 1.82; [37]) and Muscle (version 3.6; [38]). The alignments were then used to generate neighbor-joining trees in Jalview. Clustal trees for glycoproteins B (GP55) and N (GP73) are shown in Fig. 2, with distance scores indicated. Overall, comparisons of the various glycoproteins (gB, gM, gH, and gO) yielded similar phylogenies, with GPCMV glycoproteins generally appearing closer to primate than to rodent CMVs [39]; the exception was the gN homolog, which appears closer to the rodent CMVs. ClustalW and Muscle comparisons of GPCMV ORFs with homologous ORFs from the other sequenced CMVs are provided in additional file 3.
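The mismatch percentages and neighbor-joining trees described here can be illustrated with a minimal sketch. It assumes a pairwise p-distance, defined as the fraction of mismatched sites among gap-free alignment columns (multiply by 100 for a mismatch percentage), and uses the standard Saitou-Nei neighbor-joining algorithm. Jalview's own distance measure and tree output may differ in detail, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched sites over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distances for equal-length aligned sequences."""
    d = {}
    for a, b in combinations(aln, 2):
        d[a, b] = d[b, a] = p_distance(aln[a], aln[b])
    return d


def neighbor_joining(names: list[str], dist: dict[tuple[str, str], float]) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = dict(dist)  # work on a copy; new internal nodes are added as keys
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x, k] for k in nodes if k != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new node is named by its Newick subtree
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Resolve the last three nodes as a trifurcation around one central node.
    a, b, c = nodes
    la = (d[a, b] + d[a, c] - d[b, c]) / 2
    return f"({a}:{la:.4f},{b}:{d[a, b] - la:.4f},{c}:{d[a, c] - la:.4f});"
```

As a check, distances taken from a tree in which A and B (branch lengths 2 and 3) are separated from C and D (branch lengths 4 and 5) by an internal branch of length 1 are recovered exactly as `(C:4.0000,D:5.0000,(A:2.0000,B:3.0000):1.0000);`. Real protein p-distances are not perfectly additive, so the recovered branch lengths are only estimates.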

Figure 2

Comparison of GPCMV Glycoproteins with CMV Homologs. Sequences of GPCMV glycoproteins were aligned with glycoproteins from six other CMV genomes (HCMV, MCMV, RCMV, RhCMV, THV, and CCMV) using both ClustalW [37] and Muscle [38] with default parameters. Phylogenetic trees (neighbor joining) were generated from these alignments using Jalview. Numbers at each node indicate mismatch percentages. Interestingly, GPCMV sequences closely match THV sequences (see also supplementary information) and, in pairwise comparisons, generally appear closer to primate than to rodent CMV glycoproteins, as previously observed for gB [39]. Clustal comparisons for the conserved glycoproteins gB (GP55; Panel A) and gN (GP73; Panel B) are shown.

In summary, the complete DNA sequence of GPCMV was determined using a combination of sequencing of BAC DNA, viral DNA, and cloned HindIII and EcoRI fragments. These analyses identified conserved ORFs found in all mammalian CMVs as well as novel genes apparently unique to GPCMV. The conservation of these ORFs underscores the usefulness of the guinea pig model, with positive translational implications for the development and testing of CMV intervention strategies in humans. Further characterization of the GPCMV genome should facilitate ongoing vaccine and pathogenesis studies in this uniquely useful small animal model of congenital CMV infection.