Introduction

Over 200 bacteriophages infecting Campylobacter coli or Campylobacter jejuni have been isolated. The great majority are members of the Myoviridae, although ten have been classified as belonging to the Siphoviridae [1-16]. These phages were isolated, with or without enrichment, from intestinal contents; human, porcine, and avian fecal matter; sewage; and cultures of lysogenic strains. They are used for typing Campylobacter isolates [17-21] and have been proposed for eliminating this bacterium from the intestinal tract of chickens (phage therapy) [22-31] and for reducing pathogens on produce and carcasses (biocontrol) [1, 32-34].

Unfortunately, the lack of uniform classification criteria and the often poor quality of electron micrographs make it very difficult to group these phages into taxonomic clusters. Morphology, genome size determined by pulsed-field gel electrophoresis (PFGE), and susceptibility to digestion with restriction endonucleases [1-3, 5, 11] have been used to classify some of these viruses into three groups. Group I Campylobacter phages possess genomes of at least 320 kb and heads of 143 nm. Group II phages have isometric capsids of about 99 nm in diameter and genomes of 180-194 kb in size that are resistant to digestion with HhaI endonuclease (GCG↓C). The members of the last group (III) have head sizes of 100 nm and 130-140-kb genomes that are digestible by HhaI [5].
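The three-group scheme can be written as a simple decision rule. The following is a minimal sketch, not a published classifier: the function name and the "unassigned" fallback are hypothetical, and the thresholds are taken directly from the ranges quoted above. Head diameter is not used for groups II and III because the reported values (about 99 nm and 100 nm) do not separate them.

```python
from typing import Optional


def classify_campylophage(genome_kb: float,
                          hhai_digestible: Optional[bool] = None) -> str:
    """Assign a phage to group I, II, or III from a PFGE genome size estimate.

    genome_kb        -- genome size in kb, as estimated by PFGE
    hhai_digestible  -- True/False if HhaI digestion was tested, None if unknown

    Hypothetical helper: thresholds mirror the ranges given in the text.
    """
    if genome_kb >= 320:
        return "I"
    # Group II: 180-194 kb and resistant to HhaI (or HhaI not tested).
    if 180 <= genome_kb <= 194 and hhai_digestible is not True:
        return "II"
    # Group III: 130-140 kb and digestible by HhaI (or HhaI not tested).
    if 130 <= genome_kb <= 140 and hhai_digestible is not False:
        return "III"
    # Anything outside the quoted ranges, or with conflicting HhaI data.
    return "unassigned"
```

A phage whose PFGE size and HhaI result disagree (for example, 185 kb but HhaI-digestible) is deliberately left unassigned rather than forced into a group.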

The integration of genome-based information into official taxonomy was proposed for the Myoviridae [35]. In addition, a more recent review of the “T4 superfamily of viruses” defined T4-related bacteriophages as possessing a core genome encoding approximately 37 proteins [36]. This included Campylobacter phages CPt10 and CP220 [37], Delftia phage φW-14 (NC_013697), and Salmonella phage ViI [38]. Adriaenssens and colleagues [39] have recently extended this work by defining a new genus, “Viunalikevirus”, which contains six phages related to Salmonella phage ViI. Here, we investigate the taxonomic position of fully sequenced Campylobacter phages and propose the creation of two new genera, “Cp220likevirus” and “Cp8unalikevirus”, within a proposed subfamily, the “Eucampyvirinae.”

Results

Bacteriophages

We compared seven bacteriophages belonging to group II (CP220, CPt10, vB_CcoM-IBB_35, CP21) or group III (CP81, CPX, NCTC 12673). The phages had been isolated at different times, in different places, and from different environments. Phage vB_CcoM-IBB_35 (also known as phiCcoIBB35) was isolated from the intestines of free-range chickens in Braga, Portugal; CP81 was isolated from the skin of a retail chicken from Bavaria, Germany; CP220 and CPX were also recovered from a retail chicken, in the United Kingdom; CPt10 was recovered from slaughterhouse effluent in the United Kingdom; CP21 was isolated from an organic farm in Berlin, Germany; and NCTC 12673 was enriched from U.S. poultry excreta. For further details of each phage, the reader is referred to the original publications (see Table 1). In the remainder of this manuscript, Campylobacter phages are sometimes called “campylophages.”

Table 1 Genomic and morphological features of the phages compared in this study

Morphology

The campylophages have isometric heads and contractile tails and are thus members of the Myoviridae. The morphologies of representative Campylobacter phages are shown in Figs. 1 and 2. Their dimensions are reported in Table 2. The most noticeable difference between the phages of groups II and III is the length of the tail.

Fig. 1

Electron micrographs of group II and III Campylobacter phages. Phages were sedimented for 1 h at 25,000 g in a Beckman J2-21 centrifuge using a JA18.1 fixed-angle rotor; washed twice in 0.1 M neutral ammonium acetate under the same conditions; deposited on carbon-coated copper grids; stained with ammonium molybdate (2 %, Mol), uranyl acetate (2 %, UA), or phosphotungstate (2 %, PT); and examined in a Philips EM 300 electron microscope. Bars indicate 100 nm. A-E Type CP220. A Two particles with extended tails assembled end-to-end by entangled tail fibers, Mol. B, C Two aspects of pentagonal heads, UA. D Particle with partially assembled sheath and denuded tail core, PT. E Phage with positively stained head and halo around the capsid, UA

Fig. 2

F-J Type CP81. F Normal particle with extended tail and tail fibers, Mol. G Two phages with contracted tails adsorbed to small spherical bacterial debris, UA. Note the dark staining of phage heads. H A phage with a contracted tail adsorbed to a piece of bacterial debris, PT. Note the apparent absence of a base plate. I A pentagonal head, PT. J A smaller-than-normal head, an exceedingly rare morphological aberration, UA

Table 2 Approximate dimensions of phages

Heads are icosahedral, as proven by the observation of both hexagonal and pentagonal capsids. Heads positively stained with uranyl acetate are always shrunken. These phages have necks, but no collars and no apparent base plates. Tail fibers are short and thin. Group II phages (Fig. 1) have longer tails and appear often in pairs held together by entangled tail fibers. Tails are generally extended. Some tails have incomplete sheaths; this represents a hitherto unreported type of morphological aberration. Group III phages (Fig. 2) are unremarkable except that contracted tail tips appear typically adsorbed to spherical vesicles of probable bacterial origin. Preservation of CP220 was difficult as the phage was easily inactivated by rupture of the head and tail contraction.

DNA properties

Cohesive ends could not be detected in either group, while Bal31 analyses suggested that the genomes of these phages are circularly permuted.

While DNA from Campylobacter can be readily isolated, spectrophotometrically quantified, digested with common restriction endonucleases, and amplified using Taq polymerase, the same cannot be said for the genomes of its phages. Proteinase-K-treated DNA from phages NCTC 12673, vB_CcoM-IBB_35, and CP81 partitioned, upon phenol extraction, into the interphase [11, 14]. Commonalities between phages NCTC 12673 (group III) and vB_CcoM-IBB_35 (group II) are that their DNA could not be accurately quantified using a NanoDrop spectrophotometer (NanoDrop products, Wilmington, DE), digested with common restriction endonucleases, or amplified with Taq DNA polymerase [13, 14]. A further discrepancy was found between the predicted genome size of these phages based on PFGE and the actual genome size obtained from sequencing [14]. A detailed restriction analysis of CP81 DNA revealed that only enzymes recognizing AT-rich sequences such as DraI (TTT↓AAA), SmiI (ATTT↓AAAT), and VspI (AT↓TAAT) cleaved the DNA efficiently.
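To see where the enzymes named above would be expected to cut an unmodified sequence, the recognition sites can be searched in silico. The sketch below is illustrative only: the `SITES` table and `cut_positions` function are hypothetical names, the sequence is treated as linear, and DNA modification (which the text suggests blocks most enzymes on native phage DNA) is not modeled. All four sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition site and cut offset (bases before the cut) for each enzyme,
# taken from the sites quoted in the text, e.g. DraI = TTT|AAA.
SITES = {
    "DraI": ("TTTAAA", 3),
    "SmiI": ("ATTTAAAT", 4),
    "VspI": ("ATTAAT", 2),
    "HhaI": ("GCGC", 3),
}


def cut_positions(seq: str, enzyme: str) -> list:
    """Return 0-based indices of the base immediately after each cut.

    Assumes a linear, unmodified sequence; overlapping sites are reported.
    """
    site, offset = SITES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        # Advance by one base so overlapping sites are not missed.
        start = seq.find(site, start + 1)
    return cuts
```

For an AT-rich genome, one would expect many DraI, SmiI, and VspI positions and comparatively few HhaI positions from sequence composition alone, independent of any modification.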

The fact that the phage DNA was predominantly found at the phenol-water interface suggested the presence of tightly-bound proteins or a change in the hydrophobicity of the DNA through enzymatic modification. In silico analysis for phage-specific histone-like proteins failed to reveal any likely candidates. Also, amplification of CP81 DNA with QIAGEN REPLI-g (φ29 DNA polymerase) resulted in DNA that could be cleaved with common restriction endonucleases, suggesting that the DNA contains unique modifications [11].

One of the interesting features of group II and III phages is that they have similar capsid dimensions but significantly different genome masses. This could be due to differences in the compaction of the phage genomes.

Genome organization properties

Based on a number of criteria, including mass, mol%G+C, number of genes for tRNAs and homing endonucleases, plus BLASTN homologs, the fully sequenced Campylobacter phages fall into two groups (Table 1). The first group comprises viruses CP220, CPt10, CP21, and vB_CcoM-IBB_35, while the second contains phages CPX, CP81 and NCTC 12673. Based upon time of submission to GenBank, the two type phages are CP220 and CP81.

We compared the genomes of the phages using progressiveMauve analysis [40]. The results (Supplementary Figure 1A, B) confirmed the groupings described above and further indicated that there was almost no DNA sequence relatedness between the two groups of phages. While the genomes of CP220, CPt10 and vB_CcoM-IBB_35 are colinear, the genome of CP21 exhibits considerable rearrangement of homologous modules. Group II phages are composed of large modules separated by long DNA repeat regions, which could lead to these rearrangements. The genomes of the other three phages are circular permutations of one another. This emphasizes the importance of opening circularly permuted genomes at identical positions so that meaningful comparisons may be made, including the quantitative calculation of the overall DNA sequence identity, using programs such as Stretcher from the EMBOSS suite [41, 42].
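The effect of the opening position can be illustrated with a toy example. This is a minimal sketch under stated assumptions: `reopen_at` and `percent_identity` are hypothetical helpers, the anchor is any short sequence assumed to occur in every genome being compared, and identity is computed position by position without gaps, which is far simpler than the global alignment performed by Stretcher.

```python
def reopen_at(genome: str, anchor: str) -> str:
    """Rotate a circularly permuted genome so that it starts at `anchor`.

    The genome is searched in doubled form so that an anchor spanning the
    arbitrary start/end junction of the deposited sequence is still found.
    """
    if not anchor or len(anchor) > len(genome):
        raise ValueError("anchor must be non-empty and no longer than the genome")
    idx = (genome + genome).find(anchor)
    if idx == -1:
        raise ValueError("anchor not found in genome")
    return genome[idx:] + genome[:idx]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for this sketch")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two records of the same circular genome, opened at different positions.
record_a = "ATGCCGTTAGC"
record_b = "CGTTAGCATGC"
naive = percent_identity(record_a, record_b)
aligned = percent_identity(reopen_at(record_a, "ATGCC"),
                           reopen_at(record_b, "ATGCC"))
```

Here `naive` is far below 100 % even though the two records describe the same molecule, whereas `aligned` recovers full identity once both are opened at the same anchor.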

At the protein level, similarity was determined using CoreGenes [43, 44] at its default settings (Table 3). Common proteins shared by these two groups of phages include DNA helicase UvsW, RNaseH, topoisomerase (large subunit), putative dUTP pyrophosphatase, poly A polymerase, thymidine kinase, thymidylate synthetase, ribonucleotide reductase (small subunit), co-chaperonin GroES and the following homologs to T4 proteins: gp61 DNA primase subunit, gp44 sliding clamp loader, gp45 sliding clamp, gp62 clamp loader, gp20 portal vertex, gp25 baseplate wedge, gp6 baseplate wedge, gp4 head completion protein, gp23 major capsid, gp18 tail sheath, gp19 tail tube, gp2 DNA end protector, gp5 baseplate hub, gp43 DNA polymerase, gp13 neck, gp41 DNA primase-helicase, gp14 head completion, gp15 tail sheath stabilizer, gp32 ssDNA binding protein, gp3 tail completion, gp21 prohead core scaffold and protease – in other words, the proteins whose genes make up the core genome of T4-like viruses.

Table 3 CoreGenes analysis of phages

Regulatory elements

To investigate the existence of unique regulatory regions within the proposed genera, 100 bp of 5’ upstream sequence data was extracted using extractUpStreamDNA (http://lfz.corefacility.ca/extractUpStreamDNA/) and submitted for MEME analysis at http://meme.sdsc.edu/meme/cgi-bin/meme.cgi [45]. Unlike Escherichia coli and its relatives, whose RpoD (σ70)-dependent promoters conform to the consensus sequence TTGACA(N15-17)TATAAT, these promoters in Campylobacter display no recognizable -35 element and have a modified −10 sequence (TAwAAT) [46]. A/T-rich repeats are found centered at 5-6, 15-17 and 26 bp upstream of the −10 region. In addition, another A/T-rich region is located 7-14 bp downstream of the −10 region. Campylophages displayed multiple copies of a consensus sequence TTAAG(N6)TTAAG(N11)TATAAT, with some evidence for an extended −10 region (TGN) [47], which we believe corresponds to early promoters recognized by the host RNA polymerase (Supplementary Figure 2A). The consensus sequence for late transcription in members of the genus Viunalikevirus [39] is CTAAATAcCcc, while in T4, the late promoter core is TATAAATA [48, 49]. MEME analysis revealed another consensus sequence, CAT(N3)WWWCCTTT (Supplementary Figure 2B). In the case of phage NCTC 12673, this sequence was found upstream of the genes encoding the late transcriptional regulator, major head, tail completion, and portal protein and might therefore function as a late-gene promoter.
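Once a consensus has been proposed, candidate sites can be located by exact pattern matching. The sketch below translates the consensus notation used above, including spacers such as (N6) and the IUPAC code W, into a regular expression. It is an illustration only: `consensus_to_regex` and `find_motif` are hypothetical names, only one strand is scanned, and exact matching is stricter than the probabilistic motifs reported by MEME.

```python
import re

# Subset of IUPAC nucleotide codes sufficient for the motifs in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# A token is either a spacer like (N6) or (N15-17), or a single letter.
TOKEN = re.compile(r"\(N(\d+)(?:-(\d+))?\)|([A-Za-z])")


def consensus_to_regex(consensus: str) -> str:
    """Convert e.g. 'TTAAG(N6)TTAAG(N11)TATAAT' into a regex string."""
    parts = []
    for match in TOKEN.finditer(consensus):
        low, high, base = match.groups()
        if base:
            parts.append(IUPAC[base.upper()])
        elif high:
            parts.append("[ACGT]{%s,%s}" % (low, high))
        else:
            parts.append("[ACGT]{%s}" % low)
    return "".join(parts)


def find_motif(seq: str, consensus: str) -> list:
    """Return 0-based start positions of all (overlapping) exact matches."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(%s))" % consensus_to_regex(consensus))
    return [m.start() for m in pattern.finditer(seq.upper())]
```

Applied to 100-bp upstream regions, `find_motif(region, "TTAAG(N6)TTAAG(N11)TATAAT")` would list putative early promoters and `find_motif(region, "CAT(N3)WWWCCTTT")` the second motif; lowercase letters in a consensus, such as TAwAAT, are treated as their uppercase IUPAC codes.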

tRNAs and codon usage

An analysis of initiation codon usage of nine of the completely sequenced C. jejuni strains using Inidon [50] indicates that AUG is the preferred initiation codon 86.6 % of the time. This is followed by UUG (8.5 %) and GUG (4.7 %). By comparison, campylophage CP220 uses these codons 89.2, 7.2 and 3.6 % of the time. With phage CP81, the predominant codon is AUG (98.3 %), followed by GUG (1.1 %) and ACA (0.5 %). Bacteriophage tRNAs function to enhance translation efficiency, particularly if the cognate codon is poorly represented in the host. The group II and group III phages contain five and two tRNA genes, respectively. The arginyl codon AGA is overrepresented in CP220 and CP81, while the leucyl UUA codon is overrepresented in CP81. Therefore, in these cases, the presence of a phage-encoded tRNA might be advantageous for overall translation efficiency.
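The percentages above are tallies of the first codon of each annotated coding sequence. A minimal sketch of such a tally follows; it is not the Inidon implementation, `start_codon_usage` is a hypothetical name, and the input is assumed to be a list of coding sequences given 5' to 3' on the coding strand.

```python
from collections import Counter


def start_codon_usage(cds_list) -> dict:
    """Return the percentage of coding sequences using each initiation codon.

    Codons are reported in RNA notation (AUG, GUG, UUG, ...). Sequences
    shorter than three bases are ignored.
    """
    counts = Counter(
        cds[:3].upper().replace("T", "U")
        for cds in cds_list
        if len(cds) >= 3
    )
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Toy input: four coding sequences, two of which start with ATG.
usage = start_codon_usage(["ATGAAATAA", "ATGCCCTGA", "TTGAAATAA", "GTGAAATAG"])
```

With real annotations, the same tally run on host and phage gene sets gives the kind of side-by-side comparison reported for C. jejuni, CP220, and CP81.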

Homing endonucleases

All sequenced Campylobacter myoviruses contain homing endonucleases related to Hef [51], which was first found in the T4-like phage U5. This homing endonuclease was located in the ndrA gene of the latter phage, while analogous genes (segH) are similarly located in T4-like coliphages RB15 and RB32. Altogether, we identified 33 homologs unevenly distributed in numbers between the campylophages. These sequences vary considerably in length and may be considered pseudogenes, and they show weak similarity to cd00221, Very Short Patch Repair (Vsr) Endonuclease.

Interestingly, Hef-like sequences are absent in the sequenced Campylobacter genomes. A phylogenetic tree of a subset of Hef-related protein sequences was constructed using “one-click” Phylogeny.fr: robust phylogenetic analysis for the non-specialist at http://phylogeny.lirmm.fr/phylo_cgi/simple_phylogeny.cgi [52]. This suite of programs is based upon MUSCLE alignment [53], Gblocks to eliminate poorly aligned positions and divergent regions [54], and PhyML [55]. Supplementary Figure 3 shows the common subclustering of Hef homologs from phages NCTC 12673, CPX and CP81, suggesting lateral transfer of these elements.

Nature of the large subunit terminase

Using the sequence of the large subunit terminase of Vibrio phage KVP40 (NP_899601) as a TBLASTN probe, two things were immediately apparent: (a) the hits within CPt10, CP220 and vB_CcoM-IBB_35 were contiguous sequences, and (b) those in the other three phages were discontinuous and in opposite orientations, i.e., homology to the N-terminal portion of the protein was found approximately 22 kb away from the region showing homology to the C-terminus. Since these fragments are associated with inteins, this suggests that the complete terminase protein is produced by intein-mediated protein splicing in trans [56-60].

Receptor specificity

Like phages of other bacterial species, campylophages are specific for their host strains [3, 61]. They have been reported to recognize different structures on the surface of the host cell, including capsular polysaccharides (CPS) [61-63] and flagella [61, 64]. The host specificity of a phage is determined by its receptor-binding proteins (RBPs). The genome sequences of at least seven lytic campylophages have been reported; however, gp047, previously annotated as gp48 [14], is the only putative RBP that has so far been characterized in detail. Gp047 was identified as a putative RBP in phage NCTC 12673 and is a large protein (152 kD) that is capable of agglutinating C. jejuni NCTC 11168 cells [14]. Gp047 can also be used to specifically detect C. jejuni cells when immobilized onto microbeads, and surface plasmon resonance (SPR) on gp047-derivatized surfaces allows C. jejuni detection down to 10² cfu/ml [65].

Gp047 homologs are present in all seven sequenced campylophages. BLASTN analysis (http://blast.ncbi.nlm.nih.gov/Blast.cgi) using the gp047 homolog along with the 1000-bp nucleotide sequence downstream of the start codon from phage vB_CcoM-IBB_35 (GenBank ID: AEF56819.1) showed that the genomic region encoding the N-terminal sequence (~300 amino acids) of the putative RBPs and the downstream sequence is conserved only in campylophages belonging to group II. Similarly, in BLASTN comparisons, the downstream sequences of the gp047 homologs in group III clustered together and showed >99 % sequence identity. Interestingly, the C-terminal part of gp047 is highly conserved in all the campylophages whose genome sequences have been determined. We designed primers based on the conserved C-terminal homologous region and sequenced the gp047 homologs from other campylophages whose genomes have not yet been sequenced, including NCTC 12678, NCTC 12669 and F336, and the C-terminal region in these three phages is also conserved (Supplementary Figure 4). The open reading frame (ORF) size in the region homologous to gp047 is variable in campylophages (Supplementary Figure 5); most interestingly, phage CPX has three ORFs in the region homologous to gp047. In phage CP81, two ORFs of 669 bp and 1854 bp are at the start of the CP81 annotated genome, while another segment of 1086 bp homologous to the gp047 sequence, encoding the N-terminal region, is located close to the end of the CP81 genome and was not annotated as an ORF. This is most probably because the CP81 genome is circularly permuted and this 1086-bp segment is part of the 669-bp ORF, which results in a single continuous reading frame of 1755 bp. Remarkably, we found a probable single nucleotide polymorphism (SNP) at 2305 bp in the gp047 homolog from F336. We sequenced this region seven times: five reads showed a major peak of deoxythymidine (T), and two showed deoxyguanosine (G) at this position. Translation of the ORF continues when G is present, but its replacement with T results in a stop codon, followed by a non-coding gap of 693 bp and then the 795-bp-long ORF homologous to the C-terminal region of gp047.

To find out whether the host recognition domains of gp047 are localized in the C-terminus, we expressed the recombinant protein as truncated N-terminal (amino acid residues 1-684) and C-terminal (amino acid residues 682-1365) parts and immobilized each on gold-coated surfaces. The C. jejuni binding domains of gp047 are localized in the conserved C-terminus (Fig. 3). We are currently in the process of expressing the 795-bp ORF of phage F336 in E. coli to check whether the host binding domain is also localized in this region. It is exciting to speculate that the F336 phage is evolving to express this smaller ORF independently of the 5’-end 2307-bp ORF or vice versa.

Fig. 3

GST-fused full-length gp047 and its truncated N-terminal (amino acid residues 1-684) and C-terminal (amino acid residues 682-1365) parts were immobilized on gold-coated surfaces, and binding of C. jejuni NCTC 11168 was studied as described earlier [65]. The results showed that the host binding domains of gp047 are localized in the C-terminus, as the average density of bacteria captured by full-length gp047 (A) and its N-terminal (B) and C-terminal (C) fragments was 5.47, 0.03 and 5.8 per 100 μm², respectively

Phylogenetic trees for the major capsid proteins (homologs of T4 gp23; Fig. 4A) and DNA polymerases (homologs of T4 gp43; Fig. 4B) clearly show that there are two distinct groups of phages. Furthermore, it is interesting that, for many of the campylophage proteins, the closest homologs are found not among phages infecting members of the phylum Proteobacteria, but among phages infecting members of the phylum Cyanobacteria.

Fig. 4

A Phylogenetic analysis of gp23 (major capsid protein) homologs of Campylobacter and Synechococcus phages (S-CBM2 to Syn33). B Phylogenetic analysis of gp43 (DNA polymerase) homologs of Campylobacter and Synechococcus phages

Conclusions

The Campylobacter phages described here have icosahedral heads and long contractile tails and thus belong to the family Myoviridae. In the absence of genome sequences, the Campylobacter phages were divided into three groups based on electron microscopic analysis of their head assembly, genome size determined by PFGE, and DNA restriction profiles produced using selected endonucleases [5]. The recently published genome sequences of seven virulent campylophages provide further details about the evolution and phylogenetic relationships of these phages, and this knowledge can be used for their classification. Based on their whole-genome sequence homology and similarity in protein sequences, we suggest grouping these phages into two related genera, exemplified by phages CP220 and CP81. The former, named “Cp220likevirus”, comprises phages CP21, CP220, CPt10, and vB_CcoM-IBB_35 and corresponds to group II of the campylophages. The second, named “Cp8unalikevirus”, corresponds to group III and comprises phages CP81, CPX, and NCTC 12673. The two genera together constitute the tentative subfamily “Eucampyvirinae” within the family Myoviridae of the order Caudovirales.

This classification is supported by sequence analysis of a putative receptor-binding protein (Gp047) and the large subunit terminase, plus phylogenetic analysis of the major capsid protein (Gp23) and the Gp43-like DNA polymerase. We consider it a priority to sequence Campylobacter phages of group I, which might constitute a third genus within the proposed subfamily. Furthermore, the grouping in and within the proposed subfamily “Eucampyvirinae” is consistent with the recently updated classifications within the families Podoviridae and Myoviridae [35, 66].