Background

Pseudomonas syringae is a widespread bacterial pathogen that causes disease on a broad range of economically important plant species. The species P. syringae is sub-divided into about 50 pathovars, each exhibiting characteristic disease symptoms and distinct host-specificities. P. syringae pathovar tabaci (Pta) causes wild-fire disease in soybean and tobacco plants [1, 2], characterised by chlorotic halos surrounding necrotic spots on the leaves of infected plants. Formation of halos is dependent on the beta-lactam tabtoxin, which causes ammonia accumulation in the host cell by inhibition of glutamine synthetase [3]. However, whether tabtoxin is an essential component of the disease process is unclear [4, 5].

Pathogenicity of P. syringae strains is dependent on the type III secretion system (T3SS). The T3SS secretes a suite of virulence 'effector' proteins into the host cytoplasm where they subvert the eukaryotic cell physiology and disrupt host defences [6-14]. Mutants lacking the T3SS do not secrete effectors, and as a consequence do not infect plants or induce disease symptoms. Thus, understanding effector action is central to understanding bacterial pathogenesis. A single P. syringae strain typically encodes about 30 different effectors [14]. However, different P. syringae strains have different complements of effector genes. The emerging view is that of a core of common effectors encoded by most strains, augmented by a variable set. Individual effectors appear to act redundantly with each other and are individually dispensable, with little or no loss of pathogen virulence [10]. Effectors are also thought to play an important role in determining host range. This is most clearly true when infections are restricted by host defences. Some plants have evolved specific mechanisms to recognise certain effectors; such recognition induces strong host defences which curtail infection. For example, expression of the T3SS effector HopQ1-1 from P. syringae pathovar tomato (Pto) DC3000 was sufficient to render Pta 11528 avirulent on Nicotiana benthamiana [15]. The opposite situation, in which acquisition of a novel effector gene confers the ability to infect new host plants, has not been demonstrated and remains speculative. However, heterologous expression of the effector gene avrPtoB conferred increased virulence on a plasmid-cured strain of P. syringae pathovar phaseolicola (Pph) [16]. We hope that further identification and characterisation of effector repertoires of particular strains will shine new light on their roles in determining host range. Finally, bacterial virulence is also likely to be influenced by other non-T3SS-dependent virulence factors such as toxins, which are often co-regulated with the T3SS [17].

Complete genome sequences are available for strains representing three P. syringae pathovars: Pto, pathovar phaseolicola (Pph) and pathovar syringae (Psy) [18-20]. Comparisons of these have led to the identification of core effector gene sets and have helped to explain some of the differences in host-specificity between pathovars. However, these three sequenced strains are representatives of three distinct phylogroups within the species P. syringae, and as such are phylogenetically quite distant [21, 22]. According to DNA-DNA hybridisation studies and ribotyping [21], P. syringae can be divided into nine discrete genomospecies. Representative strains of Psy, Pph and Pto fell into genomospecies one, two and three respectively [21]. Recently, a strain of pathovar oryzae (genomospecies four) was sequenced [23]. A draft genome sequence was also published for Pto T1 [24], a strain closely related to Pto DC3000 but restricted to tomato hosts, whereas Pto DC3000 is able to cause disease on Arabidopsis. In the current study, we explore genetic differences at an intermediate phylogenetic resolution; that is, we compare the genome sequence of Pta 11528 with that of Pph 1448A, which resides within the same phylogroup but possesses a distinct host range and causes different disease symptoms.

Pto DC3000 was the first plant-pathogenic pseudomonad to have its genome sequenced, helping to establish the Arabidopsis-Pto system as the primary model for plant-microbe interactions. However, Arabidopsis is not a natural host of Pto, and it is important to develop alternative systems given the genetic variability of P. syringae strains, particularly in regard to effectors. We work on the interaction between Pta and the wild tobacco plant N. benthamiana, which offers certain advantages over Arabidopsis. Firstly, N. benthamiana is an important model for the Solanaceae, which includes many important crop species. Secondly, the Pta-N. benthamiana interaction is a natural pathosystem. Lastly, N. benthamiana is more amenable to biochemistry-based approaches and to facile manipulation of gene expression, for example by virus-induced gene silencing (VIGS). Thus N. benthamiana provides experimental options for understanding plant-bacterial interactions. Strains of Pta can cause disease on N. benthamiana, but relatively few genetic sequence data are available for this pathovar.

In this study we generated a draft complete genome sequence of Pta 11528 and used a functional screen for HrpL-dependent genes to infer its repertoire of T3SS effectors and associated Hrp Outer Proteins (Hops), which differs significantly from that of its closest relative whose complete genome has previously been published (Pph 1448A). Pta 11528 does not encode functional homologues of HopAF1 or HrpZ1. This was surprising since HopAF1 was conserved in the three previously sequenced pathovars [18-20]. HrpZ1 is conserved in most strains of P. syringae that have been investigated, albeit with differences in amino acid sequence [25]. However, Pta strain 6605 and several other isolates from Japan were previously shown to carry a major deletion leading to a truncated HrpZ protein product [26]. Pta 11528 encodes several novel potential T3SS effectors for which no close orthologues have been reported. We also discovered several large genomic regions in Pta 11528 that do not share detectable nucleotide sequence similarity with previously sequenced Pseudomonas genomes. These regions may be horizontally acquired islands that possibly contribute to pathogenicity or epiphytic fitness of Pta 11528.

Results and discussion

Sequencing and assembly of the Pta 11528 genome

The Illumina sequencing platform provides a cost-effective and rapid means to generate nucleotide sequence data [27-29]. Although this method generates very short sequence reads, several recent studies have demonstrated that it is possible to assemble these short reads into good quality draft genome sequences [30-41].

We generated 12,096,631 pairs of 36-nucleotide reads for a total of 870,957,432 nucleotides. This represents approximately 145X depth of coverage assuming a genome size of six megabases. We used Velvet 0.7.18 [41] to assemble the reads de novo. Our resulting assembly had 71 supercontigs of mean length 85,604 nucleotides, an N50 number of eight, and N50 length of 317,167 nucleotides; that is, the eight longest supercontigs were all at least 317,167 nucleotides long and together covered more than 50% of the predicted genome size of six megabases. The largest supercontig was 606,547 nucleotides long. The total length of the 71 assembled supercontigs was 6,077,921 nucleotides. The G+C content of the assembly was 57.96%, similar to that of the previously sequenced P. syringae genomes (Table 1). The sequence data from this project have been deposited at DDBJ/EMBL/GenBank under the accession ACHU00000000. The version described in this paper is the first version, ACHU01000000. The data can also be accessed from the authors' website http://tinyurl.com/Pta11528-data and as Additional files submitted with this manuscript. In addition, an interactive genome browser is available from the authors' website http://tinyurl.com/Pta11528-browser.
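
The depth-of-coverage and N50 figures quoted above follow from simple arithmetic, which can be sketched as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the N50 number is taken here as the count of longest supercontigs needed to reach at least half of a target size (the predicted genome size, or the summed assembly length if none is given).

```python
def depth_of_coverage(read_pairs, read_length, genome_size):
    """Mean depth: total sequenced nucleotides divided by genome size."""
    total_nt = read_pairs * 2 * read_length  # two reads per pair
    return total_nt / genome_size


def n50(lengths, target_size=None):
    """Return (N50 number, N50 length) for a list of contig lengths.

    Contigs are taken longest first until together they cover at least
    half of target_size (defaults to the summed assembly length).
    """
    if target_size is None:
        target_size = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= target_size:
            return count, length
    raise ValueError("contigs cover less than half of target_size")


# Values reported in the text: 12,096,631 read pairs of 36 nt, 6 Mb genome.
coverage = depth_of_coverage(12_096_631, 36, 6_000_000)
print(f"{coverage:.1f}X")  # 145.2X

# Toy contig lengths (made up) to show the N50 calculation.
toy_n50 = n50([5, 4, 3, 2, 1])
print(toy_n50)  # (2, 4)
```

With the real supercontig lengths and target_size=6_000_000, the same function would be expected to return the N50 number and N50 length reported above.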

Table 1 Comparison of Pta 11528 genome properties with those of previously sequenced P. syringae genomes [18-20, 83-85], [86-93].

We aligned the 71 Pta supercontigs against published complete Pseudomonas genome sequences using MUMMER [42]. The Pta 11528 genome was most similar to that of Pph 1448A, with 97.02% nucleotide sequence identity over the alignable portions. The next most similar genome was that of Pto DC3000, with less than 90% identity (Table 1). This pattern of sequence similarity is consistent with phylogenetic studies that placed strains of Pta in the same phylogroup as Pph and revealed a relatively distant relationship to Pto [21, 22].

Comparison of the protein complement of Pta 11528 versus Pph 1448A and other pseudomonads

Using the FgenesB annotation pipeline http://www.softberry.com, we identified 6,057 potential protein-coding genes, of which 5,300 were predicted to encode proteins at least 100 amino acids long. Of the 5,300 predicted Pta 11528 proteins, 575 (10.8%) had no detectable homology with Pph 1448A proteins (based on our criterion of an E-value less than 1e-10 using BLASTP). Of these 575 sequences, 303 had no detectable homologues in either Psy B728a or Pto DC3000. These 303 Pta-specific sequences had a median length of 198 amino acids whereas the median length of the 5,300 sequences was 216 amino acids. Automated gene prediction is not infallible and inevitably a subset of the predictions will be incorrect. The reliability of gene predictions is poorer for short sequences than for longer ones. The slight enrichment for shorter sequences among the Pta-specific gene predictions might be explained by the inclusion of some open reading frames that are not functional genes among those 303. However, many of the predicted proteins showed significant similarity to other proteins in the NCBI NR databases (See Additional file 1: Table S1), suggesting that these are likely to be genuine conserved genes.
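
The E-value criterion described above can be illustrated with a short sketch that lists query proteins lacking any BLASTP hit below the cut-off. It assumes the search results are in the standard 12-column tab-separated BLAST output (query ID in column 1, E-value in column 11); the function name and the example protein identifiers are hypothetical, and this is not the authors' pipeline.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # criterion stated in the text


def queries_without_hits(all_queries, blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Return query IDs that have no hit with an E-value below cutoff.

    blast_tabular is a file-like object holding 12-column tab-separated
    BLAST output (assumed layout: qseqid in column 1, evalue in column 11).
    """
    with_hits = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) < cutoff:
            with_hits.add(row[0])
    return [q for q in all_queries if q not in with_hits]


# Made-up example: protA has a strong hit, protB only a weak one, protC none.
example = io.StringIO(
    "protA\tsubj1\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "protB\tsubj2\t30.0\t80\t56\t2\t5\t84\t10\t89\t3e-05\t45\n"
)
no_homologue = queries_without_hits(["protA", "protB", "protC"], example)
print(no_homologue)  # ['protB', 'protC']
```

Queries that never appear in the output at all (protC here) are counted as having no detectable homologue, which is why the full list of query IDs is passed in separately.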

Conservation of the T3SS apparatus and T3SS-dependent effectors

The Hop Database (HopDB, http://www.pseudomonas-syringae.org) provides a catalogue of confirmed and predicted hop genes [43]. Figure 1 lists the hop genes in HopDB for the three previously fully sequenced P. syringae genomes. A 'core' set of hop genes is conserved in all three previously sequenced pathovars: avrE1, hopAF1, hopAH2, hopAJ2, hopAK1, hopAN1, hopI1, hopJ1, hopX1, hrpK1, hrpW1 and hrpZ1. In addition to this core set, each genome contains additional hop genes that are found in only a subset of the sequenced strains. The Pta 11528 homologues of hop genes are listed in Table 2. Figure 1 also indicates those hop genes for which a close homologue was found to be encoded in Pta 11528.

Table 2 Homologues of known hop genes in Pta 11528. Homologues were detected by searching the Pta 11528 FgenesB-predicted protein sequences against HopDB http://www.pseudomonas-syringae.org using BLASTP.

Figure 1

Comparison of the hop gene complements of the three previously fully sequenced P. syringae genomes. Those hop genes that are conserved in Pta 11528 are shown in boldface and underlined. Pta 11528 also contains three hop genes that do not have orthologues in the sequenced genomes: hopAR1, hopF1 and hopW1. * No close homologue of avrPto1 was found in Pta 11528; however, there is a gene encoding a protein that shares 43% amino acid identity with AvrPto1 from Pto DC3000. ** In the Pta 11528 genome hrpZ1 appears to be a pseudogene.

In sequenced strains of P. syringae, the gene cluster encoding the T3SS apparatus is flanked by collections of effector genes termed the exchangeable effector locus (EEL) and the conserved effector locus (CEL). Together, these three genetic components comprise the Hrp pathogenicity island [44]. A core set of hop genes is located in the Hrp pathogenicity island [44], which is highly conserved between Pta 11528 and Pph 1448A (Figure 2), except that in Pta 11528 there is a deletion in hrpZ1 and an insertion in the hrpV-hrcU intergenic region. The core hop genes avrE1, hopAH2, hopAJ2, hopAK1, hopAN1, hopI1, hopJ1, hopX1 and hrpK1 are conserved in Pta 11528 and encode intact full-length proteins. Pta 11528 encodes a full-length HrpW1 protein, albeit with insertions of 69 and 12 nucleotides relative to the Pph 1448A sequence. However, there is a large deletion in hrpZ1 that likely renders it non-functional and hopAF1 is completely absent.

Figure 2

Conservation of the Hrp pathogenicity island between Pph 1448A and Pta 11528. Panel A shows an alignment of the Pph 1448A Hrp pathogenicity island (lower track) against the homologous region in Pta 11528 (upper track), prepared using GenomeMatcher, which indicates similarity values by colour with dark blue, green, yellow and red representing increasing degrees of similarity [78]. Panel B shows the MAQ [79] alignment of the Pta 11528 Illumina reads (in black) and the BLASTN [80] alignment of the Pta 11528 de novo assembly (in green) against the Hrp region of the Pph 1448A genome.

Besides the core conserved hop genes, the Pta 11528 genome assembly contains full-length orthologues of hopR1, hopAS1, hopAE1 and hopV1, which are also found in Pph 1448A but are absent from Psy B728a and/or Pto DC3000.

The hrpZ1 gene encodes a harpin, which is not classified as a type III effector because it is not injected directly into host cells. Harpins are characteristically acidic, heat-stable and enriched for glycine, and lack cysteine residues [8]. HrpZ1 forms pores in the host membrane [47], suggesting a role in translocation of effectors across the host membrane. It also shows sequence-specific protein binding activity [48]. HrpZ1 can induce defences in both host and non-host plants, and tobacco has been extensively used as the non-host plant species [45, 46]. The inactivation of hrpZ1 in Pta 11528 and other strains of Pta [26] may be an adaptive strategy and may have been an important step in the progression towards compatibility, allowing Pta 11528 to avoid detection by the tobacco host plant. This is reminiscent of the "black holes" and other processes that inactivate genes whose expressed products are detrimental to a pathogenic lifestyle [49, 50]. One well-known example is the inactivation of cadA in the genomes of Shigella species as compared to the genomes of closely related but non-pathogenic Escherichia coli strains [51, 52].

Pta 11528 contains highly conserved homologues of hopAB2, hopW1, hopO1-1, hopT1-1, hopAG1, hopAH1, hopF1 and hopAR1, which are absent in Pph 1448A. Although absent from the Pph 1448A genome, hopAR1 and hopF1 have been identified in other strains of Pph [53-57]. In Pph 1302A, hopAR1 is located on the pathogenicity island PPH GI-1, though its genomic location varies between strains [56, 57]. PPH GI-1 is absent from the Pph 1448A genome [57]. The Pta 11528 genome (supercontig 1087) possesses a region of similarity to PPH GI-1, although this region contains a substantial number of insertions and deletions (Additional file 2: Figure S1). The Pta 11528 hopAR1 homologue (C1E_2036) is not located in the PPH GI-1 region; it falls on supercontig 672 about two kilobases upstream of a gene encoding a protein (C1E_2039) sharing 43% amino acid identity with Pto DC3000 AvrPto1. In contrast to AvrPto1 from Pto DC3000, the AvrPto1 homologue (C1E_2039) from Pta 11528 is not recognised by the plant Pto/Prf system (S. Gimenez Ibanez and J. Rathjen, manuscript in preparation).

The homologues of hopAG1, hopAH1 and the degenerate hopAI1' are found within a region of the Pta 11528 genome that shares synteny with the chromosome of Psy B728a. This region is also conserved in Pto DC3000, albeit with several deletions and insertions, suggesting that these effector genes are ancestral to the divergence of the pathovars and have been lost in Pph 1448A rather than having been laterally transferred between Pta 11528 and Psy B728a. In Pto DC3000, hopAG1 (PSPTO_0901) has been disrupted by an insertion sequence (IS) element. This is consistent with a model of lineage-specific loss of certain ancestral effectors.

In Pto DC3000, hopO1-1 and hopT1-1 are located on the large plasmid pDC3000A; homologues of these effector-encoding genes are not found in Pph 1448A. The Pta 11528 genome contains a three kilobase region of homology to pDC3000A comprising homologues of these two effector genes and a homologue of the ShcO1 chaperone-encoding gene. These three genes are situated in a large (at least 50 kilobase) region of the Pta 11528 genome that has only limited sequence similarity with Pph 1448A. Two tRNA genes (tRNA-Pro and tRNA-Lys) are located at the boundary of this region (Figure 3), which would be consistent with this region comprising a mobile island.

Figure 3

A 90-kilobase region of the Pta 11528 genome containing homologues of hopT1-1 and hopO1-1. The G+C content is indicated by the plot near the top of the figure.

In plasmid pMA4326B from P. syringae pathovar maculicola (Pma), the hopW1 effector gene is immediately adjacent to a three-gene cassette comprising a resolvase, an integrase and exeA. This cassette is also found in plasmids and chromosomes of several human-pathogenic Gram-negative bacteria [58]. We found a homologue of this cassette along with a hopW1 homologue on supercontig 955 of the Pta 11528 genome assembly. Stavrinides and Guttman [58] proposed that the boundaries of the cassette lay upstream of the resolvase and upstream of hopW1. The presence of this four-gene unit in a completely different location in Pta 11528 is indeed consistent with the hypothesis that it represents a discrete mobile unit.

Several hop genes are located on the large plasmid of Pph 1448A. We found no homologues of these genes in Pta 11528, suggesting that the plasmid is not present in Pta 11528. Consistent with this, only a small proportion of the plasmid was alignable to our 36-nucleotide Illumina sequence reads (Figure 4). This suggests that a large component of the pathogen's effector arsenal is determined by its complement of plasmids. However, simple loss or gain of a plasmid does not explain all of the differences in effector complement, since Pta 11528 lacks homologues of several Pph 1448A chromosomally-located effector-encoding hop genes (hopG1, hopAF1, avrB4, hopF3 and hopAT1) as well as the non-effector hopAJ1. It also lacks homologues of the Pph 1448A degenerate effector gene hopAB3'.

Figure 4

Limited conservation between the Pta 11528 genome sequence and the sequence of the Pph 1448A large plasmid. The MAQ [79] alignment of the Pta 11528 Illumina reads is shown in black. The thickness of the black track is proportional to the depth of coverage by Illumina reads. The BLASTN [80] alignment of the Pta 11528 de novo assembly against the plasmid sequence is shown in a green track, with the thickness of this single green track being proportional to sequence identity.

The regions of the Pph 1448A large plasmid that are apparently conserved in Pta 11528 include genes encoding the conjugal transfer system, suggesting the presence of one or more plasmids in this strain. We found an open reading frame (C1E_3950, located on supercontig 955 coordinates 59126-60394) encoding a protein with about 97% sequence identity to the RepA proteins characteristically encoded on pT23A-family plasmids (e.g. AAW01447; reviewed in [59]), suggesting that this 236 kilobase supercontig might represent a plasmid.

A functional screen for HrpL-regulated genes

We used a previously described functional screen [60] to complement our bioinformatics-based searches for type III effectors of Pta 11528. The original screen comprised two steps. The first step identifies genes whose expression is regulated by the T3SS alternative sigma factor, HrpL. The second step identifies the subset of HrpL-regulated genes that encode effectors. For Pta 11528, we employed only the first step to identify candidate effector genes based on induced expression by HrpL. A genomic library from Pta 11528 was constructed in a broad-host-range vector carrying a promoter-less GFP gene and mobilized into a Pto strain lacking its endogenous hrpL but conditionally complemented with an arabinose-inducible hrpL. We used a fluorescence-activated cell sorter (FACS) to select clones that carried HrpL-inducible promoters based on expression of GFP after growth in arabinose. Clones were sequenced and sequences were assembled. Clones representative of assembled supercontigs were verified again for HrpL regulation using FACS. Among the genes whose expression was confirmed to be HrpL-dependent were those encoding effectors hopAE1, hopI1, hopAR1, the avrPto1-like gene, hopF1, hopT1-1, hopO1-1, avrE1, hopX1, and the degenerate hopM1' and hopAI1' as well as known T3SS-associated genes hrpH (ORF1 of the CEL; [61]) and hrpW1. Interestingly, the screen also confirmed HrpL-dependent regulation of genes encoding a major facilitator superfamily (MFS) permease and a putative peptidase (Table 3).

Table 3 Pta 11528 genes confirmed by the functional screen to be under the transcriptional control of HrpL

Other differences in predicted proteomes of P. syringae strains

Host range and pathogenicity are likely to be further influenced by genes other than those associated with type III secretion. Virulence determinants in P. syringae include toxins as well as epiphytic fitness; that is, the ability to acquire nutrients and survive on the leaf surface [14]. Epiphytic fitness depends on quorum-sensing [62], chemotaxis [63], osmo-protection, extracellular polysaccharides, glycosylation of extracellular structures [64], iron uptake [65] and the ability to form biofilms. Cell-wall-degrading hydrolytic enzymes play a role in virulence in at least some plant-pathogenic pseudomonads [66]. Secretion systems (including type I, type II, type IV, type V, type VI and twin arginine transporter) may also contribute to both virulence and epiphytic fitness [67], whilst multidrug efflux pumps may confer resistance to plant-derived antimicrobials [68].

To identify differences between Pta 11528 and the previously sequenced Pph 1448A, Psy B728a and Pto DC3000 with respect to their repertoires of virulence factors, we performed BLASTP searches between the predicted proteomes. We found no significant differences in the repertoires of secretion systems between the proteomes. However, we found that Pta 11528 lacks homologues of several Pph 1448A polysaccharide modifying enzymes (glycosyl transferase PSPPH_0951, polysaccharide lyase PSPPH_1510, glycosyl transferase PSPPH_3642). Conversely, Pta 11528 encodes two glycosyl transferases (C1E_0355 and C1E_0361) and a thermostable glycosylase (C1E_4802) that do not have homologues in any of the three fully sequenced P. syringae genomes. This may imply differences in the extracellular polysaccharide profiles. In contrast to Pph 1448A, Pta 11528 lacks homologues of RhsA insecticidal toxins (PSPPH_4042 and PSPPH_4043). However, a tabtoxin biosynthesis gene cluster is found in the Pta 11528 genome and shows a high degree of conservation with the previously sequenced Pta BR2 tabtoxin biosynthesis cluster [69].

Pta 11528 encodes several enzymes that do not have homologues in any of the three fully sequenced P. syringae genomes (Table 4), including a predicted gluconolactonase (C1E_2553), a predicted dienelactone hydrolase (C1E_2589), a predicted nitroreductase (C1E_6026), and a sulphotransferase (C1E_6026). C1E_0903 shares 71.4% amino acid sequence identity with a predicted epoxide hydrolase (YP_745600.1) from Granulibacter bethesdensis CGDNIH1 [70] and has a significant match to the epoxide hydrolase N-terminal domain in the Pfam database (PF06441) [71, 72]. Epoxide hydrolases are found in P. aeruginosa and P. fluorescens PfO-1, but not in any other pseudomonads. It is possible that this gene product has a function in detoxification of host-derived secondary metabolites.

Table 4 Proteins encoded by the draft Pta 11528 genome that have no detectable homologues in the three previously fully sequenced P. syringae genomes.

Pta protein C1E_6026 has a significant match to the sulphotransferase domain (Pfam:PF00685). Examples of this protein domain have not been found in other pseudomonads except for P. fluorescens PfO-1. Sulphotransferase proteins include flavonyl 3-sulphotransferase, aryl sulphotransferase, alcohol sulphotransferase, estrogen sulphotransferase and phenol-sulphating phenol sulphotransferase. These enzymes are responsible for the transfer of sulphate groups to specific compounds. The sulphotransferase gene (C1E_6026, 82% amino acid identity to P. fluorescens Pfl01_0157) overlaps a two kilobase Pta 11528-specific genomic island that also encodes a phage tail collar-protein encoding gene (C1E_5461, 61% amino acid identity to P. fluorescens Pfl01_0155) and an acetyltransferase (C1E_5459, 76% amino acid identity to P. fluorescens Pfl01_0148). We speculate that this region has been horizontally acquired in the Pta 11528 lineage via a bacteriophage.

An 80 kilobase region of Pta 11528 supercontig 684 contains two open reading frames (ORFs) (C1E_2584 and C1E_2585) whose respective predicted protein products show 48 and 55% amino acid identity to the C- and N-termini of a P. putida methyl-accepting chemotaxis protein (MCP) (PP_2643) and little similarity to any P. syringae protein. Since the N- and C-termini are divided into separate reading frames, this probably represents a degenerate pseudogene. Immediately downstream of these ORFs is a gene (C1E_2583) that specifies a MCP showing greatest sequence identity (70%) to PP_2643 from P. putida, whilst sharing only 65% identity to its closest homologue in P. syringae (PSPPH_4743). This region also encodes another MCP (C1E_2587) that shares only 50% amino acid identity with any previously sequenced P. syringae homologue. It remains to be tested whether these MCPs play a role in pathogenesis and/or epiphytic fitness.

Transcriptional regulators are not normally considered to be virulence factors. However, expression of virulence factors may be coordinated by and dependent on regulators. Moreover, heterologous expression of the RscS regulator was recently shown to be sufficient to transform a fish symbiont into a squid symbiont [73]. Pta 11528 encodes several predicted transcriptional regulators that are not found in Pto DC3000, Psy B728a and Pph 1448A. These include two predicted TetR-like proteins (C1E_0901 and C1E_6027), two predicted xenobiotic response element proteins (C1E_2056 and C1E_2563), a LacI-like protein (C1E_2286), a Cro/CI family protein (C1E_2570) and an IclR family protein (C1E_5715).

Pta 11528 encodes a novel pilin (C1E_2329) not found in previously sequenced P. syringae strains but sharing significant sequence similarity with a type IV pilin from P. aeruginosa [74]. Pilin is the major protein component of the type IV pili, which have functions in forming micro-colonies and biofilms, host-cell adhesion, signalling, phage-attachment, DNA uptake and surface motility, and have been implicated as virulence factors in animal-pathogenic bacteria [75]. The precise function of the C1E_2329 pilin is unknown but it may be involved in epiphytic fitness or plant-pathogenesis or could even be involved in an interaction with an insect vector.

Pta-specific genomic islands

We identified 102 genomic regions of at least one kilobase in length which gave no BLASTN matches against previously sequenced Pseudomonas genomes (Additional file 3: Table S2). Ten of the Pta 11528-specific regions are longer than 10 kilobases, the longest being 37.7, 21.8, 18.7, 17.9 and 16.6 kilobases. The 16.6 kilobase region corresponds to the tabtoxin biosynthesis gene cluster [69]. These regions will be good candidates for further study of the genetic basis for association of Pta with the tobacco host. For example, several of the islands encode MFS transporters and other efflux proteins that might be involved in protection from plant-derived antimicrobials (Additional file 3: Table S2).
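
Finding strain-specific regions of this kind amounts to taking the complement of the matched intervals on each supercontig and keeping the gaps of at least one kilobase. The sketch below illustrates that step under stated assumptions: match coordinates are 1-based and inclusive (as in BLASTN output), the function name is hypothetical, and the example coordinates are made up. It is not the authors' implementation.

```python
def unmatched_regions(contig_length, matches, min_length=1000):
    """Return 1-based inclusive (start, end) regions covered by no match.

    matches: iterable of (start, end) aligned intervals on the contig,
    in any order, possibly overlapping; start may exceed end for
    matches on the reverse strand.
    """
    regions = []
    next_uncovered = 1  # first position not yet covered by any match
    for start, end in sorted((min(s, e), max(s, e)) for s, e in matches):
        if start - next_uncovered >= min_length:
            regions.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if contig_length - next_uncovered + 1 >= min_length:
        regions.append((next_uncovered, contig_length))
    return regions


# Made-up matches on a 10 kb contig: only the 2501-3999 gap is >= 1 kb.
islands = unmatched_regions(
    10_000, [(1, 2000), (2500, 1500), (4000, 4200), (4300, 9500)]
)
print(islands)  # [(2501, 3999)]
```

Sorting the intervals first means overlapping or nested matches are merged implicitly, so each gap is reported once.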

Conclusion

We have generated a draft complete genome sequence for Pta 11528, a pathogen that naturally causes disease in wild tobacco, an important model system for studying plant disease and immunity. From this sequence, combined with a functional screen, we were able to deduce the pathogen's repertoire of T3SS-associated Hop proteins. This has revealed some important differences between Pta and other pathovars with respect to the arsenal of T3SS effectors at their disposal for use against the host plant. We also revealed more than a hundred Pta-specific genomic regions that are not conserved in any other sequenced P. syringae, providing many potential leads for the further study of the Pta-tobacco disease system.

Methods

Sequence data

The previously published sequences of P. syringae pathovar phaseolicola 1448A [20], P. syringae pathovar syringae B728a [19] and P. syringae pathovar tomato DC3000 [18] were downloaded from the NCBI FTP site ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Pseudomonas_syringae_pv_B728a. The NCBI non-redundant (NR) Proteins database was downloaded from the NCBI FTP site ftp://ftp.ncbi.nih.gov/blast/db/ on 10th December 2008.

De novo sequence assembly and annotation

Solexa sequence data were assembled using Velvet 0.7.18 [41]. We used Softberry's FgenesB pipeline http://www.softberry.com to predict genes encoding rRNAs, tRNAs and proteins. Annotation of protein-coding genes by FgenesB was based on the NCBI NR Proteins database.

Prediction of HrpL-binding sites (Hrp boxes)

We built a profile hidden Markov model (HMM) based on a multiple sequence alignment of 26 known Hrp boxes from Pto DC3000 using hmmb from the HMMER 1.8.5 package http://hmmer.janelia.org. DNA sequence was scanned against this profile-HMM using hmmls from HMMER 1.8.5 with a bit-score cut-off of 12.0.
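
The general idea of building a model from aligned sites and scanning DNA with a bit-score cut-off can be illustrated with a much simpler stand-in for the profile HMM: an ungapped position weight matrix with log2-odds scores. The sketch below is that simplification only and does not reproduce HMMER scoring; the training sites, test sequence and cut-off are made-up toy values, not the 26 Hrp boxes or the 12.0 bit threshold used in this study.

```python
import math

ALPHABET = "ACGT"


def build_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Log2-odds score for each base at each column of equal-length sites."""
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("sites must all have the same length")
    pwm = []
    for col in range(width):
        column = [site[col] for site in aligned_sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background
            )
            for base in ALPHABET
        })
    return pwm


def scan(sequence, pwm, cutoff):
    """Yield (0-based offset, score) for every window scoring >= cutoff."""
    width = len(pwm)
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            yield offset, score


# Toy training sites and sequence (hypothetical, for illustration only).
toy_pwm = build_pwm(["GGAACC", "GGAACT", "GGAACC", "GGTACC"])
hits = list(scan("TTTGGAACCTTT", toy_pwm, cutoff=5.0))
print([(offset, round(score, 2)) for offset, score in hits])  # [(3, 7.29)]
```

Unlike a profile HMM, this matrix cannot model insertions or deletions between motif halves, so it is a sketch of the scoring principle rather than a substitute for the method described above.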

Functional screen for candidate type III effectors

Library preparation and the flow cytometry-based screen for HrpL-induced genes of Pta 11528 were done according to [60].

Visualisation of data

We generated graphical views of genome alignments using CGView [76]. To visualise the annotated draft genome assembly of Pta 11528, we used the 'gbrowse' Generic Genome Browser [77].

Library preparation for Illumina sequencing

DNA was prepared from bacteria grown in L-medium using the Puregene Genomic DNA Purification Kit (Gentra Systems, Inc., Minneapolis, USA) according to manufacturer's instructions. A library for Illumina Paired-End sequencing was prepared from 5 μg DNA using a Paired-End DNA Sample Prep Kit (PE-102-1001, Illumina, Inc., Cambridge, UK). DNA was fragmented by nebulisation for 6 min at a pressure of 32 psi. For end-repair and phosphorylation, sheared DNA was purified using QIAquick Nucleotide Removal Kit (Qiagen, Crawley, UK). The end-repaired DNA was A-tailed and adaptors were ligated according to manufacturer's instructions.

Size fractionation and purification of ligation products was performed using a 5% polyacrylamide gel run in TBE at 180 V for 120 min. Gel slices were cut containing DNA in the 500 to 10 bp range. DNA was then extracted using 0.3 M sodium acetate and 2 mM EDTA [pH 8.0] followed by ethanol precipitation. 5' adaptor extension and enrichment of the library were performed using 18 PCR cycles with primers PE1.0 and PE2.0 supplied by Illumina. The library was finally purified using a QIAquick PCR Purification Kit and adjusted to a concentration of 10 nM in 0.1% Tween. The stock was kept at -20°C until used.

Sequencing

The flow cell was prepared according to manufacturer's instructions using a Paired-End Cluster Generation Kit (PE-103-1001) and a Cluster Station. Sequencing reactions were performed on a 1G Genome Analyzer equipped with a Paired-End Module (Illumina, Inc., Cambridge, UK). 5 pM of the library was used to achieve ~20,000 to 25,000 clusters per tile. Capillary sequencing of avrE1, hrpW1 and other individual genes was done on an ABI 3730. PCR products were directly sequenced after treatment with ExoI and SAP. Primer sequences are available upon request from JHC.

Verification of Illumina sequence data

Three of the core hop genes in Pta 11528 appeared to be degenerate, based on the de novo assembly of short Illumina sequence reads. The avrE1 gene appeared to have a 20-nucleotide deletion, hrpZ1 a 325-nucleotide deletion, whilst hrpW1 appeared to have three insertions of 22, 6 and 12 nucleotides. Currently, the reliability of de novo sequence assembly from short Illumina reads has not been fully characterised. In particular, repetitive and low-complexity sequence might generate artefacts in assembled supercontigs. Therefore, we checked these putative insertions and deletions by aligning the Illumina sequence reads against the relevant regions of both the Pph 1448A reference genome sequence and our Pta 11528 assembly. As an additional control, we also performed Velvet assemblies on previously published Illumina short-read data from Psy B728a [35]. We found that the B728a avrE1, hrpZ1, hrpW1 and hopAF1 were assembled intact [Additional file 4: Figure S2], indicating that there is nothing inherently 'un-assemble-able' about these gene sequences. Sequence alignment is much more robust than de novo assembly and is not subject to assembly artefacts. The alignments supported the presence of a large deletion in hrpZ1. However, the alignments were not consistent with the assembly for avrE1 and hrpW1. Therefore, we amplified the Pta 11528 avrE1 and hrpW1 genes by PCR and verified their sequences by capillary sequencing [Additional file 5: Table S3]. This confirmed that the apparent deletion in avrE1 was an artefact of the de novo assembly and that the avrE1 sequence encodes a full-length protein product. Furthermore, transient expression of avrE1 in N. benthamiana induces cell death (S. Gimenez Ibanez and J. Rathjen, unpublished). Capillary sequencing also confirmed that the de novo assembly of hrpW1 was incorrect and that Pta 11528 encodes a full-length HrpW1 protein, albeit with repetitive sequence insertions of 69 and 12 nucleotides relative to the Pph 1448A sequence.

The absence of hopAF1 from Pta 11528 is supported not only by the de novo assembly, but also by the absence of aligned (unassembled) reads. As an additional control for the degeneracy of hopAF1 and hrpZ1, we applied the same bioinformatics protocols to the published Psy B728a Illumina data [35] and recovered hopAF1 and hrpZ1 intact in the de novo assembly (Additional file 4: Figure S2).
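
Checking a putative deletion or absent gene by read alignment comes down to looking for stretches of the reference that no read covers. A minimal sketch of that check follows. It assumes ungapped alignments described only by their leftmost 0-based reference position and a fixed read length of 36 nucleotides; the function name and the example positions are hypothetical, and real alignments from MAQ would first need to be parsed into this form.

```python
def zero_depth_runs(reference_length, read_starts, read_length=36, min_run=1):
    """Return half-open (start, end) reference intervals with zero depth.

    read_starts: 0-based leftmost reference positions of aligned reads.
    """
    depth = [0] * reference_length
    for start in read_starts:
        # clip each read to the reference boundaries
        for pos in range(max(start, 0), min(start + read_length, reference_length)):
            depth[pos] += 1

    runs = []
    run_start = None
    for pos, d in enumerate(depth):
        if d == 0 and run_start is None:
            run_start = pos  # a zero-depth stretch begins
        elif d > 0 and run_start is not None:
            if pos - run_start >= min_run:
                runs.append((run_start, pos))
            run_start = None
    if run_start is not None and reference_length - run_start >= min_run:
        runs.append((run_start, reference_length))
    return runs


# Made-up alignment of five reads to a 200 nt reference region.
gaps = zero_depth_runs(200, [0, 20, 40, 150, 164])
print(gaps)  # [(76, 150)]
```

A long zero-depth run inside a gene in the reference, flanked by well-covered sequence, is the pattern expected for a genuine deletion, whereas continuous coverage across a region that the assembly reports as deleted points to an assembly artefact.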

Sequence data

In addition to the data available from GenBank accession ACHU00000000, the Velvet assembly and predicted protein sequences are provided in FastA format in Additional files 6 and 7.

Bioinformatics tools

We used GenomeMatcher [78] for generating and visualising whole-genome alignments. For aligning short Illumina sequence reads against a reference genome, we used MAQ [79] and for other sequence alignments and searches we used BLAST [80]. We used previously published complete genomes as reference sequences for comparative analyses [81-85].