Background

The bacterial alternative sigma factor RpoN recognizes and binds a -24/-12-type promoter with the following consensus sequence: 5'-TGGCACG-N4-TTGCW-3' (the bold G and C are situated at position -24 and -12 relative to the transcription start site, respectively) [1]. Subsequently, the core DNA-dependent RNA polymerase (core-RNAP) binds to the RpoN-DNA secondary complex to form a stable closed promoter complex. The closed promoter complex is unable to initiate transcription by itself. For this, melting of the double-stranded (ds) DNA within the closed complex is required [2]. This is accomplished by the nucleotide hydrolysis activity of an activator or enhancer-binding protein (EBP). EBPs bind to enhancer sites situated 100 base-pairs (bp) or more upstream of the transcription initiation site. Each EBP is controlled by its own signal transduction pathway, thereby responding to different conditions [3,4,5]. As there is almost no leaky expression in the absence of EBPs, the expression of different RpoN-dependent genes is tightly regulated by the different EBPs.
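As an illustration of the consensus quoted above, the following minimal sketch translates the IUPAC string into a regular expression and reports exact matches on one strand. It is not the method used in this study, which scores sites against a weight matrix rather than requiring exact identity; the function names and the example sequence are hypothetical.

```python
import re

# IUPAC codes needed for the consensus quoted above (W = A or T, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}

# Consensus from the text, with N4 written out as four N positions (16 bases in total).
CONSENSUS = "TGGCACG" + "N" * 4 + "TTGCW"


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def find_exact_matches(sequence, consensus=CONSENSUS):
    """Return 0-based start positions of exact consensus matches on the given strand."""
    pattern = consensus_to_regex(consensus)
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Hypothetical sequence for illustration only; it is not taken from any genome.
example = "AAATGGCACGCCGCTTGCATCC"
print(find_exact_matches(example))  # [3]
```

Exact matching of this kind would miss the many functional promoters that deviate from the consensus at one or more positions, which is why a scored weight matrix is the more usual choice.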

RpoN is also known as σ54 (from the 54 kDa molecular weight of the Escherichia coli polypeptide), σN (this sigma factor was initially discovered for its requirement for the expression of nitrogen metabolism genes) and NtrA or GlnF (names now not in common use). Because the members of this protein family vary considerably in molecular weight, the designation σ54 cannot be correctly applied to all of them. Furthermore, we use the name RpoN instead of σN as the link with the encoding gene rpoN is more obvious. As some bacteria (for example, Bradyrhizobium japonicum, Rhizobium etli and Mesorhizobium loti) have two copies of the gene, the respective proteins can be easily distinguished as RpoN1 and RpoN2.

RpoN helps initiate the transcription of genes encoding proteins for very diverse functions in a broad range of bacteria [5,6,7]. The processes controlled by RpoN are not essential for cell survival and growth under favorable conditions, with the exception of Myxococcus xanthus [8]. The most widespread RpoN-regulated function in bacteria is the assimilation of ammonia [6,7]. The expression of genes coding for glutamine synthetase, an ammonium transporter and the PII proteins is controlled by the NtrC EBP and RpoN. In the enteric nitrogen regulation (Ntr) paradigm, NtrC is activated via phosphorylation by NtrB. The PII protein, a signal transducer, stimulates dephosphorylation of NtrC-P under conditions of nitrogen excess (reflected by a high glutamine/2-ketoglutarate ratio), whereas it does not affect the NtrC phosphorylation status under nitrogen-limiting conditions [9]. Consequently, under low nitrogen conditions, NtrC-P activates transcription of the NtrC-RpoN regulon.

Species of the genera Allorhizobium, Azorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium and Sinorhizobium are generally referred to as rhizobia. Rhizobial rpoN mutants are deficient in symbiotic nitrogen fixation [10,11,12,13,14]. RpoN ensures the transcription of most of the nitrogen fixation genes (nif/fix) whose gene products constitute the nitrogenase complex and accessory proteins [15,16]. However, several other symbiosis-related genes are reported to be RpoN-dependent [17,18,19,20,21,22,23,24,25,26,27]. So far, no large-scale effort has been undertaken to unravel the RpoN-regulon in rhizobia. The recent publication of several rhizobial genomes and symbiotic regions is a good opportunity to identify RpoN-regulated genes in rhizobia. To obtain a good view on what (symbiotic) functions are controlled by RpoN, we carried out an in silico analysis on the presence of -24/-12-type promoters in the symbiotic regions of R. etli CFN42, Rhizobium sp. NGR234, B. japonicum USDA110 and M. loti R7A [28,29,30] and in the genomes of M. loti MAFF303099 and Sinorhizobium meliloti 1021 [31,32]. Two closely related non-symbiotic species belonging to the Rhizobiales order of the α-proteobacteria, namely Agrobacterium tumefaciens C58 and Brucella melitensis 16M [33,34], were also included. These are non-symbiotic plant and animal pathogens, respectively. To date, there is only one report on RpoN-dependent functions in A. tumefaciens [35] whereas no information whatsoever is available on the RpoN-regulon of B. melitensis. The possible RpoN-dependent genes predicted by the screening were complemented with data from the literature and classified according to the function of the encoded proteins.

Results and discussion

In silico identification of potential RpoN-dependent promoters

The upstream intergenic sequences were extracted from the genomes of M. loti MAFF303099, S. meliloti 1021, A. tumefaciens C58 and B. melitensis 16M and the symbiotic regions of R. etli CFN42, Rhizobium sp. NGR234, B. japonicum USDA110 and M. loti R7A (see Materials and methods) [28,29,30,31,32,33,34]. The upper strand of these sequences was scored against a weight matrix using PATSER (see Materials and methods). Positive matches (possible RpoN-binding sites or -24/-12-type promoters) were classified according to the functions of the encoded gene products (see Additional data files, pages 8,9). To ensure a sound functional description, the predicted proteins were individually screened against the protein databases of the National Center for Biotechnology Information (NCBI) using BLASTP.

In our analysis, the number of false-positive matches is estimated to be very low, for two reasons. First, the use of PATSER in combination with a strong weight matrix is a preferred method to identify true binding sites [36]. The weight matrix used here was based on a set of 186 characterized RpoN-binding sites of 44 different bacterial species (Table 1) [1]. The lower threshold for the scores was chosen such that only matches strongly resembling the consensus were retained (see Materials and methods). This high stringency allows for a high specificity at the expense of sensitivity. Indeed, some reports mention the presence of very poorly conserved -24/-12-type promoters that appear to be functional to some degree (see Additional data files, pages 8,9). Our stringent procedure most probably misses these sites. Therefore, the retained matches might represent an underestimation of the actual number of active -24/-12-type promoters. Second, only intergenic sequences were considered for the screening, as all known -24/-12-type promoters are situated in intergenic regions [7]. Moreover, the matches all have the correct orientation, as the upper strand of the intergenic sequences was used (for matches on the lower strand, see 'Additional control mechanisms involving RpoN'). The median of the distances from the -12 (C) position - the cytidylate residue at position -12 relative to the transcription initiation site - of a match to the start codon of the downstream coding sequence (CDS) varies from 74 to 156 bp (Table 2). This is somewhat different from the situation in E. coli, where the average distance amounts to 50 bp [7]. Although roughly 75% of the matches are within 200 bp upstream of the start codon, matches with distances over 1,000 bp were also retained (Table 2). A good example to justify this is the case of the mapped promoter of the B. japonicum gene fixB, which is situated 720 bp upstream of the coding region [37]. As is the case with other in silico prediction methods, the results of our analysis may have been biased by the approach used.

Table 1 Weight matrix based on 186 characterized -24/-12-type promoters [1]
Table 2 Comparison of different members of the Rhizobiales

A good test case to evaluate the reliability of our predictions is to compare the matches with experimental data. In our laboratory we confirmed the RpoN-dependent expression of several R. etli genes with predicted RpoN-binding sites (see Additional data files, pages 3,4,6,7,12,13). The products of these genes are involved in a wide variety of functions, such as nitrogen fixation (nifH and iscN-nifUS) [38,39], oxidative stress and gene regulation (spxA-rpoN2) [13,40], and transport (yp104-103 and yp100) [41] (G. Dirix, M. Van Guyse and J.M., unpublished results). For the genes on plasmid pNGR234a of Rhizobium sp. NGR234, an extensive expression analysis is available as well as an annotation of RpoN-dependent promoters [42]. Our screening confirmed the 15 previously annotated RpoN-dependent promoters on pNGR234a (controlling 50 CDSs). All these promoters are upregulated in bacteroids, resembling the NifA-RpoN-dependent symbiosis-specific expression pattern of the nitrogen fixation genes (see Additional data files). In addition to these promoters, eight other matches were found, controlling ten CDSs. Five out of these eight are highly expressed in bacteroids (see Additional data files), indicating that they are probably functional. In the symbiotic region of B. japonicum, 31 matches were found (controlling 87 CDSs) (Table 2), which were also previously annotated as RpoN-dependent promoters [29]. Thirteen of these promoters were shown to be functional and controlled by NifA and RpoN (see Additional data files). Furthermore, the fact that most - if not all - common RpoN-controlled genes were picked up for all species screened, including those already characterized in rhizobia, gives additional support to the methodology used. Common RpoN-dependent functions include assimilation of ammonium, transport of C4-dicarboxylates and nitrogen fixation (see Additional data files, pages 1-7, 10).

Functional classification of RpoN-dependent genes in the Rhizobiales

The matches (also referred to as possible RpoN-binding sites or possible -24/-12-type promoters) obtained from the in silico search were complemented with literature data on RpoN-regulated genes in rhizobia, and classified according to protein function (see Additional data files).

For the four genomes screened, matches were found upstream of the genes whose products are involved in the uptake and assimilation of ammonia: glnK-amtB, glnBA and glnII (see Additional data files, pages 1,2). A similar regulation and gene organization is found in other α-proteobacteria [9]. B. melitensis differs slightly in that amtB constitutes a monocistronic unit and the glnII ortholog was not identified.

Possible RpoN-binding sites were found upstream of many nitrogen fixation genes in the six symbiotic, nitrogen-fixing species (R. etli, Rhizobium sp. NGR234, B. japonicum, M. loti MAFF303099 and R7A and S. meliloti). Most rhizobial nitrogen fixation genes are controlled by RpoN and the NifA EBP (see Additional data files, pages 2-7). The latter protein is activated under microaerobic conditions and is highly active in bacteroids [15,16].

A possible RpoN-binding site is also present in the promoter region of dctA, encoding a C4-dicarboxylate transporter, in all species screened, except for B. melitensis, where no such gene is present, and R. etli, where it is not present on the symbiotic plasmid (see Additional data files, page 10). In M. loti MAFF303099, -24/-12-type promoters were identified in front of the two C4-dicarboxylate transporter genes. Rhizobial dctA mutants are deficient in nitrogen fixation. The expression of dctA is controlled under free-living conditions by RpoN-DctD, whereas in bacteroids, the DctD EBP is (partially) replaced by another unknown EBP. The DctD activator protein is activated by the DctB sensor protein that senses the presence of C4-dicarboxylates together with DctA [16].

During symbiotic nitrogen fixation, hydrogen is produced as an obligate byproduct of the nitrogenase reaction. Some rhizobia possess a hydrogen-uptake (Hup) system that reoxidizes the hydrogen, a process that is thought to increase the efficiency of nitrogen fixation. This is, for instance, the case in B. japonicum, where the -24/-12-type promoters controlling these genes were confirmed in the screening (see Additional data files, pages 7,8). In B. japonicum, HoxA-RpoN is required for hupSL expression under free-living conditions and FixK2 under symbiotic conditions, whereas in Rhizobium leguminosarum, expression of hup genes is only induced in bacteroids and depends on NifA-RpoN [21,43,44].

As well as these previously well-characterized RpoN-dependent functions, several other groups of genes were found in the in silico screening. Potential target genes preceded by a possible -24/-12-type promoter belong to the following functional categories: nodulation; transport; detoxification; gene regulation; cell envelope; amino-acid biosynthesis; energy and DNA metabolism; translation; integration and recombination; and various others (see Additional data files, pages 8-28).

An interesting observation is the large number of hypothetical CDSs preceded by a possible RpoN-binding site (see Additional data files, pages 21-28). Of the 593 gene products identified in the screening, 22% have no function attributed, but do have homologs in other species. Proteins encoded by orphan genes, unique to one species, make up 9% of the total. Some of these hypothetical CDSs having a possible -24/-12-type promoter were picked up in several genomes: atu1754-smc01000-mll0186, smc00999-mll0185, mll9007-y4nG, mll9408-y4nH, orf196-y4vH, orf538-y4vI, orf354-y4vJ, id79-mlr5912-y4vQ-msi288-yp001, yp003-y4wP-yp021-yp099, mlr5819-y4wI, mll5852-msi351, mlr5886-msi324, mlr5914-msi286 and msr5928-msi276. The majority of these hypothetical genes in Rhizobium sp. NGR234 are highly expressed in bacteroids [42], suggesting a possible involvement of the putative gene products in the symbiosis.

Another interesting class of possible RpoN-regulated genes encodes proteins catalyzing detoxification reactions (see Additional data files, pages 12,13). During symbiosis, rhizobia are confronted with toxic compounds and oxidative stress in the form of reactive oxygen species [45]. Effective protection against these harmful agents would thus enhance the symbiotic performance of the rhizobia. Two good candidate proteins to fulfill such a function were identified in this screening: a peroxiredoxin and a cytochrome P450. Peroxiredoxins reduce peroxides to the corresponding alcohol and water. The gene encoding the symbiosis-specific peroxiredoxin SpxA is highly expressed in bacteroids of several rhizobial species in a NifA-RpoN-dependent way [13,40,42]. Cytochrome P450s catalyze the monooxygenation of a broad spectrum of substrates, including toxic compounds. Several genes encoding cytochrome P450s have a possible RpoN-binding site in their promoter region. One subclass of these P450s is conserved among several rhizobial species and is highly expressed in bacteroids [27,42]. Expression of the Rhizobium sp. BR816 homolog depends on NifA-RpoN and another, so far unidentified, factor [27]. Furthermore, two matches were found upstream of B. japonicum nrgA and nrgBC. The expression of these genes, possibly involved in detoxification reactions, was shown to be controlled by NifA and RpoN [46].

Transport systems have an important role at different stages of the rhizobium-legume symbiosis [41,47,48,49]. Many transporter-encoding gene loci are preceded by a possible -24/-12-type promoter (see Additional data files, pages 9-12). The majority of these transporters are of the ABC type, which is consistent with the fact that this type is the most abundant transporter in the Rhizobiaceae family [33]. The RpoN-regulated transporters might be involved in symbiotic (as, for instance, DctA) or non-symbiotic (in A. tumefaciens) processes.

Many regulatory genes have a possible RpoN-binding site in their promoter region (see Additional data files, pages 13,14). The RpoN regulon is known to expand its working range by controlling the expression of other regulatory genes [6,7]. The induction of the y4qH gene (preceded by a possible RpoN-binding site and encoding a LuxR-type transcriptional regulator) in bacteroids of Rhizobium sp. NGR234 illustrates how RpoN might indirectly control the expression of genes involved in symbiosis [42]. Other examples include RpoN dependency of R. leguminosarum fnrN, or the autoregulation of R. etli rpoN2 [13,14]. The latter form of control also appears to be conserved in M. loti MAFF303099 and R7A, where a possible -24/-12-type promoter was identified upstream of the second copy of rpoN (see Additional data files, page 7).

The gene products of the rhizobial nodulation genes (nol, nod, noe) synthesize and export a class of signal molecules, the so-called Nod factors, involved in the early steps of the symbiotic interaction with leguminous plants. Although the transcription of most nodulation genes depends on the NodD regulator protein in the presence of a suitable plant-root-derived flavonoid, several reports mention an RpoN dependency (to some extent) of some of the nodulation genes ([22,23,24,26,50] and see Additional data files, pages 8,9). However, the screening did not reveal any matches upstream of rhizobial nodulation genes. Because of the high stringency of the screening, several promoters, such as those of Rhizobium sp. NGR234 nodD2 and S. meliloti nodD3, may have been missed. It thus seems that less conserved -24/-12-type promoters are functional to some degree. On the other hand, such poorly conserved promoters may be the exception rather than the rule. Nevertheless, the -24/-12-type promoter of the B. japonicum nfeC gene (involved in nodule formation) was confirmed in the screening.

The other functional categories (see Additional data files, pages 14-21) include, among others, cell envelope, amino-acid biosynthesis, energy metabolism, DNA metabolism, translation, and integration and recombination. Unlike the functions discussed above, these categories comprise many different proteins from different bacterial species. It appears that the RpoN dependency of the encoding genes in these categories is not conserved in the Rhizobiales. This might reflect a species-specific recruitment of RpoN for gene transcription.

Signals controlling RpoN-dependent gene expression in the Rhizobiales

What signals control the activity of EBPs in species of the Rhizobiales? An in silico screening for EBPs in the proteomes of members of the Rhizobiales (see Materials and methods) revealed that the number of candidate EBPs is proportional to the genome size (Table 2). M. loti, S. meliloti, A. tumefaciens and B. melitensis all have roughly 1 EBP per million bases (Mb) (8, 7, 6 and 3, respectively), a number differing from a previous estimate (13, 12 and 9, respectively; the B. melitensis genome was not published at the time) [33]. However, our predicted EBPs scored significantly better (E-value 10^-25 or less) with the Pfam Sigma54_activat domain HMM motif than the best of the rejected proteins (E-value 10^-5 or greater), giving confidence to our prediction. Each of the candidate EBPs was subjected to a BLASTP search (default settings) against the NCBI protein databases and classified according to homology (Table 3). A similar analysis of Pseudomonas aeruginosa, Salmonella typhimurium and E. coli proteomes revealed that these bacteria have a higher number of possible EBPs (22, 15 and 13, respectively) than the species of the Rhizobiales (data not shown).

Table 3 Enhancer binding proteins (EBPs) of different members of the Rhizobiales

All four genomes screened have NtrC and NtrX orthologs (Table 3), which are both nitrogen-responsive EBPs [9]. Rhizobial NtrC-RpoN-regulated genes are downregulated early in bacteroid differentiation [51,52]. This downregulation seems to be necessary for an effective symbiosis as ectopic expression of R. etli amtB alters the bacteroid differentiation process and the ability to enter the host cells [53].

M. loti, S. meliloti and A. tumefaciens all have two DctD EBP paralogs (Table 3). DctD is activated by the DctB sensor protein, which senses C4-dicarboxylates in collaboration with the DctA C4-dicarboxylate transporter. The second dctD copy might encode the so-called alternative symbiotic activator, ensuring dctA expression in S. meliloti and R. leguminosarum dctD mutants during symbiosis [16].

All nitrogen-fixing species possess a NifA homolog - M. loti MAFF303099 and R7A even have two (Table 3). Rhizobial NifA proteins are activated by the low oxygen tension in the nodules, which is the key signal controlling expression of the nitrogen fixation genes and a prerequisite for nitrogen fixation [54]. NifA activates RpoN-dependent transcription of many rhizobial genes, including nitrogen fixation genes (see Additional data files). For the NifA-RpoN-dependent expression of some genes, microaerobiosis alone is not sufficient to equal the expression levels observed in bacteroids (see Additional data files, pages 2,4,6,7,13,24) [27,39]. An additional signal appears to control the optimal functioning of NifA.

The presence of additional uncharacterized EBPs suggests that other signals control RpoN-dependent gene expression in the Rhizobiales (Table 3). These signals could control expression of genes involved in symbiosis (M. loti and S. meliloti) or other, so far uncharacterized, functions (A. tumefaciens and B. melitensis). One common signal might control the activity of EBP-1, present in the four genomes (Table 3).

Comparison between symbiotic and non-symbiotic Rhizobiales

Globally, a higher number of possible RpoN-binding sites per megabases of genome is observed in symbiotic members of the Rhizobiales than in non-symbiotic ones (Table 2). This ratio is even higher in the symbiotic regions and that might reflect the recruitment of NifA-RpoN-dependent promoters, which are highly upregulated in bacteroids, for late symbiotic functions. RpoN is not essential for bacterial survival, but controls genes whose products have a wide range of different functions [6]. Apparently, there has been a selection in bacteroids for the NifA-RpoN-type promoter, not only for nitrogen fixation but for other symbiotic functions as well. This is illustrated by the high number of non-nif/fix genes on pNGR234a preceded by a -24/-12-type promoter and preferentially expressed in bacteroids (see Additional data files) [42]. The symbiotic island of M. loti R7A is about 109 kb smaller than that of MAFF303099 and the two areas share only a conserved backbone of 248 kb [30]. This distinction seems not to be reflected by the RpoN-regulon, as the majority of possible RpoN-dependent loci (75%) on the two islands is conserved (see Additional data files).

Although an A. tumefaciens rpoN mutant is not affected in tumorigenesis, expression of vir genes, chemotaxis or flagellum formation, it does not grow on nitrate as sole nitrogen source nor on C4-dicarboxylates as sole carbon source, and grows only very poorly on arginine as sole nitrogen source [35]. We were able to identify possible RpoN-binding sites in the promoter regions of the dctA gene, the glnK-amtB, glnBA, glnII and nrtABC-nirBD loci (see Additional data files, pages 1,2,10,12,16), providing genetic evidence for the previously observed phenotypes. As could be expected, no virulence, tumorigenesis, chemotaxis or flagellar genes were picked up in the screening, although a good match was found upstream of motB, which encodes a flagellar motor protein. Strikingly, in Agrobacterium the total number of gene products whose genes are preceded by a possible -24/-12-type promoter is approximately equal to that of Sinorhizobium (Table 2). S. meliloti and A. tumefaciens are closely related at the level of protein identity as well as nucleotide colinearity and conservation of gene order [33]. This relatedness appears to be somewhat conserved at the regulatory level, as S. meliloti and A. tumefaciens share seven predicted RpoN-binding sites upstream of the same gene loci: dctA, glnK-amtB, glnBA, glnII, nrtABC, atu1754-smc01000 and atu4739-smb20103 (see Additional data files, pages 1,2,10,12,18,21,24). The relatively high number of positive matches in A. tumefaciens and the lack of a tumorigenesis-related phenotype in the rpoN mutant hint at other, so far unidentified, RpoN-controlled functions. Indicative of this is the number (22) of possible RpoN-binding sites upstream of hypothetical genes in A. tumefaciens (see Additional data files, pages 21,26). Of the predicted proteins of A. tumefaciens, 16% are not present in M. loti or S. meliloti and 89% of these are hypothetical gene products [33], leaving room for genetic diversity between rhizobia and A. tumefaciens.

B. melitensis is an intracellular pathogen of humans and animals, and is closely related to rhizobia and A. tumefaciens [55]. Only 12 matches resulted from the in silico analysis for possible RpoN-binding sites (Table 2), two of which were found in the upstream region of amtB and glnBA (see Additional data files, pages 1,2). At present, no report of a B. melitensis rpoN mutant is available. A good candidate function for RpoN to regulate in B. melitensis would be pathogenesis, as is the case for several other pathogens [6]. Six possible -24/-12-type promoters were found in the B. melitensis genome upstream of hypothetical CDSs (see Additional data files, pages 21,22,26).

E. coli was estimated to have around 30 -24/-12-type promoters, controlling mainly functions related to nitrogen metabolism [7]. Although there is some overlap of RpoN-controlled functions between E. coli and members of the Rhizobiales (for example, ammonium assimilation and hydrogen oxidation), RpoN seems to control a more diverse set of functions in the latter. This appears to somewhat contradict the observation that E. coli possesses a higher number of EBPs (see 'Signals controlling RpoN-dependent gene expression in the Rhizobiales').

Additional control mechanisms involving RpoN

Expression studies of rpoN in wild-type and rpoN- backgrounds revealed that RpoN negatively controls its own expression in B. japonicum, Acinetobacter calcoaceticus, R. etli and R. leguminosarum bv. viciae [14,56,57,58]. The presence of a putative oppositely oriented RpoN-binding site on the template strand of rpoN promoter regions, overlapping with the -10 promoter region and the transcription start site, was proposed [58]. It was stated that, as RpoN is able to bind to -24/-12-type promoters in the absence of core-RNAP, the negative autoregulation of the rpoN genes occurs by direct interference of RpoN with the binding of σ70-holo-RNAP complex to the -10 promoter region of the rpoN gene. Site-directed mutagenesis of the highly conserved GG to TT in the putative RpoN-binding site of the rpoN promoter relieved the negative autoregulation, giving strong support to the above hypothesis [14].

The in silico screening revealed the presence of oppositely oriented possible RpoN-binding sites upstream of the rpoN genes of M. loti, S. meliloti, A. tumefaciens and B. melitensis (Table 2). A comparison with the rpoN coding sequences of R. etli, R. leguminosarum, Rhizobium sp. NGR234, B. japonicum, M. loti and S. meliloti revealed that the rpoN genes of A. tumefaciens and B. melitensis were incorrectly annotated. Their coding sequences should be 63 bp longer and shorter, respectively. An alignment of the rpoN promoter regions of these species shows the strong conservation of these promoters (Figure 1). In addition, the screening of the lower strand of the intergenic sequences revealed a slightly lower number of matches than that of the correctly oriented RpoN-binding sites (Table 2). It is thus not inconceivable that RpoN could alter the expression of these genes in a way similar to its own autoregulation, that is, by interference with the binding of the holo-RNAP or a regulatory protein to the promoter. This would significantly broaden the working domain of RpoN.
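Screening the lower strand amounts to searching the reverse complement of each intergenic sequence with the same matrix or pattern. The helper below is a minimal sketch of that step; the function name is hypothetical, and only the IUPAC codes that occur in the consensus given in the Background (A, C, G, T, W, N) are handled.

```python
# Complement table for the bases used in this article's consensus.
# W (A or T) and N (any base) are their own complements.
_COMPLEMENT = str.maketrans("ACGTWN", "TGCAWN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


# The -24/-12 consensus as it would read on the opposite strand.
print(reverse_complement("TGGCACGNNNNTTGCW"))  # WGCAANNNNCGTGCCA
```

A site found in the reverse-complemented sequence corresponds to an oppositely oriented site on the original strand, such as those described upstream of the rpoN genes.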

Figure 1

Manual alignment of rpoN promoter sequences. At (Agrobacterium tumefaciens, GI: 17738659); Bj (Bradyrhizobium japonicum, GI: 152137); Bm (Brucella melitensis, GI: 17983821); Ml (Mesorhizobium loti, GI: 14023393); NGR (Rhizobium sp. NGR234, GI: 152431); Re (Rhizobium etli, GI: 1046228), Rl (Rhizobium leguminosarum bv. viciae, GI: 5759116), Sm (Sinorhizobium meliloti, GI: 152389). Upper line, -35 and -10 consensus sequences of Escherichia coli and the transcription start site (*) as determined for S. meliloti [69]. Lower line, consensus sequence of -24/-12-type promoter [1]. Nucleotides are shaded in black (100% conserved) or gray (75% conserved). The numbers represent the distance (in bp) from the end of the alignment to the start codon of the downstream rpoN gene.

Conclusions

A highly specific in silico screening method was applied to predict members of the RpoN-regulon in eight different species of the Rhizobiales. The matches obtained were individually checked and classified according to protein function, resulting in a highly annotated and manually curated dataset. This dataset was complemented with available literature data on members of the RpoN-regulon in Rhizobiales. In addition, a screening was carried out to identify possible EBPs controlling the expression of RpoN-dependent genes. Together, these data serve as a source of exhaustive information on the (possible) roles of RpoN in symbiotic and non-symbiotic processes.

RpoN-binding sites were found upstream of genes involved in common RpoN-dependent functions, such as assimilation of ammonium and uptake of C4-dicarboxylic acids. The symbiotic members of the Rhizobiales seem to have recruited RpoN for the expression of nitrogen fixation and other symbiotic genes. This is illustrated by the high number of possible RpoN-binding sites in the symbiotic regions of these bacteria. Other RpoN-dependent symbiotic functions might include detoxification and transport or might be controlled indirectly through other regulatory proteins. Whereas an A. tumefaciens rpoN mutant only displays common RpoN-dependent phenotypes, the relatively high number of possible RpoN-binding sites present in its genome points to several other, yet unidentified, RpoN-dependent functions. So far, no reports are available on RpoN-dependent phenotypes in B. melitensis. This animal pathogen has a significantly lower number of possible RpoN-binding sites than the other members of the Rhizobiales. B. melitensis RpoN might be required for infection of the host organism, as is the case for other pathogens. Furthermore, the species screened seem to have recruited RpoN independently, in a species-specific manner, for the transcription of different gene sets. The high percentage of hypothetical conserved and non-conserved CDSs preceded by a possible RpoN-binding site opens up ample opportunities for future research. Several uncharacterized EBPs were identified besides the 'classic' EBPs such as NtrC, NtrX, NifA and DctD. This implies that signals other than nitrogen, oxygen and C4-dicarboxylates control the expression of RpoN-dependent genes in species of the Rhizobiales. Identification of these signals will give better insight into yet uncharacterized RpoN-dependent functions. Finally, a similar number of possible RpoN-binding sites were found on the lower strand of the upstream intergenic sequences. RpoN might thus significantly extend its working domain by blocking the binding of transcription regulatory factors. Such is the case, for instance, in the negative autoregulation of rpoN.

Although much consideration was given in our analysis to the design and, to some extent, the experimental validation of the approach, experimental confirmation will ultimately be required to establish the biological meaning of the predicted -24/-12-type promoters, as is the case with all computer predictions.

In conclusion, a highly efficient method was applied to predict the RpoN-regulon of different members of the Rhizobiales group. The same approach might be used for the prediction of RpoN-dependent genes in other bacterial species.

Materials and methods

Retrieval of intergenic sequences

Complete DNA sequences from A. tumefaciens C58 (circular chromosome: NC_003304, linear chromosome: NC_003305, plasmid AT: NC_003306, plasmid Ti: NC_003308), B. japonicum USDA110 (symbiotic chromosomal region: AF322012 and AF322013), B. melitensis 16M (chromosome I: NC_003317, chromosome II: NC_003318), M. loti MAFF303099 (chromosome: NC_002678, plasmid pMla: NC_002679, plasmid pMlb: NC_002682), M. loti R7A (symbiotic island: AL672111), Rhizobium sp. NGR234 (plasmid pNGR234a: NC_000914), R. etli CFN42 (plasmid p42d: NC_004041) and S. meliloti 1021 (chromosome: NC_003047, plasmid pSymA: NC_003037, plasmid pSymB: NC_003078) were extracted from GenBank. Intergenic sequences were identified by automatically parsing the corresponding GenBank files using the modules of INCLUsive [59,60]. An intergenic region is defined as the non-coding region between two genes. Intergenic regions smaller than 10 nucleotides were discarded, as the corresponding genes are likely to belong to an operon.
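The definition of an intergenic region given above can be sketched in a few lines. This is an illustration only and not the INCLUsive implementation: it assumes gene coordinates have already been extracted as 1-based, inclusive (start, end) pairs on a linear sequence, and it ignores strand and replicon circularity. The function name and the example coordinates are hypothetical.

```python
def intergenic_regions(genes, min_length=10):
    """Return (start, end) pairs for non-coding gaps between consecutive genes.

    genes: iterable of (start, end) coordinates, 1-based and inclusive.
    Gaps shorter than min_length nucleotides are discarded, as in the text.
    """
    regions = []
    furthest_end = None  # right-most coordinate covered by any gene so far
    for start, end in sorted(genes):
        if furthest_end is not None:
            gap_start, gap_end = furthest_end + 1, start - 1
            # Overlapping genes give a negative length and are skipped here too.
            if gap_end - gap_start + 1 >= min_length:
                regions.append((gap_start, gap_end))
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return regions


# Hypothetical coordinates: a 4-nt gap (discarded), a 299-nt gap,
# two overlapping genes, and another 299-nt gap.
genes = [(100, 400), (405, 900), (1200, 1500), (1450, 2000), (2300, 2600)]
print(intergenic_regions(genes))  # [(901, 1199), (2001, 2299)]
```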

Prediction of possible RpoN-binding sites

The intergenic regions were screened with the PATSER module of the Regulatory Sequence Analysis Tools (RSAT) [61,62,63] for the presence of the -24/-12 promoter consensus sequence. PATSER scores N-mers (in this case 16-mers) from a sequence against a given weight matrix. A set of 186 RpoN-dependent promoters from different bacterial species [1] was used to generate the weight matrix (Table 1). Initially, this matrix was trained against 67 -24/-12-type promoters with a known transcriptional start site. From the distribution of these scores (Figure 2), it was decided to retain all matches with a score higher than or equal to the fifth percentile (8.9). PATSER was run with the GC content of the intergenic sequences as a measure for the a priori probabilities of the nucleotides. The intergenic GC content differs markedly from the total GC content of the genomes (Table 2).
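The scoring and thresholding steps described above can be illustrated with a simplified log-odds scheme. This sketch is written in the spirit of weight-matrix scoring and is not PATSER's exact formula or interface; the pseudocount handling, the function names and the four-column toy matrix are assumptions made for illustration (the real matrix in Table 1 has 16 columns).

```python
import math


def background_from_gc(gc_content):
    """A priori base probabilities derived from a GC fraction between 0 and 1."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position base counts into natural-log odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount
        matrix.append({
            base: math.log(
                (column.get(base, 0) + pseudocount * background[base])
                / total / background[base]
            )
            for base in "ACGT"
        })
    return matrix


def score_windows(sequence, matrix):
    """Score every window of len(matrix) bases on the given strand."""
    width = len(matrix)
    sequence = sequence.upper()
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set("ACGT"):  # skip windows with ambiguous bases
            scores.append((start, sum(col[b] for col, b in zip(matrix, window))))
    return scores


def lower_percentile(values, fraction):
    """Nearest-rank percentile, e.g. fraction=0.05 for the fifth percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# Hypothetical 4-column count matrix and a 60% GC background.
toy_counts = [{"T": 9, "A": 1}, {"G": 10}, {"G": 10}, {"C": 8, "T": 2}]
bg = background_from_gc(0.6)
matrix = log_odds_matrix(toy_counts, bg)
hits = score_windows("AATGGCAA", matrix)
best = max(hits, key=lambda h: h[1])
print(best[0])  # start position of the best-scoring window
```

In this scheme, a cutoff like the one described in the text would be obtained by scoring the training promoters and calling `lower_percentile(training_scores, 0.05)`, then retaining genomic windows whose score is at or above that value.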

Figure 2

Distribution of PATSER scores (see Materials and methods) of 67 -24/-12-type promoters with mapped transcriptional start sites. (GenBank GI number: 2979503, 141885, 141892, 38664, 38679, 1769418, 142336, 142378, 142326, 39977, 550310, 408911, 39532, 39516, 39526, 152106, 152283, 152315, 39548, 312974, 12620419, 152100, 152280, 152317, 2316081, 144194, 896457, 144223, 262651, 3493239, 7208421, 262651, 145911, 556890, 41774, 146158, 1004098, 41568, 42538, 149241, 43802, 149252, 149256, 149246, 43857, 43864, 149273, 149275, 150095, 950651, 150993, 490170, 6492432, 151643, 6636054, 46254, 152305, 152230, 340664, 46285, 46324, 46324, 550144, 550144, 664946, 453435, 1649033). Cumulative percentage: black line.

Prediction of possible EBPs

An estimate of the number of possible EBPs in the respective proteomes was obtained by looking for the presence of the Pfam Sigma54_activat domain [64]. Sigma54_activat is a conserved domain present in EBPs that is involved in the ATP-dependent interaction with RpoN. The predicted proteome sequences of A. tumefaciens, B. melitensis, S. meliloti and M. loti MAFF303099 were downloaded from Proteome Analysis @ EBI [65] and the protein sequences of the symbiotic regions of B. japonicum, M. loti R7A, Rhizobium sp. NGR234 and R. etli were obtained from the NCBI protein database [66]. The protein sequences were subsequently queried with the PF00158 HMM motif using HMMER 2.2g [67]. Matches with an E-value less than or equal to 10^-25 were retained.
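The retention rule stated above is a simple E-value filter. The sketch below assumes that the HMMER results have already been reduced to (protein identifier, E-value) pairs; it does not parse HMMER output, and the identifiers and values shown are hypothetical.

```python
# Cutoff taken from the text: E-value less than or equal to 10^-25.
EVALUE_CUTOFF = 1e-25


def retain_candidate_ebps(hits, cutoff=EVALUE_CUTOFF):
    """Keep hits whose E-value is at or below the cutoff, best (lowest) first."""
    kept = [(name, evalue) for name, evalue in hits if evalue <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


# Hypothetical hits for illustration only.
hits = [("protA", 3e-120), ("protB", 2e-5), ("protC", 1e-25), ("protD", 4e-31)]
print([name for name, _ in retain_candidate_ebps(hits)])  # ['protA', 'protD', 'protC']
```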

Additional data files

An additional data file listing all genes and proteins included in this analysis is available. Each protein is accompanied by a functional description, the species from which it comes, GenBank GI number and transcription unit; additional references for each protein are also included in the file. Information on the gene's regulation is provided where available. The data were obtained from an in silico analysis (see Materials and methods) and the literature [12,13,14,17,18,19,20,21,22,23,24,25,26,27,29,30,31,32,33,34,39,40,41,42,43,44,46,50,51,56,58,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88].