Identification of conserved protein domains that span a wide range of biological functions provides deep insights regarding the origin and evolution of complex biological systems. These versatile conserved domains often have catalytic or structural roles that can be utilized, with small variations, in different contexts. The P-loop-containing nucleotide phosphatase fold represents one such catalytic domain that is utilized in almost every conceivable biological system in all three superkingdoms of life [1,2]. Folds such as the SH3-like barrels, the PAS-like fold, the OB fold, the double-stranded β-helix, the β-propeller and rubredoxin-like zinc ribbons are predominantly non-catalytic domains that are widely represented in multiple functional contexts, with roles such as small-molecule binding, nucleic-acid binding and interaction with other proteins [3,4,5] (see also the SCOP [6] and CATH [7] databases). Versatile globular domains appear to have emerged fairly early in evolution in various fold classes, such as the α/β or α+β mixed folds, or the all-α and all-β folds [5,8]. Comparative genomics and evolutionary studies indicate that many of these versatile folds probably emerged in contexts related to RNA binding in the ancient translation system and were subsequently re-utilized in other biological systems [9,10].

Of particular interest in this context are the small all-β folds that assume conformations such as barrels or β-helices [3,4]. These structures have considerable potential for functional versatility, because they are able either to accommodate small molecules within cavities formed by the curved β-sheets or to interact with various larger molecules, especially nucleic acids or proteins, via the external surfaces of the sheets. A few ancient and widespread β-rich folds such as the SH3-like barrel and the OB fold appear to have colonized multiple functional niches early in evolution, although their earliest versions may have had roles related to RNA metabolism [9,11,12,13]. We were interested in identifying other such functionally versatile β-rich folds that could be traced back to the early stages of life's evolution. The availability of extensive genome sequence data and advances in structure determination over recent years allow the successful application of comparative genomics, sequence and structure comparisons to identify any such folds that may be somewhat less widely represented than the OB or SH3-like folds.

Here, we identify one such β-barrel fold typified by the globular domain of the H subunit of the photosynthetic reaction center (PRC-H) from purple proteobacteria such as Rhodopseudomonas viridis [14,15]. The purple bacterial photosynthetic reaction center consists of three primary subunits, of which PRC-L and PRC-M primarily bind the pigments involved in photochemistry, whereas PRC-H appears to be a key regulator of electron transfer between the quinones in photosynthetic reaction centers [16]. So far, homologs of the H subunit have only been found in photosynthetic proteobacteria [17,18,19] and the carboxy-terminal globular domain of PRC-H shows a distinct β-barrel fold that is structurally unrelated to other characterized β-barrels. This raises the important question of the evolutionary provenance of this unique domain. Here we use sequence-profile analysis and comparative genomics to show that the β-barrel domain of PRC-H defines a novel, widespread superfamily of β-barrel domains that is represented in several bacterial, plant and archaeal genomes. We also show that this β-barrel domain is found in the conserved protein RimM, which is involved in RNA processing and ribosomal assembly in the course of translation. Thus we provide evidence for an unexpected evolutionary connection between RNA metabolism, translation and the redox reactions in photosynthesis in the form of a shared functionally versatile β-barrel domain.

Results and discussion

Identification of the PRC-barrel domain

The PRC-H subunit is a membrane-spanning protein with a single amino-terminal transmembrane helix [14,15]. Its crystal structure reveals a cytoplasmic region comprising a largely non-globular segment followed by a β-barrel-like structure. The entire cytoplasmic region has been classified as a novel β-rich fold with no relatives in the SCOP database [6]. However, we observed that the most carboxy-terminal part of the cytoplasmic region forms a distinct folding unit in the form of a six-stranded β-barrel that could define a novel evolutionarily conserved domain (Figure 1). A DALI search [20] of the PDB database with this β-barrel revealed no specific structural relationships with other folds such as the OB fold or the SH3-like barrel beyond the presence of curved β-sheets, suggesting that it represents a domain with a distinct fold (Figure 2).

Figure 1

A ribbon representation of the H (gold) and the M (gray) subunits of the photosynthetic reaction complex (PDB 1eys). The PRC-barrel is colored purple to highlight it. The two acidic residues projecting in the direction of the membrane, including the glutamate (E) involved in regulation of quinone reduction, are shown in space-filling representation. The peptide from the amino-terminal tail of the M subunit that interacts with a cleft in the PRC-barrel (PRC-M peptide) is also shown in space-filling representation.

Figure 2

A comparison of the PRC-barrel with the analogous β-barrels, namely the SH3-like barrel and the OB fold. (a) The representative of the SH3-like barrel is the dihydrofolate reductase subunit (PDB 1vie) and (b) the representative of the OB fold is the cold-shock protein S1-like RNA-binding domain (PDB 1mjc). Note the difference in packing of the last strand of the OB fold with respect to the first strand in the SH3-like barrel and (c) the PRC-barrel. In the case of the OB fold, note the difference in orientation of the second strand with respect to the first as compared to the other two β-barrels.

To further investigate its evolutionary relationships, we used the sequence of this β-barrel unit from the PRC-H protein (gi: 132177, residues 151-257) of Rhodopseudomonas viridis in a PSI-BLAST [21] search of the non-redundant (NR) database at the National Center for Biotechnology Information (NCBI). This search (expect value (e) threshold for inclusion in profile = 0.01) recovered, in addition to the orthologs of the PRC-H proteins from other purple proteobacteria, several uncharacterized proteins from the cyanobacterium Anabaena (for example, all5315 and alr5332, iteration 2, e = 10^-6 to 10^-4), non-photosynthetic α-proteobacteria such as Mesorhizobium, Sinorhizobium, Brucella and Caulobacter (for example, SMc00885, iteration 4, e = 10^-4 or CAC1676, iteration 5, e = 10^-6), several other assorted bacteria like Deinococcus, Bacillus and Streptomyces (for example, YlmC, iteration 4, e = 10^-5) and several archaea with completely sequenced genomes. Interestingly, in addition to these proteins, this search also recovered the ribosome-associated RimM protein from bacteria (for example, RimM, Deinococcus radiodurans, iteration 6, e = 10^-4). To establish the validity of these relationships we collected all the true positives detected in this search and clustered them on the basis of similarity, obtained diverse representatives belonging to each cluster, and seeded PSI-BLAST searches with each of them. The majority of these searches recovered approximately the same set of proteins with statistically significant e-values. For example, a search started with the archaeal protein Ta0943 (gi: 10640258, whole length) recovered RimM (from Vibrio cholerae, iteration 4, e = 10^-4), and the PRC-H protein (R. viridis, iteration 8, e = 10^-3). Although some of these searches converged prematurely, they consistently recovered true positives detected in the other searches with at least borderline statistical significance (e approximately 0.05-0.01).
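The cross-validation described above, in which a representative of each cluster seeds its own search, amounts to checking that every seed reaches the other clusters. The following Python sketch illustrates that bookkeeping only. The function names, the cluster labels and the toy hit sets are assumptions made for illustration and are not the authors' pipeline or their actual search output.

```python
from typing import Dict, Set


def clusters_reached(hits_by_seed: Dict[str, Set[str]],
                     cluster_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each seed, collect the clusters represented among its hits.

    The seed's own cluster is always counted as reached.
    """
    reached = {}
    for seed, hits in hits_by_seed.items():
        found = {cluster_of[h] for h in hits if h in cluster_of}
        if seed in cluster_of:
            found.add(cluster_of[seed])
        reached[seed] = found
    return reached


def mutually_connected(hits_by_seed: Dict[str, Set[str]],
                       cluster_of: Dict[str, str]) -> bool:
    """True if every seeded search recovers members of every cluster."""
    all_clusters = set(cluster_of.values())
    reached = clusters_reached(hits_by_seed, cluster_of)
    return all(found == all_clusters for found in reached.values())


# Toy input (hypothetical): one representative per cluster and the
# proteins each seeded search recovered below the inclusion threshold.
cluster_of = {"PRC-H": "PRC-H-like", "Ta0943": "euryarchaeal", "RimM": "RimM"}
hits_by_seed = {
    "PRC-H": {"RimM", "Ta0943"},
    "Ta0943": {"RimM", "PRC-H"},
    "RimM": {"PRC-H", "Ta0943"},
}
print(mutually_connected(hits_by_seed, cluster_of))
```

A seed whose search converges prematurely would simply show a smaller set in `clusters_reached`, which makes it easy to see which relationships rest on borderline hits from other searches.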

We prepared separate multiple sequence alignments of this region for all the major, distinct clusters of the proteins detected in the above searches and predicted secondary structure for each of them. The predicted secondary structure [22] corresponded perfectly with that of the barrel domain of the classic PRC-H proteins. Furthermore, the smallest proteins with this region were detected in the Euryarchaea (for example, Ta0943) and their length of approximately 80 residues exactly corresponded to that of the β-barrel that forms a distinct folding unit seen in the PRC-H subunit. Consistent with this, several proteins, such as mll3685 from Mesorhizobium loti, have duplications or triplications of this region, which indicate that the boundaries of each repeat correspond perfectly to the β-barrel unit of PRC-H. These observations suggest that this region indeed defines a novel evolutionarily mobile domain of approximately 80 residues (Figure 3). We named it the PRC-barrel after the photosynthetic reaction center subunit H, in which it was first observed.

Figure 3

A multiple alignment of the PRC-barrel was constructed using T-Coffee [38] and realigning the sequences by parsing high-scoring pairs from PSI-BLAST search results. The secondary structure assigned by PHD [22] is shown above the alignment, with E representing a β-strand, and H an α-helix. The 85% consensus shown below the alignment was derived using the following amino-acid classes: hydrophobic (h, ALICVMYFW, yellow shading); the aliphatic subset of the hydrophobic class (l, ALIVMC, yellow shading); small (s, ACDGNPSTV, green) and polar (p, CDEHKNQRST, blue). A 'G' denotes the conserved G of the tiny subset of the small class. Columns of residues that are peculiar to a particular category of PRC-barrels (see text) are colored red. The limits of the domains are indicated by the residue positions on each side. The numbers within the alignment are non-conserved inserts that have not been shown. The different families are shown on the right. The sequences are denoted by their gene name followed by the species abbreviation and GenBank identifier (gi). The species abbreviations are: Archaea: Af, Archaeoglobus fulgidus; Hsp, Halobacterium sp. NRC-1; Mac, Methanosarcina acetivorans; Mta, Methanobacterium thermoautotrophicum; Mj, Methanococcus jannaschii; Ph, Pyrococcus horikoshii; Tac, Thermoplasma acidophilum; Bacteria: Atu, Agrobacterium tumefaciens; Aae, Aquifex aeolicus; Ana, Anabaena sp.; Bs, Bacillus subtilis; Bb, Borrelia burgdorferi; Bmel, Brucella melitensis; Cac, Clostridium acetobutylicum; Ccr, Caulobacter crescentus; Cj, Campylobacter jejuni; Des, Desulfitobacterium hafniense; Drad, Deinococcus radiodurans; Ec, Escherichia coli; Hi, Haemophilus influenzae; Hp, Helicobacter pylori; Mlo, Mesorhizobium loti; Mtu, Mycobacterium tuberculosis; Nm, Neisseria meningitidis; Pae, Pseudomonas aeruginosa; Pmar, Prochlorococcus marinus; Rcap, Rhodobacter capsulatus; Rp, Rickettsia prowazekii; Rsp, Rhodobacter sphaeroides; Rvi, Rhodopseudomonas viridis; Sli, Streptomyces lividans; Sme, Sinorhizobium meliloti; Scoe, Streptomyces coelicolor A3; Syco, Synechococcus sp.; Ssp, Synechocystis sp.; Tm, Thermotoga maritima; Tp, Treponema pallidum; Ter, Trichodesmium erythraeum; Tsyn, Thermosynechococcus elongatus; Ttep, Thermochromatium tepidum; Xf, Xylella fastidiosa; Plants: At, Arabidopsis thaliana.
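The 85% consensus rule in this legend can be illustrated with a short Python sketch. The amino-acid classes are taken from the legend, but everything else is an assumption made for illustration: the function names, the treatment of gaps as counting against conservation, and the order in which classes are tried (the aliphatic subset first, then hydrophobic, small and polar). The legend does not specify these details, so this is not necessarily how the published consensus line was produced.

```python
from collections import Counter

# Amino-acid classes as listed in the legend; the priority order is an assumption.
CLASSES = [
    ("l", set("ALIVMC")),      # aliphatic subset of the hydrophobic class
    ("h", set("ALICVMYFW")),   # hydrophobic
    ("s", set("ACDGNPSTV")),   # small
    ("p", set("CDEHKNQRST")),  # polar
]


def column_consensus(column: str, threshold: float = 0.85) -> str:
    """Return the consensus symbol for one alignment column, or '.' if none."""
    n = len(column)  # gap characters count towards the column size
    if n == 0:
        return "."
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count / n >= threshold:
        return residue  # a single residue dominates the column
    for symbol, members in CLASSES:
        if sum(aa in members for aa in column) / n >= threshold:
            return symbol
    return "."


def consensus(alignment, threshold: float = 0.85) -> str:
    """Build the consensus line for a list of equal-length aligned sequences."""
    return "".join(
        column_consensus("".join(col), threshold) for col in zip(*alignment)
    )


# Toy alignment (hypothetical sequences, not taken from Figure 3).
toy = ["GAVK", "GLIT", "GIVS", "GVLD"]
print(consensus(toy))
```

With this toy input the first column is an invariant glycine, the next two columns contain only aliphatic residues and the last contains only polar residues, so the sketch reports a residue letter followed by class symbols.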

Most of the sequence conservation in the PRC-barrel is centered on the hydrophobic residues that stabilize the six strands of the domain. Additionally, there is a nearly invariant glycine (Figure 3) that corresponds to the beginning of strand 2 and is likely to stabilize the first β-hairpin in the structure. Beyond the conserved core, there is considerable variability in the residues in the loops, and these are likely to impart the specificity required for the diverse interactions of this superfamily.

Potential biological functions of the PRC-barrels

The experimentally characterized PRC-barrel-containing proteins possess diverse biological functions: the PRC-H subunits themselves are involved in photosynthesis in the purple bacteria [14], whereas RimM is a protein that associates with the 30S ribosomal subunit and is required for efficient translation and processing of 16S RNA [23,24,25]. Gene-disruption studies in Rhodobacter capsulatus indicate that loss of the PRC-H subunit results in disruption of the reaction center and the light-harvesting complex-1 and loss of photosynthetic growth [26,27]. Biochemical studies have pointed out that the PRC-barrel of the purple bacterial PRC-H lies on the cytosolic face of the reaction center and directly affects the redox processes during the photosynthetic reaction [16,28]. On photoactivation there is an electron-transfer chain from the primary donor - the bacteriochlorophyll molecules - to the primary quinone, and then to the secondary quinone. A glutamate residue (E173 in R. viridis PRC-H) located in the loop between strands 2 and 3 of the PRC-barrel is in the vicinity of the secondary quinone of the reaction center. Site-directed mutagenesis of this glutamate severely retards the first and second electron transfers from the primary quinone that successively reduce the secondary quinone to semi-quinone and quinol [16]. The crystal structure of the reaction center reveals that this acidic residue of the PRC-barrel is situated close to other acidic residues from the PRC-L subunit, which interact with the quinone [16]. Thus, the acidic residue in this loop of the PRC-barrel could act as a regulator of the electrostatic state of the reaction complex to potentiate electron transfer. The multiple alignment (Figure 3) of the PRC-barrel reveals that this glutamate, or an equivalent acidic residue, is conserved in the majority of PRC-barrels that are most closely related to the PRC-H version. In addition to the purple bacteria, such versions are seen in the cyanobacteria, α-proteobacteria such as rhizobia, Agrobacterium tumefaciens and Brucella melitensis, and Deinococcus radiodurans (Figure 3). In the case of the cyanobacteria, it is possible that some of the PRC-H-like proteins that contain an equivalent acidic residue might associate with their very distinct photosynthetic reaction centers [29,30,31], by analogy with their purple bacterial counterparts. One of these proteins from Anabaena cylindrica is exclusively expressed in the spore-like akinetes [32], though its actual function remains unknown. The extensive spread of this version of the domain in non-photosynthetic proteobacteria such as the rhizobia suggests that a similar mechanism of regulating electron transfers may, perhaps, be used in regulating non-photosynthetic electron-transfer reactions.

Many PRC-barrels, including those of the RimM family, lack the acidic residue typical of those related to the PRC-H subunit and required for redox regulation. The crystal structure shows that the PRC-barrel domain mediates a contact with the amino-terminal cytoplasmic peptide of the PRC-M subunit and also makes contacts with the other structurally less ordered regions of the PRC-H polypeptide [14,15] (Figure 1). This suggests that, in addition to the specific electrostatic regulatory function, the PRC-barrel mediates specific interactions with other molecules through multiple surfaces. These interactions are consistent with the 'foundational role' postulated for the PRC-H protein in the assembly of the reaction center [33]. Such a protein-protein interaction role in the assembly of complexes could be a potential function of the PRC-barrels that do not possess the features suggestive of redox regulation. Given that RimM specifically associates with the 30S ribosomal subunit rather than the fully assembled ribosome, and participates in the maturation of the 16S rRNA [23,24,25], it is possible that the carboxy-terminal PRC-barrel could be used to bind RNA or proteins. An acidic residue at the beginning of strand 3 and a patch of large and acidic residues at the extreme amino terminus of the PRC-barrel that is specific to the RimM proteins are of particular interest in this regard (Figure 3). On the basis of the structure of the PRC-barrel it can be predicted that in RimM these residues are likely to line a cleft that may accommodate a peptide from an interacting protein (Figures 1,3). Other surfaces of the PRC-barrel in the RimM protein could interact with other proteins, or, alternatively, interact with RNA. The original report on the crystal structure of the photosynthetic reaction center suggested that the region corresponding to the PRC-barrel of the PRC-H subunit could bind a small-molecule ligand [14]. 
While the structure of the PRC-barrels with a central aperture (Figures 1,2) makes this a tempting possibility, currently there is no evidence to support the possibility that these domains bind small-molecule ligands.

Evolutionary history and diversification of the PRC-barrels

The phyletic patterns and relationships of the PRC-barrels have a number of important implications, including some for the evolution of the photosynthetic reaction center in bacteria. Phylogenetic analysis and similarity-based clustering show that these domains essentially form three large and distinct groups (Figure 4) and at least two smaller clusters. The PRC-H subunits and their close relatives from cyanobacteria, non-photosynthetic α-proteobacteria and D. radiodurans form the first of these groups. They are represented in multiple copies in the proteomes of most α-proteobacteria. In addition, they are also seen in multiple copies in the cyanobacterium Anabaena, the actinomycete Streptomyces coelicolor, D. radiodurans and the euryarchaeon Methanosarcina acetivorans. In this group, the PRC-barrel occurs either linked to amino-terminal transmembrane helices or as tandem repeats, with up to three copies, or linked to another α-helical repetitive domain also found in the Bacillus protein YsnF (Figure 4). Most members of this group contain a negatively charged residue in the loop between strand 2 and 3 (Figure 3) that could be implicated in electrostatic regulatory processes, as in the case of PRC-H proper. The second major group is represented in all the euryarchaeal genomes available to date and is additionally found in a few Gram-positive bacteria. All proteins belonging to this class are the stand-alone minimal version of the PRC-barrel. The third large group of these proteins comprises the RimM orthologs. These are the most prevalent of all the PRC-barrel proteins and are present in a single copy in all bacterial proteomes available to date, and also in plants in the form of a chloroplast-derived version. The RimM proteins have an additional specific amino-terminal β-strand-rich domain with no detectable relationship to other domains.

Figure 4

Phylogenetic relationships of the PRC-barrel-containing proteins along with the domain architectures. The phyletic pattern of each family is shown, along with the number of proteins (if there is more than one). Species abbreviations are as in Figure 3. The RELL bootstrap values for the major branches are shown at their base. The thickness of a given branch is approximately proportional to the number of proteins contained within it. Ysnf, a repeat domain typified by the Bacillus subtilis YsnF protein; N-term, the specific amino-terminal domain of the RimM proteins.

In addition to these three major groups, there is one small group that has representatives only in the cyanobacteria and plants (Figures 2,4) and appears to contain more distant relatives of the PRC-H-like PRC-barrels. All members of this group contain two copies of the PRC-barrel, and, in most cases, one of these copies contains an acidic residue in the loop between strands 2 and 3 that might be equivalent to the classic PRC-H-like regulatory acidic residue. The presence of a predicted amino-terminal transit peptide in the plant proteins suggests that they probably function in the chloroplast, and that these proteins may, in part, have a regulatory function analogous to the PRC-H subunit. The other small group, typified by the Bacillus subtilis protein YrrD, has so far been found only in the actinomycetes, Gram-positive bacteria and D. radiodurans (Figures 3,4). This group is most closely related to the above-discussed plant-cyanobacterial group, and is ultimately a distant relative of the larger PRC-H-like group. Most members of this group have duplicate copies of the PRC-barrel domain and typically lack the acidic residue in the loop between strands 2 and 3.

The phyletic pattern of RimM is reminiscent of proteins that form part of the ribosome or participate in RNA metabolism [9]. Likewise, the presence of the euryarchaeal-type solo PRC-barrels in every euryarchaeal genome, despite the metabolic diversity of these euryarchaea, suggests that these proteins probably have a core cellular function. One likely possibility is that they function in RNA metabolism, perhaps as the archaeal equivalents of RimM. In contrast to these two groups of PRC-barrels, the PRC-H-like group and their more distant relatives have a phyletic distribution mainly limited to bacteria or archaea with large genomes and complex metabolism. This suggests that they were, perhaps, derived later in bacterial evolution from a version involved in RNA metabolism. The extensive presence of multiple copies of the PRC-H-like versions in diverse α-proteobacteria suggests that this particular form, with the characteristic acidic residue, was derived in their common ancestor. A corollary to this is that these proteins probably functioned, initially, as regulators of electron-transfer chains in non-photosynthetic energy metabolism systems in the ancestral α-proteobacterium. They were subsequently utilized in the photosynthetic reaction center after a subset of these bacteria acquired photosynthesis. The phylogenetic tree supports a close relationship between some of the cyanobacterial PRC-barrels and the PRC-H proteins of the photosynthetic α-proteobacteria (Figure 4). This implies that the cyanobacteria probably acquired these forms of the PRC-barrel through lateral transfer from the purple bacteria and may have incorporated them as regulatory subunits into the organizationally distinct cyanobacterial photosystems [29,30,31] or into other uncharacterized electron-transfer chains. This is consistent with the previously observed case of horizontal transfer of α-proteobacterial reaction-center genes, including that for PRC-H, into the β-proteobacteria [34].


Conclusions

We show that the carboxy-terminal β-barrel domain of the H subunit of the photosynthetic reaction center defines a novel all-β-sheet fold, representatives of which are widespread throughout the prokaryotic world. Homologs of PRC-H, with a conserved acidic residue that has been shown to have a role in regulating electron transfer in the reduction of the secondary quinone of reaction centers, are also found in non-photosynthetic α-proteobacteria, cyanobacteria, D. radiodurans and the archaeon M. acetivorans. It appears likely that the PRC-H-like version of the PRC-barrels was first derived in the ancestral α-proteobacteria, followed by dissemination into other lineages. Probably, these proteins originally functioned in non-photosynthetic electron-transfer chains and were subsequently incorporated into the photosynthetic apparatus after its emergence in the α-proteobacteria. In addition, this domain also appears to mediate specific protein-protein interactions. This is likely to be the principal role of versions of this domain present in the pan-bacterial RimM proteins and other proteins widely distributed in bacteria and archaea. A protein comprising only a stand-alone copy of the PRC-barrel is conserved in all the euryarchaeal proteomes available to date, and, by analogy with the RimM protein, is predicted to function in RNA metabolism. It seems likely that PRC-barrels with a sporadic distribution and a regulatory function in energy metabolism or photosynthesis were derived from more conserved and ancient versions that were probably involved in RNA metabolism. The identification of this domain may help in the exploration of hitherto unexplored facets of diverse biological processes such as photosynthesis, energy metabolism and RNA metabolism.

Materials and methods

The non-redundant (NR) database of protein sequences at NCBI was searched using the BLASTP program [21]. Profile searches were conducted using the PSI-BLAST program with either a single sequence or an alignment used as the query, with a profile-inclusion expectation (E) value threshold of 0.01, and were iterated until convergence [21,35]. Before use in PSI-BLAST searches, the PRC-barrel domain was evaluated for compositional bias using the SEG program [36]. No such bias that could skew the statistics of sequence relationships in searches of the NR database was detected. Accordingly, to achieve maximum sensitivity, all searches were run with the compositional-bias-based statistics turned off [37]. Multiple alignments were constructed using the T-Coffee program [38], followed by manual correction based on the PSI-BLAST results.
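For readers who wish to set up a comparable profile search today, the parameters stated above map onto options of the `psiblast` program in the current NCBI BLAST+ suite. The Python sketch below only assembles such a command line. The source does not state which PSI-BLAST release or command line was actually used, so the choice of BLAST+, the helper name `psiblast_command` and the query file name are assumptions made for illustration.

```python
from typing import List


def psiblast_command(query_fasta: str,
                     database: str = "nr",
                     inclusion_e: float = 0.01) -> List[str]:
    """Assemble a BLAST+ psiblast call mirroring the stated search settings."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-inclusion_ethresh", str(inclusion_e),  # profile-inclusion threshold
        "-num_iterations", "0",                  # 0 = iterate until convergence
        "-comp_based_stats", "0",                # composition-based statistics off
    ]


# Hypothetical query file holding residues 151-257 of R. viridis PRC-H.
cmd = psiblast_command("prc_h_151_257.fasta")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where BLAST+ and a local copy of the database are installed; nothing is executed by the sketch itself.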

Structural manipulations were carried out using the Swiss-PDB viewer program [39] and the ribbon diagrams were constructed with MOLSCRIPT [40]. Searches of the PDB database with query structures were conducted using the program DALI [20]. Protein secondary structure was predicted using a multiple alignment as the input for the program PHD [22]. Signal peptides were predicted using SIGNALP [41,42] and the transmembrane regions were predicted using TOPRED [43].

Phylogenetic analysis was carried out using the maximum-likelihood, neighbor-joining and least-squares methods [44,45]. Briefly, this involved the construction of a least-squares tree using the FITCH program or a neighbor-joining tree using the NEIGHBOR program (both from the PHYLIP package) [46], followed by local rearrangement using the Protml program of the Molphy package [42] to arrive at the maximum-likelihood (ML) tree. The statistical significance of various nodes of this ML tree was assessed using the relative estimate of logarithmic likelihood bootstrap (Protml RELL-BP) with 10,000 replicates.
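The idea behind the RELL bootstrap can be conveyed with a small Python sketch: instead of re-estimating each tree for every replicate, the per-site log-likelihoods already computed for each candidate tree are resampled with replacement and summed, and the fraction of replicates in which a tree scores best is recorded. This is a simplified illustration and not the Protml implementation. The function name, the input layout and the toy numbers are assumptions, and the sketch reports support per candidate tree, whereas support for a node would be obtained by summing over the trees that contain it.

```python
import random
from typing import Dict, List


def rell_bootstrap(site_lnl: Dict[str, List[float]],
                   replicates: int = 10_000,
                   seed: int = 0) -> Dict[str, float]:
    """Estimate RELL bootstrap proportions from per-site log-likelihoods.

    site_lnl maps each candidate tree to its per-site log-likelihoods,
    all evaluated on the same alignment columns.
    """
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    if any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all trees need the same number of sites")
    rng = random.Random(seed)
    wins = {tree: 0 for tree in trees}
    for _ in range(replicates):
        # Resample alignment columns with replacement; no re-optimization.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in trees}
        best = max(trees, key=lambda t: totals[t])
        wins[best] += 1
    return {tree: wins[tree] / replicates for tree in trees}


# Toy per-site log-likelihoods (hypothetical): tree A is better at every site.
toy_lnl = {"A": [-1.0, -2.0, -1.5], "B": [-1.2, -2.5, -1.6]}
support = rell_bootstrap(toy_lnl, replicates=1000)
print(support)
```

Because only sums of stored values are recomputed, a large number of replicates such as the 10,000 used here is inexpensive compared with a full bootstrap.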