Gene organization and evolutionary history

The bacterial core RNA polymerase complex, which consists of five subunits (ββ'α2ω), is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a σ factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme. The vast majority of σ factors belong to the so-called σ70 family, reflecting their relationship to the principal σ factor of Escherichia coli, σ70. A second family of σ factors, the σ54 family, comprises proteins that are functionally similar to, but structurally distinct from, σ70 of E. coli. Here, we limit ourselves to the σ70 family.

Members of the σ70 family direct RNA polymerase to specific promoter elements that are usually 5-6 base-pairs (bp) in length and are centred 10 and 35 bp upstream (positions -10 and -35) of the transcription initiation site. They also function in the melting of promoter DNA and the early stages of elongation of transcripts. The discovery of the σ factor as a dissociable RNA polymerase subunit [1] heralded the subsequent finding that RNA polymerase recruits alternative σ factors as a means of switching on specific regulons [2]. Multiple members of the σ70 family have since been discovered in most bacteria, with up to 63 encoded by a single genome, in the case of the antibiotic-producing bacterium Streptomyces coelicolor [3]. Furthermore, σ70-related factors have been discovered in higher plants, in which they act together with a bacterial-type RNA polymerase to direct transcription in the plastid [4].
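The promoter geometry described above can be illustrated with a minimal sketch that scans a DNA sequence for a -35-like hexamer followed, after a spacer, by a -10-like hexamer. The consensus hexamers (TTGACA and TATAAT, as commonly cited for E. coli σ70), the 16-18 bp spacer window, the mismatch threshold and the function names are all assumptions made for illustration and are not taken from the text; real promoter prediction uses weighted models rather than simple mismatch counts.

```python
# Illustrative sketch only. The consensus hexamers, spacer window and
# mismatch threshold below are assumptions, not values from the review.
MINUS35 = "TTGACA"  # assumed -35 consensus for E. coli sigma70
MINUS10 = "TATAAT"  # assumed -10 consensus for E. coli sigma70


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq, max_mismatch=2, spacers=range(16, 19)):
    """Return candidate promoters as (start35, start10, spacer, mismatches).

    Positions are 0-based indices into seq; results are sorted so that
    the closest matches to both consensus hexamers come first.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = mismatches(seq[i:i + 6], MINUS35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            m10 = mismatches(seq[j:j + 6], MINUS10)
            if m10 <= max_mismatch:
                hits.append((i, j, spacer, m35 + m10))
    return sorted(hits, key=lambda h: h[3])


# Made-up test sequence: perfect hexamers separated by a 17-bp spacer.
toy = "GG" + MINUS35 + "A" * 17 + MINUS10 + "CCCCCC"
best = scan_promoters(toy)[0]
```

Alternative σ factors recognize different hexamers, so in this sketch switching regulons would correspond to swapping the two consensus strings.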

The σ70 family has been divided broadly into four phylogenetic groups on the basis of gene structure and function [5,6]. Group 1 consists of the essential primary σ factors, each of which is closely related to σ70 of E. coli. Group 2 proteins are closely related to the primary σ factors but are dispensable for bacterial cell growth. Group 3 σ factors are more distantly related to σ70 and usually activate regulons in response to a specific signal, such as a developmental checkpoint or heat shock. The group 3 σ factors can be further divided into several clusters of functionally related proteins with roles in, for example, sporulation, flagella biosynthesis, or the heat-shock response (Figure 1). Finally, group 4 accommodates the numerically largest but highly diverged extracytoplasmic function (ECF) subfamily, most members of which respond to signals from the extracytoplasmic environment, such as the presence of misfolded proteins in the periplasmic space. Whereas most bacteria have a single group 1 primary σ factor, the number of other group members varies widely, reflecting in part the different physiological and developmental characteristics of the various organisms. For example, whereas E. coli has two members in each of group 3 and group 4, the physiologically and developmentally complex S. coelicolor has 10 group 3 members and 49 group 4 members [3]. A phylogenetic tree that illustrates the relationships between σ70-family members from four different bacteria is shown in Figure 1. (For in-depth phylogenetic analyses of the σ70 family see, for example, [7,8].)

Figure 1

Phylogenetic relationships between members of the σ70 family of sigma factors from four diverse bacteria: E. coli (E); Caulobacter crescentus (CC or C); B. subtilis (B); and Mycobacterium tuberculosis (M). For C. crescentus, gene numbers from the annotated genome sequence are given. Note that the primary and related factors comprise groups 1 and 2, the group 3 σ factors are divided into functionally related groups (sporulation, flagellar biosynthesis, general stress and heat shock), and the divergent ECF factors comprise group 4. Particularly striking is the distance between the ECF subfamily and other members of the σ70 family, as well as divergence within the ECF subfamily itself. The unrooted tree was constructed using the Phylip programs PROTDIST and NEIGHBOR from a multiple amino-acid sequence alignment made by the program ClustalW. Only conserved regions 2 and 4 were included in the alignments.
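The distance step that precedes tree building can be sketched as follows. This is a deliberately simplified stand-in, not a reimplementation of PROTDIST: it computes an uncorrected p-distance (the fraction of differing residues over ungapped columns) rather than a model-based protein distance, and it does not perform the neighbor-joining step. The function names, the `columns` parameter (used here to mimic restricting the alignment to conserved regions) and the toy sequences are all hypothetical.

```python
# Simplified sketch: uncorrected p-distances from a multiple alignment.
# PROTDIST uses substitution models; this is only an illustration.


def p_distance(a, b):
    """Fraction of differing residues over columns with no gap in either."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns in common")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment, columns=None):
    """Return (names, matrix) of pairwise p-distances.

    alignment: dict mapping a sequence name to its aligned sequence.
    columns:   optional list of (start, end) slices to keep, for example
               the alignment columns covering conserved regions.
    """
    names = sorted(alignment)

    def keep(seq):
        if columns is None:
            return seq
        return "".join(seq[start:end] for start, end in columns)

    trimmed = {name: keep(alignment[name]) for name in names}
    matrix = [[p_distance(trimmed[a], trimmed[b]) for b in names]
              for a in names]
    return names, matrix


# Made-up toy alignment (not real sigma factor sequences).
toy_alignment = {"A": "MKLV-T", "B": "MKIV-T", "C": "MRIVAT"}
names, matrix = distance_matrix(toy_alignment)
```

A matrix of this kind is the input that a neighbor-joining program would then convert into an unrooted tree.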

For historical reasons the nomenclature of the σ superfamily is complex. In E. coli and several other Gram-negative bacteria, σ factor genes are designated rpo (for RNA polymerase subunit), whereas in most Gram-positive bacteria the genes are designated sig. The proteins may be designated σ with a superscript reflecting the molecular weight or gene name, or may have an arbitrary single-letter designation. In the post-genomic era, the situation has naturally become further complicated with the realization that some organisms have more σ factors than there are letters in the alphabet.

Characteristic structural features

Sequence alignments of the σ70 family members reveal four conserved regions that can be further divided into subregions (Figure 2) [5]. Only regions 2 and 4 are well conserved in all members of the σ70 family, and include subregions involved in binding to the core RNA polymerase complex, recognition of the -10 and -35 promoter elements (regions 2.4 and 4.2, respectively), and promoter melting (region 2.3). Much of region 1 is conserved only between the primary and closely related σ factors (groups 1 and 2), and region 1 appears to function in antagonizing the DNA-binding activity of the σ factor. Region 3, which is virtually absent from ECF σ factors, includes a subregion 3.0 (previously named 2.5) that interacts with DNA upstream of the -10 element in certain 'extended-10' promoters that lack the -35 element [9,10]. The linear division of σ70 factors into functionally distinct regions is largely confirmed by recent structural data, which revealed that primary σ factors have three flexibly linked compact domains, σ2, σ3 and σ4, which incorporate regions 2, 3 and 4, respectively [10].

Figure 2

Structural characteristics of E. coli σ70. (a) The protein sequence has been divided into four regions on the basis of sequence conservation with other members of the σ70 family. Residues in the carboxy-terminal part of region 4 (subregion 4.2) form a helix-turn-helix motif that contacts the -35 element of the promoter. Residues from conserved regions 2 and 3 cooperate to mediate recognition of the -10 region and melting of the DNA. A residue in the amino-terminal part of region 3 (3.0) contacts the conserved TG motif in the extended -10 element of certain promoters that do not require a -35 region. Residues from an α helix in region 2 that corresponds to the conserved subregions 2.3 and 2.4 interact intimately with the -10 element. Subregion 2.3 is thought to interact primarily with single-stranded DNA in the open complex (dashed arrow). The three domains of the σ factor observed by X-ray crystallography (σ2, σ3 and σ4) are indicated underneath the linear structure. Note that the protein domains correspond closely (although not precisely) with the regions assigned by sequence comparisons. (b) A model for the interaction of RNA polymerase holoenzyme (containing β, β', two α and one ω subunit in addition to the σ factor) with promoter DNA. The model is based on crystallographic analyses of σ domains, holoenzyme, and holoenzyme-model DNA complexes [10,14,15,25]. The major functional domains of the σ factor are shown in dark grey. The bold arrow indicates the direction of transcription. Although the template strand in the transcription bubble passes underneath the β subunit and the σ2 domain, the path of the DNA is shown throughout its length. Adapted from [21].

The crystal structure of the σ2 domain has been solved for two primary σ factors (σ70 of E. coli and σA of Thermus aquaticus) [10,11], and one ECF σ factor (σR of S. coelicolor) [12]. Discounting a non-conserved region that occurs between subregions 1.2 and 2.1 in some primary σ factors, each σ2 domain is composed of a bundle of three α helices that is virtually identical in all three structures analyzed. The second helix of this bundle is a major point for contact with a coiled-coil domain in the β' subunit of the core RNA polymerase complex [13]. The third helix of the bundle includes conserved residues along one face that are involved in DNA melting and in recognition of the -10 promoter element (Figure 2b).

The σ3 domain, which is less conserved between members of the σ70 family and is absent from ECF σ factors, is also a three-helix domain, the first helix of which contains the residues implicated in contacting DNA upstream of extended -10 promoters [10]. The σ4 domain has two pairs of α helices; the carboxy-terminal pair forms a helix-turn-helix motif that contacts the promoter DNA in the region from -30 to -38 [10,14]. The spectacular crystallographic views obtained recently of the σ70 factor in the holoenzyme complex [14,15] revealed that the σ2, σ3 and σ4 domains extend across a wide area of the RNA polymerase, with an interface between the core complex and σ70 of more than 8000 Å². Also revealed was a gap of approximately 45 Å between the σ3 and σ4 domains, taken up by a 33-residue linker (the 'σ34 linker') that, strikingly, travels close to the active site of RNA polymerase and through the channel from which the growing transcript exits the RNA polymerase complex before connecting with the σ4 domain.

Localization and function

Whereas the function of the essential group 1 σ70 factors is to direct general transcription, the accessory σ factors of groups 2-4 usually function to turn on specific gene sets in response to an appropriate signal. Their functions can be divided into three very broad categories: stress responses, development and ancillary metabolism. The wide variety of stress responses controlled by members of the σ70 family includes the stationary phase and general stress responses (mediated by, for example, σS in E. coli and σB in Bacillus subtilis), intracellular and extracytoplasmic protein misfolding (regulated in E. coli by σ32 and σE, respectively), oxidative stress (for example, σR in S. coelicolor), osmotic stress (σM in B. subtilis) and cell-wall stress (controlled by, for example, σE in S. coelicolor and σW in B. subtilis). Developmental programs under the control of σ70 family members include flagella biosynthesis (involving σD in B. subtilis and σF in Salmonella typhimurium), endospore formation (mediated by, for example, σE, σK, σF and σG in B. subtilis), and exospore formation (σWhiG in S. coelicolor). Ancillary metabolic functions that are controlled by σ factors include iron uptake (σFecI in E. coli and σPvdS in pseudomonads). There are clearly many more functions to be discovered, especially amongst members of the ECF subfamily; for example, in S. coelicolor the function of only three of its 49 ECF σ factors is understood.
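The examples in the preceding paragraph can be collected into a small lookup table, which also makes the nomenclature problem concrete: the same letter designation can refer to unrelated σ factors in different organisms. The sketch below is only a restatement of the examples named in the text; the table layout and the helper names `by_organism` and `organisms_sharing_name` are hypothetical.

```python
# Examples taken from the paragraph above; layout and helper names are
# illustrative assumptions.
SIGMA_ROLES = [
    # (category, function, sigma factor, organism)
    ("stress", "stationary phase / general stress", "σS", "E. coli"),
    ("stress", "stationary phase / general stress", "σB", "B. subtilis"),
    ("stress", "intracellular protein misfolding", "σ32", "E. coli"),
    ("stress", "extracytoplasmic protein misfolding", "σE", "E. coli"),
    ("stress", "oxidative stress", "σR", "S. coelicolor"),
    ("stress", "osmotic stress", "σM", "B. subtilis"),
    ("stress", "cell-wall stress", "σE", "S. coelicolor"),
    ("stress", "cell-wall stress", "σW", "B. subtilis"),
    ("development", "flagella biosynthesis", "σD", "B. subtilis"),
    ("development", "flagella biosynthesis", "σF", "S. typhimurium"),
    ("development", "endospore formation", "σE", "B. subtilis"),
    ("development", "endospore formation", "σK", "B. subtilis"),
    ("development", "endospore formation", "σF", "B. subtilis"),
    ("development", "endospore formation", "σG", "B. subtilis"),
    ("development", "exospore formation", "σWhiG", "S. coelicolor"),
    ("ancillary metabolism", "iron uptake", "σFecI", "E. coli"),
    ("ancillary metabolism", "iron uptake", "σPvdS", "pseudomonads"),
]


def by_organism(organism):
    """List (sigma factor, function) pairs named for one organism."""
    return [(s, f) for _, f, s, o in SIGMA_ROLES if o == organism]


def organisms_sharing_name(sigma):
    """Organisms in which a given σ designation is used in the table."""
    return sorted({o for _, _, s, o in SIGMA_ROLES if s == sigma})


# The designation σE covers three functionally different factors here.
sigma_e_hosts = organisms_sharing_name("σE")
```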

The activity of σ factors, and the consequent activity of the promoters they recognize, can be controlled at many different levels: by de novo synthesis (at the transcriptional or translational level), by post-translational processing, by proteolytic degradation, and by post-translational inhibition. Indeed, some σ factors, such as σS of E. coli, are regulated at most of these levels [16]. Of widespread importance is post-translational inhibition by so-called anti-σ factors, proteins that reversibly bind to the σ factor, thereby preventing its interaction with the core RNA polymerase [17,18]. In these cases, the signal that leads to the activation of the σ factor and the induction of the σ regulon somehow modifies the σ-binding activity of the anti-σ factor.

Mechanism

Recent structural studies together with an extensive catalog of biochemical data are starting to shed light on the function of σ70 family members in transcription initiation. Once the σ factor becomes part of the holoenzyme, the promoter-recognition determinants of its subregions 2.4 and 4.2 are solvent-exposed and appropriately separated. This conformation allows subregions 2.4 and 4.2 to interact with the -10 and -35 elements, respectively, to form a so-called 'closed' complex in which the promoter DNA remains base-paired. At promoters that lack -35 regions or have -35 elements that deviate significantly from the consensus sequence, the σ4 domain can stimulate formation of the closed complex by contacting activator proteins, such as λcI or PhoB, which are bound upstream of the complex on the DNA [19].

In the following stage, the DNA in the region from position -11 to +4, which partially overlaps with the -10 element, melts in a process called isomerization, the mechanistic details of which are unresolved but probably include several kinetically distinct intermediate states. Once separated, the two DNA strands take different paths, with the template strand approaching the active site of the RNA polymerase and the non-template strand being held by conserved aromatic residues in region 2.3 of the σ factor that had previously been implicated in DNA melting [20]. The σ factor may also play a role in the next stage, that of de novo RNA synthesis, by donating a disordered loop from the σ34 linker into the active site of the RNA polymerase; this might perhaps stabilize the initiating nucleotide [15]. Alternatively, the disordered loop may stabilize the open complex by preventing reannealing close to the transcription start site [21]. Finally, after a nascent RNA of 8-10 nucleotides has been synthesized, the σ factor is released or moves out of the way to allow elongation to proceed further and RNA polymerase to escape the promoter. The discovery that the σ34 linker is located in the RNA exit channel of the RNA polymerase suggests a mechanism of promoter clearance that involves the nascent RNA displacing the linker, in turn weakening the interaction between the core RNA polymerase and the σ4 domain and ultimately the rest of the σ factor [14,15]. Interestingly, σ70 of E. coli can in some instances interact with the exposed non-template strand early in the elongation process, and these interactions can lead to transient pausing of the elongation complex [22]. Furthermore, recent evidence suggests that σ70 may remain associated with the core RNA polymerase complex during the elongation process [23,24].

Frontiers

The recent structural information on primary σ factors has had a major impact on our understanding of the mechanistic role of the σ70 family of σ factors in transcription initiation. Numerous puzzles remain, however. It is not clear how transcription-activator proteins can modulate the complex conformational changes that accompany promoter recognition and melting. The timing and extent of release of the σ factor during the transition to the transcript-elongation phase remain topics of continuing controversy. Likewise, the extent to which σ factors might be retained in early, or perhaps later, elongation complexes and might mediate a sequence-responsive pause in transcription is not resolved. Although a reasonably detailed picture of the action of σ70 can now be envisaged, the sequence divergence noted within the σ70 family raises questions about how other family members mediate promoter recognition and melting and how they interact with their regulators. Finally, the many bacterial genome sequencing projects have revealed a huge gap in our understanding of the biological function of the many newly discovered σ factors. Even in some of the best characterized model systems, such as B. subtilis, there is a frustrating lack of knowledge regarding the regulation, roles, and possible redundancies among the various σ factors.