Gene organization and evolutionary history

Histidine protein kinases (HPKs), together with their partner response regulators, are undoubtedly the most widely used of all signal-transduction enzymes in nature. They are present in all three major kingdoms of life (the Bacteria, Archaea and Eukarya) [1,2], and function in the sensing of a cell's external environment [3]. For unicellular organisms this usually equates to sensing nutrients, chemoattractants, osmotic conditions and so on. HPKs are also responsible for coordinating the behavior of cell populations. In bacteria this takes the form of quorum sensing [4]. In Eukarya, where HPKs appear to be confined to plants and to free-living organisms, such as yeasts, fungi and protozoa, HPKs have roles in regulating hormone-dependent developmental processes [5]; for example, HPKs are responsible for transducing the effects of the hormone ethylene in plants [6], and for coordinating the development of the fruiting body of the slime mold Dictyostelium discoideum [5]. No HPK (or response regulator) genes are present in the completed genome sequences of Caenorhabditis elegans [7], Drosophila melanogaster [8], or Homo sapiens [9], and it is thought that these enzymes are absent from the animal kingdom as a whole.

HPKs (and response regulators) probably arose in bacteria and, with few exceptions, are found in all bacterial species [2], which vary widely in the number of HPK genes they carry: for example, the Escherichia coli and Bacillus subtilis genomes both contain about 25 HPK genes; in contrast, Helicobacter pylori has only four HPKs, and the three Mycoplasma species whose genomes have been sequenced (M. genitalium, M. pneumoniae, and M. pulmonis) are among the rare exceptions with no HPK genes. There are well over a thousand HPK genes currently in sequence databases, and HPK genes are found much less frequently in archaea and eukaryotes than in bacteria. HPK proteins can be divided by sequence analysis into 11 subfamilies, and virtually all eukaryotic HPKs belong to a single subfamily [1]. This, and the relative scarcity of HPK genes in archaea and eukaryotes, supports the notion that HPK (and response regulator) genes arrived in eukaryotes and archaea by lateral gene transfer from bacteria.

In prokaryotes, the genes for cognate pairs of HPKs and response regulators are typically found together in a single operon, such as the EnvZ-OmpR system that controls osmosensing in E. coli [3]. The functional relationship between these signaling partners is therefore reflected in their gene organization. Cross-regulation of a response regulator by additional histidine kinases does occur, however. For instance, the OmpR response regulator receives phosphate in vivo not only from its cognate HPK EnvZ, but also from the ArcB histidine kinase [10], although the ArcB and OmpR genes are not in the same operon.

In eukaryotes, HPK and receiver domains, containing the aspartate that is phosphorylated by the HPK, are generally encoded within a single gene, whereas in bacteria the receiver domain is part of the response regulator. Eukaryotic HPK proteins therefore contain both elements of the traditional two-component pathway, and are referred to as 'hybrid kinases'. Such hybrid kinases, while predominating in eukaryotes, are only a small minority among prokaryotic HPKs; all hybrid HPKs (both eukaryotic and prokaryotic) belong to a subdivision of the HPK1 gene family, one of the 11 HPK families (Figure 1) [1].

Figure 1

Conserved sequence motifs in the 11 histidine protein kinase subfamilies. (a) Motifs representative of conserved sequences in the HPK1 family, which includes a majority of all HPKs, including all eukaryotic HPKs [1]. (b) The relative positions of the conserved motifs in a representative HPK (E. coli EnvZ). Regions known or predicted to be α helical are shown as rectangles, and β sheets as arrows [14,21]. The conserved regions that make up the HPK core, designated the H, N, D, F and G boxes (shown in blue), typically span approximately 200 residues in total [1,19]. HAMP is a linker domain, and TM1 and TM2 are transmembrane helices. Sequence alignments of (c) the H box, (d) the N box, (e) the D and F boxes, and (f) the G box for one representative member of each of the 11 HPK subfamilies (except for the HPK1 family, where one member of HPK1a and one member of HPK1b are shown). The proteins used to make this alignment were: HPK1a, Pseudomonas aeruginosa KinB (595 amino acids in total); HPK1b, E. coli TorS (914 amino acids); HPK2, E. coli EnvZ (450 amino acids); HPK3, E. coli PhoQ (486 amino acids); HPK4, E. coli NtrB (349 amino acids); HPK5, E. coli DcuS (543 amino acids); HPK6, Archaeoglobus fulgidus g2648416 (607 amino acids); HPK7, E. coli NarQ (566 amino acids); HPK8, E. coli YehU (561 amino acids); HPK9, E. coli CheA (654 amino acids); HPK10, Streptococcus pneumoniae ComD (441 amino acids); HPK11, Methanobacterium mth292 (564 amino acids). The regions chosen for these alignments were taken from subfamily-specific alignments carried out in one of our previous studies [1], and are indicated in parentheses. Sequences were aligned using ClustalW. To simplify the alignment, a 25 amino-acid region of HPK9 was removed in the D-box region (indicated by an open square), and a 3 amino-acid region of HPK8 was removed in the G-box region (indicated by a filled square). HPK9 does not contain an H box in its dimerization domain, so HPK9 was left out of the H-box alignment. 
HPK7 and HPK8 do not contain an F box. The HPK11 subfamily contains a partially conserved F box, but the example used for this alignment does not. Highly conserved residues (those present in a majority of the subfamilies) are shown boxed in dark blue, and a consensus sequence is presented below the aligned sequences; positions containing chemically similar residues are shown boxed in light blue and indicated by a dot in the consensus sequence.

Characteristic structural features

HPKs catalyze the transfer of phosphate from ATP to a unique histidine residue, and all HPKs have a conserved ATP-binding catalytic domain that is required for kinase activity. This catalytic domain, together with a dimerization domain, forms the kinase core. The HPKs are classified into 11 subfamilies on the basis of the sequences of these two core domains [1]. Figure 1 shows a sequence alignment of the core domains with a representative of each subfamily.

The core domains

Histidine kinase activity depends on homodimer formation, with the dimerization domains, which have two-stranded coiled-coils, coming together to form a four-helix bundle [11,12]. It can be seen in Figure 1 that, except for the HPK9 family (the CheA family), the dimerization domain includes a motif, known as the H box, which contains the site of autophosphorylation. As in the case of tyrosine protein kinases, such as the insulin receptor, HPK-mediated autophosphorylation appears to occur in trans, with the catalytic domain of one subunit in a dimer phosphorylating the H-box histidine in the opposing subunit [3]. Many HPKs also have phosphatase activity, which dephosphorylates the response regulator and opposes kinase function [3,13] (for details see the Mechanism section); phosphatase activity is mediated by the dimerization domain in these HPKs.

The catalytic domain of HPKs has clear sequence and structural homology to the ATP-binding domains of type II topoisomerases (such as GyrB), the mismatch repair protein MutL, and the heat-shock protein Hsp90 [14,15,16]. These domains form a family with a conserved structure of several α helices packed over one face of a large, mostly antiparallel, β sheet, forming a loop that closes over the bound ATP (the 'ATP lid'). The ATP lid and the entire ATP-binding site are poorly organized in the absence of ATP or ATP analogs [14,15,17]. A substantial ordering of structure and other conformational changes in the ATP lid can be seen in the X-ray crystal structures of nucleotide-bound forms of MutL and GyrB as well as the HPKs CheA (which is involved in bacterial chemotaxis) [16] and PhoQ (which senses environmental concentrations of Ca2+ and Mg2+) [18]. Within the HPK catalytic domain four conserved motifs, the N, D, F, and G boxes, are involved in ATP binding [16,18], and probably also in catalysis and phosphotransfer. The motifs can be seen in Figure 1. Some subfamilies (HPK7 and HPK8) appear to have no identifiable F box, and the HPK10 subfamily has no identifiable D box. The D and G boxes [1,19] have also been labeled the G1 and G2 boxes, respectively [20].

Figure 2 shows representations of the only three available structures of HPKs, from which a number of insights have been gained about the overall structure of HPKs [14,15,21], as well as the role of conserved residues of the catalytic domain in interactions with the Mg2+ co-factor, which is required for ATP binding, and the nucleotide substrate [14,16,18]. The aspartate that defines the D box is hydrogen-bonded directly to the adenine ring of the ATP. Buried water molecules form hydrogen bonds that connect additional amino-acid residues in the N and D boxes to the adenine. The bound Mg2+ bridges from the nucleotide phosphates to residues in the N box. The F box is part of the ATP lid, whereas the G box forms the flexible hinge at the end of the ATP lid. Hydrolysis of ATP is coupled to Mg2+ release and conformational changes in the ATP-binding cavity, and the ATP lid does not remain well ordered with ADP in the binding site. The ordering of the ATP lid may couple ATP binding to interactions between domains, such as those between the catalytic domain and the H-box region. On the basis of the CheA structure, it seems that there is substantial flexibility in the region of the hinge that joins the dimerization and catalytic domains [15]. This flexibility may be critical for the domain rearrangements that occur during the catalytic cycle [22].

Figure 2

A comparison of the three-dimensional structures of three HPK families (HPK2, HPK3, and HPK9) shows the strong conservation of the catalytic domain structure. (a) The kinase core (dimerization and catalytic domains) of an HPK9, the Thermotoga maritima CheA (Protein Data Bank, PDB, entry 1B3Q) [15]. (b) The catalytic domain of T. maritima CheA complexed with Mg2+ and the nucleotide analog ADPCP (PDB entry 1I58) [16]. (c) The catalytic domain of the HPK2 E. coli EnvZ complexed with ADP (model 1 from PDB entry 1BXD) [14]. (d) The catalytic domain of an HPK3, PhoQ of E. coli, complexed with Mg2+ and the nucleotide analog AMPNP (PDB entry 1ID0) [18]. The figure was created with Swiss-PdbViewer 3.7 and rendered with POV-Ray 3.1.

The sensing and linker domains

On the basis of gene sequence analyses, the typical HPK appears to be a transmembrane sensor. This type of HPK usually has an uncleaved signal sequence, which serves as a first transmembrane helix (TM1), an extracellular sensing domain, and a second transmembrane helix (TM2); this is similar to typical type I tyrosine protein kinase receptors, such as the receptors for the epidermal growth factor (EGF) and insulin [23], but in eukaryotic tyrosine protein kinase receptors the signal sequence is generally cleaved to remove TM1. The extracellular sensing domains of HPKs are extremely diverse and, in contrast to the type I tyrosine protein kinase receptors, no conserved structural motif can be inferred from sequence comparisons [22]. It seems that almost any type of sensory domain can regulate HPK activity. In Gram-negative bacteria some HPKs sense through interaction with a periplasmic binding protein that may also interact with ATP-binding cassette (ABC) transport systems [24].

Inside the cytoplasm, a typical HPK has a HAMP domain, a widely distributed putative regulatory element in transmembrane and other signaling proteins [25], that consists of about 50 residues with two helical segments and a non-helical, compact subdomain in between [25,26]. The HAMP domain (also called the linker) is located between the second transmembrane domain and the dimerization domain (Figure 1) and may play a critical role in HPK signal transduction [25,26]. It may also directly participate in binding sensory co-factors, such as flavin adenine dinucleotide (FAD) [27]. The HAMP domain might have a structure analogous to that of helix-loop-helix transcription factors [22].

Hybrid kinases

As mentioned above, hybrid HPKs contain both an HPK and an additional carboxy-terminal receiver domain. They are not stand-alone signaling systems, however, but have to communicate with a separate downstream response regulator with an output activity. They achieve this by using a multi-step phosphorelay mechanism, rather than the single phosphotransfer mechanism that operates in normal two-component signaling pathways [28]. In phosphorelays, an intermediate histidine phosphotransfer protein (HPt) is involved either as a soluble protein or as an attached carboxy-terminal domain of the hybrid HPK. HPt proteins receive phosphates from hybrid HPKs and shuttle them to the receiver domains of the downstream response regulators [28]. In certain phosphorelay systems, receiver domains of hybrid HPKs also mediate the hydrolysis of phosphorylated HPt intermediates [29]. The phosphate group is carried on a conserved histidine residue of the HPt protein. A number of three-dimensional structures of HPt domains have been solved, including those of the bacterial HPKs ArcB [30] and CheA [31], as well as the yeast HPt protein Ypd1 [32]. Despite their diversity of sources, these structures reveal a nearly identical four-helix-bundle core structure [31].
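The difference between single phosphotransfer and a multi-step phosphorelay can be written out as an ordered list of phosphorylation sites. The sketch below is only a bookkeeping illustration of the residue order described above (histidine in the H box, aspartate in a receiver domain, histidine in HPt, aspartate in the downstream receiver); the class and function names are hypothetical and carry no kinetic or structural information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhosphoSite:
    carrier: str  # protein or domain that carries the phosphoryl group
    residue: str  # "His" or "Asp"


# Single phosphotransfer step of a conventional two-component pathway.
TWO_COMPONENT = (
    PhosphoSite("HPK H box", "His"),
    PhosphoSite("response regulator receiver domain", "Asp"),
)

# Multi-step phosphorelay through a hybrid HPK and an HPt protein or domain.
PHOSPHORELAY = (
    PhosphoSite("hybrid HPK H box", "His"),
    PhosphoSite("hybrid HPK receiver domain", "Asp"),
    PhosphoSite("HPt protein or domain", "His"),
    PhosphoSite("response regulator receiver domain", "Asp"),
)


def transfer_steps(path):
    """Return (donor, acceptor) pairs for consecutive sites along a path."""
    return list(zip(path, path[1:]))


def alternates_his_asp(path):
    """True if the path starts on His and every transfer switches residue type."""
    return path[0].residue == "His" and all(
        donor.residue != acceptor.residue
        for donor, acceptor in transfer_steps(path)
    )


if __name__ == "__main__":
    for name, path in (("two-component", TWO_COMPONENT),
                       ("phosphorelay", PHOSPHORELAY)):
        route = " -> ".join(f"{s.carrier} ({s.residue})" for s in path)
        print(f"{name}: {route}")
```

Written this way, the phosphorelay is the two-component scheme with one extra Asp and one extra His carrier inserted between the kinase and the output response regulator.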

Structures of a range of response regulator receiver domains have also been determined, although not of a hybrid kinase and only of a few intact response regulators. The structure of the receiver domain of the chemotaxis response regulator CheY is generally taken as the stereotypical example of this domain [33,34]. The receiver domain is a five-stranded parallel β sheet that is wound twice, and the site of aspartate phosphorylation is in an acidic pocket near the carboxy-terminal edge of the β sheet [32].

Localization and function

Transmembrane receptor HPKs

Most HPKs are transmembrane proteins that are presumed to be receptors for extracellular signals [35]. In the vast majority of cases, the putative ligands are not known. Furthermore, membrane localization does not necessarily mean that the HPK has to bind a soluble extracellular ligand. For example, the well-studied E. coli osmosensing HPK EnvZ is an integral membrane protein that appears to have a periplasmic sensing domain. Removal of this domain does not disrupt osmosensing via EnvZ, however [36], and it is unlikely that EnvZ senses osmolarity by directly binding small soluble molecules. It is more likely that changes in osmolarity modulate interactions of EnvZ with outer membrane proteins, or affect membrane tension, and such changes might in some other way shift the equilibrium between kinase and phosphatase activity of EnvZ. A similar mechanism might regulate the activity of the osmosensing HPK Sln1p in the yeast Saccharomyces cerevisiae, given that the Sln1p transmembrane regions are required for transducing changes in osmolarity [37].

Soluble HPKs

Some HPKs are soluble cytosolic proteins that couple to discrete transmembrane receptors, and another group of HPKs are predicted to be soluble proteins that so far have no known membrane-linked receptor. The best-studied example of a soluble HPK that couples to transmembrane receptors is CheA [11,22]. In E. coli, CheA couples to five different chemotaxis receptors, four of which bind - either directly or through the use of a separate binding protein - small ligands, such as amino acids and sugars, which in turn act as chemoattractants for the bacteria. Through interactions between the receptors and CheA, ligand binding regulates CheA HPK activity and thus controls activity of the chemotaxis two-component system. This situation is operationally very similar to that of mammalian cytokine receptors that control the activity of soluble cytosolic tyrosine protein kinases: like the E. coli chemotaxis receptors, cytokine receptors have no enzymatic activity of their own, but couple to intracellular protein kinases [23,38].

In a similar vein, receptor HPKs function analogously to growth factor receptors with tyrosine kinase activity in animal cells: both classes of receptor are type I membrane proteins with an amino-terminal extracellular sensing domain that is linked to the intracellular signaling domain via a single transmembrane helix. Both types of receptor also function as dimers, trans-phosphorylating their partner subunit [23]. But whereas phosphotyrosine-bearing receptors act as binding sites for Src-homology 2 (SH2) domain proteins, autophosphorylated receptor HPKs transfer their phosphate to downstream response regulators, and so must in fact become dephosphorylated in order to transmit their signals. It is interesting to note that there are currently no examples of organisms that have both receptor tyrosine kinases and receptor HPKs, and this suggests that these receptors fulfill equivalent roles.

The E. coli chemotaxis receptors that regulate CheA show higher-order organization in the bacterial inner membrane, where they are predominantly clustered in one macromolecular patch [39]. Although the chemotaxis receptors themselves are not HPKs, they might provide a model for the organization of those HPKs that function as receptors. The higher-order structures are thought to be necessary for the exquisite sensitivity and dynamic signaling properties of the chemotaxis signaling system [40]. Although chemotaxis may be a specialized signaling system, in principle its properties could be important for other signaling pathways that must be able to respond to small changes in ligand concentration and/or over a wide range of ligand concentrations.

Mechanism

The responses generated by HPK signaling are mediated by response regulators, which - like HPKs - form a large family of proteins [1]. Most bacterial response regulators have an amino-terminal receiver domain containing the site of aspartate phosphorylation. Signaling proceeds by autophosphorylation of the HPK domain followed by phosphotransfer to a receiver domain [41,42]. A carboxy-terminal DNA-binding domain allows the response regulator to function as a transcriptional regulator [28]. The presence of the conserved receiver domain defines a protein as a response regulator, but not all response regulators are transcription factors. Other examples of output activities include enzymatic activity, such as that of the receptor methylesterase CheB [28], responsible for adaptation in bacterial chemotaxis, and protein-protein interaction, mediated, for example, by the chemotaxis effector protein CheY [28]. Regardless of the particular output function of a response regulator, its activity is strongly dependent on the state of aspartate phosphorylation of its receiver domain. Thus, extracellular signals are transduced into a cytosolic output response via a protein phosphorylation cascade mediated by an HPK and a response regulator.

HPK sensor domains regulate kinase activity, but as discussed above, many HPKs also have a phosphatase activity [3,13]. When the kinase state predominates, the net activity of the two-component pathway is phosphorylation of the response regulator; when the phosphatase state predominates, the net activity is dephosphorylation of the response regulator. Because the two antagonistic states are in equilibrium, ligands can exert tight regulation over pathway activity: a shift in the equilibrium between kinase and phosphatase steady states is amplified by zero-order ultrasensitive kinetic effects [43,44].
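How strongly opposing kinase and phosphatase activities can amplify a small shift is easiest to see numerically. The sketch below uses the classic Goldbeter-Koshland steady-state expression for a substrate modified by two opposing saturable enzymes; treating the response regulator as that substrate and the kinase and phosphatase states of an HPK as two independent Michaelis-Menten activities is a simplifying assumption made here, not a model taken from the text, and all parameter values are illustrative.

```python
import math


def goldbeter_koshland(v_kinase, v_phosphatase, j_kinase, j_phosphatase):
    """Steady-state fraction of phosphorylated substrate.

    v_* are maximal rates of the two opposing activities; j_* are their
    Michaelis constants divided by total substrate concentration.
    Small j values mean both activities are saturated (zero-order regime).
    """
    b = (v_phosphatase - v_kinase
         + v_phosphatase * j_kinase
         + v_kinase * j_phosphatase)
    disc = b * b - 4.0 * (v_phosphatase - v_kinase) * v_kinase * j_phosphatase
    return 2.0 * v_kinase * j_phosphatase / (b + math.sqrt(disc))


def response_swing(j, low=0.9, high=1.1):
    """Change in phosphorylated fraction when kinase activity rises from
    `low` to `high` (phosphatase activity fixed at 1.0)."""
    return (goldbeter_koshland(high, 1.0, j, j)
            - goldbeter_koshland(low, 1.0, j, j))


if __name__ == "__main__":
    for j in (10.0, 1.0, 0.1, 0.01):
        print(f"J = {j:>5}: swing for a ~20% kinase shift = {response_swing(j):.3f}")
```

With both activities far from saturation (large J) a 20% change in the kinase-to-phosphatase ratio barely moves the phosphorylated fraction, whereas in the saturated regime (small J) the same change switches most of the substrate pool from one state to the other.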

In many HPKs, the regulatory effects of the phosphatase activity predominate [3,45]. Thus, the dominant phenotype of a mutant strain lacking a particular receptor-HPK is often a low-level constitutive activity of the response regulator under conditions in which, in wild-type cells, phosphatase activity of the HPK leads to response regulator inactivation. Regulation of these dual-function receptor-HPKs appears to involve modulation of a balance between two distinct states, namely kinase on and phosphatase off, or kinase off and phosphatase on [46]. This phosphatase activity is not simply a reverse phosphotransfer, however, because phosphatase activity is retained in some mutant HPKs lacking the H-box histidine [47], and most HPKs do not have intrinsic autophosphatase activity.

Frontiers

HPKs and two-component systems are an exciting area of research for a growing number of research groups studying the basic biochemistry of these systems as well as those focused on medically relevant topics such as pathogenesis, quorum sensing, and biofilm formation. Of medical interest is the potential for using HPKs as targets for new classes of antimicrobial drug. HPKs are particularly attractive for this purpose because they are not found in animals, and also because interfering with a pathogen's HPK signaling systems may expose it to destruction by the host immune system rather than being directly toxic. This type of indirect antibiotic would be less likely to evoke resistance in the target strain. Although some potential HPK inhibitors have been discovered [16,48], none of them has yet advanced to the stage of being used as a pharmaceutical. Research in this area is likely to accelerate as more HPK structures become available and can be examined for structure-based drug design.

The HPK10 family is one that needs more basic study to determine structural information. It is a distinct and widely distributed family that includes six-transmembrane HPK receptors with hydrophobic amino-terminal domains; they are referred to as 'six-transmembrane' receptors, but both algorithmic predictions and measurements of topology have determined only that HPK10 receptors have between five and seven transmembrane helices [49,50]. These receptors appear generally to function in cell-cell communication. In the few cases where stimulatory ligands have been identified, they are small peptides or modified peptides that are secreted by the same organism. For example, the AgrC system in Staphylococcus aureus is a quorum-sensing system that exports and then senses modified peptides of eight to nine amino acids that have a thio-lactone ring [51]. There is no direct evidence that the six-transmembrane HPKs are dimers, and the kinase core domains of members of the HPK10 family differ from other HPKs; in particular, the HPK10 family has no D box, and the region near the H box does not show a high similarity to other HPK dimerization domains [1]. This suggests substantial differences in the geometry of the active site, and transmembrane signal transduction mediated by the HPK10 family may proceed by analogy to the events mediated by eukaryotic seven-transmembrane receptors, whereas signaling through other families of HPK receptors is analogous to signaling through type I tyrosine protein kinases.

Finally, the study of HPK signaling in eukaryotes lags behind that in prokaryotes and many challenges lie ahead, including identifying the signaling connections between hybrid HPKs, HPt proteins and response regulators, and determining the interactions between the phosphorelay pathways and the other signaling pathways in eukaryotic cells. In Arabidopsis thaliana (and presumably in many other plants as well) the HPK gene family has undergone great expansion [5]. In Arabidopsis, at least five different HPK proteins act as receptors for the hormone ethylene, and the genes encoding these receptors show redundancy. Thus, plants with mutations in one, two or even three of these receptor HPKs have no phenotype; only a quadruple mutant strain shows a phenotype, namely a constitutive response to ethylene [6]. This complexity presents a significant practical challenge to the genetic analysis of HPK signaling in plants.

Similar challenges arise in studying Dictyostelium HPK genes, of which there are at least 15 [5]. Rather than responding to the same ligand, it is probable that many of these HPKs signal to the same output pathway (the response regulator RegA that controls intracellular cAMP levels), so the HPKs are likely to have overlapping functions. The HPK signaling pathways that control cAMP levels must be coordinated with other signaling systems, such as G-protein-dependent pathways, that also regulate cAMP levels in the same cells [52]. A similar signal transduction network must also operate in other eukaryotic HPK systems that are known to control the activity of mitogen-activated protein (MAP) kinase pathways, such as those involved in ethylene signaling in Arabidopsis and osmosensing in yeast [6,42]. Our traditional view of signal transduction is heavily biased towards the mechanisms used by animal cells, and so HPKs are not the main focus of studies of signal-transduction enzymes, but outside the animal kingdom they are the most important transducers of extracellular signals.