The Evolution of Guanylyl Cyclases as Multidomain Proteins: Conserved Features of Kinase-Cyclase Domain Fusions
- Cite this article as:
- Biswas, K.H., Shenoy, A.R., Dutta, A. et al. J Mol Evol (2009) 68: 587. doi:10.1007/s00239-009-9242-5
Guanylyl cyclases (GCs) are enzymes that generate cyclic GMP and regulate different physiologic and developmental processes in a number of organisms. GCs possess sequence similarity to class III adenylyl cyclases (ACs) and are present as either membrane-bound receptor GCs or cytosolic soluble GCs. We sought to determine the evolution of GCs using a large-scale bioinformatic analysis and found multiple lineage-specific expansions of GC genes in the genomes of many eukaryotes. Moreover, a few GC-like proteins were identified in prokaryotes, which come fused to a number of different domains, suggesting allosteric regulation of nucleotide cyclase activity. Eukaryotic receptor GCs are associated with a kinase homology domain (KHD), and phylogenetic analysis of these proteins suggests coevolution of the KHD and the associated cyclase domain as well as conservation of the sequence and the size of the linker region between the KHD and the associated cyclase domain. Finally, we also report the existence of mimiviral proteins that contain putative active kinase domains associated with a cyclase domain, which could suggest early evolution of the fusion of these two important domains involved in signal transduction.
Keywords: Guanylyl cyclase, Kinase homology domain, Mimivirus, Phylogeny, cGMP, Coevolution
Cyclic GMP (cGMP) is used as an important signaling molecule in many eukaryotes and in mammals is known to be involved in vision, olfaction, muscle contraction, regulation of homeostasis, cardiovascular and neuronal function, and behavior (Sabatini et al. 2007). Guanylyl cyclases (GCs) are the enzymes that catalyze the conversion of GTP to cGMP. Mammals possess membrane-associated GCs, which serve as receptors for diverse polypeptide ligands, as well as cytosolic GCs that are regulated by nitric oxide and carbon monoxide (Lucas et al. 2000). For example, GC-A and GC-B are the receptors for atrial natriuretic factors (ANF), a family of peptide hormones that act to decrease blood volume by stimulating natriuresis and diuresis in the kidney (Kuhn 2004). GC-C is the receptor for the guanylin family of peptides, which regulate fluid and electrolyte balance in the intestine (Forte 2004), as well as for the heat-stable enterotoxin peptides secreted by pathogenic bacteria in the intestine (Schulz et al. 1990). Ligands for other receptor GCs are not known, and these GCs are therefore called “orphan receptors.” These include GC-E/F, which are of prime importance in visual signal transduction (Dizhoor 2000). Soluble GCs function as heterodimers of two different subunits, α and β, and are ubiquitously expressed in mammalian cells.
In organisms such as Dictyostelium, cGMP is required for chemotaxis signaling in addition to the waves of cAMP that are generated during development (Roelofs et al. 2001). Novel and interesting GCs have been reported from Paramecium, Tetrahymena, and Plasmodium (Baker and Kelly 2004). Caenorhabditis elegans, which alone encodes 34 putative GC genes (27 receptor and 7 soluble), exemplifies the large repertoire of GCs in nematodes, and these gene products appear to be involved in determining neuronal left/right asymmetry in the worm (Ortiz et al. 2006). Drosophila melanogaster has 6 receptor and 5 soluble GCs, indicating a good representation of GCs in insects (McNeil et al. 1995; Morton et al. 2005). Until now, only 1 GC has been reported from plants (Ludidi and Gehring 2003).
Receptor and soluble GCs are multidomain proteins, and the associated domains allosterically regulate the activity of the catalytic domain. Membrane-bound receptor GCs have an extracellular ligand-binding domain, a membrane-spanning domain, an intracellular domain that has homology to protein kinases (i.e., kinase homology domain [KHD]), a linker region, and a C-terminally located cyclase domain (Lucas et al. 2000). Ligand binding results in activation of the intracellular guanylyl cyclase domain, in turn resulting in increased levels of cGMP in the cell. Cytosolic or soluble GCs have two N-terminal regulatory domains, called the “heme nitric oxide–binding” (HNOB) domain and the “heme nitric oxide binding–associated” (HNOBA) domain, which bind nitric oxide and activate the cyclase domain present at the C-terminal end (Foster et al. 1999; Lucas et al. 2000).
The KHDs of receptor GCs are approximately 250 amino acids long and are more similar to the protein tyrosine kinases than to the serine/threonine kinases (Koller et al. 1992). Genome-wide analysis of human (Manning et al. 2002), mouse (Caenepeel et al. 2004), C. elegans (Plowman et al. 1999), Dictyostelium (Goldberg et al. 2006) and yeast (Hunter and Plowman 1997) showed that 2% to 3% of the genes encode kinase-like proteins. A typical kinase domain contains an N-terminal lobe with a five-stranded β sheet and a single α helix as well as a C-terminal lobe consisting mainly of α helices (Huse and Kuriyan 2002). A “hinge” of several amino acid residues connects the two lobes and provides flexibility in their relative orientation. ATP is accommodated in the cleft between the two lobes, and the adenine ring of ATP forms hydrogen bonds with the hinge region. The residues or motifs important for catalysis are the P-loop (nucleotide-binding loop), which is rich in glycines; β strand 3; helix C, which contains a conserved Glu residue in the N-terminal lobe; and the catalytic loop and A-loop in the C-terminal lobe (Fig. 1b). The P-loop has a consensus sequence of GxGxΦG, where x is any amino acid and Φ is Phe or Tyr. The catalytic loop contains the HRD motif along with an Asn residue a few amino acids C-terminal to the HRD motif, with the Asp residue playing a critical role in the phosphotransfer reaction. The activation loop contains the N-terminal anchor with a conserved DFG motif, the Asp of which is involved in binding the Mg2+ ion required for catalysis. In addition, an APE, ALE or SPE sequence, which is involved in positioning the peptide substrate correctly for phosphotransfer (Nolen et al. 2004), is found at the C-terminal anchor (Taylor et al. 1993). Aminoglycoside kinase, APH(3′)-IIIa, lacks the glycine-rich loop but shows a similar kinase fold and binds ATP (Hon et al. 1997).
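The P-loop consensus GxGxΦG described above can be written as a simple pattern over one-letter amino acid codes. The following is a minimal sketch for illustration only; the function name `find_p_loops` and the example fragment are hypothetical and are not part of the analysis reported here, which relied on HMM-based alignment.

```python
import re

# P-loop consensus GxGxΦG: x is any residue, Φ is Phe (F) or Tyr (Y).
# A lookahead is used so that overlapping matches would also be reported.
P_LOOP = re.compile(r"(?=G.G.[FY]G)")

def find_p_loops(sequence):
    """Return the 0-based start position of every P-loop consensus match."""
    return [m.start() for m in P_LOOP.finditer(sequence.upper())]

# Hypothetical fragment, used only to show the call.
print(find_p_loops("MKLGEGAFGKVY"))
```

A plain pattern match of this kind only flags candidate sites; it does not replace inspection of the residue in the context of an aligned kinase domain.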
Importantly, because of the absence of the critical Asp in the catalytic loop of the KHDs of receptor GCs, this domain is thought to lack the function of a protein kinase in receptor GCs. However, there is a single report on a retinal GC that indicates that it has autophosphorylating activity (Aparicio and Applebury 1996).
Given the importance of cGMP in cellular signaling in eukaryotes, it would be of interest to identify genes that could encode GCs in diverse organisms. In addition, analysis of the KHD associated with the GC domain will illuminate the evolution and regulation of this class of enzymes, i.e., the fusion of an inactive kinase domain to the guanylyl cyclase domain suggests a generalized allosteric regulation of GCs that could be likened to the regulation of ACs by G-proteins. We therefore set out to identify genes that encode putative GCs in the nonredundant database. We report here that novel GC genes with interesting domain fusions were identified in prokaryotes. Moreover, phylogenetic and correlation analysis of both the cyclase domains and the KHDs in receptor GCs suggests that the two domains have coevolved. Most interestingly, our analysis identified genes with a putative functional kinase domain that was fused to an inactive cyclase domain, indicating an apparent exclusion of proteins in nature that contain a functional protein kinase and a GC domain in a single polypeptide chain.
Materials and Methods
The nonredundant database (NRDB) was obtained from the National Center for Biotechnology Information (NCBI) in January 2007. Cyclase domains from 48 class III nucleotide cyclases comprising both ACs and GCs were used as queries in the BLAST searches. PSI-BLAST (Altschul et al. 1997) searches were performed with an e-value cutoff of 10⁻⁴ and a maximum of 100 iterations; final results showed ≤3000 proteins for each protein searched. A hidden Markov model (HMM) profile, cIII-cyclase.hmm, was built for class III nucleotide cyclases using the cyclase domain sequences of the same 48 queries used in the BLAST searches (http://hmmer.wustl.edu/). The cyclase domains were analyzed for the presence of canonical residues required for catalysis by aligning them with the cIII-cyclase.hmm profile using the hmmalign alignment program in the HMMER 2.3 suite. Domains were predicted as active GCs when they contained (1) both the canonical residues Asp or Glu as metal-binding residues; (2) a Glu as the first and either Cys, Gly, Ala, His, Ser, or Thr as the second substrate-specifying residue; and (3) Asn or Gly as the first and Arg or Lys as the second transition state–stabilizing residue. Substrate-specifying residues in ACs are Lys, Arg, Gln, and Asp at the first position and Asp, Ser, and Thr at the second position. We allowed the possibility that some GCs may contain a Thr residue at the second position because a substitution of a Thr residue for a Ser residue (found in characterized GCs) is usually tolerated in proteins with retention of their function. For a similar reason, we also allowed the presence of either a Lys or Arg as the second transition state–stabilizing residue.
In analogy to mammalian ACs (Shenoy and Visweswariah 2004), cyclases having all of the residues needed for catalysis except the substrate-specifying residue are called “ambiguous” (Ambi) cyclases; those with only metal-binding residues are called “C1-like” or C1 cyclase domains; and those with only the transition state–stabilizing residue are called “C2-like” or C2 cyclase domains. Cyclase domains that did not fall into any of these classes are called “Cyc-like.” Sequences showing ≥95% identity were removed from the final analysis.
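The residue-based classification described above can be sketched as a small decision function. This is one possible reading of the stated rules, not the script used in the study: the function name `classify_cyclase`, the argument order, and the "AC" label for domains carrying the AC substrate-specifying residues are assumptions made for illustration. The six arguments are the residues found at the two metal-binding, two substrate-specifying, and two transition state–stabilizing positions after alignment to the cyclase profile.

```python
def classify_cyclase(m1, m2, s1, s2, t1, t2):
    """Classify a cyclase domain from six one-letter residues (illustrative sketch).

    m1, m2: metal-binding positions
    s1, s2: substrate-specifying positions
    t1, t2: transition state-stabilizing positions
    """
    metal_ok = m1 in "DE" and m2 in "DE"          # Asp or Glu at both positions
    ts_ok = t1 in "NG" and t2 in "RK"             # Asn/Gly, then Arg/Lys
    gc_specific = s1 == "E" and s2 in "CGAHST"    # GC-type substrate residues
    ac_specific = s1 in "KRQD" and s2 in "DST"    # AC-type substrate residues

    if metal_ok and ts_ok:
        if gc_specific:
            return "GC"
        if ac_specific:
            return "AC"        # label assumed here; not a class named in the text
        return "Ambi"          # catalytic residues present, substrate unclear
    if metal_ok:
        return "C1-like"       # only metal-binding residues conserved
    if ts_ok:
        return "C2-like"       # only transition state residues conserved
    return "Cyc-like"

# Example: Asp/Asp, Glu/Cys, Asn/Arg satisfies all three GC criteria.
print(classify_cyclase("D", "D", "E", "C", "N", "R"))
```

In this reading, a domain is assigned to the C1-like or C2-like class regardless of its substrate-specifying residues; a stricter interpretation of "only" would also require those positions to be nonconserved.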
Kinase domains were searched and aligned using the protein kinase HMM (pkinase; PF00069.15) obtained from the PFAM database (Bateman et al. 2004). All other domains were predicted using the PFAM server (version 21.0 with 8957 protein families; Bateman et al. 2004), the SMART database of protein families (Letunic et al. 2006; Schultz et al. 2000), and the NCBI CD database (version 2.10 with 12589 PSSMs).
Domain sequences used for generating phylogenetic trees were mapped with their respective HMM profiles. Putative active GCs were analyzed for their evolutionary relations using Molecular Evolutionary Genetics Analysis (MEGA version 4; Tamura et al. 2007). Sequences were named “|GI|name of the protein (if any)|short species name_domain number (if more than one)”. All sequence alignments for phylogenetic tree construction were performed using ClustalW (matrix: BLOSUM; penalties: for pairwise alignment, gap opening = 10 and extension = 0.2; for multiple alignment, gap opening = 10 and extension = 0.2), and trees were built using the neighbor-joining (NJ) method (Saitou and Nei 1987) with the interior branch test (500 replicates) implemented in MEGA4 software. Evolutionary distances were computed using the Poisson correction method, and all positions containing gaps and missing data were eliminated from the data set using the complete deletion option in MEGA4.
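For readers unfamiliar with the distance measure, the Poisson correction converts the proportion p of differing aligned sites into a distance d = −ln(1 − p), and complete deletion removes every alignment column in which any sequence has a gap or missing character. The sketch below illustrates both steps in plain Python; it is not the MEGA4 implementation, the function names are hypothetical, and treating "-" and "?" as the gap and missing-data symbols is an assumption.

```python
import math

def complete_deletion(alignment, missing="-?"):
    """Drop every column in which at least one sequence has a gap or missing symbol."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i] not in missing for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]

def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    # When every site differs the correction is undefined; report infinity.
    return math.inf if p >= 1.0 else -math.log(1.0 - p)

# Hypothetical three-sequence alignment; column 5 contains a gap and is removed.
aln = complete_deletion(["ACDE-FGH", "ACDQKFGH", "TCDEKFGH"])
print(aln, poisson_distance(aln[0], aln[1]))
```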
Correlation coefficient (Pearson’s correlation coefficient [r]) and estimate of its SD, z-score, and p-value were determined essentially as reported earlier (Goh et al. 2000). Briefly, pairwise distances and average pairwise distance were computed from the alignment using MEGA4. Computed values were used for the calculation of r. Significance of the estimated r value was assessed by bootstrapping analysis, which gives the SD of r, and estimating the probability of obtaining the value by chance alone (p value). Bootstrap estimate of SD of r, z-score, and p-value were calculated by generating 1000 sets of randomly sampled pairwise distances, with replacement from the original set of distances for each pair of domains compared.
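The core of this calculation can be sketched as follows: Pearson's r is computed between the matched pairwise distances of two domains, and its SD is estimated by resampling the distance pairs with replacement. This is a simplified illustration under stated assumptions, not the procedure of Goh et al. (2000) in full; the function names, the fixed random seed, and the handling of degenerate resamples are choices made here, and the z-score and p-value steps are deliberately omitted.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of pairwise distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one set of distances is constant")
    return sxy / math.sqrt(sxx * syy)

def bootstrap_sd_of_r(xs, ys, n_sets=1000, seed=0):
    """Estimate the SD of r from n_sets resamples of the distance pairs (with replacement)."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility only
    n = len(xs)
    estimates = []
    while len(estimates) < n_sets:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
        except ValueError:
            continue  # degenerate resample (constant distances); draw again
    mean = sum(estimates) / n_sets
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_sets - 1))

# Hypothetical distances for the same sequence pairs measured in two domains.
khd = [0.10, 0.25, 0.31, 0.48, 0.52, 0.77]
cyc = [0.08, 0.20, 0.35, 0.40, 0.55, 0.70]
print(pearson_r(khd, cyc), bootstrap_sd_of_r(khd, cyc))
```

Resampling matched pairs, rather than each list independently, preserves the pairing between the two domains that the correlation is meant to measure.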
Search for Kinase–Cyclase Fusion Proteins
In June 2007, the NRDB was searched for kinase domain– and cyclase domain–containing proteins with the Pkinase.hmm and cIII-cyclase.hmm profiles. This was to ensure that the proteins identified in the search have sequences similar to protein kinases and cyclases, respectively, and are not distant homologues, as can happen when using PSI-BLAST searches. Proteins found in both of the searches were pooled together, and redundant sequences were removed. Analysis of critical motifs in the kinase and cyclase domains was performed after aligning the sequences with hmmalign (using the respective HMM profiles). The cyclase domains classified as “Cyc-like” had a high e-value in the hmmsearch runs and hence were further checked at the 3D-PSSM server. All of them were predicted to have a class III cyclase-like structure.
The linker region was defined as the sequence of amino acids between the kinase and cyclase domain in GCs. The boundaries of both domains were mapped using their respective HMM profiles. Although there is no foolproof method for demarcating domain boundaries, the current procedure served the purpose because we used the same hmm profiles for all sequences that were analyzed. HMM profile for the conserved 40 amino acid residues was generated using sequences reported earlier (Anantharaman et al. 2006) and the hmmbuild program in HMMER suite. The profile was further calibrated with hmmcalibrate program from the HMMER 2.3 suite.
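Once the domain boundaries are mapped, extracting the linker is a matter of slicing the sequence between the end of the kinase domain and the start of the cyclase domain. The sketch below assumes 1-based, inclusive domain coordinates of the kind typically reported by profile searches; the function name and the toy sequence are hypothetical.

```python
def linker_region(sequence, kinase_end, cyclase_start):
    """Return the residues strictly between the kinase and cyclase domains.

    kinase_end and cyclase_start are 1-based, inclusive domain boundaries.
    """
    if cyclase_start <= kinase_end:
        raise ValueError("cyclase domain must start after the kinase domain ends")
    # Python slices are 0-based and end-exclusive, so residue kinase_end + 1
    # sits at index kinase_end, and residue cyclase_start - 1 at index cyclase_start - 2.
    return sequence[kinase_end:cyclase_start - 1]

# Toy sequence: kinase domain ends at residue 9, cyclase domain starts at residue 13.
print(linker_region("AAAAKKKKKLLLCCCCC", kinase_end=9, cyclase_start=13))
```

Because the same profiles define the boundaries for every sequence, linker lengths obtained this way are comparable across proteins even if the absolute boundaries are imprecise.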
Identification of Guanylyl Cyclases in NRDB
The ability of nucleotide cyclases to use ATP or GTP as a substrate, and hence generate cAMP or cGMP as the product, depends on the substrate-specifying residues. Therefore, inspection of these residues would allow one to classify a nucleotide cyclase as an AC or GC. A BLAST search of the NRDB in July 2007 with a query set of 48 class III nucleotide cyclase domains found 3138 protein sequences. The presence of the canonical residues required for catalytic activity, along with substrate-specifying residues, identified 199 proteins that potentially could have GC activity. This method of analysis would not have identified the α subunits of soluble GCs because they lack important residues for catalysis and are active only on dimerization with β subunits (Lucas et al. 2000; Perkins 2006; Yamada et al. 2006).
Guanylyl Cyclases in Prokaryotes
The domain architecture of these bacterial GCs is more similar to that of bacterial ACs than to that of typical GCs (Shenoy and Visweswariah 2004; Wu et al. 2008), and the cyclase domains form a separate cluster in the phylogenetic tree, suggesting their distant relation to other GCs. Linder and Schultz subclassified the class III nucleotide cyclase family (Linder and Schultz 2003). However, a phylogenetic tree constructed with the cyclase domains of proteins identified in this study, as well as the sequences used by Linder and Schultz to determine the subclass of class III nucleotide cyclases, indicated that the bacterial guanylyl cyclase-like genes formed a separate cluster, except for the Roseobacter denitrificans GC (data not shown). This suggests that the new genes we identified are indeed a distinct subclass within the classical class III nucleotide cyclases and therefore warrant further study in terms of their activity and catalytic mechanism.
Domain Organization of Guanylyl Cyclases
Two genes from Plasmodium encode two GC domains associated with an N-terminal Hydrolase_3 (Haloacid dehalogenase–like hydrolase; Koonin and Tatusov 1994) or an ATPase domain and multiple (19 and 21) transmembrane helices. Although they are similar to the mammalian ACs in terms of overall topology, the orientation of the cyclase domains is reversed in that the cyclase domain that contains substrate specifying residues (C2 domain of an AC) is found N-terminal to the domain that contains the Asp residues required for metal binding (C1 domain of ACs). Interestingly, the two types of cyclase domains present in these two-cyclase domain GCs showed clear separation in the phylogenetic analysis, with the C2 domains grouping with the bacterial GCs. One predicted receptor GC in Strongylocentrotus purpuratus has a fused ribosomal_S7 domain, and another one has a DSL (Delta/Serrate/Lag-2) domain N-terminal to the cyclase domain and 30 EGF domains C-terminal to the cyclase domain. The fusion of a DSL domain with a GC suggests the involvement of cGMP in developmental pathways. There are cyclases with CHASE2 domains (cyclases/histidine kinases associated sensing extracellular) (Mougel and Zhulin 2001) and some with a response regulator domain associated with a single cyclase domain. Interestingly, ACs with these domain fusions in bacteria have been identified earlier (Shenoy and Visweswariah 2004).
Most of the cyanobacterial GCs have a CHASE2 domain associated with the cyclase domain, except a single gene, which encodes only a single cyclase domain. Both proteins from Mariprofundus ferrooxydans PV-1 have a response-regulator domain fused to the GC domain. Of the two GCs found in Trichodesmium erythraeum IMS101 (a colonial marine cyanobacterium), one has a CHASE2 domain along with a cyclase domain, whereas the other one has a congregation of domains, such as the Cache_1, HAMP-HisKA-HATPase-c-response regulator in tandem. CHASE2 and Cache_1 domains sense extracellular stimuli, and their presence in these proteins suggests a putative receptor-like function for these GCs. The presence of multiple signaling domains in a protein suggests that it would have been used to diversify the stimuli sensed by the extracellular Cache_1 domain into various signals in the cell by way of histidine phosphorylation (HisKA), HATPase_c, response regulator and the cyclase domain, which in turn is regulated by the HAMP domain. The diversity of domains found associated with GCs is far less than that found in the ACs (Shenoy and Visweswariah 2004), in which more than 50 distinct domains were fused to the adenylyl cyclase domain. Perhaps this suggests a more precise structural requirement of the catalytic domain of GCs than ACs, which prevents the guanylyl cyclase domain from functioning when associated with another domain. Alternatively, it could reflect the decreased use of cGMP as a second messenger in bacteria, resulting in decreased associated-domain diversity: there have been several reports of the widespread use of cyclic di-GMP, rather than cGMP, as a signaling molecule in bacteria (Romling and Amikam 2006).
Phylogenetic Analysis of the Kinase and Cyclase Domains
As mentioned previously, the majority of GCs identified resembled the architecture of receptor GCs in that they possessed an extracellular domain and a KHD domain N-terminal to the cyclase domain. We therefore analyzed in greater detail the KHDs of these genes to compare their phylogeny with their associated cyclase domain.
Table 1 Pairwise distance correlation analysis between various domains of receptor GCs (reported values include r and the SD of r)
Sequence Analysis for the Presence of Critical Motifs Required for Catalysis in the KHDs of Receptor GCs
Conservation of critical motifs in the kinase homology domains found in KHD–GC fusion proteins
CASK CaM kinase lacks the Asp of the DFG motif and hence was thought to be a pseudokinase (Boudeau et al. 2006). However, recently this protein was found to have kinase activity (Kannan and Taylor 2008; Mukherjee et al. 2008). The structure showed that AMPPNP is bound in the cleft between the N- and C-lobes, as is seen in most kinases. The α-phosphate group of AMPPNP was found positioned in between the side chains of His145 and Lys41, with a number of water molecules aiding in the interaction. However, CASK CaM kinase contains the HRD, VAIK, and GXGXXG motifs (Mukherjee et al. 2008). Our analysis of the critical motifs in the KHDs indicates that most of them contain the residues required for ATP binding as seen in typical protein kinases (76% have Lys in the VAIK motif, 82% have Asn in the catalytic loop, and 83% have Asp in the activation loop). In contrast, as noted previously, none of them contains the Asp in the HRD motif, which is thought to be critical for the phosphotransfer reaction. Therefore, it can be safely argued that the kinase domains present in GCs are true pseudokinases.
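Percentages of this kind follow from counting, at a given alignment column, how many KHD sequences carry the expected residue. The following minimal sketch shows the arithmetic; the function name, the column index, and the toy alignment are hypothetical and do not reproduce the data analyzed here.

```python
def percent_conserved(aligned_seqs, column, allowed):
    """Percentage of aligned sequences whose residue at a 0-based column is in `allowed`."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in aligned_seqs if seq[column] in allowed)
    return 100.0 * hits / len(aligned_seqs)

# Toy alignment of a VAIK-like motif; column 3 holds the Lys position.
toy = ["VAIK", "VAIK", "VAIR", "VA-K"]
print(percent_conserved(toy, 3, "K"))
```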
Conservation of the Linker Region
Considering the high level of sequence conservation in this region, we sought to determine the domain (either KHD or cyclase domain) with which the linker regions are coevolving. The correlation coefficients determined for the linker region and the cyclase domain, and for the linker region and the KHDs, were found to be 0.63 and 0.69, respectively (Table 1). This indicates that there appears to be no strong coevolution of the linker region with either the KHD or the cyclase domain. It is therefore likely that the linker region is a conserved feature of these receptor GCs and could have an independent role in the regulation of receptor GCs.
Kinase–Cyclase Domain Fusion
Table: Presence of kinase subdomain motifs and classification of the associated cyclase domain (header: cyclase domain type; motif categories: VAIK + HRD + DFG, VAIK + HRD, VAIK + DFG, HRD + DFG)
This is the first report of a systematic search for GC-like genes in NRDB that considers not only overall sequence similarity in the cyclase domain but also the conserved critical residues required for cyclase function. We also paid close attention to the critical motifs in KHDs, which are among the domains most commonly found associated with GCs. Our searches indicate a skewed distribution of GCs in the tree of life. Most of the putative GCs were from higher eukaryotes, with only a few found in lower organisms. Earlier reports on the phylogenetic analysis of GCs considered only overall sequence similarity and did not specify the functional classification based on catalytic activity (Fitzpatrick et al. 2006; Yamagami and Suzuki 2005). In agreement with an earlier report (Fitzpatrick et al. 2006), we found multiple lineage-specific GC expansions. However, we restricted our search to proteins containing all motifs required for cyclase activity in a single polypeptide, and this has resulted in our focusing almost exclusively on homodimeric receptor GCs.
We were able to find a few cyclases containing all of the residues required for GC activity (along with overall sequence similarity) in prokaryotes. The domain composition of these proteins indicated that they resembled bacterial ACs. Phylogenetic analysis suggested their evolutionary distance from the remaining GCs, but, interestingly, they formed a separate cluster from the ACs as well. This suggests that they could be intermediates in the evolution of GCs found in higher forms of life. It is clear that cGMP has not been extensively used as a second messenger in prokaryotes. In contrast, cyclic di-GMP has been extensively used in prokaryotes but not in eukaryotes (Romling and Amikam 2006). Perhaps the evolution of the effector proteins of cGMP and cyclic di-GMP has played a role in determining which second messenger was used across different phyla. Although there have been reports of cGMP (Szmidt-Jaworska et al. 2007) and proteins with GC activity (Ludidi and Gehring 2003) in plants, we were unable to detect GC-like genes in any plant using the stringent criteria employed in our analysis. The role of cGMP signaling in the presence of light has been well documented in plants; thus, it is likely that there has been extensive divergence of GC-like genes in plants, which prevented us from identifying them using the available sequences of other eukaryotic proteins.
Both cAMP and cGMP have been identified in cyanobacteria; however, cyclase enzymes characterized in detail are mostly ACs (Kanacher et al. 2002; Kasahara et al. 2001; Katayama and Ohmori 1997; Ochoa De Alda et al. 2000). Only one GC (Ochoa De Alda et al. 2000) and a single cGMP-specific phosphodiesterase (Cadoret et al. 2005) have been identified using genetic screens. Our results indicate the presence of a few active GCs in cyanobacteria. Cyclic nucleotide-mediated signal-transduction pathways have been implicated in various physiologic and developmental processes in these organisms, specifically in the UV-B–induced stress response (Cadoret et al. 2005), motility (Bhaya et al. 2006), and heterocyst development (Imashimizu et al. 2005). The domain composition of the GCs found in cyanobacteria also makes them interesting proteins, and detailed analysis of the signal-transduction pathways involving these proteins will certainly provide more insight into their roles in these organisms. It is, however, conceivable that GCs exist in cyanobacteria that are not similar to class III nucleotide cyclases and are likely to be identified only by genetic analysis in the first instance.
The study performed here is the first to suggest the presence of a GC in Cryptosporidium parvum. C. parvum is an apicomplexan protozoan parasite that is a major cause of diarrhea and gastroenteritis in humans and other animals. It invades and resides in epithelial cells, mostly in the small intestine. Although many immunological factors have been shown to play roles in the development of diarrhea after infection (Laurent et al. 1999; McDonald 2000), the presence of a GC in the parasite and perhaps subsequent production and secretion of cGMP in the host cell could also aid in causing diarrhea. Indeed, increased cGMP in intestinal cells, after activation of an intestinal receptor GC (GC-C) by toxins produced by enterotoxigenic E. coli, is the cause of “travellers’ diarrhea” (Hughes et al. 1978). Hence, biochemical and functional characterization of the Cryptosporidium cyclase and the role of cGMP in the pathogenesis of Cryptosporidium will be of immense importance.
Most of the GCs found in the study were typically of either the receptor or soluble GC topology, indicating a common theme of regulation of GC activity by ligands. A few proteins are found with a KHD and cyclase domain fusion without any extracellular ligand-binding domain. The first such protein was reported from the rat, and the protein is expressed in lung, kidney, and skeletal muscle (Kojima et al. 1995). However, no function has been assigned to this GC. Given the data on the role of KHDs in receptor GCs, it could be suggested that these cyclases are also regulated by ATP and hence can be sensors of intracellular ATP. Interestingly, a strong conservation in the length of the linker region between the KHD and the cyclase domain suggests a conserved means of regulation by the linker of either the activity of the KHD or the cyclase domain.
A variety of domains was found fused to the cyclase domain in lower organisms. GCs with two cyclase domains in a single polypeptide chain, a feature of mammalian ACs, show GC activity (Linder et al. 1999, 2000); however, the physiologic functions of these proteins are largely unknown. The presence of GCs in bacteria with domains usually found in ACs (e.g., CHASE_2, HAMP, Cache_1, HisKA, etc.) indicates that these GCs might have evolved from ACs.
Our phylogenetic and pairwise distance correlation analysis of the KHDs and the cyclase domains associated with them suggests coevolution of the two domains in GCs. Although most of the KHD-containing GCs were found to contain the ATP-binding VAIK motif, which points to possible regulation by ATP, phylogeny indicates that the KHDs present in various kinds of GCs (e.g., GC-A, GC-B, etc.) are distinct from each other. Further analysis of the critical motifs for kinase function showed variation in the conservation of motifs other than the VAIK motif. It can therefore be assumed that the KHDs in different GCs may regulate the activity of the GC domain in specific ways, as has been corroborated by experimental evidence wherein exchange of the KHDs between GC-A and GC-B allowed for proper functioning of the receptors, whereas exchange between GC-A and GC-C did not (Koller et al. 1992). This is probably because of the similarity between the cyclase domains of GC-As and GC-Bs, which formed a single clade in the phylogenetic tree, distinct from other groups of receptor GCs, and therefore could be regulated by a similar KHD.
For the first time we report the presence of cyclase-like domains in the mimivirus. The mimivirus was discovered in 1992 within the amoeba Acanthamoeba polyphaga, after which it is named (Raoult et al. 2007). Mimivirus can infect mammals and cause pneumonia (La Scola et al. 2005), and therefore there has been much debate on whether this virus has acquired genes through horizontal gene transfer from its mammalian host (Filee et al. 2007; Monier et al. 2007; Moreira and Brochier-Armanet 2008; Suhre 2005; Suzan-Monti et al. 2006) or in fact represents a type of DNA virus that emerged before cellular organisms (Moreira and Brochier-Armanet 2008; Suhre 2005). Gene duplications occur in the mimivirus genome at a frequency comparable with that commonly observed in bacteria, archaea, and eukaryotes (approximately 30%), suggesting early evolutionary origins of the mimivirus (Suhre 2005). If this is indeed true, it indicates the early origin of proteins that could generate cyclic nucleotides, and their fusion with kinases, as we see in this study. It is a rare occurrence that kinase and guanylyl cyclase activities both function in a single protein. There has been only a single report of the KHD of a receptor GC showing catalytic activity (Aparicio and Applebury 1996), and it is difficult to comment at this time whether the activity detected was because of an associated kinase. The transition from a functional to a nonfunctional domain mirrors what has recently been reported for GAF domains. These domains have been found to regulate the activity of a variety of enzymes (e.g., ACs), cyclic nucleotide phosphodiesterase, and transcription factors (e.g., FhlA) (Martinez et al. 2002). However, recent evidence suggests that the GAF domain itself can perform catalysis (Lin et al. 2007). Therefore, it seems that nature can use the same protein architecture for either an enzymatic or a regulatory function.
In summary, we have identified GCs in prokaryotic genomes with a variety of domain fusions. Coevolution of the KHD along with the cyclase domain indicates the intricate relation between these domains in conveying external signals into changes in the physiology of the cells, resulting in conservation of the linker region between the two domains. The allosteric regulation of mammalian ACs (ATP-using enzymes) by GTP-binding proteins (G proteins) has been well characterized both functionally and structurally. The intriguing finding in our analysis described here is the abundant presence in diverse organisms of ATP-binding modules (KHDs) that allosterically regulate GTP-using enzymes. Compared with the scenario seen with G proteins and ACs, KHDs and GC domains are found in a single polypeptide chain, which perhaps prevents a generalized interaction between a single KHD regulatory domain and many GCs. This in turn has allowed these proteins to develop exquisite mechanisms of regulation, thereby effectively controlling cGMP levels in the cell.
Financial support from the Department of Biotechnology, Government of India, is acknowledged. K. H. B. is a recipient of a research fellowship from the Council of Scientific and Industrial Research, Government of India. Useful discussions with Ritwick Sawarkar are gratefully acknowledged.