Journal of Molecular Evolution, Volume 68, Issue 6, pp 587–602

The Evolution of Guanylyl Cyclases as Multidomain Proteins: Conserved Features of Kinase-Cyclase Domain Fusions

Authors

  • Kabir Hassan Biswas
    • Department of Molecular Reproduction, Development and Genetics, Indian Institute of Science
  • Avinash R. Shenoy
    • Department of Molecular Reproduction, Development and Genetics, Indian Institute of Science
  • Anindya Dutta
    • Department of Molecular Reproduction, Development and Genetics, Indian Institute of Science
    • Department of Molecular Reproduction, Development and Genetics, Indian Institute of Science
Article

DOI: 10.1007/s00239-009-9242-5

Cite this article as:
Biswas, K.H., Shenoy, A.R., Dutta, A. et al. J Mol Evol (2009) 68: 587. doi:10.1007/s00239-009-9242-5

Abstract

Guanylyl cyclases (GCs) are enzymes that generate cyclic GMP and regulate different physiologic and developmental processes in a number of organisms. GCs possess sequence similarity to class III adenylyl cyclases (ACs) and are present as either membrane-bound receptor GCs or cytosolic soluble GCs. We sought to determine the evolution of GCs using a large-scale bioinformatic analysis and found multiple lineage-specific expansions of GC genes in the genomes of many eukaryotes. Moreover, a few GC-like proteins were identified in prokaryotes, which come fused to a number of different domains, suggesting allosteric regulation of nucleotide cyclase activity. Eukaryotic receptor GCs are associated with a kinase homology domain (KHD), and phylogenetic analysis of these proteins suggests coevolution of the KHD and the associated cyclase domain as well as conservation of the sequence and the size of the linker region between the KHD and the associated cyclase domain. Finally, we also report the existence of mimiviral proteins that contain putative active kinase domains associated with a cyclase domain, which could suggest early evolution of the fusion of these two important domains involved in signal transduction.

Keywords

Guanylyl cyclase · Kinase homology domain · Mimivirus · Phylogeny · cGMP · Coevolution

Introduction

Cyclic GMP (cGMP) is an important signaling molecule in many eukaryotes and, in mammals, is known to be involved in vision, olfaction, muscle contraction, regulation of homeostasis, cardiovascular and neuronal function, and behavior (Sabatini et al. 2007). Guanylyl cyclases (GCs) are the enzymes that catalyze the conversion of GTP to cGMP. Mammals possess membrane-associated GCs, which serve as receptors for diverse polypeptide ligands, as well as cytosolic GCs that are regulated by nitric oxide and carbon monoxide (Lucas et al. 2000). For example, GC-A and GC-B are the receptors for atrial natriuretic factors (ANF), a family of peptide hormones that act to decrease blood volume by stimulating natriuresis and diuresis in the kidney (Kuhn 2004). GC-C is the receptor for the guanylin family of peptides, which regulate fluid and electrolyte balance in the intestine (Forte 2004), as well as for the heat-stable enterotoxin peptides secreted by pathogenic bacteria in the intestine (Schulz et al. 1990). Ligands for the other receptor GCs are not known, and these GCs are therefore called “orphan receptors.” These include GC-E/F, which are of prime importance in visual signal transduction (Dizhoor 2000). Soluble GCs function as heterodimers of α and β subunits and are ubiquitously expressed in mammalian cells.

In organisms such as Dictyostelium, cGMP is required for chemotaxis signaling in addition to the waves of cAMP that are generated during development (Roelofs et al. 2001). Novel and interesting GCs have been reported from Paramecium, Tetrahymena, and Plasmodium (Baker and Kelly 2004). Caenorhabditis elegans, which alone encodes 34 putative GC genes (27 receptor and 7 soluble), exemplifies the large repertoire of GCs in nematodes, and these gene products appear to be involved in determining neuronal left/right asymmetry in the worm (Ortiz et al. 2006). Drosophila melanogaster has 6 receptor and 5 soluble GCs, indicating a good representation of GCs in insects (McNeil et al. 1995; Morton et al. 2005). Until now, only 1 GC has been reported from plants (Ludidi and Gehring 2003).

Although GCs use GTP as substrate, they are similar to adenylyl cyclases (ACs) in terms of primary structure and are classified as class III nucleotide cyclases (Tesmer et al. 1999; Tucker et al. 1998). Many of the inferences about the catalytic mechanism and substrate specificity of GCs are based on structural knowledge of ACs and on mutational analysis (Fig. 1a) (Tesmer et al. 1997; Tucker et al. 1998). ACs function as dimers, with catalysis occurring at the dimer interface, and require two metal ions, either Mg2+ or Mn2+ (Tesmer et al. 1999). In mammalian ACs, the two cyclase domains forming the dimer are found in a single polypeptide chain and are numbered according to their order from the N to the C terminus (C1 and C2, respectively). The structures determined for the C2 homodimer of an AC, as well as for the C1-C2 heterodimer (Tesmer et al. 1997), showed that the two monomers are intertwined like two boughs in a wreath (Tesmer et al. 1997; Tesmer et al. 1999). The active site is formed by contributions from both monomers: the metal-binding residues are provided by one monomer, whereas the substrate (ATP)-specifying and transition state–stabilizing residues are provided by the other monomer. Nucleophilic attack of the 3′-OH of the ribose sugar on the α-phosphate group of ATP results in the release of pyrophosphate and formation of cAMP (Tesmer et al. 1997; Tesmer et al. 1999). Soluble GCs form similar heterodimers (Wedel and Garbers 2001). However, receptor GCs are either homodimers or higher oligomers (Vijayachandra et al. 2000). Based on the crystal structure of a mammalian AC, computational models of GCs were derived, which led to the identification of the determinants of GTP specificity in both homodimeric receptor GCs and heterodimeric soluble GCs (Sunahara et al. 1998; Tucker et al. 1998). In ACs, a Lys and an Asp present in the C2 domain recognize the unprotonated N1 and the exocyclic amine of adenine, respectively. These two residues discriminate against the O6 and N2 as well as the protonated N1 of guanine. In GCs, the Lys and Asp are replaced with a Glu and a Cys, respectively. However, residues other than Cys (including Gly, Ser, Ala, or His) are also found at the second GTP-specifying position in GCs (Foster et al. 1999; Linder and Schultz 2002; Ochoa De Alda et al. 2000; Roelofs et al. 2001; Roelofs and Van Haastert 2002; Tang and Hurley 1998).
Fig. 1

Critical residues in class III nucleotide cyclases and protein kinase domains. a Three-dimensional structure of the class III AC dimer consisting of C1 (canine AC type V) and C2 (rat AC type II) domains with the ATP analog βLddATP at the active site (pdb code 1CJT [Tesmer et al. 1999]). Residues critical for the cyclase reaction are shown with their side chains. D396 and D440 are the metal-binding residues (M1 and M2) contributed by the C1 domain; K938 and D1018 are the substrate-specifying residues (S1 and S2); and N1025 and R1029 are the transition state-stabilizing residues (T1 and T2) contributed by the C2 domain. Metal ions (Mg2+ and Mn2+) are shown as black dots. b Src kinase domain structure (pdb code 1OIQ; [Seeliger et al. 2007]) indicating the conserved motifs in kinases. The ATP-binding VAIK motif present in subdomain II, the conserved E involved in stabilizing ATP binding in subdomain III, the HRD motif in the catalytic loop, the DFG in the activation loop, and the APE motif in subdomain VIII are all highlighted with their side chains

Receptor and soluble GCs are multidomain proteins, and the associated domains allosterically regulate the activity of the catalytic domain. Membrane-bound receptor GCs have an extracellular ligand-binding domain, a membrane-spanning domain, an intracellular domain that has homology to protein kinases (i.e., kinase homology domain [KHD]), a linker region, and a C-terminally located cyclase domain (Lucas et al. 2000). Ligand binding results in activation of the intracellular guanylyl cyclase domain, in turn resulting in increased levels of cGMP in the cell. Cytosolic or soluble GCs have two N-terminal regulatory domains, called the “heme nitric oxide–binding” (HNOB) domain and the “heme nitric oxide binding–associated” (HNOBA) domain, which bind nitric oxide and activate the cyclase domain present at the C-terminal end (Foster et al. 1999; Lucas et al. 2000).

The KHDs of receptor GCs are approximately 250 amino acids long and are more similar to the protein tyrosine kinases than to the serine/threonine kinases (Koller et al. 1992). Genome-wide analyses of human (Manning et al. 2002), mouse (Caenepeel et al. 2004), C. elegans (Plowman et al. 1999), Dictyostelium (Goldberg et al. 2006), and yeast (Hunter and Plowman 1997) showed that 2% to 3% of the genes encode kinase-like proteins. A typical kinase domain contains an N-terminal lobe with five β strands and a single α helix as well as a C-terminal lobe consisting mainly of α helices (Huse and Kuriyan 2002). A “hinge” of several amino acid residues connects the two lobes and provides flexibility in their relative orientation. ATP is accommodated in the cleft between the two lobes, and the adenine ring of ATP forms hydrogen bonds with the hinge region. The residues or motifs important for catalysis are the P-loop (nucleotide-binding loop), which is rich in glycines; β strand 3; helix C, which contains a conserved Glu residue in the N-terminal lobe; and the catalytic loop and activation loop (A-loop) in the C-terminal lobe (Fig. 1b). The P-loop has a consensus sequence of GxGxΦG, where x is any amino acid and Φ is Phe or Tyr. The catalytic loop contains the HRD motif along with an Asn residue a few amino acids C-terminal to the HRD motif, with the Asp residue playing a critical role in the phosphotransfer reaction. The activation loop contains the N-terminal anchor with a conserved DFG motif, the Asp of which is involved in binding the Mg2+ ion required for catalysis. In addition, an APE, ALE, or SPE sequence, which is involved in positioning the peptide substrate correctly for phosphotransfer (Nolen et al. 2004), is found at the C-terminal anchor (Taylor et al. 1993). The aminoglycoside kinase APH(3′)-IIIa lacks the glycine-rich loop but shows a similar kinase fold and binds ATP (Hon et al. 1997). Importantly, because the critical Asp in the catalytic loop is absent from the KHDs of receptor GCs, this domain is thought to lack protein kinase activity. However, there is a single report on a retinal GC that indicates that it has autophosphorylating activity (Aparicio and Applebury 1996).

Given the importance of cGMP in cellular signaling in eukaryotes, it would be of interest to identify genes that could encode GCs in diverse organisms. In addition, analysis of the KHD associated with the GC domain will illuminate the evolution and regulation of this class of enzymes; i.e., the fusion of an inactive kinase domain to the guanylyl cyclase domain suggests a generalized allosteric regulation of GCs that could be likened to the regulation of ACs by G-proteins. We therefore set out to identify genes that encode putative GCs in the nonredundant database. We report here that novel GC genes with interesting domain fusions were identified in prokaryotes. Moreover, phylogenetic and correlation analysis of both the cyclase and KHD domains in receptor GCs suggests that the two domains have coevolved. Most interestingly, our analysis identified genes with a putative functional kinase domain that was fused to an inactive cyclase domain, indicating an apparent exclusion of proteins in nature that contain a functional protein kinase and a GC domain in a single polypeptide chain.

Materials and Methods

Database Search

The nonredundant database (NRDB) was obtained from the National Center for Biotechnology Information (NCBI) in January 2007. Cyclase domains from 48 class III nucleotide cyclases comprising both ACs and GCs were used as queries in the BLAST searches. PSI-BLAST (Altschul et al. 1997) searches were performed with an e-value cutoff of 10⁻⁴ and a maximum of 100 iterations; the final results contained ≤3000 proteins for each query. A hidden Markov model (HMM) profile, cIII-cyclase.hmm, was built for class III nucleotide cyclases using the cyclase domain sequences of the same 48 queries used in the BLAST searches (http://hmmer.wustl.edu/). The cyclase domains were analyzed for the presence of canonical residues required for catalysis by aligning them with the cIII-cyclase.hmm profile using the hmmalign alignment program in the HMMER 2.3 suite. Domains were predicted as active GCs when they contained (1) Asp or Glu at both canonical metal-binding positions; (2) a Glu as the first and either Cys, Gly, Ala, His, Ser, or Thr as the second substrate-specifying residue; and (3) Asn or Gly as the first and Arg or Lys as the second transition state–stabilizing residue. Substrate-specifying residues in ACs are Lys, Arg, Gln, and Asp at the first position and Asp, Ser, and Thr at the second position. We allowed the possibility that some GCs may contain a Thr residue at the second position because a substitution of a Thr residue for a Ser residue (found in characterized GCs) is usually tolerated in proteins with retention of their function. For a similar reason, we also allowed the presence of either a Lys or Arg as the second transition state–stabilizing residue. In analogy to mammalian ACs (Shenoy and Visweswariah 2004), cyclases having all of the residues needed for catalysis except the substrate-specifying residues are called “ambiguous” (Ambi) cyclases; those with only metal-binding residues are called “C1-like” or C1 cyclase domains; and those with only the transition state–stabilizing residues are called “C2-like” or C2 cyclase domains. Cyclase domains that did not fall into any of these classes are called “Cyc-like.” Sequences showing ≥95% identity were removed from the final analysis.
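
The residue rules above can be read as a small decision procedure. The Python sketch below is illustrative only and is not the authors' code: the function name, the one-letter residue arguments, the order in which the rules are tested, and the reuse of the transition-state rule for ACs are all assumptions. It also presumes that the six catalytic positions have already been read off an hmmalign alignment.

```python
# Allowed residues (one-letter codes) at each catalytic position, as listed in the text.
METAL_BINDING = set("DE")        # both metal-binding positions: Asp or Glu
GC_SUBSTRATE_1 = set("E")        # first substrate-specifying residue in GCs
GC_SUBSTRATE_2 = set("CGAHST")   # second substrate-specifying residue in GCs
AC_SUBSTRATE_1 = set("KRQD")     # first substrate-specifying residue in ACs
AC_SUBSTRATE_2 = set("DST")      # second substrate-specifying residue in ACs
TRANSITION_1 = set("NG")         # first transition state-stabilizing residue
TRANSITION_2 = set("RK")         # second transition state-stabilizing residue


def classify_cyclase_domain(m1, m2, s1, s2, t1, t2):
    """Classify a cyclase domain from its six catalytic residues.

    Hypothetical helper: the branch order is one reading of the rules,
    and the same transition-state rule is assumed for ACs and GCs.
    """
    metal_ok = m1 in METAL_BINDING and m2 in METAL_BINDING
    transition_ok = t1 in TRANSITION_1 and t2 in TRANSITION_2

    if metal_ok and transition_ok:
        if s1 in GC_SUBSTRATE_1 and s2 in GC_SUBSTRATE_2:
            return "GC"
        if s1 in AC_SUBSTRATE_1 and s2 in AC_SUBSTRATE_2:
            return "AC"
        return "Ambi"      # catalytic residues present, substrate unclear
    if metal_ok:
        return "C1-like"   # only the metal-binding residues are present
    if transition_ok:
        return "C2-like"   # only the transition-state residues are present
    return "Cyc-like"
```

For example, `classify_cyclase_domain("D", "D", "E", "C", "N", "R")` returns `"GC"`, whereas the same call with Lys and Asp at the substrate-specifying positions returns `"AC"`.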

Kinase domains were searched and aligned using the protein kinase HMM (pkinase; PF00069.15) obtained from the PFAM database (Bateman et al. 2004). All other domains were predicted using the PFAM server (version 21.0 with 8957 protein families; Bateman et al. 2004), the SMART database of protein families (Letunic et al. 2006; Schultz et al. 2000), and the NCBI CD database (version 2.10 with 12589 PSSMs).

Phylogenetic Analysis

Domain sequences used for generating phylogenetic trees were mapped with their respective HMM profiles. Putative active GCs were analyzed for their evolutionary relations using Molecular Evolutionary Genetics Analysis (MEGA) version 4 (Tamura et al. 2007). Sequences were named as “|GI|name of the protein (if any)|short species name_domain number (if more than one)”. All sequence alignments for phylogenetic tree construction were performed using ClustalW (matrix: BLOSUM; penalties: for pairwise alignment, gap opening = 10 and extension = 0.2; for multiple alignment, gap opening = 10 and extension = 0.2), and trees were built using the neighbor-joining (NJ) method (Saitou and Nei 1987) with the interior branch test (500 replicates) implemented in MEGA4. Evolutionary distances were computed using the Poisson correction method, and all positions containing gaps and missing data were eliminated from the data set using the complete deletion option in MEGA4.
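
For readers unfamiliar with the distance measure, the Poisson correction converts the proportion p of differing aligned sites into a distance d = −ln(1 − p). The following Python sketch shows that calculation together with a simple form of complete deletion. It is a stand-alone illustration rather than a reimplementation of MEGA4, and the function names and the choice of `-` and `?` as gap and missing-data symbols are assumptions.

```python
import math

GAP_CHARS = set("-?")  # assumption: '-' marks gaps and '?' marks missing data


def complete_deletion(alignment):
    """Drop every column in which any sequence has a gap or missing residue."""
    keep = [
        col for col in range(len(alignment[0]))
        if all(seq[col] not in GAP_CHARS for seq in alignment)
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p), with p the fraction of differing sites."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p) if p else 0.0
```

Usage: after `aln = complete_deletion(["ACDEFG", "ACDQFG", "AC-EFG"])`, the gapped third column is removed from all three sequences, and `poisson_distance(aln[0], aln[1])` is computed from one difference in five sites (p = 0.2).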

Correlation coefficient (Pearson’s correlation coefficient [r]) and estimate of its SD, z-score, and p-value were determined essentially as reported earlier (Goh et al. 2000). Briefly, pairwise distances and average pairwise distance were computed from the alignment using MEGA4. Computed values were used for the calculation of r. Significance of the estimated r value was assessed by bootstrapping analysis, which gives the SD of r, and estimating the probability of obtaining the value by chance alone (p value). Bootstrap estimate of SD of r, z-score, and p-value were calculated by generating 1000 sets of randomly sampled pairwise distances, with replacement from the original set of distances for each pair of domains compared.
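
The correlation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Goh et al. (2000) in full: the function names are hypothetical, the inputs are two equal-length lists holding the matched pairwise distances for the two domains, the z-score is taken to be r divided by the bootstrap SD, and the p-value calculation is omitted.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r(xs, ys, n_sets=1000, seed=0):
    """Return (r, bootstrap SD of r, z), resampling distance pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    r = pearson_r(xs, ys)
    resampled = []
    while len(resampled) < n_sets:
        sample = [rng.choice(pairs) for _ in pairs]
        try:
            resampled.append(pearson_r(*zip(*sample)))
        except ValueError:
            continue  # skip degenerate resamples that have no variance
    mean_r = sum(resampled) / n_sets
    sd = math.sqrt(sum((v - mean_r) ** 2 for v in resampled) / (n_sets - 1))
    z = r / sd if sd > 0 else float("inf")  # assumption: z = r / SD
    return r, sd, z
```

Resampling whole (x, y) pairs keeps each KHD distance matched to the cyclase distance for the same two proteins, which is what the correlation is meant to measure.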

Search for Kinase–Cyclase Fusion Proteins

In June 2007, the NRDB was searched for kinase domain– and cyclase domain–containing proteins with the Pkinase.hmm and cIII-cyclase.hmm profiles. This was to ensure that the proteins identified in the search have sequences similar to protein kinases and cyclases, respectively, and are not distant homologues, as can happen when using PSI-BLAST searches. Proteins found in both of the searches were pooled together, and redundant sequences were removed. Analysis of critical motifs in the kinase and cyclase domains was performed after aligning the sequences with hmmalign (using the respective HMM profiles). The cyclase domains classified as “Cyc-like” had a high e-value in the hmmsearches and hence were further checked at the 3D-PSSM server. All of them were predicted to have a class III cyclase-like structure.

The linker region was defined as the sequence of amino acids between the kinase and cyclase domains in GCs. The boundaries of both domains were mapped using their respective HMM profiles. Although there is no foolproof method for demarcating domain boundaries, the current procedure served the purpose because we used the same HMM profiles for all sequences that were analyzed. An HMM profile for the conserved 40 amino acid residues was generated using sequences reported earlier (Anantharaman et al. 2006) and the hmmbuild program in the HMMER suite. The profile was further calibrated with the hmmcalibrate program from the HMMER 2.3 suite.
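
Given the domain boundaries, extracting the linker is a simple slicing operation. The sketch below assumes 1-based, inclusive coordinates of the kind usually reported for HMM hits; the function name and argument names are hypothetical.

```python
def extract_linker(sequence, kinase_end, cyclase_start):
    """Return the residues lying strictly between the kinase and cyclase domains.

    kinase_end is the last residue of the kinase domain and cyclase_start is
    the first residue of the cyclase domain (assumed 1-based and inclusive).
    """
    if not 1 <= kinase_end < cyclase_start <= len(sequence):
        raise ValueError("expected 1 <= kinase_end < cyclase_start <= len(sequence)")
    # Python slices are 0-based and end-exclusive, hence the index shift.
    return sequence[kinase_end:cyclase_start - 1]
```

With a toy sequence `"KKKKKLLLCCCC"`, a kinase domain ending at residue 5, and a cyclase domain starting at residue 9, the function returns `"LLL"`. Adjacent domains give an empty linker.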

Results

Identification of Guanylyl Cyclases in NRDB

The ability of nucleotide cyclases to use ATP or GTP as a substrate, and hence generate cAMP or cGMP as the product, depends on the substrate-specifying residues. Therefore, inspection of these residues would allow one to classify a nucleotide cyclase as an AC or GC. A BLAST search of the NRDB in July 2007 with a query set of 48 class III nucleotide cyclase domains found 3138 protein sequences. The presence of the canonical residues required for catalytic activity, along with substrate-specifying residues, identified 199 proteins that potentially could have GC activity. This method of analysis would not have identified the α subunits of soluble GCs because they lack important residues for catalysis and are active only on dimerization with β subunits (Lucas et al. 2000; Perkins 2006; Yamada et al. 2006).

Most of the GC genes that we identified as putative active GCs were from the higher forms of life. Interestingly, though, we did find 11 sequences from bacteria. Phylogenetic analysis using the cyclase domain sequence showed a clear separation of the soluble and receptor GCs, with subfamilies of receptor GCs present in separate clusters, suggesting different lineages (Fig. 2). Recently, Fitzpatrick et al. demonstrated the multiple lineage-specific expansion of GCs on analysis of selected sequenced genomes (Fitzpatrick et al. 2006), as well as the phylogenetic separation of the α and β subunits, with a clustering of C. elegans–soluble GCs and mammalian β-2 isoforms. However, in the previous study, residues required for catalysis were not considered while analyzing sequences for their phylogeny, and only a few selected genomes were used to determine the evolutionary relation. Furthermore, because the soluble GC–associated HNOB and HNOBA domains were analyzed in the previous study, we chose to explore the relation between the receptor guanylyl cyclase catalytic domains and the associated KHDs.
Fig. 2

Phylogenetic analysis of the GC domains. Topology of the phylogenetic tree of the GC domains found in the study was generated by NJ with a 500-replicate interior branch test. Branches not shaded are largely insect GCs. The nomenclature shown begins with GI, followed by the name of the protein (if any), and, finally, abbreviation of the organism name. Names of the organisms are abbreviated as follows: A.aeg, Aedes aegypti; A.var, Anabaena variabilis ATCC 29413; A.jap, Anguilla japonica; A.gam, Anopheles gambiae str. PEST; A.mel, Apis mellifera; A.amu, Asterias amurensis; B.dor, Bactrocera dorsalis; B.mor, Bombyx mori; B.tau, Bos taurus; B.aga, Brissus agassizii; C.bri, C. briggsae; C.ele, C. elegans; C.sap, Callinectes sapidus; C.fam, Canis familiaris; C.por, Cavia porcellus; C.rei, Chlamydomonas reinhardtii; C.int, Ciona intestinalis; C.par, C. parvum Iowa II; D.rer, Danio rerio; D.set, Diadema setosum; D.mel, D. melanogaster; D.pse, D. pseudoobscura; G.gal, Gallus gallus; g-proteo, g-proteobacteria KT 71; H.pul, Hemicentrotus pulcherrimus; H.gly, Heterodera glycines; H.sap, Homo sapiens; M.mul, Macaca mulatta; M.sex, Manduca sexta; M.fer, M. ferrooxydans PV-1; M.mus, Mus musculus; N.pun, Nostoc punctiforme; Nostoc, Nostoc sp. PCC 7120; O.cur, Oryzias curvinotus; O.lat, O. latipes; P.tro, Pan troglodytes; P.ber, Plasmodium berghei; P.cha, P. chabaudi chabaudi; P.fal, P. falciparum 3D7; P.cla, Procambarus clarkii; R.cat, Rana catesbeiana; R.pip, Rana pipiens; R.nor, Rattus norvegicus; R.den, R. denitrificans OCh 114; S.aca, Squalus acanthias; S.jap, Stichopus japonicus; S.pur, S. purpuratus; S.scr, Sus scrofa; Synecho, Synechocystis sp. PCC 6803; T.nig, Tetraodon nigroviridis; T.ann, Theileria annulata strain Ankara; T.par, Theileria parva strain Muguga; T.cas, Tribolium castaneum; T.ery, T. erythraeum IMS101; X.lae, Xenopus laevis

Guanylyl Cyclases in Prokaryotes

Despite the wide use of cGMP as a signaling molecule in eukaryotes, there are only a few reports of the use of cGMP as a second messenger in prokaryotes (Linder and Schultz 2002). Indeed, a recent analysis to identify nucleotide cyclase genes in >170 sequenced bacterial genomes did not suggest that any of the identified genes could show GC activity (Shenoy and Visweswariah 2004). Of the 11 putative GCs identified in our analysis, 7 were found in cyanobacteria such as Anabaena, Nostoc, Synechocystis, and Trichodesmium. A recent report on class III nucleotide cyclases in cyanobacteria also suggested the presence of 7 putative GCs encoded in these genomes (Wu et al. 2008). The other prokaryotic GCs were found in proteobacteria such as Mariprofundus and Roseobacter. A sequence alignment indicating the presence of canonical residues (and overall sequence similarity to the eukaryotic GCs) is shown in Fig. 3. None of the proteins identified here have been characterized biochemically or physiologically. Three of these genes contained a Thr residue at the second substrate-specifying position. An adenylyl cyclase from Rhizobium meliloti (Beuve et al. 1990) contains a Gln as the first substrate-specifying residue and a Thr at the second position, which are residues found in a subclass of adenylyl cyclases (Linder and Schultz 2003; Shenoy and Visweswariah 2004). A mutant Rhizobium protein, generated as a fusion with β-galactosidase in which the Gln was converted to a Glu, retained adenylyl cyclase activity and showed no increase in guanylyl cyclase activity (Beuve et al. 1993). However, so far all naturally occurring proteins that have guanylyl cyclase activity contain a Glu residue in the first position. Furthermore, mutational analysis has not always provided unambiguous results on the role of substrate-specifying residues (Shenoy et al. 2003; Sunahara et al. 1998). We therefore considered proteins containing a Glu as the first substrate-specifying residue in combination with other allowed residues (see Materials and Methods) to be GCs and await their future biochemical characterization.
Fig. 3

GCs in prokaryotes. The cyclase domains of putative prokaryotic GCs identified in the current study were aligned along with characterized GCs. Residues involved in catalysis are highlighted (filled circles indicate metal-binding residues; open squares indicate substrate-specifying residues; and crosses indicate transition state–stabilizing residues). Abbreviations used for organisms are identical to those listed in Fig. 2. Residue numbering is according to human GC-C

The domain architecture of these bacterial GCs is more similar to that of bacterial ACs than to that of typical GCs (Shenoy and Visweswariah 2004; Wu et al. 2008), and the cyclase domains form a separate cluster in the phylogenetic tree, suggesting their distant relation with other GCs. Linder and Schultz subclassified the class III nucleotide cyclase family (Linder and Schultz 2003). However, a phylogenetic tree constructed with the cyclase domains of proteins identified in this study, as well as the sequences used by Linder and Schultz to determine the subclasses of class III nucleotide cyclases, indicated that the bacterial guanylyl cyclase-like genes formed a separate cluster, except for the Roseobacter denitrificans GC (data not shown). This suggests that the new genes we identified are indeed a distinct subclass within the classical class III nucleotide cyclases and therefore warrant further study in terms of their activity and catalytic mechanism.

Domain Organization of Guanylyl Cyclases

We analyzed the domain composition of the proteins we identified as GCs using PFAM, CD, SMART, and TMHMM profiles to identify the possible allosteric regulation of the GC domain by any associated domain (Fig. 4). Of the 199 proteins that we identified, 134 have a typical receptor GC architecture with an N-terminal extracellular ligand-binding domain (ANF-receptor domain, as defined in the Pfam database) and a transmembrane domain, followed by a kinase homology domain and a C-terminal cyclase domain. Nineteen genes were found to have a typical soluble GC architecture of HNOB, HNOBA (Iyer et al. 2003), and a C-terminal cyclase domain. Some of the eukaryotic proteins were found with a KHD linked to a cyclase domain without any extracellular receptor domain. Soluble GCs with either a single HNOB or HNOBA domain were also found, and most of these atypical domain compositions were found in lower organisms.
Fig. 4

Domain organization of GCs. Schematic domain composition (not to scale) of the GCs is shown based on PFAM, SMART, and CD domain predictions. Transmembrane regions were predicted using TMHMM and are shown as vertical bars. Proteins shown are representative of the domain composition type, and the number of transmembrane helices can vary among proteins with similar domain organizations

Two genes from Plasmodium encode two GC domains associated with an N-terminal Hydrolase_3 (Haloacid dehalogenase–like hydrolase; Koonin and Tatusov 1994) or an ATPase domain and multiple (19 and 21) transmembrane helices. Although they are similar to the mammalian ACs in terms of overall topology, the orientation of the cyclase domains is reversed in that the cyclase domain that contains substrate specifying residues (C2 domain of an AC) is found N-terminal to the domain that contains the Asp residues required for metal binding (C1 domain of ACs). Interestingly, the two types of cyclase domains present in these two-cyclase domain GCs showed clear separation in the phylogenetic analysis, with the C2 domains grouping with the bacterial GCs. One predicted receptor GC in Strongylocentrotus purpuratus has a fused ribosomal_S7 domain, and another one has a DSL (Delta/Serrate/Lag-2) domain N-terminal to the cyclase domain and 30 EGF domains C-terminal to the cyclase domain. The fusion of a DSL domain with a GC suggests the involvement of cGMP in developmental pathways. There are cyclases with CHASE2 domains (cyclases/histidine kinases associated sensing extracellular) (Mougel and Zhulin 2001) and some with a response regulator domain associated with a single cyclase domain. Interestingly, ACs with these domain fusions in bacteria have been identified earlier (Shenoy and Visweswariah 2004).

Most of the cyanobacterial GCs have a CHASE2 domain associated with the cyclase domain, except for one gene, which encodes only a single cyclase domain. Both proteins from Mariprofundus ferrooxydans PV-1 have a response-regulator domain fused to the GC domain. Of the two GCs found in Trichodesmium erythraeum IMS101 (a colonial marine cyanobacterium), one has a CHASE2 domain along with a cyclase domain, whereas the other has a series of domains in tandem: Cache_1 followed by HAMP, HisKA, HATPase_c, and response regulator domains. CHASE2 and Cache_1 domains sense extracellular stimuli, and their presence in these proteins suggests a putative receptor-like function for these GCs. The presence of multiple signaling domains in one protein suggests that it could diversify the stimuli sensed by the extracellular Cache_1 domain into various signals in the cell by way of histidine phosphorylation (HisKA), HATPase_c, the response regulator, and the cyclase domain, which in turn is regulated by the HAMP domain. The diversity of domains found associated with GCs is far less than that found in the ACs (Shenoy and Visweswariah 2004), in which more than 50 distinct domains were fused to the adenylyl cyclase domain. Perhaps this suggests a more precise structural requirement of the catalytic domain of GCs than of ACs, which prevents the guanylyl cyclase domain from functioning when associated with another domain. Alternatively, it could reflect a lesser use of cGMP as a second messenger in bacteria, resulting in decreased associated-domain diversity; there have been several reports on the widespread use of cyclic di-GMP, rather than cGMP, as a signaling molecule in bacteria (Romling and Amikam 2006).

Phylogenetic Analysis of the Kinase and Cyclase Domains

As mentioned previously, the majority of GCs identified resembled the architecture of receptor GCs in that they possessed an extracellular domain and a KHD domain N-terminal to the cyclase domain. We therefore analyzed in greater detail the KHDs of these genes to compare their phylogeny with their associated cyclase domain.

Of the 199 putative active GCs, 146 proteins contained a KHD. These were mainly receptor GCs, along with a few proteins having a KHD and a cyclase domain without any predictable extracellular ligand-binding domain (also called “ksGCs”) (Kojima et al. 1995). An NJ tree built using the KHD sequences showed that the KHDs separate out into various groups that resemble the tree of the receptor GC domain (Fig. 5). The nematode KHDs together formed a separate clade in a manner similar to their cyclase domain counterparts. Similarly, the KHDs of the GC-C, GC-D, GC-E, and GC-F groups also formed their own clusters. This indicates that the KHDs may have coevolved with their associated cyclase domains, suggesting that the regulation of the cyclase domain by the KHD is very specific. Interestingly, the KHDs of GC-As and GC-Bs formed distinct clades, whereas their cyclase domains did not.
Fig. 5

Phylogenetic analysis of the kinase homology domain in cyclases. Topology of the phylogenetic tree of the KHD of the receptor GCs found in the current study was generated by NJ method with a 500-replicate interior branch test. Branches not shaded largely contain insect GCs. Names of the proteins begin with GI, followed by name of the protein (if any). Abbreviations of the organism are as in Fig. 2

The correlation coefficient between pairwise distances in the sequence alignment of two different domains or proteins has been shown to be indicative of their coevolution (Goh et al. 2000). Therefore, to test the coevolution of the KHD and the cyclase domains, we calculated the correlation coefficient between the pairwise distances of the KHD and cyclase domains using the approach described previously (Goh et al. 2000). The average pairwise distance of the cyclase domains was less than that of the kinase domains (0.42 and 0.94, respectively), indicating a greater overall similarity in the cyclase domains than in the KHDs. The correlation coefficient for the two domains was determined to be 0.82 (Table 1), suggesting strong evolutionary forces for the functional coupling of these domains. Indeed, a swap of the KHD of GC-A with the KHD of GC-C resulted in an inactive protein, indicating the precise conformational interactions required for allosteric regulation of the cyclase domain by the KHD (Koller et al. 1992).
Table 1

Pairwise distance correlation analysis between various domains of receptor GCs

| Domains compared | rᵃ | SD of r | z | p |
|---|---|---|---|---|
| Cyclase–KHD | 0.82 | 0.01 | 119.8 | 0 |
| Linker–cyclase | 0.63 | 0.01 | 92.8 | 0 |
| Linker–KHD | 0.69 | 0.01 | 100.4 | 0 |

ᵃCorrelation coefficients were determined from the pairwise distances between sequences in the respective domain alignments. SD of r, z scores, and p values were determined from 1000 sets of randomly shuffled pairwise distances from the original sets of respective distances
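The correlation analysis summarized in Table 1 can be illustrated with a minimal sketch. It assumes a simple p-distance (fraction of differing aligned positions) as the pairwise distance, which is a stand-in and not necessarily the distance measure used in this study; the function names and the toy alignments are hypothetical and serve only to show the calculation of r and of the shuffle-based SD, z score, and p value.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def p_distance(a, b):
    """Fraction of differing positions (assumed distance; gap-gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(alignment):
    """Distances for all sequence pairs, in a fixed (i, j) order."""
    idx = range(len(alignment))
    return [p_distance(alignment[i], alignment[j]) for i, j in combinations(idx, 2)]


def shuffle_test(d1, d2, n_shuffles=1000, seed=0):
    """Observed r plus SD, z and p estimated from shuffled copies of d2."""
    rng = random.Random(seed)
    r_obs = pearson(d1, d2)
    shuffled = list(d2)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(pearson(d1, shuffled))
    mean = sum(null) / n_shuffles
    sd = math.sqrt(sum((r - mean) ** 2 for r in null) / (n_shuffles - 1))
    z = (r_obs - mean) / sd
    p = sum(r >= r_obs for r in null) / n_shuffles
    return r_obs, sd, z, p


# Toy alignments (hypothetical); row i in both lists is the same protein.
khd = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDQFGHLKV", "TCDQYGHLRV"]
cyclase = ["MNPQRSTVWY", "MNPQRSTVWF", "MNPQKSTVWF", "MNAQKSTIWF"]

r, sd, z, p = shuffle_test(pairwise_distances(khd), pairwise_distances(cyclase))
print(f"r={r:.2f} SD={sd:.2f} z={z:.1f} p={p:.3f}")
```

The two domain alignments must list the proteins in the same order so that each pair of distances refers to the same pair of proteins.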

Sequence Analysis for the Presence of Critical Motifs Required for Catalysis in the KHDs of Receptor GCs

As indicated previously, kinases contain specific residues in various subdomains, which are involved in binding ATP or the metal ion or in the catalytic reaction per se. We analyzed the KHDs in receptor GCs for the presence of these subdomains and the conservation of specific residues required for kinase function because ATP binding to the KHD regulates GC domain activity (Jaleel et al. 2006). The KHDs of the 146 proteins found in our analysis were aligned with the Pkinase HMM profile, and the presence of various functional motifs was determined. The conservation of each subdomain and of specific residues within it is listed in Table 2. The glycine-rich P-loop GXGXXG motif is present in only 2 proteins (1.4%), both of which are found in nematodes. The ATP-anchoring VAIK motif is present in approximately 60% of the proteins. However, the ATP-binding Lys residue is conserved in approximately 75% of the proteins. The metal-chelating Asp of the DFG motif is present in 83% of the proteins, whereas the complete DFG motif is conserved in only 36% of the proteins. The APE motif in the P+1 loop is the most conserved of the motifs analyzed. As expected, none of the 146 proteins has the Asp residue conserved in the HRD motif, which is required for catalysis, thus leading to their classification as pseudokinases. Therefore, the presence of the core ATP-binding residues in the KHDs of most of the proteins indicates the importance of ATP interaction as a means of regulating the catalytic activity of the associated GC domain.
Table 2

Conservation of critical motifs in the kinase homology domains found in KHD–GC fusion proteins

| Motifs | Residues | Residue (%)ᵃ | Motif (%) |
|---|---|---|---|
| GXGXXG | G/A | 16.4 | 1.4 |
| | G | 31.5 | |
| | G | 11.6 | |
| VAIK | V/F/Y/M/L/I/C | 77.4 | 58.9 |
| | A/W/V/C/I | 69.9 | |
| | M/I/A/V/L/G | 49.3 | |
| | R/K | 76.0 | |
| E | E | 65.1 | 65.1 |
| HRD | H/R | 83.6 | 0 |
| | R | 0.0 | |
| | D | 0.0 | |
| | L/V/I | 82.9 | |
| | K/R | 74.0 | |
| | N | 81.5 | |
| DFG | D | 82.9 | 36.3 |
| | F | 47.3 | |
| | G | 97.3 | |
| APE | A | 84.9 | 84.2 |
| | P | 96.6 | |
| | E | 95.9 | |

ᵃSubdomain motifs of protein kinases were searched in the GCs identified in this study, and the frequency of occurrence of individual residues in each motif in different GCs is indicated in column 3. In addition, the frequency of occurrence of the motif in its entirety is also shown (column 4)
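The two kinds of percentages reported in Table 2 (per-residue and whole-motif conservation) can be sketched as follows. This is an illustrative calculation only: it assumes the motif columns have already been extracted from the profile alignment, and the function names and toy sequences are hypothetical rather than taken from this study.

```python
def residue_conservation(column, allowed):
    """Percent of sequences whose residue in this column is one of `allowed`."""
    return 100.0 * sum(res in allowed for res in column) / len(column)


def motif_conservation(aligned_motifs, allowed_per_position):
    """Return (per-position percentages, percent with the entire motif intact)."""
    per_residue = []
    for pos, allowed in enumerate(allowed_per_position):
        column = [seq[pos] for seq in aligned_motifs]
        per_residue.append(residue_conservation(column, allowed))
    intact = sum(
        all(res in allowed for res, allowed in zip(seq, allowed_per_position))
        for seq in aligned_motifs
    )
    return per_residue, 100.0 * intact / len(aligned_motifs)


# Toy example: the three DFG columns taken from four hypothetical KHDs.
dfg_columns = ["DFG", "DLG", "NFG", "DFG"]
per_residue, whole_motif = motif_conservation(dfg_columns, ["D", "F", "G"])
print(per_residue, whole_motif)  # [75.0, 75.0, 100.0] 50.0
```

Because a motif counts as intact only when every position matches, the whole-motif percentage can be much lower than any single residue percentage, as seen for the DFG motif in Table 2.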

CASK CaM kinase lacks the Asp of the DFG motif and hence was thought to be a pseudokinase (Boudeau et al. 2006). However, this protein was recently found to have kinase activity (Kannan and Taylor 2008; Mukherjee et al. 2008). The structure showed that AMPPNP is bound in the cleft between the N- and C-lobes, as is seen in most kinases. The α-phosphate group of AMPPNP was found positioned between the side chains of His145 and Lys41, with a number of water molecules aiding in the interaction. Notably, CASK CaM kinase contains the HRD, VAIK, and GXGXXG motifs (Mukherjee et al. 2008). Our analysis of the critical motifs in the KHDs indicates that most of them contain the residues required for ATP binding as seen in typical protein kinases (76% have the Lys in the VAIK motif, 82% have the Asn in the catalytic loop, and 83% have the Asp in the activation loop). However, as noted previously, none of them contains the Asp in the HRD motif, which is thought to be critical for the phosphotransfer reaction. Therefore, it can be safely argued that the kinase domains present in GCs are true pseudokinases.

Conservation of the Linker Region

The sequence between the KHD and the cyclase domain is known as the linker region and is presumably involved in transmitting the conformational change from the extracellular domain by way of the KHD to the cyclase domain. ATP binding to the KHD also regulates GC activity in vivo and in vitro (Bhandari et al. 2001; Jaleel et al. 2006), and the conformational change induced on ATP binding must also be transduced to the cyclase domain by way of the linker sequence. The linker region of receptor GCs is predicted to form coiled-coils (Anantharaman et al. 2006), and, in retGC, it was found to be important for receptor dimerization (Ramamurthy et al. 2001). Similar helical sequences have been shown to be important for dimerization in bacterial two-component receptors (Park and Inouye 1997) and protein kinase A (PKA) (Vigil et al. 2004). The presence of a linker sequence between the KHD and the cyclase domain of receptor GCs prompted us to study conservation of the length of the sequence between these domains, which might indicate that these proteins undergo a common structural alteration during the regulation of the cyclase domain by the KHD. We determined the distance between the KHD and the cyclase domain in all of the 146 KHD-containing GCs and, surprisingly, found that the distance between the 2 domains is almost invariably approximately 70 residues (Fig. 6a). It was reported previously that a stretch of approximately 40 amino acid residues is conserved in a variety of proteins, including histidine kinases as well as GAF domain– and PAS domain–containing proteins and cyclases (Iyer et al. 2003). To determine if this region is conserved in all receptor GCs and has the potential to form coiled-coils, we generated an HMM profile for the stretch of approximately 40 amino acids and performed an hmmsearch. Results of the hmmsearch suggested that the linker region of almost all of the proteins (142 of 146) could indeed form a coiled-coil. Indeed, a sequence alignment of this region showed remarkable similarity and therefore underscores the importance of this region of receptor GCs in the function and regulation of these proteins (Fig. 6b).
https://static-content.springer.com/image/art%3A10.1007%2Fs00239-009-9242-5/MediaObjects/239_2009_9242_Fig6_HTML.gif
Fig. 6

Conservation of the linker region. a Linker regions were mapped by predicting the cyclase and kinase domains in the kinase-cyclase fusion proteins using the HMM profiles for cyclase and kinase domains. The number of amino acid residues is plotted on the x-axis, whereas the y-axis represents the number of proteins containing a particular length of linker sequence. The number of proteins with a particular length is shown above the bars. b Sequence conservation in the coiled-coil region. Coiled-coil regions were predicted in the receptor GCs with the coiled-coil HMM profile, and sequences were aligned using the same HMM profile. Sequence logo generated with the WebLogo program (Crooks et al. 2004) indicates the high degree of sequence conservation in this region of receptor GCs. Residue numbering is according to human GC-C
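The linker-length distribution shown in Fig. 6a can be computed from predicted domain boundaries along the following lines. This is a minimal sketch assuming 1-based, inclusive domain coordinates such as those reported by an HMM search; the protein names, coordinates, and 10-residue bin width are hypothetical and are not values from this study.

```python
from collections import Counter


def linker_length(khd_end, cyclase_start):
    """Residues strictly between the KHD end and the cyclase start (1-based, inclusive)."""
    return max(0, cyclase_start - khd_end - 1)


def linker_histogram(domain_hits, bin_width=10):
    """Count proteins per linker-length bin; keys are the lower edges of the bins."""
    counts = Counter()
    for khd_end, cyclase_start in domain_hits.values():
        length = linker_length(khd_end, cyclase_start)
        counts[(length // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical boundaries: protein -> (last KHD residue, first cyclase residue).
hits = {"protA": (750, 821), "protB": (702, 775), "protC": (688, 740)}
print(linker_histogram(hits))  # {50: 1, 70: 2}
```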

Considering the high level of sequence conservation in this region, we sought to determine the domain (either the KHD or the cyclase domain) with which the linker region is coevolving. The correlation coefficients determined for the linker region and the cyclase domain and for the linker region and the KHD were 0.63 and 0.69, respectively (Table 1). Both values are lower than that for the cyclase domain and the KHD (0.82), indicating that the linker region does not coevolve strongly with either the KHD or the cyclase domain. It is therefore likely that the linker region is a conserved feature of these receptor GCs and could have an independent role in their regulation.

Kinase–Cyclase Domain Fusion

In our analysis, the majority of the GCs we identified were found to be receptor GCs, and the organization of fused KHD and cyclase domains was found mainly in eukaryotes. During the course of these studies, we came across three proteins (gis 55819684, 55819780, and 55819790) encoded in the mimivirus genome, each of which contained a cyclase domain with ambiguous substrate-specifying residues. Our attention was caught by the fact that these proteins have a cyclase domain flanked by two kinase domains. Closer inspection of the sequences indicated that the kinase domains in these proteins contain all of the critical motifs required for kinase function (Fig. 7a). The cyclase domains in these proteins contain both metal-binding residues and either His or Arg as the first and Gly as the second substrate-binding residue (Fig. 7b). Two of the three proteins contain both transition state–stabilizing residues, suggesting that they are active cyclases. This is in contrast to what is seen with receptor GCs, in which the cyclase domain is active but the kinase domain is inactive. One mimiviral protein (gi 55819790) contains no transition state–stabilizing residues and therefore is predicted to be an inactive cyclase. Because the KHDs in receptor GCs are inactive and regulate the activity of the associated cyclase domain, it is conceivable that the cyclase domain in this mimiviral protein regulates the function of the associated kinase domains.
https://static-content.springer.com/image/art%3A10.1007%2Fs00239-009-9242-5/MediaObjects/239_2009_9242_Fig7_HTML.gif
Fig. 7

Cyclase and kinase domains in mimiviral proteins. a Alignment of the kinase domains of the mimiviral proteins with other kinases. Critical motifs required for the phosphotransfer reaction are boxed. Residue numbering is according to residues in human GC-C. b Alignment of the cyclase domains in the three mimiviral proteins with class III nucleotide cyclases. Residues involved in catalysis are highlighted (filled circles indicate metal-binding residues; open squares indicate substrate-specifying residues; crosses indicate transition state–stabilizing residues). mimi—A. polyphaga mimivirus; other abbreviations used for organisms are identical to those listed in Fig. 2

We then attempted to identify kinase-cyclase domain fusions in the NRDB. The search identified 193 proteins from 60 different species that contained a kinase and a cyclase domain fused together. As expected, the majority of them (n = 153) were receptor GCs, containing extracellular, protein kinase-like, and GC domains. Importantly, of the 193 proteins, 20 were of prokaryotic origin including the 3 mimiviral genes. Motif analysis of the kinase domains of these proteins (Table 3) indicated the presence of the critical residues required for the kinase reaction in 21 kinase domains in 18 different proteins, with the majority found in prokaryotes and 1 in Trypanosoma (gi 71424068). The associated cyclase domains also varied in type and were classified as AC, Ambi, C1, C2, or Cyc-like (see Materials and Methods). None of the 23 prokaryotic proteins had a putative active GC domain. Those containing an AC domain were largely prokaryotic, with 1 of them having an active kinase domain (Myxococcus xanthus [gi 108762189]). Therefore, in general, pseudokinase domain–containing proteins are usually associated with active GC domains, whereas most of the active kinase domain–containing proteins possess active adenylyl cyclase or unusual nucleotide cyclase domains.
Table 3

Presence of kinase subdomain motifs and classification of the associated cyclase domainᵃ

| Cyclase domain type | VAIK + HRD + DFG | VAIK + HRD | VAIK + DFG | HRD + DFG | VAIK | HRD | DFG | Kin-like |
|---|---|---|---|---|---|---|---|---|
| GC | 0 | 0 | 27 | 0 | 47 | 0 | 29 | 46 |
| AC | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Ambi | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C1 | 7 | 0 | 3 | 0 | 4 | 0 | 4 | 10 |
| C2 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 2 |
| Cyc-like | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

ᵃThe subdomains found in the kinase domain associated with a cyclase domain (as defined in Materials and Methods) were identified, and the number of proteins containing various combinations is indicated
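A cross-tabulation of the kind shown in Table 3 can be sketched as below. The sketch assumes that each protein has already been assigned a cyclase domain type and a set of intact kinase motifs, and it treats "Kin-like" as meaning that none of the three motifs is intact, which is our reading of the column heading rather than a definition stated here. The records and function names are hypothetical.

```python
from collections import Counter

MOTIFS = ("VAIK", "HRD", "DFG")


def combo_label(present):
    """Column label for a set of intact motifs; 'Kin-like' if none is intact (assumed)."""
    found = [m for m in MOTIFS if m in present]
    return " + ".join(found) if found else "Kin-like"


def tabulate(records):
    """Count proteins per (cyclase domain type, motif combination)."""
    table = Counter()
    for cyclase_type, present in records:
        table[(cyclase_type, combo_label(present))] += 1
    return table


# Hypothetical records: (cyclase domain type, intact kinase motifs).
records = [
    ("GC", {"VAIK", "DFG"}),
    ("GC", {"VAIK"}),
    ("GC", set()),
    ("GC", {"DFG", "VAIK"}),
    ("AC", {"VAIK", "HRD", "DFG"}),
]
for key, count in sorted(tabulate(records).items()):
    print(key, count)
```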

Discussion

This is the first report of a systematic search for GC-like genes in the NRDB that considers not only overall sequence similarity in the cyclase domain but also the conserved critical residues required for cyclase function. We also paid close attention to the critical motifs in KHDs, which are among the domains most commonly associated with GCs. Our searches indicate a skewed distribution of GCs in the tree of life. Most of the putative GCs were from higher eukaryotes, with only a few found in lower organisms. Earlier reports on the phylogenetic analysis of GCs considered only overall sequence similarity and did not specify a functional classification based on catalytic activity (Fitzpatrick et al. 2006; Yamagami and Suzuki 2005). In agreement with an earlier report (Fitzpatrick et al. 2006), we found multiple lineage-specific GC expansions. However, we restricted our search to proteins in which all motifs required for cyclase activity are present in a single polypeptide, and this has resulted in our focusing almost exclusively on homodimeric receptor GCs.

We were able to find a few cyclases containing all of the residues required for GC activity (along with overall sequence similarity) in prokaryotes. The domain composition of these proteins indicated that they resembled bacterial ACs. Phylogenetic analysis suggested their evolutionary distance from the remaining GCs, but, interestingly, they formed a separate cluster from the ACs as well. This suggests that they could be intermediates in the evolution of GCs found in higher forms of life. It is clear that cGMP has not been extensively used as a second messenger in prokaryotes. In contrast, cyclic di-GMP has been extensively used in prokaryotes but not in eukaryotes (Romling and Amikam 2006). Perhaps the evolution of the effector proteins of cGMP and cyclic di-GMP has played a role in determining which second messenger was used across different phyla. Although there have been reports of cGMP (Szmidt-Jaworska et al. 2007) and proteins with GC activity (Ludidi and Gehring 2003) in plants, we were unable to detect GC-like genes in any plant using the stringent criteria employed in our analysis. The role of cGMP signaling in the presence of light has been well documented in plants; thus, it is likely that there has been extensive divergence of GC-like genes in plants, which prevented us from identifying them using the available sequences of other eukaryotic proteins.

Both cAMP and cGMP have been identified in cyanobacteria; however, cyclase enzymes characterized in detail are mostly ACs (Kanacher et al. 2002; Kasahara et al. 2001; Katayama and Ohmori 1997; Ochoa De Alda et al. 2000). Only one GC (Ochoa De Alda et al. 2000) and a single cGMP-specific phosphodiesterase (Cadoret et al. 2005) have been identified using genetic screens. Our results indicate the presence of a few active GCs in cyanobacteria. Cyclic nucleotide-mediated signal-transduction pathways have been implicated in various physiologic and developmental processes in these organisms, specifically in the UV-B–induced stress response (Cadoret et al. 2005), motility (Bhaya et al. 2006), and heterocyst development (Imashimizu et al. 2005). The domain composition of the GCs found in cyanobacteria also makes them interesting proteins, and detailed analysis of the signal-transduction pathways involving these proteins will certainly provide more insight into their roles in these organisms. It is, however, conceivable that GCs exist in cyanobacteria that are not similar to class III nucleotide cyclases and are likely to be identified only by genetic analysis in the first instance.

The study performed here is the first to suggest the presence of a GC in Cryptosporidium parvum. C. parvum is an apicomplexan protozoan parasite that is a major cause of diarrhea and gastroenteritis in humans and other animals. It invades and resides in epithelial cells, mostly in the small intestine. Although many immunological factors have been shown to play roles in the development of diarrhea after infection (Laurent et al. 1999; McDonald 2000), the presence of a GC in the parasite and perhaps subsequent production and secretion of cGMP in the host cell could also aid in causing diarrhea. Indeed, increased cGMP in intestinal cells, after activation of an intestinal receptor GC (GC-C) by toxins produced by enterotoxigenic E. coli, is the cause of “travellers’ diarrhea” (Hughes et al. 1978). Hence, biochemical and functional characterization of the Cryptosporidium cyclase and the role of cGMP in the pathogenesis of Cryptosporidium will be of immense importance.

Most of the GCs found in the study were typically of either the receptor or soluble GC topology, indicating a common theme of regulation of GC activity by ligands. A few proteins are found with a KHD and cyclase domain fusion without any extracellular ligand-binding domain. The first such protein was reported from the rat, and the protein is expressed in lung, kidney, and skeletal muscle (Kojima et al. 1995). However, no function has been assigned to this GC. Given the data on the role of KHDs in receptor GCs, it could be suggested that these cyclases are also regulated by ATP and hence can be sensors of intracellular ATP. Interestingly, a strong conservation in the length of the linker region between the KHD and the cyclase domain suggests a conserved means of regulation by the linker of either the activity of the KHD or the cyclase domain.

A variety of domains was found fused to the cyclase domain in lower organisms. GCs with two cyclase domains in a single polypeptide chain, a feature of mammalian ACs, show GC activity (Linder et al. 1999, 2000); however, the physiologic functions of these proteins are largely unknown. The presence of GCs in bacteria with domains usually found in ACs (e.g., CHASE_2, HAMP, Cache_1, HisKA, etc.) indicates that these GCs might have evolved from ACs.

Our phylogenetic and pairwise distance correlation analyses of the KHDs and their associated cyclase domains suggest coevolution of the two domains in GCs. Although most of the KHD-containing GCs were found to contain the ATP-binding VAIK motif, and hence may be regulated by ATP, phylogeny indicates that the KHDs present in various kinds of GCs (e.g., GC-A, GC-B, etc.) are distinct from each other. Further analysis of the critical motifs for kinase function showed variation in the conservation of motifs other than the VAIK motif. It can therefore be assumed that the KHDs in different GCs may regulate the activity of the GC domain in specific ways, as has been corroborated by experimental evidence wherein exchanging the KHDs of GC-A and GC-B allowed proper functioning of the receptors, whereas exchanging those of GC-A and GC-C did not (Koller et al. 1992). This is probably because of the similarity between the cyclase domains of GC-As and GC-Bs, which, unlike those of other groups of receptor GCs, formed a single clade in the phylogenetic tree and therefore could be regulated by a similar KHD.

For the first time, we report the presence of cyclase-like domains in the mimivirus. The mimivirus was discovered in 1992 within the amoeba Acanthamoeba polyphaga, after which it is named (Raoult et al. 2007). Mimivirus can infect mammals and cause pneumonia (La Scola et al. 2005), and therefore there has been much debate on whether this virus has acquired genes through horizontal gene transfer from its mammalian host (Filee et al. 2007; Monier et al. 2007; Moreira and Brochier-Armanet 2008; Suhre 2005; Suzan-Monti et al. 2006) or in fact represents a type of DNA virus that emerged before cellular organisms (Moreira and Brochier-Armanet 2008; Suhre 2005). Gene duplications occur in the mimivirus genome at a frequency comparable with that commonly observed in bacteria, archaea, and eukaryotes (approximately 30%), suggesting early evolutionary origins of the mimivirus (Suhre 2005). If this is indeed true, it indicates the early origin of proteins that could generate cyclic nucleotides, and their fusion with kinases, as we see in this study. It is a rare occurrence that kinase and guanylyl cyclase activities both function in a single protein. There has been only a single report of the KHD of a receptor GC showing catalytic activity (Aparicio and Applebury 1996), and it is difficult to comment at this time whether the activity detected was because of an associated kinase. The transition from a functional to a nonfunctional domain mirrors what has recently been reported for GAF domains. These domains have been found to regulate the activity of a variety of enzymes (e.g., ACs and cyclic nucleotide phosphodiesterases) and transcription factors (e.g., FhlA) (Martinez et al. 2002). However, recent evidence suggests that the GAF domain itself can perform catalysis (Lin et al. 2007). Therefore, it seems that nature can use the same protein architecture for either an enzymatic or a regulatory function.

In summary, we have identified GCs in prokaryotic genomes with a variety of domain fusions. Coevolution of the KHD along with the cyclase domain indicates the intricate relation between these domains in conveying external signals into changes in the physiology of the cells, resulting in conservation of the linker region between the two domains. The allosteric regulation of mammalian ACs (ATP-using enzymes) by GTP-binding proteins (G proteins) has been well characterized both functionally and structurally. The intriguing finding in our analysis described here is the abundant presence in diverse organisms of ATP-binding modules (KHDs) that allosterically regulate GTP-using enzymes. Compared with the scenario seen with G proteins and ACs, KHDs and GC domains are found in a single polypeptide chain, which perhaps prevents a generalized interaction between a single KHD regulatory domain and many GCs. This in turn has allowed these proteins to develop exquisite mechanisms of regulation, thereby effectively controlling cGMP levels in the cell.

Acknowledgments

Financial support from the Department of Biotechnology, Government of India, is acknowledged. K. H. B. is a recipient of a research fellowship from the Council of Scientific and Industrial Research, Government of India. Useful discussions with Ritwick Sawarkar are gratefully acknowledged.

Supplementary material

Supplementary material 1 (DOC 294 kb)

Copyright information

© Springer Science+Business Media, LLC 2009