Journal of Molecular Evolution, Volume 70, Issue 6, pp 545–556

Evolutionary Bases of Carbohydrate Recognition and Substrate Discrimination in the ROK Protein Family



The ROK (repressor, open reading frame, kinase) protein family (Pfam 00480) is a large collection of bacterial polypeptides that includes sugar kinases, carbohydrate-responsive transcriptional repressors, and many functionally uncharacterized gene products. ROK-family sugar kinases phosphorylate a range of structurally distinct hexoses, including the key carbon source D-glucose, various glucose epimers, and several acetylated hexosamines. The primary sequence elements responsible for carbohydrate recognition within the different functional categories of ROK polypeptides are largely unknown because this protein family has received only limited structural characterization. To identify the structural bases for substrate discrimination in individual ROK proteins, and to better understand the evolutionary processes that led to the divergent evolution of function in this family, we constructed an inclusive alignment of 227 representative ROK polypeptides. Phylogenetic analyses and ancestral sequence reconstructions of the resulting tree reveal a discrete collection of active site residues that dictate substrate specificity. The results also suggest a series of mutational events within the carbohydrate-binding sites of ROK proteins that facilitated the expansion of substrate specificity within this family. This study provides new insight into the evolutionary relationship between ROK glucokinases and non-ROK glucokinases (Pfam 02685), revealing the primary sequence elements shared by these two protein families, which diverged from an ancient common ancestor.


Keywords: ROK; Sugar kinase; Repressor; Substrate specificity; Enzyme evolution

Supplementary material

239_2010_9351_MOESM1_ESM.fsa (176 kb)
Supplementary Figure 1: FASTA formatted alignment of the complete ROK data set containing 227 members. (FSA 175 kb)
239_2010_9351_MOESM2_ESM.fsa (85 kb)
Supplementary Figure 2: FASTA formatted alignment of the complete ROK data set masked at 5%. (FSA 85.3 kb)
239_2010_9351_MOESM3_ESM.fsa (26 kb)
Supplementary Figure 3: FASTA formatted alignment of the complete, merged ROK and non-ROK data set. (FSA 26.3 kb)
239_2010_9351_MOESM4_ESM.tree (14 kb)
Supplementary Figure 4: Phylogenetic tree of the complete ROK data set. (TREE 13.8 kb)
239_2010_9351_MOESM5_ESM.tree (3 kb)
Supplementary Figure 5: Phylogenetic tree of the merged ROK and non-ROK data set. (TREE 3.40 kb)
239_2010_9351_MOESM6_ESM.fsa (333 kb)
Supplementary Figure 6: FASTA formatted ancestral node sequences resulting from phylogenetic analysis of the complete ROK data set. (FSA 333 kb)
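
The FASTA-formatted alignments listed above can be read with a few lines of code. The following is a minimal sketch rather than the authors' procedure: the page does not define what "masked at 5%" means, so the example assumes it refers to dropping alignment columns in which fewer than a given fraction of sequences carry a residue. The function names `read_fasta` and `mask_columns`, the `min_occupancy` parameter, and the toy sequences are all hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def mask_columns(records, min_occupancy=0.05, gap_chars="-."):
    """Drop columns whose fraction of non-gap residues is below min_occupancy.

    ASSUMPTION: this occupancy-based definition of masking is illustrative;
    the source does not state how its 5% mask was computed.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned to a common length")
    n = len(records)
    keep = [
        col
        for col in range(length)
        if sum(seq[col] not in gap_chars for _, seq in records) / n >= min_occupancy
    ]
    return [(name, "".join(seq[c] for c in keep)) for name, seq in records]


# Toy alignment (hypothetical, not taken from the supplementary files).
toy = StringIO(">a\nMK-L\n>b\nMR-L\n>c\nM--L\n")
aligned = read_fasta(toy)
masked = mask_columns(aligned, min_occupancy=0.5)
```

To apply the same sketch to a downloaded file, the `StringIO` object would be replaced by an open file handle, for example `open("239_2010_9351_MOESM1_ESM.fsa")`, and the threshold set to whatever definition of masking is actually intended.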



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Chemistry and Biochemistry, Florida State University, Tallahassee, USA
  2. Department of Biology, Valdosta State University, Valdosta, USA
