Abstract
Protein sequence space is considerably larger than structure space. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. Sequence space can be simplified by grouping the 20 amino acids into a smaller number of representative residues with similar features. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous studies have shown that protein 3D structures are composed of a limited number of building blocks, which define a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs, each five residues long, called Protein Blocks (PBs). This alphabet makes it possible to translate a 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to certain local structures (PBs), e.g., Cysteine, Histidine, Threonine and Serine for the α-helix N-cap. Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We also highlight interesting alternative associations, which underlines the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to the environment) while largely preserving the 3D fold.
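As a rough illustration of what a reduced amino acid alphabet does to a sequence, the following minimal Python sketch maps each residue to a group representative. The grouping shown is an assumption made for illustration only: it merges just the two associations named in the abstract (Ile with Val, and Trp with Phe and Tyr) and leaves every other residue on its own. It is not the reduced alphabet derived in the article, and the names EXAMPLE_GROUPS, build_reduced_alphabet and reduce_sequence are hypothetical.

```python
# Standard one-letter codes for the 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping, for illustration only: it merges just the
# associations named in the abstract (Ile/Val and Trp/Phe/Tyr).
# It is NOT the reduced alphabet reported in the article.
EXAMPLE_GROUPS = ["IV", "WFY"]


def build_reduced_alphabet(groups, alphabet=AMINO_ACIDS):
    """Return a dict mapping each residue to its group representative.

    Residues that appear in no group map to themselves. The first
    letter of each group is used as the representative.
    """
    mapping = {aa: aa for aa in alphabet}
    for group in groups:
        representative = group[0]
        for aa in group:
            mapping[aa] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet ('X' for unknown letters)."""
    return "".join(mapping.get(aa, "X") for aa in sequence.upper())


mapping = build_reduced_alphabet(EXAMPLE_GROUPS)
alphabet_size = len(set(mapping.values()))  # number of distinct letters left
reduced = reduce_sequence("MVIWFYK", mapping)
```

With this toy grouping the alphabet shrinks from 20 to 17 letters, and sequences that differ only by substitutions within a group become identical after reduction.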
Acknowledgments
This work was supported by the French Institute for Health and Medical Care (INSERM) and University Paris 7-Denis Diderot. AB is supported by a grant from the Ministère de la Recherche.
Additional information
C. Etchebest and C. Benros contributed equally to this work.
Cite this article
Etchebest, C., Benros, C., Bornot, A. et al. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur Biophys J 36, 1059–1069 (2007). https://doi.org/10.1007/s00249-007-0188-5
DOI: https://doi.org/10.1007/s00249-007-0188-5