A reduced amino acid alphabet for understanding and designing protein adaptation to mutation

  • Original Paper
  • European Biophysics Journal

Abstract

The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. Grouping the 20 amino acids into a smaller number of representative residues with similar features simplifies the sequence world; this clustering defines a reduced amino acid alphabet (reduced AAA). Numerous studies have shown that protein 3D structures are composed of a limited number of building blocks, which define a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs, each five residues long, called Protein Blocks (PBs). This alphabet makes it possible to translate a 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs), e.g. Cysteine, Histidine, Threonine and Serine for the α-helix Ncap. Some similar associations are found in other reduced AAAs, e.g. Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We also highlight interesting alternative associations, which underline the dependence on the type of information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to the environment) while largely preserving the 3D fold.
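
As a minimal sketch of what a reduced amino acid alphabet does in practice, the Python snippet below rewrites a sequence by replacing each residue with the representative of its group. Only the Ile/Val and Trp/Phe/Tyr associations come from the abstract; the other four groups, the choice of representative letters, and the names `EXAMPLE_GROUPS`, `build_lookup` and `reduce_sequence` are assumptions made for illustration. They are not the reduced sets derived in the paper.

```python
# Hypothetical grouping for illustration only; NOT the paper's reduced alphabet.
# Keys are representative letters, values are the residues merged into them.
EXAMPLE_GROUPS = {
    "I": "IV",    # Ile with Val (association named in the abstract)
    "F": "FWY",   # aromatic Trp with Phe and Tyr (named in the abstract)
    "L": "LMAC",  # assumed group
    "K": "KRH",   # assumed group
    "D": "DENQ",  # assumed group
    "S": "STGP",  # assumed group
}


def build_lookup(groups):
    """Invert {representative: members} into {residue: representative}."""
    lookup = {}
    for representative, members in groups.items():
        for residue in members:
            if residue in lookup:
                raise ValueError(f"residue {residue!r} is in more than one group")
            lookup[residue] = representative
    return lookup


def reduce_sequence(sequence, lookup, unknown="X"):
    """Rewrite a one-letter-code sequence in the reduced alphabet.

    Residues absent from the lookup (e.g. ambiguity codes) become `unknown`.
    """
    return "".join(lookup.get(residue, unknown) for residue in sequence.upper())


LOOKUP = build_lookup(EXAMPLE_GROUPS)

if __name__ == "__main__":
    print(reduce_sequence("MKVLWAY", LOOKUP))
```

With this toy grouping, the 20-letter sequence space collapses to 6 letters. A structure-dependent scheme such as the one described above would presumably need one such lookup per Protein Block rather than a single global table.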

Acknowledgments

This work was supported by the French Institute for Health and Medical Care (INSERM) and University Paris 7-Denis Diderot. AB is supported by a grant from the Ministère de la Recherche.

Author information

Corresponding author

Correspondence to A. G. de Brevern.

Additional information

C. Etchebest and C. Benros contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(DOC 398 kb)

About this article

Cite this article

Etchebest, C., Benros, C., Bornot, A. et al. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur Biophys J 36, 1059–1069 (2007). https://doi.org/10.1007/s00249-007-0188-5

  • DOI: https://doi.org/10.1007/s00249-007-0188-5
