A reduced amino acid alphabet for understanding and designing protein adaptation to mutation

Etchebest, C.; Benros, C.; Bornot, A.; Camproux, A.-C.; de Brevern, A. G.

doi:10.1007/s00249-007-0188-5

Original Paper
Published: 13 June 2007

Volume 36, pages 1059–1069 (2007)

European Biophysics Journal

C. Etchebest¹, C. Benros¹, A. Bornot¹, A.-C. Camproux¹ & A. G. de Brevern¹


Abstract

The protein sequence world is considerably larger than the structure world. Consequently, many unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. The sequence world can be simplified by grouping the 20 amino acids into a smaller number of representative residues with similar features. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous studies have shown that protein 3D structures are composed of a limited number of building blocks, which define a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs (each five residues long) called Protein Blocks (PBs). This alphabet makes it possible to translate a 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs), e.g., Cysteine, Histidine, Threonine and Serine for the α-helix Ncap. Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We also identified interesting alternative associations, which highlights the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to environment) while largely preserving the 3D fold.
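To make the idea of a reduced amino acid alphabet concrete, the following minimal Python sketch rewrites a sequence over a smaller set of representative letters. Only the Ile/Val and Trp/Phe/Tyr groupings come from the abstract; every other group, the choice of representative letters, and the names `REDUCED_GROUPS`, `build_lookup` and `reduce_sequence` are illustrative assumptions, not the reduced sets derived in the article.

```python
# Hypothetical reduced alphabet, keyed by an arbitrary representative letter.
# Only the I/V and W/F/Y groups are named in the abstract; all other groups
# are illustrative assumptions and not the sets reported in the paper.
REDUCED_GROUPS = {
    "I": "IV",   # Ile with Val (from the abstract)
    "W": "WFY",  # aromatic Trp with Phe and Tyr (from the abstract)
    "L": "LM",   # assumption
    "A": "A",    # assumption
    "G": "G",    # assumption
    "P": "P",    # assumption
    "C": "C",    # assumption
    "S": "ST",   # assumption
    "N": "NQ",   # assumption
    "D": "DE",   # assumption
    "K": "KR",   # assumption
    "H": "H",    # assumption
}


def build_lookup(groups):
    """Invert {representative: members} into {residue: representative}."""
    lookup = {}
    for representative, members in groups.items():
        for residue in members:
            if residue in lookup:
                raise ValueError(f"residue {residue!r} assigned to two groups")
            lookup[residue] = representative
    return lookup


def reduce_sequence(sequence, lookup, unknown="X"):
    """Rewrite a one-letter amino acid sequence in the reduced alphabet.

    Residues absent from the lookup (e.g. ambiguity codes) become `unknown`.
    """
    return "".join(lookup.get(residue, unknown) for residue in sequence.upper())


LOOKUP = build_lookup(REDUCED_GROUPS)

if __name__ == "__main__":
    print(reduce_sequence("MVFYKE", LOOKUP))
```

In this toy grouping the 20 amino acids collapse to 12 letters. A structure-dependent variant, in the spirit of the abstract, would keep one such lookup per Protein Block rather than a single global one; that extension is likewise an assumption about how the idea could be coded.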

Acknowledgments

This work was supported by the French Institute for Health and Medical Care (INSERM) and University Paris 7-Denis Diderot. AB is supported by a grant from the Ministère de la Recherche.

Author information

Authors and Affiliations

Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Denis DIDEROT, Paris 7, case 7113, 2, place Jussieu, 75251, Paris, France
C. Etchebest, C. Benros, A. Bornot, A.-C. Camproux & A. G. de Brevern

Corresponding author

Correspondence to A. G. de Brevern.

Additional information

C. Etchebest and C. Benros contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(DOC 398 kb)

About this article

Cite this article

Etchebest, C., Benros, C., Bornot, A. et al. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur Biophys J 36, 1059–1069 (2007). https://doi.org/10.1007/s00249-007-0188-5

Received: 13 February 2007
Revised: 05 May 2007
Accepted: 07 May 2007
Published: 13 June 2007
Issue Date: November 2007
DOI: https://doi.org/10.1007/s00249-007-0188-5
