Abstract
For most proteins, multiple sequence alignments are a viable way to identify functionally and structurally important amino acids. In most organisms, however, a subset of proteins is unique or found only in a few closely related organisms, and for these proteins sequence alignments cannot be used to identify functionally or structurally important amino acids. We investigated the relationship between amino acid conservation and five factors in Escherichia coli proteins: the amino acid’s identity, its N-terminal neighbor, its C-terminal neighbor, the local hydropathy of the surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence. For four of the factors (all but the amino acid’s identity), there is a significant relationship with conservation for some of the 20 standard amino acids. Combining all five factors, we show that a score calculated from the primary sequences of a subset of E. coli proteins has statistically significant value for predicting conserved amino acids in other E. coli proteins and in Saccharomyces cerevisiae proteins. Because these five variables show significant relationships with conservation, we have termed them conservation factors.
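As an illustration only, the five factors named in the abstract could be extracted from a primary sequence roughly as sketched below. This is not the authors' implementation. The window size, the use of the Kyte-Doolittle hydropathy scale, the simple charge assignment (+1 for Lys/Arg, -1 for Asp/Glu, 0 otherwise), and the names `Factors` and `conservation_factors` are all assumptions made for the example; the abstract does not specify how the factors are combined into a score, so no scoring step is shown.

```python
from typing import List, NamedTuple, Optional

# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges (assumption): His is treated as neutral.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


class Factors(NamedTuple):
    """Hypothetical container for the five per-residue factors."""
    identity: str
    n_neighbor: Optional[str]   # None at the N-terminus
    c_neighbor: Optional[str]   # None at the C-terminus
    local_hydropathy: float     # mean hydropathy of surrounding residues
    local_net_charge: float     # summed charge of surrounding residues


def conservation_factors(seq: str, window: int = 3) -> List[Factors]:
    """Compute the five factors for every position of a one-letter sequence.

    `window` is the number of residues considered on each side (assumed).
    The residue itself is excluded from the local averages.
    """
    seq = seq.upper()
    result = []
    for i, aa in enumerate(seq):
        # Residues within `window` positions on either side of position i.
        around = seq[max(0, i - window):i] + seq[i + 1:i + 1 + window]
        hydropathy = sum(KD[x] for x in around) / len(around) if around else 0.0
        charge = sum(CHARGE.get(x, 0.0) for x in around)
        result.append(Factors(
            identity=aa,
            n_neighbor=seq[i - 1] if i > 0 else None,
            c_neighbor=seq[i + 1] if i + 1 < len(seq) else None,
            local_hydropathy=hydropathy,
            local_net_charge=charge,
        ))
    return result


if __name__ == "__main__":
    for position, factors in enumerate(conservation_factors("MKDE", window=1), start=1):
        print(position, factors)
```

A per-residue table of this kind could then be related to conservation observed in alignments of well-represented proteins, which is the sort of relationship the abstract describes.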
Abbreviations
- S. cerevisiae: Saccharomyces cerevisiae
- E. coli: Escherichia coli
- Ala: Alanine
- Cys: Cysteine
- Asp: Aspartate
- Glu: Glutamate
- Phe: Phenylalanine
- Gly: Glycine
- His: Histidine
- Ile: Isoleucine
- Lys: Lysine
- Met: Methionine
- Asn: Asparagine
- Pro: Proline
- Gln: Glutamine
- Arg: Arginine
- Ser: Serine
- Thr: Threonine
- Val: Valine
- Trp: Tryptophan
- Tyr: Tyrosine
- COG: Clusters of orthologous groups
Cite this article
Palenchar, P., Mount, M., Cusato, D. et al. Using Genome-Wide Protein Sequence Data to Predict Amino Acid Conservation. Protein J 27, 401–407 (2008). https://doi.org/10.1007/s10930-008-9150-3
DOI: https://doi.org/10.1007/s10930-008-9150-3