Using Genome-Wide Protein Sequence Data to Predict Amino Acid Conservation

The Protein Journal

Abstract

For most proteins, multiple sequence alignments are a viable method to identify functionally and structurally important amino acids, but for most organisms, there is a subset of proteins that are unique or found only in a few closely related organisms. For these proteins, it is not possible to produce sequence alignments that are useful in identifying functionally or structurally important amino acids. We have investigated the relationship between amino acid conservation and five factors (the amino acid’s identity, N-terminal neighbor, C-terminal neighbor, the local hydropathy of surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence) in Escherichia coli proteins. For four of the factors examined (all but the amino acid’s identity), there is a significant relationship with conservation for some of the standard 20 amino acids. Using the combination of all five factors, we show that it is possible to calculate a score, based on the primary sequences of a subset of E. coli proteins, that has statistically significant value for predicting conserved amino acids in other E. coli proteins and in Saccharomyces cerevisiae proteins. As these five variables show significant relationships with conservation, we have termed them conservation factors.
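As an illustration of the five factors named in the abstract, the following minimal Python sketch extracts them for a single position in a primary sequence. It is not the authors' implementation: the function name `conservation_factors`, the window half-width `flank`, the choice to exclude the central residue from the local averages, and the simple charge model (Lys and Arg as +1, Asp and Glu as -1, His ignored) are all assumptions made for this example. The hydropathy values are the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed charge model: only fully ionized side chains are counted.
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def conservation_factors(seq, i, flank=3):
    """Return the five factors for position i of seq (hypothetical helper).

    `flank` is an assumed window half-width; the central residue itself
    is left out of the local hydropathy and local net charge.
    """
    lo = max(0, i - flank)
    hi = min(len(seq), i + flank + 1)
    surrounding = [seq[j] for j in range(lo, hi) if j != i]

    if surrounding:
        hydropathy = sum(KYTE_DOOLITTLE[a] for a in surrounding) / len(surrounding)
    else:
        hydropathy = 0.0
    net_charge = sum(SIDE_CHAIN_CHARGE.get(a, 0) for a in surrounding)

    return {
        "identity": seq[i],
        "n_neighbor": seq[i - 1] if i > 0 else None,
        "c_neighbor": seq[i + 1] if i + 1 < len(seq) else None,
        "local_hydropathy": hydropathy,
        "local_net_charge": net_charge,
    }
```

A predictive score of the kind described would then combine these per-position values using statistics gathered from a training set of proteins; how the factors are weighted and combined is not specified in the abstract, so that step is omitted here.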

Abbreviations

S. cerevisiae: Saccharomyces cerevisiae
E. coli: Escherichia coli
Ala: Alanine
Cys: Cysteine
Asp: Aspartate
Glu: Glutamate
Phe: Phenylalanine
Gly: Glycine
His: Histidine
Ile: Isoleucine
Lys: Lysine
Met: Methionine
Asn: Asparagine
Pro: Proline
Gln: Glutamine
Arg: Arginine
Ser: Serine
Thr: Threonine
Val: Valine
Trp: Tryptophan
Tyr: Tyrosine
COG: Clusters of orthologous groups

Author information

Corresponding author

Correspondence to Peter Palenchar.

Electronic supplementary material

Below is the link to the electronic supplementary material.

MOESM1 (PDF 45 kb)

About this article

Cite this article

Palenchar, P., Mount, M., Cusato, D. et al. Using Genome-Wide Protein Sequence Data to Predict Amino Acid Conservation. Protein J 27, 401–407 (2008). https://doi.org/10.1007/s10930-008-9150-3


  • DOI: https://doi.org/10.1007/s10930-008-9150-3
