Using Genome-Wide Protein Sequence Data to Predict Amino Acid Conservation

Palenchar, Peter; Mount, Mathew; Cusato, Douglas; Dougherty, Jeffery

doi:10.1007/s10930-008-9150-3

Published: 16 September 2008

The Protein Journal, Volume 27, pages 401–407 (2008)

Abstract

For most proteins, multiple sequence alignments are a viable way to identify functionally and structurally important amino acids. Most organisms, however, have a subset of proteins that are unique or found only in a few closely related organisms, and for these proteins it is not possible to produce sequence alignments that are useful for identifying such amino acids. We investigated the relationship between amino acid conservation and five factors in Escherichia coli proteins: the amino acid’s identity, its N-terminal neighbor, its C-terminal neighbor, the local hydropathy of the surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence. For four of the factors examined (all but the amino acid’s identity), there is a significant relationship with conservation for some of the standard 20 amino acids. Using the combination of all five factors, we show that a score calculated from the primary sequences of a subset of E. coli proteins has statistically significant value for predicting conserved amino acids in other E. coli proteins and in Saccharomyces cerevisiae proteins. As these five variables show significant relationships with conservation, we have termed them conservation factors.
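
To make the five factors concrete, the following is a minimal sketch of how they might be read off a primary sequence for a single residue. It is not the authors' implementation. The function name `conservation_factors`, the window size `half_window`, the use of the Kyte-Doolittle hydropathy scale, the integer charge table (Lys and Arg as +1, Asp and Glu as -1, everything else 0), and the choice to exclude the central residue from the window are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
# Using this particular scale is an assumption of the sketch.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges (assumption): His and termini are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def conservation_factors(seq, i, half_window=3):
    """Return the five factors for residue i (0-based) of seq.

    half_window is a hypothetical parameter: the number of residues
    considered on each side of position i.
    """
    lo = max(0, i - half_window)
    hi = min(len(seq), i + half_window + 1)
    # Surrounding residues only; the central residue is left out.
    neighbors = [seq[j] for j in range(lo, hi) if j != i]
    if neighbors:
        hydropathy = sum(KD[a] for a in neighbors) / len(neighbors)
    else:
        hydropathy = 0.0
    net_charge = sum(CHARGE.get(a, 0) for a in neighbors)
    return {
        "identity": seq[i],
        "n_neighbor": seq[i - 1] if i > 0 else None,
        "c_neighbor": seq[i + 1] if i + 1 < len(seq) else None,
        "local_hydropathy": hydropathy,
        "local_net_charge": net_charge,
    }


if __name__ == "__main__":
    # Factors for the Asp at position 2 of a short made-up peptide.
    print(conservation_factors("MKDLA", 2, half_window=1))
```

A score of the kind described in the abstract would then combine these per-residue values with conservation statistics gathered from a training set of proteins; how the authors weight and combine them is not stated here, so that step is left out.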

Abbreviations

S. cerevisiae:: Saccharomyces cerevisiae
E. coli:: Escherichia coli
Ala:: Alanine
Cys:: Cysteine
Asp:: Aspartate
Glu:: Glutamate
Phe:: Phenylalanine
Gly:: Glycine
His:: Histidine
Ile:: Isoleucine
Lys:: Lysine
Met:: Methionine
Asn:: Asparagine
Pro:: Proline
Gln:: Glutamine
Arg:: Arginine
Ser:: Serine
Thr:: Threonine
Val:: Valine
Trp:: Tryptophan
Tyr:: Tyrosine
COG:: Clusters of orthologous groups

Author information

Authors and Affiliations

Department of Chemistry, Rutgers University, 315 Penn St, Camden, NJ, 08102-1411, USA
Peter Palenchar, Mathew Mount, Douglas Cusato & Jeffery Dougherty

Corresponding author

Correspondence to Peter Palenchar.

Electronic supplementary material

Below is the link to the electronic supplementary material.

MOESM1 (PDF 45 kb)

About this article

Cite this article

Palenchar, P., Mount, M., Cusato, D. et al. Using Genome-Wide Protein Sequence Data to Predict Amino Acid Conservation. Protein J 27, 401–407 (2008). https://doi.org/10.1007/s10930-008-9150-3

Issue Date: September 2008