Abstract
In a previous paper we obtained ten (orthogonal) factors whose linear combinations can express the properties of the 20 naturally occurring amino acids. In this paper we assume that the properties (linear combinations of these ten factors) most important in determining the three-dimensional structure of a protein are conserved properties, i.e., those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is the linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (and hence little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position (locus) in the amino acid sequences of a specific family of homologous proteins (the cytochrome c family or the globin family) is the linear combination of the ten factors that is common to the set of amino acids found at that locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family: bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property defined for each locus of a protein family (definition 2) corresponds uniquely to the three-dimensional structure, whereas the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
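Definition 2 can be made concrete with a small sketch. Here we assume (our assumption, not necessarily the authors' exact criterion) that the linear combination "common to" the residues at a locus is the unit weight vector w that minimizes the variance of w·f over the aligned residues, i.e. the eigenvector of the within-locus covariance matrix with the smallest eigenvalue. The function names and the toy three-component factor vectors are hypothetical; real use would supply the ten factor values of each aligned residue.

```python
import math
from typing import List, Sequence

Vector = List[float]


def covariance(vectors: Sequence[Sequence[float]]) -> List[Vector]:
    """Population covariance matrix of the factor vectors observed at one locus."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    return [
        [sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in vectors) / n for b in range(d)]
        for a in range(d)
    ]


def conserved_direction(vectors: Sequence[Sequence[float]], iters: int = 2000) -> Vector:
    """Unit weight vector w along which w.f varies least across the locus.

    Assumption: "conserved" is taken to mean the minimum-variance direction.
    It is found by power iteration on M = s*I - C; because s = trace(C) + 1
    exceeds every eigenvalue of the positive semidefinite matrix C, the
    dominant eigenvector of M is the smallest-eigenvalue eigenvector of C.
    """
    c = covariance(vectors)
    d = len(c)
    s = sum(c[k][k] for k in range(d)) + 1.0
    w = [1.0 / math.sqrt(d)] * d  # arbitrary nonzero starting vector
    for _ in range(iters):
        nxt = [s * w[a] - sum(c[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))  # never zero: M is positive definite
        w = [x / norm for x in nxt]
    return w


def property_values(w: Sequence[float], vectors: Sequence[Sequence[float]]) -> Vector:
    """Value of the property w.f for each residue at the locus."""
    return [sum(wk * fk for wk, fk in zip(w, f)) for f in vectors]
```

For a toy locus such as `[[1.0, 0.2, 0.5], [-0.8, 0.9, 0.5], [0.3, -1.1, 0.5], [-0.5, 0.0, 0.5]]`, where only the third component is shared by every residue, `conserved_direction` returns approximately (0, 0, 1). Note that with fewer aligned sequences than factors the covariance matrix is rank-deficient and the minimum-variance direction is not unique, which is one way to see why many homologous sequences are needed per locus.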
The amino acid sequences of numerous proteins are searched for segments that are similar, in terms of the conserved properties (definition 2), to segments of the same length from one of the homologous families (cytochrome c or globin) for whose loci the conserved properties were defined. Many similar segments are found, and the number of similarities decreases with increasing segment length. However, the segments must be rather long (≥15 residues) before the comparisons become meaningful. As an example, one sufficiently long segment (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase, which is not a member of either family) is found to be similar in its conserved properties to a particular segment of a member of the family of human hemoglobin α chains, and the two segments have similar structures. Since conserved properties are expected to be structure determinants, this means that the conserved properties can be used to predict an initial structure, for subsequent energy minimization, of a protein whose conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.
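The segment search can be sketched as a sliding-window comparison of a query sequence against a family's per-locus profile. In this hypothetical sketch each family locus carries a conserved-property weight vector, the mean of that property over the aligned family members, and its standard deviation; a query window of a given length is scored by the mean squared z-score of its residues' property values against consecutive loci, so lower scores mean greater similarity. The scoring rule, the `Locus` and `search` names, and the toy two-component data are assumptions for illustration, not the authors' published measure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Locus:
    """Hypothetical per-locus profile entry of a protein family."""
    weights: Sequence[float]  # conserved-property weights over the factors
    mean: float               # family mean of weights . f at this locus
    sd: float                 # family spread of weights . f (assumed > 0)


def window_score(profile: Sequence[Locus], fam_start: int,
                 query: Sequence[Sequence[float]], q_start: int, length: int) -> float:
    """Mean squared z-score of query residues against consecutive family loci."""
    total = 0.0
    for k in range(length):
        locus = profile[fam_start + k]
        value = sum(w * f for w, f in zip(locus.weights, query[q_start + k]))
        total += ((value - locus.mean) / locus.sd) ** 2
    return total / length


def search(profile: Sequence[Locus], query: Sequence[Sequence[float]],
           length: int = 15) -> List[Tuple[float, int, int]]:
    """Score every pair of equal-length windows, most similar first.

    Returns (score, query_start, family_start) tuples. The default window of
    15 residues echoes the observation that shorter segments are not meaningful.
    """
    hits = []
    for q in range(len(query) - length + 1):
        for p in range(len(profile) - length + 1):
            hits.append((window_score(profile, p, query, q, length), q, p))
    hits.sort()
    return hits
```

Dividing by each locus's spread is a design choice: loci where the family is tightly conserved dominate the score, while loci that tolerate wide variation contribute little.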
References
Baba, M. L., Darga, L. L., Goodman, M., and Czelusniak, J. (1981). J. Mol. Evol. 17, 197–213.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, Jr., E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978). In Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed.), National Biomedical Research Foundation, Washington, D.C., Vol. 5, Suppl. 3, pp. 345–352.
Dickerson, R. E. (1980). In The Evolution of Protein Structure and Function (Sigman, D. S., and Brazier, M. A. B., eds.), Academic Press, New York, pp. 173–202.
Eisenberg, D., Weiss, R. M., and Terwilliger, T. C. (1982). Nature 299, 371–374.
Epstein, C. J. (1967). Nature 215, 355–359.
French, S., and Robson, B. (1983). J. Mol. Evol. 19, 171–175.
Gay, D. M. (1983). ACM Trans. Math. Software 9, 503–524.
Gō, M., and Miyazawa, S. (1980). Int. J. Peptide Protein Res. 15, 211–224.
Goodman, M. (1981). Prog. Biophys. Mol. Biol. 38, 105–164.
Grantham, R. (1974). Science 185, 862–864.
IMSL (1982). IMSL Library Reference Manual, 9th ed., IMSL, Houston.
Kabsch, W., and Sander, C. (1983). Biopolymers 22, 2577–2637.
Kidera, A., Konishi, Y., Oka, M., Ooi, T., and Scheraga, H. A. (1985). J. Protein Chem. 4, 23–55.
Lawn, R. M., Adelman, J., Dull, T. J., Gross, M., Goeddel, D., and Ullrich, A. (1981). Science 212, 1159–1162.
Lesk, A. M., and Chothia, C. (1980). J. Mol. Biol. 136, 225–270.
Morrison, D. F. (1976). Multivariate Statistical Methods, McGraw-Hill, New York.
Némethy, G., Pottle, M. S., and Scheraga, H. A. (1983). J. Phys. Chem. 87, 1883–1887.
Ohno, S., and Taniguchi, T. (1981). Proc. Natl. Acad. Sci. USA 78, 5305–5309.
Orcutt, B. C., and Dayhoff, M. O. (1982). Protein Sequence Database, National Biomedical Research Foundation, Washington, D.C.
Perutz, M. F., Kendrew, J. C., and Watson, H. C. (1965). J. Mol. Biol. 13, 669–678.
Pestka, S. (1983). Arch. Biochem. Biophys. 221, 1–37.
Ptitsyn, O. B. (1974). J. Mol. Biol. 88, 287–300.
Richardson, J. S. (1981). Adv. Protein Chem. 34, 167–339.
Rose, G. D., and Roy, S. (1980). Proc. Natl. Acad. Sci. USA 77, 4643–4647.
Schulz, G. E., and Schirmer, R. H. (1979). Principles of Protein Structure, Springer, New York.
Shrake, A., and Rupley, J. A. (1973). J. Mol. Biol. 79, 351–371.
Sippl, M. J. (1982). J. Mol. Biol. 156, 359–388.
Thompson, E. O. P. (1980). In The Evolution of Protein Structure and Function (Sigman, D. S., and Brazier, M. A. B., eds.), Academic Press, New York, pp. 267–298.
Vogel, H., and Zuckerkandl, E. (1972). In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Le Cam, L. M., Neyman, J., and Scott, E., eds.), University of California Press, Los Angeles, pp. 155–176.
Zuckerkandl, E., and Pauling, L. (1965). In Evolving Genes and Proteins (Bryson, V., and Vogel, H. J., eds.), Academic Press, New York, pp. 97–166.
Cite this article
Kidera, A., Konishi, Y., Ooi, T. et al. Relation between sequence similarity and structural similarity in proteins. Role of important properties of amino acids. J Protein Chem 4, 265–297 (1985). https://doi.org/10.1007/BF01025494