Abstract
A statistical analysis of protein conformations in terms of the distances between residues, represented by their Cα atoms, is presented. We consider four factors that contribute to the determination of the distance $d_{i,i+k}$ between a given pair of $i$th and $(i+k)$th residues in the native conformation of a globular protein: (1) the distance $k$ along the chain, (2) the size of the protein, (3) the conformational states of the $i$th to $(i+k)$th residues, and (4) the amino acid types of the $i$th and $(i+k)$th residues. To account for the dependence on the distance $k$ along the chain, the statistics are taken for three ranges, viz., short, medium, and long ranges ($k \le 8$, $9 \le k \le 20$, and $k \ge 21$, respectively). In the statistics of short-range distances, a mean distance $D_k$ and its standard deviation $S_k$ are calculated for each value of $k$, with and without taking into account the conformational states of all residues from $i$ to $i+k$ (factors 1 and 3). In an Appendix, the relations for converting distances between residues into other conformational parameters are discussed. In the statistics of long-range distances, a reduced distance $d^*_{ij}$ (the actual distance divided by the radius of gyration) is used to scale the data so that they become independent of protein size, and then a mean reduced distance $D_l(a_\mu, a_\nu)$ and its standard deviation $\sigma_l(a_\mu, a_\nu)$ are calculated for each amino acid pair $(a_\mu, a_\nu)$ (factors 2 and 4). The effect of the neighboring residues along the chain on the value of $d^*_{ij}$ is explored by a linear regression analysis between the actual reduced distance $d^*_{ij}$ and the mean of $D_l$ over all possible pairs of residues drawn from the two segments, the $(i-2)$th to $(i+2)$th and the $(j-2)$th to $(j+2)$th residues. The effect is assessed in terms of the slope (tangent) $A_l(a_\mu, a_\nu)$ of the calculated regression line for each amino acid pair $(a_\mu, a_\nu)$. In the statistics of medium-range distances, only factors 1 and 4 are considered, to simplify the analysis.
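A minimal Python sketch of the short- and long-range statistics described above may help make the definitions concrete. It assumes each protein is supplied as a one-letter sequence plus a list of Cα coordinates in Å, computes the radius of gyration from Cα atoms only, uses population standard deviations, and pools $(a_\mu, a_\nu)$ with $(a_\nu, a_\mu)$; none of these choices is stated in the abstract, the function names are hypothetical, and the conformational-state breakdown of $D_k$ and $S_k$ (factor 3) is omitted.

```python
import math
import statistics
from collections import defaultdict

# Sketch only: proteins are given as (sequence, coords) with one-letter
# residue codes and C-alpha coordinates (x, y, z) in angstroms (assumed input).


def chain_range(k):
    """Classify a chain separation k using the abstract's boundaries."""
    if k < 1:
        raise ValueError("k must be a positive chain separation")
    if k <= 8:
        return "short"
    if k <= 20:
        return "medium"
    return "long"


def radius_of_gyration(coords):
    """RMS distance of the atoms from their centroid (C-alpha only is an assumption)."""
    n = len(coords)
    centroid = [sum(c[axis] for c in coords) / n for axis in range(3)]
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)


def short_range_stats(proteins, k_max=8):
    """{k: (D_k, S_k)} pooled over all proteins, ignoring conformational states.

    Using the population standard deviation is an assumption.
    """
    by_k = defaultdict(list)
    for _seq, coords in proteins:
        for k in range(1, k_max + 1):
            for i in range(len(coords) - k):
                by_k[k].append(math.dist(coords[i], coords[i + k]))
    return {k: (statistics.fmean(d), statistics.pstdev(d)) for k, d in by_k.items()}


def long_range_pair_stats(proteins, k_min=21):
    """{(a_mu, a_nu): (D_l, sigma_l)} from reduced distances d*_ij = d_ij / Rg.

    Pairs are stored unordered, so (A, C) and (C, A) are pooled (assumption).
    """
    by_pair = defaultdict(list)
    for seq, coords in proteins:
        rg = radius_of_gyration(coords)
        for i in range(len(coords)):
            for j in range(i + k_min, len(coords)):  # long range: j - i >= 21
                pair = tuple(sorted((seq[i], seq[j])))
                by_pair[pair].append(math.dist(coords[i], coords[j]) / rg)
    return {p: (statistics.fmean(d), statistics.pstdev(d)) for p, d in by_pair.items()}
```

For real structures, `short_range_stats(proteins)[1]` should give a $D_1$ close to the ~3.8 Å Cα–Cα separation of trans peptide units, which is a quick sanity check on the input coordinates.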
The scaled distance $d^{\dagger}_{i,i+k} = (d_{i,i+k} - D_k)/S_k$ is used to eliminate the dependence on $k$, the distance along the chain. The properties $D_m(a_\mu, a_\nu)$, $\sigma_m(a_\mu, a_\nu)$, and $A_m(a_\mu, a_\nu)$, corresponding to $D_l(a_\mu, a_\nu)$, $\sigma_l(a_\mu, a_\nu)$, and $A_l(a_\mu, a_\nu)$, are also calculated for each amino acid pair $(a_\mu, a_\nu)$. The results are interpreted as follows. Smaller values of $D_l(a_\mu, a_\nu)$ and $D_m(a_\mu, a_\nu)$ indicate a preference of the pair $(a_\mu, a_\nu)$ for contact (e.g., pairs between hydrophobic amino acids, and pairs of Cys with aromatic amino acids), and larger values indicate a preference for a distant mutual location (e.g., pairs between strongly hydrophilic amino acids). Smaller values of $\sigma_l(a_\mu, a_\nu)$ and $\sigma_m(a_\mu, a_\nu)$ indicate a strong preference for either contact or noncontact (e.g., pairs between hydrophobic amino acids, and pairs between strongly hydrophobic and hydrophilic amino acids, respectively), and larger values indicate an ambivalent/neutral preference between contact and noncontact (e.g., pairs containing Ser or Thr). Smaller values of $A_l(a_\mu, a_\nu)$ and $A_m(a_\mu, a_\nu)$ indicate that the distance of an $(a_\mu, a_\nu)$ pair is determined independently of the amino acid character of the neighboring residues along the chain (e.g., some pairs of Cys or Met with other amino acids), and larger values indicate that this character contributes strongly to the determination of the distance (e.g., pairs containing Ser or Thr, and pairs between amino acids with small side chains). The difference between the statistics for the long- and medium-range distances is also discussed: the former reflect the difference between the hydrophobic and hydrophilic character of the residues, whereas the latter cannot easily be interpreted in terms of hydrophobicity and hydrophilicity alone. The data analyzed here are used in a subsequent paper in the optimization of an objective function to compute protein conformation.
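The scaled distance $d^{\dagger}_{i,i+k}$ and the regression slopes $A_l$ and $A_m$ can be sketched in the same hedged way. This assumes $D_k$ and $S_k$ are available for the medium-range separations $9 \le k \le 20$, that `pair_mean` is a hypothetical table of $D_l$ (or $D_m$) values keyed by unordered one-letter pairs and covering every pair that occurs, that window positions falling off the chain ends are simply skipped, and that the observed distance ($d^*_{ij}$ for long range, $d^{\dagger}_{i,i+k}$ for medium range) is regressed on the window mean by ordinary least squares. The abstract does not fix these details, and all function names are illustrative.

```python
import statistics
from collections import defaultdict


def scaled_distance(d, k, d_stats):
    """d-dagger_{i,i+k} = (d_{i,i+k} - D_k) / S_k, with d_stats = {k: (D_k, S_k)}."""
    mean_k, sd_k = d_stats[k]
    return (d - mean_k) / sd_k


def window_mean(seq, i, j, pair_mean, half=2):
    """Mean of D(a_p, a_q) over p in [i-2, i+2] and q in [j-2, j+2].

    Positions outside the chain are skipped (an assumption for chain ends).
    """
    vals = [
        pair_mean[tuple(sorted((seq[p], seq[q])))]
        for p in range(i - half, i + half + 1)
        for q in range(j - half, j + half + 1)
        if 0 <= p < len(seq) and 0 <= q < len(seq)
    ]
    return statistics.fmean(vals)


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (the 'tangent' A)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def pair_slopes(observations, seq, pair_mean):
    """{(a_mu, a_nu): A}, fitting one regression line per amino acid pair type.

    observations: (i, j, observed_distance) triples for one sequence.
    Pair types whose window means are all identical are skipped (no slope).
    """
    points = defaultdict(lambda: ([], []))
    for i, j, observed in observations:
        xs, ys = points[tuple(sorted((seq[i], seq[j])))]
        xs.append(window_mean(seq, i, j, pair_mean))
        ys.append(observed)
    return {key: slope(xs, ys) for key, (xs, ys) in points.items() if len(set(xs)) > 1}
```

Under the interpretation given above, a slope near zero would mean a pair's distance barely tracks the residues flanking it along the chain, while a large slope would mean those neighbors largely determine it.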
About this article
Cite this article
Wako, H., Scheraga, H.A. Distance-constraint approach to protein folding. I. Statistical analysis of protein conformations in terms of distances between residues. J Protein Chem 1, 5–45 (1982). https://doi.org/10.1007/BF01025549
DOI: https://doi.org/10.1007/BF01025549