The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences

Journal of Molecular Evolution

Summary

We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (r_o) of Mood (1940, Ann. Math. Stat. 11, 367–392). The probability density of r_o for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation, regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation, as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong α-helix propensity show a strong tendency to cluster, whereas those with β-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids, and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic “patterns” that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
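
The binary encoding and run test described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give the exact form of Mood's r_o, so the sketch assumes the standard large-sample standardization of the total number of runs in a two-symbol sequence of fixed composition. The function names, the residue set "AVLIMFW", and the sign convention (negative values mean fewer runs than expected, i.e. clustering) are all assumptions made for illustration.

```python
import math

def binary_encode(sequence, residues_of_interest):
    """Assign 1 to residues in the chosen set and 0 to all others."""
    chosen = set(residues_of_interest)
    return [1 if aa in chosen else 0 for aa in sequence]

def count_runs(bits):
    """Count maximal blocks of identical consecutive symbols."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)

def runs_z_score(bits):
    """Standardized number of runs under random ordering of a fixed composition.

    Assumed convention: z < 0 means fewer runs than expected (clustering),
    z > 0 means more runs than expected (even spacing).
    Returns None when the statistic is undefined.
    """
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n0 + n1
    if n0 == 0 or n1 == 0:
        return None  # only one symbol present
    mean = 2.0 * n1 * n0 / n + 1.0
    var = 2.0 * n1 * n0 * (2.0 * n1 * n0 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        return None  # degenerate case, e.g. a two-residue sequence
    return (count_runs(bits) - mean) / math.sqrt(var)

if __name__ == "__main__":
    # Arbitrary toy string, not a real protein; the residue set is hypothetical.
    toy_sequence = "MKLLAVIGGSDEKRLLAVFW"
    bits = binary_encode(toy_sequence, "AVLIMFW")
    print(bits, count_runs(bits), runs_z_score(bits))
```

For sequences drawn at random, a statistic standardized this way is approximately N(0,1) when both symbol counts are reasonably large, which is the property the summary relies on when comparing real sequences with the random expectation.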

References

  • Barlow RJ (1989) Statistics. A guide to the use of statistical methods in the physical sciences. John Wiley and Sons, New York, pp 1–204

  • Black JA, Harkins RN, Stenzel P (1976) Non-random relationships among amino acids in protein sequences. Int J Peptide Protein Res 8:125–130

  • Blake C (1983) Exons—present from the beginning? Nature (London) 306:535–537

  • Chan HS, Dill KA (1990) Origins of structure in globular proteins. Proc Natl Acad Sci USA 87:6388–6392

  • Darnell JE (1978) Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202:1257–1260

  • David FN, Barton DE (1962) Combinatorial chance. Charles Griffin and Co., London, pp 1–356

  • Doolittle RF (1979) Protein evolution. In: Neurath H, Hill RL (eds) The proteins, vol IV. Academic Press, New York, pp 1–118

  • Doolittle RF (1989) Redundancies in protein sequences. In: Fasman GD (ed) Prediction of protein structure and the principles of protein conformation. Plenum Press, New York, pp 599–623

  • Doolittle WF (1978) Genes in pieces: were they ever together? Nature (London) 272:581–582

  • Dorit RL, Schoenbach L, Gilbert W (1990) How big is the universe of exons. Science 250:1377–1382

  • Dorit RL, Gilbert W (1991) The limited universe of exons. Cur Opinion Struc Biol 1:973–977

  • Eck RV, Dayhoff MO (1966) Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152:363–366

  • Eisenberg D, Weiss RM, Terwilliger TC (1982) The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature (London) 299:371–374

  • Eisenberg D, Weiss RM, Terwilliger TC (1984) The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci USA 81:140–144

  • Engelman DM, Steitz TA, Goldman A (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem 15:321–353

  • Fasman GD (1989) The development of the prediction of protein structure. In: Fasman GD (ed) Prediction of protein structure and the principles of protein conformation. Plenum Press, New York, pp 193–316

  • Finkelstein AV, Ptitsyn OB (1987) Why do globular proteins fit the limited set of folding patterns. Prog Biophys Molec Biol 50:171–190

  • Fisher HF (1964) A limiting law relating the size and shape of protein molecules to their composition. Proc Natl Acad Sci USA 51:1285–1291

  • Fitch WM, Margoliash E (1967) A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome c as a model case. Biochem Genet 1:65–71

  • Gamow G, Ycas M (1958) The cryptographic approach to the problem of protein synthesis. In: Yockey HP (ed) Symposium on information theory in biology. Pergamon Press, New York, pp 63–69

  • Garnier J (1990) Protein structure prediction. Biochimie 72:513–524

  • Gates RE, Fisher HF (1971) Restrictions of sequence on the thickness of globular protein molecules. Proc Natl Acad Sci USA 68:2928–2931

  • Gellman SH (1991) On the role of methionine residues in the sequence-independent recognition of nonpolar protein surfaces. Biochemistry 30:6633–6636

  • George DG, Barker WC, Hunt LT (1986) The protein identification resource (PIR). Nucleic Acids Res 14:11–15

  • Gilbert W (1978) Why genes in pieces? Nature (London) 271:501

  • Holland SK, Blake CCF (1990) Proteins, exons, and molecular evolution. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 10–42

  • Janin J (1979) Surface and inside volumes in globular proteins. Nature (London) 277:491–492

  • Jukes TH (1969) Evolutionary pattern of specificity regions in light chains of immunoglobulins. Biochem Genet 3:109–117

  • Karlin S, Bucher P, Brendel V, Altschul SF (1991) Statistical methods and insights for protein and DNA sequences. Annu Rev Biophys Biophys Chem 20:175–203

  • Karlin S, Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA 87:2264–2268

  • Khorana HG, Gerber GE, Herlihy WC, Gray CP, Anderegg RJ, Nihei K, Biemann K (1979) Amino acid sequence of bacteriorhodopsin. Proc Natl Acad Sci USA 76:5046–5050

  • Klapper MH (1977a) Amino acid frequency distributions in proteins. Fed Proc 36:837

  • Klapper MH (1977b) The independent distribution of amino acid near neighbor pairs and in polypeptides. Biochem Biophys Res Comm 78:1018–1024

  • Lau KF, Dill KA (1990) Theory for protein mutability and biogenesis. Proc Natl Acad Sci USA 87:638–642

  • Lee B, Richards FM (1971) The interpretation of protein structures: estimation of static accessibility. J Mol Biol 55:379–400

  • Levitt M (1978) Conformational preferences of amino acids in globular proteins. Biochemistry 17:4277–4285

  • Macchiato V, Tramontano A (1985) Determination of the autocorrelation orders in proteins. Eur J Biochem 149:375–379

  • McCaldon P, Argos P (1988) Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. Proteins Struct Funct Genet 4:99–122

  • McLachlan AD (1972) Repeating sequences and gene duplication in proteins. J Mol Biol 64:417–437

  • McLachlan AD, Stewart M (1976) The 14-fold periodicity in alpha-tropomyosin and the interaction with actin. J Mol Biol 103:271–298

  • Mood AM (1940) The distribution theory of runs. Ann Math Stat 11:367–392

  • Nathans J, Hogness DS (1983) Isolation, sequence analysis, and intron-exon arrangement of the gene encoding bovine rhodopsin. Cell 34:807–814

  • Orcutt BC, George DG, Dayhoff MO (1983) Protein and nucleic acid data base systems. Annu Rev Biophys Bioengr 12:419–441

  • Parzen E (1967) Time series analysis papers. Holden-Day, San Francisco, pp 1–565

  • Patthy L (1991) Exons—original building blocks of proteins? BioEssays 13:187–192

  • Peebles PJE, Schramm DN, Turner EL, Kron RG (1991) The case for the relativistic hot Big Bang cosmology. Nature (London) 352:769–776

  • Ptitsyn OB (1985) Random sequences and protein folding. J Molec Struct (Theochem) 123:45–65

  • Ptitsyn OB (1987) Protein folding: hypotheses and experiments. J Protein Chem 6:273–294

  • Rose GD (1978) Prediction of chain turns in globular proteins on a hydrophobic basis. Nature (London) 272:586–590

  • Rose GD, Roy S (1980) Hydrophobic basis of packing in globular proteins. Proc Natl Acad Sci USA 77:4643–4647

  • Saroff HA (1984) The uniqueness of protein sequences. Uniqueness diagrams for the Dayhoff file—1984. Bull Math Biol 46:661–672

  • Shakhnovich EI, Gutin AM (1989) Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach. Biophys Chem 34:187–199

  • Shakhnovich EI, Gutin AM (1990a) Implications of thermodynamics of protein folding for evolution of primary sequences. Nature (London) 346:773–775

  • Shakhnovich EI, Gutin AM (1990b) Enumeration of all compact conformations of copolymers with random sequence of links. J Chem Phys 93:5967–5971

  • Vonderviszt F, Matrai G, Simon I (1986) Characteristic sequential residue environment of amino acids in proteins. Int J Peptide Protein Res 27:483–492

  • Wani JK (1971) Probability and statistical inference. Appleton-Century-Crofts, New York, pp 1–315

  • White SH, Jacobs RE (1990) Statistical distribution of hydrophobic residues along the length of protein chains—implications for protein folding and evolution. Biophys J 57:911–921

  • Wilson IA, Haft DH, Getzoff ED, Tainer JA, Lerner RA, Brenner S (1985) Identical short peptide sequences in unrelated proteins can have different conformations: A testing ground for theories of immune recognition. Proc Natl Acad Sci USA 82:5255–5259

  • Ycas M (1958) The protein text. In: Yockey HP (ed) Symposium on information theory in biology. Pergamon Press, New York, pp 70–102

  • Zielenkiewicz P, Plochocka D, Rabczenko A (1988) The formation of protein secondary structure. Its connection with amino acid sequence. Biophys Chem 31:139–142

  • Zimmerman JM, Eliezer N, Simha R (1968) The characterization of amino acid sequences in proteins by statistical methods. J Theor Biol 21:170–201

Cite this article

White, S.H., Jacobs, R.E. The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 36, 79–95 (1993). https://doi.org/10.1007/BF02407307
