Assessment of similarities of pairs and groups of proteins using transformed amino-acid-residue data

Using as a primary standard a representative set of 208 proteins whose amino-acid-residue mole frequencies have been accurately established, a set of standard distributions of mole frequencies is defined for each amino acid, in terms of which percentile values for the observed mole frequencies of the amino-acid residues in any other protein can be determined. Data so transformed have a distribution much closer to Gaussian than untransformed values, and allow meaningful determinations of correlations between the amino-acid-residue compositions of two proteins as well as between pairs of amino-acid residues within groups of proteins. Of the 153 possible pairs of amino acids (Asx and Glx are used), 39 are significantly correlated at p ≤ 0.01 and 22 at p ≤ 0.001. A percentile table is included for those wishing to use the method with programmable calculators.
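The percentile transformation described above can be illustrated with a minimal sketch. It is not the authors' procedure or their percentile table: the three-residue `standard` set and the two proteins are invented toy values, the function names are hypothetical, and the mid-rank handling of ties is an assumption. The count of 153 pairs is consistent with 18 residue classes (20 amino acids with Asp/Asn pooled as Asx and Glu/Gln pooled as Glx), since C(18, 2) = 153.

```python
from bisect import bisect_left, bisect_right
from math import comb, sqrt


def percentile_rank(reference, value):
    """Percentile (0-100) of `value` within `reference`, mid-rank for ties (assumed convention)."""
    ordered = sorted(reference)
    below = bisect_left(ordered, value)         # entries strictly below value
    at_or_below = bisect_right(ordered, value)  # entries less than or equal to value
    return 100.0 * (below + at_or_below) / (2 * len(ordered))


def transform(protein, standard):
    """Replace each mole frequency by its percentile in the standard distribution for that residue."""
    return {aa: percentile_rank(standard[aa], freq) for aa, freq in protein.items()}


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented toy "standard": mole frequencies (%) of three residues in five proteins.
standard = {
    "Ala": [5.0, 7.0, 8.0, 9.0, 12.0],
    "Gly": [4.0, 6.0, 7.5, 8.0, 11.0],
    "Lys": [2.0, 5.0, 6.0, 7.0, 10.0],
}
# Invented compositions of two proteins to be compared.
protein_a = {"Ala": 8.0, "Gly": 5.0, "Lys": 9.0}
protein_b = {"Ala": 8.5, "Gly": 4.5, "Lys": 8.0}

pct_a = transform(protein_a, standard)
pct_b = transform(protein_b, standard)

# Correlate the two proteins over their percentile-transformed compositions.
residues = sorted(standard)
r = pearson([pct_a[aa] for aa in residues], [pct_b[aa] for aa in residues])

# Number of distinct residue pairs when 18 residue classes are used.
n_pairs = comb(18, 2)
```

With a real standard set, each list in `standard` would hold one value per reference protein, and the correlation would run over all 18 residue classes rather than three.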

The transformed data for amino-acid compositions have been used to perform principal components analyses on groups of proteins in order to determine whether meaningful sub-groupings (observable clusters in scatter diagrams) were detectable. Such analyses are shown for the representative set of proteins and for a group of 184 globins. With regard to the globin chains, alpha chains show a correlation with the evolutionary time-scale in the first principal component projection (PCP), which accounts for 22% of the variance, while beta chains show such a correlation in the first and second PCPs (22% and 18% of the variance respectively). Thus, alpha and beta chains appear to diverge from a common progenitor, similar in position to globin chains from “primitive” forms. Furthermore, globins from “primitive” forms are nearer to one another than they are to globins from the vertebrates, a finding without a priori reason, suggesting perhaps that once a chain has reached a stable relationship with its environment, strong constraints are placed on the co-existing globin chains so that they maintain appropriate interaction with one another. In addition, the positions of the epsilon, gamma and delta chains are in the order: epsilon (embryonal) more primitive than gamma (foetal) more primitive than delta equal to beta (adult).
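As a rough illustration of how a first principal component projection and its share of the variance can be obtained from percentile-transformed compositions, the sketch below uses power iteration on the covariance matrix. This is a stand-in chosen to avoid external libraries, not the numerical method used in the paper (which the abstract does not specify). The four-protein, three-residue table is invented, and all function names are hypothetical.

```python
from math import sqrt


def covariance_matrix(rows):
    """Return (covariance matrix, mean-centered rows) for a proteins x residues table."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d  # arbitrary start; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


# Invented percentile-transformed compositions: 4 proteins x 3 residue classes.
table = [
    [10.0, 20.0, 50.0],
    [20.0, 30.0, 52.0],
    [80.0, 70.0, 48.0],
    [90.0, 80.0, 50.0],
]

cov, centered = covariance_matrix(table)
axis, eigenvalue = first_component(cov)

# Fraction of total variance carried by the first component
# (the quantity reported as 22% for the globin set in the text).
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance

# Projection of each protein onto the first component; the sign of the axis is arbitrary.
scores = [sum(c[j] * axis[j] for j in range(len(axis))) for c in centered]
```

Plotting `scores` against the projection onto a second component would give the kind of scatter diagram in which clusters are looked for. In this toy table the first two and last two proteins fall on opposite sides of the first axis.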


Author information

Correspondence to A. H. Reisner.

About this article

Cite this article

Reisner, A.H., Westwood, N.H. Assessment of similarities of pairs and groups of proteins using transformed amino-acid-residue data. J Mol Evol 18, 240–250 (1982). https://doi.org/10.1007/BF01734102

Key words

  • Normalization of amino acid residue data
  • Relatedness of proteins
  • Multivariate analysis of globin chain relatedness
  • Globin chain evolution
  • Principal component analyses of proteins