Journal of Molecular Evolution, Volume 15, Issue 1, pp 37–57

The size distributions of proteins, mRNA, and nuclear RNA

  • Steve S. Sommer
  • Joel E. Cohen


The frequency distributions of size (molecular weight) and of numbers of subunits were determined from lists of over 500 mammalian and bacterial proteins. The size distribution of polypeptides is well fitted by a lognormal distribution with a median value of about 40,000 daltons and a deviation of 1.8. About 60% of all proteins exist in multimeric aggregates. Of the multimers, 75% have either two or four subunits, while less than 1% have an odd number of subunits that is greater than three. Over 90% of the time, a given multimer is composed of subunits of nearly equal size, so that the size of an N-mer is lognormally distributed with a median value of N × 40,000 daltons and a deviation of 1.8. The distribution of polypeptide size and subunit number is similar for mammalian and bacterial proteins as well as for intracellular and extracellular proteins.
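The lognormal fit described above can be explored numerically. The following is a minimal sketch, not the authors' method: it assumes that the reported "deviation of 1.8" is a geometric (multiplicative) standard deviation, so that the natural log of the size is normally distributed with mean ln(40,000) and standard deviation ln(1.8). The function names are hypothetical.

```python
import math
from statistics import NormalDist

MEDIAN_DA = 40_000.0  # median polypeptide size reported in the abstract
GEO_SD = 1.8          # ASSUMPTION: "deviation" read as a geometric standard deviation


def lognormal_cdf(size_da, median=MEDIAN_DA, geo_sd=GEO_SD):
    """Fraction of polypeptides no larger than size_da under the assumed lognormal."""
    log_size = NormalDist(mu=math.log(median), sigma=math.log(geo_sd))
    return log_size.cdf(math.log(size_da))


def central_range(median=MEDIAN_DA, geo_sd=GEO_SD):
    """Sizes one geometric standard deviation below and above the median."""
    return median / geo_sd, median * geo_sd


def nmer_median(n_subunits, median=MEDIAN_DA):
    """Median size of an N-mer built from subunits of nearly equal size."""
    return n_subunits * median


if __name__ == "__main__":
    low, high = central_range()
    covered = lognormal_cdf(high) - lognormal_cdf(low)
    print(f"{low:.0f} to {high:.0f} daltons covers {covered:.1%} of polypeptides")
    print(f"median tetramer size: {nmer_median(4):.0f} daltons")
```

Under this reading, roughly two thirds of polypeptides fall between 40,000/1.8 and 40,000 × 1.8 daltons. If the deviation were defined differently, the `sigma` argument would need to change accordingly.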

The sedimentation profiles of mRNA from HeLa and CHO cells indicate that the lengths of mammalian mRNA are lognormally distributed with a median value of 1.4 kb and a deviation of 2.0. This implies that, on average, an mRNA species is only about 25% larger than the mature polypeptide it codes for. Therefore, at most a small fraction of mammalian mRNA could code for large precursor polypeptides which are then cleaved into a number of mature polypeptides (like polio mRNA), or for 3′ coterminal mRNAs where the larger species contain the information for up to four proteins (like adenovirus mRNA).
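The comparison between mRNA length and polypeptide size can be checked with back-of-the-envelope arithmetic. This is a rough sketch under stated assumptions: the mean residue mass of 110 daltons is a commonly used approximation and is not taken from the abstract, and comparing the two medians directly is a simplification of the "on average" statement. The function names are hypothetical.

```python
MEDIAN_MRNA_NT = 1_400          # 1.4 kb median mRNA length, from the abstract
MEDIAN_POLYPEPTIDE_DA = 40_000  # median polypeptide size, from the abstract
MEAN_RESIDUE_DA = 110.0         # ASSUMPTION: approximate mean mass of one amino acid residue
NT_PER_CODON = 3                # three nucleotides encode one amino acid


def coding_length_nt(polypeptide_da, residue_da=MEAN_RESIDUE_DA):
    """Nucleotides needed to encode a polypeptide of the given mass."""
    return polypeptide_da / residue_da * NT_PER_CODON


def mrna_excess(mrna_nt=MEDIAN_MRNA_NT,
                polypeptide_da=MEDIAN_POLYPEPTIDE_DA,
                residue_da=MEAN_RESIDUE_DA):
    """Fractional excess of mRNA length over the coding length."""
    return mrna_nt / coding_length_nt(polypeptide_da, residue_da) - 1.0


if __name__ == "__main__":
    print(f"coding length: {coding_length_nt(MEDIAN_POLYPEPTIDE_DA):.0f} nt")
    print(f"mRNA excess over coding length: {mrna_excess():.0%}")
```

With these assumed values the excess comes out near 28%, in the same range as the abstract's figure of about 25%. The exact value depends on the residue mass chosen.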

The sedimentation profile of nascent nuclear RNA from HeLa suggests that the length distribution of transcription units has two components: an exponential component that decays with a half-length of 10–15 kb, and a high frequency of very short molecules. However, other distributions (for example, the lognormal distribution) of transcription unit lengths could also be consistent with the data if one or more of the following occurred: physiological cleavage of nascent chains, perturbation of non-rRNA transcription by actinomycin D, or degradation during isolation.
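The meaning of a half-length can be illustrated with a short sketch. It models only the exponential component, ignores the excess of very short molecules, and uses hypothetical function names. It relies on the definition of half-length as the increase in length that halves the frequency, so the frequency at length L is proportional to 2^(−L/h) for half-length h.

```python
import math


def fraction_longer_than(length_kb, half_length_kb):
    """Fraction of the exponential component longer than length_kb."""
    return 2.0 ** (-length_kb / half_length_kb)


def mean_length_kb(half_length_kb):
    """Mean length of an exponential distribution with the given half-length."""
    return half_length_kb / math.log(2)


if __name__ == "__main__":
    for half_length in (10.0, 15.0):  # range reported in the abstract
        print(f"half-length {half_length:.0f} kb: "
              f"mean {mean_length_kb(half_length):.1f} kb, "
              f"{fraction_longer_than(30.0, half_length):.1%} longer than 30 kb")
```

Because the mean of an exponential distribution is the half-length divided by ln 2, a half-length of 10–15 kb corresponds to a mean of roughly 14–22 kb for this component alone.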

The length distribution of HeLa nuclear RNA labeled for 60 min is similar to that of nascent nuclear RNA, indicating that a completed hnRNA chain is quickly transported or degraded after being cleaved.

Key words

Lognormal distribution, Subunit size, Mammalian protein, Bacterial protein, Sedimentation profile



hnRNA: heterogeneous RNA

half-length: in an exponential distribution, the increase in length required to reduce the frequency by a factor of 2

CHO cells: Chinese hamster ovary cells





Copyright information

© Springer-Verlag 1980

Authors and Affiliations

  • Steve S. Sommer (1)
  • Joel E. Cohen (1)
  1. The Rockefeller University, New York, USA
  2. Department of Pathology, National Cancer Institute, Bethesda, USA
