The size distributions of proteins, mRNA, and nuclear RNA

Summary

The frequency distributions of size (molecular weight) and of subunit number were determined from lists of over 500 mammalian and bacterial proteins. The size distribution of polypeptides is well fitted by a lognormal distribution with a median of about 40,000 daltons and a deviation of 1.8. About 60% of all proteins exist as multimeric aggregates. Of the multimers, 75% have either two or four subunits, while fewer than 1% have an odd number of subunits greater than three. Over 90% of multimers are composed of subunits of nearly equal size, so the size of an N-mer is lognormally distributed with a median of N × 40,000 daltons and a deviation of 1.8. The distributions of polypeptide size and subunit number are similar for mammalian and bacterial proteins, and for intracellular and extracellular proteins.
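The N-mer statement follows from a property of the lognormal: if a subunit mass X is lognormal, then N·X satisfies ln(N·X) = ln N + ln X, so it is again lognormal with median N × 40,000 and an unchanged spread. The sketch below illustrates this under one stated assumption: the abstract's "deviation of 1.8" is read as the geometric standard deviation (a multiplicative factor). The names `lognormal_cdf` and `nmer_cdf` are hypothetical helpers, not anything from the paper.

```python
import math

# Assumption: "deviation of 1.8" is read as the geometric standard deviation,
# i.e. ln(mass) ~ Normal(ln(median), ln(1.8)).
SUBUNIT_MEDIAN_DA = 40_000.0
GEO_SD = 1.8


def lognormal_cdf(x, median, geo_sd):
    """P(mass <= x) for a lognormal with the given median and geometric SD."""
    z = (math.log(x) - math.log(median)) / math.log(geo_sd)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def nmer_cdf(x, n):
    """P(N-mer mass <= x) for N equal subunits of lognormal mass X.

    The N-mer mass is n * X, so P(n * X <= x) = P(X <= x / n).
    """
    return lognormal_cdf(x / n, SUBUNIT_MEDIAN_DA, GEO_SD)


# About 68% of polypeptides lie within one geometric SD of the median,
# i.e. between 40,000 / 1.8 (about 22,200 Da) and 40,000 * 1.8 = 72,000 Da.
within_one_sd = (lognormal_cdf(SUBUNIT_MEDIAN_DA * GEO_SD, SUBUNIT_MEDIAN_DA, GEO_SD)
                 - lognormal_cdf(SUBUNIT_MEDIAN_DA / GEO_SD, SUBUNIT_MEDIAN_DA, GEO_SD))
print(round(within_one_sd, 3))  # 0.683

# A tetramer's mass distribution equals a lognormal with median 4 * 40,000
# and the same geometric SD.
print(math.isclose(nmer_cdf(100_000, 4),
                   lognormal_cdf(100_000, 4 * SUBUNIT_MEDIAN_DA, GEO_SD)))  # True
```

Under a different reading of "deviation" (for example, the standard deviation of log10 mass), only `GEO_SD` and the `z` formula would change; the N-mer scaling argument is unaffected.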

The sedimentation profiles of mRNA from HeLa and CHO cells indicate that the lengths of mammalian mRNA are lognormally distributed with a median of 1.4 kb and a deviation of 2.0. This implies that, on average, an mRNA species is only about 25% longer than needed to encode the mature polypeptide it codes for. Therefore, at most a small fraction of mammalian mRNA could code either for large precursor polypeptides that are then cleaved into several mature polypeptides (like polio mRNA), or for 3′-coterminal mRNAs in which the larger species contain the information for up to four proteins (like adenovirus mRNA).
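The comparison between mRNA length and polypeptide size rests on converting daltons to nucleotides. A rough back-of-envelope version is sketched below, assuming an average amino-acid residue mass of about 110 Da and three nucleotides per codon; the 110 Da figure and the helper names are illustrative assumptions, and the paper's own calculation may use different conversion factors or corrections (for example, for poly(A) tails), so the exact ratio here is only indicative.

```python
# Hypothetical conversion constants (not taken from the paper):
MEAN_RESIDUE_DA = 110.0  # assumed average mass of one amino-acid residue
NT_PER_CODON = 3


def coding_length_nt(polypeptide_da, residue_da=MEAN_RESIDUE_DA):
    """Approximate coding-sequence length (nt) for a polypeptide of given mass."""
    residues = polypeptide_da / residue_da
    return residues * NT_PER_CODON


def excess_fraction(mrna_nt, polypeptide_da):
    """How much longer the mRNA is than its coding sequence, as a fraction."""
    return mrna_nt / coding_length_nt(polypeptide_da) - 1.0


# Median polypeptide (40,000 Da) versus median mRNA (1.4 kb):
print(round(coding_length_nt(40_000)))               # 1091
print(round(excess_fraction(1_400, 40_000), 2))      # 0.28
```

The result is of the same order as the "about 25%" quoted above, which is the point of the argument: an mRNA carrying the information for several proteins would need to be at least roughly twice as long as a single coding sequence.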

The sedimentation profile of nascent nuclear RNA from HeLa cells suggests that the length distribution of transcription units has two components: an exponential component that decays with a half-length (L1/2) of 10–15 kb, and a high frequency of very short molecules. However, other distributions of transcription-unit length (for example, the lognormal distribution) could also be consistent with the data if one or more of the following occurred: physiological cleavage of nascent chains, perturbation of non-rRNA transcription by actinomycin D, or degradation during isolation.
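The half-length L1/2 is the extra length over which the frequency of an exponential distribution halves, so the density is proportional to 2^(−x/L1/2) = e^(−λx) with rate λ = ln 2 / L1/2, and the mean length is 1/λ = L1/2 / ln 2. The sketch below evaluates this for the two ends of the quoted 10–15 kb range. It models only the exponential component; the excess of very short molecules is not represented, and the function names are hypothetical.

```python
import math


def exp_density(x_kb, half_length_kb):
    """Exponential density whose value halves for every L1/2 of extra length."""
    rate = math.log(2.0) / half_length_kb  # lambda = ln 2 / L1/2
    return rate * math.exp(-rate * x_kb)


def fraction_longer_than(x_kb, half_length_kb):
    """Survival function: fraction of units longer than x_kb, 2 ** (-x / L1/2)."""
    return 2.0 ** (-x_kb / half_length_kb)


def mean_length_kb(half_length_kb):
    """Mean of the exponential, 1 / lambda = L1/2 / ln 2."""
    return half_length_kb / math.log(2.0)


for L_half in (10.0, 15.0):
    print(L_half, round(mean_length_kb(L_half), 1), fraction_longer_than(30.0, L_half))
# 10.0 14.4 0.125
# 15.0 21.6 0.25
```

Note that the mean (about 14–22 kb) exceeds the half-length by a factor of 1/ln 2 ≈ 1.44, which is worth keeping in mind when comparing L1/2 with average transcript sizes reported elsewhere.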

The length distribution of HeLa nuclear RNA labeled for 60 min is similar to that of nascent nuclear RNA, indicating that a completed hnRNA chain is quickly transported or degraded after being cleaved.

Abbreviations

hnRNA:

heterogeneous nuclear RNA

L1/2:

in an exponential distribution, the increase in length required to reduce the frequency by a factor of 2

kb:

kilobases

kd:

kilodaltons

CHO cells:

Chinese hamster ovary cells

Author information

Corresponding author

Correspondence to Steve S. Sommer.

Additional information

This paper is dedicated to Harold Sommer

About this article

Cite this article

Sommer, S.S., Cohen, J.E. The size distributions of proteins, mRNA, and nuclear RNA. J Mol Evol 15, 37–57 (1980). https://doi.org/10.1007/BF01732582

Key words

  • Lognormal distribution
  • Subunit size
  • Mammalian protein
  • Bacterial protein
  • Sedimentation profile