On the origin of coding sequences from random open reading frames

Journal of Molecular Evolution

Summary

The size distribution of 411 randomly selected mammalian exons was investigated. This distribution was found to be unimodal with a frequency maximum of 120 bp. Detailed analysis of the distribution demonstrated that larger exons (>150 bp) have a high goodness of fit to the size distribution of open reading frames (ORFs) in a random sequence, i.e., $(61/64)^t$, in which $t$ is the number of triplets. Based on this observation, the general character of the total exon size distribution suggested that it could be described by a theoretical distribution obtained by superimposing a sigmoid function on the ORF generating function, i.e., $(61/64)^t \times f_s(t) \times E$, in which $f_s(t)$ is a sigmoid function and $E$ is a constant. We tested how well this distribution fits the exon distribution using two sigmoid functions: $f_s(t) = \Phi(t)$ and $f_s(t) = Be^{kt}/(1 + Be^{kt})$. In both cases a very high goodness of fit was attained. It is concluded that exons have been generated from ORFs in random sequences, that ORFs larger than 150 bp have been selected, irrespective of size, as exons, and that a lower size limit exists below which the probability of an ORF being selected as an exon is very low. These results provide evidence at the molecular level to support the ideas that (1) larger exons have been selected from random ORFs without primary correlation to structural or functional properties at the protein level, (2) there exists a restriction on smaller ORFs to be selected as exons, and (3) the interrupted coding sequences found in eukaryotes represent the ancient form of gene organization that existed prior to the divergence of prokaryotes and eukaryotes.
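The shape of the proposed distribution can be illustrated with a minimal Python sketch. It assumes the logistic form of the sigmoid given in the summary and reads 61/64 as the chance that a random triplet is not one of the three stop codons, which holds for independent bases of equal frequency. The function names and the values of B, k, and E are hypothetical and chosen only for illustration; they are not the fitted values from the paper.

```python
import math

# Chance that one random triplet is not a stop codon (61 of 64 codons),
# assuming independent bases with equal frequencies.
STOP_FREE = 61 / 64


def orf_term(t):
    """(61/64)^t: chance that t consecutive random triplets hold no stop codon."""
    return STOP_FREE ** t


def logistic(t, B, k):
    """Logistic sigmoid B*e^(kt) / (1 + B*e^(kt))."""
    x = B * math.exp(k * t)
    return x / (1 + x)


def exon_model(t, B, k, E):
    """Unnormalized model (61/64)^t * f_s(t) * E for an exon of t triplets."""
    return orf_term(t) * logistic(t, B, k) * E


# Hypothetical parameter values, NOT taken from the paper.
B, k, E = 0.001, 0.25, 1.0

for bp in (30, 60, 90, 120, 150, 300, 600):
    t = bp // 3  # exon length in triplets
    print(f"{bp:4d} bp  model = {exon_model(t, B, k, E):.5f}")
```

With any positive B and a k larger than about 0.048 (the magnitude of ln(61/64)), the product rises while the sigmoid term dominates and then decays geometrically once the sigmoid saturates, so the curve has a single peak. This matches the unimodal exon size distribution described in the summary.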

Cite this article

Höglund, M., Säll, T. & Röhme, D. On the origin of coding sequences from random open reading frames. J Mol Evol 30, 104–108 (1990). https://doi.org/10.1007/BF02099936

  • DOI: https://doi.org/10.1007/BF02099936