Summary
The size distribution of 411 randomly selected mammalian exons was investigated. The distribution was found to be unimodal, with a frequency maximum at 120 bp. Detailed analysis demonstrated that the sizes of larger exons (>150 bp) closely fit the size distribution of open reading frames (ORFs) in a random sequence, i.e., (61/64)^t, in which t is the number of triplets. Based on this observation, the general shape of the total exon size distribution suggested that it could be described by a theoretical distribution obtained by superimposing a sigmoid function on the ORF generating function, i.e., (61/64)^t × f_s(t) × E, in which f_s(t) is a sigmoid function and E is a constant. We tested the fit of this distribution to the exon distribution using two sigmoid functions: f_s(t) = Φ(t) and f_s(t) = Be^(kt)/(1 + Be^(kt)). In both cases a very high goodness of fit was attained. It is concluded that exons have been generated from ORFs in random sequences, that ORFs larger than 150 bp have been selected as exons irrespective of size, and that a lower size limit exists below which the probability of an ORF being selected as an exon is very low. These results provide evidence at the molecular level to support the ideas that (1) larger exons have been selected from random ORFs without primary correlation to structural or functional properties at the protein level, (2) there is a restriction on the selection of smaller ORFs as exons, and (3) the interrupted coding sequences found in eukaryotes represent the ancient form of gene organization that existed prior to the divergence of prokaryotes and eukaryotes.
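The model in the summary can be illustrated with a short numerical sketch. It is not the authors' fitting procedure, and the values of B, k, mu, sigma, and E below are hypothetical placeholders rather than estimates from the paper. It also assumes that Φ(t) denotes a cumulative normal distribution with some location and scale. The sketch evaluates (61/64)^t × f_s(t) × E over a range of triplet counts and locates the maximum, showing how a decaying ORF term multiplied by a rising sigmoid yields a unimodal curve.

```python
import math

P_SENSE = 61 / 64  # probability that a random triplet is not a stop codon


def orf_survival(t):
    """(61/64)^t: chance that t consecutive random triplets contain no stop codon."""
    return P_SENSE ** t


def logistic(t, B, k):
    """Logistic sigmoid B*e^(kt) / (1 + B*e^(kt))."""
    x = B * math.exp(k * t)
    return x / (1.0 + x)


def normal_cdf(t, mu, sigma):
    """Cumulative normal distribution; mu and sigma are assumed parameters."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def exon_model(t, fs, E=1.0):
    """(61/64)^t * fs(t) * E for a sigmoid fs and a scaling constant E."""
    return orf_survival(t) * fs(t) * E


# Hypothetical parameter values, chosen only to show the shape of the curve.
B, k = 1e-3, 0.25
mu, sigma = 30.0, 8.0


def fs_logistic(t):
    return logistic(t, B, k)


def fs_normal(t):
    return normal_cdf(t, mu, sigma)


sizes = list(range(1, 201))  # exon size in triplets
for name, fs in (("logistic", fs_logistic), ("normal CDF", fs_normal)):
    mode_t = max(sizes, key=lambda t: exon_model(t, fs))
    print(f"{name}: maximum at t = {mode_t} triplets ({3 * mode_t} bp)")
```

For large t both sigmoids approach 1, so the model reduces to the pure ORF term (61/64)^t, which matches the summary's statement that exons above 150 bp follow the random ORF distribution.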
Cite this article
Höglund, M., Säll, T. & Röhme, D. On the origin of coding sequences from random open reading frames. J Mol Evol 30, 104–108 (1990). https://doi.org/10.1007/BF02099936
DOI: https://doi.org/10.1007/BF02099936