Abstract
Exon shuffling is an essential molecular mechanism for the formation of new genes. Many cases of exon shuffling have been reported in vertebrate genes, and these discoveries revealed its importance in the origin of new genes. However, only a few cases have been reported from plants and invertebrates, which gave rise to the assertion that the intron-mediated recombination mechanism originated very recently. We focus on the origin of new genes by exon shuffling and retroposition. We first summarize our experimental work, which revealed four new genes in Drosophila, plants, and humans. These genes are 10^6 to 10^8 years old. Their recency allows us to examine the origin and evolution of genes directly and in detail. These observations show, first, the importance of exon shuffling and retroposition in the rapid creation of new gene structures. They also show that the resultant chimerical structures, appearing as mosaic proteins or as retroposed coding structures with novel regulatory systems, often confer novel functions. Furthermore, these newly created genes appear to have been governed by positive Darwinian selection throughout their history, with rapid changes of amino acid sequence and gene structure over very short periods of evolution. We further analyzed the distribution of intron phases in three non-vertebrate species, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana, as inferred from their genome sequences. As in vertebrate genes, we found that intron phases in these species are unevenly distributed, with an excess of phase zero introns and a significant excess of symmetric exons. Both findings are consistent with the requirements of the molecular process of exon shuffling. Thus, these non-vertebrate genomes may also have been strongly affected by exon shuffling.
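The two quantities named in the abstract, intron phase and exon symmetry, can be illustrated with a minimal sketch. It assumes the standard definitions: an intron's phase is the number of coding nucleotides preceding it, modulo 3, and an internal exon is symmetric when its two flanking introns share the same phase. The function names and the toy exon lengths are hypothetical, chosen only for illustration; this is not the authors' pipeline, and it omits the statistical comparison against an expected distribution that an actual analysis would need.

```python
from collections import Counter

def intron_phases(coding_exon_lengths):
    """Phase of each intron, given the coding lengths of consecutive exons.

    Phase 0: intron lies between two codons.
    Phase 1: intron follows the first nucleotide of a codon.
    Phase 2: intron follows the second nucleotide of a codon.
    """
    phases = []
    coding_so_far = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases

def summarize(genes):
    """Count intron phases and symmetric/asymmetric internal exons."""
    phase_counts = Counter()
    symmetric = asymmetric = 0
    for exon_lengths in genes:
        phases = intron_phases(exon_lengths)
        phase_counts.update(phases)
        # Each internal exon is flanked by two consecutive introns.
        for left, right in zip(phases, phases[1:]):
            if left == right:
                symmetric += 1
            else:
                asymmetric += 1
    return phase_counts, symmetric, asymmetric

# Hypothetical toy data: coding exon lengths (in nucleotides) for two genes.
toy_genes = [
    [90, 120, 61, 200],
    [30, 99, 150],
]

counts, n_symmetric, n_asymmetric = summarize(toy_genes)
print(dict(counts), n_symmetric, n_asymmetric)
```

A symmetric exon can be inserted into, or removed from, an intron of the same phase without shifting the downstream reading frame, which is why an excess of such exons is read as consistent with exon shuffling.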
Cite this article
Long, M., Deutsch, M., Wang, W. et al. Origin of New Genes: Evidence from Experimental and Computational Analyses. Genetica 118, 171–182 (2003). https://doi.org/10.1023/A:1024153609285