Abstract
The evolutionary origin of “orphan” genes, genes that lack sequence similarity to any known gene, remains a mystery. One suggestion has been that most orphan genes evolve rapidly so that similarity to other genes cannot be traced after a certain evolutionary distance. This can be tested by examining the divergence rates of genes with different degrees of lineage specificity. Here the lineage specificity (LS) of a gene describes the phylogenetic distribution of that gene’s orthologues in related species. Highly lineage-specific genes will be distributed in fewer species in a phylogeny. In this study, we have used the complete genomes of seven ascomycotan fungi and two animals to define several levels of LS, such as Eukaryotes-core, Ascomycota-core, Euascomycetes-specific, Hemiascomycetes-specific, Aspergillus-specific, and Saccharomyces-specific. We compare the rates of gene evolution in groups of higher LS to those in groups with lower LS. Molecular evolutionary analyses indicate an increase in nonsynonymous nucleotide substitution rates in genes with higher LS. Several analyses suggest that LS is correlated with the evolutionary rate of the gene. This correlation is stronger than those of a number of other factors that have been proposed as predictors of a gene’s evolutionary rate, including the expression level of genes, gene essentiality or dispensability, and the number of protein-protein interactions. The accelerated evolutionary rates of genes with higher LS may reflect the influence of selection and adaptive divergence during the emergence of orphan genes. These analyses suggest that accelerated rates of gene evolution may be responsible for the emergence of apparently orphan genes.
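As a rough illustration of how a lineage-specificity level could be assigned from the phylogenetic distribution of a gene's orthologues, the sketch below maps a set of species in which orthologues were found to the narrowest clade containing all of them. The species labels, the clade memberships, and the "narrowest enclosing clade" rule are assumptions made for illustration only. The abstract does not name the nine genomes or state the exact criteria used to define each level.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical species labels (7 fungi + 2 animals); the grouping below is
# an illustrative assumption, not the species set used in the study.
FUNGI = frozenset({"sacch_1", "sacch_2", "hemi_3",
                   "asperg_1", "asperg_2", "euasco_3", "archi_1"})
ANIMALS = frozenset({"animal_1", "animal_2"})

CLADES: Dict[str, FrozenSet[str]] = {
    "Saccharomyces": frozenset({"sacch_1", "sacch_2"}),
    "Aspergillus": frozenset({"asperg_1", "asperg_2"}),
    "Hemiascomycetes": frozenset({"sacch_1", "sacch_2", "hemi_3"}),
    "Euascomycetes": frozenset({"asperg_1", "asperg_2", "euasco_3"}),
    "Ascomycota": FUNGI,
    "Eukaryotes": FUNGI | ANIMALS,
}


def narrowest_clade(present: Set[str]) -> str:
    """Return the smallest clade containing every species with an orthologue.

    A smaller clade corresponds to higher lineage specificity under this
    simplified rule.
    """
    if not present:
        raise ValueError("gene has no species assigned")
    unknown = set(present) - CLADES["Eukaryotes"]
    if unknown:
        raise ValueError(f"unknown species labels: {sorted(unknown)}")
    # Check clades from smallest to largest and stop at the first superset.
    for name, members in sorted(CLADES.items(), key=lambda kv: len(kv[1])):
        if set(present) <= members:
            return name
    raise AssertionError("unreachable: Eukaryotes contains all species")


if __name__ == "__main__":
    for profile in ({"sacch_1"}, {"sacch_1", "hemi_3"}, {"sacch_1", "animal_2"}):
        print(sorted(profile), "->", narrowest_clade(profile))
```

Under this simplified rule, a gene found only in the two hypothetical Saccharomyces genomes is assigned the most specific level, while a gene with orthologues in both a fungus and an animal falls into the broadest one. A real analysis would also need to handle gene loss and orthologue detection thresholds, which this sketch ignores.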
Acknowledgments
A preliminary version of this work was presented by J.J.C. at the SMBE conference on June 17, 2004. This work was supported in part by the AIDS Trust Fund (MSS 083), Research Grant Council Grant (HKU 7363/03M), and the University Development Fund of the University of Hong Kong. We thank Professor Pak Sham and the anonymous reviewers for valuable comments.
Additional information
[Reviewing Editor: Dr. Martin Kreitman]
About this article
Cite this article
Cai, J.J., Woo, P.C., Lau, S.K. et al. Accelerated Evolutionary Rate May Be Responsible for the Emergence of Lineage-Specific Genes in Ascomycota. J Mol Evol 63, 1–11 (2006). https://doi.org/10.1007/s00239-004-0372-5
DOI: https://doi.org/10.1007/s00239-004-0372-5