Abstract
Intronless genes, a characteristic feature of prokaryotes, constitute a significant portion of the eukaryotic genomes. Our analysis revealed the presence of 11,109 (19.9%) and 5,846 (21.7%) intronless genes in rice and Arabidopsis genomes, respectively, belonging to different cellular role and gene ontology categories. The distribution and conservation of rice and Arabidopsis intronless genes among different taxonomic groups have been analyzed. A total of 301 and 296 intronless genes from rice and Arabidopsis, respectively, are conserved among organisms representing the three major domains of life, i.e., archaea, bacteria, and eukaryotes. These evolutionarily conserved proteins are predicted to be involved in housekeeping cellular functions. Interestingly, among the 68% of rice and 77% of Arabidopsis intronless genes present only in eukaryotic genomes, approximately 51% and 57% genes have orthologs only in plants, and thus may represent the plant-specific genes. Furthermore, 831 and 144 intronless genes of rice and Arabidopsis, respectively, referred to as ORFans, do not exhibit homology to any of the genes in the database and may perform species-specific functions. These data can serve as a resource for further comparative, evolutionary, and functional analysis of intronless genes in plants and other organisms.
Similar content being viewed by others
References
Agarwal SM, Gupta J (2005) Comparative analysis of human intronless proteins. Biochem Biophys Res Commun 331:512–519
Ahn S, Tanksley SD (1993) Comparative linkage maps of the rice and maize genomes. Proc Natl Acad Sci USA 90:7980–7984
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Andersson JO (2005) Lateral gene transfer in eukaryotes. Cell Mol Life Sci 62:1182–1197
Aubourg S, Kreis M, Lecharny A (1999) The DEAD box RNA helicase family in Arabidopsis thaliana. Nucleic Acids Res 27:628–636
Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV (2004) Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res 32:3724–3733
Bancroft I (2002) Insights into cereal genomes from two draft genome sequences of rice. Genome Biol 3: Reviews 1015.1–1015.3
Boucher Y, Douady CJ, Papke RT, Walsh DA, Boudreau ME, Nesbo CL, Case RJ, Doolittle WF (2003) Lateral gene transfer and the origins of prokaryotic groups. Annu Rev Genet 37:283–328
Boudet N, Aubourg S, Toffano-Nioche C, Kreis M, Lecharny A (2001) Evolution of intron/exon structure of DEAD helicase family genes in Arabidopsis, Caenorhabditis, and Drosophila. Genome Res 11:2101–2114
Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:433–438
Chapman BA, Bowers JE, Feltus FA, Paterson AH (2006) Buffering of crucial functions by paleologous duplicated genes may contribute cyclicality to angiosperm genome duplication. Proc Natl Acad Sci U S A 103:2730–2735
Copley SD, Dhillon JK (2002) Lateral gene transfer and parallel evolution in the history of glutathione biosynthesis genes. Genome Biol 3:1–25
Delseny M (2003) Towards an accurate sequence of the rice genome. Curr Opin Plant Biol 6:101–105
Domazet-Loso T, Tautz D (2003) An evolutionary analysis of orphan genes in Drosophila. Genome Res 13:2213–2219
Fischer D, Eisenberg D (1999) Finding families for genomic ORFans. Bioinformatics 15:759–762
Gagne JM, Downes BP, Shiu SH, Durski AM, Vierstra RD (2002) The F-box subunit of the SCF E3 complex is encoded by a diverse superfamily of genes in Arabidopsis. Proc Natl Acad Sci U S A 99:11519–11524
Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B (2002) Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419:498–511
Gentles AJ, Karlin S (1999) Why are human G-protein-coupled receptors predominantly intronless? Trends Genet 15:47–49
Glusman G, Sosinsky A, Ben-Asher E, Avidan N, Sonkin D, Bahar A, Rosenthal A, Clifton S, Roe B, Ferraz C, Demaille J, Lancet D (2000) Sequence, structure, and evolution of a complete human olfactory receptor gene cluster. Genomics 63:227–245
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100
Gotoh O (1998) Divergent structures of Caenorhabditis elegans cytochrome P450 genes suggest the frequent loss and gain of introns during the evolution of nematodes. Mol Biol Evol 15:1447–1459
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Jain M, Kaur N, Garg R, Thakur JK, Tyagi AK, Khurana JP (2006a) Structure and expression analysis of early auxin-responsive Aux/IAA gene family in rice (Oryza sativa). Funct Integr Genomics 6:47–59
Jain M, Tyagi AK, Khurana JP (2006b) Genome-wide analysis, evolutionary expansion, and expression of early auxin-responsive SAUR gene family in rice (Oryza sativa). Genomics 88:360–371
Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CA, Knudsen S, Krogh A, Valencia A, Brunak S (2002) Prediction of human protein function from post-translational modifications and localization features. J Mol Biol 319:1257–1265
Jensen LJ, Ussery DW, Brunak S (2003) Functionality of system components: conservation of protein function in protein feature space. Genome Res 13:2444–2449
Jordan IK, Rogozin IB, Wolf YI, Koonin EV (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 12:962–968
Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617–624
Lecharny A, Boudet N, Gy I, Aubourg S, Kreis M (2003) Introns in, introns out in plant gene families: a genomic approach of the dynamics of gene structure. J Struct Funct Genomics 3:111–116
Long M (2001) Evolution of novel genes. Curr Opin Genet Dev 11:673–680
Lurin C, Andres C, Aubourg S, Bellaoui M, Bitton F, Bruyere C, Caboche M, Debast C, Gualberto J, Hoffmann B, Lecharny A, Le Ret M, Martin-Magniette ML, Mireau H, Peeters N, Renou JP, Szurek B, Taconnat L, Small I (2004) Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16:2089–2103
Paterson AH, Bowers JE, Chapman BA (2004a) Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci U S A 101:9903–9908
Paterson AH, Bowers JE, Chapman BA, Peterson DG, Rong J, Wicker TM (2004b) Comparative genome analysis of monocots and dicots, toward characterization of angiosperm diversity. Curr Opin Biotechnol 15:120–125
Rujan T, Martin W (2001) How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet 17:113–120
Sakharkar MK, Kangueane P (2004) Genome SEGE: a database for ‘intronless’ genes in eukaryotic genomes. BMC Bioinformatics 5:67
Sakharkar KR, Sakharkar MK, Culiat CT, Chow VT, Pervaiz S (2006) Functional and evolutionary analyses on expressed intronless genes in the mouse genome. FEBS Lett 580:1472–1478
Schmid KJ, Aquadro CF (2001) The evolutionary analysis of “orphans” from the Drosophila genome identifies rapidly diverging and incorrectly annotated genes. Genetics 159:589–598
Siew N, Fischer D (2003a) Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins 53:241–251
Siew N, Fischer D (2003b) Twenty thousand ORFan microbial protein families for the biologist? Structure 11:7–9
Takeda S, Kadowaki S, Haga T, Takaesu H, Mitaku S (2002) Identification of G protein-coupled receptor genes from the human genome sequence. FEBS Lett 520:97–101
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
Veitia RA (2005) Paralogs in polyploids: one for all and all for one? Plant Cell 17:4–11
Vij S, Gupta V, Kumar D, Vydianathan R, Raghuvanshi S, Khurana P, Khurana JP, Tyagi AK (2006) Decoding the rice genome. Bioessays 28:421–432
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 33:D39–D45
Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Annu Rev Biochem 46:573–639
Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li X, Zheng H, Cong L, Lin L, Yin J, Geng J, Li G, Shi J, Liu J, Lv H, Li J, Deng Y, Ran L, Shi X, Wang X, Wu Q, Li C, Ren X, Li D, Liu D, Zhang X, Ji Z, Zhao W, Sun Y, Zhang Z, Bao J, Han Y, Dong L, Ji J, Chen P, Wu S, Xiao Y, Bu D, Tan J, Yang L, Ye C, Xu J, Zhou Y, Yu Y, Zhang B, Zhuang S, Wei H, Liu B, Lei M, Yu H, Li Y, Xu H, Wei S, He X, Fang L, Huang X, Su Z, Tong W, Tong Z, Ye J, Wang L, Lei T, Chen C, Chen H, Huang H, Zhang F, Li N, Zhao C, Huang Y, Li L, Xi Y, Qi Q, Li W, Hu W, Tian X, Jiao Y, Liang X, Jin J, Gao L, Zheng W, Hao B, Liu S, Wang W, Yuan L, Cao M, McDermott J, Samudrala R, Wong GK, Yang H (2005) The Genomes of Oryza sativa: a history of duplications. PLoS Biol 3:e38
Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, Wortman J, Buell CR (2005) The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol 138:18–26
Acknowledgments
We are thankful to Rashmi Jain for technical assistance. This work was supported financially by the Department of Biotechnology, Government of India, and the University Grants Commission, New Delhi. MJ acknowledges the Council of Scientific and Industrial Research, New Delhi, for the award of Senior Research Fellowship.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below are the links to the electronic supplementary material.
Supplemental data file 1
Predicted intronless genes in rice. (PDF 263 kb)
Supplemental data file 2
Predicted intronless genes in Arabidopsis. (PDF 184 kb)
Supplemental data file 3
Cellular role and GO category of rice intronless genes predicted by ProtFun. (PDF 277 kb)
Supplemental data file 4
Cellular role and GO category of Arabidopsis intronless genes predicted by ProtFun. (PDF 174 kb)
Supplemental data file 5
Locus IDs of rice intronless genes present in archaea, bacteria, and/or eukaryotes (PDF 169 kb)
Supplemental data file 6
Locus IDs of Arabidopsis intronless genes present in archaea, bacteria, and/or eukaryotes. (PDF 147 kb)
Supplemental data file 7
Locus IDs of rice intronless genes present specifically in different taxonomic groups. (XLS 419 kb)
Supplemental data file 8
Locus IDs of Arabidopsis intronless genes present specifically in different taxonomic groups. (XLS 306 kb)
Supplemental data file 9
Locus IDs of rice and Arabidopsis intronless ORFans. (PDF 59 kb)
Rights and permissions
About this article
Cite this article
Jain, M., Khurana, P., Tyagi, A.K. et al. Genome-wide analysis of intronless genes in rice and Arabidopsis . Funct Integr Genomics 8, 69–78 (2008). https://doi.org/10.1007/s10142-007-0052-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10142-007-0052-9