Features of Arabidopsis Genes and Genome Discovered using Full-length cDNAs

Plant Molecular Biology

Abstract

Arabidopsis is currently the reference genome for higher plants. We present a new, more detailed statistical analysis of Arabidopsis gene structure, including intron and exon lengths, intergenic distances, features of promoters, and variant 5′-ends of mRNAs transcribed from the same transcription unit. We also provide a statistical characterization of Arabidopsis transcripts in terms of their size, UTR lengths, 3′-end cleavage sites, splicing variants, and coding potential. These analyses were facilitated by scrutiny of our collection of sequenced full-length cDNAs and a much larger collection of 5′-ESTs, together with another set of full-length cDNAs from Salk/Stanford/Plant Gene Expression Center/RIKEN. Alternative splicing is observed for transcripts from 7% of the genes, and many of these genes display multiple spliced isoforms. Most splicing variants lie in non-coding regions of the transcripts. Non-canonical splice sites constitute less than 1% of all splice sites. Genes with fewer than four introns display reduced average mRNA levels. Putative alternative transcription start sites were observed in 30% of highly expressed genes and in more than 50% of the genes with low expression. Transcription start sites correlate remarkably well with a CG skew peak in the DNA sequences. Intergenic distances vary considerably; those between genes transcribed towards one another are significantly shorter. New transcripts that are missing from the current TIGR genome annotation, and ESTs that are non-coding, including those antisense to known genes, are derived and cataloged in the Supplementary Material. They identify 148 new loci in the Arabidopsis genome. The conclusions drawn provide a better understanding of the Arabidopsis genome and of how the gene transcripts are processed. The results also allow better predictions to be made for as yet poorly defined genes and provide a reference for comparisons with other plant genomes whose complete sequences are currently being determined. Some comparisons with rice are included in this paper.
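The CG skew signal mentioned in the abstract can be illustrated with a short sliding-window calculation. The sketch below is not the authors' procedure: it assumes the common definition skew = (C − G)/(C + G) per window, and the function names `cg_skew` and `skew_peak`, the window size, and the step are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def cg_skew(seq: str, window: int = 100, step: int = 10) -> List[Tuple[int, float]]:
    """Return (window_start, skew) pairs for a DNA sequence.

    Assumed definition: skew = (C - G) / (C + G) within each window.
    Windows containing neither C nor G are given a skew of 0.0.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        c = chunk.count("C")
        g = chunk.count("G")
        skew = (c - g) / (c + g) if (c + g) else 0.0
        profile.append((start, skew))
    return profile


def skew_peak(seq: str, window: int = 100, step: int = 10) -> Optional[Tuple[int, float]]:
    """Return the (window_start, skew) pair with the highest skew.

    Returns None when the sequence is shorter than one window.
    """
    profile = cg_skew(seq, window, step)
    if not profile:
        return None
    return max(profile, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy sequence: a C-rich block followed by a G-rich block.
    toy = "A" * 20 + "C" * 10 + "G" * 10 + "A" * 20
    print(skew_peak(toy, window=10, step=10))
```

To relate such a profile to transcription start sites, one would compute it over a region centered on each annotated start site and compare the peak position with the start coordinate; the window and step values would need tuning for real genomic data.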

Corresponding author

Correspondence to Nickolai N. Alexandrov.

About this article

Cite this article

Alexandrov, N.N., Troukhan, M.E., Brover, V.V. et al. Features of Arabidopsis Genes and Genome Discovered using Full-length cDNAs. Plant Mol Biol 60, 69–85 (2006). https://doi.org/10.1007/s11103-005-2564-9

  • DOI: https://doi.org/10.1007/s11103-005-2564-9
