Plant Molecular Biology, Volume 48, Issue 1–2, pp 49–58

Computational modeling of gene structure in Arabidopsis thaliana

  • Volker Brendel
  • Wei Zhu


Computational gene identification by sequence inspection remains a challenging problem. For a typical Arabidopsis thaliana gene with five exons, at least one of the exons is expected to have at least one of its borders predicted incorrectly by ab initio gene finding programs. More detailed analysis of individual genomic loci can often resolve the uncertainty on the basis of EST evidence or similarity to potential protein homologues. Such methods are part of the routine annotation process. However, because the EST and protein databases are constantly growing, in many cases the original annotation must be re-evaluated, extended, and corrected on the basis of the latest evidence. The Arabidopsis Genome Initiative is undertaking this task on the whole-genome scale via its participating genome centers. The current Arabidopsis genome annotation provides an excellent starting point for assessing the protein repertoire of a flowering plant. More accurate whole-genome annotation will require a combination of high-throughput and individual-gene experimental approaches with computational methods. The purpose of this article is to discuss the tools available to an individual researcher for evaluating gene structure prediction at a particular locus.

Keywords: EST analysis; gene prediction; spliced alignment





Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Volker Brendel (1)
  • Wei Zhu (1)
  1. Department of Zoology & Genetics and Department of Statistics, Iowa State University, Ames, USA
