Evaluation of five ab initio gene prediction programs for the discovery of maize genes

Plant Molecular Biology

Abstract

Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. Two of these programs, GeneMark.hmm and GENSCAN, had been trained for maize; FGENESH had been trained for monocots (including maize); and the others had been trained for rice or Arabidopsis. Initial evaluations were conducted using eight maize genes (gl8a, pdc2, pdc3, rf2c, rf2d, rf2e1, rth1, and rth3) whose sequences had not been released to the public before this evaluation was conducted. The significant advantage of this data set is that these genes could not have been included in the training sets of the prediction programs. FGENESH yielded the most accurate predictions and GeneMark.hmm the second most accurate. The five programs were used in conjunction with RT-PCR to identify and establish the structures of two new genes in the a1-sh2 interval of the maize genome. FGENESH, GeneMark.hmm and GENSCAN were then tested on a larger data set (data set 2) consisting of 1353 maize assembled genomic islands (MAGIs) that had been aligned to ESTs; they correctly predicted gene models in 773, 625, and 371 of these MAGIs, respectively.
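To put the data set 2 counts on a common scale, the short sketch below converts them to fractions of the 1353 MAGIs. The counts come from the abstract; the variable and function names are illustrative assumptions, not part of the original study.

```python
# Counts of MAGIs with a correctly predicted gene model, as reported in the abstract.
TOTAL_MAGIS = 1353
correct_models = {"FGENESH": 773, "GeneMark.hmm": 625, "GENSCAN": 371}


def fraction_correct(correct: int, total: int) -> float:
    """Return the fraction of MAGIs for which a program's gene model was correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Hypothetical helper structure: program name -> fraction of MAGIs predicted correctly.
fractions = {
    name: fraction_correct(count, TOTAL_MAGIS)
    for name, count in correct_models.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of {TOTAL_MAGIS} MAGIs")
```

On these figures FGENESH is correct for somewhat more than half of the MAGIs, and GENSCAN for a little over a quarter.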

Abbreviations

AE: actual exon
CC: correlation coefficient
FN: false negative
FP: false positive
GSSs: genome survey sequences
HC: high Cot
MAGIs: maize assembled genomic islands
ME: missing exon
MF: methylation filtration
OE: overlapped exon
PE: partial exon
RACE: Rapid Amplification of cDNA Ends
SN: sensitivity
SP: specificity
TE: true exon
TP: true positive
WE: wrong exon
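The abbreviations SN, SP, TP, FP and FN relate to one another through a convention widely used in gene-prediction evaluations: sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP). The sketch below assumes that convention, since this page does not show the article's own formulas. The counts in the usage example are hypothetical and are not results from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SN: fraction of actual features that were predicted, TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("TP + FN must be positive")
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """SP as conventionally used in gene finding: TP / (TP + FP).

    Note: this is the quantity that other fields call precision.
    """
    if tp + fp == 0:
        raise ValueError("TP + FP must be positive")
    return tp / (tp + fp)


# Hypothetical counts, for illustration only (not data from the article).
example_tp, example_fp, example_fn = 80, 20, 10

sn = sensitivity(example_tp, example_fn)
sp = specificity(example_tp, example_fp)
print(f"SN = {sn:.3f}, SP = {sp:.3f}")
```

Keeping the two measures separate shows whether a program tends to miss real features (low SN) or to predict spurious ones (low SP).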

Author information

Corresponding author

Correspondence to Patrick S. Schnable.

Additional information

these authors contributed equally to this work

About this article

Cite this article

Yao, H., Guo, L., Fu, Y. et al. Evaluation of five ab initio gene prediction programs for the discovery of maize genes. Plant Mol Biol 57, 445–460 (2005). https://doi.org/10.1007/s11103-005-0271-1

  • DOI: https://doi.org/10.1007/s11103-005-0271-1
