Abstract
Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. Two of these programs, GeneMark.hmm and GENSCAN, had been trained for maize; FGENESH had been trained for monocots (including maize), and the others had been trained for rice or Arabidopsis. Initial evaluations were conducted using eight maize genes (gl8a, pdc2, pdc3, rf2c, rf2d, rf2e1, rth1, and rth3) whose sequences had not been released to the public before this evaluation was conducted. The key advantage of this data set is that these genes could not have been included in the training sets of the prediction programs. FGENESH yielded the most accurate predictions, and GeneMark.hmm the second most accurate. The five programs were used in conjunction with RT-PCR to identify two new genes in the a1-sh2 interval of the maize genome and to establish their structures. FGENESH, GeneMark.hmm and GENSCAN were also tested on a larger data set (data set 2) consisting of 1353 maize assembled genomic islands (MAGIs) that had been aligned to ESTs. FGENESH, GeneMark.hmm and GENSCAN correctly predicted gene models in 773, 625, and 371 of these 1353 MAGIs, respectively.
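The MAGI counts above are easier to compare as proportions of the whole data set. The short Python sketch below simply divides each reported count by the 1353 MAGIs; the variable and function names are illustrative, and the resulting percentages are derived from the abstract's counts rather than quoted from the study.

```python
# Counts reported in the abstract: MAGIs with a correctly predicted gene model.
TOTAL_MAGIS = 1353
CORRECT = {"FGENESH": 773, "GeneMark.hmm": 625, "GENSCAN": 371}


def success_rates(correct, total):
    """Fraction of the data set for which each program's gene model was correct."""
    return {program: n / total for program, n in correct.items()}


rates = success_rates(CORRECT, TOTAL_MAGIS)
# Print programs from most to least successful.
for program, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{program}: {rate:.1%}")
```

Run as is, this prints 57.1% for FGENESH, 46.2% for GeneMark.hmm and 27.4% for GENSCAN, which preserves the ranking reported for the eight-gene data set.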
Abbreviations
- AE: actual exon
- CC: correlation coefficient
- FN: false negative
- FP: false positive
- GSSs: genome survey sequences
- HC: high Cot
- MAGIs: maize assembled genomic islands
- ME: missing exon
- MF: methylation filtration
- OE: overlapped exon
- PE: partial exon
- RACE: Rapid Amplification of cDNA Ends
- SN: sensitivity
- SP: specificity
- TE: true exon
- TP: true positive
- WE: wrong exon
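Several of these abbreviations name the quantities typically used to score gene predictions against known gene structures. The Python sketch below shows one way to compute them from exon coordinates. It assumes exons are 1-based, inclusive (start, end) pairs on a single strand, and it uses the common nucleotide-level definitions SN = TP/(TP+FN), SP = TP/(TP+FP) and the correlation coefficient CC. Exon-level SN and SP are scored as the fraction of actual and predicted exons, respectively, that are predicted exactly (TE). The rules used here for partial (PE), overlapped (OE), wrong (WE) and missing (ME) exons are illustrative assumptions, not necessarily the definitions used in this study, and all function names are hypothetical.

```python
from math import sqrt

# Exons are (start, end) pairs, 1-based and inclusive, on one strand (assumption).


def overlaps(a, b):
    """True if two exon intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_predicted(pred, actual):
    """Label each predicted exon TE, PE, OE or WE.

    Illustrative rules (assumed, not taken from the study):
    TE = identical to an actual exon; PE = overlaps an actual exon and shares
    one boundary with it; OE = overlaps an actual exon but shares no boundary;
    WE = overlaps no actual exon.
    """
    labels = []
    for p in pred:
        if p in actual:
            labels.append("TE")
        elif any(overlaps(p, a) and (p[0] == a[0] or p[1] == a[1]) for a in actual):
            labels.append("PE")
        elif any(overlaps(p, a) for a in actual):
            labels.append("OE")
        else:
            labels.append("WE")
    return labels


def exon_level(pred, actual):
    """Exon-level SN, SP, missing-exon (ME) and wrong-exon (WE) rates."""
    labels = classify_predicted(pred, actual)
    te = labels.count("TE")
    # An actual exon counts as missing if no predicted exon overlaps it (assumed rule).
    missing = sum(1 for a in actual if not any(overlaps(a, p) for p in pred))
    n_pred = len(pred)
    return {
        "SN": te / len(actual),
        "SP": te / n_pred if n_pred else 0.0,
        "ME": missing / len(actual),
        "WE": labels.count("WE") / n_pred if n_pred else 0.0,
    }


def coding_positions(exons):
    """Set of all base positions covered by the given exons."""
    return {i for start, end in exons for i in range(start, end + 1)}


def nucleotide_level(pred, actual, seq_len):
    """Nucleotide-level SN, SP and correlation coefficient (CC)."""
    p, a = coding_positions(pred), coding_positions(actual)
    tp = len(p & a)  # coding bases predicted as coding
    fp = len(p - a)  # non-coding bases predicted as coding
    fn = len(a - p)  # coding bases missed by the prediction
    tn = seq_len - tp - fp - fn  # non-coding bases predicted as non-coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "CC": cc}


# Toy example: three actual exons in a hypothetical 1000-bp sequence.
actual = [(101, 200), (301, 400), (501, 600)]
pred = [(101, 200), (301, 380), (700, 750)]
print(classify_predicted(pred, actual))
print(exon_level(pred, actual))
print(nucleotide_level(pred, actual, seq_len=1000))
```

In this toy case one prediction is exact, one shares only its start coordinate with an actual exon, and one lies outside every actual exon, so exon-level SN and SP are both 1/3 while nucleotide-level SN is 0.6. This gap is why gene-prediction evaluations usually report both levels.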
† These authors contributed equally to this work.
Cite this article
Yao, H., Guo, L., Fu, Y. et al. Evaluation of five ab initio gene prediction programs for the discovery of maize genes. Plant Mol Biol 57, 445–460 (2005). https://doi.org/10.1007/s11103-005-0271-1