Journal of Computer Science and Technology, Volume 20, Issue 4, pp 446–453

Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

  • Heng Li
  • Jin-Song Liu
  • Zhao Xu
  • Jiao Jin
  • Lin Fang
  • Lei Gao
  • Yu-Dong Li
  • Zi-Xing Xing
  • Shao-Gen Gao
  • Tao Liu
  • Hai-Hong Li
  • Yan Li
  • Li-Jun Fang
  • Hui-Min Xie
  • Wei-Mou Zheng
  • Bai-Lin Hao
Regular Paper

Abstract

With several rice genome projects approaching completion, gene prediction (gene finding) by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNAs to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences containing 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared at the nucleotide, exon and whole-gene-structure levels using commonly accepted measures and several new measures. The test results show that performance improves in the chronological order in which the programs appeared. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
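To make the three comparison levels concrete, the following is a minimal Python sketch of how sensitivity and specificity are commonly computed in gene-finder evaluations. It is not the authors' evaluation code, and it does not cover the new measures mentioned in the abstract, which are not described here. The function names, the representation of a gene as a list of 1-based inclusive (start, end) coding exons, and the toy coordinates are all assumptions made for illustration. Following the usual convention in this field, "specificity" means the fraction of predicted items that are correct.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an exon is a 1-based inclusive (start, end) pair.
Exon = Tuple[int, int]


def coding_positions(exons: List[Exon]) -> Set[int]:
    """Expand a list of exons into the set of coding nucleotide positions."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Nucleotide level: Sn = TP / (TP + FN), Sp = TP / (TP + FP)."""
    true_pos = coding_positions(annotated)
    pred_pos = coding_positions(predicted)
    tp = len(true_pos & pred_pos)
    sn = tp / len(true_pos) if true_pos else 0.0
    sp = tp / len(pred_pos) if pred_pos else 0.0
    return sn, sp


def exon_accuracy(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Exon level: an exon counts as correct only if both boundaries match."""
    true_set, pred_set = set(annotated), set(predicted)
    exact = len(true_set & pred_set)
    sn = exact / len(true_set) if true_set else 0.0
    sp = exact / len(pred_set) if pred_set else 0.0
    return sn, sp


def gene_is_correct(annotated: List[Exon], predicted: List[Exon]) -> bool:
    """Whole-gene level: every exon must be predicted exactly, with no extras."""
    return sorted(annotated) == sorted(predicted)


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    annotated = [(101, 200), (301, 400)]
    predicted = [(101, 200), (311, 420)]
    print(nucleotide_accuracy(annotated, predicted))
    print(exon_accuracy(annotated, predicted))
    print(gene_is_correct(annotated, predicted))
```

In the toy case the second predicted exon overlaps the annotated one but misses both boundaries, so the prediction scores well at the nucleotide level, only half at the exon level, and fails at the whole-gene level. This is why evaluations report all three levels rather than one.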

Keywords

gene prediction; rice genome; test sets; accuracy measures; hidden Markov models; dynamic programming

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Heng Li (1, 2)
  • Jin-Song Liu (1)
  • Zhao Xu (1)
  • Jiao Jin (1, 3)
  • Lin Fang (1)
  • Lei Gao (1, 2)
  • Yu-Dong Li (1)
  • Zi-Xing Xing (1, 3)
  • Shao-Gen Gao (1, 4)
  • Tao Liu (1)
  • Hai-Hong Li (1)
  • Yan Li (5)
  • Li-Jun Fang (5)
  • Hui-Min Xie (6)
  • Wei-Mou Zheng (1, 2)
  • Bai-Lin Hao (2, 5, 7)

  1. Beijing Genomics Institute (BGI), Academia Sinica, Beijing, P.R. China
  2. Institute of Theoretical Physics, Academia Sinica, Beijing, P.R. China
  3. Department of Mathematics, Beijing Normal University, Beijing, P.R. China
  4. Institute of Systems Science, Academia Sinica, Beijing, P.R. China
  5. Hangzhou Branch of BGI, Hangzhou, P.R. China
  6. Department of Mathematics, Suzhou University, Suzhou, P.R. China
  7. T-Life Research Center, Fudan University, Shanghai, P.R. China
