Eugène: An Eukaryotic Gene Finder That Combines Several Sources of Evidence

  • Conference paper
Computational Biology (JOBIM 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2066)

Abstract

In this paper, we describe the basis of EuGène, a gene finder for eukaryotic organisms, applied here to Arabidopsis thaliana. Compared with existing gene-finding software, the distinctive feature of EuGène is that it has been designed to combine the output of several information sources, including the output of other software and information supplied by the user. To achieve this, a weighted directed acyclic graph (DAG) is built in such a way that a shortest feasible path in this graph represents the most likely gene structure of the underlying DNA sequence.
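To illustrate how a most likely structure can be read off as a shortest path, the sketch below relaxes each edge of a weighted DAG exactly once in topological order, the classical linear-time dynamic programming approach. It is a generic sketch, not EuGène's code: the function name `dag_shortest_path`, the assumption that vertices are numbered 0..n-1 in topological order (for instance, by sequence position), and the idea that edge weights stand for negative log-likelihoods of sequence features are all illustrative assumptions.

```python
from math import inf


def dag_shortest_path(n, edges, source, target):
    """Shortest path in a DAG whose vertices 0..n-1 are numbered in
    topological order, so every edge (u, v, w) satisfies u < v.

    Hypothetical helper: each edge is relaxed exactly once, giving
    O(n + len(edges)) time and space.
    """
    out = [[] for _ in range(n)]
    for u, v, w in edges:
        if not 0 <= u < v < n:
            raise ValueError("edge must point forward in topological order")
        out[u].append((v, w))

    dist = [inf] * n
    pred = [None] * n
    dist[source] = 0.0
    for u in range(n):  # visit vertices in topological order
        if dist[u] == inf:
            continue  # not reachable from source
        for v, w in out[u]:
            if dist[u] + w < dist[v]:  # relax edge (u, v)
                dist[v] = dist[u] + w
                pred[v] = u

    if dist[target] == inf:
        return inf, []
    # Walk predecessor links back from target to recover the path.
    path, v = [], target
    while v is not None:
        path.append(v)
        v = pred[v]
    return dist[target], path[::-1]
```

For example, with `edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (2, 3, 1.0), (1, 3, 4.0)]`, the call `dag_shortest_path(4, edges, 0, 3)` returns `(4.0, [0, 1, 2, 3])`.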

The usual linear-time Bellman shortest path algorithm for DAGs has been replaced by a shortest path algorithm with constraints. The constraints express minimum lengths for introns and intergenic regions. Thanks to the particular form of these constraints, the resulting algorithm is still linear in both time and space.

The effectiveness of EuGène has been assessed on Araset, a recent dataset of Arabidopsis thaliana sequences used to evaluate several existing gene-finding programs. It appears that, despite its simplicity, EuGène gives results that compare very favourably with existing software. We try to analyse the reasons for these results.
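One way to see how minimum-length constraints can coexist with linear running time is a segment-style dynamic program over sequence positions: a run of a state either extends an already long-enough run by one position, or opens with a block of exactly the minimum length, whose cost comes from prefix sums in O(1). This is a sketch of that general technique, not necessarily the algorithm used in EuGène; the function name, the per-position `costs`, the constant `switch_cost` between runs, and reading states as, say, "coding" and "intron" are all hypothetical.

```python
from math import inf


def constrained_segmentation(costs, min_len, switch_cost):
    """Minimum-cost labelling of positions 0..n-1 with states 0..S-1 where
    every maximal run of state s is at least min_len[s] positions long.

    costs[s][i] is a hypothetical cost of giving position i state s.
    Time O(n * S^2), space O(n * S): linear in n for a fixed number of states.
    """
    if any(m < 1 for m in min_len):
        raise ValueError("minimum lengths must be >= 1")
    S, n = len(costs), len(costs[0])
    # prefix[s][i] = costs[s][0] + ... + costs[s][i-1], so block sums are O(1)
    prefix = [[0.0] * (n + 1) for _ in range(S)]
    for s in range(S):
        for i in range(n):
            prefix[s][i + 1] = prefix[s][i] + costs[s][i]

    # best[s][i]: cheapest labelling of positions 0..i-1 whose last run has
    # state s and already satisfies min_len[s]; back[s][i] records the move.
    best = [[inf] * (n + 1) for _ in range(S)]
    back = [[None] * (n + 1) for _ in range(S)]
    for i in range(1, n + 1):
        for s in range(S):
            # Move 1: extend a valid run of s by one position.
            if best[s][i - 1] < inf:
                cand = best[s][i - 1] + costs[s][i - 1]
                if cand < best[s][i]:
                    best[s][i], back[s][i] = cand, ("extend", s)
            # Move 2: open a run of s covering exactly min_len[s] positions.
            j = i - min_len[s]
            if j < 0:
                continue
            block = prefix[s][i] - prefix[s][j]
            if j == 0:  # first run of the sequence, no switch penalty
                if block < best[s][i]:
                    best[s][i], back[s][i] = block, ("start", None)
                continue
            for t in range(S):
                if t != s and best[t][j] < inf:
                    cand = best[t][j] + switch_cost + block
                    if cand < best[s][i]:
                        best[s][i], back[s][i] = cand, ("switch", t)

    end_state = min(range(S), key=lambda s: best[s][n])
    total = best[end_state][n]
    if total == inf:
        return inf, []
    # Trace back through the recorded moves to rebuild the labelling.
    labels = [None] * n
    s, i = end_state, n
    while i > 0:
        kind, prev = back[s][i]
        if kind == "extend":
            labels[i - 1] = s
            i -= 1
        else:
            j = i - min_len[s]
            for k in range(j, i):
                labels[k] = s
            i = j
            if kind == "switch":
                s = prev
    return total, labels
```

With `costs = [[0, 0, 3, 0, 0], [5, 5, 0, 5, 5]]` and `switch_cost = 1.0`, setting `min_len = [1, 1]` yields cost 2.0 with a one-position run of state 1 at position 2. Setting `min_len = [1, 3]` forbids that short run and yields cost 3.0 with every position in state 0.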

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schiex, T., Moisan, A., Rouzé, P. (2001). Eugène: An Eukaryotic Gene Finder That Combines Several Sources of Evidence. In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_10

  • DOI: https://doi.org/10.1007/3-540-45727-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42242-6

  • Online ISBN: 978-3-540-45727-5

  • eBook Packages: Springer Book Archive