Eugène: An Eukaryotic Gene Finder That Combines Several Sources of Evidence

Schiex, Thomas; Moisan, Annick; Rouzé, Pierre

doi:10.1007/3-540-45727-5_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2066)

Included in the following conference series:

International Conference on Biology, Informatics, and Mathematics

Abstract

In this paper, we describe the basis of EuGène, a gene finder for eukaryotic organisms applied to Arabidopsis thaliana. What distinguishes EuGène from existing gene-finding software is that it has been designed to combine the output of several information sources, including the output of other software and user-supplied information. To achieve this, a weighted directed acyclic graph (DAG) is built in such a way that a shortest feasible path in this graph represents the most likely gene structure of the underlying DNA sequence.
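
As a minimal illustration of the shortest-path formulation described above, the Python sketch below runs the classical dynamic-programming recurrence over a small weighted DAG. The function name `dag_shortest_path`, the toy graph, and the convention that nodes are numbered in topological order are assumptions made for this example; none of them is taken from EuGène itself. If edge weights are read as negative log-probabilities (also an assumption here), the cheapest path corresponds to the most probable one.

```python
from math import inf


def dag_shortest_path(num_nodes, edges, source, target):
    """Shortest path in a weighted DAG by dynamic programming.

    Assumption for this sketch: nodes are numbered 0..num_nodes-1 in a
    topological order, so every edge (u, v, weight) satisfies u < v.
    Returns (cost, path); cost is inf and path is [] if target is unreachable.
    """
    outgoing = [[] for _ in range(num_nodes)]
    for u, v, weight in edges:
        if not 0 <= u < v < num_nodes:
            raise ValueError("edges must go from a lower to a higher node index")
        outgoing[u].append((v, weight))

    dist = [inf] * num_nodes
    pred = [None] * num_nodes
    dist[source] = 0.0

    # Visit nodes in topological order; each edge is relaxed exactly once,
    # so the running time is linear in the number of nodes plus edges.
    for u in range(num_nodes):
        if dist[u] == inf:
            continue
        for v, weight in outgoing[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                pred[v] = u

    if dist[target] == inf:
        return inf, []

    # Follow predecessor links back from the target to recover the path.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = pred[node]
    path.reverse()
    return dist[target], path


# Hypothetical toy graph: weights stand in for negative log-probabilities.
toy_edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (1, 3, 4.0), (2, 3, 1.0)]
cost, path = dag_shortest_path(4, toy_edges, source=0, target=3)
print(cost, path)
```

Because the nodes are processed once in topological order, no priority queue is needed, which is what makes the unconstrained problem linear in the size of the graph.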

The usual simple linear-time Bellman shortest-path algorithm for DAGs has been replaced by a shortest-path algorithm with constraints. The constraints express minimum lengths of introns or intergenic regions. The specific form of the constraints leads to an algorithm that is still linear in both time and space.

The effectiveness of EuGène has been assessed on Araset, a recent dataset of Arabidopsis thaliana sequences used to evaluate several existing gene-finding programs. It appears that, despite its simplicity, EuGène gives results that compare very favourably with those of existing software. We try to analyse the reasons for these results.
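
The abstract does not spell out the constrained algorithm, so the following Python sketch is only a toy illustration of how minimum-length constraints can be enforced while keeping the work linear in the sequence length. It labels each position of a sequence with a state and requires every run of a state to reach a minimum length. The function name `min_length_segmentation`, the two states, and the cost values are hypothetical, and the recurrence is a generic dynamic program, not the authors' method.

```python
from math import inf


def min_length_segmentation(costs, min_len):
    """Cheapest labelling of positions 0..n-1 with minimum run lengths.

    costs[s][i] is the cost of giving position i the state s, and
    min_len[s] >= 1 is the shortest allowed run of state s (applied to
    every run, including the first and last, in this simplified sketch).
    Returns (total_cost, labels), or (inf, []) if no labelling is feasible.
    """
    states = list(costs)
    n = len(costs[states[0]])
    if any(min_len[s] < 1 for s in states):
        raise ValueError("minimum lengths must be at least 1")

    # prefix[s][i] = sum of costs[s][0:i], so any run cost is O(1).
    prefix = {}
    for s in states:
        acc = [0.0]
        for c in costs[s]:
            acc.append(acc[-1] + c)
        prefix[s] = acc

    # best[s][i]: cheapest valid labelling of positions 0..i-1 ending in state s.
    best = {s: [inf] * (n + 1) for s in states}
    back = {s: [None] * (n + 1) for s in states}

    for i in range(1, n + 1):
        for s in states:
            # Option 1: extend a run of s that is already long enough.
            cand, choice = best[s][i - 1] + costs[s][i - 1], "extend"
            # Option 2: open a new run of s of exactly min_len[s] positions.
            start = i - min_len[s]
            if start >= 0:
                run_cost = prefix[s][i] - prefix[s][start]
                prev_state, prev_cost = None, 0.0
                if start > 0:
                    prev_cost = inf
                    for t in states:
                        if t != s and best[t][start] < prev_cost:
                            prev_state, prev_cost = t, best[t][start]
                if prev_cost + run_cost < cand:
                    cand, choice = prev_cost + run_cost, (prev_state, start)
            best[s][i], back[s][i] = cand, choice

    end_state = min(states, key=lambda s: best[s][n])
    if best[end_state][n] == inf:
        return inf, []

    # Trace the choices backwards to rebuild the labelling.
    labels = [None] * n
    s, i = end_state, n
    while i > 0:
        choice = back[s][i]
        if choice == "extend":
            labels[i - 1] = s
            i -= 1
        else:
            prev_state, start = choice
            labels[start:i] = [s] * (i - start)
            s, i = prev_state, start
    return best[end_state][n], labels


# Hypothetical costs: position 2 looks like an intron, everything else like exon.
toy_costs = {"exon": [0, 0, 5, 0, 0, 0], "intron": [5, 5, 0, 5, 5, 5]}
print(min_length_segmentation(toy_costs, {"exon": 1, "intron": 3}))
print(min_length_segmentation(toy_costs, {"exon": 1, "intron": 1}))
```

With a minimum intron length of 3, the one-position intron is no longer worth its cost and the whole toy sequence is labelled as exon; with a minimum of 1, the single intron position is kept. Each position is visited once per pair of states, so time and memory grow linearly with the sequence length for a fixed set of states.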

References

R. Bellman, Dynamic Programming, Princeton Univ. Press, Princeton, New Jersey, (1957).
V. Brendel and J. Kleffe, (1998), Prediction of locally optimal splice sites in plant pre-mRNA with application to gene identification in Arabidopsis thaliana genomic DNA, Nucleic Acids Res., 26, pp. 4749–4757.
C. Burge and S. Karlin, Apr 1997, Prediction of complete gene structures in human genomic DNA, J Mol Biol, 268, pp. 78–94.
T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to algorithms, MIT Press, (1990). ISBN: 0-262-03141-8.
L. Florea, G. Hartzell, Z. Zhang, G. Rubin, and W. Miller, Sept. 1998, A computer program for aligning a cDNA sequence with a genomic DNA sequence, Genome Res, 8, pp. 967–974.
X. Huang, M. Adams, H. Zhou, and A. Kerlavage, Nov 1997, A tool for analyzing and annotating genomic sequences, Genomics, 46, pp. 37–45.
P. Korning, S. Hebsgaard, P. Rouzé, and S. Brunak, (1996), Cleaning the GenBank Arabidopsis thaliana data set, Nucleic Acids Res., 24, pp. 316–320.
D. Kulp, D. Haussler, M. Reese, and F. Eeckman, (1997), Integrating database homology in a probabilistic gene structure model, in Pacific Symp. Biocomputing, pp. 232–244.
A. V. Lukashin and M. Borodovsky, (1998), GeneMark.hmm: new solutions for gene finding, Nucleic Acids Res., 26, pp. 1107–1115.
K. Murakami and T. Takagi, (1998), Gene recognition by combination of several gene-finding programs, Bioinformatics, 14, pp. 665–675.
N. Pavy, S. Rombauts, P. Déhais, C. Mathé, D. Ramana, P. Leroy, and P. Rouzé, Nov. 1999, Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences, Bioinformatics, 15, pp. 887–899. Also appeared in the Proc. of 2d Georgia Tech conference on BioInformatics.
A. Pedersen and H. Nielsen, (1997), Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis, in Proc. of ISMB’97, AAAI Press, pp. 226–233.
R. Guigó, Winter 1998, Assembling genes from predicted exons in linear time with dynamic programming, Journal of Computational Biology, 5, pp. 681–702.
L. R. Rabiner, (1989), A tutorial on hidden Markov models and selected applications in speech recognition, Proc. of the IEEE, 77, pp. 257–286.
I. Rogozin, L. Milanesi, and N. Kolchanov, Jun 1996, Gene structure prediction using information on homologous protein sequence, Comput. Appl. Biosci., 12, pp. 161–170.
S. L. Salzberg, A. L. Delcher, S. Kasif, and O. White, (1998), Microbial gene identification using interpolated Markov models, Nucleic Acids Res., 26, pp. 544–548.
E. Snyder and G. Stormo, DNA and protein sequence analysis: a practical approach, IRL Press, Oxford, (1995), ch. Identifying genes in genomic DNA sequences, pp. 209–224.
N. Tolstrup et al., (1997), A branch-point consensus from Arabidopsis found by non circular analysis allows for better prediction of acceptor sites, Nucleic Acids Res., 25, pp. 3159–3163.
J. Usuka, W. Zhu, and V. Brendel, (2000), Optimal spliced alignment of homologous cDNA to a genomic DNA template, Bioinformatics, 16, pp. 203–211.
T. D. Wu, (1996), A segment-based dynamic programming algorithm for predicting gene structure, Journal of Computational Biology, 3, pp. 375–394.

Author information

Authors and Affiliations

INRA, Toulouse, France
Thomas Schiex & Annick Moisan
INRA, Ghent, Belgium
Pierre Rouzé

Editor information

Editors and Affiliations

Laboratoire d’Informatique, de Robotique et de Microelectronique de Montpellier, 161 rue Ada, 34392, Montpellier Cedex 5, France
Olivier Gascuel
Laboratoire d’Algorithmique Combinatoire, Institut Pasteur, 28, rue du Dr. Roux, 75724, Paris Cedex 15, France
Marie-France Sagot

Cite this paper

Schiex, T., Moisan, A., Rouzé, P. (2001). Eugène: An Eukaryotic Gene Finder That Combines Several Sources of Evidence. In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_10

DOI: https://doi.org/10.1007/3-540-45727-5_10
Published: 28 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42242-6
Online ISBN: 978-3-540-45727-5
eBook Packages: Springer Book Archive
