EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes

Part of the Methods in Molecular Biology book series (MIMB, volume 1962)

Abstract

EuGene is an integrative gene finder applicable to both prokaryotic and eukaryotic genomes. EuGene annotated its first genome in 1999. Starting from genomic DNA sequences representing a complete genome, EuGene is able to predict the major transcript units in the genome from a variety of sources of information: statistical information, similarities with known transcripts and proteins, and any GFF3-structured information supporting the presence or absence of specific types of elements. EuGene has been used to find genes in the plants Arabidopsis thaliana, Medicago truncatula, and Theobroma cacao; in the tomato, sunflower, and Rosa genomes; and in the nematode Meloidogyne incognita genome, among many others. The large fraction of plants in this list probably influenced EuGene development, especially its capacity to cope with genomes containing large numbers of repeated regions and transposable elements.

Depending on the sources of information used for prediction, EuGene can be considered purely ab initio, purely similarity based, or hybrid. With the general availability of NGS-transcribed sequence data in genome projects, EuGene adopts a default hybrid behavior that relies strongly on similarity information. Initially targeted at eukaryotic genomes, EuGene has also been extended to offer integrative gene prediction for bacteria, allowing for richer and more robust predictions than either purely statistical or homology-based prokaryotic gene finders.

This text is written as a practical guide that should enable you to train and run EuGene on your favorite eukaryotic genome. As the prokaryotic case is simpler and has already been described, only the main differences from the eukaryotic version are reported.

Key words

  • Integrative gene finder
  • Prokaryotic and eukaryotic genomes
  • Protein-coding genes
  • Non-coding genes
  • EuGene

Chapter length: 24 pages

Author information

Corresponding author

Correspondence to Erika Sallet.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Sallet, E., Gouzy, J., Schiex, T. (2019). EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes. In: Kollmar, M. (eds) Gene Prediction. Methods in Molecular Biology, vol 1962. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9173-0_6

  • DOI: https://doi.org/10.1007/978-1-4939-9173-0_6

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9172-3

  • Online ISBN: 978-1-4939-9173-0

  • eBook Packages: Springer Protocols