Computational Gene Annotation in New Genome Assemblies Using GeneID

  • Protocol
Bioinformatics for DNA Sequence Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 537)

Abstract

The sequences of many eukaryotic genomes are now available from a personal computer to any researcher in the worldwide scientific community. However, these sequences are worthless without adequate annotation of their biologically meaningful elements. The annotation of genes, in particular, is a challenging task that cannot be tackled without the aid of specific bioinformatics tools. In this chapter we present a simple protocol, based mainly on the program GeneID in combination with other computational tools, to locate in the recently assembled genome of D. yakuba a gene that was previously annotated in D. melanogaster.



Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Blanco, E., Abril, J.F. (2009). Computational Gene Annotation in New Genome Assemblies Using GeneID. In: Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_12

  • DOI: https://doi.org/10.1007/978-1-59745-251-9_12

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-910-9

  • Online ISBN: 978-1-59745-251-9

  • eBook Packages: Springer Protocols
