Multi-Genome Annotation with AUGUSTUS

  • Stefanie Nachtweide
  • Mario Stanke
Part of the Methods in Molecular Biology book series (MIMB, volume 1962)


Comparing multiple related genomes can improve their structural annotation. The accuracy and consistency of the predicted exon–intron structures of protein-coding genes can be higher when all genomes are considered at once rather than annotated one at a time.

The comparative gene prediction algorithm of AUGUSTUS performs such a multi-genome annotation. It uses a multiple alignment of the genomes to exploit evolutionary clues to conservation and negative selection. Further, AUGUSTUS exploits the fact that orthologous genes typically have congruent exon–intron structures. Comparative AUGUSTUS predicts the genes in all input genomes simultaneously. In this chapter, we walk the reader through a small example from eight vertebrate species, including the construction of an alignment of the input genomes and the integration of RNA-Seq evidence from multiple species for gene finding.

Key words

Comparative genomics, Genome annotation, Gene prediction, Protein-coding genes, AUGUSTUS, RNA-Seq



This chapter is based on research that was funded partially by Deutsche Forschungsgemeinschaft grant STA 1009/10-1 to MS and by a scholarship of the Studienstiftung des deutschen Volkes to SN.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
