Multi-Genome Annotation with AUGUSTUS
Comparing multiple related genomes can help to improve their structural annotation. The accuracy and consistency of the predicted exon–intron structures of the protein coding genes can be higher when considering all genomes at once rather than annotating one genome at a time.
The comparative gene prediction algorithm of AUGUSTUS performs such a multi-genome annotation. A multiple alignment of genomes is used to exploit evolutionary clues to conservation and negative selection. Further, AUGUSTUS exploits the fact that orthologous genes typically have congruent exon–intron structures. Comparative AUGUSTUS simultaneously predicts the genes in all input genomes. In this chapter we walk the reader through a small example from eight vertebrate species, including the construction of an alignment of the input genomes and how to integrate RNA-Seq evidence from multiple species for gene finding.
Key wordsComparative genomics Genome annotation Gene prediction Protein-coding genes AUGUSTUS RNA-Seq
This chapter is based on research that was funded partially by Deutsche Forschungsgemeinschaft grant STA 1009/10-1 to MS and by a scholarship of the Studienstiftung des deutschen Volkes to SN.
- 1.Stanke M, Waack S (2003) Gene prediction with a hidden Markov model and new intron submodel. Bioinformatics 19(Suppl 2):ii215–ii225Google Scholar
- 5.Hoff KJ, Stanke M (2018) Predicting genes in single genomes with AUGUSTUS. Curr Protoc Bioinf (.e57)Google Scholar
- 9.Nachtweide S (2018) The simultaneous identification of genes in related species. Doctoral thesisGoogle Scholar
- 13.Lilue J, Doran AG, Fiddes IT, Abrudan M, Armstrong J, Bennett R, Chow W, Collins J, Collins S, Czechanski A, Danecek P, Diekhans M, Dolle D-D, Dunn M, Durbin R, Earl D, Ferguson-Smith A, Flicek P, Flint J, Frankish A, Fu B, Gerstein M, Gilbert J, Goodstadt L, Harrow J, Howe K, Kolmogorov M, Koenig S, Lelliott C, Loveland J, Mott R, Muir P, Navarro F, Odom D, Park N, Pelan S, Phan SK, Quail M, Reinholdt L, Romoth L, Shirley L, Sisu C, Sjoberg-Herrera M, Stanke M, Steward C, Thomas M, Threadgold G, Thybert D, Torrance J, Wong K, Wood J, Yang F, Adams DJ, Paten B, Keane TM (2018) Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat Genet 50:1574–1583CrossRefGoogle Scholar
- 14.Fiddes IT, Armstrong J, Diekhans M, Nachtweide S, Kronenberg ZN, Underwood JG, Gordon D, Earl D, Keane T, Eichler EE, Haussler D, Stanke M, Paten B (2018) Comparative Annotation Toolkit (CAT) – simultaneous clade and personal genome annotation. Genome Res. https://doi.org/10.1101/gr.233460.117