Whole-Genome Annotation with BRAKER

  • Protocol
Gene Prediction

Part of the book series: Methods in Molecular Biology (MIMB, volume 1962)

Abstract

BRAKER is a pipeline for highly accurate and fully automated gene prediction in novel eukaryotic genomes. It combines two major tools: GeneMark-ES/ET and AUGUSTUS. GeneMark-ES/ET learns its parameters from a novel genomic sequence in a fully automated fashion; if extrinsic evidence is available, it uses that evidence for model refinement. From the protein-coding genes predicted by GeneMark-ES/ET, we select a set for training AUGUSTUS, one of the most accurate gene finding tools, which, in contrast to GeneMark-ES/ET, integrates extrinsic evidence directly into the gene prediction step. The first published version, BRAKER1, integrated genomic footprints of unassembled RNA-Seq reads into both the training and the prediction steps. The pipeline has since been extended to integrate data on mapped cross-species proteins and to use heterogeneous extrinsic evidence, that is, both RNA-Seq and protein alignments. In this book chapter, we briefly summarize the pipeline methodology and describe how to apply BRAKER when different combinations of external evidence are available.

The authors Alexandre Lomsadze, Mark Borodovsky, and Mario Stanke contributed equally.

Acknowledgements

This work is supported in part by the US National Institutes of Health grant HG000783 to MB, by the German Research Foundation grant 1009/12-1 to MS and by the US National Institutes of Health grant GM128145 to MB and MS.

Author information

Corresponding authors

Correspondence to Katharina J. Hoff or Mark Borodovsky.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Hoff, K.J., Lomsadze, A., Borodovsky, M., Stanke, M. (2019). Whole-Genome Annotation with BRAKER. In: Kollmar, M. (ed) Gene Prediction. Methods in Molecular Biology, vol 1962. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9173-0_5

  • DOI: https://doi.org/10.1007/978-1-4939-9173-0_5

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9172-3

  • Online ISBN: 978-1-4939-9173-0

  • eBook Packages: Springer Protocols
