BUSCO: Assessing Genome Assembly and Annotation Completeness

  • Mathieu Seppey
  • Mosè Manni
  • Evgeny M. Zdobnov
Part of the Methods in Molecular Biology book series (MIMB, volume 1962)


Genomics drives the current progress in molecular biology, generating unprecedented volumes of data. The scientific value of these sequences depends on the ability to evaluate their completeness using a biologically meaningful approach. Here, we describe the use of the BUSCO tool suite to assess the completeness of genomes, gene sets, and transcriptomes, using their gene content as a complementary method to common technical metrics. The chapter introduces the concept of universal single-copy genes, which underlies the BUSCO methodology, covers the basic requirements to set up the tool, and provides guidelines to properly design the analyses, run the assessments, and interpret and utilize the results.

Key words

BUSCO, Orthologs, Genome completeness, Quality assessment, Gene content, Phylogenomics



We would like to thank all members of the Zdobnov group, in particular Felipe Simão and Christopher Rands for their useful comments. This work was partly supported by the Swiss Institute of Bioinformatics SER funding and the Swiss National Science Foundation funding 31003A_166483 to E.Z.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Mathieu Seppey (1)
  • Mosè Manni (1)
  • Evgeny M. Zdobnov (1)
  1. Department of Genetic Medicine and Development, Swiss Institute of Bioinformatics, University of Geneva Medical School, Geneva, Switzerland
