BUSCO: Assessing Genome Assembly and Annotation Completeness

  • Mathieu Seppey
  • Mosè Manni
  • Evgeny M. Zdobnov
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1962)

Abstract

Genomics drives the current progress in molecular biology, generating unprecedented volumes of data. The scientific value of these sequences depends on the ability to evaluate their completeness using a biologically meaningful approach. Here, we describe the use of the BUSCO tool suite to assess the completeness of genomes, gene sets, and transcriptomes, using their gene content as a complementary method to common technical metrics. The chapter introduces the concept of universal single-copy genes, which underlies the BUSCO methodology, covers the basic requirements to set up the tool, and provides guidelines to properly design the analyses, run the assessments, and interpret and utilize the results.

Key words

BUSCO, Orthologs, Genome completeness, Quality assessment, Gene content, Phylogenomics

Notes

Acknowledgments

We would like to thank all members of the Zdobnov group, in particular Felipe Simão and Christopher Rands for their useful comments. This work was partly supported by the Swiss Institute of Bioinformatics SER funding and the Swiss National Science Foundation funding 31003A_166483 to E.Z.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Mathieu Seppey (1)
  • Mosè Manni (1)
  • Evgeny M. Zdobnov (1), corresponding author
  1. Department of Genetic Medicine and Development, Swiss Institute of Bioinformatics, University of Geneva Medical School, Geneva, Switzerland