Bioinformatics Approaches and Software for Detection of Secondary Metabolic Gene Clusters

  • Natalie D. Fedorova
  • Venkatesh Moktali
  • Marnix H. Medema
Part of the Methods in Molecular Biology book series (MIMB, volume 944)


The accelerating pace of microbial genomics is sparking a renaissance in natural products research. Researchers can now preview an organism's secondary metabolome by analyzing its genome sequence. Combined with other -omics data, this approach may offer a cost-effective alternative to industrial high-throughput screening in drug discovery. In the last few years, several computational tools have been developed to support this process by identifying genes involved in secondary metabolite biosynthesis in bacterial and fungal genomes. Here, we review seven software programs available for this purpose. We emphasize antibiotics & Secondary Metabolite Analysis SHell (antiSMASH) and Secondary Metabolite Unknown Regions Finder (SMURF), the only tools that can comprehensively detect complete secondary metabolite biosynthesis gene clusters. We also discuss five related software packages that identify secondary metabolite backbone biosynthesis genes: CLUster SEquence ANalyzer (CLUSEAN), ClustScan, Structure Based Sequence Analysis of Polyketide Synthases (SBSPKS), NRPSPredictor, and Natural Product searcher (NP.searcher). This chapter offers detailed protocols, suggestions, and caveats to help researchers use these tools most effectively.
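The distinction above between finding backbone genes and detecting complete clusters can be illustrated with a toy example. The sketch below is a hypothetical, simplified heuristic and not the algorithm used by antiSMASH, SMURF, or any other tool reviewed here. It assumes each gene already carries domain labels from a profile HMM search, treats a gene as a "backbone" gene if it carries one of a few illustrative labels (`BACKBONE_DOMAINS` is an assumption), and then delimits a candidate cluster by extending a fixed flank (`flank_bp`, also an assumed parameter) around each backbone gene and merging overlapping windows. Step one alone corresponds to backbone-gene detection; the boundary step is what cluster-level tools add, using far more elaborate rules.

```python
from dataclasses import dataclass

# Hypothetical backbone domain labels. Real tools rely on curated
# Pfam/TIGRFAM profile HMMs, not on a short list of string labels.
BACKBONE_DOMAINS = {"PKS_KS", "NRPS_A", "DMATS", "Terpene_synth"}


@dataclass
class Gene:
    name: str
    start: int          # genomic start coordinate (bp)
    end: int            # genomic end coordinate (bp)
    domains: frozenset  # domain labels assumed to come from an HMM search


def is_backbone(gene: Gene) -> bool:
    """Step 1: flag genes carrying at least one backbone domain."""
    return bool(gene.domains & BACKBONE_DOMAINS)


def call_clusters(genes, flank_bp=10_000):
    """Step 2 (toy boundary rule): extend each backbone gene by flank_bp
    on both sides, merge overlapping windows, and return the names of
    all genes overlapping each merged window."""
    genes = sorted(genes, key=lambda g: g.start)
    windows = []
    for g in genes:
        if is_backbone(g):
            lo, hi = g.start - flank_bp, g.end + flank_bp
            if windows and lo <= windows[-1][1]:
                # Overlaps the previous window: merge the two.
                windows[-1][1] = max(windows[-1][1], hi)
            else:
                windows.append([lo, hi])
    return [
        [g.name for g in genes if g.end >= lo and g.start <= hi]
        for lo, hi in windows
    ]


if __name__ == "__main__":
    demo = [
        Gene("g1", 0, 1_000, frozenset()),
        Gene("g2", 2_000, 8_000, frozenset({"PKS_KS"})),
        Gene("g3", 9_000, 10_000, frozenset({"P450"})),
        Gene("g4", 30_000, 31_000, frozenset()),
        Gene("g5", 50_000, 52_000, frozenset({"NRPS_A"})),
    ]
    print(call_clusters(demo, flank_bp=5_000))
```

With a 5 kb flank, the demo prints two candidate clusters, `[['g1', 'g2', 'g3'], ['g5']]`. A fixed flank is the simplest possible boundary rule and tends to over- or under-call cluster edges, which is one reason dedicated tools use richer evidence to decide where a cluster ends.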

Key words

Fungi, Genome, Gene cluster, Secondary metabolite, Mycotoxin, Polyketide, Nonribosomal peptide, Natural product, Antibiotic, Software


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Natalie D. Fedorova (1)
  • Venkatesh Moktali (1)
  • Marnix H. Medema (2)
  1. The J. Craig Venter Institute, Rockville, USA
  2. Groningen Bioinformatics Centre and Department of Microbial Physiology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
