Abstract
GeM-Pro is a new tool for gene mining and functional profiling of bacteria. It first identifies homologous genes using BLAST and then applies three filtering steps to select orthologous gene pairs. The first filter uses BLAST score values to identify trivial paralogs. The second filter uses the shared identity percentages of the trivial paralogs found as internal witnesses of non-orthology to set orthology cutoff values. The third filter uses conditional probabilities of orthology and non-orthology to define new cutoffs and to generate supporting information for the orthology assignments. Additionally, a subsidiary tool, q-GeM, was developed to mine traits of interest using logistic regression (LR) or linear discriminant analysis (LDA) classifiers. q-GeM uses computing resources more efficiently than GeM-Pro but needs an initial classified set of homologous genes to train the LR and LDA classifiers. Hence, q-GeM can be used to analyze new sets of strains with available genome sequences without rerunning a complete GeM-Pro analysis. Finally, GeM-Pro and q-GeM perform a synteny analysis to evaluate the integrity and genomic arrangement of specific pathways of interest and thereby infer their presence. The tools were applied to more than 2 million homologous pairs encoded by Bacillus strains, generating statistically supported predictions of trait content. The different patterns of encoded traits of interest were successfully used to perform a descriptive bacterial profiling.
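The first two filtering steps can be pictured with a small sketch. The abstract does not give the exact rules, so everything below is an illustrative assumption and not the published GeM-Pro implementation: the `Hit` record, the function names, the rule that lower-scoring hits in the same target genome count as trivial paralogs, and the choice of the highest paralog identity as the cutoff are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record layout)."""
    query: str       # gene of interest in the reference genome
    subject: str     # homologous gene reported by BLAST
    genome: str      # genome that encodes the subject gene
    identity: float  # shared identity percentage
    bitscore: float  # BLAST bit score


def select_candidates(hits):
    """Assumed filter 1: per (query, genome), the top-scoring hit is the
    ortholog candidate; the remaining hits are treated as trivial paralogs."""
    by_pair = defaultdict(list)
    for hit in hits:
        by_pair[(hit.query, hit.genome)].append(hit)
    candidates, paralogs = [], []
    for group in by_pair.values():
        ranked = sorted(group, key=lambda h: h.bitscore, reverse=True)
        candidates.append(ranked[0])
        paralogs.extend(ranked[1:])
    return candidates, paralogs


def identity_cutoff(paralogs, default=0.0):
    """Assumed filter 2: the highest identity seen among trivial paralogs
    (known non-orthologs) becomes the orthology cutoff."""
    return max((h.identity for h in paralogs), default=default)


def filter_orthologs(hits):
    """Keep only candidates whose identity exceeds the paralog-derived cutoff."""
    candidates, paralogs = select_candidates(hits)
    cutoff = identity_cutoff(paralogs)
    kept = [h for h in candidates if h.identity > cutoff]
    return kept, cutoff


# Made-up example data, for illustration only.
EXAMPLE_HITS = [
    Hit("geneA", "s1_0001", "strain1", 98.0, 500.0),
    Hit("geneA", "s1_0456", "strain1", 45.0, 120.0),  # lower score, same genome
    Hit("geneA", "s2_0007", "strain2", 40.0, 100.0),  # weak hit in another genome
    Hit("geneB", "s1_0100", "strain1", 91.0, 420.0),
]

if __name__ == "__main__":
    kept, cutoff = filter_orthologs(EXAMPLE_HITS)
    print("cutoff:", cutoff)
    for hit in kept:
        print(hit.query, "->", hit.subject, hit.identity)
```

In this toy data the weak strain2 hit is discarded because its identity falls below that of a known non-ortholog. The third, probability-based filter and the LR/LDA classifiers of q-GeM are not sketched here, since the abstract gives too little detail to do so without guessing.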
Acknowledgements
MATM and MDP are CONICET fellows; CM, LD, and ME are researchers at the same institution. MP is a professor at UNR.
Funding
This study was funded by Agencia Nacional de Promoción Científica y Tecnológica (PICT 2014-1513 to CM, PICT 2015-2361 to ME, PICT-2016-0426 to LD).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Torres Manno, M.A., Pizarro, M.D., Prunello, M. et al. GeM-Pro: a tool for genome functional mining and microbial profiling. Appl Microbiol Biotechnol 103, 3123–3134 (2019). https://doi.org/10.1007/s00253-019-09648-8
DOI: https://doi.org/10.1007/s00253-019-09648-8