GeM-Pro: a tool for genome functional mining and microbial profiling

  • Mariano A. Torres Manno
  • María D. Pizarro
  • Marcos Prunello
  • Christian Magni
  • Lucas D. Daurelio
  • Martín Espariz
Genomics, transcriptomics, proteomics

Abstract

GeM-Pro is a new tool for gene mining and functional profiling of bacteria. It first identifies homologous genes using BLAST and then applies three filtering steps to select orthologous gene pairs. The first filter uses BLAST score values to identify trivial paralogs. The second uses the shared identity percentages of those trivial paralogs as internal witnesses of non-orthology to set orthology cutoff values. The third uses conditional probabilities of orthology and non-orthology to define new cutoffs and to generate supporting information for the orthology assignments. A subsidiary tool, q-GeM, was also developed to mine traits of interest using logistic regression (LR) or linear discriminant analysis (LDA) classifiers. q-GeM uses computing resources more efficiently than GeM-Pro but needs an initial classified set of homologous genes to train the LR and LDA classifiers. Hence, q-GeM could be used to analyze new sets of strains with available genome sequences without rerunning a complete GeM-Pro analysis. Finally, GeM-Pro and q-GeM perform a synteny analysis to evaluate the integrity and genomic arrangement of specific pathways of interest and thereby infer their presence. The tools were applied to more than 2 million homologous pairs encoded by Bacillus strains, generating statistically supported predictions of trait content. The resulting patterns of encoded traits of interest were successfully used to perform a descriptive bacterial profiling.
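
The idea behind the first two filters can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how trivial paralogs are detected or how the cutoff is derived from them, so the sketch assumes that, for each query gene and target genome, the hit with the highest BLAST bit score is the ortholog candidate and all lower-scoring hits are trivial paralogs, and that the cutoff is simply the highest identity seen among those paralogs. The hit records, gene names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical BLAST hits:
# (query_gene, target_genome, target_gene, bit_score, percent_identity)
HITS = [
    ("geneA", "strain1", "s1_001", 950.0, 98.2),
    ("geneA", "strain1", "s1_417", 210.0, 41.5),
    ("geneA", "strain2", "s2_033", 480.0, 62.0),
    ("geneA", "strain2", "s2_200", 190.0, 38.9),
    ("geneA", "strain3", "s3_090", 205.0, 40.1),
]


def split_candidates_and_paralogs(hits):
    """Assumed filter 1: per (query, genome), the top bit score is the
    ortholog candidate; every other hit is treated as a trivial paralog."""
    groups = defaultdict(list)
    for hit in hits:
        groups[(hit[0], hit[1])].append(hit)
    candidates, paralogs = [], []
    for group in groups.values():
        ranked = sorted(group, key=lambda h: h[3], reverse=True)
        candidates.append(ranked[0])
        paralogs.extend(ranked[1:])
    return candidates, paralogs


def identity_cutoff(paralogs):
    """Assumed filter 2: trivial paralogs are known non-orthologs, so the
    highest identity among them serves as the orthology cutoff.
    Returns 0.0 when no paralogs are available."""
    return max((h[4] for h in paralogs), default=0.0)


def select_orthologs(hits):
    """Keep only candidates whose identity exceeds the paralog-derived cutoff."""
    candidates, paralogs = split_candidates_and_paralogs(hits)
    cutoff = identity_cutoff(paralogs)
    return [h for h in candidates if h[4] > cutoff]


if __name__ == "__main__":
    for hit in select_orthologs(HITS):
        print(hit[0], "->", hit[1], hit[2], f"({hit[4]}% identity)")
```

In this toy data set the best hit in strain3 is no more similar to the query than the known non-orthologs found in strain1, so it is not reported as an ortholog even though it is the top BLAST hit in its genome.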

Keywords

Bacterial profiling; Gene mining; Orthology; Phylogenomic analysis; Plant growth–promoting rhizobacteria; Bacillus

Notes

Acknowledgements

MATM and MDP are CONICET fellows; CM, LD, and ME are researchers at the same institution. MP is a professor at UNR.

Funding information

This study was funded by Agencia Nacional de Promoción Científica y Tecnológica (PICT 2014-1513 to CM, PICT 2015-2361 to ME, PICT-2016-0426 to LD).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

253_2019_9648_MOESM1_ESM.xls (344 kb)
ESM 1 (XLS 344 kb)
253_2019_9648_MOESM2_ESM.pdf (821 kb)
ESM 2 (PDF 820 kb)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Laboratorio de Biotecnología e Inocuidad de los Alimentos, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
  2. Laboratorio de Genética y Fisiología de Bacterias Lácticas, Instituto de Biología Molecular y Celular de Rosario (IBR - CONICET), sede FCByF – UNR, Rosario, Argentina
  3. Laboratorio de Investigaciones en Fisiología y Biología Molecular Vegetal (LIFiBVe), Cátedra de Fisiología Vegetal, Facultad de Ciencias Agrarias, Universidad Nacional del Litoral, Esperanza, Argentina
  4. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  5. Área Estadística y Procesamiento de Datos, Departamento de Matemática y Estadística, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina