Abstract
The wide availability of fast, cheap, high-throughput next-generation sequencing techniques has resulted in the acquisition of numerous de novo sequenced and assembled bacterial genomes. It rapidly became clear that extracting useful biological information from such a huge amount of data presents a considerable challenge. In this chapter we share our experience with several handy open-source comparative genomics tools. All of them were applied in studies focused on revealing inter- and intraspecies variation in the pectinolytic plant pathogenic bacteria Dickeya solani and Pectobacterium parmentieri. As the described software performed well on species within the Pectobacteriaceae family, it can presumably be readily applied to closely related taxa from the Enterobacteriaceae family. First, the use of various annotation software is discussed and compared. Then, tools for whole-genome comparison are presented, including generation of circular juxtapositions of multiple sequences, detection of the order of synteny blocks, and calculation of ANI or Tetra values. In addition, web servers intended either for functional annotation of genes of interest or for detection of genomic islands, plasmids, prophages, and CRISPR/Cas systems are described. Finally, the use of software designed for pangenome studies and further downstream analyses is explained. The presented work not only summarizes the broad possibilities offered by the comparative genomic approach but also provides a user-friendly guide that can be easily followed by non-bioinformaticians interested in undertaking similar studies.
Acknowledgments
The sequencing and comparative genomics analyses were funded by the National Science Centre in Poland via grant 2014/14/M/NZ8/00501 awarded to EL. The work of AMP is currently supported by the National Science Centre in Poland via grant 2016/21/N/NZ1/02783. The authors are very grateful to Dr. Michal Kabza for providing the Python script for genome reorientation.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zoledowska, S., Motyka-Pomagruk, A., Misztak, A., Lojkowska, E. (2021). Comparative Genomics, from the Annotated Genome to Valuable Biological Information: A Case Study. In: Mengoni, A., Bacci, G., Fondi, M. (eds) Bacterial Pangenomics. Methods in Molecular Biology, vol 2242. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1099-2_7
DOI: https://doi.org/10.1007/978-1-0716-1099-2_7
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1098-5
Online ISBN: 978-1-0716-1099-2
eBook Packages: Springer Protocols