Abstract
Mobile genetic elements (MGEs) are an important feature of prokaryotic genomes but are seldom well annotated and, consequently, are often underestimated. MGEs include transposons (Tn), insertion sequences (ISs), prophages, genomic islands (GEIs), integrons, and integrative and conjugative elements (ICEs). They are intimately involved in genome evolution and promote phenomena such as genomic expansion and rearrangement, the emergence of virulence and pathogenicity, and symbiosis. Despite this annotation bottleneck, at least 75 different programs and databases are currently dedicated to prokaryotic MGE analysis and annotation, and this number is growing rapidly. Here, we present a practical guide to exploring, comparing, and visualizing prokaryotic MGEs using a combination of available software and databases tailored to small-scale genome analyses. This protocol can be coupled with expert MGE annotation and exploited for evolutionary and comparative genomic analyses.
Acknowledgments
This work was supported by a project from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES-BIGA, number 3385/2013). DOA was supported by a postdoctoral research fellowship from the São Paulo Research Foundation (FAPESP n° 2015/14600-5).
Copyright information
© 2018 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Oliveira Alvarenga, D., Moreira, L.M., Chandler, M., Varani, A.M. (2018). A Practical Guide for Comparative Genomics of Mobile Genetic Elements in Prokaryotic Genomes. In: Setubal, J., Stoye, J., Stadler, P. (eds) Comparative Genomics. Methods in Molecular Biology, vol 1704. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7463-4_7
DOI: https://doi.org/10.1007/978-1-4939-7463-4_7
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7461-0
Online ISBN: 978-1-4939-7463-4
eBook Packages: Springer Protocols