POMAGO: Multiple Genome-Wide Alignment Tool for Bacteria

  • Nicolas Wieseke
  • Marcus Lechner
  • Marcus Ludwig
  • Manja Marz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7875)

Abstract

Multiple genome-wide alignments are a crucial first step in comparing genomes. Gain and loss of genes, duplications, and genomic rearrangements are challenging problems that become more severe with increasing phylogenetic distance. We describe a multiple genome-wide alignment tool for bacteria, called POMAGO, which is based on orthologous genes and their syntenic information as determined by Proteinortho. This strategy enables POMAGO to efficiently define anchor points even across wide phylogenetic distances and to outperform existing approaches in this field of application. The given set of orthologous genes is enhanced by several cleaning and completion steps, including the addition of previously undetected orthologous genes. Protein-coding genes are aligned on the nucleotide and protein level, whereas intergenic regions are aligned on the nucleotide level only. We tested and compared our program on three very different sets of bacteria that exhibit different degrees of phylogenetic distance: 1) 15 closely related, well examined and described E. coli species, 2) six more divergent Aquificales, as putative basal bacteria, and 3) a set of eight extremely divergent species, distributed across the whole phylogenetic tree of bacteria. POMAGO is written in a modular way, which allows algorithms in different stages of the alignment process to be extended or even exchanged. Intergenic regions might, for instance, be aligned using an RNA secondary structure aware algorithm rather than relying on sequence data alone. The software is freely available from http://www.rna.uni-jena.de/supplements/pomago
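The anchoring idea described in the abstract can be illustrated with a minimal sketch. This is not POMAGO's implementation: the input layout, the function names `core_anchors` and `inter_anchor_regions`, and the rule of keeping only ortholog groups that occur exactly once in every genome are assumptions made for illustration. Coordinates are assumed to be 0-based and half-open.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: each genome is a list of genes given as
# (ortholog_group_id, start, end). The group ids stand in for the output
# of an orthology detection step such as Proteinortho.
Gene = Tuple[str, int, int]


def core_anchors(genomes: Dict[str, List[Gene]]) -> List[str]:
    """Return ortholog groups that occur exactly once in every genome."""
    single_copy_hits: Dict[str, int] = {}
    for genes in genomes.values():
        per_genome: Dict[str, int] = {}
        for group, _, _ in genes:
            per_genome[group] = per_genome.get(group, 0) + 1
        for group, n in per_genome.items():
            if n == 1:  # skip groups duplicated within this genome
                single_copy_hits[group] = single_copy_hits.get(group, 0) + 1
    return sorted(g for g, c in single_copy_hits.items() if c == len(genomes))


def inter_anchor_regions(
    genes: List[Gene], anchors: List[str]
) -> List[Tuple[str, str, int, int]]:
    """Return (left_anchor, right_anchor, start, end) for each gap between
    consecutive anchor genes of one genome, ordered by position."""
    anchor_set = set(anchors)
    kept = sorted((g for g in genes if g[0] in anchor_set), key=lambda g: g[1])
    regions = []
    for left, right in zip(kept, kept[1:]):
        if right[1] > left[2]:  # ignore overlapping or adjacent genes
            regions.append((left[0], right[0], left[2], right[1]))
    return regions
```

In such a scheme, the anchor genes would be passed to a gene-level aligner and the regions between them to a nucleotide-level aligner. Real data would additionally require handling of strand, rearrangements, and circular chromosomes, which this sketch omits.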

Keywords

Multiple Genome Alignment, Synteny, Annotation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Nicolas Wieseke (1)
  • Marcus Lechner (2)
  • Marcus Ludwig (3)
  • Manja Marz (3)
  1. Faculty of Mathematics and Computer Science, University of Leipzig, Leipzig, Germany
  2. Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany
  3. Faculty of Mathematics and Computer Science, Friedrich-Schiller-University Jena, Jena, Germany
