
Multiple Sequence Alignment with DIALIGN

  • Burkhard Morgenstern
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1079)

Abstract

DIALIGN is a software tool for multiple sequence alignment that combines global and local alignment features. It composes multiple alignments from local pairwise sequence similarities. This approach is particularly useful for discovering conserved functional regions in sequences that share only local homologies but are otherwise unrelated. An anchoring option lets users incorporate external information and expert knowledge in addition to primary-sequence similarity. The latest version of DIALIGN can optionally use matches to the Pfam database to detect weak homologies. Various versions of the program are available through the Göttingen Bioinformatics Compute Server (GOBICS) at http://www.gobics.de/department/software.
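The core idea of the abstract is to build an alignment from gap-free local similarities ("fragments") and, optionally, to force user-supplied anchors into the result. The following is a simplified sketch for the easiest case of two sequences. It is not DIALIGN's implementation. DIALIGN computes its own fragment weights and handles many sequences with a consistency-based procedure, whereas here the weights are just supplied as numbers and a plain O(n²) fragment-chaining dynamic program is used. The names `Fragment`, `best_chain` and `anchored_chain` are hypothetical, and the anchors are assumed to be mutually consistent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical gap-free local similarity: seq1[i:i+length] aligned to seq2[j:j+length].

    Assumes length >= 1; the weight is given as input (DIALIGN derives its own weights).
    """
    i: int
    j: int
    length: int
    weight: float


def precedes(f: Fragment, g: Fragment) -> bool:
    """True if f ends before g starts in both sequences, so both fit in one alignment."""
    return f.i + f.length <= g.i and f.j + f.length <= g.j


def compatible(f: Fragment, g: Fragment) -> bool:
    """Two fragments are consistent if one lies entirely before the other."""
    return precedes(f, g) or precedes(g, f)


def best_chain(fragments: List[Fragment]) -> List[Fragment]:
    """Maximum-weight chain of mutually consistent fragments (O(n^2) dynamic programming)."""
    frags = sorted(fragments, key=lambda f: (f.i, f.j))  # a predecessor always sorts earlier
    n = len(frags)
    if n == 0:
        return []
    score = [f.weight for f in frags]  # best chain weight ending at fragment k
    prev = [-1] * n                    # back-pointer to the preceding fragment in that chain
    for k in range(n):
        for m in range(k):
            if precedes(frags[m], frags[k]) and score[m] + frags[k].weight > score[k]:
                score[k] = score[m] + frags[k].weight
                prev[k] = m
    k = max(range(n), key=lambda x: score[x])
    chain = []
    while k != -1:
        chain.append(frags[k])
        k = prev[k]
    return chain[::-1]


def anchored_chain(fragments: List[Fragment], anchors: List[Fragment]) -> List[Fragment]:
    """Drop fragments inconsistent with any anchor, chain the rest, then add the anchors."""
    allowed = [f for f in fragments if all(compatible(f, a) for a in anchors)]
    return sorted(best_chain(allowed) + list(anchors), key=lambda f: (f.i, f.j))


if __name__ == "__main__":
    frags = [Fragment(0, 0, 5, 4.0), Fragment(3, 10, 4, 3.0),
             Fragment(6, 6, 4, 5.0), Fragment(12, 14, 3, 2.0)]
    print([(f.i, f.j) for f in best_chain(frags)])
    print([(f.i, f.j) for f in anchored_chain(frags, [Fragment(3, 10, 4, 0.0)])])
```

Re-adding the anchors after chaining is safe here. Every remaining fragment lies entirely before or entirely after each anchor, so each anchor slots between a prefix and a suffix of the chain. In the example, anchoring at (3, 10) discards the otherwise optimal fragments at (0, 0) and (6, 6). This illustrates how expert knowledge can override pure similarity scores.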

Key words

Motif discovery; Local alignment; Anchored alignment; Protein domain


Copyright information

© Springer Science+Business Media, LLC 2014

Authors and Affiliations

  • Burkhard Morgenstern, Abteilung für Bioinformatik (IMG), Universität Göttingen, Göttingen, Germany
