Abstract
Multiple sequence alignment, as a means of comparing DNA, RNA, or amino acid sequences, is an essential prerequisite for a range of analyses, including structure prediction, binding-site modeling, phylogeny reconstruction, and function prediction. This range of applications creates a demand for versatile, flexible, and specialized methods that compute accurate alignments. This chapter summarizes the key algorithmic insights of recent years, both to make the current multiple sequence alignment literature easier to follow and to enable readers to apply current tools in their own research.
Keywords
- Dynamic Programming
- Multiple Sequence Alignment
- Pairwise Alignment
- Alignment Score
- Progressive Alignment
Copyright information
© 2010 Springer US
About this chapter
Cite this chapter
Rausch, T., Reinert, K. (2010). Practical Multiple Sequence Alignment. In: Heath, L., Ramakrishnan, N. (eds) Problem Solving Handbook in Computational Biology and Bioinformatics. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09760-2_2
DOI: https://doi.org/10.1007/978-0-387-09760-2_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09759-6
Online ISBN: 978-0-387-09760-2
eBook Packages: Computer Science (R0)