Skip to main content

Protein Multiple Sequence Alignment

  • Protocol
Functional Proteomics

Part of the book series: Methods In Molecular Biology™ ((MIMB,volume 484))

Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Protocol
USD 49.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.00
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Notredame, C. (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3, 131–144.

    Article  PubMed  CAS  Google Scholar 

  2. Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.

    Article  PubMed  CAS  Google Scholar 

  3. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.

    Article  PubMed  CAS  Google Scholar 

  4. Gotoh, O. (1982) An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708.

    Article  PubMed  CAS  Google Scholar 

  5. Myers, E. W. and Miller, W. (1988) Optimal alignments in linear space. Comput. Appl. Biosci. 4, 11–17.

    PubMed  CAS  Google Scholar 

  6. Murata, M., Richardson, J. S., and Sussman, J. L. (1985) Simultaneous comparison of three protein sequences. Proc. Natl. Acad. Sci. USA 82, 3073–3077.

    Google Scholar 

  7. Waterman, M. S. and Jones, R. (1990) Consensus methods for DNA and protein sequence alignment. Methods Enzymol. 183, 221–237.

    Article  PubMed  CAS  Google Scholar 

  8. Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1999) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge.

    Google Scholar 

  9. Gonnet, G. H., Korostensky, C., and Benner, S. (2000) Evaluation measures of multiple sequence alignments. J. Comput. Biol. 7, 261–276.

    Article  PubMed  CAS  Google Scholar 

  10. Wang, L. and Jiang, T. (1994) On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337–348.

    PubMed  CAS  Google Scholar 

  11. Bonizzoni, P. and Della Vedova, G. (2001) The complexity of multiple sequence alignment with SP-score that is a metric. Theor. Comput. Sci. 259, 63–79.

    Article  Google Scholar 

  12. Just, W. (2001) Computational complexity of multiple sequence alignment with SP-score. J. Comput. Biol. 8, 615–623.

    Article  PubMed  CAS  Google Scholar 

  13. Elias, I. (2006) Settling the intractability of multiple alignment. J. Comput. Biol. 13, 1323–1339.

    Article  PubMed  CAS  Google Scholar 

  14. Lipman, D. J., Altschul, S. F., and Kececioglu, J. D. (1989) A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. USA 86, 4412–4415.

    Google Scholar 

  15. Gupta, S. K., Kececioglu, J. D., and Schaffer, A. A. (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J. Comput. Biol. 2, 459–472.

    Article  PubMed  CAS  Google Scholar 

  16. Carrillo, H. and Lipman, D. (1988) The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 48, 1073–1082.

    Article  Google Scholar 

  17. Dress, A., Fullen, G., and Perrey, S. (1995) A divide and conquer approach to multiple alignment. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 107–113.

    Google Scholar 

  18. Stoye, J., Perrey, S. W., and Dress, A. W. M. (1997) Improving the divide-and-conquer approach to sum-of-pairs multiple sequence alignment. Appl. Math. Lett. 10, 67–73.

    Article  Google Scholar 

  19. Stoye, J., Moulton, V., and Dress, A. W. (1997) DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment. Comput. Appl. Biosci. 13, 625–626.

    PubMed  CAS  Google Scholar 

  20. Stoye, J. (1998) Multiple sequence alignment with the divide-and-conquer method. Gene 211, GC45–56.

    Article  PubMed  CAS  Google Scholar 

  21. Reinert, K., Stoye, J., and Will, T. (2000) An iterative method for faster sum-of-pairs multiple sequence alignment. Bioinformatics 16, 808–814.

    Article  PubMed  CAS  Google Scholar 

  22. Holland, J. H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

    Google Scholar 

  23. Zhang, C. and Wong, A. K. (1997) A genetic algorithm for multiple molecular sequence alignment. Comput. Appl. Biosci. 13, 565–581.

    PubMed  CAS  Google Scholar 

  24. Anbarasu, L. A., Narayanasamy, P., and Sundararajan, V. (1998) Multiple sequence alignment using parallel genetic algorithms. SEAL.

    Google Scholar 

  25. Chellapilla, K. and Fogel, G. B. (1999) Multiple sequence alignment using evolutionary programming. Congress on Evolutionary Computation.

    Google Scholar 

  26. Gonzalez, R. R., Izquierdo, C. M., and Seijas, J. (1999) Multiple protein sequence comparison by genetic algorithms. SPIE-98.

    Google Scholar 

  27. Cai, L., Juedes, D., and Liakhovitch, E. (2000) Evolutionary computation techniques for multiple sequence alignment. Congress on Evolutionary Computation.

    Google Scholar 

  28. Zhang, G.-Z. and Huang, D.-S. (2004) Aligning multiple protein sequence by an improved genetic algorithm. IEEE International Joint Conference on Neural Networks.

    Google Scholar 

  29. Notredame, C. and Higgins, D. G. (1996) SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 1515–1524.

    Article  PubMed  CAS  Google Scholar 

  30. Isokawa, M., Takahashi, K., and Shimizu, T. (1996) Multiple sequence alignment using a genetic algorithm. Genome Inform. 7, 176–177.

    Google Scholar 

  31. Harada, Y., Wayama, M., and Shimizu, T. (1997) An inspection of the multiple alignment methods with use of genetic algorithm. Genome Inform. 8, 272–273.

    Google Scholar 

  32. Hanada, K., Yokoyama, T., and Shimizu, T. (2000) Multiple sequence alignment by genetic algorithm. Genome Inform. 11, 317–318.

    CAS  Google Scholar 

  33. Yokoyama, T., Watanabe, T., Taneda, A., and Shimizu, T. (2001) A web server for multiple sequence alignment using genetic algorithm. Genome Inform. 12, 382–383.

    CAS  Google Scholar 

  34. Nguyen, H. D., Yoshihara, I., Yamamori, K., and Yasunaga, M. (2002) A parallel hybrid genetic algorithm for multiple protein sequence alignment. Evol. Comput. 1, 309–314.

    Google Scholar 

  35. Kirkpatrick, S., Gelatt, J., C. D., and Vecchi, M. P. (1983) Optimization by simulated annealing. Science 220, 671–680.

    Article  PubMed  CAS  Google Scholar 

  36. Ishikawa, M., Toya, T., Hoshida, M., Nitta, K., Ogiwara, A., and Kanehisa, M. (1993) Multiple sequence alignment by parallel simulated annealing. Comput. Appl. Biosci. 9, 267–273.

    PubMed  CAS  Google Scholar 

  37. Kim, J., Pramanik, S., and Chung, M. J. (1994) Multiple sequence alignment using simulated annealing. Comput. Appl. Biosci. 10, 419–426.

    PubMed  CAS  Google Scholar 

  38. Eddy, S. R. (1995) Multiple alignment using hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 114–120.

    Google Scholar 

  39. Ikeda, T. and Imai, H. (1999) Enhanced A* algorithms for multiple alignments: optimal alignments for several sequences and k-opt approximate alignments for large cases. Theor. Comput. Sci. 210, 341–374.

    Article  Google Scholar 

  40. Horton, P. (2001) Tsukuba BB: a branch and bound algorithm for local multiple alignment of DNA and protein sequences. J. Comput. Biol. 8, 283–303.

    Article  PubMed  CAS  Google Scholar 

  41. Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., and Kececioglu, J. D. (1997) A branch-and-cut algorithm for multiple sequence alignment. RECOMB.

    Google Scholar 

  42. Reinert, K., Stoye, J., and Will, T. (1999) Combining divide-and-conquer, the A*-algorithm and successive realignment approaches to speed up multiple sequence alignment. German Conference on Bioinformatics.

    Google Scholar 

  43. Lermen, M. and Reinert, K. (2000) The practical use of the A* algorithm for exact multiple sequence alignment. J. Comput. Biol. 7, 655–671.

    Article  PubMed  CAS  Google Scholar 

  44. Feng, D. F. and Doolittle, R. F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351–360.

    Article  PubMed  CAS  Google Scholar 

  45. Taylor, W. R. (1987) Multiple sequence alignment by a pairwise algorithm. Comput. Appl. Biosci. 3, 81–87.

    PubMed  CAS  Google Scholar 

  46. Taylor, W. R. (1988) A flexible method to align large numbers of biological sequences. J. Mol. Evol. 28, 161–169.

    Article  PubMed  CAS  Google Scholar 

  47. Kececioglu, J. and Starrett, D. (2004) Aligning alignments exactly. RECOMB.

    Google Scholar 

  48. Kececioglu, J. and Zhang, W. (1998) Aligning alignments. CPM.

    Google Scholar 

  49. Altschul, S. F. (1989) Gap costs for multiple sequence alignment. J. Theor. Biol. 138, 297–309.

    Article  PubMed  CAS  Google Scholar 

  50. Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066.

    Article  PubMed  CAS  Google Scholar 

  51. Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.

    Article  PubMed  CAS  Google Scholar 

  52. Huang, X. (1994) On global sequence alignment. Comput. Appl. Biosci. 10, 227–235.

    PubMed  CAS  Google Scholar 

  53. Pei, J., Sadreyev, R., and Grishin, N. V. (2003) PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 19, 427–428.

    Article  PubMed  CAS  Google Scholar 

  54. Smith, R. F. and Smith, T. F. (1992) Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng. 5, 35–41.

    Article  PubMed  CAS  Google Scholar 

  55. Yamada, S., Gotoh, O., and Yamana, H. (2006) Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinform. 7, 524.

    Article  Google Scholar 

  56. Gotoh, O. (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol. 264, 823–838.

    Article  PubMed  CAS  Google Scholar 

  57. Corpet, F. (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10881–10890.

    Article  PubMed  CAS  Google Scholar 

  58. Higgins, D. G. and Sharp, P. M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73, 237–244.

    Article  PubMed  CAS  Google Scholar 

  59. Higgins, D. G. and Sharp, P. M. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 5, 151–153.

    PubMed  CAS  Google Scholar 

  60. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

    Article  PubMed  CAS  Google Scholar 

  61. Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005) MAFFT version 5: improve- ment in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518.

    Article  PubMed  CAS  Google Scholar 

  62. Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113.

    Article  CAS  Google Scholar 

  63. Notredame, C., Holm, L., and Higgins, D. G. (1998) COFFEE: an objective function for multiple sequence alignments. Bioinformatics 14, 407–422.

    Article  PubMed  CAS  Google Scholar 

  64. Notredame, C., Higgins, D. G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217.

    Article  PubMed  CAS  Google Scholar 

  65. Lassmann, T. and Sonnhammer, E. L. (2005) Kalign–an accurate and fast multiple sequence alignment algorithm. BMC Bioinform. 6, 298.

    Article  CAS  Google Scholar 

  66. Lee, C., Grasso, C., and Sharlow, M. F. (2002) Multiple sequence alignment using partial order graphs. Bioinformatics 18, 452–464.

    Article  PubMed  CAS  Google Scholar 

  67. Lee, C. (2003) Generating consensus sequences from partial order multiple sequence alignment graphs. Bioinformatics 19, 999–1008.

    Article  PubMed  CAS  Google Scholar 

  68. Grasso, C. and Lee, C. (2004) Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics 20, 1546–1556.

    Article  PubMed  CAS  Google Scholar 

  69. Do, C. B., Mahabhashyam, M. S., Brudno, M., and Batzoglou, S. (2005) ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340.

    Article  PubMed  CAS  Google Scholar 

  70. Pei, J. and Grishin, N. V. (2006) MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Res. 34, 4364–4374.

    Article  PubMed  CAS  Google Scholar 

  71. Pei, J. and Grishin, N. V. (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23, 802–808.

    Article  PubMed  CAS  Google Scholar 

  72. Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. US A 84, 4355–4358.

    Google Scholar 

  73. von Ohsen, N., Sommer, I., and Zimmer, R. (2003) Profile-profile alignment: a powerful tool for protein structure prediction. Pac. Symp. Biocomput. 252–263.

    Google Scholar 

  74. von Ohsen, N., Sommer, I., Zimmer, R., and Lengauer, T. (2004) Arby: automatic protein structure prediction using profile-profile alignment and confidence measures. Bioinformatics 20, 2228–2235.

    Article  Google Scholar 

  75. Soding, J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960.

    Article  PubMed  Google Scholar 

  76. von Ohsen, N. and Zimmer, R. (2001) Improving profile-profile alignments via log-average scoring. WABI.

    Google Scholar 

  77. Yona, G. and Levitt, M. (2002) Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J. Mol. Biol. 315, 1257–1275.

    Article  PubMed  CAS  Google Scholar 

  78. Heger, A. and Holm, L. (2003) Exhaustive enumeration of protein domain families. J. Mol. Biol. 328, 749–767.

    Article  PubMed  CAS  Google Scholar 

  79. Mittelman, D., Sadreyev, R., and Grishin, N. (2003) Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments. Bioinformatics 19, 1531–1539.

    Article  PubMed  CAS  Google Scholar 

  80. Sadreyev, R. and Grishin, N. (2003) COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol. 326, 317–336.

    Article  PubMed  CAS  Google Scholar 

  81. Edgar, R. C. and Sjolander, K. (2004) COACH: profile-profile alignment of protein families using hidden Markov models. Bioinformatics 20, 1309–1318.

    Article  PubMed  CAS  Google Scholar 

  82. Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9, 232–241.

    CAS  Google Scholar 

  83. Edgar, R. C. and Sjolander, K. (2004) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 20, 1301–1308.

    Article  PubMed  CAS  Google Scholar 

  84. Ohlson, T., Wallner, B., and Elofsson, A. (2004) Profile-profile methods provide improved fold-recognition: a study of different profile–profile alignment methods. Proteins 57, 188–197.

    Article  PubMed  CAS  Google Scholar 

  85. Sokal, R. R. and Michener, C. D. (1958) A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 28, 1409–1438.

    Google Scholar 

  86. Sneath, P. H. and Sokal, R. R. (1962) Numerical taxonomy. Nature 193, 855–860.

    Article  PubMed  CAS  Google Scholar 

  87. Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.

    PubMed  CAS  Google Scholar 

  88. Studier, J. A. and Keppler, K. J. (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Mol. Biol. Evol. 5, 729–731.

    PubMed  CAS  Google Scholar 

  89. Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.

    PubMed  CAS  Google Scholar 

  90. Edgar, R. C. (2004) Local homology recognition and distance measures in linear time using compressed amino acid alphabets. Nucleic Acids Res. 32, 380–385.

    Article  PubMed  CAS  Google Scholar 

  91. Wu, S. and Manber, U. (1992) Fast text searching allowing errors. Commun. ACM 35, 83–91.

    Article  Google Scholar 

  92. Vingron, M. and Argos, P. (1989) A fast and sensitive multiple sequence alignment algorithm. Comput. Appl. Biosci. 5, 115–121.

    PubMed  CAS  Google Scholar 

  93. Vingron, M. and Argos, P. (1990) Determination of reliable regions in protein sequence alignments. Protein Eng. 3, 565–569.

    Article  PubMed  CAS  Google Scholar 

  94. Vingron, M. and Argos, P. (1991) Motif recognition and alignment for many sequences by comparison of dot-matrices. J. Mol. Biol. 218, 33–43.

    Article  PubMed  CAS  Google Scholar 

  95. Gotoh, O. (1990) Consistency of optimal sequence alignments. Bull. Math. Biol. 52, 509–525.

    PubMed  CAS  Google Scholar 

  96. Van Walle, I., Lasters, I., and Wyns, L. (2003) Consistency matrices: quantified structure alignments for sets of related proteins. Proteins 51, 1–9.

    Article  PubMed  CAS  Google Scholar 

  97. Van Walle, I., Lasters, I., and Wyns, L. (2004) Align-m–a new algorithm for multiple alignment of highly divergent sequences. Bioinformatics 20, 1428–1435.

    Article  PubMed  CAS  Google Scholar 

  98. Do, C. B., Gross, S. S., and Batzoglou, S. (2006) CONTRAlign: discriminative training for protein sequence alignment. RECOMB.

    Google Scholar 

  99. Lolkema, J. S. and Slotboom, D. J. (1998) Hydropathy profile alignment: a tool to search for structural homologues of membrane proteins. FEMS Microbiol. Rev. 22, 305–322.

    Article  PubMed  CAS  Google Scholar 

  100. Altschul, S. F., Carroll, R. J., and Lipman, D. J. (1989) Weights for data related by a tree. J. Mol. Biol. 207, 647–653.

    Article  PubMed  CAS  Google Scholar 

  101. Vingron, M. and Sibbald, P. R. (1993) Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc. Natl. Acad. Sci. USA 90, 8777–8781.

    Google Scholar 

  102. Sibbald, P. R. and Argos, P. (1990) Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J. Mol. Biol. 216, 813–818.

    Article  PubMed  CAS  Google Scholar 

  103. Henikoff, S. and Henikoff, J. G. (1994) Position-based sequence weights. J. Mol. Biol. 243, 574–578.

    Article  PubMed  CAS  Google Scholar 

  104. Eddy, S. R., Mitchison, G., and Durbin, R. (1995) Maximum discrimination hidden Markov models of sequence consensus. J. Comput. Biol. 2, 9–23.

    Article  PubMed  CAS  Google Scholar 

  105. Gotoh, O. (1995) A weighting system and algorithm for aligning many phylogenetically related sequences. Comput. Appl. Biosci. 11, 543–551.

    PubMed  CAS  Google Scholar 

  106. Krogh, A. and Mitchison, G. (1995) Maximum entropy weighting of aligned sequences of proteins or DNA. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 215–221.

    Google Scholar 

  107. Karchin, R. and Hughey, R. (1998) Weighting hidden Markov models for maximum discrimination. Bioinformatics 14, 772–782.

    Article  PubMed  CAS  Google Scholar 

  108. May, A. C. (2001) Optimal classification of protein sequences and selection of representative sets from multiple alignments: application to homologous families and lessons for structural genomics. Protein Eng. 14, 209–217.

    Article  PubMed  CAS  Google Scholar 

  109. Hirosawa, M., Totoki, Y., Hoshida, M., and Ishikawa, M. (1995) Comprehensive study on iterative algorithms of multiple sequence alignment. Comput. Appl. Biosci. 11, 13–18.

    PubMed  CAS  Google Scholar 

  110. Wang, Y. and Li, K. B. (2004) An adaptive and iterative algorithm for refining multiple sequence alignment. Comput. Biol. Chem. 28, 141–148.

    Article  PubMed  CAS  Google Scholar 

  111. Wallace, I. M., O’Sullivan, O., and Higgins, D. G. (2005) Evaluation of iterative alignment algorithms for multiple alignment. Bioinformatics 21, 1408–1414.

    Article  PubMed  CAS  Google Scholar 

  112. Brocchieri, L. and Karlin, S. (1998) A symmetric-iterated multiple alignment of protein sequences. J. Mol. Biol. 276, 249–264.

    Article  PubMed  CAS  Google Scholar 

  113. Subbiah, S. and Harrison, S. C. (1989) A method for multiple sequence alignment with gaps. J. Mol. Biol. 209, 539–548.

    Article  PubMed  CAS  Google Scholar 

  114. Barton, G. J. and Sternberg, M. J. (1987) A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J. Mol. Biol. 198, 327–337.

    Article  PubMed  CAS  Google Scholar 

  115. Barton, G. J. and Sternberg, M. J. (1987) Evaluation and improvements in the automatic alignment of protein sequences. Protein Eng. 1, 89–94.

    Article  PubMed  CAS  Google Scholar 

  116. Bains, W. (1986) MULTAN: a program to align multiple DNA sequences. Nucleic Acids Res. 14, 159–177.

    Article  PubMed  CAS  Google Scholar 

  117. Thompson, J. D., Thierry, J. C., and Poch, O. (2003) RASCAL: rapid scanning and correction of multiple sequence alignments. Bioinformatics 19, 1155–1161.

    Article  PubMed  CAS  Google Scholar 

  118. Chakrabarti, S., Lanczycki, C. J., Panchenko, A. R., Przytycka, T. M., Thiessen, P. A., and Bryant, S. H. (2006) State of the art: refinement of multiple sequence alignments. BMC Bioinform. 7, 499.

    Google Scholar 

  119. Chakrabarti, S., Lanczycki, C. J., Panchenko, A. R., Przytycka, T. M., Thiessen, P. A., and Bryant, S. H. (2006) Refining multiple sequence alignments with conserved core regions. Nucleic Acids Res. 34, 2598–2606.

    Article  PubMed  CAS  Google Scholar 

  120. Huang, X. Q., Hardison, R. C., and Miller, W. (1990) A space-efficient algorithm for local similarities. Comput. Appl. Biosci. 6, 373–381.

    PubMed  CAS  Google Scholar 

  121. Huang, X. and Miller, W. (1991) A time-efficient, linear-space local similarity algorithm. Adv. Appl. Math. 12, 337–357.

    Article  Google Scholar 

  122. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

    PubMed  CAS  Google Scholar 

  123. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

    Article  PubMed  CAS  Google Scholar 

  124. Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.

    Article  PubMed  CAS  Google Scholar 

  125. Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.

    Article  PubMed  CAS  Google Scholar 

  126. Pearson, W. R. (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132, 185–219.

    PubMed  CAS  Google Scholar 

  127. Morgenstern, B., Dress, A., and Werner, T. (1996) Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proc. Natl. Acad. Sci. USA 93, 12098–12103.

    Google Scholar 

  128. Morgenstern, B., Frech, K., Dress, A., and Werner, T. (1998) DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics 14, 290–294.

    Article  PubMed  CAS  Google Scholar 

  129. Morgenstern, B. (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218.

    Article  PubMed  CAS  Google Scholar 

  130. Morgenstern, B. (2004) DIALIGN: multiple DNA and protein sequence alignment at BiBiServ. Nucleic Acids Res. 32, W33–36.

    Article  PubMed  CAS  Google Scholar 

  131. Subramanian, A. R., Weyer-Menkhoff, J., Kaufmann, M., and Morgenstern, B. (2005) DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment. BMC Bioinform. 6, 66.

    Article  CAS  Google Scholar 

  132. Depiereux, E. and Feytmans, E. (1992) MATCH-BOX: a fundamentally new algorithm for the simultaneous alignment of several protein sequences. Comput. Appl. Biosci. 8, 501–509.

    PubMed  CAS  Google Scholar 

  133. Depiereux, E., Baudoux, G., Briffeuil, P., Reginster, I., De Bolle, X., Vinals, C., et al. (1997) Match-Box_server: a multiple sequence alignment tool placing emphasis on reliability. Comput. Appl. Biosci. 13, 249–256.

    PubMed  CAS  Google Scholar 

  134. Schwartz, A. S. and Pachter, L. (2007) Multiple alignment by sequence annealing. Bioinformatics 23, e24–29.

    Article  PubMed  CAS  Google Scholar 

  135. Pellegrini, M., Marcotte, E. M., and Yeates, T. O. (1999) A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins 35, 440–446.

    Article  PubMed  CAS  Google Scholar 

  136. Notredame, C. (2001) Mocca: semi-automatic method for domain hunting. Bioinformatics 17, 373–374.

    Article  PubMed  CAS  Google Scholar 

  137. Heger, A. and Holm, L. (2000) Rapid automatic detection and alignment of repeats in protein sequences. Proteins 41, 224–237.

    Article  PubMed  CAS  Google Scholar 

  138. Heringa, J. and Argos, P. (1993) A method to recognize distant repeats in protein sequences. Proteins 17, 391–341.

    Article  PubMed  CAS  Google Scholar 

  139. Szklarczyk, R. and Heringa, J. (2004) Tracking repeats using significance and transitivity. Bioinformatics 20(Suppl 1), I311–I317.

    Article  PubMed  CAS  Google Scholar 

  140. Sammeth, M. and Heringa, J. (2006) Global multiple-sequence alignment with repeats. Proteins 64, 263–274.

    Article  PubMed  CAS  Google Scholar 

  141. Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., and Wootton, J. C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214.

    Article  PubMed  CAS  Google Scholar 

  142. Neuwald, A. F., Liu, J. S., and Lawrence, C. E. (1995) Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci. 4, 1618–1632.

    Article  PubMed  CAS  Google Scholar 

  143. Henikoff, S., Henikoff, J. G., Alford, W. J., and Pietrokovski, S. (1995) Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163, GC17–26.

    Article  PubMed  CAS  Google Scholar 

  144. Smith, H. O., Annau, T. M., and Chandrasegaran, S. (1990) Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. USA 87, 826–830.

    Google Scholar 

  145. Bailey, T. L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36.

    Google Scholar 

  146. Sonnhammer, E. L. and Kahn, D. (1994) Modular arrangement of proteins as inferred from analysis of homology. Protein Sci. 3, 482–492.

    Article  PubMed  CAS  Google Scholar 

  147. Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) A workbench for multiple alignment construction and analysis. Proteins 9, 180–190.

    Article  PubMed  CAS  Google Scholar 

  148. Pevzner, P. A., Tang, H., and Tesler, G. (2004) De novo repeat classification and fragment assembly. Genome Res. 14, 1786–1796.

    Article  PubMed  CAS  Google Scholar 

  149. Raphael, B., Zhi, D., Tang, H., and Pevzner, P. (2004) A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res. 14, 2336–2346.

    Article  PubMed  CAS  Google Scholar 

  150. Phuong, T. M., Do, C. B., Edgar, R. C., and Batzoglou, S. (2006) Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Res. 34, 5932–5942.

    Article  PubMed  CAS  Google Scholar 

  151. Bishop, M. J. and Thompson, E. A. (1986) Maximum likelihood alignment of DNA sequences. J. Mol. Biol. 190, 159–165.

    Article  PubMed  CAS  Google Scholar 

  152. Hein, J., Wiuf, C., Knudsen, B., Moller, M. B., and Wibling, G. (2000) Statistical alignment: computational properties, homology testing and goodness-of-fit. J. Mol. Biol. 302, 265–279.

    Article  PubMed  CAS  Google Scholar 

  153. Thorne, J. L., Kishino, H., and Felsenstein, J. (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114–124.

    Article  PubMed  CAS  Google Scholar 

  154. Thorne, J. L., Kishino, H., and Felsenstein, J. (1992) Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 34, 3–16.

    Article  PubMed  CAS  Google Scholar 

  155. Miklos, I. and Toroczkai, Z. (2001) An improved model for statistical alignment. WABI.

    Google Scholar 

  156. Miklos, I. (2003) Algorithm for statistical alignment of sequences derived from a Poisson sequence length distribution. Disc. Appl. Math. 127, 79–84.

    Article  Google Scholar 

  157. Miklos, I., Lunter, G. A., and Holmes, I. (2004) A “Long Indel” model for evolutionary sequence alignment. Mol. Biol. Evol. 21, 529–540.

    Article  PubMed  CAS  Google Scholar 

  158. Knudsen, B. and Miyamoto, M. M. (2003) Sequence alignments and pair hidden Markov models using evolutionary history. J. Mol. Biol. 333, 453–460.

    Article  PubMed  CAS  Google Scholar 

  159. Metzler, D. (2003) Statistical alignment based on fragment insertion and deletion models. Bioinformatics 19, 490–499.

    Article  PubMed  CAS  Google Scholar 

  160. Hein, J. (2001) A generalisation of the Thorne-Kishino-Felsenstein model of statistical alignment to k sequences related by a binary tree. PSB.

    Google Scholar 

  161. Hein, J., Jensen, J. L., and Pedersen, C. N. (2003) Recursions for statistical multiple alignment. Proc. Natl. Acad. Sci. USA 100, 14960–14965.

    Google Scholar 

  162. Holmes, I. and Bruno, W. J. (2001) Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 17, 803–820.

    Article  PubMed  CAS  Google Scholar 

  163. Holmes, I. (2003) Using guide trees to construct multiple-sequence evolutionary HMMs. Bioinformatics 19(Suppl 1), i147–157.

    Article  PubMed  Google Scholar 

  164. Steel, M. and Hein, J. (2001) Applying the Thorne-Kishino-Felsenstein model to sequence evolution on a star-shaped tree. Appl. Math. Lett. 14, 679–684.

    Article  Google Scholar 

  165. Miklos, I. (2002) An improved algorithm for statistical alignment of sequences related by a star tree. Bull. Math. Biol. 64, 771–779.

    Article  PubMed  CAS  Google Scholar 

  166. Lunter, G. A., Miklos, I., Song, Y. S., and Hein, J. (2003) An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J. Comput. Biol. 10, 869–889.

    Article  PubMed  CAS  Google Scholar 

  167. Jensen, J. L. and Hein, J. (2005) Gibbs sampler for statistical multiple alignment. Stat. Sin. 15, 889–907.

    Google Scholar 

  168. Hein, J. (1990) Unified approach to alignment and phylogenies. Methods Enzymol. 183, 626–645.

    Article  PubMed  CAS  Google Scholar 

  169. Vingron, M. and von Haeseler, A. (1997) Towards integration of multiple alignment and phylogenetic tree construction. J. Comput. Biol. 4, 23–34.

    Article  PubMed  CAS  Google Scholar 

  170. Fleissner, R., Metzler, D., and von Haeseler, A. (2005) Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol. 54, 548–561.

    Article  PubMed  Google Scholar 

  171. Lunter, G., Miklos, I., Drummond, A., Jensen, J. L., and Hein, J. (2005) Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinform. 6, 83.

    Article  CAS  Google Scholar 

  172. Redelings, B. D. and Suchard, M. A. (2005) Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54, 401–418.

    Article  PubMed  Google Scholar 

  173. Metzler, D., Fleissner, R., Wakolbinger, A., and von Haeseler, A. (2001) Assessing variability by joint sampling of alignments and mutation rates. J. Mol. Evol. 53, 660–669.

    Article  PubMed  CAS  Google Scholar 

  174. Allison, L. and Wallace, C. S. (1994) The posterior probability distribution of alignments and its application to parameter estimation of evolutionary trees and to optimization of multiple alignments. J. Mol. Evol. 39, 418–430.

    Article  PubMed  CAS  Google Scholar 

  175. Krogh, A., Brown, M., Mian, I. S., Sjolander, K., and Haussler, D. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. 235, 1501–1531.

    Article  PubMed  CAS  Google Scholar 

  176. Krogh, A. (1998) An introduction to hidden Markov models for biological sequences. In Computational Methods in Molecular Biology (Salzberg, S., Searls, D., Kasif, S., eds.). Elsevier Science, St. Louis, MO, pp. 45–63.

    Chapter  Google Scholar 

  177. Hughey, R. and Krogh, A. (1996) Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput. Appl. Biosci. 12, 95–107.

    PubMed  CAS  Google Scholar 

  178. Eddy, S. R. (1996) Hidden Markov models. Curr. Opin. Struct. Biol. 6, 361–365.

    Article  PubMed  CAS  Google Scholar 

  179. Eddy, S. R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763.

    Article  PubMed  CAS  Google Scholar 

  180. Mamitsuka, H. (2005) Finding the biologically optimal alignment of multiple sequences. Artif. Intell. Med. 35, 9–18.

    Article  PubMed  Google Scholar 

  181. Baldi, P. and Chauvin, Y. (1994) Smooth on-line learning algorithms for hidden Markov models. Neural Comput. 6, 307–318.

    Article  Google Scholar 

  182. Baldi, P., Chauvin, Y., Hunkapiller, T., and McClure, M. A. (1994) Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. USA 91, 1059–1063.

    Google Scholar 

  183. Viterbi, A. J. (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inform. Theory It13, 260.

    Article  Google Scholar 

  184. Grundy, W. N., Bailey, T. L., Elkan, C. P., and Baker, M. E. (1997) Meta-MEME: motif-based hidden Markov models of protein families. Comput. Appl. Biosci. 13, 397–406.

    PubMed  CAS  Google Scholar 

  185. Bucher, P., Karplus, K., Moeri, N., and Hofmann, K. (1996) A flexible motif search technique based on generalized profiles. Comput. Chem. 20, 3–23.

    Article  PubMed  CAS  Google Scholar 

  186. Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.

    Article  PubMed  CAS  Google Scholar 

  187. Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., et al. (1998) Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 284, 1201–1210.

    Article  PubMed  CAS  Google Scholar 

  188. Sonnhammer, E. L., Eddy, S. R., Birney, E., Bateman, A., and Durbin, R. (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res. 26, 320–322.

    Article  PubMed  CAS  Google Scholar 

  189. Eddy, S. R. HMMER: a profile hidden Markov modeling package, available from http://hmmer.janelia.org/.

  190. Sjolander, K., Karplus, K., Brown, M., Hughey, R., Krogh, A., Mian, I. S., et al. (1996) Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Comput. Appl. Biosci. 12, 327–345.

    PubMed  CAS  Google Scholar 

  191. Barrett, C., Hughey, R., and Karplus, K. (1997) Scoring hidden Markov models. Comput. Appl. Biosci. 13, 191–199.

    PubMed  CAS  Google Scholar 

  192. McClure, M. A., Smith, C., and Elton, P. (1996) Parameterization studies for the SAM and HMMER methods of hidden Markov model generation. Proc. Int. Conf. Intell. Syst. Mol. Biol. 4, 155–164.

    Google Scholar 

  193. Karplus, K. and Hu, B. (2001) Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set. Bioinformatics 17, 713–720.

    Article  PubMed  CAS  Google Scholar 

  194. Loytynoja, A. and Milinkovitch, M. C. (2003) A hidden Markov model for progressive multiple alignment. Bioinformatics 19, 1505–1513.

    Article  PubMed  CAS  Google Scholar 

  195. Edgar, R. C. and Sjolander, K. (2003) Simultaneous sequence alignment and tree construction using hidden Markov models. Pac. Symp. Biocomput. 180–191.

    Google Scholar 

  196. Edgar, R. C. and Sjolander, K. (2003) SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics 19, 1404–1411.

    Article  PubMed  CAS  Google Scholar 

  197. Loytynoja, A. and Goldman, N. (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. USA 102, 10557–10562.

    Google Scholar 

  198. Holmes, I. and Durbin, R. (1998) Dynamic programming alignment accuracy. J. Comput. Biol. 5, 493–504.

    Article  PubMed  CAS  Google Scholar 

  199. Schwartz, A. S., Myers, E., and Pachter, L. (2006) Alignment metric accuracy. arXiv 2006:q-bio.QM/0510052.

    Google Scholar 

  200. Roshan, U. and Livesay, D. R. (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22, 2715–2721.

    Article  PubMed  CAS  Google Scholar 

  201. Wallace, I. M., O’Sullivan, O., Higgins, D. G., and Notredame, C. (2006) M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699.

    Article  PubMed  CAS  Google Scholar 

  202. Kececioglu, J. D. (1993) The maximum weight trace problem in multiple sequence alignment. CPM.

    Google Scholar 

  203. Kececioglu, J. D., Lenhof, H.-P., Mehlhorn, K., Mutzel, P., Reinert, K., and Vingron, M. (2000) A polyhedral approach to sequence alignment problems. Disc. Appl. Math. 104, 143–186.

    Article  Google Scholar 

  204. Koller, G. and Raidl, G. R. (2004) An evolutionary algorithm for the maximum weight trace formulation of the multiple sequence alignment problem. In LNCS, 3242, pp. 302–311.

    Google Scholar 

  205. Simossis, V. A. and Heringa, J. (2005) PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res. 33, W289–294.

    Article  PubMed  CAS  Google Scholar 

  206. Simossis, V. A., Kleinjung, J., and Heringa, J. (2005) Homology-extended sequence alignment. Nucleic Acids Res. 33, 816–824.

    Article  PubMed  CAS  Google Scholar 

  207. Thompson, J. D., Plewniak, F., Thierry, J., and Poch, O. (2000) DbClustal: rapid and reliable global multiple alignments of protein sequences detected by database searches. Nucleic Acids Res. 28, 2919–2926.

    Article  PubMed  CAS  Google Scholar 

  208. Wang, J. and Feng, J. A. (2005) NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities. Proteins 58, 628–637.

    Article  PubMed  CAS  Google Scholar 

  209. Yang, A. S. (2002) Structure-dependent sequence alignment for remotely related proteins. Bioinformatics 18, 1658–1665.

    Article  PubMed  CAS  Google Scholar 

  210. Zhou, H. and Zhou, Y. (2005) SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615–3621.

    Article  PubMed  CAS  Google Scholar 

  211. O’Sullivan, O., Suhre, K., Abergel, C., Higgins, D. G., and Notredame, C. (2004) 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J. Mol. Biol. 340, 385–395.

    Article  PubMed  CAS  Google Scholar 

  212. Armougom, F., Moretti, S., Poirot, O., Audic, S., Dumas, P., Schaeli, B., et al. (2006) Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Res. 34, W604–608.

    Article  PubMed  CAS  Google Scholar 

  213. Thompson, J. D., Plewniak, F., and Poch, O. (1999) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15, 87–88.

    Article  PubMed  CAS  Google Scholar 

  214. Thompson, J. D., Plewniak, F., and Poch, O. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682–2690.

    Article  PubMed  CAS  Google Scholar 

  215. Mizuguchi, K., Deane, C. M., Blundell, T. L., and Overington, J. P. (1998) HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 7, 2469–2471.

    Article  PubMed  CAS  Google Scholar 

  216. Van Walle, I., Lasters, I., and Wyns, L. (2005) SABmark–a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21, 1267–1268.

    Article  PubMed  Google Scholar 

  217. Raghava, G. P., Searle, S. M., Audley, P. C., Barber, J. D., and Barton, G. J. (2003) OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinform. 4, 47.

    Article  CAS  Google Scholar 

  218. Thompson, J. D., Koehl, P., Ripp, R., and Poch, O. (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61, 127–136.

    Article  PubMed  CAS  Google Scholar 

  219. Sauder, J. M., Arthur, J. W., and Dunbrack, R. L., Jr. (2000) Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins 40, 6–22.

    Article  PubMed  CAS  Google Scholar 

  220. Pang, A., Smith, A. D., Nuin, P. A., and Tillier, E. R. (2005) SIMPROT: using an empirically determined indel distribution in simulations of protein evolution. BMC Bioinform. 6, 236.

    Article  CAS  Google Scholar 

  221. Nuin, P. A., Wang, Z., and Tillier, E. R. (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinform. 7, 471.

    Article  CAS  Google Scholar 

  222. Stoye, J., Evers, D., and Meyer, F. (1998) Rose: generating sequence families. Bioinformatics 14, 157–163.

    Article  PubMed  CAS  Google Scholar 

  223. Eidhammer, I., Jonassen, I., and Taylor, W. R. (2000) Structure comparison and structure patterns. J. Comput. Biol. 7, 685–716.

    Article  PubMed  CAS  Google Scholar 

  224. Carugo, O. and Pongor, S. (2001) A normalized root-mean-square distance for comparing protein three-dimensional structures. Protein Sci. 10, 1470–1473.

    Article  PubMed  CAS  Google Scholar 

  225. Armougom, F., Moretti, S., Keduas, V., and Notredame, C. (2006) The iRMSD: a local measure of sequence alignment accuracy using structural information. Bioinformatics 22, e35–39.

    Article  PubMed  CAS  Google Scholar 

  226. Chew, L. P., Huttenlocher, D., Kedem, K., and Kleinberg, J. (1999) Fast detection of common geometric substructure in proteins. J. Comput. Biol. 6, 313–325.

    Article  PubMed  CAS  Google Scholar 

  227. O’Sullivan, O., Zehnder, M., Higgins, D., Bucher, P., Grosdidier, A., and Notredame, C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics 19(Suppl 1), i215–221.

    Article  PubMed  Google Scholar 

  228. Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919.

    Google Scholar 

  229. Dayhoff, M. O., Eck, R. V., and Park, C. M. (1972) A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed.). National Biomedical Research Foundation, Washington, DC, pp. 89–99.

    Google Scholar 

  230. Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed.). National Biomedical Research Foundation, Washington, DC, pp. 345–352.

    Google Scholar 

  231. Muller, T. and Vingron, M. (2000) Modeling amino acid replacement. J. Comput. Biol. 7, 761–776.

    Article  PubMed  CAS  Google Scholar 

  232. Whelan, S. and Goldman, N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.

    PubMed  CAS  Google Scholar 

  233. Prlic, A., Domingues, F. S., and Sippl, M. J. (2000) Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng. 13, 545–550.

    Article  PubMed  CAS  Google Scholar 

  234. Reese, J. T. and Pearson, W. R. (2002) Empirical determination of effective gap penalties for sequence comparison. Bioinformatics 18, 1500–1507.

    Article  PubMed  CAS  Google Scholar 

  235. Arribas-Gil, A., Gassiat, E., and Matias, C. (2006) Parameter estimation in pair-hidden Markov models. Scand. J. Stat. 33, 651–671.

    Article  Google Scholar 

  236. Liu, J. S., Neuwald, A. F., and Lawrence, C. E. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90, 1156–1170.

    Article  Google Scholar 

  237. Zhu, J., Liu, J. S., and Lawrence, C. E. (1998) Bayesian adaptive sequence alignment algorithms. Bioinformatics 14, 25–39.

    Article  PubMed  CAS  Google Scholar 

  238. Kececioglu, J. and Kim, E. (2007) Simple and fast inverse alignment. RECOMB.

    Google Scholar 

  239. Yu, C.-N., Joachims, T., Elber, R., and Pillardy, J. (2007) Support vector training of protein alignment models. RECOMB.

    Google Scholar 

  240. Tsochantaridis, I., Joachims, T., Hofmann, T., and Altun, Y. (2005) Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6, 1453–1484.

    Google Scholar 

  241. Katoh, K. and Toh, H. (2007) PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23, 372–374.

    Article  PubMed  CAS  Google Scholar 

  242. Ahola, V., Aittokallio, T., Vihinen, M., and Uusipaikka, E. (2006) A statistical score for assessing the quality of multiple sequence alignments. BMC Bioinform. 7, 484.

    Article  CAS  Google Scholar 

  243. Altschul, S. F. (1998) Generalized affine gap costs for protein sequence alignment. Proteins 32, 88–96.

    Article  PubMed  CAS  Google Scholar 

  244. Zachariah, M. A., Crooks, G. E., Holbrook, S. R., and Brenner, S. E. (2005) A generalized affine gap model significantly improves protein sequence alignment accuracy. Proteins 58, 329–338.

    Article  PubMed  CAS  Google Scholar 

  245. Thompson, J. D., Muller, A., Waterhouse, A., Procter, J., Barton, G. J., Plewniak, F., et al. (2006) MACSIMS: multiple alignment of complete sequences information management system. BMC Bioinform. 7, 318.

    Article  CAS  Google Scholar 

  246. Thompson, J. D., Holbrook, S. R., Katoh, K., Koehl, P., Moras, D., Westhof, E., et al. (2005) MAO: a multiple alignment ontology for nucleic acid and protein sequences. Nucleic Acids Res. 33, 4164–4171.

    Article  PubMed  CAS  Google Scholar 

  247. Gotoh, O. (1999) Multiple sequence alignment: algorithms and applications. Adv. Biophys. 36, 159–206.

    Article  PubMed  CAS  Google Scholar 

  248. Phillips, A., Janies, D., and Wheeler, W. (2000) Multiple sequence alignment in phylogenetic analysis. Mol. Phylogenet. Evol. 16, 317–330.

    Article  PubMed  CAS  Google Scholar 

  249. Lambert, C., Campenhout, J. M. V., DeBolle, X., and Depiereux, E. (2003) Review of common sequence alignment methods: clues to enhance reliability. Curr. Genom. 4, 131–146.

    Article  CAS  Google Scholar 

  250. Wallace, I. M., Blackshields, G., and Higgins, D. G. (2005) Multiple sequence alignments. Curr. Opin. Struct. Biol. 15, 261–266.

    Article  PubMed  CAS  Google Scholar 

  251. Edgar, R. C. and Batzoglou, S. (2006) Multiple sequence alignment. Curr. Opin. Struct. Biol. 16, 368–373.

    Article  PubMed  CAS  Google Scholar 

  252. Morrison, D. A. (2006) Multiple sequence alignment for phylogenetic purposes. Aust. Syst. Bot. 19, 479–539.

    Article  CAS  Google Scholar 

  253. Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2001) Introduction to Algorithms. MIT Press, Cambridge, MA.

    Google Scholar 

  254. Eppstein, D. (2000) Fast hierarchical clustering and other applications of dynamic closest pairs. J. Exp. Algorithmics 5, 1–23.

    Article  Google Scholar 

  255. Elias, I. and Lagergren, J. (2005) Fast neighbor joining. ICALP.

    Google Scholar 

  256. Waterman, M. S., Eggert, M., and Lander, E. (1992) Parametric sequence comparisons. Proc. Natl. Acad. Sci. USA 89, 6090–6093.

    Google Scholar 

  257. Waterman, M. S. (1994) Parametric and ensemble sequence alignment algorithms. Bull. Math. Biol. 56, 743–767.

    PubMed  CAS  Google Scholar 

  258. Gusfield, D., Balasubramanian, K., and Naor, D. (1994) Parametric optimization of sequence alignment. Algorithmica 12, 312–326.

    Article  Google Scholar 

Download references

Acknowledgments

We thank Karen Ann Lee for help in preparing the manuscript. C.B.D was funded by an NDSEG fellowship.

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Do, C.B., Katoh, K. (2009). Protein Multiple Sequence Alignment. In: Thompson, J.D., Schaeffer-Reiss, C., Ueffing, M. (eds) Functional Proteomics. Methods In Molecular Biology™, vol 484. Humana Press. https://doi.org/10.1007/978-1-59745-398-1_25

Download citation

  • DOI: https://doi.org/10.1007/978-1-59745-398-1_25

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-971-0

  • Online ISBN: 978-1-59745-398-1

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics