Protein Multiple Sequence Alignment

  • Chuong B. Do
  • Kazutaka Katoh
Part of the Methods In Molecular Biology™ book series (MIMB, volume 484)

Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.

Key Words

Multiple sequence alignment, review, proteins, software

References

  1. 1.
    Notredame, C. (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3, 131–144.PubMedCrossRefGoogle Scholar
  2. 2.
    Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.PubMedCrossRefGoogle Scholar
  3. 3.
    Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.PubMedCrossRefGoogle Scholar
  4. 4.
    Gotoh, O. (1982) An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708.PubMedCrossRefGoogle Scholar
  5. 5.
    Myers, E. W. and Miller, W. (1988) Optimal alignments in linear space. Comput. Appl. Biosci. 4, 11–17.PubMedGoogle Scholar
  6. 6.
    Murata, M., Richardson, J. S., and Sussman, J. L. (1985) Simultaneous comparison of three protein sequences. Proc. Natl. Acad. Sci. USA 82, 3073–3077.Google Scholar
  7. 7.
    Waterman, M. S. and Jones, R. (1990) Consensus methods for DNA and protein sequence alignment. Methods Enzymol. 183, 221–237.PubMedCrossRefGoogle Scholar
  8. 8.
    Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1999) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge.Google Scholar
  9. 9.
    Gonnet, G. H., Korostensky, C., and Benner, S. (2000) Evaluation measures of multiple sequence alignments. J. Comput. Biol. 7, 261–276.PubMedCrossRefGoogle Scholar
  10. 10.
    Wang, L. and Jiang, T. (1994) On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337–348.PubMedGoogle Scholar
  11. 11.
    Bonizzoni, P. and Della Vedova, G. (2001) The complexity of multiple sequence alignment with SP-score that is a metric. Theor. Comput. Sci. 259, 63–79.CrossRefGoogle Scholar
  12. 12.
    Just, W. (2001) Computational complexity of multiple sequence alignment with SP-score. J. Comput. Biol. 8, 615–623.PubMedCrossRefGoogle Scholar
  13. 13.
    Elias, I. (2006) Settling the intractability of multiple alignment. J. Comput. Biol. 13, 1323–1339.PubMedCrossRefGoogle Scholar
  14. 14.
    Lipman, D. J., Altschul, S. F., and Kececioglu, J. D. (1989) A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. USA 86, 4412–4415.Google Scholar
  15. 15.
    Gupta, S. K., Kececioglu, J. D., and Schaffer, A. A. (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. J. Comput. Biol. 2, 459–472.PubMedCrossRefGoogle Scholar
  16. 16.
    Carrillo, H. and Lipman, D. (1988) The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 48, 1073–1082.CrossRefGoogle Scholar
  17. 17.
    Dress, A., Fullen, G., and Perrey, S. (1995) A divide and conquer approach to multiple alignment. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 107–113.Google Scholar
  18. 18.
    Stoye, J., Perrey, S. W., and Dress, A. W. M. (1997) Improving the divide-and-conquer approach to sum-of-pairs multiple sequence alignment. Appl. Math. Lett. 10, 67–73.CrossRefGoogle Scholar
  19. 19.
    Stoye, J., Moulton, V., and Dress, A. W. (1997) DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment. Comput. Appl. Biosci. 13, 625–626.PubMedGoogle Scholar
  20. 20.
    Stoye, J. (1998) Multiple sequence alignment with the divide-and-conquer method. Gene 211, GC45–56.PubMedCrossRefGoogle Scholar
  21. 21.
    Reinert, K., Stoye, J., and Will, T. (2000) An iterative method for faster sum-of-pairs multiple sequence alignment. Bioinformatics 16, 808–814.PubMedCrossRefGoogle Scholar
  22. 22.
    Holland, J. H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.Google Scholar
  23. 23.
    Zhang, C. and Wong, A. K. (1997) A genetic algorithm for multiple molecular sequence alignment. Comput. Appl. Biosci. 13, 565–581.PubMedGoogle Scholar
  24. 24.
    Anbarasu, L. A., Narayanasamy, P., and Sundararajan, V. (1998) Multiple sequence alignment using parallel genetic algorithms. SEAL.Google Scholar
  25. 25.
    Chellapilla, K. and Fogel, G. B. (1999) Multiple sequence alignment using evolutionary programming. Congress on Evolutionary Computation.Google Scholar
  26. 26.
    Gonzalez, R. R., Izquierdo, C. M., and Seijas, J. (1999) Multiple protein sequence comparison by genetic algorithms. SPIE-98.Google Scholar
  27. 27.
    Cai, L., Juedes, D., and Liakhovitch, E. (2000) Evolutionary computation techniques for multiple sequence alignment. Congress on Evolutionary Computation.Google Scholar
  28. 28.
    Zhang, G.-Z. and Huang, D.-S. (2004) Aligning multiple protein sequence by an improved genetic algorithm. IEEE International Joint Conference on Neural Networks.Google Scholar
  29. 29.
    Notredame, C. and Higgins, D. G. (1996) SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 1515–1524.PubMedCrossRefGoogle Scholar
  30. 30.
    Isokawa, M., Takahashi, K., and Shimizu, T. (1996) Multiple sequence alignment using a genetic algorithm. Genome Inform. 7, 176–177.Google Scholar
  31. 31.
    Harada, Y., Wayama, M., and Shimizu, T. (1997) An inspection of the multiple alignment methods with use of genetic algorithm. Genome Inform. 8, 272–273.Google Scholar
  32. 32.
    Hanada, K., Yokoyama, T., and Shimizu, T. (2000) Multiple sequence alignment by genetic algorithm. Genome Inform. 11, 317–318.Google Scholar
  33. 33.
    Yokoyama, T., Watanabe, T., Taneda, A., and Shimizu, T. (2001) A web server for multiple sequence alignment using genetic algorithm. Genome Inform. 12, 382–383.Google Scholar
  34. 34.
    Nguyen, H. D., Yoshihara, I., Yamamori, K., and Yasunaga, M. (2002) A parallel hybrid genetic algorithm for multiple protein sequence alignment. Evol. Comput. 1, 309–314.Google Scholar
  35. 35.
    Kirkpatrick, S., Gelatt, J., C. D., and Vecchi, M. P. (1983) Optimization by simulated annealing. Science 220, 671–680.PubMedCrossRefGoogle Scholar
  36. 36.
    Ishikawa, M., Toya, T., Hoshida, M., Nitta, K., Ogiwara, A., and Kanehisa, M. (1993) Multiple sequence alignment by parallel simulated annealing. Comput. Appl. Biosci. 9, 267–273.PubMedGoogle Scholar
  37. 37.
    Kim, J., Pramanik, S., and Chung, M. J. (1994) Multiple sequence alignment using simulated annealing. Comput. Appl. Biosci. 10, 419–426.PubMedGoogle Scholar
  38. 38.
    Eddy, S. R. (1995) Multiple alignment using hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 114–120.Google Scholar
  39. 39.
    Ikeda, T. and Imai, H. (1999) Enhanced A* algorithms for multiple alignments: optimal alignments for several sequences and k-opt approximate alignments for large cases. Theor. Comput. Sci. 210, 341–374.CrossRefGoogle Scholar
  40. 40.
    Horton, P. (2001) Tsukuba BB: a branch and bound algorithm for local multiple alignment of DNA and protein sequences. J. Comput. Biol. 8, 283–303.PubMedCrossRefGoogle Scholar
  41. 41.
    Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., and Kececioglu, J. D. (1997) A branch-and-cut algorithm for multiple sequence alignment. RECOMB.Google Scholar
  42. 42.
    Reinert, K., Stoye, J., and Will, T. (1999) Combining divide-and-conquer, the A*-algorithm and successive realignment approaches to speed up multiple sequence alignment. German Conference on Bioinformatics.Google Scholar
  43. 43.
    Lermen, M. and Reinert, K. (2000) The practical use of the A* algorithm for exact multiple sequence alignment. J. Comput. Biol. 7, 655–671.PubMedCrossRefGoogle Scholar
  44. 44.
    Feng, D. F. and Doolittle, R. F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351–360.PubMedCrossRefGoogle Scholar
  45. 45.
    Taylor, W. R. (1987) Multiple sequence alignment by a pairwise algorithm. Comput. Appl. Biosci. 3, 81–87.PubMedGoogle Scholar
  46. 46.
    Taylor, W. R. (1988) A flexible method to align large numbers of biological sequences. J. Mol. Evol. 28, 161–169.PubMedCrossRefGoogle Scholar
  47. 47.
    Kececioglu, J. and Starrett, D. (2004) Aligning alignments exactly. RECOMB.Google Scholar
  48. 48.
    Kececioglu, J. and Zhang, W. (1998) Aligning alignments. CPM.Google Scholar
  49. 49.
    Altschul, S. F. (1989) Gap costs for multiple sequence alignment. J. Theor. Biol. 138, 297–309.PubMedCrossRefGoogle Scholar
  50. 50.
    Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066.PubMedCrossRefGoogle Scholar
  51. 51.
    Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.PubMedCrossRefGoogle Scholar
  52. 52.
    Huang, X. (1994) On global sequence alignment. Comput. Appl. Biosci. 10, 227–235.PubMedGoogle Scholar
  53. 53.
    Pei, J., Sadreyev, R., and Grishin, N. V. (2003) PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 19, 427–428.PubMedCrossRefGoogle Scholar
  54. 54.
    Smith, R. F. and Smith, T. F. (1992) Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng. 5, 35–41.PubMedCrossRefGoogle Scholar
  55. 55.
    Yamada, S., Gotoh, O., and Yamana, H. (2006) Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinform. 7, 524.CrossRefGoogle Scholar
  56. 56.
    Gotoh, O. (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol. 264, 823–838.PubMedCrossRefGoogle Scholar
  57. 57.
    Corpet, F. (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10881–10890.PubMedCrossRefGoogle Scholar
  58. 58.
    Higgins, D. G. and Sharp, P. M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73, 237–244.PubMedCrossRefGoogle Scholar
  59. 59.
    Higgins, D. G. and Sharp, P. M. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 5, 151–153.PubMedGoogle Scholar
  60. 60.
    Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.PubMedCrossRefGoogle Scholar
  61. 61.
    Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005) MAFFT version 5: improve- ment in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518.PubMedCrossRefGoogle Scholar
  62. 62.
    Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113.CrossRefGoogle Scholar
  63. 63.
    Notredame, C., Holm, L., and Higgins, D. G. (1998) COFFEE: an objective function for multiple sequence alignments. Bioinformatics 14, 407–422.PubMedCrossRefGoogle Scholar
  64. 64.
    Notredame, C., Higgins, D. G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217.PubMedCrossRefGoogle Scholar
  65. 65.
    Lassmann, T. and Sonnhammer, E. L. (2005) Kalign–an accurate and fast multiple sequence alignment algorithm. BMC Bioinform. 6, 298.CrossRefGoogle Scholar
  66. 66.
    Lee, C., Grasso, C., and Sharlow, M. F. (2002) Multiple sequence alignment using partial order graphs. Bioinformatics 18, 452–464.PubMedCrossRefGoogle Scholar
  67. 67.
    Lee, C. (2003) Generating consensus sequences from partial order multiple sequence alignment graphs. Bioinformatics 19, 999–1008.PubMedCrossRefGoogle Scholar
  68. 68.
    Grasso, C. and Lee, C. (2004) Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics 20, 1546–1556.PubMedCrossRefGoogle Scholar
  69. 69.
    Do, C. B., Mahabhashyam, M. S., Brudno, M., and Batzoglou, S. (2005) ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340.PubMedCrossRefGoogle Scholar
  70. 70.
    Pei, J. and Grishin, N. V. (2006) MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Res. 34, 4364–4374.PubMedCrossRefGoogle Scholar
  71. 71.
    Pei, J. and Grishin, N. V. (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23, 802–808.PubMedCrossRefGoogle Scholar
  72. 72.
    Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. US A 84, 4355–4358.Google Scholar
  73. 73.
    von Ohsen, N., Sommer, I., and Zimmer, R. (2003) Profile-profile alignment: a powerful tool for protein structure prediction. Pac. Symp. Biocomput. 252–263.Google Scholar
  74. 74.
    von Ohsen, N., Sommer, I., Zimmer, R., and Lengauer, T. (2004) Arby: automatic protein structure prediction using profile-profile alignment and confidence measures. Bioinformatics 20, 2228–2235.CrossRefGoogle Scholar
  75. 75.
    Soding, J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960.PubMedCrossRefGoogle Scholar
  76. 76.
    von Ohsen, N. and Zimmer, R. (2001) Improving profile-profile alignments via log-average scoring. WABI.Google Scholar
  77. 77.
    Yona, G. and Levitt, M. (2002) Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J. Mol. Biol. 315, 1257–1275.PubMedCrossRefGoogle Scholar
  78. 78.
    Heger, A. and Holm, L. (2003) Exhaustive enumeration of protein domain families. J. Mol. Biol. 328, 749–767.PubMedCrossRefGoogle Scholar
  79. 79.
    Mittelman, D., Sadreyev, R., and Grishin, N. (2003) Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments. Bioinformatics 19, 1531–1539.PubMedCrossRefGoogle Scholar
  80. 80.
    Sadreyev, R. and Grishin, N. (2003) COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol. 326, 317–336.PubMedCrossRefGoogle Scholar
  81. 81.
    Edgar, R. C. and Sjolander, K. (2004) COACH: profile-profile alignment of protein families using hidden Markov models. Bioinformatics 20, 1309–1318.PubMedCrossRefGoogle Scholar
  82. 82.
    Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9, 232–241.Google Scholar
  83. 83.
    Edgar, R. C. and Sjolander, K. (2004) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 20, 1301–1308.PubMedCrossRefGoogle Scholar
  84. 84.
    Ohlson, T., Wallner, B., and Elofsson, A. (2004) Profile-profile methods provide improved fold-recognition: a study of different profile–profile alignment methods. Proteins 57, 188–197.PubMedCrossRefGoogle Scholar
  85. 85.
    Sokal, R. R. and Michener, C. D. (1958) A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 28, 1409–1438.Google Scholar
  86. 86.
    Sneath, P. H. and Sokal, R. R. (1962) Numerical taxonomy. Nature 193, 855–860.PubMedCrossRefGoogle Scholar
  87. 87.
    Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.PubMedGoogle Scholar
  88. 88.
    Studier, J. A. and Keppler, K. J. (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Mol. Biol. Evol. 5, 729–731.PubMedGoogle Scholar
  89. 89.
    Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.PubMedGoogle Scholar
  90. 90.
    Edgar, R. C. (2004) Local homology recognition and distance measures in linear time using compressed amino acid alphabets. Nucleic Acids Res. 32, 380–385.PubMedCrossRefGoogle Scholar
  91. 91.
    Wu, S. and Manber, U. (1992) Fast text searching allowing errors. Commun. ACM 35, 83–91.CrossRefGoogle Scholar
  92. 92.
    Vingron, M. and Argos, P. (1989) A fast and sensitive multiple sequence alignment algorithm. Comput. Appl. Biosci. 5, 115–121.PubMedGoogle Scholar
  93. 93.
    Vingron, M. and Argos, P. (1990) Determination of reliable regions in protein sequence alignments. Protein Eng. 3, 565–569.PubMedCrossRefGoogle Scholar
  94. 94.
    Vingron, M. and Argos, P. (1991) Motif recognition and alignment for many sequences by comparison of dot-matrices. J. Mol. Biol. 218, 33–43.PubMedCrossRefGoogle Scholar
  95. 95.
    Gotoh, O. (1990) Consistency of optimal sequence alignments. Bull. Math. Biol. 52, 509–525.PubMedGoogle Scholar
  96. 96.
    Van Walle, I., Lasters, I., and Wyns, L. (2003) Consistency matrices: quantified structure alignments for sets of related proteins. Proteins 51, 1–9.PubMedCrossRefGoogle Scholar
  97. 97.
    Van Walle, I., Lasters, I., and Wyns, L. (2004) Align-m–a new algorithm for multiple alignment of highly divergent sequences. Bioinformatics 20, 1428–1435.PubMedCrossRefGoogle Scholar
  98. 98.
    Do, C. B., Gross, S. S., and Batzoglou, S. (2006) CONTRAlign: discriminative training for protein sequence alignment. RECOMB.Google Scholar
  99. 99.
    Lolkema, J. S. and Slotboom, D. J. (1998) Hydropathy profile alignment: a tool to search for structural homologues of membrane proteins. FEMS Microbiol. Rev. 22, 305–322.PubMedCrossRefGoogle Scholar
  100. 100.
    Altschul, S. F., Carroll, R. J., and Lipman, D. J. (1989) Weights for data related by a tree. J. Mol. Biol. 207, 647–653.PubMedCrossRefGoogle Scholar
  101. 101.
    Vingron, M. and Sibbald, P. R. (1993) Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc. Natl. Acad. Sci. USA 90, 8777–8781.Google Scholar
  102. 102.
    Sibbald, P. R. and Argos, P. (1990) Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J. Mol. Biol. 216, 813–818.PubMedCrossRefGoogle Scholar
  103. 103.
    Henikoff, S. and Henikoff, J. G. (1994) Position-based sequence weights. J. Mol. Biol. 243, 574–578.PubMedCrossRefGoogle Scholar
  104. 104.
    Eddy, S. R., Mitchison, G., and Durbin, R. (1995) Maximum discrimination hidden Markov models of sequence consensus. J. Comput. Biol. 2, 9–23.PubMedCrossRefGoogle Scholar
  105. 105.
    Gotoh, O. (1995) A weighting system and algorithm for aligning many phylogenetically related sequences. Comput. Appl. Biosci. 11, 543–551.PubMedGoogle Scholar
  106. 106.
    Krogh, A. and Mitchison, G. (1995) Maximum entropy weighting of aligned sequences of proteins or DNA. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 215–221.Google Scholar
  107. 107.
    Karchin, R. and Hughey, R. (1998) Weighting hidden Markov models for maximum discrimination. Bioinformatics 14, 772–782.PubMedCrossRefGoogle Scholar
  108. 108.
    May, A. C. (2001) Optimal classification of protein sequences and selection of representative sets from multiple alignments: application to homologous families and lessons for structural genomics. Protein Eng. 14, 209–217.PubMedCrossRefGoogle Scholar
  109. 109.
    Hirosawa, M., Totoki, Y., Hoshida, M., and Ishikawa, M. (1995) Comprehensive study on iterative algorithms of multiple sequence alignment. Comput. Appl. Biosci. 11, 13–18.PubMedGoogle Scholar
  110. 110.
    Wang, Y. and Li, K. B. (2004) An adaptive and iterative algorithm for refining multiple sequence alignment. Comput. Biol. Chem. 28, 141–148.PubMedCrossRefGoogle Scholar
  111. 111.
    Wallace, I. M., O’Sullivan, O., and Higgins, D. G. (2005) Evaluation of iterative alignment algorithms for multiple alignment. Bioinformatics 21, 1408–1414.PubMedCrossRefGoogle Scholar
  112. 112.
    Brocchieri, L. and Karlin, S. (1998) A symmetric-iterated multiple alignment of protein sequences. J. Mol. Biol. 276, 249–264.PubMedCrossRefGoogle Scholar
  113. 113.
    Subbiah, S. and Harrison, S. C. (1989) A method for multiple sequence alignment with gaps. J. Mol. Biol. 209, 539–548.PubMedCrossRefGoogle Scholar
  114. 114.
    Barton, G. J. and Sternberg, M. J. (1987) A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J. Mol. Biol. 198, 327–337.PubMedCrossRefGoogle Scholar
  115. 115.
    Barton, G. J. and Sternberg, M. J. (1987) Evaluation and improvements in the automatic alignment of protein sequences. Protein Eng. 1, 89–94.PubMedCrossRefGoogle Scholar
  116. 116.
    Bains, W. (1986) MULTAN: a program to align multiple DNA sequences. Nucleic Acids Res. 14, 159–177.PubMedCrossRefGoogle Scholar
  117. 117.
    Thompson, J. D., Thierry, J. C., and Poch, O. (2003) RASCAL: rapid scanning and correction of multiple sequence alignments. Bioinformatics 19, 1155–1161.PubMedCrossRefGoogle Scholar
  118. 118.
    Chakrabarti, S., Lanczycki, C. J., Panchenko, A. R., Przytycka, T. M., Thiessen, P. A., and Bryant, S. H. (2006) State of the art: refinement of multiple sequence alignments. BMC Bioinform. 7, 499.Google Scholar
  119. 119.
    Chakrabarti, S., Lanczycki, C. J., Panchenko, A. R., Przytycka, T. M., Thiessen, P. A., and Bryant, S. H. (2006) Refining multiple sequence alignments with conserved core regions. Nucleic Acids Res. 34, 2598–2606.PubMedCrossRefGoogle Scholar
  120. 120.
    Huang, X. Q., Hardison, R. C., and Miller, W. (1990) A space-efficient algorithm for local similarities. Comput. Appl. Biosci. 6, 373–381.PubMedGoogle Scholar
  121. 121.
    Huang, X. and Miller, W. (1991) A time-efficient, linear-space local similarity algorithm. Adv. Appl. Math. 12, 337–357.CrossRefGoogle Scholar
  122. 122.
    Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.PubMedGoogle Scholar
  123. 123.
    Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.PubMedCrossRefGoogle Scholar
  124. 124.
    Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.PubMedCrossRefGoogle Scholar
  125. 125.
    Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.PubMedCrossRefGoogle Scholar
  126. 126.
    Pearson, W. R. (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132, 185–219.PubMedGoogle Scholar
  127. 127.
    Morgenstern, B., Dress, A., and Werner, T. (1996) Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proc. Natl. Acad. Sci. USA 93, 12098–12103.Google Scholar
  128. 128.
    Morgenstern, B., Frech, K., Dress, A., and Werner, T. (1998) DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics 14, 290–294.PubMedCrossRefGoogle Scholar
  129. 129.
    Morgenstern, B. (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218.PubMedCrossRefGoogle Scholar
  130. 130.
    Morgenstern, B. (2004) DIALIGN: multiple DNA and protein sequence alignment at BiBiServ. Nucleic Acids Res. 32, W33–36.PubMedCrossRefGoogle Scholar
  131. 131.
    Subramanian, A. R., Weyer-Menkhoff, J., Kaufmann, M., and Morgenstern, B. (2005) DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment. BMC Bioinform. 6, 66.CrossRefGoogle Scholar
  132. 132.
    Depiereux, E. and Feytmans, E. (1992) MATCH-BOX: a fundamentally new algorithm for the simultaneous alignment of several protein sequences. Comput. Appl. Biosci. 8, 501–509.PubMedGoogle Scholar
  133. 133.
    Depiereux, E., Baudoux, G., Briffeuil, P., Reginster, I., De Bolle, X., Vinals, C., et al. (1997) Match-Box_server: a multiple sequence alignment tool placing emphasis on reliability. Comput. Appl. Biosci. 13, 249–256.PubMedGoogle Scholar
  134. 134.
    Schwartz, A. S. and Pachter, L. (2007) Multiple alignment by sequence annealing. Bioinformatics 23, e24–29.PubMedCrossRefGoogle Scholar
  135. 135.
    Pellegrini, M., Marcotte, E. M., and Yeates, T. O. (1999) A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins 35, 440–446.PubMedCrossRefGoogle Scholar
  136. 136.
    Notredame, C. (2001) Mocca: semi-automatic method for domain hunting. Bioinformatics 17, 373–374.PubMedCrossRefGoogle Scholar
  137. 137.
    Heger, A. and Holm, L. (2000) Rapid automatic detection and alignment of repeats in protein sequences. Proteins 41, 224–237.PubMedCrossRefGoogle Scholar
  138. 138.
    Heringa, J. and Argos, P. (1993) A method to recognize distant repeats in protein sequences. Proteins 17, 391–341.PubMedCrossRefGoogle Scholar
  139. 139.
    Szklarczyk, R. and Heringa, J. (2004) Tracking repeats using significance and transitivity. Bioinformatics 20(Suppl 1), I311–I317.PubMedCrossRefGoogle Scholar
  140. 140.
    Sammeth, M. and Heringa, J. (2006) Global multiple-sequence alignment with repeats. Proteins 64, 263–274.PubMedCrossRefGoogle Scholar
  141. 141.
    Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., and Wootton, J. C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214.PubMedCrossRefGoogle Scholar
  142. 142.
    Neuwald, A. F., Liu, J. S., and Lawrence, C. E. (1995) Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci. 4, 1618–1632.PubMedCrossRefGoogle Scholar
  143. 143.
    Henikoff, S., Henikoff, J. G., Alford, W. J., and Pietrokovski, S. (1995) Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163, GC17–26.PubMedCrossRefGoogle Scholar
  144. 144.
    Smith, H. O., Annau, T. M., and Chandrasegaran, S. (1990) Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. USA 87, 826–830.Google Scholar
  145. 145.
    Bailey, T. L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36.Google Scholar
  146. 146.
    Sonnhammer, E. L. and Kahn, D. (1994) Modular arrangement of proteins as inferred from analysis of homology. Protein Sci. 3, 482–492.PubMedCrossRefGoogle Scholar
  147. 147.
    Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) A workbench for multiple alignment construction and analysis. Proteins 9, 180–190.PubMedCrossRefGoogle Scholar
  148. 148.
    Pevzner, P. A., Tang, H., and Tesler, G. (2004) De novo repeat classification and fragment assembly. Genome Res. 14, 1786–1796.PubMedCrossRefGoogle Scholar
  149. 149.
    Raphael, B., Zhi, D., Tang, H., and Pevzner, P. (2004) A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res. 14, 2336–2346.PubMedCrossRefGoogle Scholar
  150. 150.
    Phuong, T. M., Do, C. B., Edgar, R. C., and Batzoglou, S. (2006) Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Res. 34, 5932–5942.PubMedCrossRefGoogle Scholar
  151. 151.
    Bishop, M. J. and Thompson, E. A. (1986) Maximum likelihood alignment of DNA sequences. J. Mol. Biol. 190, 159–165.PubMedCrossRefGoogle Scholar
  152. 152.
    Hein, J., Wiuf, C., Knudsen, B., Moller, M. B., and Wibling, G. (2000) Statistical alignment: computational properties, homology testing and goodness-of-fit. J. Mol. Biol. 302, 265–279.PubMedCrossRefGoogle Scholar
  153. 153.
    Thorne, J. L., Kishino, H., and Felsenstein, J. (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114–124.PubMedCrossRefGoogle Scholar
  154. 154.
    Thorne, J. L., Kishino, H., and Felsenstein, J. (1992) Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 34, 3–16.PubMedCrossRefGoogle Scholar
  155. 155.
    Miklos, I. and Toroczkai, Z. (2001) An improved model for statistical alignment. WABI.Google Scholar
  156. 156.
    Miklos, I. (2003) Algorithm for statistical alignment of sequences derived from a Poisson sequence length distribution. Disc. Appl. Math. 127, 79–84.CrossRefGoogle Scholar
  157. 157.
    Miklos, I., Lunter, G. A., and Holmes, I. (2004) A “Long Indel” model for evolutionary sequence alignment. Mol. Biol. Evol. 21, 529–540.PubMedCrossRefGoogle Scholar
  158. 158.
    Knudsen, B. and Miyamoto, M. M. (2003) Sequence alignments and pair hidden Markov models using evolutionary history. J. Mol. Biol. 333, 453–460.PubMedCrossRefGoogle Scholar
  159. 159.
    Metzler, D. (2003) Statistical alignment based on fragment insertion and deletion models. Bioinformatics 19, 490–499.PubMedCrossRefGoogle Scholar
  160. 160.
    Hein, J. (2001) A generalisation of the Thorne-Kishino-Felsenstein model of statistical alignment to k sequences related by a binary tree. PSB.Google Scholar
  161. 161.
    Hein, J., Jensen, J. L., and Pedersen, C. N. (2003) Recursions for statistical multiple alignment. Proc. Natl. Acad. Sci. USA 100, 14960–14965.Google Scholar
  162. 162.
    Holmes, I. and Bruno, W. J. (2001) Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 17, 803–820.PubMedCrossRefGoogle Scholar
  163. 163.
    Holmes, I. (2003) Using guide trees to construct multiple-sequence evolutionary HMMs. Bioinformatics 19(Suppl 1), i147–157.PubMedCrossRefGoogle Scholar
  164. 164.
    Steel, M. and Hein, J. (2001) Applying the Thorne-Kishino-Felsenstein model to sequence evolution on a star-shaped tree. Appl. Math. Lett. 14, 679–684.CrossRefGoogle Scholar
  165. 165.
    Miklos, I. (2002) An improved algorithm for statistical alignment of sequences related by a star tree. Bull. Math. Biol. 64, 771–779.PubMedCrossRefGoogle Scholar
  166. 166.
    Lunter, G. A., Miklos, I., Song, Y. S., and Hein, J. (2003) An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J. Comput. Biol. 10, 869–889.PubMedCrossRefGoogle Scholar
  167. 167.
    Jensen, J. L. and Hein, J. (2005) Gibbs sampler for statistical multiple alignment. Stat. Sin. 15, 889–907.Google Scholar
  168. 168.
    Hein, J. (1990) Unified approach to alignment and phylogenies. Methods Enzymol. 183, 626–645.PubMedCrossRefGoogle Scholar
  169. 169.
    Vingron, M. and von Haeseler, A. (1997) Towards integration of multiple alignment and phylogenetic tree construction. J. Comput. Biol. 4, 23–34.PubMedCrossRefGoogle Scholar
  170. 170.
    Fleissner, R., Metzler, D., and von Haeseler, A. (2005) Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol. 54, 548–561.PubMedCrossRefGoogle Scholar
  171. 171.
    Lunter, G., Miklos, I., Drummond, A., Jensen, J. L., and Hein, J. (2005) Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinform. 6, 83.CrossRefGoogle Scholar
  172. 172.
    Redelings, B. D. and Suchard, M. A. (2005) Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54, 401–418.PubMedCrossRefGoogle Scholar
  173. 173.
    Metzler, D., Fleissner, R., Wakolbinger, A., and von Haeseler, A. (2001) Assessing variability by joint sampling of alignments and mutation rates. J. Mol. Evol. 53, 660–669.PubMedCrossRefGoogle Scholar
  174. 174.
    Allison, L. and Wallace, C. S. (1994) The posterior probability distribution of alignments and its application to parameter estimation of evolutionary trees and to optimization of multiple alignments. J. Mol. Evol. 39, 418–430.PubMedCrossRefGoogle Scholar
  175. 175.
    Krogh, A., Brown, M., Mian, I. S., Sjolander, K., and Haussler, D. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. 235, 1501–1531.PubMedCrossRefGoogle Scholar
  176. 176.
    Krogh, A. (1998) An introduction to hidden Markov models for biological sequences. In Computational Methods in Molecular Biology (Salzberg, S., Searls, D., Kasif, S., eds.). Elsevier Science, St. Louis, MO, pp. 45–63.CrossRefGoogle Scholar
  177. 177.
    Hughey, R. and Krogh, A. (1996) Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput. Appl. Biosci. 12, 95–107.PubMedGoogle Scholar
  178. 178.
    Eddy, S. R. (1996) Hidden Markov models. Curr. Opin. Struct. Biol. 6, 361–365.PubMedCrossRefGoogle Scholar
  179. 179.
    Eddy, S. R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763.PubMedCrossRefGoogle Scholar
  180. 180.
    Mamitsuka, H. (2005) Finding the biologically optimal alignment of multiple sequences. Artif. Intell. Med. 35, 9–18.PubMedCrossRefGoogle Scholar
  181. 181.
    Baldi, P. and Chauvin, Y. (1994) Smooth on-line learning algorithms for hidden Markov models. Neural Comput. 6, 307–318.CrossRefGoogle Scholar
  182. 182.
    Baldi, P., Chauvin, Y., Hunkapiller, T., and McClure, M. A. (1994) Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. USA 91, 1059–1063.Google Scholar
  183. 183.
    Viterbi, A. J. (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inform. Theory It13, 260.CrossRefGoogle Scholar
  184. 184.
    Grundy, W. N., Bailey, T. L., Elkan, C. P., and Baker, M. E. (1997) Meta-MEME: motif-based hidden Markov models of protein families. Comput. Appl. Biosci. 13, 397–406.PubMedGoogle Scholar
  185. 185.
    Bucher, P., Karplus, K., Moeri, N., and Hofmann, K. (1996) A flexible motif search technique based on generalized profiles. Comput. Chem. 20, 3–23.PubMedCrossRefGoogle Scholar
  186. 186.
    Karplus, K., Barrett, C., and Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.PubMedCrossRefGoogle Scholar
  187. 187.
    Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., et al. (1998) Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 284, 1201–1210.PubMedCrossRefGoogle Scholar
  188. 188.
    Sonnhammer, E. L., Eddy, S. R., Birney, E., Bateman, A., and Durbin, R. (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res. 26, 320–322.PubMedCrossRefGoogle Scholar
  189. 189.
    Eddy, S. R. HMMER: a profile hidden Markov modeling package, available from http://hmmer.janelia.org/.
  190. 190.
    Sjolander, K., Karplus, K., Brown, M., Hughey, R., Krogh, A., Mian, I. S., et al. (1996) Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Comput. Appl. Biosci. 12, 327–345.PubMedGoogle Scholar
  191. 191.
    Barrett, C., Hughey, R., and Karplus, K. (1997) Scoring hidden Markov models. Comput. Appl. Biosci. 13, 191–199.PubMedGoogle Scholar
  192. 192.
    McClure, M. A., Smith, C., and Elton, P. (1996) Parameterization studies for the SAM and HMMER methods of hidden Markov model generation. Proc. Int. Conf. Intell. Syst. Mol. Biol. 4, 155–164.Google Scholar
  193. 193.
    Karplus, K. and Hu, B. (2001) Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set. Bioinformatics 17, 713–720.PubMedCrossRefGoogle Scholar
  194. 194.
    Loytynoja, A. and Milinkovitch, M. C. (2003) A hidden Markov model for progressive multiple alignment. Bioinformatics 19, 1505–1513.PubMedCrossRefGoogle Scholar
  195. 195.
    Edgar, R. C. and Sjolander, K. (2003) Simultaneous sequence alignment and tree construction using hidden Markov models. Pac. Symp. Biocomput. 180–191.Google Scholar
  196. 196.
    Edgar, R. C. and Sjolander, K. (2003) SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics 19, 1404–1411.PubMedCrossRefGoogle Scholar
  197. 197.
    Loytynoja, A. and Goldman, N. (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. USA 102, 10557–10562.Google Scholar
  198. 198.
    Holmes, I. and Durbin, R. (1998) Dynamic programming alignment accuracy. J. Comput. Biol. 5, 493–504.PubMedCrossRefGoogle Scholar
  199. 199.
    Schwartz, A. S., Myers, E., and Pachter, L. (2006) Alignment metric accuracy. arXiv 2006:q-bio.QM/0510052.Google Scholar
  200. 200.
    Roshan, U. and Livesay, D. R. (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22, 2715–2721.PubMedCrossRefGoogle Scholar
  201. 201.
    Wallace, I. M., O’Sullivan, O., Higgins, D. G., and Notredame, C. (2006) M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699.PubMedCrossRefGoogle Scholar
  202. 202.
    Kececioglu, J. D. (1993) The maximum weight trace problem in multiple sequence alignment. CPM.Google Scholar
  203. 203.
    Kececioglu, J. D., Lenhof, H.-P., Mehlhorn, K., Mutzel, P., Reinert, K., and Vingron, M. (2000) A polyhedral approach to sequence alignment problems. Disc. Appl. Math. 104, 143–186.CrossRefGoogle Scholar
  204. 204.
    Koller, G. and Raidl, G. R. (2004) An evolutionary algorithm for the maximum weight trace formulation of the multiple sequence alignment problem. In LNCS, 3242, pp. 302–311.Google Scholar
  205. 205.
    Simossis, V. A. and Heringa, J. (2005) PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res. 33, W289–294.PubMedCrossRefGoogle Scholar
  206. 206.
    Simossis, V. A., Kleinjung, J., and Heringa, J. (2005) Homology-extended sequence alignment. Nucleic Acids Res. 33, 816–824.PubMedCrossRefGoogle Scholar
  207. 207.
    Thompson, J. D., Plewniak, F., Thierry, J., and Poch, O. (2000) DbClustal: rapid and reliable global multiple alignments of protein sequences detected by database searches. Nucleic Acids Res. 28, 2919–2926.PubMedCrossRefGoogle Scholar
  208. 208.
    Wang, J. and Feng, J. A. (2005) NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities. Proteins 58, 628–637.PubMedCrossRefGoogle Scholar
  209. 209.
    Yang, A. S. (2002) Structure-dependent sequence alignment for remotely related proteins. Bioinformatics 18, 1658–1665.PubMedCrossRefGoogle Scholar
  210. 210.
    Zhou, H. and Zhou, Y. (2005) SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615–3621.PubMedCrossRefGoogle Scholar
  211. 211.
    O’Sullivan, O., Suhre, K., Abergel, C., Higgins, D. G., and Notredame, C. (2004) 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J. Mol. Biol. 340, 385–395.PubMedCrossRefGoogle Scholar
  212. 212.
    Armougom, F., Moretti, S., Poirot, O., Audic, S., Dumas, P., Schaeli, B., et al. (2006) Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Res. 34, W604–608.PubMedCrossRefGoogle Scholar
  213. 213.
    Thompson, J. D., Plewniak, F., and Poch, O. (1999) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15, 87–88.PubMedCrossRefGoogle Scholar
  214. 214.
    Thompson, J. D., Plewniak, F., and Poch, O. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682–2690.PubMedCrossRefGoogle Scholar
  215. 215.
    Mizuguchi, K., Deane, C. M., Blundell, T. L., and Overington, J. P. (1998) HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 7, 2469–2471.PubMedCrossRefGoogle Scholar
  216. 216.
    Van Walle, I., Lasters, I., and Wyns, L. (2005) SABmark–a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21, 1267–1268.PubMedCrossRefGoogle Scholar
  217. 217.
    Raghava, G. P., Searle, S. M., Audley, P. C., Barber, J. D., and Barton, G. J. (2003) OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinform. 4, 47.CrossRefGoogle Scholar
  218. 218.
    Thompson, J. D., Koehl, P., Ripp, R., and Poch, O. (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61, 127–136.PubMedCrossRefGoogle Scholar
  219. 219.
    Sauder, J. M., Arthur, J. W., and Dunbrack, R. L., Jr. (2000) Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins 40, 6–22.PubMedCrossRefGoogle Scholar
  220. 220.
    Pang, A., Smith, A. D., Nuin, P. A., and Tillier, E. R. (2005) SIMPROT: using an empirically determined indel distribution in simulations of protein evolution. BMC Bioinform. 6, 236.CrossRefGoogle Scholar
  221. 221.
    Nuin, P. A., Wang, Z., and Tillier, E. R. (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinform. 7, 471.CrossRefGoogle Scholar
  222. 222.
    Stoye, J., Evers, D., and Meyer, F. (1998) Rose: generating sequence families. Bioinformatics 14, 157–163.PubMedCrossRefGoogle Scholar
  223. 223.
    Eidhammer, I., Jonassen, I., and Taylor, W. R. (2000) Structure comparison and structure patterns. J. Comput. Biol. 7, 685–716.PubMedCrossRefGoogle Scholar
  224. 224.
    Carugo, O. and Pongor, S. (2001) A normalized root-mean-square distance for comparing protein three-dimensional structures. Protein Sci. 10, 1470–1473.PubMedCrossRefGoogle Scholar
  225. 225.
    Armougom, F., Moretti, S., Keduas, V., and Notredame, C. (2006) The iRMSD: a local measure of sequence alignment accuracy using structural information. Bioinformatics 22, e35–39.PubMedCrossRefGoogle Scholar
  226. 226.
    Chew, L. P., Huttenlocher, D., Kedem, K., and Kleinberg, J. (1999) Fast detection of common geometric substructure in proteins. J. Comput. Biol. 6, 313–325.PubMedCrossRefGoogle Scholar
  227. 227.
    O’Sullivan, O., Zehnder, M., Higgins, D., Bucher, P., Grosdidier, A., and Notredame, C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics 19(Suppl 1), i215–221.PubMedCrossRefGoogle Scholar
  228. 228.
    Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919.Google Scholar
  229. 229.
    Dayhoff, M. O., Eck, R. V., and Park, C. M. (1972) A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed.). National Biomedical Research Foundation, Washington, DC, pp. 89–99.Google Scholar
  230. 230.
    Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed.). National Biomedical Research Foundation, Washington, DC, pp. 345–352.Google Scholar
  231. 231.
    Muller, T. and Vingron, M. (2000) Modeling amino acid replacement. J. Comput. Biol. 7, 761–776.PubMedCrossRefGoogle Scholar
  232. 232.
    Whelan, S. and Goldman, N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.PubMedGoogle Scholar
  233. 233.
    Prlic, A., Domingues, F. S., and Sippl, M. J. (2000) Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng. 13, 545–550.PubMedCrossRefGoogle Scholar
  234. 234.
    Reese, J. T. and Pearson, W. R. (2002) Empirical determination of effective gap penalties for sequence comparison. Bioinformatics 18, 1500–1507.PubMedCrossRefGoogle Scholar
  235. 235.
    Arribas-Gil, A., Gassiat, E., and Matias, C. (2006) Parameter estimation in pair-hidden Markov models. Scand. J. Stat. 33, 651–671.CrossRefGoogle Scholar
  236. 236.
    Liu, J. S., Neuwald, A. F., and Lawrence, C. E. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90, 1156–1170.CrossRefGoogle Scholar
  237. 237.
    Zhu, J., Liu, J. S., and Lawrence, C. E. (1998) Bayesian adaptive sequence alignment algorithms. Bioinformatics 14, 25–39.PubMedCrossRefGoogle Scholar
  238. 238.
    Kececioglu, J. and Kim, E. (2007) Simple and fast inverse alignment. RECOMB.Google Scholar
  239. 239.
    Yu, C.-N., Joachims, T., Elber, R., and Pillardy, J. (2007) Support vector training of protein alignment models. RECOMB.Google Scholar
  240. 240.
    Tsochantaridis, I., Joachims, T., Hofmann, T., and Altun, Y. (2005) Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6, 1453–1484.Google Scholar
  241. 241.
    Katoh, K. and Toh, H. (2007) PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23, 372–374.PubMedCrossRefGoogle Scholar
  242. 242.
    Ahola, V., Aittokallio, T., Vihinen, M., and Uusipaikka, E. (2006) A statistical score for assessing the quality of multiple sequence alignments. BMC Bioinform. 7, 484.CrossRefGoogle Scholar
  243. 243.
    Altschul, S. F. (1998) Generalized affine gap costs for protein sequence alignment. Proteins 32, 88–96.PubMedCrossRefGoogle Scholar
  244. 244.
    Zachariah, M. A., Crooks, G. E., Holbrook, S. R., and Brenner, S. E. (2005) A generalized affine gap model significantly improves protein sequence alignment accuracy. Proteins 58, 329–338.PubMedCrossRefGoogle Scholar
  245. 245.
    Thompson, J. D., Muller, A., Waterhouse, A., Procter, J., Barton, G. J., Plewniak, F., et al. (2006) MACSIMS: multiple alignment of complete sequences information management system. BMC Bioinform. 7, 318.CrossRefGoogle Scholar
  246. 246.
    Thompson, J. D., Holbrook, S. R., Katoh, K., Koehl, P., Moras, D., Westhof, E., et al. (2005) MAO: a multiple alignment ontology for nucleic acid and protein sequences. Nucleic Acids Res. 33, 4164–4171.PubMedCrossRefGoogle Scholar
  247. 247.
    Gotoh, O. (1999) Multiple sequence alignment: algorithms and applications. Adv. Biophys. 36, 159–206.PubMedCrossRefGoogle Scholar
  248. 248.
    Phillips, A., Janies, D., and Wheeler, W. (2000) Multiple sequence alignment in phylogenetic analysis. Mol. Phylogenet. Evol. 16, 317–330.PubMedCrossRefGoogle Scholar
  249. 249.
    Lambert, C., Campenhout, J. M. V., DeBolle, X., and Depiereux, E. (2003) Review of common sequence alignment methods: clues to enhance reliability. Curr. Genom. 4, 131–146.CrossRefGoogle Scholar
  250. 250.
    Wallace, I. M., Blackshields, G., and Higgins, D. G. (2005) Multiple sequence alignments. Curr. Opin. Struct. Biol. 15, 261–266.PubMedCrossRefGoogle Scholar
  251. 251.
    Edgar, R. C. and Batzoglou, S. (2006) Multiple sequence alignment. Curr. Opin. Struct. Biol. 16, 368–373.PubMedCrossRefGoogle Scholar
  252. 252.
    Morrison, D. A. (2006) Multiple sequence alignment for phylogenetic purposes. Aust. Syst. Bot. 19, 479–539.CrossRefGoogle Scholar
  253. 253.
    Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2001) Introduction to Algorithms. MIT Press, Cambridge, MA.Google Scholar
  254. 254.
    Eppstein, D. (2000) Fast hierarchical clustering and other applications of dynamic closest pairs. J. Exp. Algorithmics 5, 1–23.CrossRefGoogle Scholar
  255. 255.
    Elias, I. and Lagergren, J. (2005) Fast neighbor joining. ICALP.Google Scholar
  256. 256.
    Waterman, M. S., Eggert, M., and Lander, E. (1992) Parametric sequence comparisons. Proc. Natl. Acad. Sci. USA 89, 6090–6093.Google Scholar
  257. 257.
    Waterman, M. S. (1994) Parametric and ensemble sequence alignment algorithms. Bull. Math. Biol. 56, 743–767.PubMedGoogle Scholar
  258. 258.
    Gusfield, D., Balasubramanian, K., and Naor, D. (1994) Parametric optimization of sequence alignment. Algorithmica 12, 312–326.CrossRefGoogle Scholar

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Chuong B. Do
    • 1
  • Kazutaka Katoh
    • 1
  1. Computer Science Department, Stanford University, Stanford, USA