Multiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences

Lu, Yue; Sze, Sing-Hoi

doi:10.1007/978-3-540-71681-5_20

Yue Lu¹ &
Sing-Hoi Sze^1,2

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 4453))

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

1571 Accesses
3 Citations

Abstract

Despite considerable efforts, it remains difficult to obtain accurate multiple sequence alignments. By using additional hits from database search of the input sequences, a few strategies have been proposed to significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high scoring hits into the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. We develop an algorithm that integrates these strategies to further improve alignment accuracy by modifying the pair-HMM approach in ProbCons to incorporate profiles of intermediate sequences from database search and utilize secondary structure predictions as in SPEM. We test our algorithm on a few sets of benchmark multiple alignments, including BAliBASE, HOMSTRAD, PREFAB and SABmark, and show that it significantly outperforms MAFFT and ProbCons, which are among the best multiple alignment algorithms that do not utilize additional information, and SPEM, which is among the best multiple alignment algorithms that utilize additional hits from database search. The improvement in accuracy over SPEM can be as much as 5 to 10% when aligning divergent sequences. A software program that implements this approach (ISPAlign) is at http://faculty.cs.tamu.edu/shsze/ispalign.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Article Google Scholar
Bolten, E., Schliep, A., Schneckener, S., Schomburg, D., Schrader, R.: Clustering protein sequences — structure prediction by transitive homology. Bioinformatics 17, 935–941 (2001)
Article Google Scholar
Bucka-Lassen, K., Caprani, O., Hein, J.: Combining many multiple alignments in one improved alignment. Bioinformatics 15, 122–130 (1999)
Article Google Scholar
Do, C.B., Gross, S.S., Batzoglou, S.: CONTRAlign: discriminative training for protein sequence alignment. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P., Waterman, M. (eds.) RECOMB 2006. LNCS (LNBI), vol. 3909, pp. 160–174. Springer, Heidelberg (2006)
Chapter Google Scholar
Do, C.B., Mahabhashyam, M.S., Brudno, M., Batzoglou, S.: ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340 (2005)
Article Google Scholar
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological sequence analysis. Cambridge University Press, Cambridge (1998)
MATH Google Scholar
Edgar, R.C.: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)
Article Google Scholar
Edgar, R.C., Sjölander, K.: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 20, 1301–1308 (2004)
Article Google Scholar
Gerstein, M.: Measurement of the effectiveness of transitive sequence comparison, through a third ‘intermediate’ sequence. Bioinformatics 14, 707–714 (1998)
Article Google Scholar
Gotoh, O.: Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol. 264, 823–838 (1996)
Article Google Scholar
Gusfield, D.: Efficient methods for multiple sequence alignment with guaranteed error bounds. Bull. Math. Biol. 55, 141–154 (1993)
MATH Google Scholar
Heger, A., Lappe, M., Holm, L.: Accurate detection of very sparse sequence motifs. J. Comp. Biol. 11, 843–857 (2004)
Article Google Scholar
Jones, D.T.: Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202 (1999)
Article Google Scholar
Katoh, K., Kuma, K., Toh, H., Miyata, T.: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518 (2005)
Article Google Scholar
Lassmann, T., Sonnhammer, E.L.L.: Kalign — an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6, 298 (2005)
Article Google Scholar
Lee, C., Grasso, C., Sharlow, M.F.: Multiple sequence alignment using partial order graphs. Bioinformatics 18, 452–464 (2002)
Article Google Scholar
Li, W., Jaroszewski, L., Godzik, A.: Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18, 77–82 (2002)
Article Google Scholar
Li, W., Pio, F., Pawlowski, K., Godzik, A.: Saturated BLAST: an automated multiple intermediate sequence search used to detect distant homology. Bioinformatics 16, 1105–1110 (2000)
Article Google Scholar
Margelevičius, M., Venclovas, Č.: PSI-BLAST-ISS: an intermediate sequence search tool for estimation of the position-specific alignment reliability. BMC Bioinformatics 6, 185 (2005)
Article Google Scholar
Marti-Renom, M.A., Madhusudhan, M.S., Sali, A.: Alignment of protein sequences by their profiles. Protein Sci. 13, 1071–1087 (2004)
Article Google Scholar
Mizuguchi, K., Deane, C.M., Blundell, T.L., Overington, J.P.: HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 7, 2469–2471 (1998)
Article Google Scholar
Morgenstern, B., Dress, A., Werner, T.: Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proc. Natl. Acad. Sci. USA 93, 12098–12103 (1996)
Article MATH Google Scholar
Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000)
Article Google Scholar
O’Sullivan, O., Suhre, K., Abergel, C., Higgins, D.G., Notredame, C.: 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J. Mol. Biol. 340, 385–395 (2004)
Article Google Scholar
Park, J., Teichmann, S.A., Hubbard, T., Chothia, C.: Intermediate sequences increase the detection of homology between sequences. J. Mol. Biol. 273, 349–354 (1997)
Article Google Scholar
Pei, J., Grishin, N.V.: MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Res. 34, 4364–4374 (2006)
Article Google Scholar
Roshan, U., Livesay, D.R.: Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22, 2715–2721 (2006)
Article Google Scholar
Salamov, A.A., Suwa, M., Orengo, C.A., Swindells, M.B.: Combining sensitive database searches with multiple intermediates to detect distant homologues. Protein Eng. 12, 95–100 (1999)
Article Google Scholar
Simossis, V.A., Kleinjung, J., Heringa, J.: Homology-extended sequence alignment. Nucleic Acids Res. 33, 816–824 (2005)
Article Google Scholar
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Article Google Scholar
Stoye, J.: Multiple sequence alignment with the divide-and-conquer method. Gene 211, GC45–56 (1998)
Article Google Scholar
Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Article Google Scholar
Thompson, J.D., Koehl, P., Ripp, R., Poch, O.: BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61, 127–136 (2005)
Article Google Scholar
Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682–2690 (1999)
Article Google Scholar
Van Walle, I., Lasters, I., Wyns, L.: Align-m — a new algorithm for multiple alignment of highly divergent sequences. Bioinformatics 20, 1428–1435 (2004)
Article Google Scholar
Wallace, I.M., O’Sullivan, O., Higgins, D.G., Notredame, C.: M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699 (2006)
Article Google Scholar
Wilcoxon, F.: Probability tables for individual comparisons by ranking methods. Biometrics 3, 119–122 (1947)
Article MathSciNet Google Scholar
Yamada, S., Gotoh, O., Yamana, H.: Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinformatics 7, 524 (2006)
Article Google Scholar
Zhou, H., Zhou, Y.: SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615–3621 (2005)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Biochemistry & Biophysics,
Yue Lu & Sing-Hoi Sze
Department of Computer Science, Texas A&M University, College Station, TX 77843, USA
Sing-Hoi Sze

Authors

Yue Lu
View author publications
You can also search for this author in PubMed Google Scholar
Sing-Hoi Sze
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Terry Speed Haiyan Huang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lu, Y., Sze, SH. (2007). Multiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science(), vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_20

Download citation

DOI: https://doi.org/10.1007/978-3-540-71681-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71680-8
Online ISBN: 978-3-540-71681-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics