MSARC: Multiple Sequence Alignment by Residue Clustering

Modzelewski, Michał; Dojer, Norbert

doi:10.1007/978-3-642-40453-5_20

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8126)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, the resulting alignments are biased by guide-trees, especially for relatively distant sequences.

We propose MSARC, a new graph-clustering-based algorithm that aligns sequence sets without guide-trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to that of the best progressive methods and substantially higher than that of other non-progressive algorithms. Furthermore, MSARC outperforms all other methods on sequence sets whose evolutionary distances are hardly representable by a phylogenetic tree. These datasets are the most exposed to guide-tree bias in alignments.
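The abstract frames alignment as clustering residues rather than merging sequences along a guide-tree. The idea can be made concrete as follows: an alignment is a partition of all residues into columns, where each column holds at most one residue per sequence and the columns can be ordered consistently with every sequence. The sketch below illustrates that correspondence with a simple greedy merge over hypothetical pairwise residue scores. It is not MSARC's clustering procedure (the abstract states only that MSARC is graph-clustering based); the greedy strategy, the score dictionary, and the names `greedy_residue_clustering`, `column_order` and `render` are all illustrative assumptions.

```python
from collections import defaultdict, deque


def column_order(clusters, seqs):
    """Order clusters (candidate columns) consistently with every sequence.

    Returns a list of cluster indices in column order, or None when no such
    order exists, i.e. when the clusters do not describe a valid alignment.
    """
    # Map each residue (sequence index, position) to the cluster holding it.
    owner = {}
    for k, cluster in enumerate(clusters):
        for residue in cluster:
            owner[residue] = k
    # Consecutive residues of a sequence force their columns into that order.
    succ = defaultdict(set)
    indeg = [0] * len(clusters)
    for s, seq in enumerate(seqs):
        for i in range(len(seq) - 1):
            a, b = owner[(s, i)], owner[(s, i + 1)]
            if a == b:
                return None  # two residues of one sequence in the same column
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1
    # Kahn's topological sort; a leftover cycle means the clusters cross.
    queue = deque(k for k in range(len(clusters)) if indeg[k] == 0)
    order = []
    while queue:
        k = queue.popleft()
        order.append(k)
        for nxt in sorted(succ[k]):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return order if len(order) == len(clusters) else None


def greedy_residue_clustering(seqs, scores):
    """Hypothetical greedy clustering: merge along the best-scoring residue pairs.

    `scores` maps residue pairs ((s, i), (t, j)) to an assumed match score.
    No guide-tree is used: the merge order depends only on the pair scores.
    """
    clusters = [frozenset({(s, i)}) for s, seq in enumerate(seqs) for i in range(len(seq))]
    for (r1, r2), weight in sorted(scores.items(), key=lambda item: -item[1]):
        if weight <= 0:
            break  # ignore pairs with no positive support
        c1 = next(c for c in clusters if r1 in c)
        c2 = next(c for c in clusters if r2 in c)
        if c1 == c2:
            continue
        # A column may hold at most one residue from each sequence.
        if {s for s, _ in c1} & {s for s, _ in c2}:
            continue
        trial = [c for c in clusters if c is not c1 and c is not c2] + [c1 | c2]
        # Accept the merge only if the columns can still be ordered.
        if column_order(trial, seqs) is not None:
            clusters = trial
    return clusters


def render(seqs, clusters):
    """Turn a valid residue clustering into gapped alignment rows."""
    order = column_order(clusters, seqs)
    if order is None:
        raise ValueError("clusters do not form a valid alignment")
    rows = [[] for _ in seqs]
    for k in order:
        members = dict(clusters[k])  # sequence index -> residue position
        for s, seq in enumerate(seqs):
            rows[s].append(seq[members[s]] if s in members else "-")
    return ["".join(row) for row in rows]
```

For example, with `seqs = ["ACGT", "AGT"]` and made-up scores that favor the A-A, G-G and T-T pairs plus a weaker C-G pair, the sketch yields the rows `ACGT` and `A-GT`. The C-G pair is rejected because that G already shares a column with a residue from the first sequence. Checking that the columns remain orderable after every merge is what keeps each partial clustering a valid alignment.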

MSARC is available at http://bioputer.mimuw.edu.pl/msarc.

Author information

Authors and Affiliations

Institute of Informatics, University of Warsaw, Poland
Michał Modzelewski & Norbert Dojer

Editor information

Editors and Affiliations

ithree institute, University of Technology Sydney, 2007, Ultimo, NSW, Australia
Aaron Darling
Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, Germany
Jens Stoye

About this paper

Cite this paper

Modzelewski, M., Dojer, N. (2013). MSARC: Multiple Sequence Alignment by Residue Clustering. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_20

DOI: https://doi.org/10.1007/978-3-642-40453-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40452-8
Online ISBN: 978-3-642-40453-5
eBook Packages: Computer Science, Computer Science (R0)