MSARC: Multiple Sequence Alignment by Residue Clustering

  • Conference paper
Algorithms in Bioinformatics (WABI 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8126)

Abstract

Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, the resulting alignments are biased by the guide trees, especially when the sequences are relatively distant.

We propose MSARC, a new algorithm based on graph clustering that aligns sequence sets without guide trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to that of the best progressive methods and substantially higher than that of other non-progressive algorithms. Furthermore, MSARC outperforms all other methods on sequence sets whose evolutionary distances are poorly represented by a phylogenetic tree. These datasets are the most exposed to guide-tree bias.

MSARC is available at http://bioputer.mimuw.edu.pl/msarc.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Modzelewski, M., Dojer, N. (2013). MSARC: Multiple Sequence Alignment by Residue Clustering. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_20

  • DOI: https://doi.org/10.1007/978-3-642-40453-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40452-8

  • Online ISBN: 978-3-642-40453-5

  • eBook Packages: Computer Science, Computer Science (R0)
