Skip to main content

Multiple Protein Sequence Alignment with MSAProbs

Part of the Methods in Molecular Biology book series (MIMB,volume 1079)

Abstract

Multiple sequence alignment (MSA) generally constitutes the foundation of many bioinformatics studies involving functional, structural, and evolutionary relationship analysis between sequences. As a result of the exponential computational complexity of the exact approach to producing optimal multiple alignments, the majority of state-of-the-art MSA algorithms are designed based on the progressive alignment heuristic. In this chapter, we outline MSAProbs, a parallelized MSA algorithm for protein sequences based on progressive alignment. To achieve high alignment accuracy, this algorithm employs a hybrid combination of a pair hidden Markov model and a partition function to calculate posterior probabilities. Furthermore, we provide some practical advice on the usage of the algorithm.

Key words

  • Multiple sequence alignment
  • Progressive alignment
  • Hidden Markov models
  • Partition function
  • Consistency-based scheme

This is a preview of subscription content, access via your institution.

Buying options

Protocol
USD   49.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-1-62703-646-7_14
  • Chapter length: 8 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   109.00
Price excludes VAT (USA)
  • ISBN: 978-1-62703-646-7
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   139.99
Price excludes VAT (USA)
Hardcover Book
USD   179.99
Price excludes VAT (USA)
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Springer Nature is developing a new tool to find and evaluate Protocols. Learn more

References

  1. Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351–361

    PubMed  CrossRef  CAS  Google Scholar 

  2. Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26:1958–1964

    PubMed  CrossRef  CAS  Google Scholar 

  3. Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge

    CrossRef  Google Scholar 

  4. Miyazawa S (1995) A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng 8:999–1009

    PubMed  CrossRef  CAS  Google Scholar 

  5. Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61:127–136

    PubMed  CrossRef  CAS  Google Scholar 

  6. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797

    PubMed  CrossRef  CAS  Google Scholar 

  7. Sievers F, Wilm A, Dineen D et al (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539

    PubMed  CrossRef  Google Scholar 

  8. Chang JM, Di Tommaso P, Taly JF et al (2012) Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13:S1

    PubMed  CrossRef  CAS  Google Scholar 

  9. Deng X, Cheng J (2011) MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue–residue contacts. BMC Bioinformatics 12:472

    PubMed  CrossRef  CAS  Google Scholar 

  10. Vingron M, Argos P (1989) A fast and sensitive multiple sequence alignment algorithm. Comput Appl Biosci 5:115–121

    PubMed  CAS  Google Scholar 

  11. Gotoh O (1990) Consistency of optimal sequence alignments. Bull Math Biol 52:509–525

    PubMed  CAS  Google Scholar 

  12. Notredame C, Holm L, Higgins DG (1998) COFFEE: an objective function for multiple sequence alignments. Bioinformatics 14:407–422

    PubMed  CrossRef  CAS  Google Scholar 

  13. Notredame C, Higgins DG, Heringa J (2000) T-coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217

    PubMed  CrossRef  CAS  Google Scholar 

  14. Do CB, Mahabhashyam MS, Brudno M et al (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15:330–340

    PubMed  CrossRef  CAS  Google Scholar 

  15. Liu Y, Schmidt B, Maskell DL (2009) MSA-CUDA: multiple sequence alignment on graphics processing units with CUDA. 20th IEEE international conference on application-specific systems, architectures and processors, pp 121–128

    Google Scholar 

  16. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680

    PubMed  CrossRef  CAS  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2014 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Liu, Y., Schmidt, B. (2014). Multiple Protein Sequence Alignment with MSAProbs. In: Russell, D. (eds) Multiple Sequence Alignment Methods. Methods in Molecular Biology, vol 1079. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-646-7_14

Download citation

  • DOI: https://doi.org/10.1007/978-1-62703-646-7_14

  • Published:

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-645-0

  • Online ISBN: 978-1-62703-646-7

  • eBook Packages: Springer Protocols