Multiple Sequence Alignment Methods pp 263-271

Part of the Methods in Molecular Biology book series (MIMB, volume 1079)

PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information

  • Jimin Pei
  • Nick V. Grishin
Protocol

Abstract

Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The steadily growing number of protein sequences and structures in public databases can be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. For the input proteins, PROMALS3D automatically identifies homologs in sequence and structure databases, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints from profile–profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. The PROMALS3D output is a consensus alignment enriched with sequence and structural information about the input proteins and their homologs. The PROMALS3D web server and package are available at http://prodata.swmed.edu/PROMALS3D.
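
The idea of combining constraints in a consistency-based framework can be illustrated with a small sketch. This is not the PROMALS3D implementation: it assumes a ProbCons-style consistency transformation on toy residue-match matrices, and the function names (`combine`, `consistency_transform`), the blending weight `w_struct`, and all numbers are hypothetical choices made only for illustration.

```python
def transpose(m):
    """Return the transpose of a dense matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def combine(seq_m, struct_m, w_struct=0.5):
    """Blend a sequence-derived match matrix with a structure-derived one.

    Assumption: a simple weighted average; if no structural constraint
    exists for the pair, the sequence-derived matrix is used unchanged.
    """
    if struct_m is None:
        return [row[:] for row in seq_m]
    return [[(1 - w_struct) * s + w_struct * t for s, t in zip(rs, rt)]
            for rs, rt in zip(seq_m, struct_m)]


def consistency_transform(pair, names):
    """Re-estimate each pairwise matrix using every sequence as an intermediate.

    pair[(a, b)] is a len(a) x len(b) match matrix, stored once per pair.
    For z == a or z == b the self-comparison is the identity matrix, so
    those two terms each contribute the original matrix.
    """
    def get(a, b):
        # Transpose on demand when the pair is stored in the other order.
        return pair[(a, b)] if (a, b) in pair else transpose(pair[(b, a)])

    out = {}
    for (a, b), m in pair.items():
        acc = [[2.0 * v for v in row] for row in m]
        for z in names:
            if z in (a, b):
                continue
            via = matmul(get(a, z), get(z, b))
            acc = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(acc, via)]
        out[(a, b)] = [[v / len(names) for v in row] for row in acc]
    return out


# Toy data: three proteins of length 2. A and B align weakly by sequence,
# but both align confidently to C.
names = ["A", "B", "C"]
seq = {
    ("A", "B"): [[0.5, 0.1], [0.1, 0.5]],
    ("A", "C"): [[0.9, 0.0], [0.0, 0.9]],
    ("B", "C"): [[0.9, 0.0], [0.0, 0.9]],
}
# Hypothetical structure-based constraint, available for the A-C pair only.
struct = {("A", "C"): [[1.0, 0.0], [0.0, 1.0]]}

combined = {k: combine(m, struct.get(k)) for k, m in seq.items()}
refined = consistency_transform(seq, names)
refined_with_struct = consistency_transform(combined, names)
```

In this toy setting the A-B match scores on the diagonal increase after the transformation because the path through C supports them, and they increase slightly more when the A-C pair carries the structural constraint. A real pipeline would derive these matrices from profile–profile and structure alignments rather than from hand-written numbers.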

Key words

  • Multiple sequence alignment
  • Database searches
  • Three-dimensional structural alignment
  • Consistency-based scoring
  • Probabilistic model of profile–profile alignment

Copyright information

© Springer Science+Business Media, LLC 2014

Authors and Affiliations

  • Jimin Pei (1)
  • Nick V. Grishin (2, 3)
  1. Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, USA
  2. Department of Biophysics, Howard Hughes Medical Institute, Dallas, USA
  3. Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, USA
