PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information

  • Jimin Pei
  • Nick V. Grishin
Part of the Methods in Molecular Biology book series (MIMB, volume 1079)


Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Constructing accurate MSAs for divergent proteins remains a difficult computational task. The steadily growing number of protein sequences and structures in public databases can be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction that draws on additional evolutionary and structural information from database searches. For the input proteins, PROMALS3D automatically identifies homologs in sequence and structure databases, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints from profile–profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. The PROMALS3D output is a consensus alignment enriched with sequence and structural information about the input proteins and their homologs. PROMALS3D Web server and package are available at
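
As a rough illustration of the consistency-based idea mentioned above, the following Python sketch re-estimates pairwise residue match scores by routing them through every other sequence, in the style of the ProbCons consistency transformation. It is not PROMALS3D's implementation: the function names, the dictionary layout of the pairwise matrices, and the `mix_constraints` weighted average of sequence- and structure-derived scores are all assumptions made for illustration.

```python
from itertools import combinations


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of lists."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def get_pair(pairwise, x, y):
    """Return the match matrix for (x, y); transpose the stored (y, x) entry if needed."""
    if (x, y) in pairwise:
        return pairwise[(x, y)]
    return transpose(pairwise[(y, x)])


def mix_constraints(seq, struct, weight=0.5):
    """Hypothetical blending step: weighted average of sequence-based and
    structure-based match scores (the weighting scheme is an assumption)."""
    return [
        [(1 - weight) * s + weight * t for s, t in zip(row_s, row_t)]
        for row_s, row_t in zip(seq, struct)
    ]


def consistency_transform(pairwise, names):
    """Re-estimate each pairwise match matrix using all sequences as intermediates.

    pairwise maps (x, y) to a matrix whose entry [i][j] scores residue i of x
    against residue j of y. The new score is the average over intermediates z
    of the x-z scores multiplied by the z-y scores.
    """
    n = len(names)
    out = {}
    for x, y in combinations(names, 2):
        direct = get_pair(pairwise, x, y)
        # z = x and z = y each contribute the direct matrix once.
        total = [[2.0 * v for v in row] for row in direct]
        for z in names:
            if z in (x, y):
                continue
            via = matmul(get_pair(pairwise, x, z), get_pair(pairwise, z, y))
            total = [[t + v for t, v in zip(rt, rv)] for rt, rv in zip(total, via)]
        out[(x, y)] = [[v / n for v in row] for row in total]
    return out
```

In this sketch, a residue pair that is supported by alignments through a third sequence keeps a high score, while a pair supported only by the direct comparison is down-weighted. The transformed matrices would then typically guide a progressive alignment.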

Key words

  • Multiple sequence alignment
  • Database searches
  • Three-dimensional structural alignment
  • Consistency-based scoring
  • Probabilistic model of profile–profile alignment



The work is supported in part by the National Institutes of Health (GM094575 to NVG) and the Welch Foundation (I-1505 to NVG).



Copyright information

© Springer Science+Business Media, LLC 2014

Authors and Affiliations

  • Jimin Pei (1)
  • Nick V. Grishin (2, 3)
  1. Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, USA
  2. Department of Biophysics, Howard Hughes Medical Institute, Dallas, USA
  3. Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, USA
