Improving Pairwise Sequence Alignment between Distantly Related Proteins

  • Jin-an Feng
Part of the Methods in Molecular Biology™ book series (MIMB, volume 395)


Sequence alignment between remotely related proteins has been one of the more difficult problems in structural biology. Improvements have been achieved by incorporating information that enhances the diversity of the substitution matrices. NdPASA is a web-based server that optimizes sequence alignments between proteins sharing low percentages of sequence identity. The program integrates structure information of the template sequence into a global alignment algorithm by employing amino acids’ neighbor-dependent propensities for secondary structure as unique parameters for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. The server is designed to aid homologous protein structure modeling. It is most effective when the structure of the template sequence is known. NdPASA can be accessed online at
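The idea described above can be illustrated with a minimal sketch. This is not the NdPASA scoring function or its published parameters. It is a toy Needleman-Wunsch global alignment in which each match score is augmented by a bonus looked up from a table of neighbor-dependent propensities. The function name `global_align`, the `propensity` dictionary keyed by (template secondary-structure state, query residue pair), the `weight` factor, and the match, mismatch, and linear gap values are all assumptions made for illustration.

```python
def global_align(query, template, template_ss, propensity,
                 weight=1.0, match=1.0, mismatch=-1.0, gap=-2.0):
    """Toy global alignment with a secondary-structure propensity bonus.

    propensity is a hypothetical table: (ss_state, residue_pair) -> float,
    where residue_pair is a query residue plus its C-terminal neighbor.
    """
    if len(template) != len(template_ss):
        raise ValueError("template and template_ss must have equal length")

    n, m = len(query), len(template)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    # Boundary conditions: leading gaps are penalized (global alignment).
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "L"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            # Query residue i and its neighbor i+1 (one residue at the end).
            pair = query[i - 1:i + 1]
            bonus = weight * propensity.get((template_ss[j - 1], pair), 0.0)
            candidates = [
                (score[i - 1][j - 1] + sub + bonus, "D"),  # align residues
                (score[i - 1][j] + gap, "U"),              # gap in template
                (score[i][j - 1] + gap, "L"),              # gap in query
            ]
            # On ties, max() keeps the first candidate, i.e. the diagonal.
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i, j = i - 1, j - 1
        elif step == "U":
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1

    return "".join(reversed(aligned_q)), "".join(reversed(aligned_t)), score[n][m]
```

Calling `global_align("AD", "ACD", "HHH", {})` returns the two gapped strings and the total score. A real application would replace the match/mismatch values with a substitution matrix and fill the propensity table from statistics over known structures. The sketch only shows where a structure-derived term can enter the dynamic-programming recurrence.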


Sequence alignment; propensity; protein structures; sequence pattern; secondary structure



The author would like to thank Wei Li and Junwen Wang for their contributions to the development of NdPASA. The author also gratefully acknowledges financial support from the National Institutes of Health (GM54630), the American Cancer Society (PRG9926301GMC), and an appropriation from the Commonwealth of Pennsylvania.



Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Jin-an Feng (1)
  1. Department of Chemistry, Center for Biotechnology, Temple University, USA
