Journal of Molecular Evolution, Volume 28, Issue 1–2, pp 161–169

A flexible method to align large numbers of biological sequences

  • William R. Taylor

Summary

A method for the alignment of two or more biological sequences is described. The method is a direct extension of the method of Taylor (1987), incorporating a consensus sequence approach, and allows considerable freedom in the control of the clustering of the sequences. At one extreme this is equivalent to the earlier method (Taylor 1987), whereas at the other, the clustering approaches the binary method of Feng and Doolittle (1987). Such freedom allows the program to be adapted to particular problems, which yields considerable savings in computer time and so allows very large problems to be tackled. Besides a detailed analysis of the alignment of the cytochrome c superfamily, the clustering and alignment of the PIR sequence data bank (approximately 3500 sequences) is described.

Key words

Multiple protein sequence consensus; Alignment; Evolutionary tree; Clustering

References

  1. Bacon DJ, Anderson WF (1986) Multiple sequence alignment. J Mol Biol 191:153–161
  2. Bains W (1986) A program to align multiple DNA sequences. Nucleic Acids Res 14:159–177
  3. Barker WC, Hunt TL, Orcutt BC, George DG, Yeh LS, Chen HR, Blomquist MC, Johnson GC, Seibel-Ross EI, Hong MK, Ledley RS (1984) Protein identification resource, version 4.3. National Biomedical Research Foundation, Washington DC
  4. Barton GJ, Sternberg MJE (1987a) Evaluation and improvements in the automatic alignment of protein sequences. Prot Eng 1:89–94
  5. Barton GJ, Sternberg MJE (1987b) A strategy for the rapid multiple alignment of protein sequences. J Mol Biol 198:327–337
  6. Bashford D, Chothia C, Lesk AM (1987) Determinants of a protein fold. J Mol Biol 196:199–216
  7. Bishop M, Thompson E (1984) Fast computer search for similar DNA sequences. Nucleic Acids Res 12:5471–5474
  8. Coulson AFW, Collins JF, Lyall A (1987) Protein and nucleic acid sequence database searching; a suitable case for parallel processing. Computer J 30:420–423
  9. Dayhoff MO, Barker WC (1978) Supplement to the (1972) atlas of protein sequence and structure. National Biomedical Research Foundation, Washington DC
  10. Everitt BS, Dunn G (1983) Advanced methods of data exploration and modelling. Heinemann, New York
  11. Feng D-F, Doolittle R (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351–360
  12. Feng D-F, Johnson MS, Doolittle RF (1985) Aligning amino acid sequences: comparison of commonly used methods. J Mol Evol 21:112–125
  13. Gotoh O (1986) Alignment of three biological sequences with an efficient traceback procedure. J Theor Biol 121:327–337
  14. Gribskov M, McLachlan AD, Eisenberg D (1987) Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 84:4355–4358
  15. Johnson MS, Doolittle RF (1986) A method for the simultaneous alignment of three or more amino acid sequences. J Mol Evol 23:267–278
  16. Kruskal JB, Sankoff D (1983) An anthology of algorithms and concepts for sequence comparison. In: Sankoff D, Kruskal JB (eds) Time warps, string edits and macromolecules, chapter 10. Addison-Wesley, MA
  17. Landau GM, Vishkin U, Nussinov R (1987) An efficient string matching algorithm with k differences for nucleotide and amino acid sequences. J Theor Biol 126:483–490
  18. Lesk AM, Levitt M, Chothia C (1986) Alignment of the amino acid sequences of distantly related proteins using variable gap penalties. Prot Eng 1:77–78
  19. Murata M, Richardson JS, Sussman IL (1985) Simultaneous comparison of three protein sequences. Proc Natl Acad Sci USA 82:3073–3077
  20. Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:444–453
  21. Orcutt BC, Dayhoff MO, George DG, Barker WC (1984) Users guide for the alignment score program. PIR report ALI-1284, National Biomedical Research Foundation, Washington DC
  22. Patthy L (1987) Detecting homology of distantly related proteins with consensus sequences. J Mol Biol 198:567–577
  23. Pearl LH, Taylor WR (1987) A structural model for the retroviral proteases. Nature 329:351–354
  24. Sankoff D (1975) Minimal mutation trees of proteins. SIAM J Appl Math 78:35–42
  25. Sankoff D, Cedergren J (1983) Simultaneous comparison of three or more sequences related by a tree. In: Sankoff D, Kruskal JB (eds) Time warps, string edits and macromolecules, chapter 9. Addison-Wesley, MA
  26. Santibanez M, Rohde K (1987) A multiple sequence alignment program for protein sequences. CABIOS 3:111–114
  27. Schwartz RM, Dayhoff MO (1978) Origins of prokaryotes, eukaryotes, mitochondria and chloroplasts. Science 199:395–403
  28. Sobel E, Martinez HM (1986) A multiple sequence alignment program. Nucleic Acids Res 14:363–374
  29. Taylor WR (1986) Identification of protein sequence homology by consensus template alignment. J Mol Biol 188:233–258
  30. Taylor WR (1987) Multiple sequence alignment by a pairwise algorithm. CABIOS 3:81–87
  31. Waterman MS (1986) Multiple sequence alignment by consensus. Nucleic Acids Res 14:9095–9102
  32. Waterman MS, Eggert M (1987) A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. J Mol Biol 197:723–728
  33. Waterman MS, Perlwitz M (1984) Line geometries for sequence comparison. Bull Math Biol 46:567–577
  34. Waterman MS, Smith TF, Beyer WA (1976) Some biological sequence metrics. Adv Math 20:367–387
  35. Wilbur WJ, Lipman DJ (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA 80:726–730
  36. Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJE (1987) Prediction of protein secondary structure and active sites using the alignment of homologous proteins. J Mol Biol 195:957–961

Copyright information

© Springer-Verlag New York Inc. 1988

Authors and Affiliations

  • William R. Taylor (1)
  1. The National Institute for Medical Research (MRC), The Ridgeway, London, UK
