A flexible method to align large numbers of biological sequences

Taylor, William R.

doi:10.1007/BF02143508

Published: December 1988

Volume 28, pages 161–169, (1988)

Journal of Molecular Evolution

Summary

A method for the alignment of two or more biological sequences is described. The method is a direct extension of the method of Taylor (1987) incorporating a consensus sequence approach and allows considerable freedom in the control of the clustering of the sequences. At one extreme this is equivalent to the earlier method (Taylor 1987), whereas at the other, the clustering approaches the binary method of Feng and Doolittle (1987). Such freedom allows the program to be adapted to particular problems, which has the important advantage of resulting in considerable savings in computer time, allowing very large problems to be tackled. Besides a detailed analysis of the alignment of the cytochrome c superfamily, the clustering and alignment of the PIR sequence data bank (3500 sequences approx.) is described.
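
The idea of controlling how sequences are clustered before alignment can be illustrated with a minimal sketch. This is not the algorithm of the paper: it assumes a plain Needleman-Wunsch global alignment score with a linear gap penalty and a simple greedy grouping rule, and the names `nw_score`, `cluster`, and `cutoff`, as well as the scoring values, are hypothetical choices made for illustration only. A permissive cutoff puts all sequences in one group, while a strict cutoff splits them into many small groups.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def cluster(seqs, cutoff):
    """Greedy threshold clustering (hypothetical rule for illustration).

    A sequence joins the first cluster whose representative (its first
    member) scores at least `cutoff` against it; otherwise it starts a
    new cluster.
    """
    clusters = []
    for s in seqs:
        for c in clusters:
            if nw_score(c[0], s) >= cutoff:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


seqs = ["ACGT", "ACGA", "TTTT"]
loose = cluster(seqs, cutoff=-10)   # permissive: one group
medium = cluster(seqs, cutoff=2)    # intermediate: two groups
strict = cluster(seqs, cutoff=5)    # strict: every sequence alone
```

In this toy setting the single `cutoff` parameter plays the role of the clustering control described in the summary; a real implementation would align within each group and compare groups through a consensus or profile rather than a single representative.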

References

Bacon DJ, Anderson WF (1986) Multiple sequence alignment. J Mol Biol 191:153–161
Bains W (1986) A program to align multiple DNA sequences. Nucleic Acids Res 14:159–177
Barker WC, Hunt TL, Orcutt BC, George DG, Yeh LS, Chen HR, Blomquist MC, Johnson GC, Seibel-Ross EI, Hong MK, Ledley RS (1984) Protein identification resource, version 4.3. National Biomedical Research Foundation, Washington DC
Barton GJ, Sternberg MJE (1987a) Evaluation and improvements in the automatic alignment of protein sequences. Prot Eng 1:89–94
Barton GJ, Sternberg MJE (1987b) A strategy for the rapid multiple alignment of protein sequences. J Mol Biol 198:327–337
Bashford D, Chothia C, Lesk AM (1987) Determinants of a protein fold. J Mol Biol 196:199–216
Bishop M, Thompson E (1984) Fast computer search for similar DNA sequences. Nucleic Acids Res 12:5471–5474
Coulson AFW, Collins JF, Lyall A (1987) Protein and nucleic acid sequence database searching; a suitable case for parallel processing. Computer J 30:420–423
Dayhoff MO, Barker WC (1978) Supplement to the (1972) atlas of protein sequence and structure. National Biomedical Research Foundation, Washington DC
Everitt BS, Dunn G (1983) Advanced methods of data exploration and modelling. Heinemann, New York
Feng D-F, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351–360
Feng D-F, Johnson MS, Doolittle RF (1985) Aligning amino acid sequences: comparison of commonly used methods. J Mol Evol 21:112–125
Gotoh O (1986) Alignment of three biological sequences with an efficient traceback procedure. J Theor Biol 121:327–337
Gribskov M, McLachlan AD, Eisenberg D (1987) Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 84:4355–4358
Johnson MS, Doolittle RF (1986) A method for the simultaneous alignment of three or more amino acid sequences. J Mol Evol 23:267–278
Kruskal JB, Sankoff D (1983) An anthology of algorithms and concepts for sequence comparison. In: Sankoff D, Kruskal JB (eds) Time warps, string edits and macromolecules, chapter 10. Addison-Wesley, MA
Landau GM, Vishkin U, Nussinov R (1987) An efficient string matching algorithm with k differences for nucleotide and amino acid sequences. J Theor Biol 126:483–490
Lesk AM, Levitt M, Chothia C (1986) Alignment of the amino acid sequences of distantly related proteins using variable gap penalties. Prot Eng 1:77–78
Murata M, Richardson JS, Sussman IL (1985) Simultaneous comparison of three protein sequences. Proc Natl Acad Sci USA 82:3073–3077
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:444–453
Orcutt BC, Dayhoff MO, George DG, Barker WC (1984) Users guide for the alignment score program. PIR report ALI-1284, National Biomedical Research Foundation, Washington DC
Patthy L (1987) Detecting homology of distantly related proteins with consensus sequences. J Mol Biol 198:567–577
Pearl LH, Taylor WR (1987) A structural model for the retroviral proteases. Nature 329:351–354
Sankoff D (1975) Minimal mutation trees of proteins. SIAM J Appl Math 78:35–42
Sankoff D, Cedergren J (1983) Simultaneous comparison of three or more sequences related by a tree. In: Sankoff D, Kruskal JB (eds) Time warps, string edits and macromolecules, chapter 9. Addison-Wesley, MA
Santibanez M, Rohde K (1987) A multiple sequence alignment program for protein sequences. CABIOS 3:111–114
Schwartz RM, Dayhoff MO (1978) Origins of prokaryotes, eukaryotes, mitochondria and chloroplasts. Science 199:395–403
Sobel E, Martinez HM (1986) A multiple sequence alignment program. Nucleic Acids Res 14:363–374
Taylor WR (1986) Identification of protein sequence homology by consensus template alignment. J Mol Biol 188:233–258
Taylor WR (1987) Multiple sequence alignment by a pairwise algorithm. CABIOS 3:81–87
Waterman MS (1986) Multiple sequence alignment by consensus. Nucleic Acids Res 14:9095–9102
Waterman MS, Eggert M (1987) A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. J Mol Biol 197:723–728
Waterman MS, Perlwitz M (1984) Line geometries for sequence comparison. Bull Math Biol 46:567–577
Waterman MS, Smith TF, Beyer WA (1976) Some biological sequence metrics. Adv Math 20:367–387
Wilbur WJ, Lipman DJ (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA 80:726–730
Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJE (1987) Prediction of protein secondary structure and active sites using the alignment of homologous proteins. J Mol Biol 195:957–961

Author information

Authors and Affiliations

The National Institute for Medical Research (MRC), The Ridgeway, Mill Hill, NW7 1AA, London, UK
William R. Taylor

About this article

Cite this article

Taylor, W.R. A flexible method to align large numbers of biological sequences. J Mol Evol 28, 161–169 (1988). https://doi.org/10.1007/BF02143508

Received: 24 January 1988
Accepted: 14 April 1988
Issue Date: December 1988
DOI: https://doi.org/10.1007/BF02143508
