Abstract
Given the nucleotide or amino acid sequence of a biological molecule, what do we know about that molecule? We can find biologically relevant information in a sequence by searching it for particular patterns that may reflect some function of the molecule. These patterns include catalogued motifs and domains, secondary structure predictions, physical attributes such as hydrophobicity, and even the content of the DNA itself, as exploited by some gene-finding techniques. What about comparisons with other sequences? Can we learn about one molecule by comparing it to another? Naturally we can: inference through similarity is fundamental to all the biological sciences, and comparing our sequence against others can tell us a tremendous amount.
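Comparing one sequence against another starts with aligning them. The sketch below shows the dynamic-programming idea behind global pairwise alignment, in the style of Needleman and Wunsch (1970), listed under Dynamic Programming in the readings.

It rests on several simplifying assumptions:
- The function name `needleman_wunsch` is hypothetical.
- The scoring is a toy scheme: +1 for a match, -1 for a mismatch and a linear gap penalty of -1. Real tools use substitution matrices such as PAM or BLOSUM, and usually affine gap penalties.
- It returns only one of possibly several optimal alignments.

It illustrates the pairwise core only. It is not how ClustalW or ClustalX build a multiple alignment.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global pairwise alignment by dynamic programming (Needleman-Wunsch style).

    Toy scoring (an assumption for illustration): fixed match/mismatch scores
    and a linear gap penalty. Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def pair(i: int, j: int) -> int:
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell extends the best of three smaller alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches minus one gap. The same recurrence underlies the local variant of Smith and Waterman (1981). In the local version, cell scores are floored at zero and the traceback starts from the highest-scoring cell.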
Suggested Readings
Dynamic Programming
Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol. 48, 443–453.
Smith, T. F. and Waterman, M. S. (1981) Comparison of biosequences, Adv. Appl. Math. 2, 482–489.
Scoring Matrices
Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89, 10,915–10,919.
Schwartz, R. M. and Dayhoff, M. O. (1979) Matrices for detecting distant relationships, in: Atlas of Protein Sequences and Structure, vol. 5, (Dayhoff, M. O., ed.), National Biomedical Research Foundation, Washington DC, pp. 353–358.
Multiple Sequence Dynamic Programming
Feng, D. F. and Doolittle, R. F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees, J. Mol. Evol. 25, 351–360.
Genetics Computer Group (GCG), a part of Accelrys Inc., a subsidiary of Pharmacopeia Inc. (©1982–2002) Program Manual for the Wisconsin Package, Version 10.3. (http://www.accelrys.com/products/gcg-wisconsin-package).
Gupta, S. K., Kececioglu, J. D., and Schaffer, A. A. (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment, J. Comp. Biol. 2, 459–472.
Higgins, D. G., Bleasby, A. J., and Fuchs, R. (1992) CLUSTAL V: improved software for multiple sequence alignment, Comp. Appl. Biosci. 8, 189–191.
Smith, R. F. and Smith, T. F. (1992) Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for comparative protein modeling, Protein Eng. 5, 35–41.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22, 4673–4680.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Nucleic Acids Res. 25, 4876–4882.
Applicability
Alignment Profiles
Eddy, S. R. (1996) Hidden Markov models, Curr. Opin. Struct. Biol. 6, 361–365.
Eddy, S. R. (1998) Profile hidden Markov models, Bioinformatics 14, 755–763.
Gribskov, M., Luethy, R., and Eisenberg, D. (1989) Profile analysis, in: Methods in Enzymology, vol. 183, Academic Press, San Diego, CA, pp. 146–159.
Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins, Proc. Natl. Acad. Sci. USA 84, 4355–4358.
Complications
File Formats
Gilbert, D. G. (1993 [C release] and 1999 [Java release]) ReadSeq, public domain software, Bioinformatics Group, Biology Department, Indiana University, Bloomington, IN. (see Website: http://www.iubio.bio.indiana.edu/soft/molbio/readseq/)
The Protein System
Phylogenetic Relationships
The E. coli Database Collection (ECDC): The K12 chromosome, Justus-Liebig-Universitaet, Giessen, Germany. (see Website: http://www.uni-giessen.de/ngx1052/ecdc.htm)
Hasegawa, M., Hashimoto, T., Adachi, J., Iwabe, N., and Miyata, T. (1993) Early branchings in the evolution of Eukaryotes: ancient divergence of Entamoeba that lacks mitochondria revealed by protein sequence data, J. Mol. Evol. 36, 380–388.
Iwabe, N., Kuma, K.-I., Hasegawa, M., Osawa, S., and Miyata, T. (1989) Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes, Proc. Natl. Acad. Sci. USA 86, 9355–9359.
Madsen, H. O., Poulsen, K., Dahl, O., Clark, B. F., and Hjorth, J. P. (1990) Retropseudogenes constitute the major part of the human elongation factor 1 alpha gene family, Nucleic Acids Res. 18, 1513–1516.
Rivera, M. C. and Lake, J. A. (1992) Evidence that eukaryotes and eocyte prokaryotes are immediate relatives, Science 257, 74–76.
What is Available
Running ClustalX on Your Machine, Briefly
Etzold, T. and Argos, P. (1993) SRS—an indexing and retrieval tool for flat file data libraries, Comp. Appl. Biosci. 9, 49–57.
Gonnet, G. H., Cohen, M. A., and Benner, S. A. (1992) Exhaustive matching of the entire protein sequence database, Science 256, 1443–1445.
ClustalW on the Web
Smith, R. F., Wiese, B. A., Wojzynski, M. K., Davison, D. B., and Worley, K. C. (1996) BCM Search Launcher—an integrated interface to molecular biology data base search and analysis services available on the World Wide Web, Genome Res. 6, 454–462.
Multiple Sequence Alignment and Structure Prediction
Alignment
Secondary Structure
Guex, N., Diemand, A., and Peitsch, M. C. (1999) Protein modeling for all, Trends Biochem. Sci. 24, 364–367.
Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling, Electrophoresis 18, 2714–2723.
Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy, J. Mol. Biol. 232, 584–599.
Rost, B. and Sander, C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure, Proteins 19, 55–77.
Sander, C. and Schneider, R. (1991) Database of homology-derived structures and the structural meaning of sequence alignment, Proteins 9, 56–68.
Sayle, R. A. and Milner-White, E. J. (1995) RasMol: biomolecular graphics for all, Trends Biochem. Sci. 20, 374–376.
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Thompson, S.M. (2003). An Introduction to Multiple Sequence Alignment and Analysis. In: Krawetz, S.A., Womble, D.D. (eds) Introduction to Bioinformatics. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59259-335-4_31
DOI: https://doi.org/10.1007/978-1-59259-335-4_31
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-58829-241-4
Online ISBN: 978-1-59259-335-4
eBook Packages: Springer Book Archive