An Introduction to Multiple Sequence Alignment and Analysis

Abstract

Given the nucleotide or amino acid sequence of a biological molecule, what do we know about that molecule? We can find biologically relevant information in sequences by searching for particular patterns that may reflect some function of the molecule. These can be catalogued motifs and domains, secondary structure predictions, physical attributes such as hydrophobicity, or even the content of DNA itself as in some of the gene-finding techniques. What about comparisons with other sequences? Can we learn about one molecule by comparing it to another? Yes, naturally we can; inference through similarity is fundamental to all the biological sciences. We can learn a tremendous amount by comparing our sequence against others.

Suggested Readings

Dynamic Programming

  • Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol. 48, 443–453.

  • Smith, T. F. and Waterman, M. S. (1981) Comparison of bio-sequences, Adv. Appl. Math. 2, 482–489.

Scoring Matrices

  • Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89, 10,915–10,919.

  • Schwartz, R. M. and Dayhoff, M. O. (1979) Matrices for detecting distant relationships, in: Atlas of Protein Sequences and Structure, vol. 5, (Dayhoff, M. O., ed.), National Biomedical Research Foundation, Washington DC, pp. 353–358.

Multiple Sequence Dynamic Programming

  • Feng, D. F. and Doolittle, R. F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees, J. Mol. Evol. 25, 351–360.

  • Genetics Computer Group (GCG), a part of Accelrys Inc., a subsidiary of Pharmacopeia Inc. (©1982–2002) Program Manual for the Wisconsin Package, Version 10.3. (http://www.accelrys.com/products/gcg-wisconsin-package).

  • Gupta, S. K., Kececioglu, J. D., and Schaffer, A. A. (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment, J. Comp. Biol. 2, 459–472.

  • Higgins, D. G., Bleasby, A. J., and Fuchs, R. (1992) CLUSTALV: improved software for multiple sequence alignment, Comp. Appl. Biosci. 8, 189–191.

  • Smith, R. F. and Smith, T. F. (1992) Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for comparative protein modeling, Protein Eng. 5, 35–41.

  • Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22, 4673–4680.

  • Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Nucleic Acids Res. 25, 4876–4882.

Applicability Alignment Profiles

  • Eddy, S. R. (1996) Hidden Markov models, Curr. Opin. Struct. Biol. 6, 361–365.

  • Eddy, S. R. (1998) Profile hidden Markov models, Bioinformatics 14, 755–763.

  • Gribskov, M., Luethy, R., and Eisenberg, D. (1989) Profile analysis, in: Methods in Enzymology, vol. 183, Academic Press, San Diego, CA, pp. 146–159.

  • Gribskov, M., McLachlan, M., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins, Proc. Natl. Acad. Sci. USA 84, 4355–4358.

Complications File Formats

  • Gilbert, D. G. (1993 [C release] and 1999 [Java release]) ReadSeq, public domain software, Bioinformatics Group, Biology Department, Indiana University, Bloomington, IN. (see Website: http://www.iubio.bio.indiana.edu/soft/molbio/readseq/)

The Protein System Phylogenetic Relationships

  • The E. coli Database Collection (ECDC), The K12 chromosome, Justus-Liebig-Universitaet, Giessen, Germany. (see Website: http://www.uni-giessen.de/ngx1052/ecdc.htm)

  • Hasegawa, M., Hashimoto, T., Adachi, J., Iwabe, N., and Miyata, T. (1993) Early branchings in the evolution of Eukaryotes: ancient divergence of Entamoeba that lacks mitochondria revealed by protein sequence data, J. Mol. Evol. 36, 380–388.

  • Iwabe, N., Kuma, E.-I., Hasegawa, M., Osawa, S., and Miyata, T. (1989) Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes, Proc. Natl. Acad. Sci. USA 86, 9355–9359.

  • Madsen, H. O., Poulsen, K., Dahl, O., Clark, B. F., and Hjorth, J. P. (1990) Retropseudogenes constitute the major part of the human elongation factor 1 alpha gene family, Nucleic Acids Res. 18, 1513–1516.

  • Rivera, M. C. and Lake, J. A. (1992) Evidence that eukaryotes and eocyte prokaryotes are immediate relatives, Science 257, 74–76.

What is Available Running ClustalX on Your Machine, Briefly

  • Etzold, T. and Argos, P. (1993) SRS—an indexing and retrieval tool for flat file data libraries, Comp. Appl. Biosci. 9, 49–57.

  • Gonnet, G. H., Cohen, M. A., and Benner, S. A. (1992) Exhaustive matching of the entire protein sequence database, Science 256, 1443–1445.

ClustalW on the Web

  • Smith, R. F., Wiese, B. A., Wojzynski, M. K., Davison, D. B., and Worley, K. C. (1996) BCM Search Launcher—an integrated interface to molecular biology data base search and analysis services available on the World Wide Web, Genome Res. 6, 454–462.

Multiple Sequence Alignment and Structure Prediction Alignment Secondary Structure

  • Guex, N., Diemand, A., and Peitsch, M. C. (1999) Protein modeling for all, Trends Biochem. Sci. 24, 364–367.

  • Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling, Electrophoresis 18, 2714–2723.

  • Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy, J. Mol. Biol. 232, 584–599.

  • Rost, B. and Sander, C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure, Proteins 19, 55–77.

  • Sander, C. and Schneider, R. (1991) Database of homology-derived structures and the structural meaning of sequence alignment, Proteins 9, 56–68.

  • Sayle, R. A. and Milner-White, E. J. (1995) RasMol: biomolecular graphics for all, Trends Biochem. Sci. 20, 374–376.

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Thompson, S.M. (2003). An Introduction to Multiple Sequence Alignment and Analysis. In: Krawetz, S.A., Womble, D.D. (eds) Introduction to Bioinformatics. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59259-335-4_31

  • DOI: https://doi.org/10.1007/978-1-59259-335-4_31

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-58829-241-4

  • Online ISBN: 978-1-59259-335-4

  • eBook Packages: Springer Book Archive
