Sequence Alignment

Axelson-Fisk, Marina

doi:10.1007/978-1-84996-104-2_3

Chapter
First Online: 01 January 2010

Part of the book series: Computational Biology (COBO, volume 11)

Abstract

Before we can discuss comparative gene finding, this chapter goes through some of the basic theory behind sequence alignment. The chapter is divided into two parts. In the first part we describe the basic concepts of pairwise alignment, including substitution schemes and gap models, and move on to the application of dynamic programming to global and local alignments. We finish by giving an overview of heuristic database searches and the statistical foundation they rest upon. The extension of dynamic programming to multiple alignments is complicated by the increased computational complexity. As a result, a flora of heuristic alignment algorithms has evolved. In the second part of the chapter we give an account of the most common of these methods, including progressive alignments, iterative methods, hidden Markov models, genetic algorithms, simulated annealing, and alignment profiles.
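
As a rough illustration of the dynamic programming idea mentioned in the abstract, the sketch below computes an optimal global alignment score (in the style of Needleman and Wunsch) and an optimal local alignment score (in the style of Smith and Waterman). It is not taken from the chapter: the function names, the simple match/mismatch substitution scheme, and the linear gap penalty with its default values are all assumptions chosen for brevity.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the chapter's.
    Only two rows of the dynamic programming table are kept in memory.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty prefix of `b`
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty.

    Same recursion as the global case, except that every cell is floored
    at zero and the answer is the maximum over the whole table.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "AGT"))
    print(local_score("TTACGTTT", "GGACGTCC"))
```

Both functions return only the score. Recovering the alignment itself would require keeping the full table, or traceback pointers, which is omitted here. Substitution matrices and affine gap models would replace the constant `match`, `mismatch`, and `gap` values.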

Author information

Authors and Affiliations

Dept. Mathematical Sciences, Chalmers University of Technology, Eklandgatan 86, 412 96, Göteborg, Sweden
Dr. Marina Axelson-Fisk

Corresponding author

Correspondence to Marina Axelson-Fisk.

About this chapter

Cite this chapter

Axelson-Fisk, M. (2010). Sequence Alignment. In: Comparative Gene Finding. Computational Biology, vol 11. Springer, London. https://doi.org/10.1007/978-1-84996-104-2_3

DOI: https://doi.org/10.1007/978-1-84996-104-2_3
Published: 30 January 2010
Publisher Name: Springer, London
Print ISBN: 978-1-84996-103-5
Online ISBN: 978-1-84996-104-2
eBook Packages: Computer Science, Computer Science (R0)
