Multiple Sequence Alignment

Bawono, Punto; Dijkstra, Maurits; Pirovano, Walter; Feenstra, Anton; Abeln, Sanne; Heringa, Jaap

doi:10.1007/978-1-4939-6622-6_8

Part of the book series: Methods in Molecular Biology (MIMB, volume 1525)

Abstract

The increasing importance of Next Generation Sequencing (NGS) techniques has highlighted the key role of multiple sequence alignment (MSA) in comparative structure and function analysis of biological sequences. MSA often leads to fundamental biological insight into sequence–structure–function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments, although many biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, to serve as a helpful guide or starting point for researchers who aim to construct a reliable MSA.

Author information

Authors and Affiliations

Centre for Integrative Bioinformatics, Vrije Universiteit, Amsterdam, The Netherlands
Punto Bawono, Maurits Dijkstra, Anton Feenstra, Sanne Abeln & Jaap Heringa
Bioinformatics Department, BaseClear, Leiden, The Netherlands
Walter Pirovano

Corresponding author

Correspondence to Jaap Heringa.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Jonathan M. Keith

About this protocol

Cite this protocol

Bawono, P., Dijkstra, M., Pirovano, W., Feenstra, A., Abeln, S., Heringa, J. (2017). Multiple Sequence Alignment. In: Keith, J. (ed.) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_8

DOI: https://doi.org/10.1007/978-1-4939-6622-6_8
Published: 29 November 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6620-2
Online ISBN: 978-1-4939-6622-6
eBook Packages: Springer Protocols
