MAFFT: Iterative Refinement and Additional Methods

Katoh, Kazutaka; Standley, Daron M.

doi:10.1007/978-1-62703-646-7_8

Part of the book series: Methods in Molecular Biology (MIMB, volume 1079)

Abstract

This chapter outlines several methods implemented in the MAFFT package. MAFFT is a popular multiple sequence alignment (MSA) program with various options for the progressive method, the iterative refinement method and other methods. We first outline basic usage of MAFFT and then describe recent practical extensions, such as dot plot and adjustment of direction in DNA alignment. We also refer to MUSCLE, another high-performance MSA program.
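
As a rough illustration of the basic usage the abstract refers to, the sketch below assembles MAFFT command lines for a few common strategies. It is not taken from the chapter: the helper name `build_mafft_command`, the strategy labels, and the file name are hypothetical, and the flags (`--auto`, `--retree`, `--maxiterate`, `--localpair`, `--adjustdirection`) should be checked against the documentation of the installed MAFFT version. The function only builds the argument list and does not run MAFFT.

```python
# Hypothetical mapping from a strategy label to MAFFT command-line flags.
# The labels are illustrative; the flags follow commonly documented MAFFT usage.
MAFFT_STRATEGIES = {
    "auto": ["--auto"],                                   # let MAFFT pick a strategy
    "fft-ns-2": ["--retree", "2", "--maxiterate", "0"],   # progressive method
    "fft-ns-i": ["--retree", "2", "--maxiterate", "1000"],  # iterative refinement
    "l-ins-i": ["--localpair", "--maxiterate", "1000"],   # iterative, local pairwise
}


def build_mafft_command(input_fasta, strategy="auto", adjust_direction=False):
    """Return the argument list for a MAFFT run (assumed helper, not a MAFFT API)."""
    if strategy not in MAFFT_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    cmd = ["mafft"] + MAFFT_STRATEGIES[strategy]
    if adjust_direction:
        # Ask MAFFT to adjust sequence direction in a DNA alignment.
        cmd.append("--adjustdirection")
    cmd.append(input_fasta)
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, stdout=...)`, since MAFFT writes the alignment to standard output; keeping command construction separate from execution makes the choice of strategy easy to inspect before anything is run.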

Author information

Authors and Affiliations

Immunology Frontier Research Center, Osaka University, Suita, Japan
Kazutaka Katoh & Daron M. Standley
Computational Biology Research Center, The National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Kazutaka Katoh

Editor information

Editors and Affiliations

Dept. of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
David J Russell

About this protocol

Cite this protocol

Katoh, K., Standley, D.M. (2014). MAFFT: Iterative Refinement and Additional Methods. In: Russell, D. (eds) Multiple Sequence Alignment Methods. Methods in Molecular Biology, vol 1079. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-646-7_8

DOI: https://doi.org/10.1007/978-1-62703-646-7_8
Published: 23 August 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-645-0
Online ISBN: 978-1-62703-646-7
eBook Packages: Springer Protocols
