PRALINE: A Versatile Multiple Sequence Alignment Toolkit

Bawono, Punto; Heringa, Jaap

doi:10.1007/978-1-62703-646-7_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 1079)

Abstract

Profile ALIgNmEnt (PRALINE) is a versatile multiple sequence alignment toolkit. Its main alignment protocol follows the global progressive alignment algorithm. It provides several alignment optimization strategies to address the different situations that call for protein multiple sequence alignment: global profile preprocessing, homology-extended alignment, secondary structure-guided alignment, and transmembrane-aware alignment. Several combinations of these strategies are also supported.

PRALINE is accessible via the online server http://www.ibi.vu.nl/programs/PRALINEwww/. The server offers extensive visualization options that aid the interpretation of the generated alignments, which can be exported in PDF format for publication purposes. PRALINE can also represent the aligned sequences in a dendrogram that shows their mutual relationships according to the alignment. The chapter ends with a discussion of various issues that arise in multiple sequence alignment.

Notes

1.
PRALINE finds the PDB identifier of a protein by extracting it from the FASTA definition line of that protein. For example, these definition lines are accepted: “>102L_A”, “>102L|A”, and “>102LA”. For any other definition line, the PDB identifier is not extracted. No description may follow the sequence identifier. Thus “>pdb|102L|A”, “>gi|157829524|pdb|102L|A”, and also “>102L_A ” (note the trailing space) are skipped.
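
The rule in this note can be illustrated with a short Python sketch. It is not PRALINE's own code: the function name `extract_pdb_id` and the regular expression are hypothetical. The sketch assumes that a PDB identifier is a digit followed by three alphanumeric characters, that the chain is a single alphanumeric character, and that the only separators allowed between them are “_”, “|”, or nothing. These assumptions are inferred from the six examples above and may differ from what the server actually accepts.

```python
import re

# Assumed pattern (inferred from the examples in the note, not taken from PRALINE):
# ">" + 4-character PDB ID + optional "_" or "|" + one chain character, and nothing else.
_PDB_HEADER = re.compile(r">([0-9][A-Za-z0-9]{3})[_|]?([A-Za-z0-9])")


def extract_pdb_id(definition_line):
    """Return (pdb_id, chain), or None if the line does not fit the assumed pattern."""
    # Strip only line terminators; a trailing space must still cause a rejection.
    line = definition_line.rstrip("\r\n")
    match = _PDB_HEADER.fullmatch(line)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2)


if __name__ == "__main__":
    examples = [">102L_A", ">102L|A", ">102LA",
                ">pdb|102L|A", ">gi|157829524|pdb|102L|A", ">102L_A "]
    for header in examples:
        print(repr(header), "->", extract_pdb_id(header))
```

Using `fullmatch` rather than `match` is what makes the trailing-space case fail, since the whole definition line has to fit the pattern. A check like this can be run over a FASTA file before submission to spot headers that would likely be skipped.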

Author information

Authors and Affiliations

Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
Punto Bawono
Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
Punto Bawono & Jaap Heringa
Centre for Integrative Bioinformatics (IBIVU), Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands
Jaap Heringa

Editor information

Editors and Affiliations

Dept. of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
David J Russell

About this protocol

Cite this protocol

Bawono, P., Heringa, J. (2014). PRALINE: A Versatile Multiple Sequence Alignment Toolkit. In: Russell, D. (eds) Multiple Sequence Alignment Methods. Methods in Molecular Biology, vol 1079. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-646-7_16

DOI: https://doi.org/10.1007/978-1-62703-646-7_16
Published: 23 August 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-645-0
Online ISBN: 978-1-62703-646-7
eBook Packages: Springer Protocols
