PRALINE: A Versatile Multiple Sequence Alignment Toolkit

  • Punto Bawono
  • Jaap Heringa
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1079)

Abstract

Profile ALIgNmEnt (PRALINE) is a versatile multiple sequence alignment toolkit. Its main alignment protocol follows the global progressive alignment algorithm. It provides several alignment optimization strategies to address the different situations that call for protein multiple sequence alignment: global profile preprocessing, homology-extended alignment, secondary structure-guided alignment, and transmembrane-aware alignment. A number of combinations of these strategies are supported as well.

PRALINE is accessible via the online server http://www.ibi.vu.nl/programs/PRALINEwww/. The server offers extensive visualization options that aid the interpretation of the generated alignments, which can be exported in PDF format for publication purposes. PRALINE can also represent the aligned sequences in a dendrogram that shows their mutual relationships according to the alignment. The chapter ends with a discussion of various issues occurring in multiple sequence alignment.
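As a rough illustration of the idea behind global progressive alignment, the sketch below scores sequence pairs with a plain Needleman-Wunsch global alignment and picks the most similar pair, which a progressive method would typically align first. This is a generic textbook sketch, not PRALINE's implementation: the function names and the match, mismatch, and gap values are assumptions chosen for illustration, whereas a real aligner would use amino acid substitution matrices and profiles.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not PRALINE defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_pair(seqs):
    """Return the index pair (i, j) with the highest global alignment score.

    A progressive method would typically start by aligning this pair.
    """
    return max(
        combinations(range(len(seqs)), 2),
        key=lambda p: global_alignment_score(seqs[p[0]], seqs[p[1]]),
    )


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TTTT"]  # toy sequences, purely illustrative
    print(closest_pair(toy))
```

In a full progressive scheme the remaining sequences (or groups of already aligned sequences) would then be merged step by step in order of decreasing similarity.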

Key words

  • Multiple sequence alignment
  • Progressive alignment
  • Sequence preprocessing
  • Homology-extended MSA
  • Secondary structure-guided MSA
  • Transmembrane-aware protein alignment

Copyright information

© Springer Science+Business Media, LLC 2014

Authors and Affiliations

  • Punto Bawono (1, 2)
  • Jaap Heringa (3, 2)

  1. Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
  2. Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
  3. Centre for Integrative Bioinformatics (IBIVU), Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands