PicXAA: A Probabilistic Scheme for Finding the Maximum Expected Accuracy Alignment of Multiple Biological Sequences

Part of the Methods in Molecular Biology book series (MIMB, volume 1079)


PicXAA is a probabilistic nonprogressive alignment algorithm that finds protein (or DNA) multiple sequence alignments with maximum expected accuracy. It builds up the alignment greedily, starting from sequence regions with high local similarity, and thereby yields an accurate global alignment that effectively captures the local similarities across sequences. PicXAA consistently yields accurate alignments on a wide range of reference sets with different characteristics, with especially remarkable improvements over other leading algorithms on sequence sets with high local similarity. In this chapter, we describe the overall alignment strategy used in PicXAA and discuss several important considerations for effective deployment of the algorithm.
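
As a rough illustration of the greedy idea described above, the following Python sketch handles only the simplest case of two sequences. It is not PicXAA's actual procedure, which operates on multiple sequences. The function name `greedy_pairwise_alignment`, the `threshold` parameter, and the posterior probabilities in the example are all hypothetical, chosen only to show how the most confident residue pairs can be accepted first and how later pairs are kept only when they are compatible with the pairs already accepted.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (position in sequence x, position in sequence y)


def greedy_pairwise_alignment(
    posterior: Dict[Pair, float], threshold: float = 0.0
) -> List[Pair]:
    """Greedily accept residue pairs in descending order of probability.

    A pair (i, j) is accepted only if it is compatible with every pair
    accepted so far: no position is reused and no two pairs cross.
    This is a toy two-sequence sketch, not the PicXAA implementation.
    """
    accepted: List[Pair] = []
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    for (i, j), p in ranked:
        if p <= threshold:
            break  # remaining pairs are at or below the confidence cutoff
        # Compatible means strictly before or strictly after each accepted pair.
        if all((i < a and j < b) or (i > a and j > b) for a, b in accepted):
            accepted.append((i, j))
    return sorted(accepted)


# Made-up posterior alignment probabilities for two short sequences.
example_posterior = {
    (0, 0): 0.90,
    (1, 2): 0.85,
    (1, 1): 0.80,
    (2, 2): 0.60,
    (2, 1): 0.30,
    (0, 1): 0.20,
}
result = greedy_pairwise_alignment(example_posterior)
print(result)  # [(0, 0), (1, 2)]
```

In this example the pair (1, 1) is rejected even though its probability is high, because position 1 of the first sequence is already matched by the more confident pair (1, 2). This shows how a greedy construction lets the most reliable local matches anchor the rest of the alignment.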

Key words

Multiple sequence alignment; Non-progressive alignment; Maximum expected accuracy (MEA); Probabilistic consistency transformation; PicXAA



This work was supported in part by the National Science Foundation through NSF Award CCF-1149544.



Copyright information

© Springer Science+Business Media, LLC 2014

Authors and Affiliations

  1. Department of Plant and Microbial Biology, University of California, Berkeley, USA
  2. Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA
