Boosting Alignment Accuracy by Adaptive Local Realignment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10229)

Abstract

While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein’s entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. It extends a recent technique known as parameter advising, which finds global parameter settings for aligners, to adaptively find local settings. In essence, our approach identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment, implemented within the Opal aligner using the Facet accuracy estimator, is available at facet.cs.arizona.edu.
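
The three steps named in the abstract (find regions of low estimated accuracy, build candidate realignments under several parameter settings, keep a candidate only if its estimated accuracy is higher) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `estimate_accuracy` is a toy column-agreement score standing in for an accuracy estimator such as Facet, `toy_realign` is a hypothetical candidate generator standing in for rerunning an aligner such as Opal under a different parameter setting, and the `window` and `threshold` values are arbitrary. It is not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Alignment = List[str]  # rows of equal length; '-' marks a gap


def column_score(column: str) -> float:
    """Toy score: share of rows that hold the column's most common residue."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(column)


def estimate_accuracy(aln: Alignment) -> float:
    """Hypothetical estimator (not Facet): mean toy score over all columns."""
    width = len(aln[0])
    if width == 0:
        return 0.0
    columns = ("".join(row[j] for row in aln) for j in range(width))
    return sum(column_score(col) for col in columns) / width


def low_accuracy_regions(aln: Alignment, window: int,
                         threshold: float) -> List[Tuple[int, int]]:
    """Merge all sliding windows scoring below threshold into half-open ranges."""
    width = len(aln[0])
    flagged = [False] * width
    for start in range(max(1, width - window + 1)):
        end = min(width, start + window)
        if estimate_accuracy([row[start:end] for row in aln]) < threshold:
            for j in range(start, end):
                flagged[j] = True
    regions, j = [], 0
    while j < width:
        if flagged[j]:
            k = j
            while k < width and flagged[k]:
                k += 1
            regions.append((j, k))
            j = k
        else:
            j += 1
    return regions


def toy_realign(region: Alignment, setting: str) -> Alignment:
    """Hypothetical stand-in for an aligner run: strip gaps, then pad rows
    on the right ("left" setting) or on the left (any other setting)."""
    seqs = [row.replace("-", "") for row in region]
    width = max(len(s) for s in seqs)
    if setting == "left":
        return [s.ljust(width, "-") for s in seqs]
    return [s.rjust(width, "-") for s in seqs]


def adaptive_local_realign(aln: Alignment, settings: Sequence[str],
                           window: int = 4, threshold: float = 0.7) -> Alignment:
    """Replace each low-scoring region by its best-scoring candidate, if any
    candidate scores strictly higher than the region it would replace."""
    result = list(aln)
    # Work right to left so column indices of earlier regions stay valid
    # even when a replacement changes the alignment width.
    for start, end in reversed(low_accuracy_regions(aln, window, threshold)):
        best = [row[start:end] for row in result]
        best_score = estimate_accuracy(best)
        for setting in settings:
            candidate = toy_realign(best if False else [row[start:end] for row in result], setting)
            score = estimate_accuracy(candidate)
            if score > best_score:
                best, best_score = candidate, score
        result = [row[:start] + new + row[end:] for row, new in zip(result, best)]
    return result


if __name__ == "__main__":
    example = ["ACGT-A-", "ACGTA--", "ACGT--A"]
    print(adaptive_local_realign(example, ["left", "right"]))
```

In this toy run the well-aligned prefix is left untouched and only the gappy tail is replaced, which mirrors the idea of keeping the original region unless some candidate is estimated to be better.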

Keywords

Multiple sequence alignment, Iterative refinement, Local mutation rates, Alignment accuracy, Parameter advising

Notes

Acknowledgements

Research of JK and DD at Arizona was funded by NSF Grant IIS-1217886 to JK. DD was partially supported at Carnegie Mellon by NSF Grant CCF-1256087, NSF Grant CCF-131999, NIH Grant R01HG007104, and Gordon and Betty Moore Foundation Grant GBMF4554, to Carl Kingsford.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Computer Science, The University of Arizona, Tucson, USA
