Boosting Alignment Accuracy by Adaptive Local Realignment
While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein’s entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. It builds upon a recent technique known as parameter advising, which finds global parameter settings for aligners, and extends it to adaptively find local settings. In essence, our approach identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment, implemented within the Opal aligner using the Facet accuracy estimator, is available at facet.cs.arizona.edu.
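The three steps named in the abstract (find low-accuracy regions, build candidate realignments, keep a candidate only if its estimated accuracy is higher) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `column_scores` and `estimate` are a toy conservation score standing in for a real accuracy estimator such as Facet, `toy_realign` with its hypothetical "left"/"right" settings stands in for a real aligner such as Opal run under different parameter choices, and the threshold of 0.7 is arbitrary. It is not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Alignment = List[str]  # rows of equal length; '-' marks a gap


def column_scores(aln: Alignment) -> List[float]:
    """Toy per-column estimate: share of rows holding the most common residue."""
    scores = []
    for col in zip(*aln):
        residues = [c for c in col if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(col))
    return scores


def estimate(aln: Alignment) -> float:
    """Toy whole-alignment estimate: mean of the column scores (0.0 if empty)."""
    scores = column_scores(aln)
    return sum(scores) / len(scores) if scores else 0.0


def low_regions(scores: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of columns scoring below threshold."""
    regions, start = [], None
    for i, s in enumerate(scores):
        if s < threshold and start is None:
            start = i
        elif s >= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


def toy_realign(seqs: Sequence[str], setting: str) -> Alignment:
    """Hypothetical stand-in for an aligner: pad sequences left- or right-justified."""
    width = max(len(s) for s in seqs)
    if setting == "left":
        return [s.ljust(width, "-") for s in seqs]
    return [s.rjust(width, "-") for s in seqs]


def adaptive_local_realign(aln: Alignment,
                           settings: Sequence[str],
                           threshold: float = 0.7) -> Alignment:
    """Replace each low-scoring region by its best candidate, if that improves it."""
    rows = list(aln)
    # Work right to left so earlier column indices stay valid after a replacement.
    for start, end in reversed(low_regions(column_scores(rows), threshold)):
        region = [r[start:end] for r in rows]
        seqs = [r.replace("-", "") for r in region]
        candidates = [toy_realign(seqs, s) for s in settings]
        best = max(candidates, key=estimate)
        if estimate(best) > estimate(region):
            rows = [r[:start] + b + r[end:] for r, b in zip(rows, best)]
    return rows


if __name__ == "__main__":
    msa = ["MKAC-WY", "MK-ACWY", "MKAC-WY"]
    for row in adaptive_local_realign(msa, settings=["left", "right"]):
        print(row)
```

In this toy input the flanking columns are perfectly conserved, so only the three middle columns are selected and realigned; the rest of the alignment is left untouched, which is the point of advising locally rather than globally.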
Keywords: Multiple sequence alignment, Iterative refinement, Local mutation rates, Alignment accuracy, Parameter advising
Research of JK and DD at Arizona was funded by NSF Grant IIS-1217886 to JK. DD was partially supported at Carnegie Mellon by NSF Grant CCF-1256087, NSF Grant CCF-131999, NIH Grant R01HG007104, and Gordon and Betty Moore Foundation Grant GBMF4554, to Carl Kingsford.