Boosting Alignment Accuracy by Adaptive Local Realignment

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10229)

Abstract

While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein’s entire length. We present a new approach, called adaptive local realignment, that by contrast automatically adapts to the diversity of mutation rates along protein sequences. It builds upon a recent technique known as parameter advising, which finds global parameter settings for aligners, and extends it to adaptively find local settings. In essence, our approach identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment, implemented within the Opal aligner using the Facet accuracy estimator, is available at facet.cs.arizona.edu.
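
The three steps named in the abstract (find low-accuracy regions, build candidate realignments, keep a candidate only if it scores higher) can be illustrated with a minimal Python sketch. It is not the Opal or Facet implementation. Every name in it is hypothetical: `local_scores` stands in for per-column accuracy estimates, `estimate` for an accuracy estimator, `realign` for an aligner run under one parameter setting, and `threshold` for whatever cutoff defines "low" accuracy. How the paper actually delimits regions and chooses parameter settings is not specified here.

```python
from typing import Any, Callable, List, Sequence, Tuple

Alignment = List[str]  # equal-length rows, '-' marks a gap


def low_accuracy_regions(
    local_scores: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return maximal half-open column ranges [start, end) scoring below threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for col, score in enumerate(local_scores):
        if score < threshold:
            if start is None:
                start = col          # a low-accuracy run begins here
        elif start is not None:
            regions.append((start, col))
            start = None
    if start is not None:            # run extends to the last column
        regions.append((start, len(local_scores)))
    return regions


def adaptive_local_realign(
    alignment: Alignment,
    local_scores: Sequence[float],
    estimate: Callable[[Alignment], float],
    realign: Callable[[List[str], Any], Alignment],
    param_settings: Sequence[Any],
    threshold: float,
) -> Alignment:
    """Replace each low-accuracy region by its best-scoring candidate realignment.

    Assumption: `estimate` and `realign` are caller-supplied stand-ins for an
    accuracy estimator and an aligner; neither is a real library API.
    """
    blocks: List[Alignment] = []
    prev = 0
    for start, end in low_accuracy_regions(local_scores, threshold):
        # Columns between regions are copied through unchanged.
        blocks.append([row[prev:start] for row in alignment])
        region = [row[start:end] for row in alignment]
        best, best_score = region, estimate(region)
        # Realign the ungapped subsequences under each candidate setting.
        residues = [row.replace("-", "") for row in region]
        for params in param_settings:
            candidate = realign(residues, params)
            score = estimate(candidate)
            if score > best_score:   # replace only on strict improvement
                best, best_score = candidate, score
        blocks.append(best)
        prev = end
    blocks.append([row[prev:] for row in alignment])
    # Stitch the per-row pieces of every block back into full rows.
    return ["".join(parts) for parts in zip(*blocks)]
```

Because the original region is the initial "best", a region is left untouched whenever no candidate has a strictly higher estimated accuracy, which mirrors the replace-only-if-better rule in the abstract.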

The work of both authors was performed at the University of Arizona.

Acknowledgements

Research of JK and DD at Arizona was funded by NSF Grant IIS-1217886 to JK. DD was partially supported at Carnegie Mellon by NSF Grant CCF-1256087, NSF Grant CCF-131999, NIH Grant R01HG007104, and Gordon and Betty Moore Foundation Grant GBMF4554, to Carl Kingsford.

Author information

Corresponding author

Correspondence to Dan DeBlasio.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

DeBlasio, D., Kececioglu, J. (2017). Boosting Alignment Accuracy by Adaptive Local Realignment. In: Sahinalp, S. (ed.) Research in Computational Molecular Biology. RECOMB 2017. Lecture Notes in Computer Science, vol 10229. Springer, Cham. https://doi.org/10.1007/978-3-319-56970-3_1

  • DOI: https://doi.org/10.1007/978-3-319-56970-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56969-7

  • Online ISBN: 978-3-319-56970-3

  • eBook Packages: Computer Science, Computer Science (R0)
