Bayesian Multiple Protein Structure Alignment

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

Multiple protein structure alignment is an important tool in computational biology, and numerous algorithms for it have been published over the past two decades. However, recent literature reflects a growing recognition of two problems: alignments produced by different algorithms are often inconsistent with one another, and the alignment produced by a single algorithm can be unstable under small fluctuations of the input structures. Here we present a probabilistic, model-based approach to multiple structure alignment built on an explicit statistical model. The resulting algorithm produces a Bayesian posterior distribution over alignments, which accounts for alignment uncertainty arising from evolutionary variability, experimental noise, and thermal fluctuations, as well as sensitivity to alignment algorithm parameters. We demonstrate the robustness of this approach on alignments previously identified in the literature as "difficult" for existing algorithms. We also show the potential for significant stabilization of tree reconstruction in structural phylogenetics.
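The key idea in the abstract is that the output is a posterior distribution over alignments rather than a single alignment. One common way to make such a distribution usable is to summarize Monte Carlo samples from it as marginal match probabilities. The Python sketch below is not the authors' model or sampler. It assumes that alignment samples have already been drawn by some MCMC procedure and, as a hypothetical simplification, represents each sample (restricted to one pair of proteins A and B from the multiple alignment) as a set of matched residue index pairs `(i, j)`. The function names `match_probabilities`, `best_partners`, and `gap_probability` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one sampled alignment between proteins A and B,
# given as the set of matched residue index pairs (i in A, j in B).
Alignment = Set[Tuple[int, int]]


def match_probabilities(samples: List[Alignment]) -> Dict[Tuple[int, int], float]:
    """Monte Carlo estimate of P(residue i of A is aligned to residue j of B | data),
    computed as the fraction of posterior samples containing the pair (i, j)."""
    counts = Counter(pair for alignment in samples for pair in alignment)
    n = len(samples)
    return {pair: c / n for pair, c in counts.items()}


def best_partners(probs: Dict[Tuple[int, int], float]) -> Dict[int, Tuple[int, float]]:
    """For each residue i of A that is ever aligned, its most probable partner j in B
    and the corresponding posterior probability (ties keep the first pair seen)."""
    best: Dict[int, Tuple[int, float]] = {}
    for (i, j), p in probs.items():
        if i not in best or p > best[i][1]:
            best[i] = (j, p)
    return best


def gap_probability(probs: Dict[Tuple[int, int], float], i: int) -> float:
    """Estimated posterior probability that residue i of A is left unaligned (gapped)."""
    return 1.0 - sum(p for (a, _), p in probs.items() if a == i)


if __name__ == "__main__":
    # Toy samples for illustration only; not data from the paper.
    samples = [
        {(0, 0), (1, 1), (2, 2)},
        {(0, 0), (1, 1), (2, 3)},
        {(0, 0), (1, 2), (2, 3)},
        {(0, 0), (1, 1), (2, 2)},
    ]
    probs = match_probabilities(samples)
    print(best_partners(probs)[1])  # (1, 0.75)
    print(gap_probability(probs, 1))  # 0.0
```

In this toy example, residue 0 is matched with probability 1, while residue 2 splits its probability evenly between two partners. Reporting such per-position probabilities exposes exactly the uncertainty that a single point-estimate alignment would hide.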

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, R., Schmidler, S.C. (2014). Bayesian Multiple Protein Structure Alignment. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_27

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)