Abstract
Multiple protein structure alignment is an important tool in computational biology, with numerous algorithms published in the past two decades. However, recent literature highlights a growing recognition of the inconsistencies among alignments from different algorithms, and of the instability of alignments obtained by individual algorithms under small fluctuations of the input structures. Here we present a probabilistic model-based approach to the problem of multiple structure alignment, using an explicit statistical model. The resulting algorithm produces a Bayesian posterior distribution over alignments which accounts for alignment uncertainty arising from evolutionary variability, experimental noise, and thermal fluctuation, as well as sensitivity to alignment algorithm parameters. We demonstrate the robustness of this approach on alignments identified previously in the literature as “difficult” for existing algorithms. We also show the potential for significant stabilization of tree reconstruction in structural phylogenetics.
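As a rough illustration of what a posterior distribution over alignments makes possible, the sketch below summarizes a set of sampled alignments as marginal match probabilities for residue pairs. It is not the algorithm of the paper: it assumes the pairwise case for brevity, assumes the samples have already been drawn (for example by an MCMC sampler), and uses hypothetical names (`marginal_match_probabilities`, `toy_samples`) and made-up toy data.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

Pair = Tuple[int, int]        # (residue index in structure A, residue index in structure B)
Alignment = FrozenSet[Pair]   # one sampled alignment = a set of matched residue pairs


def marginal_match_probabilities(samples: Iterable[Alignment]) -> Dict[Pair, float]:
    """Estimate P(residue i matched to residue j) as the fraction of samples containing (i, j)."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sampled alignment")
    # Count how many sampled alignments contain each matched pair.
    counts = Counter(pair for aln in samples for pair in aln)
    n = len(samples)
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical toy samples for illustration only; not data from the paper.
toy_samples = [
    frozenset({(0, 0), (1, 1), (2, 2)}),
    frozenset({(0, 0), (1, 1), (2, 3)}),
    frozenset({(0, 0), (1, 2), (2, 3)}),
    frozenset({(0, 0), (1, 1), (2, 2)}),
]
probs = marginal_match_probabilities(toy_samples)

# Print pairs from most to least certain.
for pair, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(pair, p)
```

Pairs with probability near 1 are matched in almost every sample, while pairs with intermediate probability mark regions where the alignment is uncertain. A single-answer algorithm would report one of these alternatives without indicating that the others are nearly as plausible.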
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, R., Schmidler, S.C. (2014). Bayesian Multiple Protein Structure Alignment. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_27
DOI: https://doi.org/10.1007/978-3-319-05269-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science (R0)