Modeling Dependence in Evolutionary Inference for Proteins

  • Gary Larson
  • Jeffrey L. Thorne
  • Scott Schmidler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10812)


Protein structure alignment is a classic problem in computational biology and is widely used to identify structural and functional similarity and to infer homology among proteins. A statistical model of protein structural evolution was previously introduced and shown to significantly improve phylogenetic inference compared with approaches that use only amino acid sequence information. Here we extend this model to account for correlated evolutionary drift among neighboring amino acid positions, yielding a spatio-temporal model of protein structure evolution. The result is a multivariate diffusion process convolved with a spatial birth-death process, which adds little computational cost or analytical complexity relative to the site-independent model (SIM). We demonstrate that this extended, site-dependent model (SDM) significantly reduces bias in estimated evolutionary distances and further improves phylogenetic tree reconstruction.
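The abstract does not state the functional form of the diffusion, so the following is only an illustrative sketch of what "correlated drift among neighboring positions" could look like. It assumes an Ornstein-Uhlenbeck process with a single mean-reversion rate shared by all sites and a correlation between sites i and j that decays as `rho ** |i - j|`. The names `theta`, `sigma2`, `rho`, `neighbor_correlation`, and `simulate_ou` are hypothetical and not taken from the paper, and the birth-death (insertion/deletion) component is omitted entirely.

```python
import numpy as np


def neighbor_correlation(n_sites, rho):
    """Assumed correlation structure: corr(i, j) = rho ** |i - j|.

    rho = 0 gives the identity matrix, i.e. independent sites.
    """
    idx = np.arange(n_sites)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate_ou(x0, t, theta, sigma2, corr, rng):
    """Draw coordinates after time t > 0 under an assumed OU diffusion.

    x0     : (n_sites, 3) array of starting coordinates
    theta  : mean-reversion rate shared by all sites (assumption)
    sigma2 : per-site diffusion variance (assumption)
    corr   : (n_sites, n_sites) correlation matrix across sites
    """
    decay = np.exp(-theta * t)
    # Stationary-scaled variance of the OU transition density.
    var = sigma2 / (2.0 * theta) * (1.0 - decay**2)
    # Correlate the noise across sites; x, y, z stay independent.
    chol = np.linalg.cholesky(var * corr)
    noise = chol @ rng.standard_normal(x0.shape)
    return decay * x0 + noise


# Example: 6 sites, comparing independent and neighbor-correlated noise.
rng = np.random.default_rng(1)
start = np.zeros((6, 3))
independent = simulate_ou(start, 1.0, 0.5, 1.0, neighbor_correlation(6, 0.0), rng)
dependent = simulate_ou(start, 1.0, 0.5, 1.0, neighbor_correlation(6, 0.8), rng)
```

Because the mean-reversion rate is a scalar in this sketch, the transition covariance factors into a time-dependent scalar times a fixed site correlation matrix, which is one way a dependent model can stay close in cost to an independent one. Whether the paper's model has this structure is not stated in the abstract.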


Keywords: Protein structure, Evolution, Dynamic programming, Phylogeny, Diffusion process



This work was partially supported by NSF grant DMS-1407622 and NIH grant R01-GM090201 (S.C.S.). Jeffrey L. Thorne was supported by NIH grant GM118508. Gary Larson was partially supported by NSF training grant DMS-1045153 (S.C.S.).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Gary Larson (1)
  • Jeffrey L. Thorne (2)
  • Scott Schmidler (3)

  1. Department of Statistical Science, Duke University, Durham, USA
  2. Departments of Biological Sciences and Statistics, North Carolina State University, Raleigh, USA
  3. Departments of Statistical Science and Computer Science, Duke University, Durham, USA
