Enhancing Statistical Multiple Sequence Alignment and Tree Inference Using Structural Information

  • Protocol
Computational Methods in Protein Evolution

Part of the book series: Methods in Molecular Biology (MIMB, volume 1851)

Abstract

For highly divergent sequences, there is often insufficient information to reliably construct alignments and phylogenetic trees. Since protein structure may be strongly conserved despite large divergences in sequence, structural information can be used to help identify homology in such cases.

Although well-studied models of sequence evolution exist, structurally informed alignment methods have typically relied on geometric measures of deviation that do not take the underlying mutational processes into account. To integrate structural information into sequence-based evolutionary models, we recently developed a stochastic model of structural evolution on a phylogenetic tree and implemented it as the StructAlign plugin for the StatAlign statistical alignment package.

In this chapter, we outline the types of analyses that can be carried out using StructAlign, illustrating how structural information can inform the joint estimation of alignments and trees. StructAlign can also be used to infer branch-specific rates of structural evolution, and analysis of an example globin dataset highlights strong variation in the inferred rate across the tree. While structure is more highly conserved within clades, the rate of structural divergence as a function of sequence variation is larger between functionally divergent proteins. Allowing the rate of structural divergence to vary over the tree improves the fit to the empirically observed pairwise RMSD values.
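The pairwise RMSD values mentioned above can be illustrated with a minimal sketch. It is not taken from StructAlign or StatAlign. It assumes the structures have already been aligned residue by residue and superposed into a common frame, so that each structure is an equal-length list of (x, y, z) coordinates. The function names `rmsd` and `pairwise_rmsd` and the toy coordinates are hypothetical, chosen only for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the points are already paired (aligned) and superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd(structures):
    """Return {(name_i, name_j): rmsd} for every unordered pair of structures."""
    names = sorted(structures)
    return {
        (a, b): rmsd(structures[a], structures[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy coordinates invented for illustration; not real protein data.
toy = {
    "A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "B": [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
}
result = pairwise_rmsd(toy)
```

In practice the superposition step (for example, a least-squares rotation) would precede this calculation. It is omitted here to keep the sketch dependency-free.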

Change history

  • 22 December 2018

    The published version of this book included errors in code listings in Chapter 10. These code listings have been corrected and text has been updated.

Author information

Corresponding author

Correspondence to Joseph L. Herman.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Herman, J.L. (2019). Enhancing Statistical Multiple Sequence Alignment and Tree Inference Using Structural Information. In: Sikosek, T. (ed.) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_10

  • DOI: https://doi.org/10.1007/978-1-4939-8736-8_10

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8735-1

  • Online ISBN: 978-1-4939-8736-8

  • eBook Packages: Springer Protocols
