Using Data-Reduction Techniques to Analyze Biomolecular Trajectories

Biomolecular Simulations

Part of the book series: Methods in Molecular Biology (MIMB, volume 2022)

Abstract

This chapter discusses how dimensionality reduction algorithms such as diffusion maps and sketch-map can be used to analyze molecular dynamics trajectories. The first part explains how these algorithms work and covers practical issues such as landmark selection and the analysis of data from enhanced sampling trajectories. The later part compares the results obtained by applying several of these algorithms to two sets of sample data. This comparison is followed by a summary of how one algorithm in particular, sketch-map, has been applied to a range of problems. The chapter concludes with a discussion of the directions in which we believe this field is currently moving.
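
To make the two ideas named in the abstract concrete, the following is a minimal sketch, not the chapter's own implementation. It picks landmark frames by greedy farthest-point sampling and then builds a basic diffusion map on those landmarks. The function names, the synthetic random "trajectory", the Gaussian kernel form exp(-d²/ε), and the choice of ε as the median squared distance are all assumptions made for illustration. Frames are treated as equally weighted, so the reweighting needed for enhanced sampling data is not covered.

```python
import numpy as np


def farthest_point_landmarks(frames, n_landmarks, first=0):
    """Greedy farthest-point sampling (hypothetical helper).

    Each new landmark is the frame that is farthest from all
    landmarks selected so far.
    """
    frames = np.asarray(frames, dtype=float)
    chosen = [first]
    # Distance from every frame to its nearest landmark so far.
    min_dist = np.linalg.norm(frames - frames[first], axis=1)
    for _ in range(n_landmarks - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(frames - frames[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(chosen)


def diffusion_map(points, epsilon, n_components=2):
    """Basic diffusion map with a Gaussian kernel (illustrative form)."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / epsilon)
    degree = kernel.sum(axis=1)
    # Symmetric matrix that shares its eigenvalues with the
    # row-normalized Markov matrix D^-1 K.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    evals, evecs = np.linalg.eigh(sym)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Convert to right eigenvectors of the Markov matrix.
    psi = evecs / np.sqrt(degree)[:, None]
    # Skip the trivial first eigenvector (eigenvalue 1, constant).
    coords = evals[1:n_components + 1] * psi[:, 1:n_components + 1]
    return coords, evals


# Synthetic stand-in for a trajectory: 500 frames, 10 features each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 10))

landmarks = farthest_point_landmarks(frames, n_landmarks=20)
selected = frames[landmarks]

# Assumed bandwidth: median squared distance between landmarks.
pairwise_sq = np.sum((selected[:, None] - selected[None, :]) ** 2, axis=-1)
epsilon = np.median(pairwise_sq[pairwise_sq > 0])

embedding, evals = diffusion_map(selected, epsilon, n_components=2)
print(embedding.shape)
```

In practice the rows of `frames` would be a chosen set of features computed from the trajectory (for example dihedral angles or aligned coordinates), and the remaining frames would be projected onto the map afterwards using an out-of-sample extension.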

Corresponding author

Correspondence to Gareth A. Tribello.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Tribello, G.A., Gasparotto, P. (2019). Using Data-Reduction Techniques to Analyze Biomolecular Trajectories. In: Bonomi, M., Camilloni, C. (eds) Biomolecular Simulations. Methods in Molecular Biology, vol 2022. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9608-7_19

  • DOI: https://doi.org/10.1007/978-1-4939-9608-7_19

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9607-0

  • Online ISBN: 978-1-4939-9608-7

  • eBook Packages: Springer Protocols
