Abstract
This chapter discusses how dimensionality reduction algorithms such as diffusion maps and sketch-map can be used to analyze molecular dynamics trajectories. The first part explains how these algorithms work and covers practical issues such as landmark selection and the analysis of data that comes from enhanced sampling trajectories. The second part compares the results obtained by applying several of these algorithms to two sets of sample data. This comparison is followed by a summary of how one algorithm in particular, sketch-map, has been applied to a range of problems. The chapter concludes with a discussion of the directions in which we believe this field is currently moving.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Tribello, G.A., Gasparotto, P. (2019). Using Data-Reduction Techniques to Analyze Biomolecular Trajectories. In: Bonomi, M., Camilloni, C. (eds) Biomolecular Simulations. Methods in Molecular Biology, vol 2022. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9608-7_19
DOI: https://doi.org/10.1007/978-1-4939-9608-7_19
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9607-0
Online ISBN: 978-1-4939-9608-7
eBook Packages: Springer Protocols