Spectral Techniques to Explore Point Clouds in Euclidean Space, with Applications to Collective Coordinates in Structural Biology

  • Frédéric Cazals
  • Frédéric Chazal
  • Joachim Giesen
Conference paper
Part of The IMA Volumes in Mathematics and its Applications book series (IMA, volume 151)

Abstract

Life sciences, engineering, and telecommunications provide numerous systems whose description requires a large number of variables. Developing insights into such systems, forecasting their evolution, or monitoring them often relies on inferring correlations between these variables. Given a collection of points describing states of a system, questions such as inferring the effective number of independent parameters of the system (its intrinsic dimensionality) and the way these parameters are coupled are paramount for developing models. In this context, this paper makes two contributions.
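One classical, linear way to make "the effective number of independent parameters" concrete is principal component analysis (PCA): eigen-decompose the sample covariance matrix of the points and count how many eigenvalues are needed to capture most of the variance. The sketch below is illustrative only and is not the authors' method; the 95% variance threshold, the helper names (`jacobi_eigh`, `covariance`, `pca_dimension`), and the synthetic data are assumptions. To stay dependency-free it uses a textbook cyclic Jacobi eigensolver in pure Python.

```python
import math
import random


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column k of v is the eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a @ J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T @ a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v @ J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def covariance(points):
    """Sample covariance matrix (d x d) of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)] for i in range(d)]


def pca_dimension(points, threshold=0.95):
    """Smallest k whose top-k principal components explain `threshold` of the variance."""
    eigvals, _ = jacobi_eigh(covariance(points))
    eigvals = sorted((max(lam, 0.0) for lam in eigvals), reverse=True)  # clip round-off
    total, acc = sum(eigvals), 0.0
    for k, lam in enumerate(eigvals, start=1):
        acc += lam
        if acc >= threshold * total:
            return k
    return len(eigvals)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: noisy samples of a 2-D plane embedded in R^5.
    e1 = [1 / math.sqrt(2)] * 2 + [0.0] * 3
    e2 = [0.0] * 2 + [1 / math.sqrt(3)] * 3
    plane = []
    for _ in range(200):
        u, w = random.uniform(-1, 1), random.uniform(-1, 1)
        plane.append([u * e1[j] + w * e2[j] + random.gauss(0, 0.01) for j in range(5)])
    print(pca_dimension(plane))  # 2
    # A circle is a 1-D curve, but it is not contained in any line.
    circle = [[math.cos(2 * math.pi * i / 100), math.sin(2 * math.pi * i / 100)] for i in range(100)]
    print(pca_dimension(circle))  # 2
```

The circle shows the limitation of a global linear estimate: the curve has intrinsic dimension 1, yet its covariance has two equal eigenvalues, so PCA reports 2. This kind of failure on curved data is what motivates local and nonlinear techniques.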

First, we review recent work on spectral techniques for organizing point clouds in Euclidean space, with emphasis on the main difficulties faced. Second, after a careful presentation of the biophysical context, we present applications of dimensionality reduction techniques to a core problem in structural biology, namely protein folding.
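To give a concrete picture of a spectral technique that organizes a point cloud, the sketch below builds a heat-kernel similarity graph on the points and uses the eigenvector of the graph Laplacian L = D - W with the smallest nonzero eigenvalue as a one-dimensional coordinate. It is a simplified variant in the spirit of Laplacian eigenmaps (Belkin and Niyogi), which use a sparse k-nearest-neighbour or epsilon-graph and solve the generalized problem L y = lambda D y instead. The fully connected graph, the unnormalized Laplacian, the bandwidth `sigma`, the function names, and the helix data are assumptions made to keep the example short; this is not the pipeline used in the paper's protein-folding applications.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column k of v is the eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a @ J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T @ a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v @ J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def spectral_embedding(points, sigma=1.0, dim=1):
    """Coordinates from eigenvectors of the unnormalized Laplacian L = D - W.

    W is a fully connected heat-kernel graph, W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2)).
    The constant eigenvector (eigenvalue 0) is skipped; the next `dim` are returned.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((xi - xj) ** 2 for xi, xj in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    degree = [sum(row) for row in w]
    lap = [[(degree[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    eigvals, v = jacobi_eigh(lap)
    order = sorted(range(n), key=lambda k: eigvals[k])  # ascending eigenvalues
    cols = order[1 : dim + 1]  # drop the trivial constant eigenvector
    return [[v[i][k] for k in cols] for i in range(n)]


if __name__ == "__main__":
    # Synthetic data: 25 points on the helix (cos t, sin t, t), a 1-D curve in R^3.
    n = 25
    ts = [3 * math.pi * i / (n - 1) for i in range(n)]
    helix = [(math.cos(t), math.sin(t), t) for t in ts]
    coord = [y[0] for y in spectral_embedding(helix, sigma=1.0, dim=1)]
    print([round(c, 3) for c in coord])  # expected: monotone in t, up to a global sign
```

On this helix the pairwise distances grow with the parameter gap |t_i - t_j|, so the recovered coordinate orders the points along the curve (up to sign). The bandwidth is the main design choice: if `sigma` is much smaller than the sampling spacing the graph falls apart, and if it is much larger than the curve's local scale, points that are far apart along the curve but close in space receive strong edges and can short-circuit its geometry.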

From both the computer science and the structural biology perspectives, we expect this survey to shed new light on the importance of nonlinear computational geometry in geometric data analysis in general, and in protein folding in particular.

Keywords

Point Cloud, Dimensionality Reduction, Energy Landscape, Morse Theory, Folding Process
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag New York 2009

Authors and Affiliations

  • Frédéric Cazals, INRIA Sophia-Antipolis, Valbonne, France
  • Frédéric Chazal, INRIA Saclay, Parc Orsay Université, Orsay Cedex, France
  • Joachim Giesen, Institut fuer Informatik, Jena, Germany