Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins

  • Protocol
Protein Dynamics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1084))

Abstract

Principal component analysis (PCA) is now routinely used to reveal the most important motions in proteins. Although most popular molecular dynamics packages provide PCA tools for analyzing protein trajectories, researchers often draw inferences from the results without understanding how to interpret them, and they are often unaware of the limitations and generalizations of the analysis. Here we review best practices for applying standard PCA, describe useful variants, discuss why one may wish to make comparison studies, and describe a set of metrics that make such comparisons possible. In practice, one will be forced to make inferences about the essential dynamics of a protein without having the desired number of samples. Considerable attention is therefore given to judging the significance of results and to highlighting pitfalls. PCA is reviewed from the perspective of many practical considerations, and useful recipes are provided.

Copyright information

© 2014 Springer Science+Business Media, New York

About this protocol

Cite this protocol

David, C.C., Jacobs, D.J. (2014). Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins. In: Livesay, D. (eds) Protein Dynamics. Methods in Molecular Biology, vol 1084. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-658-0_11

  • DOI: https://doi.org/10.1007/978-1-62703-658-0_11

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-657-3

  • Online ISBN: 978-1-62703-658-0

  • eBook Packages: Springer Protocols