Distance matrix-based approach to protein structure prediction

Journal of Structural and Functional Genomics

Abstract

Much structural information is encoded in the internal distances of a protein, so a distance matrix-based approach can be used to predict protein structure and dynamics and to refine structures. Our approach is based on the squared distance matrix \( {\mathbf{D}} = [r_{ij}^{2}] \), which contains all squared distances between residues in a protein. This matrix contains more information than the contact matrix \( {\mathbf{C}} \), whose elements are either 0 or 1 depending on whether the distance \( r_{ij} \) is greater or less than a cutoff value \( r_{\text{cutoff}} \). We have performed spectral decomposition of the distance matrices, \( {\mathbf{D}} = \sum_{k} \lambda_{k} {\mathbf{v}}_{k} {\mathbf{v}}_{k}^{T} \), in terms of eigenvalues \( \lambda_{k} \) and the corresponding eigenvectors \( {\mathbf{v}}_{k} \), and found that it contains at most five nonzero terms. The dominant eigenvector is proportional to \( r^{2} \), the squared distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting \( r^{2} \) from the sequence we can approximate the distance matrix of a protein with an expected RMSD of about 7.3 Å, and by combining it with a prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions in protein structure, because \( r \) is highly correlated with the hydrophobic profile of the sequence. Moreover, \( r \) is highly correlated with several sequence profiles that are useful in protein structure prediction, such as the contact number, the residue-wise contact order (RWCO), and mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements and may also be predicted from the sequence, improving overall structure prediction.
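
The claim that the squared distance matrix has at most five nonzero spectral terms can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes random 3D points as a stand-in for residue coordinates, and the function names (`squared_distance_matrix`, `nonzero_eigenvalues`) and the tolerance are hypothetical choices made for illustration.

```python
import numpy as np

def squared_distance_matrix(coords):
    """Return D with D[i, j] = |x_i - x_j|^2 for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def nonzero_eigenvalues(D, rel_tol=1e-9):
    """Eigenvalues of symmetric D larger in magnitude than rel_tol * max|lambda|."""
    w = np.linalg.eigvalsh(D)
    return w[np.abs(w) > rel_tol * np.abs(w).max()]

# Assumption: random points stand in for real residue coordinates (in Angstrom).
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3)) * 10.0
coords -= coords.mean(axis=0)          # place the center of mass at the origin

D = squared_distance_matrix(coords)
lam = nonzero_eigenvalues(D)
print("number of nonzero eigenvalues:", len(lam))

# Rebuild D from the five terms of largest |lambda_k| in sum_k lambda_k v_k v_k^T.
w, v = np.linalg.eigh(D)
idx = np.argsort(np.abs(w))[-5:]
D_rank5 = (v[:, idx] * w[idx]) @ v[:, idx].T
print("max reconstruction error:", np.abs(D_rank5 - D).max())

# Compare the dominant eigenvector with r^2, the squared distance from the center of mass.
r2 = np.sum(coords ** 2, axis=1)
v_dom = v[:, np.argmax(w)]
corr = abs(np.corrcoef(v_dom, r2)[0, 1])
print("|correlation| between dominant eigenvector and r^2:", corr)
```

The bound of five follows from writing \( r_{ij}^{2} = r_{i}^{2} + r_{j}^{2} - 2\,{\mathbf{x}}_{i} \cdot {\mathbf{x}}_{j} \): the first two terms contribute rank at most two and the inner-product term contributes rank at most three for points in three dimensions.
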
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix \( {\mathbf{C}} \) (related to \( {\mathbf{D}} \)). This strongly suggests that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but an ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on the structures to be refined, or include the mean-force potentials directly in the energy minimization, so that more plausible structural models can be built. We used this approach successfully in the 2006 CASPR structure refinement (http://predictioncenter.org/caspR).
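
To illustrate how low-frequency modes can be obtained from a contact matrix, the sketch below uses the Gaussian network model, one common form of elastic network model. It is an illustration under stated assumptions rather than the procedure used in the article: the 7.0 Å cutoff, the idealized helix used as toy coordinates, and the function names (`ideal_helix`, `contact_matrix`, `gnm_modes`, `gnm_fluctuations`) are all hypothetical.

```python
import numpy as np

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Toy C-alpha trace: an idealized helix (assumed geometry, in Angstrom)."""
    t = np.deg2rad(turn_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])

def contact_matrix(coords, r_cutoff=7.0):
    """C[i, j] = 1 if residues i != j are closer than r_cutoff, else 0."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    C = (dist < r_cutoff).astype(float)
    np.fill_diagonal(C, 0.0)
    return C

def gnm_modes(C):
    """Kirchhoff matrix K = diag(contact numbers) - C and its eigen-decomposition."""
    K = np.diag(C.sum(axis=1)) - C
    w, v = np.linalg.eigh(K)           # eigenvalues in ascending order
    return K, w, v

def gnm_fluctuations(w, v, tol=1e-8):
    """Relative mean square fluctuations: diagonal of the pseudo-inverse of K."""
    keep = w > tol                     # drop the zero mode (rigid-body translation)
    return np.sum(v[:, keep] ** 2 / w[keep], axis=1)

coords = ideal_helix(30)
C = contact_matrix(coords)             # assumed cutoff of 7.0 Angstrom
K, w, v = gnm_modes(C)
msf = gnm_fluctuations(w, v)

slowest_mode = v[:, 1]                 # first nonzero mode of a connected network
print("contact numbers:", C.sum(axis=1).astype(int))
print("relative fluctuations (terminus vs middle):", msf[0], msf[len(msf) // 2])
```

The diagonal of the Kirchhoff matrix is the contact number of each residue, and the computed fluctuations are the quantity usually compared with crystallographic temperature factors up to a scale factor.
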

Acknowledgements

It is a pleasure to acknowledge the financial support provided by the National Institutes of Health through grants 1R01GM081680, 1R01GM072014, and 1R01GM073095.

Author information

Corresponding author

Correspondence to Andrzej Kloczkowski.

About this article

Cite this article

Kloczkowski, A., Jernigan, R.L., Wu, Z. et al. Distance matrix-based approach to protein structure prediction. J Struct Funct Genomics 10, 67–81 (2009). https://doi.org/10.1007/s10969-009-9062-2

  • DOI: https://doi.org/10.1007/s10969-009-9062-2
