Distance matrix-based approach to protein structure prediction

  • Andrzej Kloczkowski
  • Robert L. Jernigan
  • Zhijun Wu
  • Guang Song
  • Lei Yang
  • Andrzej Kolinski
  • Piotr Pokarowski


Much structural information is encoded in the internal distances, so a distance matrix-based approach can be used to predict protein structure and dynamics and to refine structures. Our approach is based on the square distance matrix \( {\mathbf{D}} = [r_{ij}^{2}] \), which contains the squared distances between all pairs of residues in a protein. This matrix carries more information than the contact matrix \( {\mathbf{C}} \), whose elements are either 0 or 1 depending on whether the distance \( r_{ij} \) is greater or less than a cutoff value \( r_{\mathrm{cutoff}} \). We have performed the spectral decomposition of distance matrices, \( {\mathbf{D}} = \sum_{k} {\lambda_{k} {\mathbf{v}}_{k} {\mathbf{v}}_{k}^{T} } \), in terms of eigenvalues \( \lambda_{k} \) and the corresponding eigenvectors \( {\mathbf{v}}_{k} \), and found that it contains at most five nonzero terms. The dominant eigenvector is proportional to \( r^{2} \), the square distance of points from the center of mass, and the next three are the principal components of the system of points. By predicting \( r^{2} \) from the sequence, we can approximate the distance matrix of a protein with an expected RMSD of about 7.3 Å, and by combining this with a prediction of the first principal component we can improve the approximation to 4.0 Å. We can also explain the role of hydrophobic interactions in protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles that are useful in protein structure prediction, such as the contact number, the residue-wise contact order (RWCO), and mean square fluctuations (i.e., crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements and that they may also be predicted from the sequence, improving overall structure prediction.
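The "at most five nonzero terms" result follows from a short identity. With coordinates \( {\mathbf{x}}_{i} \) measured from the center of mass, \( r_{ij}^{2} = r_{i}^{2} + r_{j}^{2} - 2\,{\mathbf{x}}_{i} \cdot {\mathbf{x}}_{j} \), so \( {\mathbf{D}} = {\mathbf{r}}^{2}{\mathbf{1}}^{T} + {\mathbf{1}}({\mathbf{r}}^{2})^{T} - 2{\mathbf{X}}{\mathbf{X}}^{T} \), a sum of matrices of rank at most 1, 1 and 3. For points in general position exactly one of the five eigenvalues is positive, a known property of squared Euclidean distance matrices. The following minimal NumPy sketch builds D and C from coordinates, extracts the nonzero spectral terms, checks the identity, and reports how well truncated expansions reproduce the distances. The random coordinates, the 7 Å cutoff and all function names are illustrative assumptions. The truncation loop is a generic check, not the sequence-based prediction procedure described above.

```python
import numpy as np

def squared_distance_matrix(coords: np.ndarray) -> np.ndarray:
    """D[i, j] = r_ij^2 for an (n, 3) array of residue coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)

def contact_matrix(D: np.ndarray, r_cutoff: float) -> np.ndarray:
    """C[i, j] = 1 if r_ij < r_cutoff and i != j, else 0."""
    C = (D < r_cutoff ** 2).astype(int)
    np.fill_diagonal(C, 0)
    return C

def spectral_terms(D: np.ndarray, rel_tol: float = 1e-9):
    """Numerically nonzero eigenpairs of the symmetric D, sorted by decreasing |lambda_k|."""
    lam, V = np.linalg.eigh(D)
    order = np.argsort(-np.abs(lam))
    lam, V = lam[order], V[:, order]
    keep = np.abs(lam) > rel_tol * np.abs(lam).max()
    return lam[keep], V[:, keep]

def truncated_distance_rmsd(D: np.ndarray, lam, V, k: int) -> float:
    """RMS error of the distances r_ij when D is rebuilt from its first k spectral terms."""
    Dk = (V[:, :k] * lam[:k]) @ V[:, :k].T          # sum_k lambda_k v_k v_k^T
    iu = np.triu_indices_from(D, k=1)               # off-diagonal pairs i < j
    approx = np.sqrt(np.clip(Dk[iu], 0.0, None))    # negative entries clipped to 0
    return float(np.sqrt(np.mean((approx - np.sqrt(D[iu])) ** 2)))

# Hypothetical test data: 60 random points standing in for C-alpha coordinates (Å)
rng = np.random.default_rng(0)
coords = rng.normal(scale=8.0, size=(60, 3))
D = squared_distance_matrix(coords)
C = contact_matrix(D, r_cutoff=7.0)
lam, V = spectral_terms(D)
print(len(lam), int((lam > 0).sum()))   # → 5 1

# The rank bound: D = r^2 1^T + 1 (r^2)^T - 2 X X^T, with X centred at the center of mass
X = coords - coords.mean(axis=0)
r2 = (X ** 2).sum(axis=1)
assert np.allclose(D, r2[:, None] + r2[None, :] - 2.0 * X @ X.T)

# Distance error of each truncation; keeping all five terms reproduces D exactly
for k in range(1, len(lam) + 1):
    print(k, round(truncated_distance_rmsd(D, lam, V, k), 3))
```

Because the trace of D is zero, the single positive eigenvalue is also the largest in magnitude, which is why it comes first in the sorted list. The truncation errors depend entirely on the synthetic coordinates and are not comparable to the 7.3 Å and 4.0 Å figures quoted above.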
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) based on the contact matrix \( {\mathbf{C}} \) (related to \( {\mathbf{D}} \)). This strongly suggests that the variations among the observed structures, and the corresponding conformational changes, are facilitated by the low-frequency global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble and to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations.
Finally, we use distance constraints derived from databases of known protein structures for structure refinement. From the distributions of distances of various types in known protein structures, we obtain the most probable ranges or the mean-force potentials for these distances. We then impose these constraints on the structures to be refined, or include the mean-force potentials directly in the energy minimization, so that more plausible structural models can be built. We used this approach successfully in the 2006 CASPR structure refinement (
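The comparison between PCA of a structural ensemble and low-frequency ENM modes can be sketched as follows. The sketch assumes NumPy and uses the anisotropic network model (ANM), one common ENM, with a 15 Å cutoff and unit spring constant. It also assumes the ensemble has already been superimposed, so the structure-matching step is omitted. The 40-point stand-in structure and the synthetic ensemble, generated by displacing it along its slowest mode plus small noise, are hypothetical. With real data the ensemble would be the superimposed experimental structures. Overlaps are absolute cosines between PCA eigenvectors and ANM eigenvectors.

```python
import numpy as np

def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """3n x 3n ANM Hessian: a Hookean spring joins every pair closer than cutoff."""
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            d2 = float(d @ d)
            if d2 < cutoff ** 2:                      # i-j is a contact (C_ij = 1)
                block = -gamma * np.outer(d, d) / d2  # off-diagonal super-element
                H[3*i:3*i+3, 3*j:3*j+3] = block
                H[3*j:3*j+3, 3*i:3*i+3] = block
                H[3*i:3*i+3, 3*i:3*i+3] -= block      # diagonal = -(sum of off-diagonals)
                H[3*j:3*j+3, 3*j:3*j+3] -= block
    return H

def anm_modes(coords, cutoff=15.0, n_modes=5):
    """Lowest-frequency internal modes; the six rigid-body zero modes are skipped."""
    lam, V = np.linalg.eigh(anm_hessian(coords, cutoff))
    return lam[6:6 + n_modes], V[:, 6:6 + n_modes]

def pca_modes(ensemble, n_modes=5):
    """Principal components of an (m, n, 3) ensemble that is already superimposed."""
    X = ensemble.reshape(len(ensemble), -1)
    X = X - X.mean(axis=0)
    var, U = np.linalg.eigh(X.T @ X / (len(X) - 1))
    order = np.argsort(var)[::-1][:n_modes]           # largest variance first
    return var[order], U[:, order]

# Hypothetical stand-in structure: 40 random points in a 20 Å cube
rng = np.random.default_rng(1)
ref = rng.uniform(0.0, 20.0, size=(40, 3))
lam, modes = anm_modes(ref)

# Synthetic ensemble: displacements along the slowest ANM mode plus small noise
slow = modes[:, 0].reshape(-1, 3)
ensemble = np.array([ref + 2.0 * rng.normal() * slow
                     + rng.normal(scale=0.05, size=ref.shape) for _ in range(200)])
var, pcs = pca_modes(ensemble)

overlap = np.abs(pcs.T @ modes)   # overlap[k, l] = |cos| between PC k and ANM mode l
print(np.round(overlap[0, 0], 3))
```

Because this ensemble is built from the slowest mode, the first PC should overlap almost completely with the first ANM mode. With experimental ensembles, the full overlap matrix shows which PCs correspond to which low-frequency modes.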

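Converting database distance distributions into refinement terms can be sketched with the standard inverse-Boltzmann (Sippl-type) relation \( U(r) = -kT \ln[P_{\mathrm{obs}}(r)/P_{\mathrm{ref}}(r)] \). The sketch pairs it with a central-quantile "most probable range" and a flat-bottom penalty that enforces that range. The synthetic distances, the uniform reference state, the pseudocount, the kcal/mol units and the restraint force constant are all assumptions of this sketch. None of the function names correspond to PIDD or to any refinement package.

```python
import numpy as np

KT = 0.592  # kT in kcal/mol at 298 K; the energy units are an assumption of this sketch

def mean_force_potential(observed, reference, bins, pseudo=1e-3):
    """Inverse-Boltzmann potential U(r) = -kT ln[P_obs(r) / P_ref(r)] on histogram bins."""
    p_obs, _ = np.histogram(observed, bins=bins, density=True)
    p_ref, _ = np.histogram(reference, bins=bins, density=True)
    return -KT * np.log((p_obs + pseudo) / (p_ref + pseudo))  # pseudocount avoids log(0)

def probable_range(observed, coverage=0.95):
    """Central interval holding `coverage` of the observed distances."""
    tail = (1.0 - coverage) / 2.0
    lo, hi = np.quantile(observed, [tail, 1.0 - tail])
    return float(lo), float(hi)

def flat_bottom_restraint(d, lo, hi, k=10.0):
    """Zero inside [lo, hi], harmonic outside: a simple distance-range restraint."""
    if d < lo:
        return k * (lo - d) ** 2
    if d > hi:
        return k * (d - hi) ** 2
    return 0.0

# Hypothetical database for one distance type, e.g. a particular atom pair (Å)
rng = np.random.default_rng(2)
observed = rng.normal(5.5, 0.3, size=5000)
reference = rng.uniform(3.0, 12.0, size=50000)   # assumed reference state
bins = np.linspace(3.0, 12.0, 46)
centers = 0.5 * (bins[:-1] + bins[1:])

U = mean_force_potential(observed, reference, bins)
lo, hi = probable_range(observed)
print(f"U minimum near {centers[np.argmin(U)]:.1f} Å; 95% range [{lo:.2f}, {hi:.2f}] Å")
print(flat_bottom_restraint(6.5, lo, hi))
```

The potential minimum should fall near the 5.5 Å mode of the synthetic distribution. A 6.5 Å distance lies outside the 95% range, so its restraint energy is positive. In a refinement run, either term would be summed over all restrained pairs and added to the force-field energy being minimized.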

Keywords: Distance matrix · Spectral analysis · Protein structure prediction · Protein structure refinement · Elastic networks · Distance geometry



It is a pleasure to acknowledge the financial support provided by the National Institutes of Health through grants 1R01GM081680, 1R01GM072014, and 1R01GM073095.



Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Andrzej Kloczkowski (1, 2; corresponding author)
  • Robert L. Jernigan (1, 2)
  • Zhijun Wu (3)
  • Guang Song (4)
  • Lei Yang (1, 2)
  • Andrzej Kolinski (5)
  • Piotr Pokarowski (6)
  1. Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, USA
  2. Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, USA
  3. Department of Mathematics, Iowa State University, Ames, USA
  4. Department of Computer Science, Iowa State University, Ames, USA
  5. Laboratory of Theory of Biopolymers, Department of Chemistry, Warsaw University, Warsaw, Poland
  6. Institute of Informatics, Warsaw University, Warsaw, Poland
