WaterScore: a novel method for distinguishing between bound and displaceable water molecules in the crystal structure of the binding site of protein-ligand complexes
We performed a multivariate logistic regression analysis to establish a statistical correlation between the structural properties of water molecules in the binding site of a free protein crystal structure and the probability of observing those water molecules in the same location in the crystal structure of the ligand-complexed form. In the best regression functions obtained, the temperature B-factor, the solvent-contact surface area, the total hydrogen bond energy and the number of protein–water contacts were found to discriminate between bound and displaceable water molecules. These functions may be used to identify those bound water molecules that should be included in structure-based drug design and ligand docking algorithms.
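As an illustration of how a regression function of this kind turns the four descriptors into a probability, the following minimal Python sketch evaluates a logistic model for a single water molecule. The coefficient values, their signs, the units, and the names `WaterDescriptors` and `bound_probability` are hypothetical placeholders chosen for this example; they are not the fitted WaterScore functions.

```python
import math
from dataclasses import dataclass


@dataclass
class WaterDescriptors:
    """Descriptors named in the abstract; the units are assumed for illustration."""
    b_factor: float       # temperature B-factor (assumed in Å^2)
    contact_area: float   # solvent-contact surface area (assumed in Å^2)
    hbond_energy: float   # total hydrogen bond energy (assumed in kcal/mol, negative = favourable)
    n_contacts: int       # number of protein-water contacts


# Hypothetical coefficients, NOT the values fitted in the paper.
# Signs assume that ordered, buried, strongly hydrogen-bonded waters are more likely bound.
INTERCEPT = 0.0
COEFFS = {
    "b_factor": -0.10,
    "contact_area": -0.05,
    "hbond_energy": -0.50,
    "n_contacts": 0.30,
}


def bound_probability(w: WaterDescriptors) -> float:
    """Return the logistic-model probability that a water molecule remains bound."""
    z = (
        INTERCEPT
        + COEFFS["b_factor"] * w.b_factor
        + COEFFS["contact_area"] * w.contact_area
        + COEFFS["hbond_energy"] * w.hbond_energy
        + COEFFS["n_contacts"] * w.n_contacts
    )
    # Numerically stable logistic function: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Two made-up waters: one well ordered and buried, one mobile and exposed.
ordered = WaterDescriptors(b_factor=10.0, contact_area=2.0, hbond_energy=-6.0, n_contacts=5)
mobile = WaterDescriptors(b_factor=45.0, contact_area=20.0, hbond_energy=-1.0, n_contacts=1)

p_ordered = bound_probability(ordered)
p_mobile = bound_probability(mobile)
```

With these placeholder coefficients the ordered water scores well above 0.5 and the mobile water well below it. In practice the coefficients would be estimated by fitting the model to water molecules whose fate on ligand binding is known.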
Figure: The binding site (thin sticks) of penicillopepsin (3app) with its crystallographically determined water molecules (spheres) and the superimposed ligand (thick sticks, from the complexed structure 1ppk). Water molecules sterically displaced by the ligand upon complexation are shown in cyan. Bound water molecules are shown in blue. Displaced water molecules are shown in yellow. Water molecules removed from the analysis due to a lack of hydrogen bonds to the protein are shown in white. WaterScore correctly predicted the waters in blue to remain bound with probability = 1, and the waters in yellow to remain bound with probability < 1×10⁻²⁰.
Keywords: Protein hydration; Drug design; Bound water molecules; Multivariate logistic regression
ATGS would like to thank Consejo Nacional de Ciencia y Tecnología (CONACyT, México) for the award of a postgraduate scholarship and the CVCP of the Universities of the UK for an Overseas Research Scheme award. RLM is also a Research Fellow of Hughes Hall, Cambridge. We also thank Mr. Benjamin Carrington for his valuable help in the production of some of the figures, Dr. Per Kållblad for help and discussion on PC analysis, and Miss Eva-Liina Asu for proof-reading a draft of the manuscript.