Journal of Computer-Aided Molecular Design, Volume 29, Issue 8, pp 707–712

Molecular dynamics to enhance structure-based virtual screening on cathepsin B

  • Mitja Ogrizek
  • Samo Turk
  • Samo Lešnik
  • Izidor Sosič
  • Milan Hodošček
  • Bojana Mirković
  • Janko Kos
  • Dušanka Janežič
  • Stanislav Gobec
  • Janez Konc

DOI: 10.1007/s10822-015-9847-2

Cite this article as:
Ogrizek, M., Turk, S., Lešnik, S. et al. J Comput Aided Mol Des (2015) 29: 707. doi:10.1007/s10822-015-9847-2

Abstract

Molecular dynamics (MD) and molecular docking are commonly used to study molecular interactions in drug discovery. Most docking approaches treat proteins as rigid, which can decrease the accuracy of predicted docked poses. MD simulations can therefore be used prior to docking to add flexibility to proteins. We evaluated the contribution of MD to a docking study on human cathepsin B, a well-studied protein involved in numerous pathological processes. Using the CHARMM biomolecular simulation program and the AutoDock Vina molecular docking program, we found that short MD simulations significantly improved molecular docking. Our results, expressed as the area under the receiver operating characteristic curve, show an increase in the discriminatory power of molecular docking, i.e. its ability to discriminate active from inactive compounds, when docking is performed to selected snapshots from MD simulations.

Keywords

Cathepsin B Molecular dynamics Molecular docking Protein flexibility 

Introduction

Human cathepsin B (catB, EC 3.4.22.1) is a cysteine protease involved in many physiological processes and expressed in a variety of tissues [1]. Alterations in its expression, activity and localization are associated with a number of diseases, most importantly Alzheimer’s disease [2] and different cancers [3]. CatB is unique among cysteine cathepsins in its ability to act as both an endo- and an exopeptidase [1]. This dual character is regulated by the so-called occluding loop formed by residues Pro102-Cys128 (Fig. 1) [4]. The exopeptidase activity of catB is associated with its physiological role in lysosomal compartments and has an optimum at a pH of around 5, with the catB occluding loop closed. The endopeptidase activity, on the other hand, is associated with pathological processes and has a pH optimum above 7, with the occluding loop opened [5]. An ideal inhibitor of catB would thus target only the endopeptidase activity while leaving the physiological exopeptidase activity unmodified. Our group discovered that an established antimicrobial agent, 5-nitro-8-hydroxyquinoline (nitroxoline), is a potent, selective and reversible inhibitor of catB endopeptidase activity [6]. In the same work we also reported the co-crystal structure of nitroxoline in the active site of catB (PDB code 3AI8). Examination of this crystal structure reveals that the nitro group of nitroxoline interacts with His110 and His111 from the occluding loop and thus locks the loop in the closed form, thereby explaining its selective inhibition of the endopeptidase activity. The work on nitroxoline derivatives was recently extended and a series of new compounds was synthesized, yielding improved inhibitors of catB [7].
Fig. 1

Superposition of open (PDB code 3K9M, blue) and closed (PDB code 3AI8, green) forms of catB. The occluding loop is shown as a yellow cartoon (in the open form) or a magenta cartoon (in the closed form), with His110 and His111 shown as sticks. The co-crystallized nitroxoline (orange sticks) bound to the closed form of catB is also shown

Most drug design efforts are nowadays assisted by computational methods in the form of virtual screening. With 3D structures of targets often readily available, the most popular method remains molecular docking [8]. Computer-aided drug design software normally offers different settings, and users can apply different protocols, all of which influence the performance of docking [9]. Despite the obvious pitfalls of molecular docking, proper validation is still often neglected, and many authors test docking programs only by re-docking co-crystallized ligands and measuring the RMSD between docked and co-crystallized conformations [10]. To facilitate validation of molecular docking methods and protocols, the DUD and DUD-E benchmarking sets were created [11, 12]; they are used to test the performance of virtual screening methods by providing sets of negative controls. Another important weakness of most molecular docking methods is that they consider the target protein as rigid, which is especially problematic if the target is subject to conformational changes or loop movements upon ligand binding. To circumvent this problem, molecular dynamics (MD) can be used to simulate target dynamics, and docking can then be performed to individual snapshots obtained from MD. The performance of the docking is then evaluated on each snapshot [13].

In this work, we assessed the ability of molecular dynamics to improve the results of virtual screening on catB (Fig. 2). As a starting point, we selected the co-crystal structure of catB with nitroxoline, in which the occluding loop and the endopeptidase activity are targeted. For this structure we constructed, over the course of several years, an in-house database of active and inactive compounds [6, 7]. We simulated a virtual screening scenario using the in-house database as well as decoys generated via the DUD-E web service. We used the AutoDock Vina docking software [14] and 40 snapshots from the MD simulation in addition to the crystal structure of catB. We found that selected MD snapshots significantly improve docking performance over the original crystal structure for nitroxoline derivatives; however, such MD snapshots are rare and need to be carefully evaluated.
Fig. 2

Flowchart of molecular docking to snapshots from molecular dynamics

Methods

Databases of compounds

Nitroxoline derivatives with known cathepsin B activity were reported previously by our group [6, 7] and were classified as actives (31 compounds; Table S1, Supporting Information) or inactives (26 compounds; Table S2, Supporting Information) according to their biological activity on cathepsin B. Additionally, 1900 decoys were generated using the DUD-E web service [12] based on our active nitroxoline derivatives. We applied a script utilizing the OpenBabel toolkit version 2.3.2 [15, 16] to each compound to assign hydrogens appropriate for pH 7.4, to generate 3D structures, and to minimize them in 1000 steps with the MMFF94 force field. Compound properties such as molecular weight, number of H-bond acceptors, number of H-bond donors, number of atoms, and logP were calculated using RDKit nodes for KNIME [17].
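The preparation script itself is not given in the text, so the following is only a minimal sketch of how the described steps (pH 7.4 hydrogens, 3D generation, 1000-step MMFF94 minimization) might be expressed with the Open Babel command-line tool. The function name, file names and the choice to drive `obabel` from Python are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def obabel_prep_command(src: Path, dst: Path, ph: float = 7.4,
                        steps: int = 1000, forcefield: str = "MMFF94") -> List[str]:
    """Build (but do not run) an obabel command line for one compound.

    Hypothetical helper: the flags mirror the steps described in the text,
    not the authors' actual script.
    """
    return [
        "obabel", str(src), "-O", str(dst),
        "-p", str(ph),        # add hydrogens appropriate for the given pH
        "--gen3d",            # generate 3D coordinates
        "--minimize",         # force-field energy minimization
        "--ff", forcefield,   # force field used for the minimization
        "--steps", str(steps),
    ]
```

With Open Babel installed, such a command could be executed with `subprocess.run(cmd, check=True)`, looping over the actives, inactives and decoys.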

Molecular dynamics

We used the CHARMM 3.6 biomolecular simulation program [18] to perform a molecular dynamics simulation of the closed form of cathepsin B bound to the ligand nitroxoline (PDB ID 3AI8) [6]. Hydrogen atoms were added using the HBUILD routine in CHARMM. Steepest descent (SD) and adopted basis Newton-Raphson (ABNR) energy minimizations were performed to remove atomic clashes and to optimize the atomic coordinates of the protein–ligand complex. The nitroxoline was held fixed, and the protein was allowed to move freely during the minimization process. We embedded the protein in a cube of TIP3P [19] water molecules, previously minimized using the SD and ABNR algorithms. KCl was added at a concentration of 0.35 M to neutralize the system. Trajectories were generated at 310.15 K (37 °C) and covered 4 ns of constant pressure and temperature molecular dynamics employing periodic boundary conditions. No constraints were used during the simulation, to allow cathepsin B and the nitroxoline ligand to position themselves freely according to the physical forces between them. From the resulting simulation we quenched 40 snapshots, one for every 100 ps of the simulation [20]. Initial atomic positions of the nitroxoline molecule were obtained with MOLDEN [21]. Force field parameters for nitroxoline were estimated using the CGenFF/ParamChem tool version 0.9.6 and the force field version 2b7 [22, 23, 24]. Parameters for the nitroxoline molecular force field were refined using the GAUSSIAN package version 09. For visualization of the target and ligands we used VMD [25].

Molecular docking

For all molecular docking experiments, AutoDock Vina version 1.1.2 was used [14]. The active site of cathepsin B from the crystal structure 3AI8 was defined as a box of size 15 × 15 × 15 Å with nitroxoline in its center. All 40 snapshots of catB obtained from molecular dynamics were then superimposed on the original cathepsin B crystal structure, so that the same active-site box could be used for all the docking experiments. All snapshot PDB structures of catB were converted to PDBQT format using OpenBabel version 2.3.2. Finally, all 1957 compounds (actives, inactives and decoys) were docked to 41 structures of catB using the standard parameters of AutoDock Vina.
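As an illustration of the box definition described above, the sketch below computes the geometric center of a ligand from PDB HETATM records and writes an AutoDock Vina configuration with a 15 Å cubic box around it. The residue name `LIG` and the receptor file name are placeholders, and using the unweighted geometric center as the box center is an assumption; the text states only that nitroxoline lies in the center of the box.

```python
from typing import Iterable, Tuple


def ligand_centroid(pdb_lines: Iterable[str], resname: str) -> Tuple[float, float, float]:
    """Geometric center of all HETATM atoms whose residue name equals resname."""
    coords = []
    for line in pdb_lines:
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            # Fixed-width PDB columns: x = 31-38, y = 39-46, z = 47-54
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM records found for residue {resname!r}")
    n = len(coords)
    return (sum(c[0] for c in coords) / n,
            sum(c[1] for c in coords) / n,
            sum(c[2] for c in coords) / n)


def vina_config(receptor: str, center: Tuple[float, float, float],
                size: float = 15.0) -> str:
    """Text of a Vina config file with a cubic search box of edge `size` (in Å)."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {size:.1f}",
        f"size_y = {size:.1f}",
        f"size_z = {size:.1f}",
    ]
    return "\n".join(lines) + "\n"
```

Because all snapshots are superimposed on the crystal structure, the same center and size could be reused for each of the 41 receptor files, changing only the `receptor` entry.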

ROC curves

The receiver operating characteristic (ROC) curve is a common method to assess virtual screening performance [26]. A ROC curve is a plot of the true positive rate (TPR, sensitivity) versus the false positive rate (FPR, 1 − specificity). Considering a rank i, the two rates can be written as follows:
$$TPR = \frac{TP_{i}}{TP_{i} + FN_{i}} = Se_{i}$$
(1)
$$FPR = \frac{FP_{i}}{TN_{i} + FP_{i}} = 1 - Sp_{i}$$
(2)
where TP_i, FN_i, TN_i, and FP_i are the numbers of true positives, false negatives, true negatives, and false positives, respectively, at threshold i, and Se_i and Sp_i are the sensitivity and the specificity. The area under the curve (AUC) of a ROC plot is a scalar measure of the overall quality of the virtual screening method; it corresponds to the probability that a randomly selected active is ranked above a randomly chosen decoy. The value of AUC can vary between 0 and 1, where 1 represents a perfect ranking, in which all actives are ranked above decoys, whereas 0.5 corresponds to a random ranking. The statistical analysis of the ROC curves, i.e., calculation of the standard deviations and asymptotic significance, was performed using the PSPP software.
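Equations (1) and (2) can be made concrete with a short sketch that walks down a ranked list, records an (FPR, TPR) point at each score threshold, and integrates the resulting curve with the trapezoidal rule. It assumes that lower (more negative) docking scores rank first, as with Vina binding energies; the function names are illustrative, and this is not the PSPP procedure used in the study.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points; labels are 1 for actives and 0 for decoys."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one decoy")
    ranked = sorted(zip(scores, labels))  # best (lowest) score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Compounds with identical scores cross the threshold together
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))  # Eq. (2), Eq. (1)
        i = j
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with scores `[-9.0, -8.5, -8.0, -7.5]` and labels `[1, 0, 1, 0]`, three of the four active/decoy pairs are ordered correctly, and `auc(roc_points(...))` gives 0.75.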

Computer hardware

Molecular dynamics was carried out on the columns and rows of computers (CROW) cluster at the National Institute of Chemistry, Ljubljana, Slovenia, consisting of computers with Intel Core i7 3.40 GHz processors and 8 GB of RAM, running Gentoo Linux [27]. Molecular docking was performed on an HP workstation with two quad-core Intel Xeon 2.2 GHz processors, 8 GB of RAM, 320 GB and 1 TB hard drives, and an Nvidia Quadro FX 4800 graphics card, running a 64-bit version of Arch Linux. All calculations were performed on CPU cores.

Results and discussion

We assessed the ability of molecular dynamics to enhance the discriminatory power of a structure-based virtual screening protocol, using the crystal structure of cathepsin B (catB) with the co-crystallized nitroxoline (PDB code 3AI8) as a starting point. A molecular dynamics simulation of this crystal structure was performed using CHARMM [18] as described in the Methods section and resulted in 40 snapshots spaced 100 ps apart.

To simulate the virtual screening scenario, we used our in-house set of 31 active and 26 inactive nitroxoline derivatives [6, 7], which consists of compounds with similar core quinoline scaffolds. A set of 1900 decoys based on the 31 active nitroxoline derivatives was generated using the DUD-E web service [12]. We added the 26 inactive compounds to the set of 1900 decoys. The compound sets were within the fragment-like chemical space [28] and had physicochemical properties similar to nitroxoline (Table 1); however, their topologies, i.e. arrangements of functional groups, differed. As expected, actives and decoys had very similar properties due to the methodology used to generate decoys, whereas the inactive compounds were slightly smaller than the active compounds. For all compounds, protonation states were assigned appropriate for a pH of 7.4, since this is the optimal pH value for catB endopeptidase activity. Finally, 3D structures of compounds were generated using MOLDEN and minimized in 1000 steps with the MMFF94 force field.
Table 1

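Average properties of compound sets

| Compound set   | MW^a   | LogP^a | HBA^b | HBD^b | Atoms^b |
|----------------|--------|--------|-------|-------|---------|
| Actives (31)   | 279.93 | 1.94   | 5     | 1     | 20      |
| Inactives (26) | 219.01 | 1.65   | 4     | 1     | 15      |
| Decoys (1900)  | 287.66 | 1.94   | 5     | 1     | 20      |

^a Mean values

^b Median values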

All molecular docking experiments were done using the same protocol so as not to introduce any bias. All MD snapshots of catB were aligned to the catB crystal structure (PDB code 3AI8). The active site was in all cases defined as a box of size 15 × 15 × 15 Å with crystallized nitroxoline in its center. Finally, all 1957 compounds (actives, inactives and decoys) were docked to the catB crystal structure and to all 40 snapshots obtained from MD.

The results (Table 2; Fig. 3) show that we obtained a significant increase in the discriminatory ability of our docking method using the snapshot obtained at 1000 ps (AUC value of 0.64). In comparison, docking to the original crystal structure resulted in an AUC value of 0.37, which corresponds to worse than random ranking. Among the tested snapshots, the 1000 ps snapshot was the only one for which we observed better than random ranking with a high degree of certainty, i.e., the lower bound of the confidence interval is well above 0.5 and the asymptotic significance value is less than 0.05. Since AutoDock Vina is a stochastic algorithm, we can expect variation in the calculated AUC values when running the docking study multiple times on the same snapshot. To investigate this, we repeated the docking five times for each snapshot using different random seed values. The obtained AUC values differed very little between the repetitions (the maximum difference was 0.02), showing that the variation in the calculated AUCs is minimal. In addition, we performed case resampling with 10,000 samples for each frame; as can be observed from Table 2 (data for the first ten snapshots) and Table S3, Supporting Information (all 40 snapshots), there is very little difference between the original and the bootstrap results.
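The case resampling mentioned above can be sketched as follows: compounds are drawn with replacement, the AUC is recomputed for each resampled set, and the mean and standard deviation of the resulting AUC values are reported. This is a plain-Python illustration under the assumption that lower docking scores rank first; the function names, the seed handling and the pairwise AUC estimator are choices made here and not a description of the software used in the study.

```python
import random
from statistics import mean, stdev
from typing import Sequence, Tuple


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (active, decoy) pairs in which the active has the lower score.

    Ties count as half a correctly ordered pair.
    """
    actives = [s for s, lab in zip(scores, labels) if lab == 1]
    decoys = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((a < d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def bootstrap_auc(scores: Sequence[float], labels: Sequence[int],
                  n_samples: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of the AUC over case-resampled data sets."""
    rng = random.Random(seed)
    n = len(scores)
    values = []
    while len(values) < n_samples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # skip resamples lacking actives or decoys
            values.append(rank_auc([scores[i] for i in idx], lab))
    return mean(values), stdev(values)
```

The pairwise estimator is quadratic in the number of compounds, so for roughly 2000 compounds and 10,000 resamples a rank-based implementation would be preferable in practice. If a normal approximation is assumed, the reported AUC of 0.64 with SD 0.04 corresponds to an approximate 95 % interval of 0.64 ± 1.96 × 0.04, i.e. about 0.56 to 0.72, which lies above 0.5.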
Table 2

The AUC and standard deviation (SD) values obtained from docking to the crystal structure (CS) and first ten snapshots. Best AUC values are in italics

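| Frame/snapshot | AUC  | SD   | Asymptotic significance | AUC (bootstrap) | SD (bootstrap) |
|----------------|------|------|-------------------------|-----------------|----------------|
| CS             | 0.37 | 0.04 | 0.02                    | 0.38            | 0.04           |
| 100            | 0.47 | 0.03 | 0.56                    | 0.47            | 0.03           |
| 200            | 0.42 | 0.04 | 0.15                    | 0.43            | 0.04           |
| 300            | 0.50 | 0.04 | 0.98                    | 0.52            | 0.04           |
| 400            | 0.41 | 0.05 | 0.10                    | 0.42            | 0.05           |
| 500            | 0.49 | 0.04 | 0.84                    | 0.55            | 0.04           |
| 600            | 0.55 | 0.04 | 0.34                    | 0.55            | 0.04           |
| 700            | 0.39 | 0.04 | 0.04                    | 0.39            | 0.04           |
| 800            | 0.38 | 0.05 | 0.02                    | 0.39            | 0.05           |
| 900            | 0.53 | 0.05 | 0.53                    | 0.53            | 0.05           |
| 1000           | 0.64 | 0.04 | 0.01                    | 0.65            | 0.04           |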

We also report the asymptotic significance value and the AUC and SD values obtained after bootstrapping

Fig. 3

AUC values with confidence intervals for the crystal structure (0 ps) and for structures obtained from MD at every 100 ps

To understand why docking performs better with the 1000 ps snapshot, we compared the crystal structure of catB with the structure after 1000 ps of MD simulation (Fig. 4; Table S4, Supporting Information). We calculated the individual energy contributions to the AutoDock Vina score and found that the main difference between the actives docked to the crystal structure and those docked to the 1000 ps snapshot can be attributed to the larger number of hydrogen bond interactions formed by the latter. Most actives docked to the 1000 ps snapshot adopted a conformation that allowed an oxygen of the nitro group to form an additional hydrogen bond with His110. In contrast, His110 in the crystal structure was in a position where this hydrogen bond could not form. Therefore, the actives were mostly docked to the crystal structure with the nitro group facing away from His110. These differences most likely make the 1000 ps snapshot better at discriminating actives from inactives.
Fig. 4

Crystal structure of catB (cyan) superimposed on the snapshot at 1000 ps of MD (light brown). Active compound GIS244 (yellow) docked to the 1000 ps snapshot forms a hydrogen bond (dashed line) with His110 (sticks); the same compound docked to the crystal structure is shown in purple.

Conclusion

In this study, we set out to evaluate the influence of protein flexibility on molecular docking. We simulated a virtual screening scenario using known active and inactive compounds in addition to generated decoys, and found that the correct choice of MD snapshots significantly improves docking performance. With short molecular dynamics simulations we were able to obtain a target conformation for which the discrimination of actives from inactives was significantly better than random selection. Running short molecular dynamics simulations on a protein target prior to docking, with subsequent statistical evaluation of the conformations sampled at different simulation times, is a useful strategy for improving the performance of virtual screening and can be a generally applicable approach in early drug discovery.

Supplementary material

10822_2015_9847_MOESM1_ESM.doc (552 kb)
Supplementary material 1 (DOC 552 kb)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mitja Ogrizek (1)
  • Samo Turk (2, 4)
  • Samo Lešnik (1)
  • Izidor Sosič (2)
  • Milan Hodošček (1)
  • Bojana Mirković (2)
  • Janko Kos (2)
  • Dušanka Janežič (3)
  • Stanislav Gobec (2)
  • Janez Konc (1, 3)

  1. National Institute of Chemistry, Ljubljana, Slovenia
  2. Faculty of Pharmacy, University of Ljubljana, Ljubljana, Slovenia
  3. Faculty of Mathematics, Natural Sciences, and Information Technologies, University of Primorska, Koper, Slovenia
  4. BioMed X GmbH, Heidelberg, Germany
