Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S

Yang, Yuwei; Lu, Jianing; Yang, Chao; Zhang, Yingkai

doi:10.1007/s10822-019-00247-3

Published: 15 November 2019

Journal of Computer-Aided Molecular Design, Volume 33, pages 1095–1105 (2019)

Yuwei Yang¹, Jianing Lu¹, Chao Yang¹ & Yingkai Zhang¹,² (ORCID: orcid.org/0000-0002-4984-3354)

Abstract

Cathepsin S (CatS), a member of the cysteine cathepsin protease family, has been well studied because of its significant role in many pathological processes, including arthritis, cancer and cardiovascular diseases. CatS inhibitors were included in D3R Grand Challenge 3 (GC3) for both docking pose prediction and affinity ranking, and in D3R Grand Challenge 4 (GC4) for binding affinity ranking. The difficulties posed by CatS inhibitors in D3R stem mainly from three aspects: large size, high flexibility and similar chemical structures. We participated in GC4; our best submitted model, which employs a similarity-based alignment docking and Vina scoring protocol, yielded a Kendall’s τ of 0.23 for the 459 binders in GC4. In further explorations with machine learning, we curated a CatS-specific training set, adopted a similarity-based constrained docking method, and used an arm-based fragmentation strategy that describes large inhibitors in a locality-sensitive fashion; with these changes, our best structure-based ranking protocol achieves a Kendall’s τ of 0.52 for all binders in GC4. This exploration demonstrates the importance of training data, docking approaches and fragmentation strategies in developing machine-learning inhibitor-ranking protocols.
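The ranking quality reported above is measured with Kendall’s τ, which compares the order of predicted and experimental affinities pair by pair. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes the tie-corrected τ-b variant (the abstract does not say which variant was used), and the function name `kendall_tau_b` and the five affinity values are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences of numbers.

    Hypothetical helper for illustration; tau-b is an assumed variant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: contributes to neither count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        else:
            discordant += 1  # the pair is ordered oppositely in x and y

    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up values (e.g. pIC50-like numbers), not data from the paper.
experimental = [7.2, 6.1, 8.0, 5.5, 6.8]
predicted = [6.9, 6.3, 7.1, 6.0, 5.8]
tau = kendall_tau_b(experimental, predicted)
print(f"Kendall's tau-b = {tau:.2f}")
```

In this toy case 8 of the 10 ligand pairs are ordered consistently and 2 are not, so τ = (8 − 2) / 10 = 0.6. A value of 1 means the two rankings agree on every pair, 0 means no overall agreement, and −1 means the ranking is fully reversed.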

Acknowledgements

We would like to acknowledge the support by NIH (Grant Nos. R35-GM127040, R01GM073943 and R01GM120736) and computing resources provided by NYU-ITS.

Author information

Authors and Affiliations

Department of Chemistry, New York University, New York, NY, 10003, USA
Yuwei Yang, Jianing Lu, Chao Yang & Yingkai Zhang
NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
Yingkai Zhang

Corresponding author

Correspondence to Yingkai Zhang.

About this article

Cite this article

Yang, Y., Lu, J., Yang, C. et al. Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S. J Comput Aided Mol Des 33, 1095–1105 (2019). https://doi.org/10.1007/s10822-019-00247-3

Received: 12 June 2019
Accepted: 02 November 2019
Published: 15 November 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10822-019-00247-3
