The AAPS Journal, Volume 15, Issue 2, pp 395–406

TargetHunter: An In Silico Target Identification Tool for Predicting Therapeutic Potential of Small Organic Molecules Based on Chemogenomic Database

  • Lirong Wang
  • Chao Ma
  • Peter Wipf
  • Haibin Liu
  • Weiwei Su
  • Xiang-Qun Xie (email author)
Research Article
Theme: New Paradigms in Pharmaceutical Sciences: In Silico Drug Discovery


Target identification for known bioactive compounds and novel synthetic analogs is an important research field in medicinal chemistry, biochemistry, and pharmacology. It is also a challenging and costly step towards chemical biology and phenotypic screening. In silico identification of potential biological targets for chemical compounds offers an alternative avenue for exploring ligand–target interactions and biochemical mechanisms, as well as for investigating drug repurposing. Computational target fishing mines biologically annotated chemical databases and then maps compound structures into chemogenomical space in order to predict their biological targets. We summarize recent advances and applications in computational target fishing, such as chemical similarity searching, data mining/machine learning, panel docking, and bioactivity spectral analysis for target identification. We then describe in detail a new web-based target prediction tool, TargetHunter. This web portal implements a novel in silico target prediction algorithm, the Targets Associated with its MOst SImilar Counterparts, by exploring the largest chemogenomical database, ChEMBL. Prediction accuracy reached 91.1% within the top three guesses on a subset of high-potency compounds from the ChEMBL database, outperforming a published algorithm, multiple-category models. TargetHunter also features an embedded geography tool, BioassayGeoMap, developed to let the user easily search for potential collaborators who can experimentally validate the predicted biological target(s) or off-target(s). TargetHunter therefore provides a promising alternative to bridge the knowledge gap between biology and chemistry, and to significantly boost the productivity of chemogenomics researchers in in silico drug design and discovery.
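The idea behind chemical similarity searching for target fishing can be illustrated with a short sketch. This is not the TargetHunter implementation, whose details the abstract does not give; it is a minimal nearest-neighbor illustration under stated assumptions. Fingerprints are represented as plain sets of feature identifiers, similarity is the Tanimoto coefficient, and each target is scored by its single most similar annotated ligand. The function names (`tanimoto`, `predict_targets`, `top_k_hit`), the toy library, and the target labels are all hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumption: a fingerprint is the set of "on" feature identifiers.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features over the union of features."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_targets(
    query: Fingerprint,
    library: List[Tuple[Fingerprint, Set[str]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank targets by the similarity of their most similar annotated ligand."""
    best = {}
    for fingerprint, targets in library:
        score = tanimoto(query, fingerprint)
        for target in targets:
            if score > best.get(target, -1.0):
                best[target] = score
    # Highest similarity first; ties broken alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


def top_k_hit(predicted: List[Tuple[str, float]], known: Set[str]) -> bool:
    """True if any known target appears among the predicted top-k targets."""
    return any(target in known for target, _ in predicted)


# Toy, made-up annotated library (not real ChEMBL data).
LIBRARY = [
    (frozenset({1, 2, 3, 4}), {"TARGET_A"}),
    (frozenset({1, 2, 3, 9}), {"TARGET_B"}),
    (frozenset({7, 8, 9}), {"TARGET_C"}),
]

query = frozenset({1, 2, 3, 4, 5})
ranked = predict_targets(query, LIBRARY)
print(ranked)
```

A top-three accuracy figure of the kind quoted above would correspond to the fraction of test compounds for which `top_k_hit` is true with `top_k=3`, although the benchmark protocol used in the paper is not described in this abstract.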

Key words

ChEMBL, chemogenomics, machine learning, target identification, TargetHunter



The authors gratefully acknowledge financial support from NIH grants (NIH R01DA025612, NIGMS P50-GM067082, and NIH R21HL109654) and the National Natural Science Foundation of China (NSFC 81090410 and NSFC 90913018). We thank Dr. Herbert Barry III for improvements to the manuscript and Dr. Qin Ouyang for preparation of the figures.



Copyright information

© American Association of Pharmaceutical Scientists 2013

Authors and Affiliations

  • Lirong Wang (1, 2, 3)
  • Chao Ma (1, 3, 4)
  • Peter Wipf (2, 3)
  • Haibin Liu (1, 5)
  • Weiwei Su (5)
  • Xiang-Qun Xie (1, 2, 3, 4), email author

  1. Department of Pharmaceutical Sciences, School of Pharmacy, Computational Chemical Genomics Screening Center, Pittsburgh, USA
  2. Center for Chemical Methodologies & Library Development (UPCMLD), Department of Chemistry, Pittsburgh, USA
  3. Drug Discovery Institute, Pittsburgh, USA
  4. Departments of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, USA
  5. Guangzhou Quality R&D Center of Traditional Chinese Medicine, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People’s Republic of China
