Mapping Between Databases of Compounds and Protein Targets

Part of the book series: Methods in Molecular Biology ((MIMB,volume 910))

Abstract

Databases that provide links between bioactive compounds and their protein targets are increasingly important in drug discovery and chemical biology. They join the expanding universes of cheminformatics, via chemical structures, and bioinformatics, via sequences. However, it is difficult to assess the relative utility of databases without an explicit comparison of content. We have exemplified an approach to this by comparing four resources, each with a different focus on bioactive chemistry (ChEMBL, DrugBank, Human Metabolome Database, and Therapeutic Target Database), at both the chemical structure and protein levels. We compared the compound sets at different representational stringencies using NCI/CADD Structure Identifiers. The overlap and uniqueness in chemical content can be broadly interpreted in the context of different data capture strategies. However, we recorded apparent anomalies, such as many compounds in common between the metabolite and drug databases. We also compared the content of sequences mapped to the compounds via their UniProt protein identifiers. While these results were also generally interpretable in the context of individual databases, we discerned differences in coverage and in the types of supporting data used. For example, the target concept is applied differently between DrugBank and the Therapeutic Target Database. In ChEMBL, it encompasses a broader range of mappings from chemical biology and species orthologue cross-screening in addition to drug targets per se. Our analysis should assist users not only in exploiting the synergies between these four high-value resources but also in assessing the utility of other databases at the interface of chemistry and biology.
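The overlap and uniqueness comparison described in the abstract can be illustrated with plain set operations. The following is a minimal sketch, not the authors' workflow: it assumes each database has already been reduced to a set of structure identifiers at one chosen stringency, and the function names and the identifier strings are hypothetical placeholders invented for illustration.

```python
from itertools import combinations


def pairwise_overlap(sources):
    """Return {(a, b): number of identifiers shared by sources a and b}."""
    return {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }


def unique_counts(sources):
    """Return {name: number of identifiers found in that source only}."""
    result = {}
    for name, ids in sources.items():
        # Pool every identifier that appears in any other source.
        others = set().union(*(v for k, v in sources.items() if k != name))
        result[name] = len(ids - others)
    return result


# Placeholder identifiers, invented for illustration only;
# they are not real NCI/CADD Structure Identifiers.
example = {
    "ChEMBL": {"id-A", "id-B", "id-C"},
    "DrugBank": {"id-B", "id-C", "id-D"},
    "HMDB": {"id-C", "id-E"},
    "TTD": {"id-B", "id-F"},
}

overlaps = pairwise_overlap(example)
uniques = unique_counts(example)
```

The same two functions would apply unchanged to sets of UniProt accessions for a protein-level comparison, since they only rely on identifiers being comparable strings.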

Corresponding author

Correspondence to Sorel Muresan.

Copyright information

© 2012 Springer Science+Business Media New York

About this protocol

Cite this protocol

Muresan, S., Sitzmann, M., Southan, C. (2012). Mapping Between Databases of Compounds and Protein Targets. In: Larson, R. (eds) Bioinformatics and Drug Discovery. Methods in Molecular Biology, vol 910. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-965-5_8

  • DOI: https://doi.org/10.1007/978-1-61779-965-5_8

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-964-8

  • Online ISBN: 978-1-61779-965-5

  • eBook Packages: Springer Protocols
