Mapping Between Databases of Compounds and Protein Targets

Muresan, Sorel; Sitzmann, Markus; Southan, Christopher

doi:10.1007/978-1-61779-965-5_8

Protocol
First Online: 01 January 2012

Part of the book series: Methods in Molecular Biology (MIMB, volume 910)

Abstract

Databases that link bioactive compounds to their protein targets are increasingly important in drug discovery and chemical biology. They join the expanding universes of cheminformatics, via chemical structures, and bioinformatics, via sequences. However, it is difficult to assess the relative utility of databases without an explicit comparison of their content. We exemplify an approach to this by comparing four resources, each with a different focus on bioactive chemistry (ChEMBL, DrugBank, Human Metabolome Database, and Therapeutic Target Database), at both the chemical structure and protein levels. We compared the compound sets at different representational stringencies using NCI/CADD Structure Identifiers. The overlap and uniqueness in chemical content can be broadly interpreted in the context of different data capture strategies. However, we recorded apparent anomalies, such as many compounds in common between the metabolite and drug databases. We also compared the sequences mapped to the compounds via their UniProt protein identifiers. While these results were also generally interpretable in the context of the individual databases, we discerned differences in coverage and in the types of supporting data used. For example, the target concept is applied differently in DrugBank and the Therapeutic Target Database. In ChEMBL it encompasses a broader range of mappings, from chemical biology and species orthologue cross-screening, in addition to drug targets per se. Our analysis should assist users not only in exploiting the synergies between these four high-value resources but also in assessing the utility of other databases at the interface of chemistry and biology.
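The overlap-and-uniqueness comparison described in the abstract can be illustrated with plain set operations. The following is a minimal sketch, not the authors' pipeline: the function name `compare_sets` is hypothetical, and the identifiers are made-up placeholders rather than real structure identifiers or UniProt accessions. It assumes each database has already been reduced to a set of comparable identifier strings.

```python
from itertools import combinations


def compare_sets(sources):
    """Summarise overlap between named identifier sets.

    `sources` maps a database name to a set of identifier strings.
    Returns (pairwise overlap counts, per-database unique counts,
    count of identifiers shared by all databases).
    """
    # Number of identifiers shared by each pair of databases.
    pairwise = {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }
    # Identifiers found in one database and in none of the others.
    unique = {}
    for name, ids in sources.items():
        others = set().union(*(s for n, s in sources.items() if n != name))
        unique[name] = len(ids - others)
    # Identifiers present in every database.
    common = len(set.intersection(*sources.values()))
    return pairwise, unique, common


# Placeholder identifiers only (assumption): these are not real data.
toy = {
    "ChEMBL": {"id1", "id2", "id3", "id4"},
    "DrugBank": {"id2", "id3", "id5"},
    "HMDB": {"id3", "id6"},
    "TTD": {"id3", "id4", "id7"},
}

pairwise, unique, common = compare_sets(toy)
for pair, count in pairwise.items():
    print(pair, count)
print("unique:", unique)
print("common to all:", common)
```

The same function could be run once per identifier type, so that counts obtained at a strict and at a looser level of structure representation can be set side by side, or on sets of UniProt identifiers for the protein-level comparison.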

Author information

Authors and Affiliations

DECS Global Compound Sciences, Computational Chemistry, AstraZeneca R&D, Mölndal, Sweden
Sorel Muresan & Christopher Southan
Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA
Markus Sitzmann
ChrisDS Consulting, Göteborg, Sweden
Christopher Southan

Corresponding author

Correspondence to Sorel Muresan.

Editor information

Editors and Affiliations

Department of Pathology, The University of New Mexico, Camino de Salud 915, Albuquerque, 87131, New Mexico, USA
Richard S. Larson

About this protocol

Cite this protocol

Muresan, S., Sitzmann, M., Southan, C. (2012). Mapping Between Databases of Compounds and Protein Targets. In: Larson, R. (eds) Bioinformatics and Drug Discovery. Methods in Molecular Biology, vol 910. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-965-5_8

DOI: https://doi.org/10.1007/978-1-61779-965-5_8
Published: 18 June 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-964-8
Online ISBN: 978-1-61779-965-5
eBook Packages: Springer Protocols
