Incorporating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery

  • Carole Goble
  • Alasdair J. G. Gray
  • Lee Harland
  • Karen Karapetyan
  • Antonis Loizou
  • Ivan Mikhailov
  • Yrjänä Rankka
  • Stefan Senger
  • Valery Tkachenko
  • Antony J. Williams
  • Egon L. Willighagen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8219)

Abstract

The Open PHACTS Discovery Platform aims to provide an integrated information space to advance pharmacological research in the area of drug discovery. Effective drug discovery requires comprehensive data coverage, i.e. integrating all available sources of pharmacology data. While many relevant data sources are available on the linked open data cloud, their content needs to be combined with that of commercial datasets, and the licensing of those commercial datasets must be respected when providing access to the data. Additionally, pharmaceutical companies have built up extensive private data collections of their own, which they require to be included in their pharmacological dataspace. In this paper we discuss the challenges of incorporating private and commercial data into a linked dataspace, focusing on the modelling of these datasets and their interlinking. We also present the graph-based access control mechanism that ensures commercial and private datasets are available only to authorized users.
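
As a rough illustration of what graph-based access control can mean in a linked dataspace, the sketch below assumes that each dataset is stored in its own named graph and that a query only sees the graphs a user has been granted. It is not the platform's implementation: the graph names, users, grants, and the helper functions visible_graphs and query are all hypothetical, chosen only to show the idea.

```python
from typing import Dict, List, NamedTuple, Set


class Quad(NamedTuple):
    """An RDF-style statement plus the named graph it is stored in."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Hypothetical graph names: one open graph, one commercial, one private.
PUBLIC_GRAPHS: Set[str] = {"graph:open"}

# Hypothetical per-user grants to restricted graphs.
USER_GRANTS: Dict[str, Set[str]] = {
    "alice": {"graph:commercial"},
    "carol": {"graph:commercial", "graph:private"},
}

STORE: List[Quad] = [
    Quad("ex:compound1", "ex:activity", "5.2", "graph:open"),
    Quad("ex:compound1", "ex:activity", "6.1", "graph:commercial"),
    Quad("ex:compound1", "ex:activity", "7.4", "graph:private"),
]


def visible_graphs(user: str) -> Set[str]:
    """Graphs a user may read: the open graphs plus any explicit grants."""
    return PUBLIC_GRAPHS | USER_GRANTS.get(user, set())


def query(store: List[Quad], user: str, predicate: str) -> List[Quad]:
    """Return matching statements, restricted to the user's visible graphs."""
    allowed = visible_graphs(user)
    return [q for q in store if q.graph in allowed and q.predicate == predicate]
```

Under these assumptions, an unknown user sees only the statement in the open graph, while a user with more grants sees more results for the same query. Filtering at the graph level keeps the access decision independent of how the individual datasets are modelled.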

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Carole Goble (1)
  • Alasdair J. G. Gray (1)
  • Lee Harland (2)
  • Karen Karapetyan (3)
  • Antonis Loizou (4)
  • Ivan Mikhailov (5)
  • Yrjänä Rankka (5)
  • Stefan Senger (6)
  • Valery Tkachenko (3)
  • Antony J. Williams (3)
  • Egon L. Willighagen (7)

  1. School of Computer Science, University of Manchester, UK
  2. Connected Discovery, UK
  3. Royal Society of Chemistry, UK
  4. Department of Computer Science, VU University of Amsterdam, The Netherlands
  5. OpenLink Software, UK
  6. GlaxoSmithKline, UK
  7. Department of Bioinformatics - BiGCaT, Maastricht University, The Netherlands
