Scientific Lenses to Support Multiple Views over Linked Chemistry Data

  • Colin Batchelor
  • Christian Y. A. Brenninkmeijer
  • Christine Chichester
  • Mark Davies
  • Daniela Digles
  • Ian Dunlop
  • Chris T. Evelo
  • Anna Gaulton
  • Carole Goble
  • Alasdair J. G. Gray
  • Paul Groth
  • Lee Harland
  • Karen Karapetyan
  • Antonis Loizou
  • John P. Overington
  • Steve Pettifer
  • Jon Steele
  • Robert Stevens
  • Valery Tkachenko
  • Andra Waagmeester
  • Antony Williams
  • Egon L. Willighagen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8796)

Abstract

When are two entries about a small molecule in different datasets the same? When they have the same drug name, the same chemical structure, or satisfy some other criterion? The choice depends on the application to which the data will be put. However, existing Linked Data approaches provide a single global view over the data, with no way of varying the notion of equivalence to be applied.

In this paper, we present an approach that enables applications to choose the equivalence criteria applied between datasets, thus supporting multiple dynamic views over the Linked Data. For chemical data, we show that multiple sets of links can be generated automatically according to different equivalence criteria and published with semantic descriptions that capture their context and interpretation. This approach has been applied within a large-scale public-private data integration platform for drug discovery. To cater for different use cases, the platform allows the application of different lenses, which vary the equivalence rules applied based on the context and interpretation of the links.
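The following is a minimal Python sketch that illustrates the lens idea. It is not the platform's implementation. It assumes that each link carries a single justification label naming the equivalence criterion that produced it, and that a lens is simply the set of justifications it accepts. The justification names (`same_structure`, `same_parent_compound`, `same_drug_name`), the lens names and the identifiers are all hypothetical. The sketch also assumes that equivalence is symmetric and transitive within a lens, so the identifiers equivalent to a starting identifier form its connected component over the accepted links.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One mapping between identifiers in two datasets (hypothetical model)."""
    source: str
    target: str
    justification: str  # equivalence criterion under which the link was generated


# Hypothetical lenses: each names the link justifications it accepts.
LENSES = {
    "strict": {"same_structure"},
    "parent": {"same_structure", "same_parent_compound"},
    "name": {"same_structure", "same_parent_compound", "same_drug_name"},
}


def equivalent_ids(links, lens, start):
    """Return all identifiers reachable from `start` using only links whose
    justification the lens accepts; links are treated as undirected, so the
    result is the connected component containing `start`."""
    accepted = LENSES[lens]
    graph = defaultdict(set)
    for link in links:
        if link.justification in accepted:
            graph[link.source].add(link.target)
            graph[link.target].add(link.source)

    # Depth-first traversal over the accepted links only.
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


if __name__ == "__main__":
    # Hypothetical identifiers and links, for illustration only.
    example_links = [
        Link("chembl:X1", "chemspider:Y1", "same_structure"),
        Link("chemspider:Y1", "drugbank:Z1", "same_parent_compound"),
        Link("drugbank:Z1", "drugbank:Z2", "same_drug_name"),
    ]
    print(sorted(equivalent_ids(example_links, "strict", "chembl:X1")))
    # prints ['chembl:X1', 'chemspider:Y1']
```

Under these assumptions, switching from the `strict` lens to the `name` lens widens the set of identifiers treated as the same compound without regenerating any links. Only the filter applied to the existing link sets changes.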



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Colin Batchelor (1)
  • Christian Y. A. Brenninkmeijer (2)
  • Christine Chichester (3)
  • Mark Davies (4)
  • Daniela Digles (5)
  • Ian Dunlop (2)
  • Chris T. Evelo (6)
  • Anna Gaulton (4)
  • Carole Goble (2)
  • Alasdair J. G. Gray (7)
  • Paul Groth (8)
  • Lee Harland (9)
  • Karen Karapetyan (1)
  • Antonis Loizou (8)
  • John P. Overington (4)
  • Steve Pettifer (2)
  • Jon Steele (1)
  • Robert Stevens (2)
  • Valery Tkachenko (1)
  • Andra Waagmeester (6)
  • Antony Williams (1)
  • Egon L. Willighagen (6)

  1. Royal Society of Chemistry, UK
  2. School of Computer Science, University of Manchester, Manchester, UK
  3. Swiss Institute for Bioinformatics, Switzerland
  4. European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  5. Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
  6. Maastricht University, Maastricht, The Netherlands
  7. Heriot-Watt University, Edinburgh, UK
  8. VU University of Amsterdam, The Netherlands
  9. Connected Discovery, UK
