Mining Linked Open Data: A Case Study with Genes Responsible for Intellectual Disability

  • Gabin Personeni
  • Simon Daget
  • Céline Bonnet
  • Philippe Jonveaux
  • Marie-Dominique Devignes
  • Malika Smaïl-Tabbone
  • Adrien Coulet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8574)

Abstract

Linked Open Data (LOD) constitute a unique dataset: they are in a standard format, partially integrated, and facilitate connections with domain knowledge represented in semantic web ontologies. The increasing amount of biomedical data published as LOD consequently offers novel opportunities for knowledge discovery in biomedicine. However, most data mining methods are adapted neither to the LOD format nor to the use of domain knowledge. In this paper, we propose an approach for selecting, integrating, and mining LOD with the goal of discovering genes responsible for a disease. The selection step relies on a set of choices made by a domain expert to isolate relevant pieces of LOD. Because these pieces are potentially not linked to one another, an integration step is required to connect them. The resulting graph is then mined using Inductive Logic Programming (ILP), which offers two main advantages. First, the input format of ILP is close to the format of LOD. Second, domain knowledge can be added to this input and taken into account by ILP. We have implemented this approach and applied it to the characterization of genes responsible for intellectual disability. On the basis of this real-world use case, we evaluate our mining approach and discuss its advantages and drawbacks for mining biomedical LOD.
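The abstract states that the input format of ILP is close to the format of LOD. As a minimal sketch of why, assume the selected triples have already been retrieved (for instance by SPARQL queries) as (subject, predicate, object) IRI strings: each triple can then be rewritten as a binary Prolog fact `predicate(subject, object).`, the kind of ground background knowledge an ILP system consumes. The naming scheme, the `example.org` IRIs, and the `ancestor/2` rule standing in for ontology-based domain knowledge are hypothetical illustrations, not the authors' actual encoding.

```python
import re

def local_name(iri: str) -> str:
    """Return the part of an IRI after its last '/' or '#' (IRIs only, not literals)."""
    return re.split(r"[/#]", iri.rstrip("/#"))[-1]

def predicate_name(iri: str) -> str:
    """Build an unquoted Prolog predicate name from lowercase letters, digits and '_'."""
    name = re.sub(r"[^a-z0-9_]", "_", local_name(iri).lower())
    # A Prolog atom must start with a lowercase letter; prefix otherwise.
    return name if name[:1].isalpha() else "p_" + name

def constant(term: str) -> str:
    """Quote a constant so names like GENE1 are not parsed as Prolog variables."""
    escaped = local_name(term).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def triples_to_facts(triples):
    """Map each RDF triple (s, p, o) to the ground Prolog fact p(s, o)."""
    return [f"{predicate_name(p)}({constant(s)}, {constant(o)})." for s, p, o in triples]

# Hypothetical domain knowledge: reflexive-transitive closure over subClassOf
# facts, so a learned rule can mention a general class and still cover
# entities annotated with more specific subclasses.
ANCESTOR_RULES = [
    "ancestor(C, C).",
    "ancestor(C, A) :- subclassof(C, P), ancestor(P, A).",
]

# Hypothetical example triples (example.org IRIs, not real identifiers).
triples = [
    ("http://example.org/gene/GENE1",
     "http://example.org/vocab#hasGOAnnotation",
     "http://example.org/go/GO_A"),
    ("http://example.org/go/GO_A",
     "http://www.w3.org/2000/01/rdf-schema#subClassOf",
     "http://example.org/go/GO_B"),
]

if __name__ == "__main__":
    for line in triples_to_facts(triples) + ANCESTOR_RULES:
        print(line)
```

Quoting constants matters because Prolog reads an unquoted capitalised term such as `GENE1` as a variable, which would silently turn a ground fact into a universally quantified one.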

Keywords

Intellectual Disability, Inductive Logic Programming, SPARQL Query, Linked Open Data
These keywords were added by machine and not by the authors.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gabin Personeni (1)
  • Simon Daget (1)
  • Céline Bonnet (2, 3)
  • Philippe Jonveaux (2, 3)
  • Marie-Dominique Devignes (1)
  • Malika Smaïl-Tabbone (1)
  • Adrien Coulet (1)
  1. LORIA (CNRS, Inria NGE, Université de Lorraine), Vandoeuvre-lès-Nancy, France
  2. Laboratoire de Génétique Médicale, Centre Hospitalier Universitaire de Nancy, Vandoeuvre-lès-Nancy, France
  3. INSERM U-954, Université de Lorraine, Vandoeuvre-lès-Nancy, France