Mining Linked Open Data: A Case Study with Genes Responsible for Intellectual Disability

  • Conference paper
Data Integration in the Life Sciences (DILS 2014)

Abstract

Linked Open Data (LOD) constitute a unique dataset that is in a standard format, is partially integrated, and facilitates connections with domain knowledge represented in Semantic Web ontologies. The increasing amount of biomedical data provided as LOD consequently offers novel opportunities for knowledge discovery in biomedicine. However, most data mining methods are adapted neither to the LOD format nor to the use of domain knowledge. In this paper, we propose an approach for selecting, integrating, and mining LOD with the goal of discovering genes responsible for a disease. The selection step relies on a set of choices made by a domain expert to isolate relevant pieces of LOD. Because these pieces may not be linked to one another, an integration step is required to connect them. The resulting graph is then mined using Inductive Logic Programming (ILP), which presents two main advantages. First, the input format required by ILP is close to the format of LOD. Second, domain knowledge can be added to this input and taken into account by ILP. We have implemented this approach and applied it to the characterization of genes responsible for intellectual disability. On the basis of this real-world use case, we present an evaluation of our mining approach and discuss its advantages and drawbacks for the mining of biomedical LOD.
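
To illustrate the first advantage, the closeness between LOD triples and ILP input, the following minimal Python sketch rewrites subject-predicate-object triples as Prolog-style binary facts of the form predicate(subject, object). It is only an illustration under stated assumptions: the `ex:` identifiers, the `to_atom` and `triple_to_fact` helpers, and the naming scheme are hypothetical and are not taken from the paper, whose actual encoding may differ.

```python
import re

def to_atom(term: str) -> str:
    """Reduce a URI or prefixed name to a lowercase Prolog-style atom (hypothetical naming scheme)."""
    # Keep only the local part after the last '/', '#' or ':'.
    local = re.split(r"[/#:]", term.rstrip("/"))[-1]
    # Lowercase and replace anything that is not a digit, a-z, or underscore.
    atom = re.sub(r"[^0-9a-z_]", "_", local.lower())
    # An unquoted Prolog atom must start with a lowercase letter.
    return atom if atom[:1].isalpha() else "x_" + atom

def triple_to_fact(subject: str, predicate: str, obj: str) -> str:
    """Render one RDF-like triple as a binary fact: predicate(subject, object)."""
    return f"{to_atom(predicate)}({to_atom(subject)}, {to_atom(obj)})."

# Made-up triples for illustration only (not data from the paper).
triples = [
    ("ex:gene_A", "ex:involvedIn", "ex:process_P"),
    ("ex:gene_A", "ex:responsibleFor", "ex:disease_D"),
]

facts = [triple_to_fact(*t) for t in triples]
for fact in facts:
    print(fact)
```

Each triple maps to exactly one fact, which is why little restructuring is needed before mining. Domain knowledge, such as class hierarchies from an ontology, could be supplied as additional facts or rules alongside these.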

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Personeni, G. et al. (2014). Mining Linked Open Data: A Case Study with Genes Responsible for Intellectual Disability. In: Galhardas, H., Rahm, E. (eds) Data Integration in the Life Sciences. DILS 2014. Lecture Notes in Computer Science(), vol 8574. Springer, Cham. https://doi.org/10.1007/978-3-319-08590-6_2

  • DOI: https://doi.org/10.1007/978-3-319-08590-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08589-0

  • Online ISBN: 978-3-319-08590-6

  • eBook Packages: Computer Science, Computer Science (R0)