Querying the Web with Statistical Machine Learning

  • Volker Tresp
  • Yi Huang
  • Maximilian Nickel
Part of the Cognitive Technologies book series (COGTECH)


The traditional means of extracting information from the Web are keyword-based search and browsing. The Semantic Web adds structured information (i.e., semantic annotations and references) that supports both activities. One of the most interesting recent developments is Linked Open Data (LOD), where information is presented in the form of facts, often originating from published domain-specific databases, that can be accessed by both humans and machines via specific query endpoints. In this article, we argue that machine learning provides a new way to query web data, in particular LOD, by analyzing and exploiting statistical regularities. We discuss the challenges of applying machine learning to the Web and present the particular learning approaches we have been pursuing in THESEUS. We then describe a number of applications in which the Web is queried via machine learning, along with several extensions to our approaches.


Keywords (added by machine, not by the authors): Machine Learning; Disease Gene; Link Prediction; Deductive Reasoning; Link Open Data



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Siemens AG, Munich, Germany
  2. Ludwig Maximilian University Munich, Munich, Germany
