Fast Approximate A-Box Consistency Checking Using Machine Learning

  • Heiko Paulheim
  • Heiner Stuckenschmidt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


Ontology reasoning is typically a computationally intensive operation. While sound and complete results are required in some use cases, for many others a sensible trade-off between computational effort and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central reasoning task, namely A-box consistency checking, by training a machine learning model that approximates the behavior of a reasoner for a specific ontology. On four different datasets, we show that such learned models consistently achieve an accuracy above 95 % at less than 2 % of the reasoner's runtime, using a decision tree with no more than 20 inner nodes. For example, this makes it possible to validate 293M Microdata documents against the ontology in less than 90 min, compared to the 18 days required by a state-of-the-art ontology reasoner.
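The abstract states the idea only at a high level: consistency labels produced by a reasoner on sample A-boxes are used to train a small decision tree, which then stands in for the reasoner. The following is a minimal, self-contained Python sketch of that idea under explicitly assumed details, not the authors' implementation. The class names and disjointness axioms form a hypothetical toy T-box, `reference_reasoner` is a simple disjointness check standing in for a full ontology reasoner, the one-binary-feature-per-class encoding and the ID3-style learner are illustrative choices, `sample_abox` is a hypothetical data generator, and `max_depth` caps model size in place of the paper's limit on inner nodes.

```python
import math
import random
from itertools import combinations, product

# HYPOTHETICAL toy T-box: class names plus pairwise disjointness axioms.
CLASSES = ["Person", "Place", "Organisation", "Event", "Work"]
DISJOINT = {("Person", "Place"), ("Person", "Organisation"), ("Place", "Event")}


def reference_reasoner(types):
    """Stand-in for the expensive exact reasoner: a one-resource A-box is
    inconsistent iff it asserts two classes declared disjoint."""
    return not any((a, b) in DISJOINT or (b, a) in DISJOINT
                   for a, b in combinations(types, 2))


def featurize(types):
    """Assumed encoding: one binary feature per class (1 = type asserted)."""
    return tuple(int(c in types) for c in CLASSES)


def entropy(labels):
    """Binary entropy of a list of boolean labels (0.0 for empty or pure)."""
    n = len(labels)
    p = sum(labels) / n if n else 0.0
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def majority(labels):
    """Most frequent label; ties resolve to 'consistent' (True)."""
    return 2 * sum(labels) >= len(labels)


def build_tree(rows, labels, features, max_depth):
    """Greedy ID3-style learner on binary features. A leaf is a bool; an inner
    node is (feature_index, {0: subtree, 1: subtree})."""
    if max_depth == 0 or not features or entropy(labels) == 0.0:
        return majority(labels)

    def split_entropy(f):
        # Weighted entropy of the two partitions induced by feature f.
        parts = [[lab for r, lab in zip(rows, labels) if r[f] == v] for v in (0, 1)]
        return sum(len(part) * entropy(part) for part in parts) / len(labels)

    best = min(features, key=split_entropy)
    rest = [f for f in features if f != best]
    branches = {}
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        if idx:
            branches[v] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest, max_depth - 1)
        else:  # unseen branch: fall back to the parent's majority label
            branches[v] = majority(labels)
    return (best, branches)


def predict(tree, row):
    """Walk from the root to a leaf; the leaf is the predicted consistency."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


def inner_nodes(tree):
    """Number of decision (inner) nodes, used here as the model-size measure."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + sum(inner_nodes(t) for t in tree[1].values())


def sample_abox(rng, p=0.3):
    """HYPOTHETICAL data generator: one resource with random type assertions."""
    return {c for c in CLASSES if rng.random() < p}


if __name__ == "__main__":
    rng = random.Random(42)
    train = [sample_abox(rng) for _ in range(200)]
    X = [featurize(t) for t in train]
    y = [reference_reasoner(t) for t in train]  # labels come from the slow reasoner
    tree = build_tree(X, y, list(range(len(CLASSES))), max_depth=3)

    # Compare the learned model with the exact check on all 2^5 type combinations.
    space = [{c for c, bit in zip(CLASSES, bits) if bit}
             for bits in product((0, 1), repeat=len(CLASSES))]
    hits = sum(predict(tree, featurize(t)) == reference_reasoner(t) for t in space)
    print(f"inner nodes: {inner_nodes(tree)}, accuracy: {hits / len(space):.1%}")
```

Running the script trains on 200 synthetic resources and reports the tree size and its agreement with the exact check. Because these toy axioms are simple, a depth-3 tree can represent them exactly; on real ontologies the learned tree is generally only an approximation, so its accuracy has to be measured empirically, as the paper does on its four datasets.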


Keywords: Approximate ontology reasoning, Machine learning



The authors would like to thank Aldo Gangemi for providing the DOLCE mappings for DBpedia and YAGO, and Robert Meusel for his assistance in providing suitable samples from the WebDataCommons corpora. This work has been supported by RapidMiner in the course of the RapidMiner Academia program.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
