ESWC 2016: The Semantic Web. Latest Advances and New Domains, pp. 135–150
Fast Approximate A-Box Consistency Checking Using Machine Learning
Abstract
Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results are required in some use cases, for many others a sensible trade-off between computational effort and correctness of results is more appropriate. In this paper, we show that it is possible to approximate a central task in reasoning, namely A-box consistency checking, by training a machine learning model that approximates the behavior of a reasoner for a specific ontology. On four different datasets, we show that such learned models consistently achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 min, compared to 18 days required by a state-of-the-art ontology reasoner.
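The general idea of learning a small decision tree that imitates a reasoner's consistency verdicts can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy type names, the two disjointness axioms, the binary type-membership features, the hand-written `toy_reasoner` standing in for a real OWL reasoner, and the simple Gini-based tree learner. The depth limit of 4 keeps the tree at 15 inner nodes or fewer, in the spirit of the small trees mentioned in the abstract.

```python
import itertools
import random
from collections import Counter

# Hypothetical toy schema: type names and disjointness axioms are assumptions.
TYPES = ["Person", "Place", "Organization", "Event", "Agent"]
DISJOINT = [("Person", "Place"), ("Place", "Organization")]

def toy_reasoner(types):
    """Stand-in for an exact reasoner: True if no disjoint pair is asserted."""
    return not any(a in types and b in types for a, b in DISJOINT)

def featurize(types):
    """Binary feature vector with one flag per known type."""
    return [int(t in types) for t in TYPES]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=4):
    """Greedy binary tree on 0/1 features; a leaf is the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best_score, best_f = None, None
    for f in range(len(rows[0])):
        left = [l for r, l in zip(rows, labels) if r[f] == 0]
        right = [l for r, l in zip(rows, labels) if r[f] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best_score is None or score < best_score:
            best_score, best_f = score, f
    if best_f is None:
        return majority

    def branch(value):
        keep = [i for i, r in enumerate(rows) if r[best_f] == value]
        return build_tree([rows[i] for i in keep],
                          [labels[i] for i in keep], depth + 1, max_depth)

    # Inner node: (feature index, subtree for 0, subtree for 1)
    return (best_f, branch(0), branch(1))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if row[f] else left
    return tree

def count_inner(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_inner(tree[1]) + count_inner(tree[2])

# Label a random training sample with the (slow, exact) reasoner, then train.
rng = random.Random(0)
train = [{t for t in TYPES if rng.random() < 0.5} for _ in range(200)]
tree = build_tree([featurize(a) for a in train],
                  [toy_reasoner(a) for a in train])

# Compare the tree with the reasoner on every possible type combination.
all_aboxes = [{t for t, bit in zip(TYPES, bits) if bit}
              for bits in itertools.product([0, 1], repeat=len(TYPES))]
hits = sum(predict(tree, featurize(a)) == toy_reasoner(a) for a in all_aboxes)
accuracy = hits / len(all_aboxes)
print(f"inner nodes: {count_inner(tree)}, agreement with reasoner: {accuracy:.2%}")
```

In this sketch the reasoner is only called to label the training sample; afterwards each prediction is a handful of feature lookups, which is where the runtime saving of such an approximation would come from. The price is that the tree's verdicts are no longer guaranteed to be sound or complete.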
Keywords
Approximate ontology reasoning, Machine learning
Acknowledgements
The authors would like to thank Aldo Gangemi for providing the DOLCE mappings for DBpedia and YAGO, and Robert Meusel for his assistance in providing suitable samples from the WebDataCommons corpora. This work has been supported by RapidMiner in the course of the RapidMiner Academia program.