Abstract
The Semantic Web (SW) is characterized by the availability of a vast amount of semantically annotated data collections. Annotations are provided by exploiting ontologies that act as shared vocabularies. Additionally, ontologies are endowed with deductive reasoning capabilities, which make it possible to derive explicitly the knowledge that is only formalized implicitly. Over the years, a large number of data collections have been developed and interconnected, as testified by the Linked Open Data Cloud. Currently, notable examples are the numerous Knowledge Graphs (KGs) that have been built, either as enterprise KGs or as freely available open KGs. All of them are characterized by very large data volumes, but also by incompleteness and noise. These characteristics have made deductive reasoning services less feasible in practice, opening the way to alternative solutions, grounded in Machine Learning (ML), for mining knowledge from the vast amount of information available. Indeed, ML methods have been exploited in the SW to solve several problems, such as link and type prediction, ontology enrichment and completion (at both the terminological and the assertional level), and concept learning. Whilst symbol-based solutions were mostly targeted initially, numeric-based approaches have recently been receiving major attention because of the need to scale to very large data volumes. Nevertheless, data collections in the SW have peculiarities that can hardly be found in other fields. As such, the application of ML methods to the targeted problems is not straightforward. This paper extends [20] by surveying the most representative symbol-based and numeric-based solutions and the related problems, with a special focus on the main issues that need to be considered and solved when ML methods are adopted in the SW field, and by analyzing the main peculiarities and drawbacks of each solution.
Notes
- 1.
- 2.
The induced knowledge should be validated by ontology engineers for the possible further enrichment of ontologies.
- 3.
- 4.
TransR tackles some weak points in TransE, such as the difficulty of modeling specific types of relationships [3].
- 5.
Facilities available in the Apache Jena framework were used: https://jena.apache.org.
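The remark in note 4 about TransE and TransR can be illustrated with a minimal sketch, assuming the standard translation-based formulations: TransE measures the distance between h + r and t directly in entity space, while TransR first projects both entities into a relation-specific space through a matrix M_r. The vectors, the matrix, and the function names below are made up for illustration and are not taken from the chapter. The toy case is a one-to-many relation, where one head must be linked to two different tails.

```python
import math

def transe_energy(h, r, t):
    """TransE energy: distance between h + r and t (lower means more plausible)."""
    translated = [a + b for a, b in zip(h, r)]
    return math.dist(translated, t)

def project(M, v):
    """Multiply matrix M (given as a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transr_energy(h, r, t, M_r):
    """TransR energy: the same translation, applied after projecting with M_r."""
    return transe_energy(project(M_r, h), r, project(M_r, t))

# Hypothetical one-to-many relation: one head entity, two distinct tails.
h = [1.0, 0.0, 0.0]
t1 = [1.0, 1.0, 0.0]
t2 = [1.0, 1.0, 5.0]

# TransE: choose r so that (h, r, t1) fits exactly; (h, r, t2) then cannot fit,
# because a single translation in entity space reaches only one point.
r_e = [b - a for a, b in zip(h, t1)]
transe_gap = abs(transe_energy(h, r_e, t1) - transe_energy(h, r_e, t2))

# TransR: this M_r drops the third coordinate, so both tails coincide in the
# relation space and one translation fits both triples.
M_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
r_r = [b - a for a, b in zip(project(M_r, h), project(M_r, t1))]
transr_gap = abs(transr_energy(h, r_r, t1, M_r) - transr_energy(h, r_r, t2, M_r))

print(transe_gap, transr_gap)
```

In this toy setting the two tails remain distinct entities, yet the relation-specific projection lets them be scored identically for the relation at hand, which a single shared embedding space does not allow.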
References
Abboud, R., Ceylan, İ.İ., Lukasiewicz, T., Salvatori, T.: BoxE: a box embedding model for knowledge base completion. In: Proceedings of NeurIPS 2020 (2020)
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Buneman, P., et al. (eds.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207–216. ACM Press (1993). https://doi.org/10.1145/170035.170072
Arnaout, H., Razniewski, S., Weikum, G.: Enriching knowledge bases with interesting negative statements. In: Das, D., et al. (eds.) Proceedings of AKBC 2020 (2020). https://doi.org/10.24432/C5101K
Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): Description Logic Handbook, 2nd edn. Cambridge University Press, Cambridge (2010). https://doi.org/10.1017/CBO9780511711787
Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Sci. Am. 284(5), 34–43 (2001)
Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Sem. Web Inf. Syst. 5(3), 1–22 (2009). https://doi.org/10.4018/jswis.2009081901
Blockeel, H., De Raedt, L.: Top-down induction of first-order logical decision trees. Artif. Intell. 101(1–2), 285–297 (1998). https://doi.org/10.1016/S0004-3702(98)00034-4
Bloehdorn, S., Sure, Y.: Kernel methods for mining instance data in ontologies. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 58–71. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_5
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Burges, C.J.C., et al. (eds.) Proceedings of NIPS 2013, pp. 2787–2795. Curran Associates, Inc. (2013)
Cai, H., Zheng, V.W., Chang, K.: A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Trans. Knowl. Data Eng. 30(09), 1616–1637 (2018). https://doi.org/10.1109/TKDE.2018.2807452
Carbonneau, M., Cheplygina, V., Granger, E., Gagnon, G.: Multiple instance learning. Pattern Recogn. 77, 329–353 (2018). https://doi.org/10.1016/j.patcog.2017.10.009
Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-supervised Learning. The MIT Press, Cambridge (2006). https://doi.org/10.7551/mitpress/9780262033589.001.0001
Chen, J., Lécué, F., Pan, J., Horrocks, I., Chen, H.: Knowledge-based transfer learning explanation. In: Thielscher, M., et al. (eds.) Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR 2018, pp. 349–358. AAAI Press (2018)
Cochez, M., Ristoski, P., Ponzetto, S.P., Paulheim, H.: Global RDF vector space embeddings. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 190–207. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_12
d’Amato, C.: Logic and learning: Can we provide explanations in the current knowledge lake? In: Bonatti, P., et al. (eds.) Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371), Dagstuhl Reports, vol. 8, pp. 37–38. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2019). https://doi.org/10.4230/DagRep.8.9.29
d’Amato, C., Fanizzi, N., Esposito, F.: Query answering and ontology population: an inductive approach. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 288–302. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68234-9_23
d’Amato, C., Fanizzi, N., Esposito, F.: Inductive learning for the semantic web: what does it buy? Semant. Web 1(1–2), 53–59 (2010). https://doi.org/10.3233/SW-2010-0007
d’Amato, C., Quatraro, N.F., Fanizzi, N.: Injecting background knowledge into embedding models for predictive tasks on knowledge graphs. In: Verborgh, R., et al. (eds.) ESWC 2021. LNCS, vol. 12731, pp. 441–457. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77385-4_26
d’Amato, C., Tettamanzi, A.G.B., Minh, T.D.: Evolutionary discovery of multi-relational association rules from ontological knowledge bases. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds.) EKAW 2016. LNCS (LNAI), vol. 10024, pp. 113–128. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49004-5_8
d’Amato, C.: Machine learning for the semantic web: lessons learnt and next research directions. Semant. Web 11(1), 195–203 (2020). https://doi.org/10.3233/SW-200388
Deng, L., Yu, D. (eds.): Deep Learning: Methods and Applications. NOW Publishers, Delft (2014). https://doi.org/10.1561/2000000039
Doran, D., Schulz, S., Besold, T.: What does explainable AI really mean? A new conceptualization of perspectives. In: Besold, T.R., Kutz, O. (eds.) Proceedings of the First International Workshop on Comprehensibility and Explanation in AI and ML 2017 co-located with 16th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2017), CEUR Work. Proc., vol. 2071. CEUR-WS.org (2017)
Fanizzi, N., d’Amato, C., Esposito, F.: Conceptual clustering and its application to concept drift and novelty detection. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 318–332. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68234-9_25
Fanizzi, N., d’Amato, C., Esposito, F.: DL-FOIL concept learning in description logics. In: Železný, F., Lavrač, N. (eds.) ILP 2008. LNCS (LNAI), vol. 5194, pp. 107–121. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85928-4_12
Fanizzi, N., d’Amato, C., Esposito, F.: Metric-based stochastic conceptual clustering for ontologies. Inf. Syst. 34(8), 792–806 (2009). https://doi.org/10.1016/j.is.2009.03.008
Fanizzi, N., d’Amato, C., Esposito, F.: Induction of concepts in web ontologies through terminological decision trees. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS (LNAI), vol. 6321, pp. 442–457. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15880-3_34
Fanizzi, N., Rizzo, G., d’Amato, C.: Boosting DL concept learners. In: Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11503, pp. 68–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21348-0_5
Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.M.: AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: Schwabe, D., et al. (eds.) 22nd International World Wide Web Conference, WWW 2013, pp. 413–422. International World Wide Web Conferences Steering Committee/ACM (2013). https://doi.org/10.1145/2488388.2488425
Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.M.: Fast rule mining in ontological knowledge bases with AMIE+. The VLDB J. 24(6), 707–730 (2015). https://doi.org/10.1007/s00778-015-0394-1
d’Avila Garcez, A., et al.: Neural-symbolic learning and reasoning: contributions and challenges. In: 2015 AAAI Spring Symposia. AAAI Press (2015). http://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10281
Getoor, L., Taskar, B. (eds.): Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)
Guo, H., Viktor, H.L.: Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. SIGKDD Explor. 6(1), 30–39 (2004). https://doi.org/10.1145/1007730.1007736
Guo, S., Wang, Q., Wang, L., Wang, B., Guo, L.: Jointly embedding knowledge graphs and logical rules. In: Proceedings of EMNLP 2016, pp. 192–202. ACL (2016). https://doi.org/10.18653/v1/D16-1019
Hitzler, P., Bianchi, F., Ebrahimi, M., Sarker, M.K.: Neural-symbolic integration and the semantic web. Semant. Web J. 11(1), 3–11 (2020). https://doi.org/10.3233/SW-190368
Hoekstra, R.: The knowledge reengineering bottleneck. Semant. Web J. 1, 111–115 (2010). https://doi.org/10.3233/SW-2010-0004
Hogan, A., et al.: Knowledge graphs. ACM Comput. Surv. 54, 1–37 (2021). https://doi.org/10.1145/3447772
Horrocks, I., Patel-Schneider, P., Boley, H., Tabet, S., Grosof, B., Dean, M.: SWRL: a semantic web rule language combining OWL and RuleML (2004). http://www.w3.org/Submission/SWRL/
Jayathilaka, M., Mu, T., Sattler, U.: Visual-semantic embedding model informed by structured knowledge. In: Rudolph, S., Marreiros, G. (eds.) Proceedings of STAIRS 2020. CEUR, vol. 2655. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2655/paper23.pdf
Ji, S., Pan, S., Cambria, E., Marttinen, P., Yu, P.S.: A survey on knowledge graphs: representation, acquisition and applications (2020). arXiv:2002.00388
Józefowska, J., Lawrynowicz, A., Lukaszewski, T.: The role of semantics in mining frequent patterns from knowledge bases in description logics with rules. TPLP 10(3), 251–289 (2010). https://doi.org/10.1017/S1471068410000098
Kazemi, S., Poole, D.: Simple embedding for link prediction in knowledge graphs. In: Bengio, S., et al. (eds.) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, pp. 4289–4300. ACM (2018)
Koller, D., Friedman, N. (eds.): Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)
Kramer, S., Lavrač, N., Flach, P.: Propositionalization approaches to relational data mining. In: Džeroski, S., Lavrač, N. (eds.) Relational Data Mining, pp. 262–291. LNCS, Springer (2001). https://doi.org/10.1007/978-3-662-04599-2_11
Labaf, M., Hitzler, P., Evans, A.: Propositional rule extraction from neural networks under background knowledge. In: Besold, T.R., et al. (eds.) Proceedings of the Twelfth International Workshop on Neural-Symbolic Learning and Reasoning, NeSy 2017. CEUR Workshop Proceedings, vol. 2003. CEUR-WS.org (2017)
Lehmann, J., Auer, S., Bühmann, L., Tramp, S.: Class expression learning for ontology engineering. J. Web Semant. 9(1), 71–81 (2011). https://doi.org/10.1016/j.websem.2011.01.001
Lehmann, J., Bühmann, L.: ORE - a tool for repairing and enriching knowledge bases. In: Patel-Schneider, P.F., et al. (eds.) ISWC 2010. LNCS, vol. 6497, pp. 177–193. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17749-1_12
Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI 2015 Proceedings, pp. 2181–2187. AAAI Press (2015)
Liu, X., Wu, J., Zhou, Z.: Exploratory under-sampling for class-imbalance learning. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), pp. 965–969. IEEE Computer Society (2006). https://doi.org/10.1109/ICDM.2006.68
Luger, G.F. (ed.): Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison Wesley, Boston (2005)
Melo, A., Völker, J., Paulheim, H.: Type prediction in noisy RDF knowledge bases using hierarchical multilabel classification with graph and latent features. Int. J. Artif. Intell. Tools 26(2), 1–32 (2017). https://doi.org/10.1142/S0218213017600119
Minervini, P., d’Amato, C., Fanizzi, N.: Efficient energy-based embedding models for link prediction in knowledge graphs. J. Intell. Inf. Syst. 47(1), 91–109 (2016). https://doi.org/10.1007/s10844-016-0414-7
Minervini, P., Tresp, V., d’Amato, C., Fanizzi, N.: Adaptive knowledge propagation in web ontologies. TWEB 12(1), 2:1–2:28 (2018). https://doi.org/10.1145/3105961
Minervini, P., Costabello, L., Muñoz, E., Nováček, V., Vandenbussche, P.-Y.: Regularizing knowledge graph embeddings via equivalence and inversion axioms. In: Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Džeroski, S. (eds.) ECML PKDD 2017. LNCS (LNAI), vol. 10534, pp. 668–683. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71249-9_40
Minervini, P., Demeester, T., Rocktäschel, T., Riedel, S.: Adversarial sets for regularising neural link predictors. In: Elidan, G., et al. (eds.) UAI 2017 Proceedings. AUAI Press (2017)
Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E.: A review of relational machine learning for knowledge graphs. Proc. IEEE 104(1), 11–33 (2016). https://doi.org/10.1109/JPROC.2015.2483592
Nickel, M., Tresp, V., Kriegel, H.: A three-way model for collective learning on multi-relational data. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 809–816. Omnipress (2011). https://icml.cc/2011/papers/438_icmlpaper.pdf
Paulheim, H.: Make embeddings semantic again! In: Proceedings of the ISWC 2018 P&D-Industry-BlueSky Tracks. CEUR Workshop Proceedings (2018)
Raedt, L.D. (ed.): Logical and Relational Learning: From ILP to MRDM (Cognitive Technologies). Springer-Verlag, Berlin (2008)
Rettinger, A., Lösch, U., Tresp, V., d’Amato, C., Fanizzi, N.: Mining the semantic web - statistical learning for next generation knowledge bases. Data Mining Knowl. Disc. 24(3), 613–662 (2012). https://doi.org/10.1007/s10618-012-0253-2
Rettinger, A., Nickles, M., Tresp, V.: Statistical relational learning with formal ontologies. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS (LNAI), vol. 5782, pp. 286–301. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04174-7_19
Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30
Rizzo, G., d’Amato, C., Fanizzi, N., Esposito, F.: Terminological cluster trees for disjointness axiom discovery. In: Blomqvist, E., et al. (eds.) ESWC 2017. LNCS, vol. 10249, pp. 184–201. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58068-5_12
Rizzo, G., Fanizzi, N., d’Amato, C., Esposito, F.: Approximate classification with web ontologies through evidential terminological trees and forests. Int. J. Approx. Reason. 92, 340–362 (2018). https://doi.org/10.1016/j.ijar.2017.10.019
Rizzo, G., Fanizzi, N., d’Amato, C., Esposito, F.: A framework for tackling myopia in concept learning on the web of data. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds.) EKAW 2018. LNCS (LNAI), vol. 11313, pp. 338–354. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03667-6_22
Sarker, M., Xie, N., Doran, D., Raymer, M., Hitzler, P.: Explaining trained neural networks with semantic web technologies: First steps. In: Besold, T.R., et al. (eds.) Proceedings of the Twelfth International Workshop on Neural-Symbolic Learning and Reasoning, NeSy 2017. CEUR Workshop Proceedings, vol. 2003. CEUR-WS.org (2017)
Shadbolt, N., Berners-Lee, T., Hall, W.: The semantic web revisited. IEEE Intell. Syst. 21(3), 96–101 (2006). https://doi.org/10.1109/MIS.2006.62
Siorpaes, K., Hepp, M.: OntoGame: towards overcoming the incentive bottleneck in ontology building. In: Meersman, R., Tari, Z., Herrero, P. (eds.) OTM 2007. LNCS, vol. 4806, pp. 1222–1232. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76890-6_50
Spinosa, E., de Leon Ferreira de Carvalho, A.P., Gama, J.: Olindda: A cluster-based approach for detecting novelty and concept drift in data streams. In: Symposium of Applied Computing: Proceedings of the ACM International Conference, SAC 2007. vol. 1, pp. 448–452. ACM (2007)
Tiddi, I., d’Aquin, M., Motta, E.: Dedalo: looking for clusters explanations in a labyrinth of linked data. In: Presutti, V., et al. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 333–348. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07443-6_23
Tran, A.C., Dietrich, J., Guesgen, H.W., Marsland, S.: An approach to parallel class expression learning. In: Bikakis, A., Giurca, A. (eds.) RuleML 2012. LNCS, vol. 7438, pp. 302–316. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32689-9_25
Tran, A., Dietrich, J., Guesgen, H., Marsland, S.: Parallel symmetric class expression learning. J. Mach. Learn. Res. 18(64), 1–34 (2017)
Völker, J., Niepert, M.: Statistical schema induction. In: ESWC 2011. LNCS, vol. 6643, pp. 124–138. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21034-1_9
Völker, J., Fleischhacker, D., Stuckenschmidt, H.: Automatic acquisition of class disjointness. J. Web Semant. 35(P2), 124–139 (2015). https://doi.org/10.1016/j.websem.2015.07.001
West, R., Gabrilovich, E., Murphy, K., Sun, S., Gupta, R., Lin, D.: Knowledge base completion via search-based question answering. In: Chung, C., et al. (eds.) 23rd International World Wide Web Conference, WWW 2014, pp. 515–526. ACM (2014). https://doi.org/10.1145/2566486.2568032
Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23(1), 69–101 (1996)
Yang, B., Yih, W., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: Proceedings of ICLR 2015 (2015)
Zhou, Z., Zhang, M.: Multi-label learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning and Data Mining, pp. 875–881. Springer, Berlin (2017). https://doi.org/10.1007/978-1-4899-7687-1_910
Copyright information
© 2022 Springer Nature Switzerland AG
About this chapter
Cite this chapter
d’Amato, C. (2022). Mining the Semantic Web with Machine Learning: Main Issues that Need to Be Known. In: Šimkus, M., Varzinczak, I. (eds) Reasoning Web. Declarative Artificial Intelligence. Reasoning Web 2021. Lecture Notes in Computer Science, vol 13100. Springer, Cham. https://doi.org/10.1007/978-3-030-95481-9_4
DOI: https://doi.org/10.1007/978-3-030-95481-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95480-2
Online ISBN: 978-3-030-95481-9
eBook Packages: Computer Science, Computer Science (R0)