Abstract
The resource description framework (RDF) is a framework for describing metadata, such as the attributes of resources on the Web and the relationships between them. Existing machine learning approaches for RDF graphs fall into three categories: (i) support vector machines (SVMs) with RDF graph kernels, (ii) RDF graph embeddings, and (iii) relational graph convolutional networks. In this paper, we propose a novel feature vector, called a Skip vector, that represents the features of each resource in an RDF graph by extracting various combinations of its neighboring edges and nodes. To keep the Skip vector low-dimensional, we select the features that are important for classification tasks based on the information gain ratio of each feature. Classification can then be performed by feeding the low-dimensional Skip vector of each resource into conventional machine learning algorithms, such as SVMs, the k-nearest neighbors method, neural networks, random forests, and AdaBoost. In evaluation experiments on RDF data such as Wikidata, DBpedia, and YAGO, we compare our method with RDF graph kernels in an SVM. We also compare it with the other two approaches, RDF graph embeddings (such as RDF2Vec) and relational graph convolutional networks, on the AIFB, MUTAG, BGS, and AM benchmarks. The results show that the proposed Skip vectors represent the features of target resources in an RDF graph better than these existing methods and make conventional machine learning algorithms applicable to classification tasks on RDF data.
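The pipeline summarized above (extract structural patterns around each resource, rank them by information gain ratio, keep the top ones, and turn each resource into a binary vector) can be illustrated with a small sketch. The pattern types used here are an assumption made for illustration, not the paper's exact definition of a Skip vector: an edge label alone (the neighbor node is "skipped"), an edge label with its neighbor node, and a two-hop edge path whose middle node is skipped. The function names `pattern_features`, `gain_ratio`, `select_features`, and `to_vectors`, as well as the toy triples, are all hypothetical. The gain ratio follows the usual C4.5 definition, information gain divided by the split information (the entropy of the feature itself).

```python
import math
from collections import Counter, defaultdict


def pattern_features(triples, resource):
    """Return the set of structural patterns around `resource`.

    Hypothetical pattern types (illustrative, not the paper's definition):
      ("p", pred)          outgoing edge label, neighbor node skipped
      ("po", pred, obj)    outgoing edge label plus neighbor node
      ("pp", pred1, pred2) two-hop edge path, middle node skipped
    """
    out = defaultdict(list)
    for s, p, o in triples:
        out[s].append((p, o))
    feats = set()
    for p, o in out[resource]:
        feats.add(("p", p))
        feats.add(("po", p, o))
        for p2, _ in out[o]:
            feats.add(("pp", p, p2))
    return feats


def entropy(counts):
    """Shannon entropy (base 2) of a collection of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gain_ratio(has_feature, labels):
    """Information gain ratio of a binary feature with respect to class labels."""
    n = len(labels)
    base = entropy(Counter(labels).values())
    cond = 0.0
    for value in (True, False):
        subset = [y for f, y in zip(has_feature, labels) if f == value]
        if subset:
            cond += len(subset) / n * entropy(Counter(subset).values())
    split_info = entropy(Counter(has_feature).values())
    return (base - cond) / split_info if split_info > 0 else 0.0


def select_features(feature_sets, labels, k):
    """Keep the k patterns with the highest gain ratio (ties keep sorted order)."""
    vocab = sorted(set().union(*feature_sets))
    scored = [(gain_ratio([f in fs for fs in feature_sets], labels), f) for f in vocab]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [f for _, f in scored[:k]]


def to_vectors(feature_sets, selected):
    """Encode each resource as a 0/1 vector over the selected patterns."""
    return [[int(f in fs) for f in selected] for fs in feature_sets]


# Hypothetical toy graph: classify people as staff or students.
triples = [
    ("alice", "worksAt", "aifb"),
    ("aifb", "partOf", "kit"),
    ("bob", "worksAt", "aifb"),
    ("carol", "studiesAt", "kit"),
    ("dave", "studiesAt", "kit"),
    ("alice", "knows", "carol"),  # a pattern that does not separate the classes
]
people = ["alice", "bob", "carol", "dave"]
labels = ["staff", "staff", "student", "student"]

feature_sets = [pattern_features(triples, r) for r in people]
selected = select_features(feature_sets, labels, k=2)
X = to_vectors(feature_sets, selected)
print(selected)  # [('p', 'studiesAt'), ('p', 'worksAt')]
print(X)         # [[0, 1], [0, 1], [1, 0], [1, 0]]
```

The "knows" patterns occur only for one staff member, so their gain ratio (about 0.38) is lower than that of the patterns that perfectly separate the classes (1.0), and they are dropped. The rows of `X` could then be passed to any conventional classifier, for example scikit-learn's `SVC` or `RandomForestClassifier`.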
Data availability
The datasets are available from the corresponding author on reasonable request.
Notes
If several parameter values perform equally well on the validation set, the largest of them is chosen.
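The tie-breaking rule in this note can be stated as a two-line function. This is a sketch under assumptions: the parameter is numeric (for example, an SVM regularization constant), validation scores are compared for exact equality, and the function name `pick_parameter` and the scores are hypothetical.

```python
def pick_parameter(val_scores):
    """Return the largest parameter value among those tied for the best validation score."""
    best = max(val_scores.values())
    return max(p for p, s in val_scores.items() if s == best)


# Hypothetical validation accuracies keyed by parameter value.
scores = {0.1: 0.82, 1.0: 0.90, 10.0: 0.90, 100.0: 0.85}
print(pick_parameter(scores))  # 10.0
```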
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP18K11547.
Funding
This work was supported by JSPS KAKENHI Grant Number JP18K11547.
Author information
Contributions
K. Kaneiwa and Y. Minami designed the method, and Y. Minami conducted the experiments. Both authors wrote and reviewed the main manuscript text.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kaneiwa, K., Minami, Y. Feature selection based on the complexity of structural patterns in RDF graphs. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00466-w
DOI: https://doi.org/10.1007/s41060-023-00466-w