Feature selection based on the complexity of structural patterns in RDF graphs

Kaneiwa, Ken; Minami, Yota

doi:10.1007/s41060-023-00466-w

Regular Paper
Published: 10 November 2023

International Journal of Data Science and Analytics (2023)

Ken Kaneiwa¹ &
Yota Minami¹

Abstract

The Resource Description Framework (RDF) is a framework for describing metadata, such as the attributes and relationships of resources on the Web. Machine learning on RDF graphs typically adopts one of three methods: (i) support vector machines (SVMs) with RDF graph kernels, (ii) RDF graph embeddings, and (iii) relational graph convolutional networks. In this paper, we propose a novel feature vector, called a Skip vector, that represents features of each resource in an RDF graph by extracting various combinations of neighboring edges and nodes. To keep the Skip vector low-dimensional, we select the features that are important for classification tasks based on the information gain ratio of each feature. Classification can then be performed by applying the low-dimensional Skip vector of each resource to conventional machine learning algorithms, such as SVMs, the k-nearest neighbors method, neural networks, random forests, and AdaBoost. In our evaluation experiments with RDF data from Wikidata, DBpedia, and YAGO, we compare our method with RDF graph kernels in an SVM. We also compare our method with two other approaches, RDF graph embeddings such as RDF2Vec and relational graph convolutional networks, on the AIFB, MUTAG, BGS, and AM benchmarks. The results show that the proposed Skip vectors can represent the features of target resources in an RDF graph better than traditional methods and make conventional machine learning algorithms applicable to classification tasks on RDF data.
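The abstract describes selecting features by their information gain ratio. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each resource is already encoded as a vector of discrete (here binary) feature values, uses the standard definition of gain ratio (information gain divided by the entropy of the feature itself), and keeps the k highest-scoring features. The function names `entropy`, `gain_ratio`, and `select_top_k` and the toy data are hypothetical; how the paper builds Skip vectors from neighboring edges and nodes is not specified in this preview.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def gain_ratio(feature_values, labels):
    """Information gain ratio of one discrete feature w.r.t. the class labels."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def select_top_k(vectors, labels, k):
    """Return (sorted indices of the k best features, all gain-ratio scores)."""
    n_features = len(vectors[0])
    scores = [
        gain_ratio([v[j] for v in vectors], labels) for j in range(n_features)
    ]
    # Rank by descending score; ties are broken by the lower feature index.
    ranked = sorted(range(n_features), key=lambda j: (-scores[j], j))
    return sorted(ranked[:k]), scores


# Toy data (assumed): 4 resources, 3 binary features, 2 classes.
vectors = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
]
labels = ["A", "A", "B", "B"]

selected, scores = select_top_k(vectors, labels, k=1)
reduced = [[v[j] for j in selected] for v in vectors]
```

In this toy example the first feature separates the two classes perfectly, the second is independent of the class, and the third is constant, so only the first feature is kept. The reduced vectors could then be passed to any conventional classifier, such as an SVM or a random forest.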

Data availability

The datasets are available from the corresponding author on reasonable request.

Notes

http://www.sw.cei.uec.ac.jp/frost/.
https://www.wikidata.org/.
http://ja.dbpedia.org/.
https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/.
https://github.com/tkipf/relational-gcn.
If several parameter values perform equally well on the validation set, the largest value is chosen.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP18K11547.

Funding

This work was supported by JSPS KAKENHI Grant Number JP18K11547.

Author information

Authors and Affiliations

Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
Ken Kaneiwa & Yota Minami

Contributions

K. Kaneiwa and Y. Minami designed the method, and Y. Minami conducted the experiments. All authors wrote and reviewed the main manuscript text.

Corresponding author

Correspondence to Ken Kaneiwa.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaneiwa, K., Minami, Y. Feature selection based on the complexity of structural patterns in RDF graphs. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00466-w

Received: 20 February 2023
Accepted: 02 October 2023
Published: 10 November 2023
DOI: https://doi.org/10.1007/s41060-023-00466-w
