
Feature selection based on the complexity of structural patterns in RDF graphs

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

The Resource Description Framework (RDF) is a framework for describing metadata, such as the attributes and relationships of resources on the Web. Machine learning on RDF graphs typically uses one of three approaches: (i) support vector machines (SVMs) with RDF graph kernels, (ii) RDF graph embeddings, and (iii) relational graph convolutional networks. In this paper, we propose a novel feature vector, called a Skip vector, that represents the features of each resource in an RDF graph by extracting various combinations of its neighboring edges and nodes. To keep the Skip vector low-dimensional, we select the features that matter most for a classification task according to the information gain ratio of each feature. Classification is then performed by passing the low-dimensional Skip vector of each resource to conventional machine learning algorithms, such as SVMs, the k-nearest neighbors method, neural networks, random forests, and AdaBoost. In our evaluation experiments on RDF data from Wikidata, DBpedia, and YAGO, we compare our method with RDF graph kernels in an SVM. We also compare it with two other approaches, RDF graph embeddings such as RDF2Vec and relational graph convolutional networks, on the AIFB, MUTAG, BGS, and AM benchmarks. The results indicate that the proposed Skip vectors represent the features of target resources in an RDF graph better than traditional methods and make conventional machine learning algorithms applicable to classification tasks on RDF data.
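The feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes the textbook definition of the information gain ratio (information gain divided by split information) over discrete feature values; the function names, the toy data, and the top-k selection rule are hypothetical and are not taken from the paper, whose exact formulation is not given in this excerpt.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a feature divided by its split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


def select_top_k(X, y, k):
    """Indices of the k columns of X with the highest gain ratio (assumed rule)."""
    n_features = len(X[0])
    scores = [gain_ratio([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): rows are resources, columns are binary features
# indicating whether a structural pattern occurs around the resource.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = ["a", "a", "b", "b"]
selected = select_top_k(X, y, 1)
```

In this toy data only the first column separates the two classes, so it is the one retained; the reduced vectors could then be passed to any conventional classifier.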

Data availability

The datasets are available from the corresponding author on reasonable request.

Notes

  1. http://www.sw.cei.uec.ac.jp/frost/.

  2. https://www.wikidata.org/.

  3. http://ja.dbpedia.org/.

  4. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/.

  5. https://github.com/tkipf/relational-gcn.

  6. The maximum value is chosen if many best parameters based on the validation set exist.
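The tie-breaking rule in note 6 can be sketched as follows, reading the note as meaning that the largest parameter value is kept when several values share the best validation score. The function name and the example scores are hypothetical.

```python
def pick_best_parameter(validation_scores):
    """Return the parameter with the best validation score.

    validation_scores maps each candidate parameter value to its score on
    the validation set. Ties are broken by taking the largest parameter
    value (an assumed reading of note 6).
    """
    best_score = max(validation_scores.values())
    tied = [p for p, s in validation_scores.items() if s == best_score]
    return max(tied)


# Hypothetical validation accuracies for three candidate values.
chosen = pick_best_parameter({1: 0.80, 10: 0.90, 100: 0.90})
```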

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP18K11547.

Funding

This work was supported by JSPS KAKENHI Grant Number JP18K11547.

Author information

Contributions

K. Kaneiwa and Y. Minami designed the method, and Y. Minami conducted the experiments. All authors wrote and reviewed the main manuscript text.

Corresponding author

Correspondence to Ken Kaneiwa.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaneiwa, K., Minami, Y. Feature selection based on the complexity of structural patterns in RDF graphs. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00466-w

  • DOI: https://doi.org/10.1007/s41060-023-00466-w
