A similar structural and semantic integrated method for RDF entity embedding

Applied Intelligence

Abstract

Resource Description Framework (RDF) graphs have become an important data source for many knowledge discovery algorithms and data mining tasks. However, most complex analyses built on knowledge discovery algorithms require data in vector form. As a result, several RDF entity embedding techniques have emerged that represent the entities of an RDF graph as low-dimensional vectors. These techniques generate sequences of entities using graph walks and then apply language modeling techniques to the sequences to extract the features from which the embeddings are learned. However, sequences produced by graph walks capture only structural context; they cannot capture latent context, such as semantically related information, which is an important property of RDF data. In this paper, we present a novel method consisting of a series of steps that generate sequences capturing not only structural context but also semantic properties. The structural part of the method includes (1) a new concept of similar entities, in which a tradeoff is made between similar outgoing edges and outgoing nodes, and (2) a new structural similarity measure, which calculates the similarity between two entities in each sequence. Generating sequences based on structural similarity ensures that similar entities have sequences with similar structures. The semantic part of the method combines sequences that share the same semantics to generate latent sequences that cannot be produced by traversing the graph. Experimental results and a case study on real graphs show that the proposed method outperforms existing methods in terms of quality and efficiency.
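To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) the baseline step of generating entity sequences by random graph walks over RDF triples and (b) one possible similarity that trades off shared outgoing edges against shared outgoing nodes. The toy triples, the function names, the use of Jaccard overlap, and the weight `alpha` are all assumptions introduced for illustration; the abstract does not specify the paper's actual similarity formula.

```python
import random
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); every name here is made up.
TRIPLES = [
    ("ex:Alice", "ex:worksAt", "ex:AcmeCorp"),
    ("ex:Alice", "ex:livesIn", "ex:Seoul"),
    ("ex:Bob", "ex:worksAt", "ex:AcmeCorp"),
    ("ex:Bob", "ex:livesIn", "ex:Busan"),
    ("ex:AcmeCorp", "ex:locatedIn", "ex:Seoul"),
]


def build_adjacency(triples):
    """Map each subject to its list of outgoing (predicate, object) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walk(adjacency, start, depth, rng):
    """Return one walk [e0, p1, e1, ...] with at most `depth` hops.

    The walk stops early when the current entity has no outgoing edges.
    """
    walk = [start]
    current = start
    for _ in range(depth):
        edges = adjacency.get(current)
        if not edges:
            break
        predicate, current = rng.choice(edges)
        walk.extend([predicate, current])
    return walk


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def structural_similarity(adjacency, e1, e2, alpha=0.5):
    """Hypothetical similarity (an assumption, not the paper's formula):
    alpha weights shared outgoing predicates, (1 - alpha) shared outgoing objects.
    """
    preds1 = {p for p, _ in adjacency.get(e1, [])}
    preds2 = {p for p, _ in adjacency.get(e2, [])}
    objs1 = {o for _, o in adjacency.get(e1, [])}
    objs2 = {o for _, o in adjacency.get(e2, [])}
    return alpha * jaccard(preds1, preds2) + (1 - alpha) * jaccard(objs1, objs2)


adjacency = build_adjacency(TRIPLES)
rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(adjacency, "ex:Alice", 2, rng) for _ in range(3)]
for walk in walks:
    print(" -> ".join(walk))
print(structural_similarity(adjacency, "ex:Alice", "ex:Bob"))
```

In a walk-based pipeline of the kind the abstract describes, sequences like `walks` would then be passed to a word2vec-style language model to learn one vector per entity. Raising `alpha` in this sketch makes entities that share predicates count as more similar even when their neighbors differ.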

Notes

  1. http://arnetminer.org/citation (V4 version is used)

  2. http://citeseerx.ist.psu.edu

  3. https://github.com/chrisPiemonte/TripWalk

Acknowledgements

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. IITP-2022-2021-0-00859, Development of a distributed graph DBMS for intelligent processing of big graphs, and No. RS-2022-00155911, Artificial Intelligence Convergence Innovation Human Resources Development (Kyung Hee University)).

Author information

Corresponding author

Correspondence to Young-Koo Lee.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Van, D., Lee, YK. A similar structural and semantic integrated method for RDF entity embedding. Appl Intell 53, 19302–19316 (2023). https://doi.org/10.1007/s10489-023-04520-9

  • DOI: https://doi.org/10.1007/s10489-023-04520-9
