Abstract
Resource Description Framework (RDF) graphs have become an important data source for many knowledge discovery algorithms and data mining tasks. However, most complex analyses that use knowledge discovery algorithms require data in a vector representation. As a result, several RDF entity embedding techniques have emerged in which the entities of an RDF graph are represented as low-dimensional vectors. These techniques generate sequences of entities using graph walks and then apply language modeling techniques to extract features from those sequences, from which the embeddings are learned. However, sequences produced by graph walks capture only structural context; they cannot capture latent context, such as semantically related information, which is an important property of RDF data. In this paper, we present a novel method consisting of a series of steps that generate sequences capturing not only structural context but also semantic properties. The method for structural context includes (1) a new concept of similar entities, in which tradeoffs are made between similar outgoing edges and similar outgoing nodes, and (2) a new structural similarity measure, which calculates the similarity between two entities in each sequence. Sequences are generated based on this structural similarity so that similar entities are associated with sequences of similar structure. The method for the semantic property combines sequences that share the same semantics to generate latent sequences that cannot be obtained by traversing the graph. This paper presents experimental results and a case study using real graphs to show that the proposed method outperforms existing methods in terms of quality and efficiency.
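To make the baseline idea of walk-based sequence generation concrete, the following is a minimal sketch of how entity sequences might be produced from an RDF graph by random walks. It illustrates only the general graph-walk approach that the abstract describes for existing techniques, not the method proposed in this paper. The toy triples, the `ex:` identifiers, and the function names `build_adjacency`, `random_walk`, and `generate_sequences` are all hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples.
# All identifiers are invented for illustration.
TRIPLES = [
    ("ex:Alice", "ex:worksAt", "ex:AcmeCorp"),
    ("ex:Bob", "ex:worksAt", "ex:AcmeCorp"),
    ("ex:AcmeCorp", "ex:locatedIn", "ex:Seoul"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Seoul", "ex:capitalOf", "ex:SouthKorea"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walk(adjacency, start, depth, rng):
    """Follow up to `depth` outgoing edges from `start`.

    The walk alternates entity and predicate labels, so it reads like a
    sentence whose tokens are graph elements.
    """
    walk = [start]
    current = start
    for _ in range(depth):
        edges = adjacency.get(current)
        if not edges:  # dead end: the current node has no outgoing edges
            break
        predicate, current = rng.choice(edges)
        walk.extend([predicate, current])
    return walk


def generate_sequences(triples, walks_per_entity=3, depth=2, seed=0):
    """Generate `walks_per_entity` walks starting from every subject entity."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    adjacency = build_adjacency(triples)
    sequences = []
    for entity in sorted(adjacency):
        for _ in range(walks_per_entity):
            sequences.append(random_walk(adjacency, entity, depth, rng))
    return sequences


if __name__ == "__main__":
    for sequence in generate_sequences(TRIPLES):
        print(" -> ".join(sequence))
```

In a walk-based pipeline, sequences of this kind would then be passed as "sentences" to a language model such as word2vec to learn one vector per entity. Because every sequence here is obtained by following existing edges, it can reflect only structural context, which is the limitation the abstract points out.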
Acknowledgements
This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. IITP-2022-2021-0-00859, Development of a distributed graph DBMS for intelligent processing of big graphs, and No. RS-2022-00155911, Artificial Intelligence Convergence Innovation Human Resources Development (Kyung Hee University)).
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
Van, D., Lee, YK. A similar structural and semantic integrated method for RDF entity embedding. Appl Intell 53, 19302–19316 (2023). https://doi.org/10.1007/s10489-023-04520-9
DOI: https://doi.org/10.1007/s10489-023-04520-9