Developing a BERT based triple classification model using knowledge graph embedding for question answering system

Do, Phuc; Phan, Truong H. V.

doi:10.1007/s10489-021-02460-w

Published: 08 May 2021

Applied Intelligence, Volume 52, pages 636–651 (2022)

Phuc Do¹ & Truong H. V. Phan¹,²

Abstract

Current BERT-based question answering systems use a question and a contextual text to find the answer. As a result, they return wrong answers or nothing at all if the text contains content irrelevant to the input question. They also cannot yet answer yes-no or aggregate questions. In addition, they concentrate only on the content of the text and disregard the relationships between entities in the corpus. These systems also cannot validate their answers. In this paper, we presented a solution to these issues that uses the BERT model and a knowledge graph to enhance a question answering system. We combined content-based and link-based information for knowledge graph representation learning and classified each triple into one of three classes: base, derived, or non-existent. We then used the BERT model to build two classifiers: BERT-based text classification for content information and BERT-based triple classification for link information. The former produced a contextual embedding vector that represents each triple and was used to classify it into one of the three classes above. The latter generated all path instances from all meta paths of a large heterogeneous information network by running the Motif Search method of Apache Spark in a distributed environment. After creating the path instances, we produced triples from them. We created content-based information by converting the triples into labeled natural language text and treated the task as a text classification problem. Our proposed solution outperformed other embedding methods, with an average accuracy of 92.34% on benchmark datasets, and outperformed the Motif Finding algorithm, with an average execution time improvement of 37% in the distributed environment.
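The step of converting triples into labeled natural language text can be illustrated with a minimal sketch. It is not the authors' implementation: the verbalization rule, the helper names `triple_to_text` and `make_examples`, the label strings, and the toy triples are all assumptions made for illustration. The resulting (text, class id) pairs are the kind of input a BERT-based text classifier could be fine-tuned on.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical label names mirroring the three classes named in the abstract.
LABELS = ("base", "derived", "non-existent")


def triple_to_text(triple: Triple) -> str:
    """Verbalize a triple as a plain sentence-like string (assumed rule)."""
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}"


def make_examples(labeled: List[Tuple[Triple, str]]) -> List[Tuple[str, int]]:
    """Turn labeled triples into (text, class_id) pairs for a text classifier."""
    examples = []
    for triple, label in labeled:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        examples.append((triple_to_text(triple), LABELS.index(label)))
    return examples


# Toy triples, invented for illustration only.
labeled_triples = [
    (("Alice", "writes", "Paper1"), "base"),
    (("Alice", "publishes_at", "KDD"), "derived"),
    (("Alice", "writes", "Paper9"), "non-existent"),
]

examples = make_examples(labeled_triples)
for text, class_id in examples:
    print(class_id, text)
```

Framing the triples as short labeled texts lets the three-class decision reuse a standard sequence classification setup instead of a purpose-built graph model.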

Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number DS2020-26-01.

Author information

Authors and Affiliations

University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam
Phuc Do & Truong H. V. Phan
Van Lang University, Ho Chi Minh City, Vietnam
Truong H. V. Phan

Authors

Phuc Do
Truong H. V. Phan

Corresponding author

Correspondence to Phuc Do.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Do, P., Phan, T.H.V. Developing a BERT based triple classification model using knowledge graph embedding for question answering system. Appl Intell 52, 636–651 (2022). https://doi.org/10.1007/s10489-021-02460-w

Accepted: 20 April 2021
Published: 08 May 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10489-021-02460-w
