Abstract
Current BERT-based question answering systems use a question and a context passage to find the answer. As a result, they return wrong answers, or nothing at all, when the passage contains content that is irrelevant to the input question. They also cannot yet answer yes-no or aggregate questions. In addition, they concentrate only on the content of the text and disregard the relationships between entities in the corpus. These systems also cannot validate the answer. In this paper, we present a solution to these issues that uses the BERT model and a knowledge graph to enhance a question answering system. We combined content-based and link-based information for knowledge graph representation learning and classified each triple into one of three classes: base, derived, or non-existent. We then used the BERT model to build two classifiers: BERT-based text classification for content information and BERT-based triple classification for link information. The former produced a contextual embedding vector that represents each triple and was used to classify it into one of the three classes above. The latter generated all path instances from all meta paths of a large heterogeneous information network by running the Motif Search method of Apache Spark in a distributed environment. After creating the path instances, we produced triples from them. We built the content-based information by converting the triples into labeled natural language text and treated their classification as a text classification problem. Our proposed solution outperformed other embedding methods, with an average accuracy of 92.34% on benchmark datasets, and outperformed the Motif Finding algorithm, with an average execution time improvement of 37% in the distributed environment.
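The data preparation described in the abstract (path instances from a meta path, triples from those instances, labeled text from the triples) can be illustrated with a small sketch. This is not the authors' implementation: it replaces Spark's distributed motif search with plain Python loops over a toy graph, and it stops before the BERT classifier. The toy edges, the relation names, the derived relation `publishes_at`, the tail-swapping rule for non-existent triples, and all function names are hypothetical and chosen only for illustration.

```python
# Toy heterogeneous information network as (head, relation, tail) edges.
# All entities and relations here are made up for illustration.
EDGES = [
    ("alice", "writes", "paper1"),
    ("bob", "writes", "paper1"),
    ("paper1", "published_in", "kdd"),
]

BASE, DERIVED, NON_EXISTENT = "base", "derived", "non-existent"


def path_instances(edges, meta_path):
    """Enumerate every edge sequence whose relations match meta_path."""
    paths = [[e] for e in edges if e[1] == meta_path[0]]
    for relation in meta_path[1:]:
        # Extend each partial path with edges that start where it ends.
        paths = [p + [e] for p in paths for e in edges
                 if e[1] == relation and e[0] == p[-1][2]]
    return paths


def derive_triples(edges, meta_path, derived_relation):
    """Collapse each path instance into one (start, relation, end) triple."""
    return sorted({(p[0][0], derived_relation, p[-1][2])
                   for p in path_instances(edges, meta_path)})


def corrupt(triples, known, entities):
    """Assumed negative sampling: swap the tail for the first entity
    (in sorted order) that gives a triple absent from the graph."""
    negatives = []
    for head, relation, _ in triples:
        for entity in entities:
            candidate = (head, relation, entity)
            if entity != head and candidate not in known:
                negatives.append(candidate)
                break
    return negatives


def triple_to_text(triple):
    """Render a triple as a short sentence for a text classifier."""
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}"


def build_dataset(edges, meta_path, derived_relation):
    """Return (text, label) rows covering the three classes."""
    derived = derive_triples(edges, meta_path, derived_relation)
    known = set(edges) | set(derived)
    entities = sorted({x for h, _, t in edges for x in (h, t)})
    negatives = corrupt(edges, known, entities)
    rows = [(triple_to_text(t), BASE) for t in edges]
    rows += [(triple_to_text(t), DERIVED) for t in derived]
    rows += [(triple_to_text(t), NON_EXISTENT) for t in negatives]
    return rows


if __name__ == "__main__":
    for text, label in build_dataset(
            EDGES, ("writes", "published_in"), "publishes_at"):
        print(f"{label}\t{text}")
```

Rows of this form could then be passed to any three-class text classifier. In the setting the abstract describes, that classifier would be a fine-tuned BERT model and the path enumeration would run on Spark rather than in a single process.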
Acknowledgments
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number DS2020-26-01.
Cite this article
Do, P., Phan, T.H.V. Developing a BERT based triple classification model using knowledge graph embedding for question answering system. Appl Intell 52, 636–651 (2022). https://doi.org/10.1007/s10489-021-02460-w
DOI: https://doi.org/10.1007/s10489-021-02460-w