
Developing a BERT based triple classification model using knowledge graph embedding for question answering system

Published in: Applied Intelligence

Abstract

Current BERT-based question answering systems use a question and a contextual text to find the answer. As a result, they return wrong answers, or no answer at all, when the text contains content irrelevant to the input question. They also cannot yet answer yes-no or aggregate questions, and they concentrate only on the content of the text while ignoring the relationships between entities in the corpus, so they cannot validate their answers. In this paper, we present a solution to these issues that uses the BERT model together with a knowledge graph to enhance a question answering system. We combine content-based and link-based information for knowledge graph representation learning and classify each triple into one of three classes: base, derived, or non-existent. We use the BERT model to build two classifiers: a BERT-based text classifier for content information and a BERT-based triple classifier for link information. The former produces a contextual embedding vector that represents a triple and is used to classify it into the three classes above. For the latter, we generate all path instances of all meta paths in a large heterogeneous information network by running the Motif Search method of Apache Spark in a distributed environment, produce triples from these path instances, and obtain content-based information by converting the triples into natural language text with labels, treating the task as a text classification problem. Our proposed solution outperformed other embedding methods with an average accuracy of 92.34% on benchmark datasets and improved on the Motif Finding algorithm with an average execution time reduction of 37% in the distributed environment.
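
To make the meta-path step concrete, the sketch below shows how path instances can be generated with the GraphFrames motif-finding API on Spark, which is the general mechanism the abstract refers to as the Motif Search method. It is a minimal illustration on a toy heterogeneous information network: the vertex/edge schema, the Author-Paper-Author meta path, and the derived co_author relation are illustrative assumptions, not the paper's actual data or code.

```python
# Minimal sketch: generating path instances for a meta path with GraphFrames
# motif finding on Spark (requires the graphframes Spark package).
# Schema, meta path, and relation names below are illustrative assumptions.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("MetaPathInstances").getOrCreate()

# Toy heterogeneous information network: typed vertices and labelled edges.
vertices = spark.createDataFrame(
    [("a1", "Author"), ("a2", "Author"), ("p1", "Paper")],
    ["id", "type"],
)
edges = spark.createDataFrame(
    [("a1", "p1", "writes"), ("a2", "p1", "writes")],
    ["src", "dst", "relationship"],
)
g = GraphFrame(vertices, edges)

# Motif query for the meta path Author -(writes)-> Paper <-(writes)- Author.
paths = (
    g.find("(x)-[e1]->(p); (y)-[e2]->(p)")
     .filter("x.type = 'Author' AND y.type = 'Author' AND p.type = 'Paper'")
     .filter("e1.relationship = 'writes' AND e2.relationship = 'writes'")
     .filter("x.id != y.id")
)

# Each path instance yields a triple (head, relation, tail); here the meta
# path induces a derived co_author relation between the two authors.
triples = paths.selectExpr("x.id AS head", "'co_author' AS relation", "y.id AS tail")
triples.show()
```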
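
The abstract also describes converting triples into natural language text with labels and treating triple classification as a three-way text classification problem (base, derived, or non-existent). A minimal sketch of that idea, assuming the Hugging Face transformers library, an untuned bert-base-uncased checkpoint, and an illustrative verbalization template (not the paper's exact wording or trained model), could look like this:

```python
# Minimal sketch: verbalize a knowledge-graph triple and score it with a
# three-way BERT sequence classifier. Label names and the verbalization
# template are assumptions for illustration only.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["base", "derived", "non-existent"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # in practice this head is fine-tuned on labelled triples first

def verbalize(head: str, relation: str, tail: str) -> str:
    """Turn a triple (head, relation, tail) into a natural-language sentence."""
    return f"{head} {relation.replace('_', ' ')} {tail}."

def classify_triple(head: str, relation: str, tail: str) -> str:
    """Return the predicted class label for a verbalized triple."""
    inputs = tokenizer(verbalize(head, relation, tail),
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify_triple("Barack Obama", "born_in", "Honolulu"))
```

The predictions are only meaningful after the classification head has been fine-tuned on triples labelled with the three classes; the snippet shows the data flow rather than a trained system.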



Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number DS2020-26-01.

Author information

Corresponding author

Correspondence to Phuc Do.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Do, P., Phan, T.H.V. Developing a BERT based triple classification model using knowledge graph embedding for question answering system. Appl Intell 52, 636–651 (2022). https://doi.org/10.1007/s10489-021-02460-w

