Abstract
Medical question-answering systems must extract answers that are accurate, concise, and comprehensive. They can better comprehend complex text and produce helpful answers if they reason over both the explicit constraints described in the question's textual context and the implicit, pertinent medical world knowledge. Integrating Knowledge Graphs (KGs) with Language Models (LMs) is a common approach to incorporating such structured information. However, effectively combining and reasoning over KG representations and language context remains an open question. To address this, we propose the Knowledge Infused Medical Question Answering system (KIMedQA), which employs two techniques, relevant knowledge graph selection and pruning of the large-scale graph, to handle Vector Space Inconsistency (VSI) and Excessive Knowledge Information (EKI). The representations of the query and context are then combined with the pruned knowledge graph using a pre-trained language model to generate an informed answer. Finally, we demonstrate through in-depth empirical evaluation that our approach achieves state-of-the-art results on two benchmark datasets, MASH-QA and COVID-QA. We also compared our results to ChatGPT, a powerful generative model, and found that our model outperforms ChatGPT on the F1 score and on human evaluation metrics such as adequacy.
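To make the graph-pruning idea concrete, the sketch below shows one plausible way to prune a large KG to the neighborhood of entities mentioned in a question. This is an illustrative sketch only, not the authors' implementation: the triple format, the `max_hops` parameter, and the BFS strategy are assumptions for demonstration.

```python
from collections import deque

def prune_graph(edges, seed_entities, max_hops=2):
    """Keep only triples whose endpoints lie within max_hops of a seed entity.

    edges: list of (head, relation, tail) triples.
    seed_entities: entities linked from the question (e.g. via UMLS linking).
    """
    # Build an undirected adjacency list from the triples.
    adj = {}
    for h, _, t in edges:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)

    # Breadth-first search outward from the seed entities.
    kept = set(seed_entities)
    frontier = deque((e, 0) for e in seed_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for nbr in adj.get(node, ()):
            if nbr not in kept:
                kept.add(nbr)
                frontier.append((nbr, depth + 1))

    # Retain only triples fully inside the pruned neighborhood.
    return [(h, r, t) for h, r, t in edges if h in kept and t in kept]
```

For example, with seed entity "fever" and `max_hops=2`, a triple chain fever → flu → oseltamivir survives, while unrelated triples (and anything three or more hops away) are discarded before the subgraph is fused with the language-model representation.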
Notes
The code is available at https://github.com/aizan/kimedqa.
Acknowledgements
The authors gratefully acknowledge the support of the projects "Percuro - A Holistic Solution for Text Mining", sponsored by Wipro Ltd., and "Sevak - An Intelligent Indian Language Chatbot", sponsored by IMPRINT-2, SERB, Government of India.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
Aizan: Conceptualization, Methodology, Software, Validation, Writing - original draft, Investigation. Sovan Kumar Sahoo: Conceptualization, Writing - original draft, Investigation. Deeksha Varshney: Writing - review and editing, Investigation, Error analysis. Amitava Das: Writing - review and editing, Supervision, Resources. Asif Ekbal: Writing - review and editing, Supervision, Resources.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Ethical Approval
We use only publicly available datasets and have followed the usage policies of each dataset without violating any copyright.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zafar, A., Sahoo, S.K., Varshney, D. et al. KIMedQA: towards building knowledge-enhanced medical QA models. J Intell Inf Syst (2024). https://doi.org/10.1007/s10844-024-00844-1