Abstract
Question answering systems respond to user questions in natural language. They analyze each question with natural language processing methods and retrieve answers from appropriate data sources with information retrieval techniques. Text mining and deep network techniques can further improve these systems by returning more accurate and relevant answers. In this study, we developed question answering models using text mining and deep networks. We fine-tuned a pre-trained English BERT-base model on the Stanford Question Answering Dataset (SQuADv1.1) with various hyperparameter and fine-tuning settings. Training yielded an F1 score of 88.13 and an Exact Match (EM) score of 80.74, outperforming previous studies in the field. We then carried out an improvement study on the Turkish History Question Answering Dataset (THQuADv1.0), updating it to THQuADv2.0 by adding questions about the units of Düzce University. The pre-trained Turkish BERTurk-base model was fine-tuned on THQuADv2.0 with the hyperparameter and fine-tuning settings that had proved successful for the English model. The resulting model, BERTDuQuA (BERT Düzce University Question Answering), answers Turkish questions and achieved an F1 score of 87.10 and an EM score of 76.90.
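The abstract reports results as F1 and Exact Match (EM) but does not say how the two scores are computed. The sketch below follows the convention commonly used for SQuADv1.1-style evaluation: answers are normalized, EM checks for identical strings, and F1 measures token overlap. The function names are illustrative and are not taken from the study, and the article-stripping step is specific to English, so it is an assumption that would not carry over unchanged to a Turkish dataset such as THQuADv2.0.

```python
import re
import string
from collections import Counter

# Punctuation characters removed during normalization.
_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    # English-only assumption: strip the articles "a", "an", "the".
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For a single question, `f1_score("in Paris France", "Paris")` gives partial credit while `exact_match` gives none. Dataset-level scores of the kind quoted in the abstract are usually obtained by taking, for each question, the best score over its reference answers, averaging over all questions, and multiplying by 100.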
Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
No funding was received during the study.
Author information
Contributions
Hüseyin Avni ARDAÇ: methodology definition, dataset preprocessing, data analysis, experiments, application, and evaluation. Pakize ERDOĞMUŞ: evaluation of the results and draft editing.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
No ethical approval was necessary for this kind of investigation.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publish
The authors affirm that the article contains no figure of any participant.
About this article
Cite this article
Ardaç, H.A., Erdoğmuş, P. Question answering system with text mining and deep networks. Evolving Systems (2024). https://doi.org/10.1007/s12530-024-09592-7
DOI: https://doi.org/10.1007/s12530-024-09592-7