
Question answering system with text mining and deep networks

  • Original Paper
  • Journal: Evolving Systems

Abstract

Question answering systems respond to user inquiries posed in natural language. They analyze questions with natural language processing methods and retrieve answers from appropriate data sources with information retrieval techniques. Text mining and deep network techniques can further improve these systems by returning more accurate and relevant information. In this study, we developed question answering models using text mining and deep networks. We fine-tuned a pre-trained English BERT-base model on the Stanford Question Answering Dataset (SQuADv1.1), exploring various hyperparameter and fine-tuning settings. The best configuration achieved an F1 score of 88.13 and an Exact Match (EM) score of 80.74, outperforming previous studies in the field. We also extended the Turkish History Question Answering Dataset (THQuADv1.0) with questions about the units of Düzce University, producing THQuADv2.0. The pre-trained Turkish BERTurk-base model was then fine-tuned on THQuADv2.0 using the most successful hyperparameter and fine-tuning settings from the English experiments. The resulting model, BERTDuQuA (BERT Düzce University Question Answering), answers Turkish questions and achieved an F1 score of 87.10 and an EM score of 76.90.
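The F1 and EM figures above are the standard metrics for SQuAD-style extractive question answering. As a hedged illustration of what these numbers measure, the Python sketch below reimplements SQuAD v1.1-style scoring. Answers are normalized (lowercased, punctuation and the English articles "a", "an", "the" removed, whitespace collapsed). EM then checks for an exact normalized match, F1 measures token overlap, and each prediction is scored against its best-matching reference answer. This is not the authors' evaluation code, and the function names are hypothetical. The English article-stripping step is an assumption that may not match how the Turkish THQuADv2.0 scores were computed.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: mirrors SQuAD v1.1 normalization; a Turkish evaluation
    would likely need language-specific rules instead of a/an/the.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, ground_truths):
    """Score against every reference answer and keep the maximum."""
    return max(metric(prediction, gt) for gt in ground_truths)


def evaluate(predictions, references):
    """Corpus-level EM and F1 on a 0-100 scale (the scale of 88.13 / 80.74).

    predictions: question id -> predicted answer string
    references:  question id -> list of acceptable answer strings
    A missing prediction is scored as an empty string (0 on both metrics).
    """
    em_total = f1_total = 0.0
    for qid, gold_answers in references.items():
        pred = predictions.get(qid, "")
        em_total += best_over_references(exact_match, pred, gold_answers)
        f1_total += best_over_references(f1_score, pred, gold_answers)
    n = len(references)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


if __name__ == "__main__":
    preds = {"q1": "the Eiffel Tower", "q2": "in 1889"}
    refs = {"q1": ["Eiffel Tower"], "q2": ["1889", "the year 1889"]}
    print(evaluate(preds, refs))
```

For example, predicting "in 1889" against the reference "1889" gives an EM of 0 but an F1 of about 0.67 (precision 1/2, recall 1). Partial credit of this kind is why F1 typically exceeds EM, as it does in both results reported above.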

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Algorithm 1:
Algorithm 2:
Fig. 5
Fig. 6

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

No funding was received during the study.

Author information

Contributions

Hüseyin Avni Ardaç: defined the methodology, preprocessed the dataset, and carried out the data analysis, experiments, application, and evaluations. Pakize Erdoğmuş: evaluated the results and edited the draft.

Corresponding author

Correspondence to Hüseyin Avni Ardaç.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical approval

No ethical approval was necessary for this kind of investigation.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent to publish

The authors affirm that the article contains no images of any participant.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ardaç, H.A., Erdoğmuş, P. Question answering system with text mining and deep networks. Evolving Systems (2024). https://doi.org/10.1007/s12530-024-09592-7

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s12530-024-09592-7

Keywords
