Abstract
Question Answering (QA) has attracted significant interest from the research community because of the broad relevance of its applications. In recent years, the number of publicly available datasets aimed at advancing research in Arabic QA systems has grown considerably. This survey identifies, summarizes, and analyzes current Arabic QA datasets, including monolingual, multilingual, and cross-lingual ones, and provides a comprehensive, multi-faceted classification of them. It also aims to guide research in Arabic QA by reporting the state of the art in the field and by identifying shortcomings in current datasets, so that more substantial and better collections can be developed. Finally, we discuss the open challenges in Arabic QA datasets and highlight their potential benefits for future research.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saoudi, Y., Gammoudi, M.M. (2024). A Comprehensive Review of Arabic Question Answering Datasets. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1961. Springer, Singapore. https://doi.org/10.1007/978-981-99-8126-7_22
DOI: https://doi.org/10.1007/978-981-99-8126-7_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8125-0
Online ISBN: 978-981-99-8126-7
eBook Packages: Computer Science (R0)