A Comprehensive Review of Arabic Question Answering Datasets

Saoudi, Yassine; Gammoudi, Mohamed Mohsen

doi:10.1007/978-981-99-8126-7_22

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1961)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The research community has shown significant interest in Question Answering (QA) because of the strong relevance of QA applications. In recent years, the number of publicly accessible datasets for advancing research on Arabic QA systems has grown substantially. This survey identifies, summarizes, and analyzes current Arabic QA datasets, covering monolingual, multilingual, and cross-lingual collections, and provides a comprehensive, multi-faceted classification of them. It also aims to guide research in Arabic QA by reporting the current state of the art in the field and by identifying shortcomings in existing datasets, so that more substantial and improved collections can be developed. Finally, we discuss the open challenges in Arabic QA datasets and highlight the potential benefits of these datasets for future research.

Notes

1. https://omarito.me/arabic-askfm-dataset/

Author information

Authors and Affiliations

Faculty of Sciences of Tunis, University of Tunis El Manar, Tunis, Tunisia
Yassine Saoudi
Higher Institute of Arts and Multimedia Manouba, University of Manouba, Manouba, Tunisia
Mohamed Mohsen Gammoudi

Corresponding author

Correspondence to Yassine Saoudi.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

Cite this paper

Saoudi, Y., Gammoudi, M.M. (2024). A Comprehensive Review of Arabic Question Answering Datasets. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1961. Springer, Singapore. https://doi.org/10.1007/978-981-99-8126-7_22

DOI: https://doi.org/10.1007/978-981-99-8126-7_22
Published: 13 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8125-0
Online ISBN: 978-981-99-8126-7
eBook Packages: Computer Science, Computer Science (R0)
