Arabic Question Answering: Systems, Resources, Tools, and Future Trends
Abstract
Arabic is the sixth most widespread natural language in the world, with more than 350 million native speakers. Arabic question answering systems are gaining importance because of the growing amount of Arabic content on the Internet and the growing demand for information that conventional information retrieval techniques cannot satisfy. Despite this importance, no review so far covers Arabic question answering systems, tools, resources, and test-sets, which motivated this work. This survey presents and analyzes different Arabic question answering systems and explores the main question answering tasks, such as question analysis, passage retrieval, and answer extraction. It also explains the main difficulties of Modern Standard Arabic and how these difficulties are classified and handled. Arabic question answering evaluation metrics, test-sets, and language resources are reviewed, and future trends are highlighted to guide new research in this area. The survey gives researchers new to Arabic question answering up-to-date knowledge of the state-of-the-art approaches, and it describes the tools that researchers have created and used to build Arabic question answering systems.
Keywords
QA; Factoid questions; QA4MRE; Question analysis; Passage retrieval; Answer extraction; Answer validation; Test-sets; Evaluation; Metrics; Language resources; NLP; Information retrieval; Stemming; Corpus; NER; Lemmatization; Morphological analysis; Part-of-speech tagging; Diacritization; Overview; Review; Survey