
Arabian Journal for Science and Engineering, Volume 39, Issue 6, pp 4541–4564

Arabic Question Answering: Systems, Resources, Tools, and Future Trends

  • Mohamed Shaheen
  • Ahmed Magdy Ezzeldin
Review Article - Computer Engineering and Computer Science

Abstract

Arabic is the sixth most widespread natural language in the world, with more than 350 million native speakers. Arabic question answering systems are gaining importance because of the growing amount of Arabic content on the Internet and the growing demand for information that regular information retrieval techniques cannot satisfy. Despite this importance, no review so far covers Arabic question answering systems, tools, resources, and test-sets, which motivated this work. This survey presents and analyzes different Arabic question answering systems and explores the main question answering tasks, such as question analysis, passage retrieval, and answer extraction. It also explains the main difficulties of Modern Standard Arabic and how these difficulties are tamed and classified. Arabic question answering evaluation metrics, test-sets, and language resources are reviewed, and future trends are highlighted to guide new research in this area. The survey gives researchers new to Arabic question answering up-to-date knowledge of the state-of-the-art approaches, and it describes the tools that researchers have created and used to build Arabic question answering systems.

Keywords

QA, Factoid questions, QA4MRE, Question analysis, Passage retrieval, Answer extraction, Answer validation, Test-sets, Evaluation Metrics, Language resources, NLP, Information retrieval, Stemming, Corpus, NER, Lemmatization, Morphological analysis, Part-of-speech tagging, Diacritization, Overview, Review, Survey



Copyright information

© King Fahd University of Petroleum and Minerals 2014

Authors and Affiliations

  1. College of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport, Alexandria, Egypt
