Adapting the JIRS Passage Retrieval System to the Arabic Language

  • Yassine Benajiba
  • Paolo Rosso
  • José Manuel Gómez Soriano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4394)

Abstract

The need for a Passage Retrieval (PR) system for Arabic texts arises essentially from our research team's aim to build an Arabic Question Answering (QA) system. We chose the PR system as our first step towards this aim because it is the core component of a QA system and its quality directly affects the system's performance. The JAVA Information Retrieval System (JIRS) is a QA-oriented PR system that is multi-platform, open source and free to use. JIRS uses an n-gram model and is language-independent. It keeps its language configuration files separate, which makes it easier to adapt to a new language. In this paper, we report the challenges encountered when adapting JIRS to the Arabic language. In order to evaluate JIRS on Arabic, we had to develop an Arabic test-bed, using the multilingual CLEF QA test-bed as a guideline. We also report the results of our experiments, in which we retrieved Arabic passages with JIRS, first without any text preprocessing and second after light-stemming the documents of the test-bed. The preliminary results show that a first Arabic passage retrieval system can be obtained by adapting JIRS and running it on text pre-processed with a light-stemmer.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Yassine Benajiba (1)
  • Paolo Rosso (1)
  • José Manuel Gómez Soriano (1)
  1. Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
