Stemming Arabic Conjunctions and Prepositions

  • Abdusalam F. A. Nwesri
  • S. M. M. Tahaghoghi
  • Falk Scholer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3772)

Abstract

Arabic is the fourth most widely spoken language in the world and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike in English, most affixes in Arabic are difficult to distinguish from the core word. Removing incorrectly identified affixes sometimes yields a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions, which in Arabic are often attached directly to the word that follows them, form an interesting class of these affixes. In this work, we present novel approaches for handling them. Unlike previous methods, ours focus on retaining valid Arabic core words while maintaining high retrieval performance.
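The contrast the abstract draws, between stripping any leading letter that looks like a conjunction or preposition and stripping it only when the result is a genuine word, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' method: the proclitic sets cover only the single-letter conjunctions و (wa) and ف (fa) and the prepositions ب (bi), ل (li) and ك (ka); the three-letter minimum in `naive_strip` is an assumed light-stemming rule; and `TOY_LEXICON`, `naive_strip`, `candidate_stems` and `validated_strip` are hypothetical names, with the toy set standing in for a full Arabic dictionary.

```python
from __future__ import annotations

# Hypothetical sketch for illustration only; not the authors' algorithm.
# Single-letter proclitics of the kind discussed in the abstract.
CONJUNCTIONS = ("و", "ف")        # wa- ("and"), fa- ("then, so")
PREPOSITIONS = ("ب", "ل", "ك")   # bi- ("with, by"), li- ("for, to"), ka- ("like")

# Assumed toy stand-in for a real Arabic dictionary.
TOY_LEXICON = {"كتاب", "كتب", "وصول"}


def naive_strip(word: str) -> str:
    """Light-stemmer style: drop a leading conjunction, then a leading
    preposition, whenever at least three letters would remain."""
    for proclitics in (CONJUNCTIONS, PREPOSITIONS):
        if len(word) > 3 and word.startswith(proclitics):
            word = word[1:]
    return word


def candidate_stems(word: str) -> list[str]:
    """Remainders after removing a conjunction, a preposition, or both,
    ordered from least to most stripping."""
    candidates = []
    if word.startswith(CONJUNCTIONS):
        rest = word[1:]
        candidates.append(rest)
        if rest.startswith(PREPOSITIONS):
            candidates.append(rest[1:])
    elif word.startswith(PREPOSITIONS):
        candidates.append(word[1:])
    return candidates


def validated_strip(word: str, lexicon: set[str]) -> str:
    """Keep the word if it is already valid; otherwise return the first
    candidate remainder found in the lexicon. If none is found, leave the
    word unchanged rather than risk producing a wrong stem."""
    if word in lexicon:
        return word
    for stem in candidate_stems(word):
        if stem in lexicon:
            return stem
    return word


if __name__ == "__main__":
    for w in ("كتاب", "وصول", "وبكتاب", "فكتب"):
        print(w, naive_strip(w), validated_strip(w, TOY_LEXICON))
```

For كتاب ("book"), whose first letter coincides with the preposition ka, the naive rule returns تاب ("he repented"): a valid Arabic word but the wrong stem, which is exactly the failure mode the abstract describes. The lexicon check keeps كتاب intact while still reducing وبكتاب ("and with a book") to كتاب. Its conservative fallback, leaving unknown words untouched, trades some conflation for never emitting an unvalidated stem.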

Keywords

Information Retrieval, Machine Translation, Mean Average Precision, Correct Word, Original Word

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Abdusalam F. A. Nwesri (1)
  • S. M. M. Tahaghoghi (1)
  • Falk Scholer (1)
  1. School of Computer Science and Information Technology, RMIT University, Melbourne, Australia