
Simple Stemming Rules for Arabic Language

  • Hussein Soori
  • Jan Platoš
  • Václav Snášel
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 179)

Abstract

Processing of the Arabic language is increasingly important because the number of computer and Internet users in the Arab world is growing tremendously. Stemming is a key problem in information retrieval, knowledge mining, and language processing. Arabic has a very complex morphology, and stemming rules must deal with many properties specific to Arabic. This paper describes very simple rules for stemming Arabic words. Two of these rules are universal, i.e. they are applicable to any word category; each of the remaining rules applies to one of four categories: nouns, verbs, adverbs, and adjectives. The rules were more successful for adverbs than for the other categories. For nouns, verbs, and adjectives, some errors occurred, especially in suffix processing.
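The abstract gives the structure of the stemmer (two universal rules applied to every word, then one rule for the word's category) but not the rules themselves. The Python sketch below mirrors only that structure. Everything concrete in it is an assumption for illustration: the universal rules are taken to be removal of short-vowel diacritics and stripping of the definite article ال, and the suffix lists, `MIN_STEM`, and the function names `universal_rules`, `category_rule`, and `stem` are hypothetical, not the authors' rules.

```python
import re

# Short-vowel diacritics and related marks: U+064B (fathatan) .. U+0652 (sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")
DEFINITE_ARTICLE = "\u0627\u0644"  # alif + lam: the definite article "al-"

# Hypothetical suffix lists, for illustration only; these are NOT the paper's rules.
SUFFIXES = {
    "noun": ["\u0627\u062a", "\u0629"],        # feminine plural -aat, ta marbuta
    "verb": ["\u0648\u0627", "\u0648\u0646"],  # plural -uu (with alif), -uuna
    "adverb": ["\u0627"],                      # trailing alif left by adverbial tanween
    "adjective": ["\u064a\u0629", "\u064a"],   # nisba endings -iyya, -ii
}

MIN_STEM = 2  # assumed minimum stem length, so no rule strips a word to nothing


def universal_rules(word: str) -> str:
    """Apply the two (assumed) category-independent rules."""
    # Universal rule 1 (assumed): drop short-vowel diacritics.
    word = DIACRITICS.sub("", word)
    # Universal rule 2 (assumed): drop the definite article prefix.
    if word.startswith(DEFINITE_ARTICLE) and len(word) - len(DEFINITE_ARTICLE) >= MIN_STEM:
        word = word[len(DEFINITE_ARTICLE):]
    return word


def category_rule(word: str, category: str) -> str:
    """Strip the longest suffix listed for the word's category, if any matches."""
    for suffix in sorted(SUFFIXES[category], key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def stem(word: str, category: str) -> str:
    """Universal rules first, then the single rule for the given category."""
    return category_rule(universal_rules(word), category)
```

For example, `stem("المكتبة", "noun")` removes the article and the ta marbuta and returns "مكتب". Trying suffixes longest-first keeps "ية" from being reduced to a bare "ي" strip; the crude length guard shows how easily suffix rules can over- or under-strip, which is where such simple rules tend to err.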

Keywords

Undesirable Output, Word Category, Arabic Language, Definite Article, Short Vowel


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Computer Science, FEECS, VŠB-Technical University of Ostrava, Ostrava, Czech Republic
