A Rule-Based Subject-Correlated Arabic Stemmer

El-Defrawy, Mahmoud; El-Sonbaty, Yasser; Belal, Nahla A.

doi:10.1007/s13369-016-2029-2

Research Article - Computer Engineering and Computer Science
Published: 05 February 2016

Volume 41, pages 2883–2891, (2016)

Arabian Journal for Science and Engineering

Abstract

Arabic is a derivational language: words are formed from roots, basic forms drawn from a limited set that encapsulate a word's linguistic features. Knowing how frequently each root occurs is a valuable additional feature, especially when the frequencies are tied to a specific topic. This paper exploits the collisions that arise during stemming, where two or more words may share the same root. It uses root frequencies to minimize the number of roots extracted within a specific subject and explores the effect of this reduction on disambiguating words that have multiple candidate roots.
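
The abstract does not give the algorithm itself, so the following is only a minimal sketch of one way subject-level root frequencies could be used to choose among several candidate roots. It is not the authors' method. The function names `count_root_frequencies` and `pick_roots`, the tie-breaking rule, and the transliterated candidate roots are all assumptions made for illustration, and the sketch assumes a separate stemmer has already produced at least one candidate root per word.

```python
from collections import Counter


def count_root_frequencies(candidates_per_word):
    """Count, for one subject, how many words list each root as a candidate."""
    freq = Counter()
    for roots in candidates_per_word:
        # set() so a root listed twice for the same word is counted once
        freq.update(set(roots))
    return freq


def pick_roots(candidates_per_word, freq):
    """Keep, for each word, the candidate root most frequent in the subject.

    Assumes every word has at least one candidate root.
    """
    chosen = []
    for roots in candidates_per_word:
        # Highest count first; alphabetical order breaks ties deterministically
        # (the tie-breaking rule is an assumption of this sketch).
        best = sorted(roots, key=lambda r: (-freq[r], r))[0]
        chosen.append(best)
    return chosen


# Hypothetical candidate roots (transliterated placeholders) for four words
# that all belong to the same subject.
subject_words = [
    ["ktb"],
    ["ktb", "kbt"],
    ["drs"],
    ["drs", "dsr"],
]

freq = count_root_frequencies(subject_words)
chosen = pick_roots(subject_words, freq)
print(chosen)
print(len(set(chosen)), "distinct roots kept out of", len(freq), "candidates")
```

In this toy input, words that collide on a shared candidate root reinforce that root's count, so ambiguous words are pulled toward roots already common in the subject and the set of extracted roots shrinks.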

Author information

Authors and Affiliations

College of Computing and Information Technology, Arab Academy for Science and Technology, Alexandria, 1029, Egypt
Mahmoud El-Defrawy, Yasser El-Sonbaty & Nahla A. Belal

Corresponding author

Correspondence to Nahla A. Belal.

About this article

Cite this article

El-Defrawy, M., El-Sonbaty, Y. & Belal, N.A. A Rule-Based Subject-Correlated Arabic Stemmer. Arab J Sci Eng 41, 2883–2891 (2016). https://doi.org/10.1007/s13369-016-2029-2

Received: 06 September 2015
Accepted: 18 January 2016
Published: 05 February 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s13369-016-2029-2
