Abstract
Name identification has been studied intensively over the past few years and has been incorporated into several products. Many researchers have addressed the problem in a variety of languages, but only a few studies have focused on Named Entity Recognition (NER) for Arabic text, owing to the lack of resources for Arabic named entities and the limited progress made in Arabic natural language processing in general. In this paper, we present the results of our attempt to recognize and extract the 10 most important named entities in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system, Name Entity Recognition for Arabic (NERA), using a rule-based approach. The system consists of a whitelist, which serves as a dictionary of names, and a grammar, written as regular expressions, which is responsible for recognizing the named entities. NERA is evaluated on our own corpora, which were tagged semi-automatically, and the results were satisfactory in terms of precision, recall, and F-measure.
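The combination of a whitelist and a regular-expression grammar described above can be illustrated with a minimal Python sketch. It is not NERA's actual rule set: the whitelist entries, the two patterns, and the names `WHITELIST`, `PATTERNS`, `recognize`, and `evaluate` are all hypothetical, chosen only to show how dictionary lookup and pattern matching might be combined and then scored with precision, recall, and F-measure.

```python
import re

# Hypothetical whitelist (dictionary of names); entries are illustrative only.
WHITELIST = {
    "person": {"محمد", "فاطمة"},
    "location": {"القاهرة", "دبي"},
}

# Hypothetical grammar: one regular expression per pattern-like entity type.
PATTERNS = {
    "isbn": re.compile(r"\b97[89]-\d{1,5}-\d{1,7}-\d{1,7}-\d\b"),
    "time": re.compile(r"\b\d{1,2}:\d{2}\b"),
}


def recognize(text):
    """Return (label, surface form) pairs found in the text."""
    entities = []
    # Whitelist lookup: exact match on whitespace-separated tokens.
    for token in text.split():
        for label, names in WHITELIST.items():
            if token in names:
                entities.append((label, token))
    # Grammar: apply each regular expression to the raw text.
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group()))
    return entities


def evaluate(predicted, gold):
    """Compute precision, recall, and F-measure over sets of entities."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    sample = "سافر محمد إلى القاهرة ISBN 978-3-540-85287-2"
    found = recognize(sample)
    # Hypothetical gold annotation with one entity the sketch cannot find.
    gold = found + [("company", "مثال")]
    print(found)
    print(evaluate(found, gold))
```

Exact token matching is a deliberate simplification here: Arabic words often carry attached prefixes and suffixes, so a realistic whitelist lookup would need some form of normalization or morphological handling that this sketch leaves out.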
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shaalan, K., Raza, H. (2008). Arabic Named Entity Recognition from Diverse Text Types. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_42
DOI: https://doi.org/10.1007/978-3-540-85287-2_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)