Arabic Named Entity Recognition from Diverse Text Types

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

Name identification has been studied intensively over the past few years and has been incorporated into several products. Many researchers have addressed this problem in a variety of languages, but only a few studies have focused on Named Entity Recognition (NER) for Arabic text, owing to the lack of resources for Arabic named entities and the limited progress in Arabic natural language processing in general. In this paper, we present the results of our attempt at recognizing and extracting the 10 most important named entities in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system, Name Entity Recognition for Arabic (NERA), using a rule-based approach. The system consists of a whitelist, representing a dictionary of names, and a grammar, in the form of regular expressions, which is responsible for recognizing the named entities. NERA is evaluated on our own corpora, which were tagged in a semi-automated way, and the performance results were satisfactory in terms of precision, recall, and F-measure.
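
The combination of a whitelist and a regular-expression grammar described in the abstract can be illustrated with a minimal sketch. This is not NERA's implementation: the dictionary entries, the two patterns, the label names, and the `recognize` function are all hypothetical and chosen only to show how the two components might work together.

```python
import re

# Hypothetical whitelist (dictionary of names); entries are illustrative only
# and are not taken from the paper.
WHITELIST = {
    "القاهرة": "LOCATION",  # Cairo
    "دبي": "LOCATION",      # Dubai
}

# Hypothetical "grammar": one regular expression per entity type.
# A real rule set would be far larger and would cover Arabic trigger words.
GRAMMAR = [
    # ISBN-10: nine digits, optionally separated by hyphens or spaces,
    # followed by a check character (digit or X).
    ("ISBN", re.compile(r"ISBN[ :]*(?:\d[- ]?){9}[\dXx]")),
    # File name: word characters followed by one of a few known extensions.
    ("FILE_NAME", re.compile(r"\b[\w-]+\.(?:pdf|doc|txt)\b")),
]


def recognize(text):
    """Return (start, end, label, surface) tuples sorted by start offset."""
    found = []
    # Whitelist lookup: plain substring search, so a name is still found
    # when a clitic such as a conjunction is attached to it. This can
    # over-match inside longer words; a real system would need more care.
    for name, label in WHITELIST.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), m.end(), label, m.group()))
    # Grammar rules: every pattern is applied to the whole text.
    for label, pattern in GRAMMAR:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    return sorted(found)


if __name__ == "__main__":
    sample = "سافر أحمد إلى القاهرة وحمل الملف report.pdf"
    for start, end, label, surface in recognize(sample):
        print(label, surface)
```

The sketch keeps the dictionary and the patterns as separate data structures, so either can be extended without touching the matching logic. It does not resolve overlapping matches or ambiguity between entity types.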

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shaalan, K., Raza, H. (2008). Arabic Named Entity Recognition from Diverse Text Types. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_42

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science (R0)
