ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy

  • Yassine Benajiba
  • Paolo Rosso
  • José Miguel Benedí Ruiz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4394)


The task of Named Entity Recognition (NER) is to identify proper names, as well as temporal and numeric expressions, in open-domain text. NER systems have proved to be very important for many Natural Language Processing (NLP) tasks, such as Information Retrieval and Question Answering. Unfortunately, the main efforts to build reliable NER systems for Arabic have been made in a commercial setting, so neither the approach used nor the accuracy achieved is publicly known. In this paper, we present ANERsys, a NER system built exclusively for Arabic texts and based on n-grams and maximum entropy. Furthermore, we present both the Arabic language-dependent heuristic and the gazetteers we used to boost our system. We developed our own training and test corpora (ANERcorp) and gazetteers (ANERgazet) to train, evaluate and boost the implemented technique. A major effort was made to ensure that all experiments were carried out in the same framework as the CoNLL 2002 conference. We carried out several experiments, and the preliminary results showed that this approach successfully tackles the problem of NER for Arabic.
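The abstract names the ingredients (n-gram context, a maximum entropy classifier, gazetteer lookups) but not their details, so the following is only a minimal sketch under our own assumptions, not the ANERsys feature set or training procedure. The feature templates, the toy gazetteer `LOC_GAZETTEER`, the Buckwalter-style toy corpus and the IOB2-style labels `B-PERS`/`B-LOC` (modeled on CoNLL 2002 tagging) are all hypothetical. The model is fitted by plain batch gradient ascent rather than, e.g., Generalized Iterative Scaling.

```python
import math
from collections import defaultdict

# Hypothetical location gazetteer (a toy stand-in for ANERgazet entries).
LOC_GAZETTEER = {"bArys", "lndn"}

def extract_features(tokens, i, gazetteer):
    """Binary features for token i: the word, its left/right neighbours,
    a left word bigram, and a gazetteer-membership flag (assumed templates)."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = ["bias", "w0=" + tokens[i], "w-1=" + prev_w,
             "w+1=" + next_w, "w-1,w0=" + prev_w + "|" + tokens[i]]
    if tokens[i] in gazetteer:
        feats.append("in_loc_gazetteer")
    return feats

class MaxEnt:
    """Conditional maxent model p(y|x) = exp(sum_k w_k f_k(x, y)) / Z(x),
    fitted by batch gradient ascent on the L2-penalised log-likelihood."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # shift scores for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data, epochs=300, lr=0.5, l2=0.01):
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0      # empirical feature count
                    for y in self.labels:
                        grad[(f, y)] -= p[y]    # model-expected feature count
            for key in set(grad) | set(self.w):
                self.w[key] += lr * (grad[key] / n - l2 * self.w[key])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

def build_examples(corpus, gazetteer):
    """Flatten (tokens, labels) sentences into (features, label) pairs."""
    return [(extract_features(toks, i, gazetteer), lab)
            for toks, labs in corpus for i, lab in enumerate(labs)]

def tag(model, tokens, gazetteer):
    """Label each token independently with the most probable class."""
    return [model.predict(extract_features(tokens, i, gazetteer))
            for i in range(len(tokens))]

# Hypothetical toy corpus in Buckwalter-style transliteration:
# "qAl mHmd fy bArys" ~ "Mohammed said in Paris", "zAr Ely lndn" ~ "Ali visited London".
corpus = [
    (["qAl", "mHmd", "fy", "bArys"], ["O", "B-PERS", "O", "B-LOC"]),
    (["zAr", "Ely", "lndn"], ["O", "B-PERS", "B-LOC"]),
]
model = MaxEnt(["O", "B-PERS", "B-LOC"])
model.train(build_examples(corpus, LOC_GAZETTEER))
print(tag(model, ["qAl", "Ely", "fy", "lndn"], LOC_GAZETTEER))
```

Because every weight is tied to a (feature, label) pair, a gazetteer hit is just one more binary feature whose weight is learned alongside the n-gram context features; this is one plausible way a gazetteer can "boost" a maximum entropy tagger, not necessarily the way ANERsys does it.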







Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Yassine Benajiba (1)
  • Paolo Rosso (1)
  • José Miguel Benedí Ruiz (1)

  (1) Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
