Arabic Entity Graph Extraction Using Morphology, Finite State Machines, and Graph Transformations

  • Jad Makhlouta
  • Fadi Zaraket
  • Hamza Harkous
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7181)

Abstract

Research on automatic recognition of named entities in Arabic text typically relies on techniques that work well for Latin-based languages, such as local grammars, statistical learning models, pattern matching, and rule-based techniques. These techniques improve their results by using application-specific corpora, parallel language corpora, and morphological stemming analysis. We propose a method for extracting entities, events, and the relations among them from Arabic text using a hierarchy of finite state machines driven by morphological features, such as part-of-speech and gloss tags, together with graph transformation algorithms. We evaluated our method on two natural language processing applications: the automated extraction of narrators and narrator relations from several corpora of Islamic narration books, and the automated extraction of genealogical family trees from Biblical texts. In both applications, our method reports high precision and recall and learns lemmas about phrases that improve results.
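
The general idea of a tag-driven finite state machine feeding a graph can be illustrated with a minimal sketch. This is not the authors' implementation: the tag set (NAME, CONN, OTHER), the function names extract_chain and chain_to_edges, and the transliterated toy input are all assumptions made for illustration, and a real system would obtain its tags from a morphological analyzer rather than by hand.

```python
from typing import List, Tuple

# Hypothetical tag set, assumed for illustration only:
#   NAME  = token that is part of a person name
#   CONN  = connector phrase linking two narrators
#   OTHER = any other token
Token = Tuple[str, str]  # (surface form, tag)


def extract_chain(tokens: List[Token]) -> List[str]:
    """Run a two-state machine over tagged tokens; return names in order."""
    names: List[str] = []
    current: List[str] = []  # tokens of the name being assembled
    state = "OUTSIDE"        # the other state is "IN_NAME"
    for word, tag in tokens:
        if tag == "NAME":
            current.append(word)
            state = "IN_NAME"
            continue
        if state == "IN_NAME":
            # Leaving a name: emit it as one entity.
            names.append(" ".join(current))
            current = []
        state = "OUTSIDE"
        if tag != "CONN" and names:
            break  # the chain ends at the first non-connector after a name
    if current:
        names.append(" ".join(current))
    return names


def chain_to_edges(names: List[str]) -> List[Tuple[str, str]]:
    """Turn an ordered chain into directed edges between adjacent names."""
    return list(zip(names, names[1:]))


# Toy, hand-tagged input (hypothetical; not taken from the paper's corpora).
tokens = [
    ("haddathana", "CONN"), ("Malik", "NAME"),
    ("an", "CONN"), ("Nafi", "NAME"),
    ("an", "CONN"), ("Ibn", "NAME"), ("Umar", "NAME"),
    ("qala", "OTHER"),
]
chain = extract_chain(tokens)
edges = chain_to_edges(chain)
```

Here the machine groups consecutive NAME tokens into a single entity, and the edge list is the simplest possible graph over the chain. The hierarchy of machines and the graph transformations described in the abstract would operate on structures richer than this flat list.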

Keywords

Finite State Machine, Graph Transformation, Edge Label, Name Entity Recognition, Entity Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jad Makhlouta (1)
  • Fadi Zaraket (1)
  • Hamza Harkous (1)
  1. American University of Beirut, Lebanon