An Approach for Extracting and Disambiguating Arabic Persons’ Names Using Clustered Dictionaries and Scored Patterns

Zayed, Omnia; El-Beltagy, Samhaa; Haggag, Osama

doi:10.1007/978-3-642-38824-8_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7934)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Building a system to extract Arabic named entities is a complex task due to the ambiguity and structure of Arabic text. Previous approaches to Arabic named entity recognition have relied heavily on Arabic parsers and taggers, combined with a huge set of gazetteers and sometimes large training sets, to resolve this ambiguity. While these approaches are applicable to Modern Standard Arabic (MSA) text, they cannot handle colloquial Arabic. With the rapid increase in online social media usage by Arabic speakers, it is important to build an Arabic named entity recognition system that deals with both colloquial Arabic and MSA text. This paper introduces an approach for extracting Arabic persons' names without utilizing any Arabic parsers or taggers. Evaluation of the presented approach shows that it achieves high precision and an acceptable level of recall on a benchmark dataset.

Author information

Authors and Affiliations

Center of Informatics Science, Nile University, Giza, Egypt
Omnia Zayed, Samhaa El-Beltagy & Osama Haggag

Editor information

Editors and Affiliations

Conservatoire National des Arts et Métiers, 2 rue Conté, 75003, Paris, France
Elisabeth Métais
School of Computing, Science and Engineering, University of Salford, The Crescent, M5 4WT, Salford, Lancashire, UK
Farid Meziane & Sunil Vadera
School of Computing Science and Engineering, University of Salford, The Crescent, M5 4WT, Salford, Lancashire, UK
Mohamad Saraee
Department of Decision and Information Sciences, School of Business Administration, Oakland University, 306 Elliott Hall, 48309, Rochester, MI, USA
Vijayan Sugumaran

About this paper

Cite this paper

Zayed, O., El-Beltagy, S., Haggag, O. (2013). An Approach for Extracting and Disambiguating Arabic Persons’ Names Using Clustered Dictionaries and Scored Patterns. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2013. Lecture Notes in Computer Science, vol 7934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38824-8_17

DOI: https://doi.org/10.1007/978-3-642-38824-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38823-1
Online ISBN: 978-3-642-38824-8
eBook Packages: Computer Science, Computer Science (R0)
