Abstract
Named entity recognition (NER) is an important natural language processing (NLP) task with many applications. We tackle the problem of Arabic NER using deep learning based on Arabic word embeddings that capture syntactic and semantic relationships between words. Deep learning has been shown to perform significantly better than other approaches on various NLP tasks, including NER. However, deep-learning models require large amounts of training data, which are scarce for the Arabic language. To remedy this, we adapt the semi-supervised co-training approach to the realm of deep learning, which we refer to as deep co-learning. Our deep co-learning approach makes use of a small amount of labeled data, which is augmented with partially labeled data that is automatically generated from Wikipedia. Our approach relies only on word embeddings as features and does not involve any additional feature engineering. Nonetheless, when tested on three different Arabic NER benchmarks, our approach consistently outperforms state-of-the-art Arabic NER approaches, including ones that employ carefully crafted NLP features. It also consistently outperforms various baselines, including purely supervised deep-learning approaches as well as semi-supervised ones that make use of only unlabeled data, such as self-learning and the traditional co-training approach.
Acknowledgements
The authors would like to thank the American University of Beirut Research Board (URB) for funding this project under award number 103367.
Cite this article
Helwe, C., Elbassuoni, S. Arabic named entity recognition via deep co-learning. Artif Intell Rev 52, 197–215 (2019). https://doi.org/10.1007/s10462-019-09688-6
DOI: https://doi.org/10.1007/s10462-019-09688-6