Abstract
This study describes and discusses the approaches and methods used to build Named Entity Recognition (NER) systems for various languages. The survey serves to justify the choice of a distributional approach for an Arabic NER system that relies on deep learning methods and uses neural network word representations (embeddings) as an additional feature in the unsupervised learning process.
Notes
- 1.
- 2.
Their dictionary was created using an encyclopedia.
- 3.
Entropy is a measure of uncertainty.
- 4.
The BIO method stands for the Beginning, the Inside and the Outside of the entity.
- 5.
ANERsys is the Arabic NER system created by Benajiba and his team of researchers; it is available at http://users.dsic.upv.es/~ybenajiba.
- 6.
Recall is an evaluation measure used in NLP applications: the proportion of relevant items (here, entities) that the system actually retrieves.
- 7.
A tool created by Mikolov; it is a group of related models used to create word embeddings <https://github.com/dav/word2vec>.
- 8.
Stochastic Gradient Descent code <https://github.com/mateuszmalinowski/SGD>.
- 9.
Back-Propagation Net <https://backpropagation-neural-network.soft112.com/>.
- 10.
The Natural Language Toolkit (NLTK) is a leading platform for building Python programs to work with human language data <http://www.nltk.org>.
- 11.
An integrated development environment for text engineering <https://gate.ac.uk/>.
- 12.
A crawler for the effective creation and annotation of linguistic corpora <http://corpus.tools/browser/spiderling>.
- 13.
A heuristic-based boilerplate removal tool <http://code.google.com/p/justext/>.
- 14.
A de-duplication tool <https://code.google.com/p/onion>.
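Notes 3, 4 and 6 (entropy, the BIO method and recall) can be illustrated with a small sketch. It is not taken from the paper: the function names, the toy sentence and the (start, end, type) span format are assumptions made only for this illustration, and recall is computed here over exact entity spans, which is one common convention among several.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bio_tags(tokens, entities):
    """Assign BIO tags to tokens.

    `entities` is a list of (start, end, type) token spans, end exclusive
    (a hypothetical format chosen for this sketch).
    """
    tags = ["O"] * len(tokens)            # Outside by default
    for start, end, etype in entities:
        tags[start] = "B-" + etype        # Beginning of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype        # Inside the entity
    return tags


def recall(gold, predicted):
    """Fraction of gold entities that the system also predicted."""
    gold, predicted = set(gold), set(predicted)
    return len(gold & predicted) / len(gold) if gold else 0.0


# Made-up example sentence, transliterated for readability.
tokens = ["Ahmed", "Ali", "lives", "in", "Rabat"]
gold = [(0, 2, "PER"), (4, 5, "LOC")]     # reference annotation
predicted = [(0, 2, "PER")]               # a system that missed the location

tags = bio_tags(tokens, gold)
print(tags)                     # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC']
print(recall(gold, predicted))  # 0.5
print(entropy(tags))            # uncertainty of the tag distribution, in bits
```

In this toy case the system finds one of the two gold entities, so recall is 0.5. The entropy is higher when the tags are spread evenly over many labels and zero when every token carries the same tag.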
References
Do, Q.K.: Apprentissage Discriminant des Modèles Continus en Traduction Automatique. Université Paris-Saclay (2016)
Algahtani, S.: Arabic Named Entity Recognition: A Corpus-Based Study. University of Manchester (2011)
Elarnaoty, M., AbdelRahman, S., Fahmy, A.: A machine learning approach for opinion holder extraction in Arabic language. Am. Control Conf. 3(2), 4479–4484 (2015)
Darwish, K.: Information retrieval. In: Hirst, G., Hovy, E., Johnson, M. (eds.) Natural Language Processing of Semitic Languages, pp. 299–334. Springer, Berlin (2014)
Yepes, A.J.: Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation. J. Biomed. Inform. 73, 137–147 (2016)
Benajiba, Y., Rosso, P.: ANERsys 2.0: conquering the NER task for the Arabic language by combining the maximum entropy with POS-tag information. In: 3rd Indian International Conference on Artificial Intelligence, pp. 1814–1823 (2007)
Shaalan, K.: A survey of Arabic named entity recognition and classification. Comput. Linguist. 40(September 2012), 469–510 (2013)
Yacine, E.Y.: Towards an Arabic web-based information retrieval system (ARABIRS): stemming to indexing. Int. J. Comput. Appl. 109(14), 16–21 (2015)
Fyshe, A., Murphy, B., Talukdar, P., Mitchell, T.: Supervised morphological segmentation in a low-resource learning setting using conditional random fields (2013)
Brun, C., Ehrmann, M., Jacquet, G.: Résolution de métonymie des entités nommées: proposition d’une méthode hybride. TAL 50, 87–110 (2009)
Abdallah, Z.S., Carman, M., Haffari, G.: Multi-domain evaluation framework for named entity recognition tools. Comput. Speech Lang. 43, 34–55 (2017)
Bunescu, R., Paşca, M.: Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pp. 9–16, April 2006
Benajiba, Y., Rosso, P., Benedí Ruiz, J.M.: ANERsys: an Arabic named entity recognition system based on maximum entropy. In: CICLing 2007, pp. 143–153 (2007)
Levy, O., Goldberg, Y.: Dependency-based word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol. 2, pp. 302–308 (2014)
Guimaraes, V., dos Santos, C.N.: Boosting named entity recognition with neural character embeddings. In: Proceedings of the Fifth Named Entity Workshop, Joint with 53rd ACL and the 7th IJCNLP, pp. 25–33 (2015)
Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Advances in Neural Information Processing Systems, pp. 2177–2185 (2014)
Zadrozny, B., dos Santos, C.N.: Learning character-level representations for part-of-speech tagging. In: Proceedings of the 31st International Conference Machine Learning, vol. 32 (2014)
Dahou, A., Xiong, S., Zhou, J., Haddoud, M.H.: Word embeddings and convolutional neural network for Arabic sentiment classification. In: Proceedings of the 26th International Conference on Computational Linguistics, pp. 2418–2427 (2016)
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Azroumahli, C., El Younoussi, Y., Achbal, F. (2018). An Overview of a Distributional Word Representation for an Arabic Named Entity Recognition System. In: Abraham, A., Haqiq, A., Muda, A., Gandhi, N. (eds) Proceedings of the Ninth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2017). SoCPaR 2017. Advances in Intelligent Systems and Computing, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-76357-6_13
DOI: https://doi.org/10.1007/978-3-319-76357-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76356-9
Online ISBN: 978-3-319-76357-6
eBook Packages: Engineering, Engineering (R0)