Aggregation of Word Embedding and Q-learning for Arabic Anaphora Resolution

Mathlouthi Bouzid, Saoussen; Ben Othmane Zribi, Chiraz

doi:10.1007/978-3-030-32959-4_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1108)

Included in the following conference series:

International Conference on Arabic Language Processing

Abstract

In many linguistic contexts, repeated mentions of objects and entities are replaced by pronouns. Interpreting pronouns correctly plays an important role in the construction of meaning, so resolving pronominal anaphora remains an important task for most natural language processing applications. This paper presents a novel approach to resolving pronominal anaphora in Arabic texts. First, we identify non-referential pronouns using an iterative self-training SVM method. Then, we resolve the antecedents by combining a Q-learning method with a Word2Vec-based method. For each anaphoric pronoun, the Q-learning method seeks to optimize the sequence of criteria chosen to evaluate the candidate antecedents and select the best one. It uses syntactic criteria as preference factors that favor some candidate antecedents over others. The Word2Vec method uses the AraVec 3.0 word embedding model to provide semantic similarity measures between antecedent word vectors and pronoun context vectors. To combine the Q-learning and Word2Vec results, we use a ranking aggregation method. The resolution system is evaluated on literary, journalistic and technical manual texts. Its precision reaches up to 80.82%.
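
The final combination step can be illustrated with a minimal sketch. The abstract does not say which ranking aggregation method is used, so the Borda count below is an assumption chosen for illustration, not the authors' method. The two-dimensional vectors, the candidate names and the syntactic ranking are all hypothetical toy values standing in for AraVec embeddings and for the output of the Q-learning component.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(context_vec, candidate_vecs):
    """Order candidate antecedents by decreasing similarity to the pronoun context."""
    return sorted(
        candidate_vecs,
        key=lambda name: cosine(context_vec, candidate_vecs[name]),
        reverse=True,
    )


def borda_aggregate(rankings):
    """Merge several rankings with a Borda count (assumed aggregation rule).

    A candidate at position p (0-based) in a ranking of n items earns n - p points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest total first; ties are broken alphabetically to stay deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical toy vectors; a real system would look these up in AraVec.
context_vec = [1.0, 0.0]
candidate_vecs = {
    "teacher": [0.9, 0.1],
    "book": [0.1, 0.9],
    "school": [0.6, 0.6],
}

semantic_ranking = rank_by_similarity(context_vec, candidate_vecs)
# Hypothetical ranking standing in for the Q-learning (syntactic) component.
syntactic_ranking = ["school", "book", "teacher"]

final_ranking = borda_aggregate([semantic_ranking, syntactic_ranking])
print(final_ranking)
```

In this toy case the two components disagree on the best candidate, and the aggregated ranking favors the candidate that both place reasonably high. Other aggregation rules, such as weighted sums of ranks, would fit the same interface.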

Notes

1. https://github.com/bakrianoo/aravec
2. Cataphora is the case where the anaphor precedes its antecedent.
3. Clitics are grammatical elements attached to the root of a word.
4. Short vowels in Arabic are written as symbols called diacritics.
5. The filter resamples a dataset by applying the Synthetic Minority Oversampling Technique (SMOTE). The amount of oversampling and the number of nearest neighbors can be specified as needed to balance the number of instances in the two classes.
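
The oversampling idea in note 5 can be sketched as follows. This is a simplified illustration of SMOTE-style interpolation, not the filter the authors used; the function name `smote_sample`, its parameters and the toy data are assumptions made for the example.

```python
import random


def smote_sample(minority, k=2, n_new=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        # (squared Euclidean distance is enough for ordering).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority_class, k=2, n_new=5)
print(new_points)
```

Here `n_new` plays the role of the amount of oversampling and `k` the number of nearest neighbors; both would be tuned until the two classes are roughly balanced.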

Author information

Authors and Affiliations

National School of Computer Science, RIADI Lab, University of Manouba, Manouba, Tunisia
Saoussen Mathlouthi Bouzid & Chiraz Ben Othmane Zribi

Corresponding author

Correspondence to Saoussen Mathlouthi Bouzid.

Editor information

Editors and Affiliations

University of Lorraine, Nancy, France
Kamel Smaïli

About this paper

Cite this paper

Mathlouthi Bouzid, S., Ben Othmane Zribi, C. (2019). Aggregation of Word Embedding and Q-learning for Arabic Anaphora Resolution. In: Smaïli, K. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2019. Communications in Computer and Information Science, vol 1108. Springer, Cham. https://doi.org/10.1007/978-3-030-32959-4_7

DOI: https://doi.org/10.1007/978-3-030-32959-4_7
Published: 02 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32958-7
Online ISBN: 978-3-030-32959-4
eBook Packages: Computer Science, Computer Science (R0)
