Aggregation of Word Embedding and Q-learning for Arabic Anaphora Resolution

  • Conference paper
Arabic Language Processing: From Theory to Practice (ICALP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1108)

Abstract

In many linguistic contexts, repeated mentions of objects and entities are replaced by pronouns. Correctly interpreting these pronouns plays an important role in constructing meaning, so pronominal anaphora resolution remains an important task for most natural language processing applications. This paper presents a novel approach to resolving pronominal anaphora in Arabic texts. First, we identify non-referential pronouns using an iterative self-training SVM method. Then, we resolve antecedents by combining a Q-learning method with a Word2Vec-based method. For each anaphoric pronoun, the Q-learning method optimizes the sequence in which criteria are chosen to evaluate the candidate antecedents and select the best one; it uses syntactic criteria as preference factors that favor some candidates over others. The Word2Vec method uses the AraVec 3.0 word embedding model to measure the semantic similarity between the vectors of candidate antecedents and the vectors of the pronoun's context. A rank aggregation method combines the Q-learning and Word2Vec results. The resolution system is evaluated on literary, journalistic and technical manual texts, and its precision reaches up to 80.82%.
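The abstract does not detail the iterative self-training SVM used to detect non-referential pronouns. As a minimal sketch under stated assumptions, the loop below trains a linear SVM with Pegasos-style subgradient steps on the hinge loss, assigns labels to the unlabeled pronouns whose decision score clears a confidence threshold, adds them to the training set and retrains, stopping when no new pronoun qualifies. The two-dimensional feature vectors, the ±1 label convention, the threshold of 0.5 and all function names are hypothetical; real features would describe each pronoun's context.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def _augment(x: Vector) -> Vector:
    # Append a constant feature so the last weight acts as the bias
    # (the bias is then regularized too, a simplification).
    return list(x) + [1.0]


def decision(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, _augment(x)))


def train_linear_svm(data: List[Tuple[Vector, int]], lam: float = 0.1,
                     epochs: int = 500, seed: int = 0) -> Vector:
    """Linear SVM fitted with Pegasos-style subgradient steps on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = _augment(data[i][0]), data[i][1]
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # L2 shrinkage
            if margin < 1.0:  # hinge-loss violation: move toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def self_train(labeled: List[Tuple[Vector, int]], unlabeled: List[Vector],
               threshold: float = 0.5, max_rounds: int = 10):
    """Return the final weights and {unlabeled index: pseudo-label}."""
    train = list(labeled)
    pseudo: Dict[int, int] = {}
    w = train_linear_svm(train)
    for _ in range(max_rounds):
        added = False
        for i, x in enumerate(unlabeled):
            score = decision(w, x)
            if i not in pseudo and abs(score) >= threshold:
                pseudo[i] = 1 if score > 0 else -1
                train.append((x, pseudo[i]))
                added = True
        if not added:
            break  # nothing confident left: stop iterating
        w = train_linear_svm(train)  # retrain on the enlarged set
    return w, pseudo


# Hypothetical convention: +1 = non-referential pronoun, -1 = referential.
labeled = [([2.0, 2.0], 1), ([-2.0, -2.0], -1)]
unlabeled = [[1.5, 2.5], [-1.8, -1.2], [0.1, -0.05]]
w, pseudo = self_train(labeled, unlabeled)
print(pseudo)
```

Thresholding the absolute decision score is one common confidence criterion for self-training; other rules, such as adding a fixed number of instances per round, would fit the same loop.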
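For each anaphoric pronoun, the Q-learning step learns the order in which preference criteria are applied to the candidate antecedents. A minimal tabular sketch follows, assuming a toy formulation that the abstract does not spell out: the state is the sequence of criteria applied so far, an action applies one more criterion and keeps only the candidates that score best on it, and the episode ends with reward +1 if the surviving candidate is the gold antecedent and -1 otherwise. The criterion names, scores, candidates and hyperparameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data: one pronoun, three candidates in textual order.
CANDIDATES = ["A", "B", "C"]  # C is the closest to the pronoun
SCORES: Dict[str, Dict[str, float]] = {
    "recency":   {"A": 0.0, "B": 0.5, "C": 1.0},
    "agreement": {"A": 1.0, "B": 1.0, "C": 0.0},  # gender/number agreement
    "subject":   {"A": 1.0, "B": 0.0, "C": 1.0},  # candidate is a subject
}
CRITERIA = list(SCORES)
GOLD = "A"

State = Tuple[str, ...]  # criteria applied so far, in order


def apply(state: State) -> List[str]:
    """Filter candidates by each applied criterion, keeping the best-scoring ones."""
    remaining = list(CANDIDATES)
    for c in state:
        best = max(SCORES[c][x] for x in remaining)
        remaining = [x for x in remaining if SCORES[c][x] == best]
    return remaining


def actions(state: State) -> List[str]:
    return [c for c in CRITERIA if c not in state]


def step(state: State, action: str) -> Tuple[State, float, bool]:
    nxt = state + (action,)
    remaining = apply(nxt)
    done = len(remaining) == 1 or not actions(nxt)
    # Terminal reward: +1 if the closest surviving candidate is the gold antecedent.
    reward = (1.0 if remaining[-1] == GOLD else -1.0) if done else 0.0
    return nxt, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q: Dict[Tuple[State, str], float] = defaultdict(float)
    for _ in range(episodes):
        state: State = ()
        done = False
        while not done:
            acts = actions(state)
            if rng.random() < epsilon:
                a = rng.choice(acts)  # explore
            else:
                a = max(acts, key=lambda c: q[(state, c)])  # exploit
            nxt, r, done = step(state, a)
            future = 0.0 if done else max(q[(nxt, c)] for c in actions(nxt))
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[(state, a)] += alpha * (r + gamma * future - q[(state, a)])
            state = nxt
    return q


def greedy_resolve(q):
    state: State = ()
    while True:
        a = max(actions(state), key=lambda c: q[(state, c)])
        state, _, done = step(state, a)
        if done:
            return state, apply(state)[-1]


q = train()
order, antecedent = greedy_resolve(q)
print(order, antecedent)
```

In this toy, applying `recency` first wrongly selects C, while `agreement` and `subject` together isolate A, so the learned greedy policy applies those two criteria.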
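The Word2Vec side scores each candidate antecedent by the cosine similarity between its vector and a vector for the pronoun's context, and a rank aggregation method merges this ranking with the Q-learning ranking. The sketch below assumes that the context vector is the mean of the surrounding words' vectors and uses the Borda count as the aggregation method; the abstract names neither choice. Tiny two-dimensional vectors with English placeholder words stand in for AraVec 3.0 embeddings, and the Q-learning ranking is a hypothetical input.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def context_vector(words: List[str], emb: Dict[str, Vector]) -> List[float]:
    """Average the embeddings of the known words around the pronoun."""
    dim = len(next(iter(emb.values())))
    known = [emb[w] for w in words if w in emb]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def rank(scores: Dict[str, float]) -> List[str]:
    """Candidates sorted best-first (ties broken alphabetically)."""
    return sorted(scores, key=lambda c: (-scores[c], c))


def borda(rankings: List[List[str]]) -> List[str]:
    """Borda count: position i in an n-item ranking earns n - 1 - i points."""
    points: Dict[str, float] = {}
    for r in rankings:
        for i, c in enumerate(r):
            points[c] = points.get(c, 0.0) + (len(r) - 1 - i)
    return rank(points)


# Hypothetical toy embeddings standing in for AraVec vectors.
emb = {
    "student": [0.9, 0.1], "book": [0.1, 0.9], "teacher": [0.8, 0.3],
    "wrote": [0.7, 0.2], "lesson": [0.6, 0.4],
}
candidates = ["student", "book", "teacher"]
ctx = context_vector(["wrote", "lesson"], emb)
w2v_ranking = rank({c: cosine(emb[c], ctx) for c in candidates})
q_ranking = ["student", "book", "teacher"]  # hypothetical Q-learning output
final = borda([q_ranking, w2v_ranking])
print(w2v_ranking, final)
```

With these inputs, `teacher` is the most similar to the context vector, but `student` wins the Borda count because it has the better average position across the two rankings.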

Notes

  1. https://github.com/bakrianoo/aravec.

  2. Cataphora is the case where the anaphor precedes its antecedent.

  3. Clitics are grammatical elements attached to a host word.

  4. In Arabic, short vowels are not written as letters; they are represented by symbols called diacritics.

  5. The filter resamples a dataset by applying the Synthetic Minority Oversampling Technique (SMOTE). The amount of SMOTE oversampling and the number of nearest neighbors can be set as needed to balance the sizes of the two classes.
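The core of SMOTE is interpolation: each synthetic minority example is placed at a random point on the segment between a minority example and one of its k nearest minority neighbors. The sketch below illustrates only that step. The function name and parameters are hypothetical and do not reproduce the filter's own options, and the toy points stand in for pronoun feature vectors.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_new: int, k: int = 2, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (Euclidean distance), excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
majority_size = 7
synthetic = smote(minority, n_new=majority_size - len(minority), k=2)
print(len(minority) + len(synthetic))  # 7: both classes now have 7 instances
```

Because every synthetic point is a convex combination of two real minority points, the new examples stay inside the region already occupied by the minority class.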

Author information

Corresponding author

Correspondence to Saoussen Mathlouthi Bouzid.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mathlouthi Bouzid, S., Ben Othmane Zribi, C. (2019). Aggregation of Word Embedding and Q-learning for Arabic Anaphora Resolution. In: Smaïli, K. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2019. Communications in Computer and Information Science, vol 1108. Springer, Cham. https://doi.org/10.1007/978-3-030-32959-4_7

  • DOI: https://doi.org/10.1007/978-3-030-32959-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32958-7

  • Online ISBN: 978-3-030-32959-4

  • eBook Packages: Computer Science, Computer Science (R0)
