Combining Neural Language Models for Word Sense Induction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11832)

Abstract

Word sense induction (WSI) is the problem of grouping occurrences of an ambiguous word according to the sense the word expresses in each context. Recently a new approach to this task was proposed, which generates possible substitutes for the ambiguous word in a particular context using neural language models and then clusters sparse bag-of-words vectors built from these substitutes. In this work, we apply this approach to the Russian language and improve it in two ways. First, we propose methods of combining the left and right contexts of the target word, which yields better substitutes. Second, instead of a fixed number of clusters for all ambiguous words, we propose a technique for selecting an individual number of clusters for each word. Our approach establishes a new state of the art, improving the best previous WSI results for the Russian language on two RUSSE 2018 datasets by a large margin.
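To make the pipeline concrete, the following is a minimal sketch of the substitute-based WSI procedure the abstract describes. It is not the authors' implementation. As illustrative assumptions: a multilingual masked language model (via the Hugging Face transformers fill-mask pipeline) stands in as the substitute generator, a masked LM conditioning on left and right context jointly in place of the paper's explicit context-combination methods; occurrences are represented by TF-IDF vectors over bags of substitutes; and the number of clusters is chosen per word by silhouette score rather than fixed globally. The `<target>` placeholder, function names, and example sentences are hypothetical.

```python
# Illustrative sketch only; assumptions are listed in the text above.
# Requires scikit-learn >= 1.2 (for the `metric` parameter) and transformers.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

def substitutes(context: str, top_k: int = 20) -> list[str]:
    """Generate substitute words for the `<target>` slot in one context."""
    masked = context.replace("<target>", fill_mask.tokenizer.mask_token)
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

def induce_senses(contexts: list[str], max_k: int = 5) -> list[int]:
    """Cluster occurrences of one ambiguous word by their substitute vectors."""
    docs = [" ".join(substitutes(c)) for c in contexts]
    X = TfidfVectorizer().fit_transform(docs).toarray()
    best_labels, best_score = [0] * len(contexts), -1.0
    # Select the number of senses for this word individually instead of
    # using one fixed number of clusters for every ambiguous word.
    for k in range(2, min(max_k, len(contexts) - 1) + 1):
        labels = AgglomerativeClustering(
            n_clusters=k, metric="cosine", linkage="average"
        ).fit_predict(X)
        score = silhouette_score(X, labels, metric="cosine")
        if score > best_score:
            best_labels, best_score = list(labels), score
    return best_labels

# Hypothetical English examples for readability; the paper works on Russian.
contexts = [
    "He sat on the <target> of the river and watched the boats.",
    "Flowers grew along the <target> of the stream.",
    "She deposited her salary at the <target> on Monday.",
    "The <target> approved the loan within a week.",
]
print(induce_senses(contexts))  # e.g. [0, 0, 1, 1]
```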

Keywords

Word sense induction · Contextual substitutes · Neural language models

Notes

Acknowledgements

We are grateful to Dima Lipin, Artem Grachev and Alex Nevidomsky for their valuable help.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Samsung R&D Institute Russia, Moscow, Russia
  2. Lomonosov Moscow State University, Moscow, Russia
  3. SlickJump, Moscow, Russia