Abstract
Word sense induction (WSI) is the problem of grouping occurrences of an ambiguous word according to the sense expressed in each occurrence. Recently, a new approach to this task was proposed that generates possible substitutes for the ambiguous word in a particular context using neural language models, and then clusters sparse bag-of-words vectors built from these substitutes. In this work, we apply this approach to the Russian language and improve it in two ways. First, we propose methods of combining the left and right contexts, which result in better generated substitutes. Second, instead of a fixed number of clusters for all ambiguous words, we propose a technique for selecting an individual number of clusters for each word. Our approach establishes a new state of the art, improving the current best WSI results for the Russian language on two RUSSE 2018 datasets by a large margin.
References
Alagić, D., Šnajder, J., Padó, S.: Leveraging lexical substitutes for unsupervised word sense induction. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Amplayo, R.K., Hwang, S.W., Song, M.: AutoSense model for word sense induction. In: AAAI (2019)
Amrami, A., Goldberg, Y.: Word sense induction with neural biLM and symmetric patterns. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 4860–4867. Association for Computational Linguistics (2018). https://www.aclweb.org/anthology/D18-1523
Apresjan, V.: Active dictionary of the Russian language: theory and practice. In: Meaning-Text Theory 2011, pp. 13–24 (2011)
Arefyev, N., Ermolaev, P., Panchenko, A.: How much does a word weigh? Weighting word embeddings for word sense induction. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue”, Moscow, Russia, pp. 68–84. RSUH (2018)
Bartunov, S., Kondrashkin, D., Osokin, A., Vetrov, D.: Breaking sticks and ambiguities with adaptive skip-gram. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) (2016)
Baskaya, O., Sert, E., Cirik, V., Yuret, D.: AI-KU: using substitute vectors and co-occurrence modeling for word sense induction and disambiguation. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, pp. 300–306 (2013)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Hope, D., Keller, B.: UoS: a graph-based system for graded word sense induction. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), no. 1, Atlanta, Georgia, USA, pp. 689–694 (2013). http://www.aclweb.org/anthology/S13-2113
Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 328–339. Association for Computational Linguistics (2018). https://www.aclweb.org/anthology/P18-1031
Jurgens, D., Klapaftis, I.: SemEval-2013 task 13: word sense induction for graded and non-graded senses. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, pp. 290–299 (2013)
Kutuzov, A.: Russian word sense induction by clustering averaged word embeddings. CoRR abs/1805.02258 (2018). http://arxiv.org/abs/1805.02258
Lau, J.H., Cook, P., Baldwin, T.: unimelb: topic modelling-based word sense induction. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, Atlanta, Georgia, USA, pp. 307–311 (2013). http://www.aclweb.org/anthology/S13-2051
Manandhar, S., Klapaftis, I.P., Dligach, D., Pradhan, S.S.: SemEval-2010 task 14: word sense induction & disambiguation. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 63–68. Association for Computational Linguistics (2010)
Melamud, O., Goldberger, J., Dagan, I.: context2vec: learning generic context embedding with bidirectional LSTM. In: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61 (2016)
Panchenko, A., et al.: RUSSE’2018: a shared task on word sense induction for the Russian language. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue”, Moscow, Russia, pp. 547–564. RSUH (2018). http://www.dialog-21.ru/media/4324/panchenkoa.pdf
Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the NAACL (2018)
Struyanskiy, O., Arefyev, N.: Neural networks with attention for word sense induction. In: Supplementary Proceedings of the Seventh International Conference on Analysis of Images, Social Networks and Texts (AIST 2018), Moscow, Russia, 5–7 July 2018, pp. 208–213 (2018). http://ceur-ws.org/Vol-2268/paper23.pdf
Tang, G., Müller, M., Rios, A., Sennrich, R.: Why self-attention? A targeted evaluation of neural machine translation architectures. arXiv preprint arXiv:1808.08946 (2018)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Véronis, J.: HyperLex: lexical cartography for information retrieval. Comput. Speech Lang. 18(3), 223–252 (2004)
Wang, A., Cho, K.: BERT has a mouth, and it must speak: BERT as a Markov random field language model. CoRR abs/1902.04094 (2019). http://arxiv.org/abs/1902.04094
Wang, J., Bansal, M., Gimpel, K., Ziebart, B.D., Yu, C.T.: A sense-topic model for word sense induction with unsupervised data enrichment. TACL 3, 59–71 (2015)
Acknowledgements
We are grateful to Dima Lipin, Artem Grachev and Alex Nevidomsky for their valuable help.
Appendices
A Examples of Substitutes Generated
Table 5 provides examples of discriminative substitutes with their relative frequencies for each of the two most frequent senses of several words. A substitute is called discriminative if it is frequently generated for one sense of an ambiguous word but rarely for the other. Formally, we take the substitutes with the largest \(\frac{P(w|sense_1)}{P(w|sense_2)}\), where \(P(w|sense_i)\) is estimated using add-one smoothing:
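The smoothed estimator itself is not reproduced in this excerpt; under standard add-one (Laplace) smoothing over the substitute vocabulary \(V\), where \(c(w, sense_i)\) counts how often substitute \(w\) was generated for examples of \(sense_i\) and \(N_i = \sum_{w'} c(w', sense_i)\), it would take the form:

\[
P(w \mid sense_i) = \frac{c(w, sense_i) + 1}{N_i + |V|}
\]

The added count of one keeps the ratio \(P(w|sense_1)/P(w|sense_2)\) finite for substitutes never generated for one of the senses.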
Additionally, we keep only substitutes that were generated at least 10 times for one of the senses.
Table 6 lists the ten most probable substitutes according to the combined distribution, and according to the forward and backward LM distributions separately, for several examples. Substitutes from the unidirectional distributions are very sensitive to the position of the target word: when either the left or the right context does not contain enough information, at least half of the substitutes are unrelated to the target word. The combined distribution provides more relevant substitutes.
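The effect of combining the two unidirectional distributions can be sketched as follows. The paper's exact combination scheme (with the \(\beta\)-controlled discounting from Appendix C) is not reproduced in this excerpt, so this sketch assumes a simple renormalized geometric mean of the forward and backward substitute distributions:

```python
import math

def combine_distributions(p_forward, p_backward):
    """Combine forward and backward LM substitute distributions by a
    renormalized geometric mean (equal weights for both directions)."""
    vocab = set(p_forward) | set(p_backward)
    eps = 1e-12  # floor for words missing from one of the distributions
    combined = {w: math.sqrt(p_forward.get(w, eps) * p_backward.get(w, eps))
                for w in vocab}
    z = sum(combined.values())
    return {w: p / z for w, p in combined.items()}

# A substitute strongly predicted by both LMs dominates the combination,
# while words supported by only one direction are suppressed.
p_fwd = {"bank": 0.6, "river": 0.3, "the": 0.1}
p_bwd = {"bank": 0.5, "shore": 0.4, "the": 0.1}
p = combine_distributions(p_fwd, p_bwd)
best = max(p, key=p.get)  # "bank"
```

The geometric mean acts like a soft intersection: a substitute needs support from both contexts to stay probable, which matches the observation that either context alone can produce irrelevant substitutes.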
B The Number of Clusters Selected
Figure 4 plots the distributions of the differences between the true number of senses, the number of clusters in the submissions, and the optimal number of clusters. The silhouette score yields a number of clusters that is usually larger than the number of senses, but is near the optimum with respect to ARI given our vectors. The previous best submissions estimate the true number of senses better.
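Per-word cluster-count selection by silhouette score can be sketched as follows; the clustering algorithm and the candidate range here are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def select_num_clusters(vectors, max_k=10):
    """Return the number of clusters maximizing the silhouette score
    over candidate values k = 2..max_k."""
    best_k, best_score = 2, -1.0
    for k in range(2, min(max_k, len(vectors) - 1) + 1):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(vectors)
        score = silhouette_score(vectors, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Two well-separated blobs of context vectors -> silhouette peaks at k = 2.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 5), rng.randn(20, 5) + 10.0])
k = select_num_clusters(X)  # 2
```

Because the silhouette score is computed per target word, each word gets its own number of clusters instead of a single fixed value shared across the dataset.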
C Hyperparameters
Table 7 shows the selected hyperparameters for the methods described in Sect. 3. For the bts-rnc and active-dict datasets, hyperparameters were selected using grid search on the corresponding train sets. For wiki-wiki we used the hyperparameters from bts-rnc due to the very small size of the wiki-wiki train set. We selected the following hyperparameters.
1. Add bias (True/False). Ignoring the bias in the softmax layer of the LM was proposed in [3] to improve substitutes, because adding the bias results in the prediction of frequent words instead of rare but relevant substitutes.
2. Normalize output embeddings (True/False). Similarly to ignoring the bias, this may result in the prediction of more relevant substitutes.
3. K (10–400). The number of substitutes taken from each distribution.
4. Exclude target (True/False). We want the substitutes for different senses of the target word to be non-overlapping, so it may be beneficial to exclude the target word from the substitutes.
5. TFIDF (True/False). Applying a TF-IDF transformation to the bag-of-words vectors of substitutes sometimes improves performance.
6. S (=20). The number of representatives for each example. It did not affect performance, so we use the value from [3].
7. L (4–30). The number of substitutes to sample from the top K predictions.
8. z (1.0–3.0). The parameter of the Zipf distribution.
9. \(\varvec{\beta }\) (0.1–0.5). The relative length of the left or right context after which the discounting of the corresponding LM begins.
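The interaction of K, L, and z above can be sketched as follows. The exact sampling scheme is not spelled out in this appendix, so this is an illustrative assumption: each of the L substitutes is drawn from the top-K predictions with probability proportional to its rank raised to the power \(-z\):

```python
import random

def sample_substitutes(top_k_substitutes, L=10, z=1.5, rng=None):
    """Sample L substitutes from a ranked list of top-K predictions, where
    the word at rank r (1-based) is drawn with probability ~ r ** (-z)."""
    rng = rng or random.Random()
    weights = [(r + 1) ** (-z) for r in range(len(top_k_substitutes))]
    return rng.choices(top_k_substitutes, weights=weights, k=L)

rng = random.Random(0)
top_k = [f"w{i}" for i in range(50)]  # ranked substitute candidates, best first
sampled = sample_substitutes(top_k, L=8, z=2.0, rng=rng)
# Higher-ranked substitutes dominate the sample; larger z concentrates
# the samples on the very top predictions, smaller z flattens the choice.
```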
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Arefyev, N., Sheludko, B., Aleksashina, T. (2019). Combining Neural Language Models for Word Sense Induction. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Lecture Notes in Computer Science(), vol 11832. Springer, Cham. https://doi.org/10.1007/978-3-030-37334-4_10
DOI: https://doi.org/10.1007/978-3-030-37334-4_10
Print ISBN: 978-3-030-37333-7
Online ISBN: 978-3-030-37334-4